Are you interested in how we managed to combine Mutual TLS, The Update Framework (TUF), Over-the-Air (OTA) updates and Release on Demand in one feature? And in how that helps Foundries.io give our customers an easy and controlled way to deliver secure updates to millions of devices over public networks?
This blog post introduces a FoundriesFactory Waves feature that will be publicly available at the start of Q2 2021.
The current method Foundries.io uses to roll out updates to your Factory devices is simple: every few minutes (10 by default), devices poll the server to see if there is a new update. If there is one, it is applied. This simplicity works well for a small number of devices but does not scale well: having thousands or even millions of devices update at once can pose real challenges for an operator.
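To put rough numbers on the "update at once" problem, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the devices' poll times are spread evenly across one polling interval, so every device discovers a newly published update within one interval of its publication.

```python
def discovery_minutes(fleet_size: int, poll_interval_min: float = 10.0) -> list[float]:
    """Minutes after publication at which each device first sees a new update.

    Illustrative assumption: poll phases are spread evenly over one interval,
    so device i polls at i * (interval / fleet_size) minutes after publication.
    """
    step = poll_interval_min / fleet_size
    return [i * step for i in range(fleet_size)]


def update_starts_per_minute(fleet_size: int, poll_interval_min: float = 10.0) -> float:
    """Average number of devices that start updating each minute under that assumption."""
    return fleet_size / poll_interval_min


if __name__ == "__main__":
    # A fleet of 1,000,000 devices polling at the default 10-minute interval.
    print(update_starts_per_minute(1_000_000))  # 100000.0
    # Every device has picked up the update before one interval has passed.
    print(max(discovery_minutes(1_000)) < 10.0)  # True
```

Under these assumptions, an unphased release reaches the whole fleet within roughly one polling interval, which is about 100,000 devices starting an update every minute for a million-device fleet.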
Another potential issue is that the CI/CD system used by your Factory produces tagged Targets which devices may update to. Without additional protection, an accidental Factory configuration mistake could therefore unexpectedly update your Factory's entire fleet of devices. Again, that simplicity is fine for a small company but does not work as well at scale.
For larger deployments, phased rollouts are preferred where an update is gradually released to devices in the field. At each stage the rollout may pause for a period of time so that the operator can look at their backend for deployment issues. If things go wrong, an operator can cancel the rollout before the entire fleet of devices is affected.
This idea is closely tied to the CI/CD process and DevOps culture, and is usually called Release on Demand. Now, powered by TUF and OTA best practices, we have developed a new feature: FoundriesFactory Waves.
First, we need to define the audience: the fleet of devices that should be subject to a Release on Demand policy, i.e. that use the FoundriesFactory Waves feature.
For "test" devices, the ability to update instantly to a new Target as soon as the CI/CD process produces it is neat, and we don't want to abandon that. Thus, we have defined the concept of "Production Devices", i.e. devices that should only receive authoritative updates carefully selected by authorized personnel. These can be end-user devices, mission-critical devices inside your laboratory, or devices for other business use cases not covered here.
In the previous blog post about FoundriesFactory Device Gateway PKI, we described how Foundries.io envisions generating secure client certificates for devices in order to leverage the full power of mutual TLS in the most secure way. In this blog post, I assume this is the preferred way to register Production devices.
As part of the Waves feature, we've added a new option, PRODUCTION=ON, to our lmp-device-register tool, which is used to register devices and is shipped as part of the Foundries.io LmP. When set, this option injects a special attribute into the device's client certificate that places the device in the Production category. This way Production devices are easily defined, and that information is extremely hard to alter because it is protected by RSA and by your Factory's root certificate chain, which you own.
Once we have distinct groups of Test and Production devices, let's talk about creating a Wave.
A Wave is a special object inside your Factory that lets you specify which of your CI Targets (built by Foundries.io CI/CD) should be the next version released on demand to your Factory's Production devices. Note that a single CI Target version can contain one or more individual Targets, e.g. if your Factory supports several architectures.
A new Wave can be created using the fioctl wave init <wave-name> <version> [<tag>] command. For example, fioctl wave init beta-42 42 beta creates a new Wave named "beta-42". It contains only the selected Target(s) for the "beta" tag: those that previous Waves for the "beta" tag delivered to Production devices, plus the newly added Target(s) with version 42. The newly added Target(s) are re-tagged so that they carry only the "beta" tag. Note that existing CI Targets are not altered; a Wave is a separate copy of Targets specially crafted to be delivered to specific Production devices.
At this point, no updates have been delivered to Production devices yet, but new TUF roles associated with this Wave have been added to the Foundries.io OTA backend, ready to be deployed on demand.
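The Target selection described above can be sketched as a small toy model. This is only our reading of the behaviour, not the Foundries.io implementation: the `Target` type and `init_wave_targets` function are hypothetical, as are the board names in the example. Frozen dataclasses together with `dataclasses.replace` make the "CI Targets are never altered" property explicit, because re-tagging produces copies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Target:
    name: str
    version: int
    tags: tuple[str, ...]


def init_wave_targets(
    ci_targets: list[Target],
    previous_wave_targets: list[Target],
    version: int,
    tag: str = "master",
) -> list[Target]:
    """Toy model of what `fioctl wave init <wave-name> <version> [<tag>]` selects.

    Keeps Targets already delivered by earlier Waves for `tag`, then adds copies
    of every CI Target with `version`, re-tagged so that they carry only `tag`.
    CI Targets are never modified: replace() on a frozen dataclass returns a copy.
    """
    carried_over = [t for t in previous_wave_targets if tag in t.tags]
    new_targets = [replace(t, tags=(tag,)) for t in ci_targets if t.version == version]
    if not new_targets:
        raise ValueError(f"no CI Targets found for version {version}")
    return carried_over + new_targets


if __name__ == "__main__":
    ci = [
        Target("board-a-lmp-42", 42, ("master", "devel")),  # hypothetical names
        Target("board-b-lmp-42", 42, ("master",)),
        Target("board-a-lmp-41", 41, ("master",)),
    ]
    previous = [Target("board-a-lmp-40", 40, ("beta",))]
    wave = init_wave_targets(ci, previous, 42, "beta")
    print([t.name for t in wave])  # ['board-a-lmp-40', 'board-a-lmp-42', 'board-b-lmp-42']
```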
In the previous blog post, we introduced FoundriesFactory Device Groups, a powerful utility for managing your devices' configuration in a controlled way. Waves are the second use of Device Groups.
With the help of Device Groups, an operator can granularly control how a given update is delivered to Production devices within a single Wave. To do so, an operator runs the fioctl wave rollout <wave-name> <group-name> command, and all devices in that group start receiving the updates present in that Wave's TUF roles.
For example, fioctl wave rollout beta-42 beta-emea rolls out the Target(s) of version 42 (as defined in the Wave "beta-42") to all devices in the Device Group "beta-emea" via a secure OTA protocol. When the operator is happy with how the update worked out on this group, they can run fioctl wave rollout beta-42 beta-apac to deliver the Wave's Target(s) to the next Device Group.
The process of rolling out a Wave is completely under the operator's control. The Foundries.io OTA servers ensure that updates associated with a Wave are delivered only to the correct Production devices, as defined by the operator.
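The delivery rule for an active Wave can be pictured with a toy model: a device gets the Wave's Targets only if it is a Production device and belongs to a Device Group the operator has already rolled the Wave out to. This is a simplified sketch, not the server's logic; the `Device` and `ActiveWave` types are hypothetical, and tags are deliberately left out of the model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    name: str
    group: str         # Device Group the device belongs to
    production: bool   # stands in for the certificate attribute set at registration


@dataclass
class ActiveWave:
    name: str
    rolled_out_groups: set[str] = field(default_factory=set)

    def rollout(self, group: str) -> None:
        """Toy counterpart of `fioctl wave rollout <wave-name> <group-name>`."""
        self.rolled_out_groups.add(group)

    def delivers_to(self, device: Device) -> bool:
        """Only Production devices in a rolled-out group receive the Wave (tags ignored)."""
        return device.production and device.group in self.rolled_out_groups


if __name__ == "__main__":
    wave = ActiveWave("beta-42")
    wave.rollout("beta-emea")
    print(wave.delivers_to(Device("emea-1", "beta-emea", True)))   # True
    print(wave.delivers_to(Device("apac-1", "beta-apac", True)))   # False
    print(wave.delivers_to(Device("lab-1", "beta-emea", False)))   # False
```

In this model, non-Production devices are simply outside the Wave; they keep following CI Targets as before.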
When the operator is happy with how an update is progressing, they can make it generally available to all Production devices by running the fioctl wave complete <wave-name> command. The Wave is then no longer active, and its Target(s) are available to all devices configured to fetch updates for that Wave's tag.
On the other hand, an operator may decide that an update is not performing as expected and stop further delivery by running fioctl wave cancel <wave-name>. The Wave is then specially marked in your Factory, so that it no longer delivers updates to Production devices.
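The rollout, complete and cancel commands combine naturally into a group-by-group script. The sketch below is only an illustration: it uses the fioctl wave subcommands exactly as written in this post, while `phased_rollout` and the `healthy` callback are hypothetical. What counts as "healthy" (for example, waiting a while and then checking your own monitoring) is left to the operator. The `run` parameter allows a dry run that prints or records commands instead of executing them.

```python
import subprocess
from typing import Callable, Sequence


def run_fioctl(args: list[str]) -> None:
    """Run one fioctl command, raising CalledProcessError if it fails."""
    subprocess.run(args, check=True)


def phased_rollout(
    wave: str,
    groups: Sequence[str],
    healthy: Callable[[str], bool],
    run: Callable[[list[str]], object] = run_fioctl,
) -> bool:
    """Roll a Wave out group by group and cancel it at the first unhealthy group.

    Returns True if the Wave was completed, or False if it was canceled.
    """
    for group in groups:
        run(["fioctl", "wave", "rollout", wave, group])
        if not healthy(group):
            run(["fioctl", "wave", "cancel", wave])
            return False
    run(["fioctl", "wave", "complete", wave])
    return True


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    phased_rollout("beta-42", ["beta-emea", "beta-apac"], healthy=lambda group: True, run=print)
```

Stopping at the first unhealthy group keeps the blast radius to the groups already rolled out, which is the point of a phased rollout.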
To better understand Tags, please first read the "What is a Target" blog post, as well as the series of posts about the Foundries.io aktualizr-lite daemon (which enables OTA updates on the device side) here and here.
For production, you may choose different strategies for using Tags. Below we describe two viable alternatives, although the potential for Tags in production is much bigger and the topic deserves a separate blog post.
For small companies, it's enough to stick to the default "master" tag, i.e. use the tagless form of the command: fioctl wave init <wave-name> <version>. This is suitable if all your Production devices are the same and you need no categorization other than a simple split into Device Groups.
Many companies, however, want either to ship special devices that receive a slightly different set of Targets, or to deliver updates separately to different audiences. One such scenario is a company that wants Beta users to receive updates long before regular users, potentially several Waves ahead. This is easily achieved if the company configures those Beta users' Production devices to receive updates from a "beta" tag and all other Production devices from a "master" or "public" tag. An operator can then deliver several updates to Beta users before delivering a single update to the public:
wave init beta-42 42 beta
wave rollout beta-42 emea
wave rollout beta-42 apac
wave rollout beta-42 americas
wave complete beta-42
wave init beta-43 43 beta
wave rollout beta-43 emea
# Something went wrong with this update
wave cancel beta-43
wave init beta-44 44 beta
wave rollout beta-44 emea
wave rollout beta-44 apac
wave complete beta-44
# Only now, deliver an old update to general public
# while Beta users are ahead in time receiving newer updates.
wave init ga-42 42 public
wave rollout ga-42 emea
wave rollout ga-42 apac
wave rollout ga-42 americas
wave complete ga-42
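To check which version each audience ends up on, the command sequence above can be replayed through a tiny toy model. It is a hypothetical helper, not part of fioctl, and it only tracks one thing: the version each tag has been given through completed Waves. Rollout and cancel steps are ignored because they do not change that generally available version.

```python
def versions_by_tag(commands: list[str]) -> dict[str, int]:
    """Replay `wave init/rollout/complete/cancel` lines and report, per tag,
    the version made generally available by the latest completed Wave."""
    waves: dict[str, tuple[int, str]] = {}   # wave name -> (version, tag)
    available: dict[str, int] = {}           # tag -> generally available version
    for line in commands:
        words = line.split("#", 1)[0].split()  # drop comments and blank lines
        if not words:
            continue
        if words[0] == "fioctl":               # accept lines with or without "fioctl"
            words = words[1:]
        if words[:2] == ["wave", "init"]:
            name, version = words[2], int(words[3])
            tag = words[4] if len(words) > 4 else "master"  # tagless form uses "master"
            waves[name] = (version, tag)
        elif words[:2] == ["wave", "complete"]:
            version, tag = waves[words[2]]
            available[tag] = version
        # "wave rollout" and "wave cancel" leave the generally available version unchanged.
    return available


if __name__ == "__main__":
    sequence = [
        "wave init beta-42 42 beta", "wave complete beta-42",
        "wave init beta-43 43 beta", "wave cancel beta-43",
        "wave init beta-44 44 beta", "wave complete beta-44",
        "wave init ga-42 42 public", "wave complete ga-42",
    ]
    print(versions_by_tag(sequence))  # {'beta': 44, 'public': 42}
```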
At Foundries.io we like simplicity and convenience. For that reason, there can be only one active Wave at a time. This might seem like a limitation, but the whole idea is to protect operators from mistakes and from the unnecessary effort of managing several simultaneous deployments. This way we encourage operators to plan ahead and schedule each Wave in a dedicated time slot, following a robust start-complete-repeat process.
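The "one active Wave at a time" rule amounts to a small state machine. The sketch below is a hypothetical toy model of that rule, not Foundries.io code: starting a Wave fails while another one is still active, and the slot frees up only after the active Wave is completed or canceled.

```python
from typing import Dict, Optional


class WaveRegistry:
    """Toy model of the rule that a Factory has at most one active Wave."""

    def __init__(self) -> None:
        self.active: Optional[str] = None
        self.finished: Dict[str, str] = {}  # wave name -> "complete" or "canceled"

    def init(self, name: str) -> None:
        if self.active is not None:
            raise RuntimeError(f"wave {self.active!r} is still active; complete or cancel it first")
        self.active = name

    def _finish(self, name: str, outcome: str) -> None:
        if self.active != name:
            raise RuntimeError(f"wave {name!r} is not the active wave")
        self.finished[name] = outcome
        self.active = None

    def complete(self, name: str) -> None:
        self._finish(name, "complete")

    def cancel(self, name: str) -> None:
        self._finish(name, "canceled")
```

For example, calling `init("beta-43")` while "beta-42" is still active raises an error until `complete("beta-42")` or `cancel("beta-42")` is called, which enforces the start-complete-repeat rhythm.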
The FoundriesFactory Waves feature is powerful and will need more than a single blog post to fully explain. In future blog posts and documentation we will go into more detail on the following topics: