
Flexible Continuous Integration for iOS

9 min read
Michael Bachand
The Airbnb Tech Blog

How Airbnb leverages AWS, Packer, and Terraform to update macOS on hundreds of CI machines in hours instead of days


By: Michael Bachand, Xianwen Chen

At Airbnb, we run a comprehensive suite of continuous integration (CI) jobs before each iOS code change is merged. These jobs ensure that the main branch remains stable by executing critical developer workflows like building the iOS application and running tests. We also schedule jobs that perform periodic tasks like reporting metrics and uploading artifacts.

Many of our iOS CI jobs execute on Macs, which allows running developer tools provided by Apple. CI jobs for all other platforms at Airbnb execute in containers on Amazon EC2 Linux instances. To meet the macOS requirement of iOS CI jobs, we’ve historically maintained alternate CI infrastructure outside of AWS specifically for iOS development. The introduction of Macs to AWS provided an opportunity for us to rethink our approach to iOS CI.

We designed the next iteration of our iOS CI system in late 2021, completed the migration to the new system in mid 2022, and polished the system through the end of 2022. CI for iOS and all other platforms at Airbnb already leveraged Buildkite for dispatching jobs. Now, we deploy iOS CI infrastructure to AWS using Terraform, which helps align CI for iOS with CI for other platforms at Airbnb.

In this article, we’re excited to share with you details of the flexible and easy-to-maintain iOS CI system that we’ve implemented with Amazon EC2 Mac instances.

Historically, we ran Airbnb iOS CI on physical Macs. We enjoyed the speed of running CI without virtualization, but we paid a substantial maintenance cost to run CI jobs directly on physical hardware. An iOS infrastructure engineer individually logged into over 300 machines to perform administrative tasks like enrolling the Mac in our MDM (Mobile Device Management) tool and upgrading macOS. Manual maintenance requirements limited the scalability of the fleet and consumed engineer time that could be better spent on higher-value projects.

A screenshot of a macOS desktop with many open VNC sessions to remote Mac machines.
An engineer remotely updates multiple physical Macs to macOS Big Sur. EC2 macOS AMIs have eliminated this manual work.

Our old CI machines were rarely restarted and too often drifted into a bad state. When this happened, the best-case scenario was that an engineer could log into the machine, diagnose what configuration drift was causing issues, and manually bring the machine back to a good state. More commonly, we shut down the corrupted machine so that it could no longer accept new CI jobs. Periodically, we asked the vendor who managed our physical Macs to restore the corrupted machines to a clean installation of macOS. When the machines eventually came back online, we manually re-enrolled each machine in MDM to bring our fleet back to its full capacity.

Updating to a new version of Xcode was quite error-prone as well. We strive to roll out new Xcode versions regularly since many iOS engineers at Airbnb follow Swift and Xcode releases closely and are eager to adopt new language features and IDE improvements. However, the fixed capacity of our Mac fleet made it difficult for us to verify iOS CI jobs thoroughly against new versions; any machine allocated to testing a new version of Xcode could no longer accept CI jobs from the previous Xcode version. The risk of tackling each Xcode update was increased by the fact that rolling back to a previous version of Xcode across our fleet was not practical.

When evaluating AWS, we were excited by the possibility of launching instances from Amazon Machine Images (AMIs). An AMI is a snapshot of an instance’s state, including its file system contents and other metadata. Amazon provides base AMIs for each macOS version and allows customers to create their own AMIs from running instances.

AMIs allow us to add new instances to our fleet without human intervention. An EC2 Mac bare-metal instance launched from a properly configured AMI is immediately ready to accept new work after initialization. When updating macOS, we no longer need to log into every machine in our fleet. Instead, we log into a single instance launched from the Amazon base AMI for the new macOS version. After performing a handful of manual configuration steps, like enabling automatic login, we create an Airbnb base AMI from that instance.

Initially, we powered our EC2 Mac fleet with manually created AMIs. An engineer would configure a single instance and create an AMI from that instance’s state. Then we could launch any number of additional instances from that AMI. This was a significant improvement over managing physical machines since we could spin up an entire fleet of identical instances after successfully configuring only a single instance.

Now, we build AMIs using Packer. Packer programmatically launches and configures an EC2 instance using a template defined in the HashiCorp configuration language (HCL). Packer then creates an AMI from the configured EC2 instance. A Ruby wrapper script invokes Packer consistently and performs helpful validations like checking that the user has assumed the proper AWS role. We check the HCL template code into source control, and all changes to our Packer template and companion scripts are made via GitHub pull requests.
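To make the wrapper idea concrete, here is a minimal sketch of a role check followed by a consistent `packer build` invocation. It is written in Python for illustration only (the article's wrapper is a Ruby script), and the role name, template file name, and variable names are hypothetical. The caller ARN is assumed to come from an STS caller-identity lookup performed elsewhere.

```python
import re
from typing import Dict, List, Optional

# Hypothetical role name; the article does not say which role is required.
REQUIRED_ROLE = "ios-ci-ami-builder"

# Shape of an STS assumed-role ARN:
#   arn:aws:sts::<account-id>:assumed-role/<role-name>/<session-name>
_ASSUMED_ROLE_ARN = re.compile(
    r"^arn:aws[\w-]*:sts::\d{12}:assumed-role/([^/]+)/.+$"
)


def assumed_role_name(caller_arn: str) -> Optional[str]:
    """Return the role name from an assumed-role ARN, or None otherwise."""
    match = _ASSUMED_ROLE_ARN.match(caller_arn)
    return match.group(1) if match else None


def packer_build_command(
    template: str, variables: Dict[str, str], caller_arn: str
) -> List[str]:
    """Validate the caller's role, then return the argv for `packer build`."""
    role = assumed_role_name(caller_arn)
    if role != REQUIRED_ROLE:
        raise PermissionError(f"expected role {REQUIRED_ROLE!r}, got {role!r}")
    command = ["packer", "build"]
    # Sort the variables so every invocation produces the same argv.
    for key, value in sorted(variables.items()):
        command += ["-var", f"{key}={value}"]
    command.append(template)
    return command
```

The returned list could be handed to `subprocess.run`. Building the argv in one place is what keeps invocations identical across developers and pipelines.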

Timing statistics for creating a new Arm AMI with Packer. This command ran on an EC2 mac2.metal instance.

We initially ran Packer from developer laptops, but the laptop needed to be awake and online for the duration of the Packer build. Eventually, we created a dedicated pipeline to build AMIs in the cloud. A developer can trigger a new build on this pipeline with a couple of clicks. A successful build will produce freshly baked and verified AMIs for both the x86 and Arm (Apple Silicon) CPU architectures within a few hours.

Our new CI system leveraging these AMIs consists of many environments, each of which can be managed independently. The central AWS component of each CI environment is an Auto Scaling group, which is responsible for launching the EC2 Mac instances. The number of instances in the Auto Scaling group is determined by the desired capacity property on the group and is bounded by min and max size properties.
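The relationship between the three capacity properties can be illustrated with a toy in-memory model. This is a sketch, not the AWS API: the class and method names are hypothetical, and the clamping shown here stands in for what a caller would typically do before making a request, since the real service validates that desired capacity lies within the group's bounds.

```python
from dataclasses import dataclass


@dataclass
class AutoScalingGroupModel:
    """Toy model of an Auto Scaling group's capacity properties."""

    min_size: int
    max_size: int
    desired_capacity: int

    def __post_init__(self) -> None:
        if not 0 <= self.min_size <= self.max_size:
            raise ValueError("need 0 <= min_size <= max_size")
        # Keep the initial desired capacity inside the bounds as well.
        self.desired_capacity = self.clamp(self.desired_capacity)

    def clamp(self, requested: int) -> int:
        """Bound a requested capacity by min_size and max_size."""
        return max(self.min_size, min(self.max_size, requested))

    def request_capacity(self, requested: int) -> int:
        """Record a new desired capacity and return the value that took effect."""
        self.desired_capacity = self.clamp(requested)
        return self.desired_capacity
```

For example, a group with `min_size=0` and `max_size=10` that is asked for 25 instances settles at 10, and a request for 4 is honored as-is.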

An Auto Scaling group creates new instances using a launch template. The launch template specifies the configuration of each instance, including the AMI, and allows a “user data” script to run when the instance is launched. Launch templates can be versioned, and each Auto Scaling group is configured to launch instances from a specific version of its launch template.

Although the introduction of environments has made our CI topology more complex, we find that complexity manageable when our infrastructure is defined in code. All of our AWS infrastructure for iOS CI is specified in Terraform code that we check into source control. Each time we merge a pull request related to iOS CI, Terraform Enterprise will automatically apply our changes to our AWS account. We have defined a Terraform module that we can call whenever we want to instantiate a new CI environment.

Calling a Terraform module to create a CI environment of Arm Mac Minis with Xcode 14.2 installed.

An internal scaling service manages the desired capacity of each environment’s Auto Scaling group. This service, a modified fork of buildkite-agent-scaler, increases the desired capacity of an environment’s Auto Scaling group as CI job volume for that environment increases. We specify a maximum number of instances for each CI environment in part because On-Demand EC2 Mac Dedicated Hosts currently have a minimum host allocation and billing duration of 24 hours.
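The scale-out decision described above can be sketched as a small pure function. This is an assumption-laden illustration rather than the internals of buildkite-agent-scaler or Airbnb's fork: the function name, the inputs, and the idea of a fixed number of agents per instance are all hypothetical. The function only ever raises capacity, and the per-environment maximum caps how many instances (and therefore 24-hour host allocations) a spike can trigger.

```python
import math


def next_desired_capacity(
    current: int,
    pending_jobs: int,
    running_jobs: int,
    agents_per_instance: int,
    max_size: int,
) -> int:
    """Return the desired capacity after a scale-out-only adjustment."""
    if agents_per_instance < 1:
        raise ValueError("agents_per_instance must be at least 1")
    # Instances needed to give every pending or running job an agent.
    needed = math.ceil((pending_jobs + running_jobs) / agents_per_instance)
    # Never scale in here, and never exceed the environment's maximum.
    return min(max_size, max(current, needed))
```

With one agent per instance, 7 pending and 3 running jobs against a current capacity of 2 yields 10. A quiet queue leaves the current capacity untouched.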

A diagram showing the relationship between CI environments, the scaling service, and Buildkite.
A sketch of Airbnb’s new iOS CI system.

Each CI environment has a unique Buildkite queue name. Individual CI jobs can target instances in a specific environment by specifying the corresponding queue name. Jobs will fall back to the default CI environment when no queue name is explicitly specified.
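The fallback rule can be illustrated with a short lookup. The queue names and environment descriptions below are hypothetical, and raising on an unknown queue is a choice made for this sketch rather than something the article describes.

```python
from typing import Dict, Optional

# Hypothetical queue names; each maps to a description of its environment.
DEFAULT_QUEUE = "ios-default"
ENVIRONMENTS: Dict[str, str] = {
    "ios-default": "Arm, Xcode 14.2",
    "ios-x86-xcode-14-2": "x86, Xcode 14.2",
    "ios-arm-xcode-14-3-staging": "Arm, Xcode 14.3 (staging)",
}


def resolve_queue(requested: Optional[str]) -> str:
    """Return the queue a job should run on, defaulting when none is given."""
    if not requested:
        return DEFAULT_QUEUE
    if requested not in ENVIRONMENTS:
        # Surface typos early instead of leaving a job waiting for an
        # agent that will never exist.
        raise KeyError(f"unknown CI environment queue: {requested}")
    return requested
```

A job that names `ios-x86-xcode-14-2` runs in that environment, while a job that names nothing lands on `ios-default`.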

CI Environments Are Highly Flexible

With this new Terraform setup, we’re able to support an arbitrary number of CI environments with minimal overhead. We create a new CI environment per CPU architecture and version of Xcode. We can even duplicate these environments across multiple versions of macOS when performing an operating system update across our fleet. We use dedicated staging environments to test CI jobs on instances launched from a new AMI before we roll out that AMI broadly.
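One way to picture "an environment per CPU architecture and Xcode version, optionally duplicated across macOS versions" is as a Cartesian product. The sketch below is illustrative only: the class, its fields, and the queue naming scheme are hypothetical, and the real environments are declared through Terraform module calls rather than Python.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class CIEnvironment:
    """Hypothetical description of one CI environment."""

    architecture: str
    xcode_version: str
    macos_version: str
    staging: bool = False

    @property
    def queue_name(self) -> str:
        """Derive a unique queue name from the environment's dimensions."""
        xcode = self.xcode_version.replace(".", "-")
        macos = self.macos_version.replace(".", "-")
        name = f"ios-{self.architecture}-xcode-{xcode}-macos-{macos}"
        return f"{name}-staging" if self.staging else name


def plan_environments(
    architectures: Sequence[str],
    xcode_versions: Sequence[str],
    macos_versions: Sequence[str],
) -> List[CIEnvironment]:
    """Create one environment per combination of the three dimensions."""
    return [
        CIEnvironment(arch, xcode, macos)
        for arch, xcode, macos in product(
            architectures, xcode_versions, macos_versions
        )
    ]
```

Two architectures and two Xcode versions on a single macOS version produce four environments, and adding a second macOS version during an operating system update doubles that to eight.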

When we are no longer regularly using a CI environment, we can specify a minimum capacity of zero when calling the Terraform module, which will set the same value on the underlying Auto Scaling group. Then the Auto Scaling group will only launch instances when its desired capacity is increased by the scaling service. In practice, we tend to delete older environments from our Terraform code. However, even once an environment has been wound down, reinstating that environment is as simple as reverting a couple of commits in Git and redeploying the scaling service.

Rotation of Instances Increases CI Consistency

To minimize the opportunity for EC2 instances to drift, we terminate all instances each night and replace them daily. This way, we can be confident that our CI fleet is in a known good state at the start of each day.

When an instance is terminated, the underlying Dedicated Host is scrubbed before a new instance can be launched on that host. We terminate instances at a time when CI demand is low to allow the EC2 Mac scrubbing process to complete before we need to launch fresh instances on the same hosts. When an instance terminates itself overnight, it will decrement the desired capacity of the Auto Scaling group to which it belongs. As engineers start pushing commits the next day, the scaling service will increment the desired capacity on the appropriate Auto Scaling groups, causing new instances to be launched.
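The terminate-and-decrement step maps onto a single Auto Scaling API operation, `TerminateInstanceInAutoScalingGroup`, which accepts a flag controlling whether desired capacity shrinks. The sketch below assumes that a boto3 Auto Scaling client is passed in; how Airbnb's instances actually trigger their own termination is not described in the article, and the helper and fake client here are hypothetical stand-ins so the example runs without AWS credentials.

```python
from typing import List


class FakeAutoScalingClient:
    """In-memory stand-in for a boto3 Auto Scaling client (toy behavior)."""

    def __init__(self, desired_capacity: int) -> None:
        self.desired_capacity = desired_capacity
        self.terminated: List[str] = []

    def terminate_instance_in_auto_scaling_group(
        self, *, InstanceId: str, ShouldDecrementDesiredCapacity: bool
    ) -> None:
        self.terminated.append(InstanceId)
        if ShouldDecrementDesiredCapacity:
            self.desired_capacity -= 1


def terminate_and_shrink(client, instance_id: str) -> None:
    """Terminate an instance without asking the group for a replacement."""
    # Decrementing desired capacity keeps the group from immediately
    # launching a new instance onto a host that is still being scrubbed.
    client.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=True,
    )
```

Against real infrastructure, the same helper could be called with `boto3.client("autoscaling")` in place of the fake. Passing `ShouldDecrementDesiredCapacity=False` instead would cause the group to launch a replacement, which matches the early-termination case for a drifted instance.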

A chart showing CI capacity relative to job volume over more than one week.
Instances terminate themselves overnight. We reduce our maximum capacity over weekends. The spikes in job volume that increased capacity on the 2nd, 6th, and 7th have been hidden by smoothing in the chart.

When an instance does experience configuration drift, we can disconnect that instance from Buildkite with one click. The instance will remain running but will no longer accept new CI jobs. An engineer can log into the instance to investigate its state until the instance is eventually terminated at the end of the day. To keep overall CI capacity stable, we can manually add an additional instance to our fleet, or a replacement will be launched automatically if we terminate the instance early.

We Ship Xcode Versions More Quickly

We appreciate the new capabilities of our upgraded CI system. We can lease additional Dedicated Hosts from Amazon on demand to weather unexpected spikes in CI usage and to test software updates thoroughly. We roll out new AMIs gradually and can roll back painlessly if we encounter unexpected issues.

A chart showing CI capacity relative to job volume for two simultaneous versions of Xcode.
CI jobs shift from Xcode 14.1 to 14.2. On the 24th, we temporarily increased 14.2 capacity to accommodate a spike in jobs.

Together, these capabilities get Airbnb iOS developers access to Swift language features and Xcode IDE improvements more quickly. In fact, with the tailwind of our new CI system, we’ve seen the pace at which we update Xcode increase by over 20%. As of the time of writing, we’ve internally rolled out all available major and minor versions of Xcode 14 (14.0–14.3) as they’ve been released.

Our new CI system ran over 10 million minutes of CI jobs in the last three months of 2022. After upgrading to EC2, we spend meaningfully fewer hours on maintenance despite a growing codebase and consistently high job volume. Our newfound ability to scale CI to meet the evolving needs of the Airbnb iOS community justifies the increased complexity of the rebuilt system.

After the migration to AWS, iOS CI benefits more from shared infrastructure that’s already being used successfully within Airbnb. For example, the new iOS CI architecture enabled us to avoid implementing an iOS-specific solution for automatically scaling capacity. Instead, we leverage the aforementioned fork of buildkite-agent-scaler that Airbnb engineers had already converted to an internal Airbnb service complete with a dedicated deployment pipeline. Additionally, we used existing Terraform modules that are maintained by other teams to integrate with IAM and SSM.

We have found that EC2 Mac instances launched from custom AMIs provide many of the benefits of virtualization without the performance penalty of executing within a virtual machine. We consider AWS, Packer, and Terraform to be essential technologies for building a flexible CI system for large-scale iOS development in 2023.
