As part of elevating the mobile developer experience at Goldman Sachs, we wanted to modernize how we build and deliver mobile applications. Our goal was to provide a system that would build code, run tests, and package and deliver applications to the Google Play Store or Apple App Store. The current system has evolved over time. It consists of Mac Minis and Mac Pros in data centers that are set up and managed by our IT team. Different sets of machines are dedicated to building, signing, and uploading to each of the app stores, and each set has its own pipeline/process. Software on these machines has been installed or upgraded through a manual management process. Adding capacity or upgrading hardware could take weeks to months. We sought to build a solution that would be fully automated, from procuring hardware to delivering distribution-signed applications to the Google Play Store or Apple App Store.
Macs are an essential part of a CI/CD solution for Apple platforms. We needed a solution that would allow us to add, remove, and manage Macs remotely. Our mobile application codebase resides on GitLab, which supports the CI/CD workflow and pipeline through registered runners. We found that the macOS offerings on Amazon Web Services (AWS) Elastic Compute Cloud (EC2) fit all our requirements. We also chose to use macOS machines for Android builds.
With this in mind, we'll now explore how we set up our CI/CD solution.
An Amazon Machine Image (AMI) provides the information required to launch an EC2 instance, including a snapshot of all the software and configuration required on that instance. We started with the base macOS Catalina and Big Sur AMIs provided by AWS and then customized them with the software and configuration our builds require.
We created a customized AMI using Packer and shell provisioners. We then spawned an instance of this AMI to run CIS-CAT Pro and Qualys scans, which verify that system settings are configured as recommended by our security team. The scanned AMI is then distributed to the AWS account that we use to set up the CI/CD infrastructure.
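As an illustration of this step, a Packer HCL2 template for baking such an AMI might look like the sketch below. The region, base-AMI filter, and provisioning script are placeholders, not our actual configuration:

```hcl
# Illustrative Packer template for a customized macOS AMI.
source "amazon-ebs" "macos" {
  region        = "us-east-1"
  instance_type = "mac1.metal"
  tenancy       = "host"          # EC2 Mac instances run on Dedicated Hosts
  ssh_username  = "ec2-user"

  source_ami_filter {
    filters = {
      name = "amzn-ec2-macos-11*" # Big Sur base AMI published by AWS
    }
    owners      = ["amazon"]
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.macos"]

  provisioner "shell" {
    # Hypothetical script that installs the build toolchain,
    # gitlab-runner, and the required user accounts.
    script = "provision.sh"
  }
}
```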
We use Terraform to define our Infrastructure as Code (IaC). Our Terraform configuration defines the network topology, the macOS instances, and connectivity to other services. An automated pipeline transforms these definitions into infrastructure within AWS. During this process, macOS instances are programmatically provisioned within our AWS account and launched with the customized AMI, eliminating the manual steps previously required to procure and provision Macs.
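A minimal Terraform sketch of one such macOS instance is shown below, assuming a recent AWS provider. EC2 Mac instances require an allocated Dedicated Host, and the variable names are placeholders for values our configuration would supply:

```hcl
# Illustrative Terraform for one macOS builder on a Dedicated Host.
resource "aws_ec2_host" "mac" {
  instance_type     = "mac1.metal"
  availability_zone = "us-east-1a"
}

resource "aws_instance" "builder" {
  ami                  = var.custom_macos_ami   # the scanned, customized AMI
  instance_type        = "mac1.metal"
  host_id              = aws_ec2_host.mac.id
  subnet_id            = var.private_subnet_id  # private subnet, no direct Internet
  iam_instance_profile = var.builder_profile    # Builder vs. Signer profile
  user_data            = file("user-data.sh")
}
```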
We have set up automated deployments of the development, staging, and production environments of the CI/CD system. Each environment is deployed in its own private subnet without direct Internet access. VPC endpoints are configured to allow access to GitLab, Nexus, and the required AWS services. Here is a high-level overview of our infrastructure:
Nexus scans vendor-provided and open-source dependencies for license compatibility and known vulnerabilities. Nexus also caches artifacts, which speeds up subsequent fetches and provides resilience when an external repository is down.
Builders and Signers have their own IAM instance profiles, which control access to secrets in AWS Secrets Manager. Specifically, only Signers have access to distribution signing keys. The launch sequence of a macOS instance is handled by the AWS-provided ec2-macos-init. The final stage of initialization executes a user-provided script referred to as user-data. We use inline scripts in user-data for some initial setup, and fetch scripts stored in source control to perform the remaining provisioning tasks when a new Builder/Signer is brought up.
Note: user-data is executed as root. Scripts that create files and directories to be accessed by lower-privileged accounts must take care to set the correct ownership and permissions.
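A user-data fragment following that note might look like the sketch below. The directory layout is illustrative, and `RUNNER_USER` defaults to the current user here so the sketch is self-contained; on a real instance it would be the gitlab account:

```shell
#!/bin/sh
# Sketch of a root user-data step: create a directory for the runner,
# then hand it to the lower-privileged account with tight permissions.
set -eu

RUNNER_USER="${RUNNER_USER:-$(id -un)}"   # "gitlab" on a real instance
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"      # e.g. /Users/gitlab on a real instance

mkdir -p "$BASE_DIR/builds"
chown -R "$RUNNER_USER" "$BASE_DIR/builds"  # ownership to the lower-privileged account
chmod 750 "$BASE_DIR/builds"                # owner rw, no world access
echo "prepared $BASE_DIR/builds for $RUNNER_USER"
```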
Once the setup steps in user-data complete successfully, the gitlab account is logged in, and gitlab-runner, started by a LaunchAgent, begins polling for available jobs.
Now we'll explore how we set up an environment for every build, along with additional details on building and signing the mobile application.
A build in a CI/CD system has to run in a clean environment, with the version of the operating system and build toolchain specified in the build configuration. A clean environment means that a copy of the code to be built is checked out from version control and any build-time dependencies are set up each and every time. No artifacts from a previous run of the same or a different application build may be available. This ensures that any test failure can be attributed to changes in code rather than to the build environment. Usually, containers such as Docker or virtual machines are used to provide such ephemeral build environments. iOS builds cannot be containerized with Docker, and we chose not to use virtual machines in order to keep things simple. Instead, we use the custom executor provided by gitlab-runner to provide a sandbox for each build. An executor has four stages: config, prepare, run, and cleanup.
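In gitlab-runner's configuration, each custom-executor stage maps to an executable via `config.toml`; the runner name and script paths below are placeholders:

```toml
# Illustrative gitlab-runner configuration for a custom executor.
[[runners]]
  name     = "macos-builder"
  executor = "custom"
  [runners.custom]
    config_exec  = "/usr/local/ci/config.sh"
    prepare_exec = "/usr/local/ci/prepare.sh"
    run_exec     = "/usr/local/ci/run.sh"
    cleanup_exec = "/usr/local/ci/cleanup.sh"
```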
Each stage in a custom executor maps to an executable or shell script launched by gitlab-runner. Recall that our AMI launches gitlab-runner under a user account called gitlab, so each of the above stages executes as the gitlab user. As a result, a build can store artifacts in any part of the filesystem the gitlab user can write to, including the home folder, which would therefore have to be cleared before and after each build to provide a clean environment. Clearing it isn't possible, however: gitlab-runner must keep running, and certain configuration files must be retained in the home folder. To solve this, we created another user account named cibuilder and arranged for all commands during the run stage to execute as this user. cibuilder has write access only within its own home folder. The config, prepare, and cleanup stages run as gitlab and are set up to terminate all jobs running as cibuilder and recreate its home folder, effectively sandboxing the build.
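The cleanup between builds can be sketched as follows. `reset_build_home` is a hypothetical helper, and the demo at the end runs against a throwaway directory rather than the real cibuilder home:

```shell
#!/bin/sh
# Sketch of the between-build cleanup run by the gitlab user.
set -eu

# Hypothetical helper: recreate a build user's home folder so no
# artifacts survive from one job to the next.
reset_build_home() {
  home="$1"
  rm -rf "$home"
  mkdir -p "$home"
  chmod 700 "$home"
}

# On the real host, stray processes owned by the build account would be
# terminated first, e.g.:
#   pkill -u cibuilder || true

# Demo against a throwaway directory instead of /Users/cibuilder:
demo_home="$(mktemp -d)/cibuilder"
mkdir -p "$demo_home"
touch "$demo_home/stale-artifact"
reset_build_home "$demo_home"   # stale-artifact is gone afterwards
```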
We use fastlane to drive the build process. The signing certificate, provisioning profile, and keystore required for a build are set up using fastlane actions. Application teams use a YAML file to specify the application's build settings.
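The actual schema of that YAML file is internal; a hypothetical spec for an iOS app might look like:

```yaml
# All keys below are illustrative, not the actual internal schema.
app_name: ExampleApp
platform: ios
xcode_version: "12.5"
scheme: ExampleApp
configuration: Release
run_tests: true
```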
A set of scripts managed by our team generates a Fastfile based on this YAML and kicks off the build. Only development-signing credentials are made available during this phase. As described earlier, the config, prepare, and cleanup stages of the custom executor run as gitlab and the run stage as cibuilder, restricting application-provided scripts to cibuilder's home folder. Application packages, test results, and log files are uploaded to the GitLab artifact cache at the end of each build.
The iOS application package generated by the build phase is signed with a development certificate, and hence can run only on a limited set of devices within the development teams. The Android application package is signed with a key scoped to the Builders, and is not eligible for upload to the Google Play Store. We use the fastlane resign action to re-sign artifacts produced by the build phase, preparing them for distribution through the respective app stores. This build-and-re-sign approach limits the distribution signing keys to a few hosts that are configured to run only scripts provided by the Goldman Sachs Mobile CI/CD team. The custom executor for gitlab-runner on these re-signing hosts is configured with a run stage that allows only a restricted set of sub-stages.
The after_script sub-stage is skipped, and the build_script sub-stage is overridden to re-sign the artifact fetched by the download_artifacts sub-stage. See the description of the custom executor's run stage to learn more about these sub-stages. The re-signing scripts store artifacts in well-known locations that can be cleaned up without erasing the home folder, so we don't need the cibuilder account here: config, prepare, run, and cleanup all run as gitlab.
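The build_script override can be sketched with fastlane's resign action. The ipa path, signing identity, and profile below are placeholders, and the `DRY_RUN` switch merely prints the command for illustration:

```shell
#!/bin/sh
# Sketch of the build_script override on a re-signing host.
set -eu

# Wraps fastlane's resign action; all argument values are placeholders.
resign_ipa() {
  ipa="$1"; identity="$2"; profile="$3"
  cmd="fastlane run resign ipa:$ipa signing_identity:'$identity' provisioning_profile:$profile"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"     # show the command instead of executing it
  else
    eval "$cmd"
  fi
}

DRY_RUN=1 resign_ipa build/App.ipa \
  "Apple Distribution: Example Corp" build/dist.mobileprovision
```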
Further down the build pipeline, Uploaders transfer the signed artifacts from the GitLab cache to an artifact repository for long-term storage.
We went from a manually configured system, in which procuring and provisioning hardware took significant time, upgrading hardware or software was difficult, and multiple teams had to coordinate, to a setup that is streamlined, centralized, and automated. We maintained the security model of limiting the distribution keys to a few designated hosts, and provided simple yet effective sandboxing for builds. The new system was set up in a matter of months, which was only possible due to the well-accepted processes and tools associated with AWS EC2.
We are actively working on completing the continuous delivery part of the pipeline, and have many more exciting plans for macOS on EC2. We are also hiring for several exciting roles.
See https://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest, and other terms and conditions relating to this blog and your reliance on information contained in it.