With the proliferation of reusable libraries, one of the challenges that has come about is how to properly manage library dependencies. Lack of transparency into the composition of software systems impacts the software development lifecycle (SDLC) process for a variety of important tasks that developers perform every day, from using external libraries to upgrading them across one or more software products and beyond. For teams and developers that vend software libraries, it is challenging to track usage and version spread across software products that use these libraries. Manually gathering this information and keeping it up-to-date is error-prone, time-consuming, and not scalable. Because we do not know which libraries or dependencies are used where, our ability to track and assess the impact of derived software components that are buggy, stale, have critical vulnerabilities, or need to be decommissioned entirely is limited. For the developer, library version upgrades are done manually, requiring tedious and coordinated changes. This issue compounds as software products need continuous refactoring to reflect an engineering team’s learnings from user feedback and operational issues, which often involves changes to software dependencies.
The Continuous Integration/Continuous Delivery (CICD) Platform team at Goldman Sachs (GS) is focused on improving the day-to-day experience of developers via CICD platform solutions. To solve the dependency tracking challenge for developers, we launched support for dependency tracking with the help of Software Bill of Materials (SBOM) artifacts for internal, vendor, and open source software libraries. An SBOM is effectively a nested inventory - a list of ingredients that make up software components. SBOMs provide answers to questions such as: "What are my software component's dependencies?" and "Which software components depend on library X?". Our team uses CycloneDX as the SBOM implementation given the rich tooling across all the build tools/programming languages used within Goldman Sachs. SBOM manifests are extracted from and published in the CICD pipeline by using build artifacts/build tools specific to programming languages (e.g. pom.xml, requirements.txt, package.json). This information is available to developers through a GitLab project badge. Now, a library provider can see which software products depend on their component, grouped by version. This dependency graph is captured and kept up-to-date automatically on a recurring basis, without any developer interaction or changes to individual software products, since SBOM generation is included in CI pipelines by default. Periodic SBOM generation is also executed across all our software products to ensure that we capture SBOMs for products that are not under active development. We currently have SBOMs captured for software products across Java (Maven and Gradle), Python (pip), JavaScript (npm), C#, and a few other languages and build tools.
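To make the "Which software components depend on library X?" question concrete, here is a minimal sketch of a reverse index built from CycloneDX JSON documents. It is an illustration under stated assumptions, not the platform's actual implementation: the function name `index_dependents` and the sample product and library entries are hypothetical, and the sketch relies only on the standard CycloneDX fields `metadata.component` and `components`.

```python
from collections import defaultdict


def index_dependents(sboms):
    """Map library name -> version -> sorted list of products that use it.

    Each item in `sboms` is a parsed CycloneDX JSON document (a dict).
    """
    index = defaultdict(lambda: defaultdict(set))
    for bom in sboms:
        # metadata.component describes the product the SBOM was generated for.
        product = bom["metadata"]["component"]["name"]
        for component in bom.get("components", []):
            version = component.get("version", "unknown")
            index[component["name"]][version].add(product)
    # Convert to plain dicts and sorted lists so the result is stable.
    return {
        library: {version: sorted(users) for version, users in versions.items()}
        for library, versions in index.items()
    }


# Hypothetical, hand-written SBOM fragments (illustrative only, not real data).
SAMPLE_SBOMS = [
    {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "metadata": {"component": {"type": "application",
                                   "name": "pricing-service",
                                   "version": "3.1.0"}},
        "components": [
            {"type": "library", "name": "log4j-core", "version": "2.14.1"},
            {"type": "library", "name": "risk-utils", "version": "1.2.0"},
        ],
    },
    {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "metadata": {"component": {"type": "application",
                                   "name": "client-portal",
                                   "version": "7.0.2"}},
        "components": [
            {"type": "library", "name": "log4j-core", "version": "2.17.1"},
            {"type": "library", "name": "risk-utils", "version": "1.2.0"},
        ],
    },
]

dependents = index_dependents(SAMPLE_SBOMS)
```

With this index, `dependents["log4j-core"]` lists the products using each version of that library, which is the "grouped by version" view a library provider would want.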
Developers have found dependency information captured via SBOMs useful in a variety of ways:
In Q1 2021, a functional bug was discovered in a version of an internally produced library used by potentially hundreds of internal software products. By the time the issue was identified, the library had long been in use, and analyzing the blast radius was not a trivial task. Using SBOM dependency data and the deployment profile of these applications (externally hosted or internet-facing applications are more sensitive than those hosted and used internally), the team was able to identify the list of impacted products within a minute and, using that list, completed an impact analysis in a matter of hours. This analysis would otherwise have been a multi-month exercise. The team used this information to reach out to more than 100 software product owners and request an upgrade to the patched version of the impacted library. The image below shows version spread and number of dependents for each version of an internal library; all this data is available to engineers at the click of a button. More recently, in Q4 2021 when the log4j vulnerability became public, data captured via SBOMs was instrumental in identifying and remediating the impacted software products.
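The kind of blast-radius query described above can be sketched in a few lines. This is a simplified illustration, not the team's tooling: the `(product, library, version)` tuples, the exposure labels, and every product and library name below are hypothetical, and the ranking rule (internet-facing first) is an assumption drawn from the sensitivity ordering mentioned in the text.

```python
from collections import Counter


def version_spread(usages, library):
    """Count how many products depend on each version of `library`."""
    return Counter(version for _, lib, version in usages if lib == library)


def impacted_products(usages, library, affected_versions, exposure):
    """Return (product, version) pairs on an affected version.

    Internet-facing products sort first; products with an unknown
    exposure label sort last.
    """
    rank = {"internet-facing": 0, "internal": 1}
    hits = {
        (product, version)
        for product, lib, version in usages
        if lib == library and version in affected_versions
    }
    return sorted(
        hits,
        key=lambda hit: (rank.get(exposure.get(hit[0]), len(rank)), hit[0]),
    )


# Hypothetical usage records, as might be flattened from collected SBOMs.
USAGES = [
    ("pricing-service", "risk-utils", "1.2.0"),
    ("client-portal", "risk-utils", "1.2.0"),
    ("batch-reports", "risk-utils", "1.3.1"),
    ("trade-viewer", "risk-utils", "1.2.0"),
]
EXPOSURE = {
    "pricing-service": "internal",
    "client-portal": "internet-facing",
    "trade-viewer": "internal",
}

spread = version_spread(USAGES, "risk-utils")
to_contact = impacted_products(USAGES, "risk-utils", {"1.2.0"}, EXPOSURE)
```

Here `to_contact` starts with `client-portal`, the only internet-facing product on the affected version, followed by the internal products in alphabetical order.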
Another engineering team was moving their build pipelines from on-premise to the cloud. On-premise and cloud build ecosystems have their own artifact repositories, and the dependencies available on these repositories might not be the same. This team used the dependency information captured in SBOMs to identify all the missing (direct and transitive) dependencies in the on-premise artifact repository, and added them prior to their migration, saving many hours of analysis for developers.
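A check like this amounts to a transitive closure over the dependency graph followed by a set difference. The sketch below assumes the SBOM carries the CycloneDX `dependencies` section (entries with `ref` and `dependsOn`); the function names, the package URLs, and the idea of passing the repository contents as a plain set are all hypothetical simplifications.

```python
def transitive_dependencies(bom, root_ref):
    """Collect every direct and transitive dependency reachable from root_ref."""
    graph = {
        entry["ref"]: entry.get("dependsOn", [])
        for entry in bom.get("dependencies", [])
    }
    seen = set()
    stack = list(graph.get(root_ref, []))
    while stack:
        ref = stack.pop()
        if ref not in seen:  # also guards against cycles in the graph
            seen.add(ref)
            stack.extend(graph.get(ref, []))
    return seen


def missing_from_repository(bom, root_ref, available_refs):
    """Return the dependencies the artifact repository does not hold."""
    return sorted(transitive_dependencies(bom, root_ref) - set(available_refs))


# Hypothetical SBOM fragment and repository listing (illustrative only).
APP = "pkg:maven/com.example/app@1.0.0"
LIB_A = "pkg:maven/com.example/lib-a@2.0.0"
LIB_B = "pkg:maven/com.example/lib-b@1.1.0"
LIB_C = "pkg:maven/com.example/lib-c@0.9.0"

SAMPLE_BOM = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "dependencies": [
        {"ref": APP, "dependsOn": [LIB_A]},
        {"ref": LIB_A, "dependsOn": [LIB_B, LIB_C]},
        {"ref": LIB_B, "dependsOn": [LIB_C]},
        {"ref": LIB_C, "dependsOn": []},
    ],
}

missing = missing_from_repository(SAMPLE_BOM, APP, available_refs={LIB_A})
```

In this example the repository holds only `lib-a`, so `missing` contains `lib-b` and `lib-c`, both of which the application pulls in only transitively.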
The dependency graph is critical for another vital aspect of modern software development: continuous refactoring. Code needs to be kept up-to-date as it accumulates debt, ends up using deprecated classes/APIs, or introduces risks due to incomplete or invalid API usage. Equally, the owners of an internal library might want to push a critical bug fix or remediate a vulnerability associated with a specific version that needs to be addressed across all dependent software products. In the past, this was typically done by sending emails and chasing individual teams to fix their code. To help developers address this challenge and provide a mechanism to execute improvements across products, the CICD Platform team will be introducing support for automated code refactoring for dependency version upgrades and configuration artifacts (for pre-defined and widely used files such as log4j.properties) in 2022. This feature is similar to the Dependabot feature provided by GitHub. Comprehensive automated test coverage improves the efficacy of these automated code changes, increasing the confidence of developers who are accepting the merge requests. Below is a sample automated merge request created for a version upgrade of an internally used library.
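As a rough illustration of what the smallest unit of such an automated upgrade could look like, the sketch below rewrites one exactly pinned entry in the text of a requirements.txt file. It is an assumption-laden toy, not the CICD Platform feature and not how Dependabot works: the function name `bump_pinned_requirement` and the package names are hypothetical, and it only handles simple `name==version` pins (no extras, version ranges, or hash options).

```python
import re


def bump_pinned_requirement(requirements_text, package, new_version):
    """Return (updated_text, changed) for an exact `package==version` pin."""
    pattern = re.compile(
        rf"^(?P<name>{re.escape(package)})==(?P<old>[^\s;#]+)(?P<rest>.*)$",
        re.IGNORECASE,
    )
    changed = False
    updated_lines = []
    for line in requirements_text.splitlines():
        match = pattern.match(line.strip())
        if match and match.group("old") != new_version:
            # Keep trailing comments or environment markers untouched.
            line = f"{match.group('name')}=={new_version}{match.group('rest')}"
            changed = True
        updated_lines.append(line)
    return "\n".join(updated_lines) + "\n", changed


# Hypothetical requirements file contents (illustrative only).
SAMPLE_REQUIREMENTS = (
    "requests==2.25.0\n"
    "risk-utils==1.2.0  # pinned\n"
    "flask==2.0.1\n"
)

new_text, changed = bump_pinned_requirement(
    SAMPLE_REQUIREMENTS, "risk-utils", "1.2.1"
)
```

An automated refactoring job would commit a change like `new_text` to a branch and open a merge request only when `changed` is true, leaving the product's own test suite to validate the upgrade.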
We hope you found this blog post informative! If you would like to learn more about exciting opportunities at Goldman Sachs, we invite you to explore our careers page.