Using NLP to Purposefully Articulate Software Changes

The Problem

As engineers, we design, prioritize, implement, and release changes to the systems we manage. A problem we often face is articulating these changes succinctly and with impact to our stakeholders (our clients and fellow engineers): what the change is, and why it is being made (the commercial benefit).

Engineers strive to explain complex concepts in concise language to make them more digestible. This conciseness becomes so ingrained in our work that we learn to explain what we are doing in the fewest words possible: "I am upgrading a dependency", or "I am parallelizing the execution of a function". However, in the quest for brevity, important details are easily lost, often the very details that explain why we are making a particular change. Imagine if instead of "I am upgrading a dependency", we wrote "I am updating a dependency in preparation for a Java 21 upgrade". Or instead of "I am parallelizing the execution of a function", we wrote "I am parallelizing the execution of a function to add an additional check in the future and still operate within the function's latency SLO".

Herein lies the problem: how do we succinctly describe what we are changing and why we are making the change? By explaining the "what" and "why" in our change requests, we stand to gain several benefits:

A focus on the rationale of every change, resulting in more meaningful, commercially informed prioritization decisions.
Engineers gain a greater sense of accountability, purpose, and pride in their contribution because they clearly understand its impact (and are therefore more invested in it).
Clear change descriptions forge closer relationships between product managers and engineers (and with clients, through meaningful release notes).
Change reviewers have a better sense of what is being changed and why, making for more impactful reviews with less friction and meaningful feedback based on the context of the change.
A broader shared understanding across the team of what changes are being made and how they are prioritized.

With all this in mind, we set out to ensure that, within our team, each change request title (limited to 300 characters) describes the what and why of the change. By questioning each change request title, we not only realized these benefits but also found that change requests became more granular. We believe the probable cause is the thought: "if I can't describe the what and why of my change succinctly in a single 300-character sentence, then perhaps it's too big". When we shared our approach with other teams within our organization, they reported similar benefits, which got us thinking: how could we scale this successful practice across an entire organization?

The Solution

Our solution tackled the problem in two ways:

Continued sharing of guidance on the benefits of ensuring that every change describes the what and why (advocacy of best practice).
The development of tooling, woven into our software development life cycle (SDLC), to govern how change request titles are formed (enforcement of best practice).

Though we believe the most effective way of bringing about change in the way we do things is via culture (carrot as opposed to stick), we do appreciate that often practices and controls require enforcement, and in turn these enforcements can help positively influence the desired outcome.

We advocated our best practices by engaging with other teams at all levels, setting up timeboxed pilots to gather feedback, using this feedback to improve the practice, and using testimonials to promote the tooling further.

The enforcement came in the form of an API we developed that takes a change request title as input and responds with a Boolean indicating whether the title explains what is happening and why. With many engineers on the same page about the value of well-informed change request titles, we wove the title analyzer API into our continuous integration pipelines as a control that blocks titles that do not adhere to the standard. This break in the continuous integration pipeline not only ensures that titles meet our standards before changes can be reviewed, merged, and released, but also reinforces the positive culture.
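To make the idea of a pipeline control concrete, here is a minimal sketch of what such a step might look like. It is not the API described above: the class name `TitleGate`, the `gate` method, the exit codes, and the stand-in predicate are all assumptions made for illustration, and a real step would call the analyzer service rather than a local lambda.

```java
import java.util.function.Predicate;

// Hypothetical CI step; every name and exit code here is an illustrative assumption.
final class TitleGate {

    /** Returns the exit code a pipeline step could use: 0 to continue, 1 to block. */
    static int gate(String title, Predicate<String> analyzer) {
        if (title == null || title.isBlank()) {
            System.err.println("Blocked: change request title is empty.");
            return 1;
        }
        if (!analyzer.test(title)) {
            System.err.println("Blocked: title should explain what is changing and why: " + title);
            return 1;
        }
        System.out.println("Title accepted: " + title);
        return 0;
    }

    public static void main(String[] args) {
        // Stand-in for the Boolean answer of the analyzer API (assumption: a crude "to" check).
        Predicate<String> standIn = t -> t.toLowerCase().contains(" to ");
        String title = args.length > 0
                ? args[0]
                : "Updating a dependency to prepare for a Java 21 upgrade";
        int exitCode = gate(title, standIn);
        if (exitCode != 0) {
            System.exit(exitCode); // a non-zero exit is what breaks the pipeline
        }
    }
}
```

Keeping the verdict behind a `Predicate<String>` means the same step works whether the check runs in-process or behind a remote call.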

How does the analyzer API work?

The title analyzer is written in Java and uses natural language processing (NLP) methods (sentence splitting, tokenization, and grammatical tagging) to break down the input sentence(s) and look for constructs such as nouns, verbs, and conjunctions that help answer the question "does this sentence explain what is happening and why?".

Figure 1. Merge request title analyzer overview.

When we pass a title to the analyzer, it first determines whether the text can be broken into multiple sentences according to its punctuation. This is referred to as sentence splitting (SS) and is the entry point for our algorithm. Each sentence (if there are several) is then tokenized into single words, and each word is tagged (part-of-speech (POS), or grammatical, tagging) with the Penn Treebank POS tag set to identify its type based on its definition and its context within the sentence. Finally, the tokens undergo lemmatization, where each token (word) is reduced to its common root form (e.g., updated to update, issues to issue).

To ascertain whether a sentence contains the WHAT, we look for a NOUN (e.g., dependency, Java) and a VERB (e.g., updating, prepare, upgrade) in the sentence. The WHY component is signaled by a fixed set of conjunctions (as, because, due to, in order to, since, so that, so we can, therefore, to ensure). We also have a special case where “to” is considered a valid conjunction if the subsequent token is a VERB (e.g., to prepare). This is an example of how we have tuned the title analyzer algorithm.
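The decision rule described above can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the analyzer's actual code: the tokens are tagged by hand to stand in for the output of an NLP library, and the names `WhatWhyRule`, `Token`, `hasWhat`, and `hasWhy` are hypothetical. The only external facts relied on are that Penn Treebank noun tags begin with `NN` and verb tags begin with `VB`.

```java
import java.util.List;
import java.util.Set;

// Sketch only: tokens arrive pre-tagged, standing in for the output of an NLP library.
final class WhatWhyRule {

    /** One analyzed word: its lower-case lemma and its Penn Treebank POS tag. */
    static final class Token {
        final String lemma;
        final String tag;
        Token(String lemma, String tag) { this.lemma = lemma; this.tag = tag; }
    }

    // The fixed set of "why" markers listed in the text.
    private static final Set<String> WHY_MARKERS = Set.of(
            "as", "because", "due to", "in order to", "since",
            "so that", "so we can", "therefore", "to ensure");
    private static final int MAX_MARKER_WORDS = 3; // e.g. "in order to", "so we can"

    // Penn Treebank: NN, NNS, NNP, NNPS are nouns; VB, VBD, VBG, VBN, VBP, VBZ are verbs.
    static boolean isNoun(Token t) { return t.tag.startsWith("NN"); }
    static boolean isVerb(Token t) { return t.tag.startsWith("VB"); }

    /** WHAT: the sentence contains at least one noun and at least one verb. */
    static boolean hasWhat(List<Token> sentence) {
        return sentence.stream().anyMatch(WhatWhyRule::isNoun)
                && sentence.stream().anyMatch(WhatWhyRule::isVerb);
    }

    /** WHY: a marker phrase occurs, or "to" is immediately followed by a verb. */
    static boolean hasWhy(List<Token> sentence) {
        for (int i = 0; i < sentence.size(); i++) {
            // Build phrases of one to three lemmas starting at position i.
            StringBuilder phrase = new StringBuilder();
            for (int n = 0; n < MAX_MARKER_WORDS && i + n < sentence.size(); n++) {
                if (n > 0) phrase.append(' ');
                phrase.append(sentence.get(i + n).lemma);
                if (WHY_MARKERS.contains(phrase.toString())) return true;
            }
            // Special case: "to" counts as a conjunction when a verb follows it.
            if (sentence.get(i).lemma.equals("to")
                    && i + 1 < sentence.size() && isVerb(sentence.get(i + 1))) {
                return true;
            }
        }
        return false;
    }

    static boolean explainsWhatAndWhy(List<Token> sentence) {
        return hasWhat(sentence) && hasWhy(sentence);
    }

    public static void main(String[] args) {
        // Hand-tagged (assumed tags): "Added trap to ensure clean end of script."
        List<Token> sentence = List.of(
                new Token("add", "VBD"), new Token("trap", "NN"), new Token("to", "TO"),
                new Token("ensure", "VB"), new Token("clean", "JJ"), new Token("end", "NN"),
                new Token("of", "IN"), new Token("script", "NN"));
        System.out.println(explainsWhatAndWhy(sentence)); // prints true
    }
}
```

Matching on lemmas rather than surface words keeps the marker set small, since inflected forms collapse to one entry.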

Consider the following title: Updated app configuration with stop and start to avoid issues while renaming host. Added trap to ensure clean end of script.

The title is first split into separate sentences, and for each sentence, we extract annotated tokens (Figure 2) that are used to determine whether the merge request title is good or bad. Taking “updated” and “app” as our example tokens, the model annotates them as verb and noun, respectively. The lemmatization process transforms “updated” into its simplest form, “update”, while “app” remains the same. Because the title contains a noun and a verb, we ascertain that it describes what the changes are.

Figure 2. Parsed sections of example good merge request title.
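The first two steps applied to this title, splitting and tokenizing, can be approximated with the Java standard library alone. The sketch below is a deliberately simplified stand-in, not the analyzer's implementation: it splits on sentence-ending punctuation followed by whitespace and treats runs of letters, digits, apostrophes, and hyphens as tokens, whereas an NLP library would also cope with abbreviations, version numbers, and similar edge cases. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-ins for sentence splitting and tokenization (assumption: regex rules).
final class TitleSplitter {

    // Split after '.', '!' or '?' when whitespace follows.
    private static final Pattern SENTENCE_END = Pattern.compile("(?<=[.!?])\\s+");
    // A token is a run of letters, digits, apostrophes, or hyphens.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}'-]+");

    static List<String> splitSentences(String title) {
        List<String> sentences = new ArrayList<>();
        for (String part : SENTENCE_END.split(title.trim())) {
            if (!part.isEmpty()) {
                sentences.add(part);
            }
        }
        return sentences;
    }

    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = WORD.matcher(sentence);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String title = "Updated app configuration with stop and start to avoid issues "
                + "while renaming host. Added trap to ensure clean end of script.";
        // Prints each of the two sentences followed by its list of tokens.
        for (String sentence : splitSentences(title)) {
            System.out.println(sentence + " -> " + tokenize(sentence));
        }
    }
}
```

Tagging and lemmatization are left out here because they need a trained model or dictionary rather than a pattern.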

Further, the title meets the criterion of containing a conjunction in two ways. The first is the special case of the token “to” immediately followed by the verb “avoid”. The second is the presence of “to ensure” in the second sentence of the title. With this, we can confidently state that this is a good change request title.

Now consider the following title: Updated the old application QA data server instance.

This is a simpler case in which SS has no effect because there is only a single sentence. As shown in Figure 3, the sentence undergoes tokenization, tagging, and lemmatization: “updated” and “application” are annotated as verb and noun and then reduced to their base forms, “update” and “application”, respectively.

Figure 3. Parsed sections of example bad merge request titles.

Although the title clearly indicates what the changes are, it neither contains any of the required conjunctions nor qualifies for the special case, so it is regarded as a bad change request title.

Performance of analyzer

Overall, the analyzer performs well, with an accuracy of 93%. This evaluation is based on 3,443 change request titles that we collected from our organization. We use this metric as a reference whenever we tune the algorithm, ensuring that each change either increases or maintains the accuracy.
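One way such a reference metric could be used during tuning is sketched below, assuming accuracy means the share of titles where the analyzer's verdict matches a human label. The class `AccuracyCheck`, its methods, and the stand-in analyzer are hypothetical. The two sample titles are the good and bad examples discussed earlier, not the evaluation set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of an accuracy check over hand-labelled titles; all names are illustrative.
final class AccuracyCheck {

    /** Fraction of titles for which the analyzer's verdict matches the human label. */
    static double accuracy(Map<String, Boolean> labelled, Predicate<String> analyzer) {
        if (labelled.isEmpty()) {
            throw new IllegalArgumentException("no labelled titles");
        }
        long correct = labelled.entrySet().stream()
                .filter(e -> analyzer.test(e.getKey()) == e.getValue())
                .count();
        return (double) correct / labelled.size();
    }

    /** A tuning change is kept only if it does not lower the accuracy. */
    static boolean keepsOrImproves(double baseline, double candidate) {
        return candidate >= baseline;
    }

    public static void main(String[] args) {
        Map<String, Boolean> labelled = new LinkedHashMap<>();
        labelled.put("Updated app configuration with stop and start to avoid issues "
                + "while renaming host. Added trap to ensure clean end of script.", true);
        labelled.put("Updated the old application QA data server instance.", false);

        // Stand-in analyzer (assumption): a crude check for the word "to".
        Predicate<String> standIn = t -> t.toLowerCase().contains(" to ");
        double score = accuracy(labelled, standIn);
        System.out.println("accuracy = " + score);
        System.out.println("keep change = " + keepsOrImproves(score, score));
    }
}
```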

What makes the analyzer special?

The simplicity of the analyzer offers the following benefits.

Transparency: Because the analysis is simple, engineers can easily understand how decisions are made. Everyone can grasp the criteria used to evaluate change request title quality, which facilitates learning and drives improvement over time.
Cost-effective: The analyzer needs only a fraction of standard compute resources to run, which lets us use our existing resources with no additional investment. In addition, the algorithm is simple and has only a few dependencies, which keeps maintenance costs low. While some upfront investment is required at the integration stage, the long-term return on investment is substantial.
Ease of integration: As highlighted earlier, we integrated the analyzer into our SDLC as a control that breaks build pipelines if the change request title is not acceptable. The process required minimal dependencies and configuration, which also makes adoption straightforward for other teams. This ease of integration lets the analyzer become an integral part of the development process with minimal disruption.

The Outcome

After introducing this control within an organization of several hundred engineers making a large volume of change requests, we saw an immediate improvement in the quality of change request titles, given that changes whose titles did not adhere to the standard could not be merged. We continue to feed any false negatives and false positives back into our algorithm to further improve the reliability of the check. We are now in a place where developers can expect a higher standard of change summaries, can succinctly articulate the rationale for their changes, and therefore feel a greater sense of purpose in the changes they are making. Reviewers can also analyze changes more effectively and contribute more meaningful feedback, and stakeholder release notes are focused on the commercial benefits of each change.

We asked a few people to share some thoughts on the title analyzer, and here is what they had to say:

Thanks to the change request title analyzer, our teams now perform streamlined code reviews with comprehensive context upfront. The tooling promotes a positive culture of clarity and collaboration across teams. It's a simple yet impactful addition to our SDLC.
Ankhuri Dubey, Tech Fellow, Managing Director

The change request title analyser ensures that the MR author adds a thoughtful and meaningful title to the change request and provides functional context summarizing the relevance. This helps reviewers as well as other engineers who are looking at the code later, providing a much better context of the reason why a change was implemented.
Sachindra Nath, Tech Fellow

The change request title analyzer encourages our dev team to provide more thoughtful and contextual MR titles. It has been particularly helpful for those of us reviewing code across many disparate projects and at times unfamiliar codebases. A quality MR title that concisely explains its contents and purpose helps us reviewers with context switching and reduces the cognitive load in interpreting the purpose of the code under change.
Mayer Salzer, Tech Fellow

See https://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest, and other terms and conditions relating to this blog and your reliance on information contained in it.

May 9, 2024

Richard Mwaba, Associate; Raj Patel, Vice President; Hoshil Sejpal, Vice President and Tech Fellow
