At Goldman Sachs, payments processing sits at the heart of our Transaction Banking (TxB) business. This post will focus on one of TxB’s services that enables instant payments for our corporate clients. First, we will give a high-level overview and present technical challenges associated with instant payments. We will then provide an explanation of the technical architecture, focusing on how TxB utilizes various products and services offered by Amazon Web Services (AWS) to achieve a highly scalable architecture with millisecond latency per transaction and 24x7 availability.
An instant payment (also called a real-time payment) is any payment that settles between both the sender and the recipient within a few seconds and enables the near instantaneous transfer of funds between two bank accounts. An example of an instant payment is when one company pays another company immediately upon delivery of a product. The receiving company may release the product only after confirmation that the payment has been received.
Batch-based payment schemes (e.g., ACH payments) may take days to settle funds; an instant payment scheme avoids that delay. In exchange, instant payments demand greater throughput, lower latency, and higher availability from computing systems than these other payment schemes. They are typically available 24x7x365 and can run across multiple platforms (e.g., mobile devices or traditional computing devices such as desktops or laptops).
According to a study from the Federal Reserve, 80% of businesses in the United States are using a method of faster payments. As a result, there is significant demand for advanced technical solutions such as reliable payment processing infrastructure and scalable cloud-based platforms capable of handling high payments volumes. Instant payments create opportunities for innovative services and applications, pushing businesses to adopt advanced technologies to meet corporate demands for fast, secure, and seamless payment transactions.
There are five main actors involved in a typical instant payment processing scheme: the sender, the sending bank, the Payment Exchange Network, the receiving bank, and the receiver.
The high-level diagram below illustrates how an instant payment flows between sender and receiver.
In the illustration above, both the sender and the receiver happen to be corporate entities; however, nothing in the above flow precludes either actor from being individual consumers, government entities or other banks.
Most notably, the sending and receiving banks are where TxB plays the pivotal role of facilitating any instant payment transaction. TxB may act as both a sender and receiver of instant payments. This role is critical for interfacing with the Payment Exchange Network. TxB must ensure latency is minimized, and high throughput and high availability are achieved when payments are settled between these actors.
One of the biggest challenges in instant payments processing is that funds are required to move (also called settlement) between sender and receiver through a Payment Exchange Network within seconds. Once the payment is settled, the payment is considered final and irrevocable.
As stated above, the three primary goals of any instant payment technology solution are low latency, high throughput, and high availability.
Below is a hypothetical scenario between two fictitious corporations and two fictitious banks, where each corporation maintains its own bank account.
Imagine WidgetTechCo has an account with SummitPeakBank. WidgetTechCo manufactures widgets for InnovateTechCo. InnovateTechCo has an account at VistaTrustBank. It is the end of the month and InnovateTechCo would like to pay an invoice for $2,000 that is due to WidgetTechCo. The diagram below illustrates how InnovateTechCo will pay WidgetTechCo the invoice using an instant payments scheme.
VistaTrustBank will send this payment on behalf of InnovateTechCo to the Payment Exchange Network. The Network will identify that WidgetTechCo, which keeps an account at SummitPeakBank, needs to receive a $2,000 payment; hence, the Network instructs SummitPeakBank to deposit $2,000 into WidgetTechCo's account.
The moment the payment instruction is published to SummitPeakBank by the Network, the Network starts a timer for SummitPeakBank. Within a few seconds (e.g., 10 seconds), SummitPeakBank must perform the following functions (at a minimum):
If SummitPeakBank cannot respond to the Network within the given time limit, the Network will cancel the payment instruction, which effectively "unwinds" the entire payment. The Network will then inform VistaTrustBank that the $2,000 payment to WidgetTechCo has failed, so the cycle must repeat (if needed). From there, two paths are possible: 1) VistaTrustBank may retry the payment instruction (it will be a new payment instruction); or 2) VistaTrustBank and SummitPeakBank communicate offline (via email or phone, for example) to investigate why the payment transaction has failed.
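The time limit described above can be sketched from the Network's point of view. This is a minimal illustration under stated assumptions, not how any real payment network is implemented: the names `PaymentInstruction`, `deliver_with_time_limit`, and the `"SETTLED"`/`"CANCELLED"` outcomes are hypothetical, and the sketch checks the elapsed time after the receiving bank returns rather than interrupting it mid-flight.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PaymentInstruction:
    payment_id: str   # hypothetical identifier, not a real network field
    beneficiary: str
    amount_usd: int


def deliver_with_time_limit(
    instruction: PaymentInstruction,
    receiving_bank: Callable[[PaymentInstruction], None],
    time_limit_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Start a timer, hand the instruction to the receiving bank, and
    cancel ("unwind") the payment if the bank took longer than the limit."""
    started = clock()
    receiving_bank(instruction)   # the bank validates, deposits, and responds
    elapsed = clock() - started
    if elapsed > time_limit_s:
        return "CANCELLED"        # the sending bank would be told the payment failed
    return "SETTLED"
```

Passing the clock in as a parameter is a design choice for the sketch: it lets the timeout path be exercised with a fake clock instead of actually waiting 10 seconds.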
What makes this more interesting is that the Network may be sending payments at an extremely high transaction rate (e.g., 300 payments per second or more) during peak times and at any hour of the day, highlighting the need for high throughput, low latency and high availability.
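To make the link between transaction rate and latency concrete, the arithmetic can be sketched with Little's law (payments in flight = arrival rate × time each payment spends in the system). The 300 payments per second and the 10-second limit come from the examples above; the 50 ms processing time is purely an assumed figure for illustration, not a measured TxB number.

```python
def payments_in_flight(rate_per_s: float, time_in_system_ms: float) -> float:
    """Little's law: average number of payments being processed at once."""
    return rate_per_s * time_in_system_ms / 1000


# Assumed 50 ms per payment at 300 payments per second.
typical = payments_in_flight(300, 50)

# Worst case: every payment takes the full 10-second limit.
worst_case = payments_in_flight(300, 10_000)
```

Under these assumptions, `typical` is 15 payments in flight while `worst_case` is 3,000, which is why lower per-payment latency directly reduces the concurrency the system has to sustain at a given throughput.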
The diagram above illustrates TxB's cloud-first architecture for processing instant payments. In the following example, SummitPeakBank wants to send TxB a payment transaction; TxB in turn will then ensure that the payment is deposited into the account of one of TxB's clients.
The three technologies chosen by TxB to help us tackle the high throughput, low latency, and high availability challenges discussed previously are AWS Fargate/Amazon ECS, Amazon DynamoDB, and Amazon MSK (Kafka).
To achieve high throughput and high availability for our instant payments architecture, the compute layer needs to scale based on the architecture’s resource demands with no developer intervention. AWS Fargate supports automatic scaling ("auto scaling") that fits our instant payments use case. This feature has the following advantages for our architecture:
DynamoDB is a fully managed service offering NoSQL capabilities and supports regional replication natively. For our instant payments architecture, DynamoDB was an ideal fit for the following reasons:
When an instant payment is being processed by TxB, some of the primary functions that need to be performed on the payment include security validations, payment-level functional checks including fraud checks, and payment enrichments. These functional checks will be handled by other TxB Services as shown in the diagram.
As discussed in the DynamoDB section, these call-outs to other TxB Services will generate lifecycle events per call. The Payment Gateway makes these calls via Kafka, and the request/response latency must be fast (single-digit milliseconds or lower) to ensure high throughput, especially during bursts of activity. Kafka allows us to meet these low-latency requirements.
When TxB exchanges payments with the Payment Exchange Network, we need to process each payment exactly once to avoid duplicate payments. To achieve exactly-once processing on producers, we take advantage of Kafka’s native idempotency feature. By contrast, other messaging technologies (such as traditional queueing technologies) do not support producer idempotency natively and would require added engineering effort to achieve it.
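The idea behind producer idempotency can be sketched in a few lines. In Kafka it is enabled through the producer setting `enable.idempotence`; the broker then uses a producer ID and per-partition sequence numbers to discard a batch that a retry delivers twice. The toy `PartitionLog` below is a hypothetical stand-in for that bookkeeping, not Kafka's actual implementation or API, and it tracks single records rather than batches.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one partition that de-duplicates producer retries."""
    records: list = field(default_factory=list)
    last_sequence: dict = field(default_factory=dict)  # producer_id -> last appended seq

    def append(self, producer_id: int, sequence: int, payload: str) -> bool:
        """Return True if the record was appended, False if it was a duplicate."""
        last = self.last_sequence.get(producer_id, -1)
        if sequence <= last:
            return False  # already written: a retry must not create a second payment
        if sequence != last + 1:
            raise ValueError("sequence gap: an earlier record is missing")
        self.records.append(payload)
        self.last_sequence[producer_id] = sequence
        return True


log = PartitionLog()
log.append(producer_id=7, sequence=0, payload="pay WidgetTechCo $2,000")
# The acknowledgement is lost, so the producer retries the same sequence number.
log.append(producer_id=7, sequence=0, payload="pay WidgetTechCo $2,000")
```

After the retry, the log still holds a single record, so the $2,000 payment message is written once even though it was sent twice.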
With respect to performance (message latency), in our lab environment we found that Kafka outperforms traditional queueing technologies by about 175% for our instant payments use case. In other words, given the same number of consumers and producers in a multi-threaded environment and the same payment message load, the time taken (latency) for a message to be consumed by a queue-based consumer was far greater than that of a Kafka consumer.
One final consideration on why Kafka was chosen for our instant payments use case: Kafka’s ecosystem natively supports a Schema Registry. Given the strict time requirements of processing instant payments and the importance of ensuring that Kafka producers and consumers adhere to the ISO 20022 specifications, native support for schema validation is greatly beneficial. Since payments processing is a critical function for TxB, we must fail fast if there are schema (contract) violations between consumers and producers. In this context, failing fast means that Kafka producers or consumers reject the payment message before any further processing of the payment. For our use case, this is preferable to processing potentially erroneous payments. In addition, because the Schema Registry is natively supported by the Kafka ecosystem, it introduces no added latency to payments processing, whereas validating the message contract by other means (such as consumers and producers building their own custom validations, or calling an external system such as another TxB-built custom service) may introduce more latency. Therefore, the native enforcement of the data contract was a critical deciding factor for us.
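The fail-fast behavior described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the field names are hypothetical and far simpler than an ISO 20022 message, and the check is hand-written, whereas with a Schema Registry the validation would typically happen inside the serializer against a registered schema.

```python
from typing import Callable

# Hypothetical, heavily simplified contract; not an ISO 20022 schema.
PAYMENT_CONTRACT = {
    "payment_id": str,
    "amount_minor_units": int,
    "currency": str,
    "creditor_account": str,
}


class ContractViolation(Exception):
    """Raised when a message does not match the agreed contract."""


def validate(message: dict) -> None:
    for name, expected_type in PAYMENT_CONTRACT.items():
        if name not in message:
            raise ContractViolation(f"missing field: {name}")
        if not isinstance(message[name], expected_type):
            raise ContractViolation(f"wrong type for field: {name}")


def publish(message: dict, send: Callable[[dict], None]) -> None:
    """Fail fast: a message that violates the contract is never sent."""
    validate(message)
    send(message)


outbox: list = []
publish(
    {
        "payment_id": "p-1",
        "amount_minor_units": 200_000,  # $2,000.00 expressed in cents
        "currency": "USD",
        "creditor_account": "WidgetTechCo",
    },
    outbox.append,
)
```

A message missing `currency`, for example, would raise `ContractViolation` before `send` is called, so no downstream service ever sees the malformed payment.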
Our instant payments architecture, built on Amazon MSK (Kafka), Amazon DynamoDB and AWS Fargate/Amazon ECS, presents a solution capable of addressing the challenges inherent in high-throughput, low-latency and 24x7 availability requirements. By using these services, our instant payments architecture can scale to larger payment volumes without compromising on throughput, latency or availability.
By using these managed services and their respective scaling automation, we have achieved cost savings in terms of both infrastructure and engineering effort.
See https://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest, and other terms and conditions relating to this blog and your reliance on information contained in it.