If you have worked closely with self-hosted (non-cloud) databases, you will likely be familiar with the challenge of upgrading, load balancing, and failing over relational databases without having to touch client applications. At Goldman Sachs (GS), we work extensively with relational databases that serve data processing, batch workloads, and UIs connecting via JDBC (at the TCP layer). Some of these databases do not provide a load balancing and failover mechanism out of the box that works for our use cases. The simplest solution would be to place the database behind a Domain Name System (DNS) layer and have applications connect via the DNS name. However, this breaks down when the database's port needs to change: changing the port or the host requires extensive coordination and testing across multiple client applications, and the level of effort is directly proportional to the breadth of the database's usage. Our databases are used extensively, so changing ports and retesting every client is neither efficient nor feasible. Thus, to improve resource utilization, save time during database failovers, and improve developer efficiency, we looked into using HAProxy to solve this challenge.
HAProxy has been used extensively in the industry as a web application layer load balancer and gateway for the past 20 years. It is known to scale well, is widely used, and is well documented. Because it is open source, you can use it, contribute to it, and adapt it for your own purposes. This blog post focuses on using HAProxy as a JDBC/TCP layer database load balancer and fast failover technique. We use Sybase IQ as the example here, but the approach can be extended as-is to other databases. Please note that this is just one solution to this specific problem; other solutions are available.
Sybase IQ delivers good query performance without much database tuning, making it capable of serving internal reporting dashboards with sub-second response times as well as ad-hoc queries over terabytes of data with minute-long response times. For this use case, we have two multiplex databases running in a live-live mode; each multiplex has three nodes, and one node on each cluster is specifically earmarked for writes. Sybase IQ multiplexes have a hybrid cluster architecture that involves both shared and local storage. Local storage holds catalog metadata, temporary data, and transaction logs, while the shared IQ store and shared temporary store are common to all servers. This means you can write via one node and the data will be available across all nodes in the cluster.
For this deployment, two nodes on each cluster are used for read-only purposes. The writer framework guarantees eventual consistency across the clusters, which are housed in different data centers. Only data that has been written to both clusters can be read via the reader nodes. The data from these reader nodes is read via query/visualization platforms such as Tableau, ad-hoc SQL queries, or Legend (an open source data modeling platform); we refer to these collectively as readers going forward.
Figure 1: Tight coupling of applications to specific reader nodes.
Sybase IQ allows queries to run across multiple nodes at once, enabling parallelism. In our use case, however, we wanted load segregation and fault tolerance, which requires individual nodes to serve their own queries and therefore an external load balancer. The need for external load balancers for query/load segregation and faster failovers has been well documented here and here on the SAP blogs.
HAProxy is a free, fast, open source, and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. Over the years it has become the de facto standard open source load balancer, is shipped with most mainstream Linux distributions, and is often deployed by default in cloud platforms.
Given the JDBC connections to the database in our use case, we needed a load balancer and gateway solution that works at the TCP (L4) layer and is known to be stable in this domain. After evaluating multiple options, we decided to use HAProxy as our proxy solution.
Figure 2: New architecture with HAProxy.
As seen in Figure 2, we installed HAProxy in TCP mode and redirected all of our readers to connect via HAProxy. The readers behave as if they are connecting to the Sybase IQ database directly using the existing JDBC client driver (jConnect or SQL Anywhere), but in reality HAProxy sits in between, routing queries to available nodes.
Here are additional details on how to set up HAProxy:
Configurations
Figure 3a: HAProxy config – global section.
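A global section along the lines of Figure 3a might look like the sketch below; the socket path, connection limit, and logging target are illustrative values, not our production configuration.

```
global
    daemon
    maxconn 4096
    log /dev/log local0
    # Admin-level stats socket; this is what later lets us change
    # server weights at runtime without reloading HAProxy.
    stats socket /var/run/haproxy/admin.sock mode 660 level admin
```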
Figure 3b: HAProxy config – listen section.
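A TCP-mode listen section in the spirit of Figure 3b might look like the following sketch; the backend name, hostnames, and port 2638 (a common Sybase IQ/SQL Anywhere default) are assumptions for illustration.

```
# Readers point their JDBC URLs at this stable host:port.
listen iq_readers
    bind *:2638
    mode tcp
    option tcplog
    balance leastconn   # send each new connection to the least-busy node
    # Equal weights to start; health checks eject unreachable nodes.
    server reader1 iq-reader1.example.com:2638 check weight 1
    server reader2 iq-reader2.example.com:2638 check weight 1
```

Because HAProxy terminates only the TCP connection and forwards bytes untouched, the JDBC drivers require no changes beyond pointing at the proxy's host and port.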
We have set up a cluster of HAProxy instances to make the proxy layer itself fault tolerant. These instances run on UNIX boxes, with a simple gateway in front of them to ensure high availability in case one of the instances goes down. When we went live with a static configuration, HAProxy had no visibility into the load on the Sybase IQ reader nodes, so our engineers had to manually change node weights to steer traffic away from impacted nodes. The points below explain how we made this dynamic.
Figure 4a: CPU flatlining check and lowering the weight.
When we bring up HAProxy, all reader nodes are configured with a weight of 1, which results in the load balancer distributing queries across the reader nodes equally. Based on the load algorithm, a busy node's weight is lowered on the fly so that HAProxy minimizes the queries hitting it, and restored once the node recovers. Here (Figure 4b) is how we change the weights on HAProxy dynamically via the stats/admin socket.
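A rough sketch of the idea behind Figure 4a (the backend name, threshold, and weights here are hypothetical, not our production algorithm): poll each reader's recent CPU samples, drop a node's weight to 0 when its CPU has flatlined above a threshold, and build the Runtime API command that is then sent over the admin socket.

```python
# Hypothetical sketch of the CPU flatlining check and weight calculation.
# BACKEND and the threshold are illustrative, not GS's production values.
BACKEND = "iq_readers"

def compute_weight(cpu_samples, busy_threshold=90):
    """Return 0 (drain new traffic) when every recent CPU sample is at or
    above the threshold, i.e. the node's CPU has flatlined; otherwise
    return the normal weight of 1."""
    if cpu_samples and all(cpu >= busy_threshold for cpu in cpu_samples):
        return 0
    return 1

def set_weight_command(server, weight):
    """Build the HAProxy Runtime API command to apply the new weight."""
    return f"set weight {BACKEND}/{server} {weight}"
```

The resulting command string is then written to the `stats socket` configured in the global section, so no HAProxy reload is needed.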
Figure 4b: Changing the reader node weights via the stats/admin socket.
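Concretely, the change can be issued with HAProxy's Runtime API over the admin socket, for example via socat (the socket path and server names below are assumptions matching the earlier sketches):

```
$ echo "set weight iq_readers/reader2 0" | socat stdio /var/run/haproxy/admin.sock
$ echo "get weight iq_readers/reader2"   | socat stdio /var/run/haproxy/admin.sock
```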
Referring to the diagram below, you will notice that we had skewed usage of our cluster before introducing HAProxy (before the red line), which has been balanced since go-live (after the red line).
Figure 5: Showing loads pre and post usage of HAProxy.
There are several ways to solve load balancing and failover of relational databases. In this blog post, we shared how we are using HAProxy and quantified the improvements. We hope this blog post was informative.
Want to learn more about exciting engineering opportunities at Goldman Sachs? Explore our careers page.
See https://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest, and other terms and conditions relating to this blog and your reliance on information contained in it.