April 8, 2022

Providing 99.9% Availability and Sub-Second Response Times with Sybase IQ Multiplexes by Using HAProxy

Karan Ahluwalia, Vice President, Controllers Engineering and Jack Griffith, Vice President, Controllers Engineering

 

The Challenge

If you have worked closely with self-hosted, non-cloud databases, you will likely be familiar with the challenge of upgrading, load balancing, and failing over relational databases without having to touch client applications. At Goldman Sachs (GS), we work extensively with relational databases that serve data processing, batch workloads, and UIs connecting via JDBC (at the TCP layer). Some of these databases do not provide a load balancing and failover mechanism out of the box that works for our use cases. A common way to handle load balancing and failovers is to place the database behind a Domain Name System (DNS) layer and have applications connect via the DNS name. However, this breaks down when the port of the database needs to change: changing the port or the host requires extensive coordination and testing across multiple client applications, and the level of effort grows with the breadth of the database's usage. Our databases are used extensively, so changing ports and retesting every client is neither efficient nor feasible. Thus, in order to improve resource utilization, save time during database failovers, and improve developer efficiency, we looked into using HAProxy to solve this challenge.

HAProxy has been used extensively in the industry as a web application layer load balancer and gateway for the past 20 years. It is known to scale well, is widely used, and is well documented. Because it is open source, you can use it, adapt it, and contribute back to it for your own purposes. This blog post focuses on using HAProxy as a JDBC/TCP layer database load balancer and fast failover mechanism. We use Sybase IQ as the example here, but the approach can be extended as-is to other databases. Please note that this is just one solution to this specific problem; other solutions are available.

 

How We Use Sybase IQ

Sybase IQ provides good query performance without much database-level optimization, making it capable of serving internal reporting dashboards with sub-second response times, as well as ad-hoc queries over terabytes of data with response times of a few minutes. For this use case, we have two multiplex databases running in a live-live mode; each multiplex has three nodes, one of which on each cluster is specifically earmarked for writes. Sybase IQ multiplexes have a hybrid cluster architecture that involves both shared and local storage. Local storage is used for catalog metadata, temporary data, and transaction logs, while the shared IQ store and shared temporary store are common to all servers. This means that you can write via one node and the data will be available across all nodes in the cluster.

For this deployment, two nodes on each cluster are used for read-only purposes. The writer framework guarantees eventual consistency across the clusters, which are housed in different data centers. Only data that has been written to both clusters can be read via the reader nodes. The data from these reader nodes is read via query and visualization platforms such as Tableau, ad-hoc SQL queries, and Legend (an open source data modeling platform); we refer to these collectively as readers going forward.

 

Prior Setup and Use Cases 

  • Tableau dashboards are used for canned reports and drilldowns. The set of queries fired by these dashboards is fixed and usually served by materialized datasets, i.e., we rarely fire queries from Tableau that involve joins, as they can slow down performance. We promise sub-second SLAs for such queries.
  • Ad-hoc queries are fired from the Legend setup, where users can explore their own or custom models. This means that we cannot really optimize such queries, so we promise response times of a few minutes.
  • Direct SQLs are also fired by our developers to debug production issues. 
  • The JDBC connection string in Tableau was hard-coded to use reader node 1 of the secondary cluster, whereas Legend was hard-coded to use reader node 1 of the primary cluster (see the illustrative connection strings following Figure 1). Developers can use any reader node of their choice.

Figure 1: Tight coupling of applications to specific reader nodes.
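As a hypothetical illustration of this coupling (the host names and port below are made up; the URL format shown is the standard jConnect JDBC format), the hard-coded connection strings looked something like:

    # Tableau: pinned to reader node 1 of the secondary cluster
    jdbc:sybase:Tds:iq-secondary-reader1.example.com:2638
    # Legend: pinned to reader node 1 of the primary cluster
    jdbc:sybase:Tds:iq-primary-reader1.example.com:2638

Any maintenance or failure on either of these two nodes therefore required touching the client configuration itself.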

 

Areas for Improvement

  1. We needed our Sybase IQ cluster to serve queries with 99.9% availability during business hours. 
  2. Due to applications being tightly coupled with specific reader nodes (Figure 1), most of the load was directed to one node. Sybase IQ tries to allocate the maximum amount of resources to each query, so subsequent queries on a busy node must wait until the first query completes.
  3. Sybase IQ does not have built-in load balancing or failover, leading to saturated hosts and errors if there are too many queries, or the queries take too long to resolve. 
  4. If the primary reader nodes go down, the time it takes to point to another node may impact our SLOs. 

Sybase IQ allows queries to run across multiple nodes at once, enabling parallelism. In our use case, however, we wanted load segregation and fault tolerance, which requires individual nodes to serve their own queries and therefore calls for an external load balancer. The need for external load balancers for query/load segregation and faster failovers is well documented on the SAP blogs.

 

New Architecture Considerations

  • Support mechanisms to route queries between primary and secondary clusters and to switch query routing to another cluster.
  • Support mechanisms to shut down unhealthy clusters for reads and writes. Ideally, all writes are rerouted to a healthy cluster.
  • Dynamically update the weights of the IQ nodes sitting behind the proxies in a round-robin setup, based on their health (CPU/ability to connect).
  • This architecture targets Sybase IQ but can easily be used with any other database such as DB2, Sybase ASE, etc. This is because the HAProxy configuration isn't specific to a database and can work with any database at the TCP/JDBC level.

 

Solution

HAProxy is a free, fast, open source, and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. Over the years it has become the de facto standard open source load balancer, is shipped with most mainstream Linux distributions, and is often deployed by default in cloud platforms.

Given the JDBC connections to the database in our use case, we needed a load balancer and gateway solution that works at the TCP (L4) layer and is known to be stable in this domain. After evaluating multiple solutions, we decided to use HAProxy as one of the possible proxy solutions.

Figure 2: New architecture with HAProxy.

As seen in Figure 2, we installed HAProxy in TCP mode and redirected all of our readers to connect via HAProxy. The readers behave as if they are connecting to the Sybase IQ database directly, using the existing JDBC client driver (jConnect or SQL Anywhere), but in reality HAProxy sits in between, routing the queries to available nodes.
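For illustration only (the host name and port are assumptions, not our production values), the client-side change is limited to pointing the existing JDBC URL at the HAProxy listener instead of a specific reader node:

    # Readers now connect to the HAProxy listener; HAProxy forwards the TCP
    # connection to whichever reader node is healthy and lightly loaded.
    jdbc:sybase:Tds:iq-haproxy.example.com:4100

No driver or application code changes are needed, because HAProxy is transparent at the TCP layer.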

Here are additional details on how to set up HAProxy:

  • Command for creating an HAProxy instance on Unix: <path_to_HAProxy_installation> -f <path_to_HAProxyConfig.cfg>
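For example (binary and file paths are illustrative), validating the configuration and then starting an instance as a background daemon might look like:

    # Check the configuration file for syntax errors before (re)starting
    /usr/sbin/haproxy -c -f /etc/haproxy/HAProxyConfig.cfg

    # Start HAProxy as a daemon with that configuration
    /usr/sbin/haproxy -f /etc/haproxy/HAProxyConfig.cfg -D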

Configurations

  • HAProxyConfig.cfg – the configuration file, which is deployed on the server using a Terraform equivalent.
  • Global section – used to define the log path, stats socket, debug mode, etc. The stats socket is used to provide health information about the participating nodes to HAProxy. Based on the health, we can increase or decrease the weight of any node.

Figure 3a: HAProxy config – global section.
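The configuration itself is shown in the original post as an image; as a minimal sketch (log target, socket paths, and limits below are assumptions rather than our production values), a global section of this shape could look like:

    global
        log /dev/log local0
        maxconn 4096
        daemon
        # Local admin socket for inspecting state and changing server weights at runtime
        stats socket /var/run/haproxy.sock mode 600 level admin
        # Runtime API also exposed on a TCP port (we use 9999) for remote weight updates
        stats socket ipv4@0.0.0.0:9999 level admin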

  • Listen section – used to open an HAProxy listener socket. We can define multiple sockets for a single HAProxy instance. The sockets can be either HTTP or TCP; for database load balancing, we use the TCP load balancing technique. We also define the participating nodes, the socket timeout in either direction, and the load balancing algorithm in this section.

Figure 3b: HAProxy config – listen section.
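Again as a sketch only (bind port, server names, addresses, and timeouts are illustrative), a TCP listen section for the reader nodes might look like:

    listen sybase_iq_readers
        bind *:4100
        mode tcp
        balance roundrobin
        timeout connect 5s
        timeout client  30m
        timeout server  30m
        # Reader nodes from both clusters; all start with equal weight and are
        # health-checked, with weights adjusted at runtime based on load
        server primary_reader1   iq-primary-reader1.example.com:2638   weight 1 check
        server primary_reader2   iq-primary-reader2.example.com:2638   weight 1 check
        server secondary_reader1 iq-secondary-reader1.example.com:2638 weight 1 check
        server secondary_reader2 iq-secondary-reader2.example.com:2638 weight 1 check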

Creating Dynamic Nodes

We have set up a cluster of HAProxy instances to make the proxy layer itself fault tolerant. These instances are brought up on UNIX boxes, with a simple gateway in front of them to ensure high availability in case one of the instances goes down. When we went live with a static configuration, HAProxy was unable to understand the load on the Sybase IQ reader nodes. This meant that our engineers needed to manually change the weights of the nodes to ensure that traffic did not hit the impacted nodes. The points below explain how we made this dynamic.

  • We run cron jobs to continuously monitor the state of the nodes and change the weights in HAProxy via the configuration socket. The cron jobs look at the CPU and RAM usage of the Sybase IQ reader nodes, along with the number of queries already running, to compute a "load factor". We have internal systems that continuously monitor server health, but the load detection algorithms are generic and can hook into any underlying metrics.
  • Based on the load factor, we hit the stats socket (port 9999) to update the weights of each node.
  • As an example, below (Figure 4a) is a snippet of the code – we look for 5 consecutive occurrences of CPU above 95%, which indicates CPU flatlining and hence a heavily loaded reader node.
  • Note: It is common to see CPU spikes on Sybase IQ nodes because the database tries to allocate maximum resources to every query. The nodes slow down, and sometimes hang, when the CPU flatlines above 95%.

Figure 4a: CPU flatlining check and lowering the weight.
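The snippet in the original post is an image; a simplified shell sketch of the same idea (the metrics helper get_cpu_samples, host names, and backend/server names are assumptions; only the 95% threshold and the 5-sample rule come from the text above) might look like:

    #!/bin/bash
    # Count how many of the last 5 CPU samples for this reader node exceeded 95%.
    # get_cpu_samples is a placeholder for our internal metrics source and is
    # expected to print one CPU-percentage value per line.
    NODE="iq-primary-reader1.example.com"
    HIGH=$(get_cpu_samples "$NODE" 5 | awk '$1 > 95' | wc -l)

    if [ "$HIGH" -eq 5 ]; then
        # CPU has flatlined above 95% for 5 consecutive samples: drain new traffic
        # away from this node by lowering its weight via the HAProxy runtime API.
        echo "set weight sybase_iq_readers/primary_reader1 0" \
            | socat stdio tcp4-connect:haproxy-host.example.com:9999
    fi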

When we bring up HAProxy, all reader nodes are configured with an equal weight of 1, which results in the load balancer distributing queries across the reader nodes equally. Based on the load algorithm, the weight of a busy node is reduced on the fly so that the HAProxy load balancer can minimize the queries hitting it. Here (Figure 4b) is how we change the weights on HAProxy dynamically via the stats/admin socket.

Figure 4b: Changing the reader node weights via the stats/admin socket.
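As an illustration of the mechanism (backend and server names, host, and port are the same assumed values as in the sketches above), weights can be queried and changed through the admin socket using HAProxy's runtime API:

    # Inspect the current weight of a reader node
    echo "get weight sybase_iq_readers/primary_reader1" \
        | socat stdio tcp4-connect:haproxy-host.example.com:9999

    # Lower the weight of a busy node (a weight of 0 stops new connections to it)
    echo "set weight sybase_iq_readers/primary_reader1 0" \
        | socat stdio tcp4-connect:haproxy-host.example.com:9999

    # Restore the weight once the node has recovered
    echo "set weight sybase_iq_readers/primary_reader1 1" \
        | socat stdio tcp4-connect:haproxy-host.example.com:9999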

 

Conclusions 

Referring to the diagram below (Figure 5), you will notice that we had skewed usage of our cluster before introducing HAProxy (before the red line), which has been balanced since go-live (after the red line).

Figure 5: Showing loads pre and post usage of HAProxy.

  • The recovery time for a primary node failure has gone down to a few seconds and is within our SLOs. Also, node recovery happens automatically because the weights are updated on HAProxy dynamically.
  • We were able to achieve 99.9% cluster availability during business hours with this new architecture. Given that we can utilize multiple nodes at the same time, the downtime of one node doesn’t impact the whole cluster.
  • Following the documentation, we performed stress tests on our HAProxy clusters and observed an added latency of 5 ms at the 99.9th percentile. This means that at most 1 in every 1,000 requests/queries incurs an additional 5 ms simply by being forwarded through HAProxy, which is negligible for our workloads.
  • As part of this exercise, we also started capturing the read queries and their response times on our cluster. Out of the total set of queries fired on our cluster during the time range, 64% of queries ran with sub-second response times and 98% ran with sub-minute response times. Also, no queries were impacted due to cluster availability issues.
  • Going forward, we plan to take a closer look at the data model to achieve sub-second response times for 99% of our queries.

There are several ways to solve load balancing and failover of relational databases. In this blog post, we shared how we are using HAProxy and quantified the improvements. We hope this blog post was informative.

Want to learn more about exciting engineering opportunities at Goldman Sachs? Explore our careers page.


See https://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest, and other terms and conditions relating to this blog and your reliance on information contained in it.
