July 28, 2022

Legend + Snowflake Native Apps = Fast, Easy, Secure Access to Data

Neema Raphael, Chief Data Officer; Abhishek Narang, Managing Director, Data Engineering

Last month, Goldman Sachs' Chief Data Officer, Neema Raphael, gave a keynote presentation at the Snowflake Summit conference. Neema's talk focused on combining the Goldman Sachs Financial Cloud for Data with Legend, our open-source data platform, to power a Snowflake application that generates transformational business insights for our clients, business partners and engineers. What used to be a painful, multi-week process requiring support from engineering has become a self-serve, intuitive experience that takes a couple of days. Further, the application offers the best of all worlds: research velocity, performance, and governance. The Legend data platform uses Snowflake Native App functionality to combine the governance and benefits of API-style application development with the native performance of database joins and predicate pushdown. Watch the replay of the talk from the Snowflake Summit.

How engineers at Goldman Sachs and Snowflake partnered to unlock native Snowflake performance on top of Legend APIs

Goldman Sachs' data strategy is tied directly to Legend and the firm's Financial Cloud for Data. Our latest offering gives our clients access to curated GS data and provides a client-owned AWS runtime to power the behind-the-scenes data movement. We also use Snowflake for relational data warehousing and data engineering.

The idea to combine our capabilities was born from a simple conversation in April 2022. Abhishek relates how the idea came about:

Running our vendor data engineering team, I have personally seen the time and energy required to set up teams with access to new datasets. It is a frustrating process for both business users and engineers.
At a recent Snowflake event, I learned about Native Apps, a marketplace of new capabilities they built on top of secure data sharing. Following the event, my manager stopped by my desk to learn more about these features, and we brainstormed how Snowflake Native Apps could solve a critical and long-standing issue for our users.
This conversation quickly led to an “aha!” moment: if we can combine Legend with the Snowflake Native App capabilities, we can unlock the full performance of Snowflake while still providing logical separation of client, vendor, and Goldman Sachs data.

This new idea presented a great opportunity to work with internal researchers to understand the impact we could achieve. And the impact really was phenomenal: our research partners could set up new datasets in days instead of months. They could do this in a fully self-serve model while adhering to data governance standards. And our engineering team didn't have to support our business teams every single time they wanted new datasets or insights. A win-win for everyone!

But our ambitions didn't end there. Once we unlocked the foundational set of capabilities from this new solution, we wanted to push the boundaries of this innovation. The next step was to identify a real-world use case that could scale this not just internally at GS, but to external clients of the firm. We decided to leverage data from the newly launched Goldman Sachs Financial Cloud for Data to make the experience of sharing insights with our clients even more seamless. Adding more functionality within our existing application suite is a powerful value proposition for our clients. We are excited to go to market with this service once Snowflake Native Apps are publicly available (expected end of this year).

Our problem statement:

A researcher at an institutional investor client who wants to access new datasets from various external sources has to work with a data engineer to complete a multi-step, time-consuming process. They need to spend a significant amount of time figuring out how their internal data stitches together with third-party data across multiple ecosystems. It is also hard for engineers to understand the intent of quants and researchers, as they run into complex extract, transform and load (ETL) problems with multi-hop operations, which lead to unclear ownership and performance issues. Even after all of this is in production, the researcher still does not get lineage.

Our mission:

Allow business users to pull together all the data they need using an intuitive, self-service experience that is highly performant, secure and scalable.

Before: flow diagram of the multi-step data engineering process described in the problem statement above.

Our solution:

Our underlying technology is based on Legend, GS’s open source contribution to FINOS. Legend is a single platform for data-model-driven insights generation. It is platform agnostic and can transpile model queries to SQL with full predicate pushdown. We have contributed a Snowflake native app that has been supercharged with capabilities from Legend (e.g., transforming APIs to SQL, which ensures native database performance while GS-specific data models are still enforced). This gives users a simple process to get the data they need to generate insights.

  1. Get access rights to the new data in the Goldman Sachs Financial Cloud for Data and to the new Legend native app
  2. Download the Legend-based native app from the Snowflake Marketplace and join their new datasets
  3. Upload the new data to the Goldman Sachs Financial Cloud for Data – their preferred solution for plotting graphs and generating insights
GS Legend Native App Architecture Diagram explains the flow between the client's Snowflake instance, the GS Financial Cloud and Goldman Sachs Instance of Snowflake.
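
To make the phrase "transpile model queries to SQL with full predicate pushdown" from the solution description more concrete, here is a minimal Python sketch. It is not Legend's actual API or model format: the ModelClass and Filter types, the transpile function, and the table and column names are all hypothetical and exist only for illustration. The idea it shows is that a query written against business-friendly model properties is compiled into a single SQL statement whose WHERE clause carries the filters, so the database does the filtering instead of the caller.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class ModelClass:
    """Hypothetical logical model: property names mapped to physical columns."""
    name: str
    table: str
    columns: Dict[str, str]  # model property -> physical column


@dataclass(frozen=True)
class Filter:
    """A predicate expressed on a model property, not on a physical column."""
    prop: str
    op: str
    value: Any


ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def transpile(model: ModelClass, select: List[str],
              filters: List[Filter]) -> Tuple[str, List[Any]]:
    """Compile a model-level query into one parameterized SQL statement.

    Each filter becomes part of the WHERE clause (predicate pushdown),
    so rows are filtered inside the database rather than after retrieval.
    """
    for prop in select + [f.prop for f in filters]:
        if prop not in model.columns:
            raise ValueError(f"{prop!r} is not a property of {model.name}")

    cols = ", ".join(f"{model.columns[p]} AS {p}" for p in select)
    sql = f"SELECT {cols} FROM {model.table}"

    clauses, params = [], []
    for f in filters:
        if f.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator {f.op!r}")
        clauses.append(f"{model.columns[f.prop]} {f.op} ?")
        params.append(f.value)  # values are bound, never inlined into the SQL

    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Illustrative model and query; the names are made up for this sketch.
trade = ModelClass(
    name="Trade",
    table="vendor_db.trades",
    columns={"ticker": "TCKR", "tradeDate": "TRD_DT", "quantity": "QTY"},
)
sql, params = transpile(trade, ["ticker", "quantity"],
                        [Filter("tradeDate", ">=", "2022-01-01")])
print(sql)
print(params)
```

Because the caller only ever names model properties, the mapping to physical tables stays under the control of whoever owns the model, which is one way to picture how governance and native database performance can coexist.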

Here’s the impact:

This new app not only allows us to share relevant datasets with our clients; clients can also use the same paradigm to share data internally, with encapsulation and a central integration and control point. Engineering teams no longer need to support business partners with the tedious work of creating and supporting custom data onboarding processes. They can instead focus on higher-value tasks, such as enabling connectivity across datasets through data models and providing easy tools to graduate from the experimentation phase to production.

Internally, we are partnering with researchers in a Goldman Sachs business that has been ranked #1 by our clients multiple years in a row. Our vendor data acquisition engineering team can now cut its data discovery timeline from months to days.

And we do this while making our technology more performant and maintaining our standards around data governance and security.

After: Client Journey map as described in the text above.

By constantly thinking about new and innovative ways to get the best out of Goldman-native technologies and partners like Snowflake, we are on track to make data consumption, sharing and analysis lightning fast, highly accurate and enormously simple.

See https://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest, and other terms and conditions relating to this blog and your reliance on information contained in it.

Certain solutions and Institutional Services described herein are provided via our Marquee platform. The Marquee platform is for institutional and professional clients only. This site is for informational purposes only and does not constitute an offer to provide the Marquee platform services described, nor an offer to sell, or the solicitation of an offer to buy, any security. Some of the services and products described herein may not be available in certain jurisdictions or to certain types of clients. Please contact your Goldman Sachs sales representative with any questions. Any data or market information presented on the site is solely for illustrative purposes. There is no representation that any transaction can or could have been effected on such terms or at such prices. Please see https://www.goldmansachs.com/disclaimer/sec-div-disclaimers-for-electronic-comms.html for additional information.
© 2023 Goldman Sachs. All rights reserved.