July 28, 2022

Legend + Snowflake Native Apps = Fast, Easy, Secure Access to Data

Neema Raphael, Chief Data Officer; Abhishek Narang, Managing Director, Data Engineering

Last month, Goldman Sachs' Chief Data Officer, Neema Raphael, gave a keynote presentation at the Snowflake Summit conference. Neema's talk focused on combining the Goldman Sachs Financial Cloud for Data with our open-source data platform, Legend, to power a Snowflake application that generates transformational business insights for our clients, business partners and engineers. What used to be a painful, multi-week process requiring support from engineering has become a self-serve, intuitive experience that takes a couple of days. Further, the application offers the best of all worlds: research velocity, performance, and governance. The Legend data platform uses Snowflake Native App functionality to combine the governance and benefits of API-style application development with the native performance of database joins and predicate pushdown. Watch the replay of the talk from the Snowflake Summit.


How engineers at Goldman Sachs and Snowflake partnered to unlock native Snowflake performance on top of Legend APIs

Goldman Sachs' data strategy is tied directly to Legend and the firm's Financial Cloud for Data. Our latest offering gives our clients access to curated GS data and provides a client-owned AWS runtime to power the behind-the-scenes data movement. We are also users of Snowflake for relational data warehousing and data engineering.

The idea to combine our capabilities was born from a simple conversation that happened in April 2022. Abhishek relates how the idea came about:

Running our vendor data engineering team, I have personally seen the time and energy required to set up teams with access to new datasets. It is a frustrating process for both business users and engineers.
At a recent Snowflake event, I learned about Native Apps, a marketplace of new capabilities built on top of secure data sharing. Following the event, my manager stopped by my desk to learn more about these features, and we brainstormed how Snowflake Native Apps could potentially solve a critical and long-standing issue for our users.
This conversation quickly led to an “ah ha!” moment: if we can combine Legend with the Snowflake Native App capabilities, we can unlock the full performance of Snowflake while still providing logical separation of client, vendor, and Goldman Sachs data.

This new idea presented a great opportunity to work with internal researchers to understand the impact we could achieve. And the impact really was phenomenal: our research partners could set up new datasets in days instead of months. They could do this in a fully self-serve model while adhering to data governance standards. And our engineering team didn't have to support our business teams every single time they wanted new datasets or insights. A win-win for everyone!

But our ambitions didn't end there. Once we unlocked the foundational set of capabilities from this new solution, we wanted to push the boundaries of this innovation. The next step was for us to identify a real-world use case that could scale this not just internally at GS, but to external clients of the firm. We decided to leverage data from the newly launched Goldman Sachs Financial Cloud for Data to make the experience of sharing insights with our clients even more seamless. Adding more functionality within our existing application suite is a powerful value proposition for our clients. We are excited to go to market with this service once Snowflake Native Apps are publicly available (expected by the end of this year).

Our problem statement:

A researcher at an institutional investor client who wants to access new datasets from various external sources has to work with a data engineer to complete a multi-step, time-consuming process. They need to spend a significant amount of time figuring out how their internal data stitches together with third-party data across multiple ecosystems. It’s also hard on the engineering side to understand the intent of quants and researchers, as engineers run into complex extract, transform and load (ETL) problems with multi-hop operations, resulting in unclear ownership and performance issues. Even after all of this is in production, the researcher still does not get lineage.

Our mission:

Allow business users to pull together all the data they need using an intuitive, self-service experience that is highly performant, secure and scalable.

Before: a flow diagram of the multi-step data engineering process described in the problem statement above.

Our solution:

Our underlying technology is based on Legend, GS’s open source contribution to FINOS. Legend is a single platform for data-model-driven insights generation. It is platform agnostic and can transpile model queries to SQL with full predicate pushdown. We have contributed a Snowflake native app that has been supercharged with capabilities from Legend (e.g., transforming APIs to SQL so that native database performance is preserved and GS-specific data models are still enforced). This gives users a simple process to get the data they need to generate insights:

  1. Get access rights to the new data in the Goldman Sachs Financial Cloud for Data and to the new Legend native app
  2. Download the Legend-based native app from the Snowflake Marketplace and join their new datasets
  3. Upload the new data to the Goldman Sachs Financial Cloud for Data – their preferred solution for plotting graphs and generating insights
GS Legend Native App Architecture Diagram explains the flow between the client's Snowflake instance, the GS Financial Cloud and Goldman Sachs Instance of Snowflake.
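
To make "transpile model queries to SQL with full predicate pushdown" more concrete, here is a minimal Python sketch of the general idea. It is not Legend's actual API or query language: the `MAPPING` table, the `ModelQuery` class, the `to_sql` function, and all table and column names are hypothetical, chosen only for illustration. The point it shows is that filters expressed against a logical model end up in the SQL `WHERE` clause, so the database does the filtering rather than the application.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# Hypothetical mapping from a logical model to a physical table.
# None of these names come from Legend or from the original post.
MAPPING = {
    "Trade": {
        "table": "TRADES",
        "columns": {"ticker": "TICKER", "quantity": "QTY", "tradeDate": "TRADE_DT"},
    }
}

ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


@dataclass
class ModelQuery:
    """A query written against the logical model, not against tables."""
    model: str
    properties: List[str]
    filters: List[Tuple[str, str, Any]] = field(default_factory=list)


def to_sql(query: ModelQuery) -> Tuple[str, List[Any]]:
    """Translate a model query into parameterized SQL.

    Every filter is pushed down into the WHERE clause so that the
    database engine evaluates it natively.
    """
    mapping = MAPPING[query.model]
    cols = mapping["columns"]
    select = ", ".join(f'{cols[p]} AS "{p}"' for p in query.properties)
    sql = f'SELECT {select} FROM {mapping["table"]}'

    clauses, params = [], []
    for prop, op, value in query.filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{cols[prop]} {op} ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Usage: the caller only names model properties; physical names stay hidden.
query = ModelQuery(
    model="Trade",
    properties=["ticker", "quantity"],
    filters=[("tradeDate", ">=", "2022-04-01"), ("quantity", ">", 100)],
)
sql, params = to_sql(query)
print(sql)
print(params)
```

Because callers reference only model properties, the mapping layer remains the single place where physical names and access rules are decided, which is one way an API-style interface can coexist with native database performance.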

Here’s the impact:

This new app not only allows us to share relevant datasets with our clients; using the same paradigm, clients can also share data internally, with encapsulation and a central integration and control point. Engineering teams no longer need to support business partners with the tedious work of creating and supporting custom data onboarding processes. They can instead focus on higher-value tasks, such as enabling connectivity across datasets through data models and providing easy tools to graduate from the experimentation phase to production.

Internally, we are partnering with researchers in a business of Goldman Sachs that has been #1 ranked by our clients multiple years in a row. Our vendor data acquisition engineering team can now bring their data discovery timeline down from months to days.

And we do this while making our technology more performant and maintaining our standards around data governance and security.

After: Client Journey map as described in the text above.

By constantly thinking about new and innovative ways to extract the best out of Goldman native technologies and our partners like Snowflake, we are on track to make data consumption, sharing and analysis lightning fast, highly accurate and enormously simple.


See https://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest, and other terms and conditions relating to this blog and your reliance on information contained in it.