What is Reverse ETL: A Definition & Why It's Taking Off
Hightouch's definitive guide to Reverse ETL goes in-depth into how Reverse ETL enables marketing and sales teams to provide better customer experiences.
By Tejas Manohar on
Today, the center of gravity for data has shifted towards data warehouses like Snowflake, Google BigQuery, Amazon Redshift, Azure Synapse, etc. Every major business has started to adopt them, but there’s a wrinkle — while the data gravity exists in a data warehouse, it's historically been really hard to get data out of the warehouse.
With this data silo, it means that everyone in your company has a limited view of your customer, hurting sales, and damaging the customer experience.
At Hightouch, we've seen hundreds of companies hit the barrier that the most important data is in their warehouse. In this post, you'll learn about the new cutting-edge data integration and data pipelining technology that is Reverse ETL.
After reading through this post, you will be armed with the knowledge of what reverse ETL is and how you can use it to grow your business.
The definition of Reverse ETL
Reverse ETL is the process of copying data from a central data warehouse to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales and support.
The definition above is packed…so let’s take a step back. Before defining “reverse ETL”, let’s briefly talk about plain old ETL.
ETL stands for extract, transform, load. Simply put, ETL is the process of copying data from a system of record into the data warehouse.
ETL is nothing new. The concept was actually popularized in the 1970s.
Reverse ETL is the exact opposite. It is the process of copying data from the data warehouse into systems of record across a company.
If terms like ETL are foreign to you, check out this guide we recently published on the wider data integration space, including ETL, ELT, and Reverse ETL amongst other things.
Reasons for moving data out of the warehouse
Why would I move data out of the warehouse? Didn’t I just dump all my data into a centralized location? Why would I want to move data from my warehouse anywhere? That seems backwards.
Reverse ETL is necessary because your data warehouse — the platform you bought to eliminate data silos — has ironically become a data silo. Without reverse ETL, your business’s core definitions only live in the warehouse.
Companies are building key definitions in SQL on top of the data warehouse. Think lifetime value, Product Qualified Lead (PQL) and Marketing Qualified Lead (MQL), propensity score, customer health, ARR/MRR, funnel stages, etc.
Sure, you can easily create reports and visualizations using this data in BI tools or SQL, but these insights are much more powerful if they drive the everyday operations of your teams across sales, marketing, finance, etc. in tools like Hubspot, Salesforce, Netsuite, SAP, Workday, Gainsight, Zendesk, etc.
Reverse ETL, therefore, has emerged as a key part of the modern data stack to close the loop between analysis and action or activation.
The need for reverse ETL is only being amplified by the rise of cloud data warehouses, dbt, and more broadly, “ELT”, which stands for extract, load, transform and is the movement to transform data once it’s in the warehouse.
If you’re interested in learning more about ELT and how it differs from ETL, check out this great article by our friends at Fivetran.
What are the use cases for reverse ETL?
So, you get it— reverse ETL is the process of copying data from the data warehouse into operational systems. What can you accomplish with reverse ETL?
There are 3 primary use cases for Reverse ETL
- Operational analytics — feeding insights from analytics to business teams in their usual workflow so they can make more data-informed decisions
- Data automation — not all data problems are so glamorous. “Can I get a CSV to issue some invoices?”, your finance team asks. Reverse ETL poses a simple solution.
- Data infrastructure — with a growing number of data sources, reverse ETL is emerging as a general-purpose pattern in software engineering.
Reverse ETL enables Operational Analytics
Operational analytics is the idea of feeding insights from analytics to business teams in their usual workflow so they can make more data-informed decisions.
Every company wants to be more data-driven. Yet, the most daunting question for every leader is “how”. Deriving insights from data is part one, but the last mile— “analytics enablement” or translating those insights into action— is a different ball game.
Analytics enablement is typically seen as a people problem. And, it definitely is in part, but how data is presented can play an equally large role.
Let’s take an example — at B2B companies, a common question is “which accounts should sales reps focus on?”. In response, a data analyst uses SQL to derive characteristics of high-value leads and presents the results in a BI report.
To the analyst’s dismay, the report is never used. In fact, the BI report is rarely opened by the sales team and the analyst doesn’t even receive feedback on it.
A traditional analytics enablement outlook to this problem would be to train the sales reps on how to leverage BI reports as part of their day-to-day workflow. In practice, this is tough. Data enablement is the reason most data projects fail.
Operational Analytics is a new solution to the problem, enabled by reverse ETL.
Instead of training reps to use BI reports, the data analyst can operationalize their analysis by feeding lead scores from the data warehouse into a custom field in Salesforce. With operational analytics, your reps are armed with insights from your analytics when and where they work to take action.
Reverse ETL solutions like Hightouch enables data teams and analysts to implement operational analytics just as easily as creating a Business Intelligence (BI) report.
Reverse ETL enables Data Automation
Operational Analytics is flashy, but in reality, companies are filled with lots of far less glamorous problems when it comes to data. In any sizable organization, there are tons of manual requests for data and CSVs floating around and with any manual process, there’s always the question of how to automate it.
Here are few common examples of everyday data requests from various teams:
- Sales wants the list of webinar attendees to import as leads into Salesforce
- Support wants to see on Zendesk the data about accounts with premium support
- Product wants a Slack feed of customers who have enabled a feature
- Accounting wants customers attributes to be synced to NetSuite
- Finance wants a CSV of rolled up transaction data to use in Excel or Google Sheets
I could go on but if any of these rings a bell, you can surely do the same.
The data needed is already available in the data warehouse and with reverse ETL, SQL is all you really need to extract data from the warehouse and sync it to external tools, making it the simplest solution.
Reverse ETL is a core piece of data infrastructure
Finally, reverse ETL is emerging as a general-purpose pattern in data infrastructure and software engineering. I’ve seen 2 key use cases for this:
- Personalization of Customer Experiences
- Accessing disparate data sources
Reverse ETL: A solution to personalize customer experiences
The most obvious use case for reverse ETL in software engineering is operationalizing the results of analytics and data science to build personalized customer experiences.
Let’s take an E-commerce example.
- Available Data: Your data science team calculates a score on top of the data warehouse or data lake describing the user’s likelihood of buying a product.
- Data Ask: Your product growth team wants to drive more purchases by offering discounts to users who are deemed unlikely to make a purchase.
- Easy Solution: Since your data warehouse is too slow to serve user-facing experiences, your engineers, using Reverse ETL, can move the propensity score to a production database to serve customers personalized in-app experiences in real time.
Reverse ETL: A solution to accessing disparate data sources
Today, customer data is spread across dozens — if not hundreds — of disparate systems. Sometimes, your application needs to access information from disparate data sources. Reverse ETL poses a novel solution to this.
Let’s take an example of a B2B SaaS company with enterprise customers on contract.
- Available Data: When a new enterprise is onboarded, your sales deal desk records their credit allotment in Salesforce.
- Data Ask: Your customers keep asking to see their credit allotment inside of your web app, but your developers don’t want to integrate Salesforce. However, chances are that Salesforce data is already in your data warehouse (via an ETL tool like Fivetran).
- Easy Solution: With reverse ETL, you can sync relevant Salesforce information from your warehouse into the production database that powers your app. Your app can then show your customers their billing information just like any other user data.
Using reverse ETL, you don’t have to redo the complex work of integrating with the APIs of tools like Salesforce, and you can reuse the business definitions you’ve already created in SQL.
This isn’t specific to Salesforce though. At one of my past jobs, our internal admin dashboard was powered by the reverse ETL of data from a dozen data sources — Salesforce, Zendesk, product usage data, and a number of microservices — into a MySQL database.
The possibilities are endless but the takeaway is that reverse ETL is becoming a core part of the software engineering toolkit and isn’t limited to “data projects”. We’ll talk more about this in a future post.
How do I implement Reverse ETL? Build vs Buy
If you’ve ever bought enterprise software, you are probably familiar with this drill. There always are pros and cons to both buying a purpose-built solution and building one in-house.
If you’re leaning toward the DIY camp, you can build an in-house Reverse ETL pipeline if you have spare data engineering resources (if you do, you are one of the few). In fact, we spoke to the Engineering Team Lead at Datadog, Romoli Bakshi, who was kind enough to walk us through her process of doing this in-house — here’s the full conversation.
However, if you prefer investing in best-in-class tools and want to have a Reverse ETL solution up and running in a matter of minutes, give Hightouch a go.
Here are a few things to know about Hightouch:
- It’s just SQL. No more scripts, APIs, etc. Just paste a SQL query or select a table and use our visual mapper to specify how data should appear in tools like Salesforce.
- Robust integrations catalog. Hightouch has a growing catalog of 40+ battle-tested integrations for moving data into SaaS tools or even internal systems (webhooks, databases, etc.). If we’re missing something, email email@example.com or chat with us.
- Better than building in-house. Hightouch is loved by engineers as much as business users. Our platform has a live debugger, request/response logs, bidirectional git sync, and out-of-the-box monitoring via tools like Slack, PagerDuty, DataDog, etc.
Reverse ETL is a brand new category in the data space and like any hot category (looking at you CDP/Customer Data Platform), many companies will try and ride this wave. Now, this is alright, but it’ll also make it harder for businesses to zero in on the right solution that can solve their data problems in an elegant manner.
If you or a colleague of yours is facing data challenges, irrespective of the nature of those challenges, I’d love to chat and offer the right solution.