Let’s start off with a hot take…
Customer Data Platforms (CDPs) as we know them are fundamentally broken. In the next few years, every CDP vendor (Segment, mParticle, Treasure Data, etc.) will either pivot to an unbundled, warehouse-first offering or lose relevance in the market.
Yes, we’re aware that’s quite a bold claim. Industry analysts are predicting rapid growth for the CDP category with estimates of the market size growing 5x and passing $15B in the next 5 years, and we’re saying that it’s going to be disrupted by what–warehouse-first offerings?
Yes. Instead of your customer data architecture looking like the following…
We’re predicting that most companies’ CDPs will be rebuilt on top of the data warehouse and look like this…
What is a Customer Data Platform (CDP)?
A Customer Data Platform (CDP) is an all-in-one data platform built for marketing teams. It serves as a database for all your customer information, with a bundled activation layer to help you leverage that data for marketing automation.
All CDPs have at least 3 core components:
- Data Collection: Since CDPs are a database of customer data, they need to give you a way to send data to them. To solve this, CDPs expose an API that developers use to track user traits and the events users trigger across applications.
- Data Transformation: CDPs usually have out-of-the-box identity stitching functionality as well as tools to create custom traits on user profiles.
- Data Activation: CDPs are useless unless they help you act on your data, so they have integrations to sync enriched profiles and audience segments to marketing channels.
All of these features are bundled together in a tightly integrated all-in-one solution.
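To make the three components concrete, here is a minimal in-memory sketch of how they fit together. `TinyCDP` and its `identify`, `track`, and `audience` methods are hypothetical names chosen for illustration. They loosely mirror the kind of tracking API CDPs expose, but they are not any vendor's actual SDK.

```python
from collections import defaultdict


class TinyCDP:
    """Hypothetical in-memory stand-in for an all-in-one CDP."""

    def __init__(self):
        self.traits = defaultdict(dict)   # user_id -> traits
        self.events = defaultdict(list)   # id -> list of event names
        self.aliases = {}                 # anonymous_id -> user_id

    # Data collection: the tracking API that application code calls.
    def identify(self, user_id, traits, anonymous_id=None):
        self.traits[user_id].update(traits)
        if anonymous_id is not None:
            # Data transformation: stitch the anonymous ID to the known user
            # and move over any events recorded before identification.
            self.aliases[anonymous_id] = user_id
            self.events[user_id].extend(self.events.pop(anonymous_id, []))

    def track(self, some_id, event):
        # Resolve the ID through the alias table so events land on one profile.
        self.events[self.aliases.get(some_id, some_id)].append(event)

    # Data activation: build an audience that a destination could consume.
    def audience(self, event):
        return sorted(u for u, evts in self.events.items() if event in evts)


cdp = TinyCDP()
cdp.track("anon-1", "Viewed Pricing")      # before the user logs in
cdp.identify("user-42", {"plan": "free"}, anonymous_id="anon-1")
cdp.track("anon-1", "Started Checkout")    # same browser, now stitched
print(cdp.audience("Viewed Pricing"))      # ['user-42']
```

Note that everything lives inside the CDP's own store, which is exactly the bundling described above.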
What is an Unbundled CDP?
Even if you’re not familiar with CDPs, the idea of having all your customer data collected, cleaned, and available to take action via a unified off-the-shelf platform sounds like the holy grail.
As always, the devil is in the details.
Implementing a CDP can be quite challenging, especially at an enterprise level. Because CDPs revolve around their own customer database, your product engineering team must implement data collection by tracking user traits and events across your various websites, backend services, and apps via your CDP’s APIs and SDKs. The initial implementation can often take 3-6 months, and it is not a one-time project: it’s a continuous investment as you build new product features, data sources, and campaigns.
On top of that, CDPs cannot serve as your single source of truth. Their “cookie-cutter” data models and reliance on tracking standard user events prevent you from properly representing your business-specific data models like products, groups, coupons, artists, etc. Finally, they’re often missing key data from outside of your applications, such as data from SaaS tools like Salesforce, point-of-sale (POS) systems, and data science models, which often exists only in the data warehouse.
Introducing the unbundled CDP!
Building one source of truth is hard, so why build two? Over the last decade, every company has been investing in building out its data analytics and business intelligence practice. In almost every case, data is collected from various sources and moved to a central data warehouse like Snowflake to build clean data “models” or “tables” and answer important business questions via BI tools like Looker or Tableau.
Whether or not organizations call it an “unbundled CDP”, fast-growing companies and large enterprises alike have all started practicing data activation. Instead of just analyzing the data in their data warehouse, companies are turning their data warehouse into a CDP and using the data within to power operational business processes like personalized marketing campaigns in tools like Salesforce, Braze, and Facebook Ads via Reverse ETL.
The Evolution of CDPs and Data Warehouses
Why weren’t CDPs always built on top of the data warehouse? Why did CDPs build their own source of truth? It’s easy to discard CDPs by saying they’re all-in-one solutions and not “best of breed”, but if you pull back the curtain, there’s more to the story.
Prior to founding Hightouch, I was an early engineer at Segment, the leading CDP that was acquired by Twilio for $3.2 billion. When I joined Segment in 2016, Snowflake had 100 customers and the “modern data stack” was unheard of.
While turning your data warehouse into a Customer Data Platform was a valid solution technically, it wouldn’t have made sense for most of our customers back when I was at Segment. Data warehouse adoption outside of the largest companies was extremely low, and even at the enterprise level, popular technologies were not easy to operate.
Simply put, CDPs like Segment built their own source of truth because there was no other source of truth available. CDPs did use data warehouse technologies–just “under the hood” because customers didn’t have their own warehouse ready to activate their data.
Fast forward four years to 2020, and Snowflake had the largest software IPO of all time. In 2022, any business not leveraging a cloud data warehouse is rapidly falling behind. Building a data analytics platform is a top priority for virtually all executives, whether that’s a CIO, CTO, or CMO.
Towards the end of my time at Segment, it became clear that the way customers viewed data warehouses and BI had totally changed. The warehouse was no longer just an advanced analytics tool you reached for when you couldn’t get an answer in Amplitude or Mixpanel; it was the source of truth across your business. Businesses were no longer just dumping data into the warehouse. With technologies like dbt and broader trends like ELT, the data warehouse became the place where definitions lived (e.g. high-value customer, LTV, churn risk).
Data-mature companies no longer need the all-in-one ingestion and unification components of a CDP. Instead, they just need tools like Hightouch for Reverse ETL to activate the data in their warehouse.
Reverse ETL is the process of copying data from a central data warehouse to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.
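As a rough illustration of that definition, the sketch below reads rows from a warehouse table and pushes only the rows that are new or changed since the previous run. Everything here is an assumption made for the example: `sqlite3` stands in for a cloud data warehouse, the `customers` table and its columns are invented, and `push` is a placeholder for a SaaS tool's API. This is not how Hightouch or any specific product is implemented.

```python
import sqlite3


def reverse_etl_sync(conn, query, key, last_synced, push):
    """Push rows that are new or changed since the previous run.

    last_synced maps key -> row dict from the previous sync; it is updated in place.
    Returns the number of rows pushed.
    """
    conn.row_factory = sqlite3.Row
    changed = []
    for row in conn.execute(query):
        record = dict(row)
        if last_synced.get(record[key]) != record:
            changed.append(record)
            last_synced[record[key]] = record
    if changed:
        push(changed)
    return len(changed)


# sqlite3 stands in for the warehouse; the schema is hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (email TEXT PRIMARY KEY, ltv REAL)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("a@example.com", 120.0), ("b@example.com", 40.0)],
)

sent = []   # stand-in for the destination tool's API
state = {}  # sync state kept between runs
query = "SELECT email, ltv FROM customers"

first = reverse_etl_sync(warehouse, query, "email", state, sent.extend)
warehouse.execute("UPDATE customers SET ltv = 95.0 WHERE email = 'b@example.com'")
second = reverse_etl_sync(warehouse, query, "email", state, sent.extend)
print(first, second)  # 2 1
```

The first run sends both rows; the second sends only the customer whose LTV changed. A real sync would also need to handle deletes, retries, and destination rate limits, which this sketch leaves out.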
The data warehouse may not be perfect, but it’s become the best data source for most companies, and this trend is only on the rise. This all goes to say that the approach taken by CDPs of establishing a source of truth wasn’t incorrect. In fact, it was essential. But after the rise of cloud data warehouses like Snowflake, it no longer makes sense, because the data warehouse has become the “Customer Data Platform”.
How Do I Build an Unbundled CDP?
So, what does it take to roll out an unbundled CDP? If you can’t use an all-in-one platform, does that mean we’re back to building everything in-house? Fortunately not! The rise of cloud data warehouses has created a thriving ecosystem of tools around the data warehouse to help with every task imaginable.
- Data Collection:
  - Snowplow: Generate, enhance, and model rich, quality behavioral data across all platforms and channels in a common format and stream it into your data warehouse or lake.
  - Fivetran: Replicate data from your SaaS tools and databases across marketing, sales, finance/IT, product, etc. into your data warehouse.
- Data Transformation:
  - Once all your raw data has landed in your data warehouse, you can use SQL to clean up and transform the data into clean tables/views.
  - dbt is the emerging standard in the space. Most enterprises also have some sort of preexisting in-house process for this, often orchestrated via Airflow.
- Data Activation:
  - Hightouch: Sync data from your data warehouse into the tools that your business teams are already using, e.g. Salesforce, Marketo, Facebook Ads, etc. via Reverse ETL.
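The transformation step in the list above can be sketched with plain SQL. In this example, `sqlite3` stands in for the warehouse, and the `raw_orders` table, the view names, and the 500 threshold for "high value" are all assumptions made for illustration. In practice, these `SELECT` statements might live in dbt models rather than in a script.

```python
import sqlite3

# sqlite3 stands in for the warehouse; schema and threshold are hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    -- Raw table as a collection tool might land it.
    CREATE TABLE raw_orders (email TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        ('A@Example.com', 300.0),
        ('a@example.com ', 250.0),
        ('b@example.com', 80.0);

    -- Clean model: normalize the email and aggregate lifetime value.
    CREATE VIEW customer_ltv AS
    SELECT lower(trim(email)) AS email, sum(amount) AS ltv
    FROM raw_orders
    GROUP BY lower(trim(email));

    -- Business definition that lives in the warehouse.
    CREATE VIEW high_value_customers AS
    SELECT email, ltv FROM customer_ltv WHERE ltv >= 500;
""")

rows = warehouse.execute(
    "SELECT email, ltv FROM high_value_customers"
).fetchall()
print(rows)  # [('a@example.com', 550.0)]
```

Because the definition of a high-value customer is a view in the warehouse, both BI dashboards and an activation tool can read the same result.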
If that sounds like a lot of tools, don’t fret. You don’t need to buy a bunch of tools at once. When you buy an all-in-one CDP, you're forced to evaluate everything upfront because you're locking yourself into a single platform. As the name indicates, the unbundled CDP is the exact opposite.
Rather than make a large upfront (and risky) commitment to an all-in-one vendor, the unbundled CDP allows you to solve the most important problem in front of you, incrementally. This allows you to choose the best solution and components for your business. You’re able to educate yourself throughout the process, and also future-proof and swap out certain components down the line when your needs change, or when a specific tool just isn’t “cutting it.”
Since companies are already investing in their data warehouse for analytics, an unbundled CDP lets you start activating your data incrementally in hours or days, instead of the months or quarters a typical off-the-shelf CDP requires.
How Will the Current CDP Landscape Change?
Given that the gravity of data has shifted to the warehouse, how will the existing CDP landscape change? What will existing CDP vendors do?
The first thing we’ll see is CDPs starting to use the term “Reverse ETL.” Some vendors will start claiming Reverse ETL capabilities with little to no product tweaks (see Lytics). Some will be stuck in their own ways and retaliate (see mParticle’s post on “data chaos”). Some will act like Reverse ETL is unrelated to Customer Data Platforms and non-competitive (see Simon Data).
Ultimately, customer needs will trump everything else. We fully expect all CDPs to eventually come around and embrace the data warehouse as their source of truth, further validating our approach from the beginning.
As CDPs start adapting their products to be more warehouse-first, we’ll see them adding pointed features to tap into data points only available in your data warehouse, but overall, we expect these offerings to be half-baked. Building and maintaining two sources of truth (a data warehouse and a CDP) inevitably leads to ongoing data quality challenges, increased customer costs, and security/privacy concerns.
For customers to reap all the benefits of the data warehouse, the product needs to be rethought from the ground up as we’ve done at Hightouch. This new foundation has created an elegant product architecture that allows us to focus on what matters most–helping companies activate their data and iterate faster than the rest of the CDP ecosystem. For example, Hightouch shipped multi-region processing in under a month compared to the multi-year initiatives it took for CDPs.
If you’re interested in learning more, register for our upcoming session with Fivetran and Snowplow on the Unbundled CDP here, where I’ll talk more about the intersecting future of Customer Data Platforms and data warehouses and how you can get started.