What is a CDP (Customer Data Platform)?
CDPs have emerged as one of the best solutions to the challenge of data accessibility. Strictly speaking, CDPs collect and consolidate data from various sources and send that information to different target destinations (e.g. marketing and sales tools). The purpose of a CDP is to aggregate information from these sources and combine it into a single 360-degree view of the customer. CDPs also provide an activation layer to enable marketing automation, because they were created to analyze user behavior and personalize user experiences. Every company has data, so CDPs are useful for both B2C and B2B companies.
Conventionally, it has been a huge challenge for marketers to gain access to data because all of the useful information is stuck in disparate data sources and tools. CDPs solve this problem by supplying the marketing team with a relatively easy-to-use platform that requires little or no input from the data or engineering team.
What is Segment?
Segment is one of the most popular CDPs. In fact, in 2020, Segment did over $144 million in revenue and was recently acquired by Twilio for $3.2 billion. At its core, Segment is a SaaS offering that helps businesses collect and leverage data from digital properties like websites, apps, and SaaS tools. Simply stated, Segment is an event tracking platform aimed at app developers and SMB/mid-sized companies. Segment simplifies the data collection process so that users can spend more time leveraging their data to create personalized experiences and relevant content for customers.
What does Segment do?
Segment was originally created to solve the challenge of collecting and moving event data. In its simplest form, Segment helps fire user events that are captured in your product and sync that data to a variety of SaaS tools in addition to data warehouses. It generates messages about what is happening in an app or website and then translates the information in those messages into a format that is understandable by other tools. Segment provides an API library that can run as code on a website, app, or server to generate messages based on specific triggers defined by the user. This code can be as simple as copying and pasting a snippet into the HTML of a website to track page views, or it can be embedded within an app to send messages when a user performs a specific action like opening or closing an app or abandoning a cart after a set amount of time. Once these messages have been generated, they can be sent directly to Segment servers to be translated or forwarded to specific destinations.
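The flow described above (generate a message when a trigger fires, then translate it into a format another tool understands) can be sketched in a few lines of Python. This is a minimal illustration rather than Segment's actual library: the function names, the message fields, and the destination field mapping are all assumptions made for the example.

```python
import json
import uuid
from datetime import datetime, timezone


def build_track_message(user_id, event, properties=None):
    """Build a hypothetical 'track' message describing one user action."""
    return {
        "type": "track",
        "messageId": str(uuid.uuid4()),  # unique id so a receiver can de-duplicate
        "userId": user_id,
        "event": event,
        "properties": properties or {},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def to_destination_format(message, field_map):
    """Flatten a message and rename its fields for a hypothetical destination."""
    flat = {
        "userId": message["userId"],
        "event": message["event"],
        **message["properties"],
    }
    # Fields missing from the map keep their original names.
    return {field_map.get(key, key): value for key, value in flat.items()}


message = build_track_message("user_123", "Cart Abandoned", {"cart_value": 42.5})
payload = to_destination_format(
    message, {"userId": "external_id", "event": "event_name"}
)
print(json.dumps(payload))
```

The point of the sketch is the separation of concerns: the tracking code only has to emit one message shape, and the per-destination translation lives in a mapping that can be changed without touching the app.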
Who uses Segment?
Segment has two core audiences: marketing teams and engineering teams. Segment appeals to marketers because it gives them an easy way to collect and merge different data sets to create customer profiles, enrich audiences, and activate campaigns across various tools. Engineering teams, on the other hand, are drawn to Segment because they don’t have to spend time writing their own event tracking library or building integrations to all of their SaaS tools, since all of this is supplied through Segment’s API library. This means engineers can focus their efforts on the high-priority tasks that have the most impact on a company’s bottom line. Best of all, marketers don’t have to go through the data or engineering team every time they want to ask a question or gain access to a specific data set because all of this is provided through Segment.
What is Segment Warehouses?
Segment’s Warehouse feature gives users the ability to send information natively to various data warehouses (e.g. Snowflake, Amazon Redshift, Azure Synapse, Google BigQuery, etc.). This is especially useful since the warehouse is typically the final resting point for data and acts as the analytics platform in most organizations.
What is Segment Personas?
Segment Personas is a visual audience builder for marketers that gives businesses the ability to enrich customer profiles with new traits. Segment Personas takes event data across multiple devices and channels and merges it using identity resolution to create a single view of the customer. Segment defines an audience as a list of either users or accounts that match specific criteria. For example, an audience could consist of users who abandoned their shopping cart at some point and purchased an item in the last seven days. These audiences are essentially customized segments. Marketers can define these segments in a point-and-click UI without needing to know SQL.
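To make the idea of an audience concrete, here is a minimal sketch of the kind of rule a visual audience builder evaluates behind the scenes. It is not how Personas is implemented; the event names, field names, and the `build_audience` function are assumptions made for the example.

```python
from datetime import datetime, timedelta


def build_audience(events, now, window_days=7):
    """Return ids of users who abandoned a cart and purchased within the window."""
    cutoff = now - timedelta(days=window_days)
    abandoned, purchased = set(), set()
    for e in events:
        if e["event"] == "Cart Abandoned":
            abandoned.add(e["user_id"])
        elif e["event"] == "Order Completed" and e["timestamp"] >= cutoff:
            purchased.add(e["user_id"])
    # A user must satisfy both criteria to be in the audience.
    return abandoned & purchased


events = [
    {"user_id": "u1", "event": "Cart Abandoned", "timestamp": datetime(2021, 6, 1)},
    {"user_id": "u1", "event": "Order Completed", "timestamp": datetime(2021, 6, 12)},
    {"user_id": "u2", "event": "Cart Abandoned", "timestamp": datetime(2021, 6, 10)},
    {"user_id": "u3", "event": "Order Completed", "timestamp": datetime(2021, 6, 14)},
    {"user_id": "u4", "event": "Cart Abandoned", "timestamp": datetime(2021, 5, 1)},
    {"user_id": "u4", "event": "Order Completed", "timestamp": datetime(2021, 5, 20)},
]

audience = sorted(build_audience(events, now=datetime(2021, 6, 15)))
print(audience)
```

Only `u1` qualifies here: `u2` never purchased, `u3` never abandoned a cart, and `u4` purchased outside the seven-day window.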
What is Segment Functions?
With Segment Functions, users can do basic transformations on events and send them to external tools and various APIs without having to set up or maintain any infrastructure. However, the transformation functionality is very limited and not nearly as capable as SQL or dedicated transformation tools like dbt.
What are the alternatives to Segment?
mParticle is a Segment alternative. However, whereas Segment is tailored to SMB/mid-sized companies, mParticle is built for enterprise-sized companies. Instead of falling into the CDP category, mParticle brands itself as “Customer Data Infrastructure” (CDI). At its core, CDI focuses on data integration, data governance, and audience management. Since mParticle is tailored towards enterprise companies, it places a higher focus on providing support. In fact, all mParticle customers are assigned a dedicated customer success rep on day one. Additionally, mParticle was one of the very first companies in the CDP space to offer professional services and release an audience-building product. To be specific, Segment’s Personas product was a direct response to mParticle’s Audiences product.
Another core difference between the two is that mParticle offers more robust capabilities around mobile event tracking (i.e. apps) and data integration. Because Segment caters to SMBs, it focuses primarily on web tracking. All in all, mParticle tends to offer more robust capabilities than Segment. Both solutions are tailored towards developers and have a substantial implementation/setup time before marketing teams can begin leveraging either tool to the fullest extent.
Tealium is a Segment alternative that places a higher emphasis on marketers instead of developers. Tealium is a CDP solution that focuses on enterprise-sized companies. Before becoming a fully-fledged CDP, Tealium fell into the category of “enterprise tag management” (i.e. a competitor to Google Tag Manager, which is a free service that gives users the ability to implement marketing tags or snippets of code for tracking purposes on their website). Compared to Google Tag Manager, Tealium’s flagship product, “Tealium iQ”, offers more flexibility because it is not a native Google service. This means it integrates with a variety of different platforms.
Aside from offering the typical capabilities of a CDP, Tealium is HIPAA compliant and will sign business associate agreements, or BAAs (contracts that outline each party’s individual responsibilities for protected health information). The same cannot be said for either Segment or mParticle. Tealium’s main selling point is its focus on privacy and security, which removes some of the third-party risks and other risk factors that can affect underlying business goals. This is why so many healthcare and financial services companies are drawn to it.
Lytics is an alternative to Segment that is very focused on empowering marketers rather than engineers and developers. For this reason, the implementation time for Lytics tends to be substantially longer than for players like Segment, mParticle, Tealium, etc. On the upside, Lytics has an extremely intuitive UI that is tailored towards marketers rather than developers, which makes it streamlined and easy to use. Lytics has much more detailed and predictive machine learning capabilities compared to the other platforms. In fact, Lytics' Machine Learning API provides a framework to create custom ML models directly within the platform. These models are self-training and continuously update in real time. All audiences created in Lytics are also updated in real time with no user input.
RudderStack is slightly different from the previous alternatives in that it is a fully open-source CDP tailored towards developers. RudderStack's core product functionality enables developers to deploy data pipelines and collect customer data from various apps, websites, and platforms to auto-track events. This information can then be activated in the data warehouse. Although RudderStack claims to be open-source, most of the features like cloud connect, ETL/ELT, reverse ETL, etc. are locked behind the paid offering. RudderStack has a couple of upsides though. Being an open-source platform, RudderStack is the only CDP that can run entirely on-premise. Additionally, RudderStack does not own any of the data it hosts because everything is kept within an organization’s proprietary technology stack. In most cases, companies choose RudderStack for the data ownership aspect, or when platforms like Segment get too expensive due to the MTU (monthly tracked user) pricing model.
SimonData is an email service platform combined with a CDP. It is very similar to solutions like Braze, Iterable, Salesforce Marketing Cloud, Marketo, etc. Most CDPs capture data from various sources to create audiences and then push that information back into operational platforms so that marketers can use it to launch campaigns. However, SimonData claims to do all of this in one platform. It connects natively to data warehouses, but it moves the data out of the warehouse, which can be very bad for compliance. SimonData also locks users into a simple user/event data model rather than supporting all types of data within the warehouse, like products, groups, flights, trips, purchases, etc. SimonData also creates another challenge in that it doesn’t support notifications efficiently. The needs of marketers are evolving extremely fast. This is one of the many reasons that companies are choosing to keep marketing platforms and data platforms separate and leverage dedicated solutions like Iterable or Braze on top of a CDP.
ActionIQ focuses on helping companies achieve a full “digital transformation”. ActionIQ is very different from other CDPs because it leverages a database and adds a CDP as an additional layer on top. To be specific, ActionIQ helps companies assemble disparate data sources together into their own unique ActionIQ database and enables users to leverage this data through a conventional CDP. This solution tends to be very professional services heavy and getting data into the platform can be extremely challenging. It often takes up to a year to implement. Similarly, ActionIQ’s entire data model is focused solely on contacts and fields, so companies have little ability to leverage the data models that impact their business the most. It is really tailored towards businesses that have already made a significant investment into a specific technology stack and are simply looking for additional tools and data access.
Amperity’s core customers tend to be large retail or traditional brick-and-mortar businesses with extremely disparate data sources. Amperity is a CDP that is highly specialized in identity resolution. It has “state of the art” machine learning technologies, whereas most CDPs use simple “deterministic” identity resolution logic (e.g. static value equality on a graph, or if “email = email”, then this is the same user). Amperity does have some of the typical marketing activation capabilities that other CDPs offer. However, these are more limited. At its core, Amperity is extremely efficient at identifying and predicting customer behaviors, which is a useful trait for any company.
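For contrast with the machine learning approach, the simple “deterministic” logic mentioned above can be sketched as exact matching on identifiers, with matches chained together through a graph. This is a generic illustration using a union-find structure, not any vendor's implementation; the record fields and the `resolve_identities` function are assumptions made for the example.

```python
def resolve_identities(records, keys=("email", "phone")):
    """Group record ids that share an exact value for any identifier key."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (key, normalized value) -> first record id carrying it
    for r in records:
        for key in keys:
            value = r.get(key)
            if not value:
                continue
            identifier = (key, value.strip().lower())
            if identifier in first_seen:
                union(first_seen[identifier], r["id"])  # "email = email"
            else:
                first_seen[identifier] = r["id"]

    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(group) for group in groups.values())


records = [
    {"id": "web_1", "email": "ana@example.com"},
    {"id": "app_7", "email": "Ana@Example.com", "phone": "555-0100"},
    {"id": "pos_3", "phone": "555-0100"},
    {"id": "web_2", "email": "bo@example.com"},
]

profiles = resolve_identities(records)
print(profiles)
```

Here `web_1` and `pos_3` share no identifier directly, but both match `app_7`, so all three collapse into one profile. The weakness of this approach is also visible: a typo in an email address or a shared phone number silently splits or merges profiles, which is the gap that probabilistic matching tries to close.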
What are the issues with off-the-shelf CDPs?
All CDPs tend to have several similar problems. Firstly, CDPs are not a single source of truth. With the rise of cloud data warehouses (e.g. Snowflake, Redshift, BigQuery, Synapse, etc.), data warehouses now contain all customer data because companies are already using them for reporting and modeling. A CDP only has the data that is ingested into it.
Secondly, CDPs create rifts in organizations because they were solely created for marketers. This discourages collaboration between marketing and data teams. Everyone within an organization needs to be working towards the same underlying goals even if they are on different teams. Additionally, all CDPs are built on proprietary systems that don’t always pair well with other technologies. As an example, if a transformation capability doesn’t exist, users are stuck filing a support ticket.
This actually happens quite frequently because conventional CDPs do not typically have the ability to do robust transformations on the data that is stored within them. Likewise, if an assortment of bad events is loaded into a CDP, users are limited to the features built into the CDP to clean that data set. Similarly, as point-to-point tools that move data back and forth between different systems, CDPs create silos because they cannot leverage the technologies or tools that may already exist in an organization.
Additionally, since CDPs were created solely for marketers, the data models that they provide are not flexible. In fact, they often force organizations to “shoe-horn” their data into a strict model that makes no logical sense for the business. Lastly, CDPs store all of a company’s data, which raises privacy and security concerns. Each organization should own its own data so that it is not subject to the whims of a particular vendor.
What are the issues with Segment?
Segment does a decent job at moving data from point A to B. However, it has a couple of problems. Firstly, data that is pushed through Segment is never really transformed to create a proper 360-degree view of your customer (e.g. by combining billing and product data), and it cannot be combined using SQL. Segment claims to unify customers across all paths and channels to enable personalized campaigns, but these campaigns can only be so useful if all of the information that is being pushed to various marketing platforms is still in its raw state.
Additionally, Segment’s data model is limited to two objects, users and accounts, and in most cases, a user can only belong to a single account. This is problematic because every business has a unique model. For example, a company like Spotify collects information on users and accounts, but it also tracks other concepts like artists and genres, which are typically treated as separate tables. However, the core problem with Segment is that it’s trying to take the place of a conventional iPaaS or ETL (extract, transform, load) tool and handle the entire end-to-end process of data integration.
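As a rough illustration of the kind of transformation that is missing, the snippet below joins billing data with product event data using plain SQL. It uses Python's built-in `sqlite3` module as a stand-in for a real warehouse, and the table names, columns, and sample rows are all assumptions made for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE billing (user_id TEXT, plan TEXT, mrr REAL);
    CREATE TABLE product_events (user_id TEXT, event TEXT);
    INSERT INTO billing VALUES ('u1', 'pro', 99.0), ('u2', 'free', 0.0);
    INSERT INTO product_events VALUES
        ('u1', 'Report Exported'), ('u1', 'Report Exported'), ('u2', 'Login');
    """
)

# One row per customer: billing attributes plus a product usage count.
rows = conn.execute(
    """
    SELECT b.user_id, b.plan, b.mrr, COUNT(e.event) AS event_count
    FROM billing AS b
    LEFT JOIN product_events AS e ON e.user_id = b.user_id
    GROUP BY b.user_id, b.plan, b.mrr
    ORDER BY b.user_id
    """
).fetchall()

for row in rows:
    print(row)
```

The result is a modeled customer row (plan, revenue, and usage together) rather than a stream of raw events, which is the shape marketing tools generally need for targeting.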
Additionally, with Twilio’s acquisition of Segment, it is safe to assume that there will be some bias in the tools that are recommended. After all, Twilio is focused entirely on contacting customers, and Segment is focused solely on managing the data about them. Segment does a good job of collecting and transferring event data. However, acquiring, ingesting, and transforming data from SaaS tools is another story. Using Segment is like renting a data pipeline and most organizations want to control their technology stack from top to bottom. Proprietary data should be a competitive advantage and not a liability.
Why your warehouse should be your CDP
The main difference between a CDP and a data warehouse is that CDPs only store customer data, whereas data warehouses act as a repository for all the data across the entire organization, not just customer data. CDPs strictly focus on enriching data for marketing purposes, while data warehouses can run a variety of different workloads for analytics purposes. Most organizations have standardized on the data warehouse as the single source of truth because a CDP only has a subset of the data whereas the warehouse has all of it. This is the number one reason why the data warehouse should be the CDP.
Since all of the data is often already in the data warehouse, the logical choice is to simply use it as a CDP. A modern data stack should consist of an end-to-end flow covering data acquisition, collection, and transformation. In most cases, the easiest way to achieve this goal is by leveraging tools that are purposely designed to handle a single task. Fivetran, Snowflake, and dbt are great examples of this. In fact, this is the core technology stack that many data-driven companies are adopting. Fivetran handles the data integration aspect, providing a simple SaaS solution that helps businesses quickly move data out of their SaaS tools and into their data warehouse. Snowflake provides an easy way for organizations to consolidate their data into one location for analytics purposes. Lastly, dbt provides a simple SQL-based transformation tool, enabling users to create data models that can be reused. These three solutions combined create an effective data management platform.
However, there is a slight problem with this technology stack. Fivetran currently does not provide any way to collect and transfer event data (i.e. user actions). It also does not provide a way to move data out of the warehouse and back into the operational systems. This creates a problem because this is the main use case that Segment solves.
How to use your data warehouse as your CDP
If Segment’s main advantage was solely the fact that it is able to collect event data and move that information to various SaaS platforms, this advantage is now gone thanks to Hightouch and Snowplow.
Hightouch is a reverse ETL tool that provides a seamless way for companies to sync data from the data warehouse to various operational systems like Marketo, Salesforce, Iterable, HubSpot, etc.
“Reverse ETL is the process of copying data from a cloud data warehouse to the operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.”
No information is stored within Hightouch either; all of it is kept in the data warehouse. Even better, companies can define custom objects (unlike Segment, which just offers users and accounts) like workspaces, accounts, products, etc. to create audiences.
Snowplow is an open-source event tracking platform that gives users the ability to generate and process high-quality behavioral data and deliver it in real-time streams to both data lakes and data warehouses. The main advantage that Snowplow provides over a platform like Segment is data ownership. Companies leveraging Snowplow own their entire data pipeline because the data never leaves their technology stack. This means that it can be highly tailored to the needs of the business.
Combined, these two tools work in tandem to create a more robust version of Segment. Leveraging Hightouch and Snowplow together enables more use cases and democratizes more data, all within a company’s own proprietary technology stack. There is honestly no reason not to test this workflow out since Hightouch offers one free destination and Snowplow is completely open-source.
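Conceptually, a reverse ETL sync boils down to comparing what is currently in the warehouse with what was last sent to a destination, and pushing only the difference. The sketch below illustrates that idea in Python. It is not Hightouch's implementation; the `plan_sync` function, the row shapes, and the use of email as the matching key are assumptions made for the example.

```python
def plan_sync(warehouse_rows, last_synced, key="email"):
    """Diff current warehouse rows against the snapshot from the previous sync."""
    current = {row[key]: row for row in warehouse_rows}
    # New or changed rows must be created or updated in the destination.
    upserts = [row for k, row in current.items() if last_synced.get(k) != row]
    # Rows that disappeared from the warehouse result should be removed.
    deletes = [k for k in last_synced if k not in current]
    return upserts, deletes


warehouse_rows = [
    {"email": "ana@example.com", "plan": "pro"},   # changed since last sync
    {"email": "bo@example.com", "plan": "free"},   # new
    {"email": "di@example.com", "plan": "pro"},    # unchanged
]
last_synced = {
    "ana@example.com": {"email": "ana@example.com", "plan": "free"},
    "cy@example.com": {"email": "cy@example.com", "plan": "free"},
    "di@example.com": {"email": "di@example.com", "plan": "pro"},
}

upserts, deletes = plan_sync(warehouse_rows, last_synced)
print(upserts)
print(deletes)
```

Because the warehouse query result is the only input, the same pattern works for any object the business models (products, workspaces, and so on), not just users and accounts.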