Dummy Podcasts are not so bad

Data integration is the process of moving data between databases — internal, external or both. Read this guide to learn about all the integration technologies and understand how data moves through each of them.

What is Data Integration?

A quick Google search says, “data integration is the process of combining data from different sources into a single, unified view”. Sounds simple, right? But since you’re reading this, you already know that such a narrow definition of data integration is, to say the least, reckless.

Before I jump into describing what data integration really means, let me assure you that this guide is not meant to convince you that a particular solution or technology is better than the others. Instead, the goal of this guide is to provide you with a comprehensive, unbiased, 360-degree overview of the data integration landscape.

To that end, I will cover all the technologies that comprise the data integration landscape — iPaaS, CDP, ETL, ELT, and finally, reverse ETL.

Even if you are a seasoned data person, I hope that this guide becomes a ready reckoner for you every time you’re tasked with choosing the appropriate method to move data.

OK, so what the heck is data integration?

To put it simply, data integration is the process of moving data between databases — internal, external, or both. Here, databases include production DBs, data warehouses (DWs) as well as third-party tools and systems that generate and store data.

It’s good to keep in mind that these integration tools largely rely on the same underlying technology: APIs. If you’d like to learn more about APIs, here’s an in-depth guide, a video, and a course.

So many moves...

iPaaS or Integration Platform as a Service: data moves between cloud apps directly with little to no transformation taking place in the iPaaS

CDP or Customer Data Platform: data moves between cloud apps via a central hub which enables moderate transformation capabilities

ETL or Extract, Transform and Load: data moves from cloud apps to a data warehouse via a robust transformation layer built into the ETL tool

ELT or Extract, Load, and Transform: data moves from cloud apps directly to a data warehouse, after which transformation and data modelling take place in the warehouse via SQL. The main difference here is that with ETL, transformation takes place before data is loaded into the warehouse, whereas with ELT, it takes place afterwards

Reverse ETL: data moves from a data warehouse to cloud apps. Typically, the core transformation takes place in the warehouse before the reverse ETL process, but the reverse ETL tool may have a minimal transformation layer to fit data to an external system’s schema
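To make the difference in ordering concrete, here is a minimal sketch in Python. It is only an illustration under some loud assumptions: an in-memory SQLite database stands in for the data warehouse, the source rows are hard-coded instead of being pulled from a cloud app's API, and every table and field name (`customers_etl`, `raw_customers`, `contact_email`, `tier`, and so on) is made up for the example rather than taken from any real tool.

```python
import sqlite3

# Hypothetical rows, shaped the way a cloud app's API might return them.
SOURCE_ROWS = [
    {"email": " Ada@Example.com ", "plan": "pro", "mrr_cents": 4900},
    {"email": "bob@example.com", "plan": "free", "mrr_cents": 0},
]


def etl(warehouse, rows):
    """ETL: transform in the pipeline first, then load only the cleaned result."""
    cleaned = [
        (r["email"].strip().lower(), r["plan"], r["mrr_cents"] / 100)
        for r in rows
    ]
    warehouse.execute(
        "CREATE TABLE customers_etl (email TEXT, plan TEXT, mrr REAL)"
    )
    warehouse.executemany("INSERT INTO customers_etl VALUES (?, ?, ?)", cleaned)


def elt(warehouse, rows):
    """ELT: load the raw rows untouched, then transform in the warehouse via SQL."""
    warehouse.execute(
        "CREATE TABLE raw_customers (email TEXT, plan TEXT, mrr_cents INTEGER)"
    )
    warehouse.executemany(
        "INSERT INTO raw_customers VALUES (:email, :plan, :mrr_cents)", rows
    )
    # The transformation is a SQL model that lives next to the raw data.
    warehouse.execute(
        """
        CREATE TABLE customers_elt AS
        SELECT lower(trim(email)) AS email, plan, mrr_cents / 100.0 AS mrr
        FROM raw_customers
        """
    )


def reverse_etl(warehouse):
    """Reverse ETL: read modelled rows and fit them to an external app's schema."""
    rows = warehouse.execute(
        "SELECT email, plan FROM customers_elt WHERE plan != 'free' ORDER BY email"
    ).fetchall()
    # 'contact_email' and 'tier' are invented fields for a hypothetical CRM.
    return [{"contact_email": email, "tier": plan} for email, plan in rows]


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
etl(warehouse, SOURCE_ROWS)
elt(warehouse, SOURCE_ROWS)
crm_payload = reverse_etl(warehouse)
```

Both paths end with the same cleaned customer table. The practical difference is that only the ELT path keeps the raw rows in the warehouse, so the transformation can be changed and re-run later without going back to the source. The reverse ETL step does nothing more than rename fields, which matches the "minimal transformation layer" described above.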

Let us now look at each of these technologies in more detail — their pros and cons, the audience each one caters to, and the key players operating in the market today. Once again, do keep in mind that the commentary is wholly based on the technology in question and not the companies or products operating under it.