The high-level working concepts in Hightouch.

At a high level, Hightouch constructs your data pipeline by connecting a model to a destination. This connection is called a sync.

Every model is a SQL query built on top of a source. A source is a database or warehouse that contains the data you wish to move to a destination. A destination is a SaaS tool or other location that displays and/or operationalizes your data for end users.
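
As a rough mental model, the relationship between these pieces can be sketched as plain data structures. This is only an illustration: the class names, field names, and example values (`analytics_warehouse`, `crm`, `Plan__c`) are hypothetical and are not part of any Hightouch API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """A database or warehouse that holds the data (hypothetical shape)."""
    name: str


@dataclass(frozen=True)
class Model:
    """A SQL query attached to exactly one source."""
    source: Source
    sql: str
    primary_key: str  # column that uniquely identifies each returned row


@dataclass(frozen=True)
class Destination:
    """A SaaS tool or other location that receives the data."""
    name: str


@dataclass
class Sync:
    """Connects one model to one destination."""
    model: Model
    destination: Destination
    mode: str  # e.g. "insert", "upsert", or "update"
    # model column -> destination field (illustrative names only)
    mappings: dict[str, str] = field(default_factory=dict)


warehouse = Source("analytics_warehouse")
users = Model(warehouse, "SELECT id, email, plan FROM users", primary_key="id")
crm = Destination("crm")
sync = Sync(users, crm, mode="upsert", mappings={"email": "Email", "plan": "Plan__c"})
```

The point of the sketch is the direction of the references: a sync points at a model and a destination, and the model in turn points at its source.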
Models are essentially SQL queries attached to a specific source that return a set of records. Hightouch sends the SQL you enter into the query editor directly to your source, so any SQL that is valid in your source is valid in Hightouch. This includes more specialized SQL such as subqueries and user-defined functions. Hightouch can query across tables as expected, as long as the credentials stored in Hightouch have permission to access all of the queried tables.

Take some time to develop queries that extract and shape the data into a format your destination can ingest smoothly. Hightouch will not, for instance, automatically extract fields from a JSON object in a cell in your source and send them as scalars to your destination. But you can do that in SQL!

After you have built a model, you create a sync. A sync is a set of instructions for how to send data from your model to your destination. It specifies a schedule: the sync can be triggered manually, triggered via Airflow, dbt, or another integration, or run at regular intervals.

When we run a sync, we fetch the data from your source as specified in your model and compare each record by the unique identifier (the primary key) you specified when creating the model. This comparison process is called ‘diffing’. From there, we find new, changed, and deleted records, and we update the destination as specified in the sync. See more on diffing below.

Depending on your destination, there may be one or more supported sync modes. The insert mode only adds new records to your destination. The upsert mode adds new records and updates existing records. The update mode only updates existing records.

Some destinations also have more than one sync type. Sync types refer to more complex representations of your data. Most destinations have a single sync type, called objects. These are single records that represent some unique thing.
Some destinations, like Iterable, have events. Events are multiple records that describe something that happened. Some destinations, like Slack, also have notifications. These are messages constructed from the underlying record.

When Hightouch runs the initial sync of your data, we aggregate all of the data according to your model as rows and send the rows to the mapped fields in your destination. We keep a record of what we’ve sent in something called a diff file.

We use the primary key you specified in your model as the point of reference for tracking your data. Hightouch can’t work without this primary key, so be sure to set it in the configuration tab of your model. The primary key is the waypoint by which Hightouch performs its searches.

Once the first diff file is created, we use it to keep track of the primary keys we’ve seen before. When we find a primary key we’ve seen before, we scan the columns of that row for changes. We also look for missing primary keys and consider those deleted rows, and we look for new primary keys and consider those new rows.

When diffing, we only look for changes in the data that’s being sent via the mappings you set up in your sync configuration. For example, if you set up a model (SQL query) that returns 20 fields but only include 10 fields in your mapping, Hightouch will only watch those 10 fields (columns). If any of the other columns change, Hightouch won’t know about it because we are not tracking them.

If you add a new column to the model and add that column (field) to the mapping, Hightouch starts over with an initial sync and creates a brand new initial diff file.

If you change the data type of a column in your model, Hightouch detects that as a change and syncs the row as a changed row.

Hightouch only compares the diff file from the current sync with the most recent diff file from that sync.
We don’t maintain a historical record of all rows and all columns (fields) that have ever been sent. So if a row drops out of a sync and later reappears, it’s considered a new row even though it may have been sent in the past. Hightouch doesn’t store all primary keys that have ever been sent. Consequently, Hightouch recommends the following ‘good data practice’: your warehouse should be your single source of truth. It is not good practice to update data only in your end tool.

When Hightouch executes a sync, we run a query against your specified source and move the generated diff file over to an AWS S3 bucket where we perform the diff check. This is called S3 diffing.

If you have warehouse diffing set up, instead of moving the diff file to our S3 bucket, we use your warehouse for in-warehouse diffing. There is a significant speed difference with in-warehouse diffing when you are dealing with millions or hundreds of millions of records.

When you execute a sync, either manually or via a schedule or trigger, your sync displays a status of “Querying”. This status means your sync is in one of the following three diff states:

- Hightouch is waiting for your model’s query to complete in your warehouse;
- Hightouch is transferring the diff file over to an S3 bucket; or
- Hightouch is performing the diff check.

Source: The database or warehouse where your data is stored. The starting point for a Hightouch data pipeline.

Model: The SQL query that pulls data from your source to send to your destination. We send your SQL query directly to your source, so any SQL that is valid for your source (including functions) is valid in Hightouch.

Destination: The service receiving your data, such as a SaaS tool or an SFTP location. Common destinations include Salesforce, HubSpot, and Customer.io.

Sync: The full Hightouch pipeline that pushes the data aggregated from your model to your destination, including the mappings and the execution schedule.

Sync modes: The type of sync Hightouch will perform: Insert, Update, Upsert, or Archive.

Primary key: The field Hightouch should use to search for and keep track of a record. Primary keys must be unique. We use the primary key as the foundational piece of data for diffing records. See the section on diffing above.

Mappings: Your sync’s configuration of data fields to send from the model to the destination.

Execution schedule: Every Hightouch sync can be triggered either manually or at regular intervals through a Hightouch-enabled schedule. You can also initiate your Hightouch sync from an Airflow trigger or dbt.

Diffing: The process for comparing your current sync to a previous sync to determine what, if anything, about your data needs to change.
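
The diffing rules described above (rows are matched by primary key, only mapped columns are compared, and only the most recent diff file is consulted) can be illustrated with a small Python sketch. This is a simplified model for intuition, not Hightouch's actual implementation: the function names `snapshot` and `diff`, the in-memory dictionaries standing in for diff files, and the sample rows are all assumptions made for the example.

```python
from typing import Any

Row = dict[str, Any]


def snapshot(rows: list[Row], primary_key: str, mapped_columns: list[str]) -> dict[Any, Row]:
    """Index rows by primary key, keeping only the mapped columns.

    Stands in for a diff file in this sketch; unmapped columns are dropped,
    so later changes to them cannot be detected.
    """
    return {row[primary_key]: {col: row[col] for col in mapped_columns} for row in rows}


def diff(previous: dict[Any, Row], current: dict[Any, Row]) -> dict[str, list[Any]]:
    """Compare the current snapshot with the most recent one only."""
    added = [pk for pk in current if pk not in previous]
    removed = [pk for pk in previous if pk not in current]
    changed = [pk for pk in current if pk in previous and current[pk] != previous[pk]]
    return {"added": added, "changed": changed, "removed": removed}


# The model returns three columns, but only two of them are mapped.
mapped = ["email", "plan"]

first_run = [
    {"id": 1, "email": "a@example.com", "plan": "free", "notes": "x"},
    {"id": 2, "email": "b@example.com", "plan": "free", "notes": "x"},
    {"id": 3, "email": "c@example.com", "plan": "pro", "notes": "x"},
]
second_run = [
    {"id": 1, "email": "a@example.com", "plan": "pro", "notes": "x"},   # mapped column changed
    {"id": 2, "email": "b@example.com", "plan": "free", "notes": "y"},  # only an unmapped column changed
    {"id": 4, "email": "d@example.com", "plan": "free", "notes": "x"},  # new primary key
]                                                                       # id 3 is missing

previous = snapshot(first_run, "id", mapped)
current = snapshot(second_run, "id", mapped)

initial = diff({}, previous)       # initial sync: nothing to compare against
changes = diff(previous, current)  # later sync: compare with the last snapshot
print(initial)  # {'added': [1, 2, 3], 'changed': [], 'removed': []}
print(changes)  # {'added': [4], 'changed': [1], 'removed': [3]}
```

Row 2 is not reported as changed because `notes` is not in the mapping. If row 3 reappeared in a third run, it would be reported as added again, because only the snapshot from the second run would be consulted.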