Introducing Git Sync: Back your Syncs with Version Control

Bring software development and DevOps best practices to data integration: use Git to manage your Syncs the same way you manage your production code.

By Pedram Navid, Ernest Cheng, and Luke Kline on

Ops

Developers can’t live without Git; it’s a core part of the modern developer workflow. Its ability to track iterative changes to code collaboratively over time is a large part of what has made dbt such a powerful tool in the data space. That’s why we’re excited to announce the next step in our vision of building a tool that data practitioners love: the Hightouch Git Integration.

The Hightouch Git Integration brings Git’s core features to your Reverse ETL workflows: commit logs of incremental changes, the ability to roll back to a previous state, and the ability to create and edit Hightouch Syncs and models as code. It’s one more step toward bringing the lessons of software development and DevOps to data.

Announcing the Hightouch Git Integration

Our new Git integration enables you to implement the same best practices you use for your production code in the context of data integration.

Features:

  • Data workflows as code: Express your models and Syncs as code using an easy-to-read YAML schema.
  • Bidirectional updates: Keep configuring resources through the UI as you do today, or make changes and create resources via the CLI. It’s your choice, and it works both ways: no state conflicts, no compromises.
  • Built on the Git protocol: The integration works with the Git-based services your teams already use, such as GitHub, GitLab, and Bitbucket.
  • Deep observability: See every edit to your Syncs and models, either in Git or in Hightouch, and roll back unintended changes with standard Git commands.
  • Edit Syncs in Git or directly from your CLI: Create a Sync in the Hightouch UI, then edit it from the command line through Git. Creating multiple copies of the same Sync with slightly different parameters has never been easier!
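To give a feel for Syncs as code, here is a minimal Python sketch. It assumes a hypothetical Sync definition, and the fields `slug`, `model`, `destination`, `schedule`, and `config` are illustrative, not Hightouch’s actual YAML schema. The sketch renders that definition as simple YAML and stamps out per-region copies of one base Sync, which is the kind of repetitive variation that is tedious to do by hand in a UI.

```python
"""Render hypothetical Sync definitions as YAML and stamp out variants.

Field names (slug, model, destination, schedule, config) are illustrative
assumptions, not Hightouch's actual schema.
"""
import json
from copy import deepcopy


def to_yaml(data, indent=0):
    """Render nested dicts of scalars as block-style YAML.

    Scalars go through json.dumps, whose output (quoted strings, numbers,
    true/false/null) is also valid YAML.
    """
    pad = "  " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        elif isinstance(value, dict):
            lines.append(f"{pad}{key}: {{}}")  # empty mapping
        else:
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return "\n".join(lines)


def make_variants(base, overrides):
    """Copy a base Sync once per override set, giving each copy its own slug."""
    variants = []
    for suffix, changes in overrides.items():
        sync = deepcopy(base)  # never mutate the shared base definition
        sync["slug"] = f"{base['slug']}-{suffix}"
        sync["config"].update(changes)
        variants.append(sync)
    return variants


base_sync = {
    "slug": "users-to-crm",
    "model": "active-users",
    "destination": "crm",
    "schedule": {"interval_minutes": 60},
    "config": {"mode": "upsert", "region": "us"},
}

for variant in make_variants(base_sync, {"eu": {"region": "eu"}, "apac": {"region": "apac"}}):
    print(to_yaml(variant))
    print("---")
```

A real tool would use a full YAML library such as PyYAML instead of the hand-rolled `to_yaml`, which only handles nested dicts of scalar values.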

Video Demo

You can see Git Sync in action in our demo video below:

How does this work behind the scenes?

When we built this integration, we made a few design choices that we believed were critical to its success. We wanted a native Git integration that felt painless and avoided the headaches that often come with other infrastructure-as-code projects.

State changes are hard to manage, and we didn’t want to force users to choose between making changes in the UI and making them via Git and the command line. Anyone who has lost sleep reconciling Terraform state before making a small change knows this pain all too well.

We also believed that observability and auditing should be first-class concepts. It wasn’t enough to simply sync changes between Git and Hightouch. We wanted to capture changes to every resource and commit each one individually, so that rollbacks and cherry-picking are easy. To that end, we arrived at the following architecture:

At its core, Hightouch syncs in two directions:

  1. Hightouch to Git (outbound)
  2. Git to Hightouch (inbound)

[Figure: Git Sync diagram]

For the outbound direction, Hightouch keeps an audit log of changes made to all resources, such as configuration and schedule changes. At a fixed interval, we check the audit log to see which resources have changed and sync their new versions to the Git repository.

We make an individual commit per resource, enabling users to roll back any unintended changes made in the Hightouch UI.
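The outbound direction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hightouch’s implementation. `AuditEntry` and `FakeRepo` are hypothetical stand-ins for the audit log and the Git repository, and the checkpoint `last_seq` marks how far the previous interval read.

```python
"""Outbound pass: audit log -> one Git commit per changed resource.

AuditEntry and FakeRepo are hypothetical stand-ins, not Hightouch code.
"""
from dataclasses import dataclass, field


@dataclass
class AuditEntry:
    seq: int          # position in the audit log (monotonically increasing)
    slug: str         # which resource changed
    new_config: str   # serialized version of the resource after the change


@dataclass
class FakeRepo:
    files: dict = field(default_factory=dict)
    commits: list = field(default_factory=list)

    def commit(self, path, content, message):
        """Write one file and record one commit for it."""
        self.files[path] = content
        self.commits.append((message, path))


def outbound_pass(audit_log, last_seq, repo):
    """Commit each resource changed since last_seq; return the new checkpoint."""
    latest = {}
    for entry in sorted(audit_log, key=lambda e: e.seq):
        if entry.seq > last_seq:
            latest[entry.slug] = entry  # keep only the newest version per resource
    for slug, entry in latest.items():
        repo.commit(f"{slug}.yaml", entry.new_config, f"Update {slug}")
    return max([last_seq] + [e.seq for e in audit_log])


log = [
    AuditEntry(1, "users-to-crm", "schedule: hourly"),
    AuditEntry(2, "orders-to-ads", "schedule: daily"),
    AuditEntry(3, "users-to-crm", "schedule: every-15-min"),
]
repo = FakeRepo()
checkpoint = outbound_pass(log, last_seq=0, repo=repo)
print(checkpoint, repo.commits)
# prints: 3 [('Update users-to-crm', 'users-to-crm.yaml'), ('Update orders-to-ads', 'orders-to-ads.yaml')]
```

Collapsing several audit entries for the same resource into one commit per interval is a choice made for this sketch. Each changed resource still gets its own commit, so any one of them can be reverted without touching the others.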

For the inbound direction, Hightouch compares the state of every resource in the Git repository with its state in Hightouch. For each difference, we sync the new version into Hightouch, whether it’s a small change or a completely new resource.
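The inbound comparison amounts to a diff keyed by resource. In this hypothetical sketch, both the repository and Hightouch are modeled as dicts from slug to serialized config, and `apply_inbound` simply writes into a dict where a real system would call an API. Deletions are left out because the post does not describe how they are handled.

```python
"""Inbound pass: compare repository state with Hightouch state, keyed by slug.

Both states are hypothetical dicts mapping slug -> serialized config.
Deletions are omitted; the post does not describe how they are handled.
"""


def plan_inbound(repo_state, hightouch_state):
    """Return (creates, updates): slugs new in Git, and slugs whose config differs."""
    creates = sorted(s for s in repo_state if s not in hightouch_state)
    updates = sorted(
        s for s in repo_state
        if s in hightouch_state and repo_state[s] != hightouch_state[s]
    )
    return creates, updates


def apply_inbound(repo_state, hightouch_state):
    """Copy new and changed resources from Git into the Hightouch state."""
    creates, updates = plan_inbound(repo_state, hightouch_state)
    for slug in creates + updates:
        hightouch_state[slug] = repo_state[slug]  # stands in for an API call
    return creates, updates


repo_state = {
    "users-to-crm": "interval: 15",   # edited in Git
    "orders-to-ads": "interval: 60",  # unchanged
    "churn-to-email": "interval: 5",  # created directly in Git
}
hightouch_state = {"users-to-crm": "interval: 60", "orders-to-ads": "interval: 60"}
print(apply_inbound(repo_state, hightouch_state))
# prints: (['churn-to-email'], ['users-to-crm'])
```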

A few details help us reduce the chance of edge cases:

  1. We run the inbound sync if and only if the outbound sync succeeded.
  2. We added a required slug to every resource so that users can identify resources by a readable name rather than an uninformative ID. This is also useful when creating resources directly in Git, before Hightouch has assigned them an ID.
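Both rules fit in a short sketch. `outbound` and `inbound` are hypothetical callables standing in for the two passes, each returning `True` on success. `index_by_slug` shows why a required slug helps: resources written by hand in Git can be matched by a readable key instead of an internal ID. Rejecting duplicate slugs is an assumption made here so that each key is unambiguous.

```python
"""One Git Sync cycle: inbound runs only if outbound succeeded.

outbound/inbound are hypothetical callables; index_by_slug is a
hypothetical helper illustrating the required-slug rule.
"""


def index_by_slug(resources):
    """Key resources by their required slug; reject missing or duplicate slugs."""
    indexed = {}
    for resource in resources:
        slug = resource.get("slug")
        if not slug:
            raise ValueError(f"resource without a slug: {resource}")
        if slug in indexed:
            raise ValueError(f"duplicate slug: {slug}")  # assumed uniqueness rule
        indexed[slug] = resource
    return indexed


def run_cycle(outbound, inbound):
    """Run outbound first; run inbound only if outbound succeeded.

    Returns the names of the passes that were started.
    """
    steps = ["outbound"]
    if not outbound():
        return steps  # outbound failed, so inbound is skipped this cycle
    steps.append("inbound")
    inbound()
    return steps


print(run_cycle(lambda: True, lambda: True))   # prints: ['outbound', 'inbound']
print(run_cycle(lambda: False, lambda: True))  # prints: ['outbound']
```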

How do you get started?

Read our docs here to get started with Git Sync.
