Search documentation...

K
ChangelogBook a demoSign up

S3

Transfer data at scale from your warehouse to S3

Supported syncing

Object TypeDescriptionSupported Sync Modes
Any data setSync data from a source to S3 as CSV, JSON, or Parquet filesInsert, All, Diff
  • All: All mode creates one file with all the rows in the query results, every time the sync runs.
  • Insert: Insert mode creates one file with the rows that were added since the last sync.
  • Diff: Creates three files, one for rows added, one for rows changed, and another for rows removed since the last sync.

For more information about sync modes, refer to the sync modes docs.

Prerequisites

To get started, you need:

Hightouch needs the following IAM actions to store items in your bucket:

ActionDetails
s3:PutObjectGrants permission to add an object to a bucket
s3:AbortMultipartUploadGrants permission to to abort or cancel an ongoing multipart upload in an S3 bucket
s3:ListMultipartUploadPartsGrants permission to list the parts of a multipart upload in an S3 bucket
s3:ListBucketMultipartUploadsGrants permission to list all the ongoing multipart uploads within an S3 bucket

For small file transfers, Hightouch only uses the s3:PutObject action. For large files, multi-part uploads may be necessary, hence the other required actions.

You can use the following JSON sample to create your IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Sample",
            "Effect": "Allow",
            "Action": [
              "s3:PutObject",
              "s3:AbortMultipartUpload",
              "s3:ListMultipartUploadParts",
              "s3:ListBucketMultipartUploads"
            ],
            "Resource": [
              "arn:aws:s3:::${bucketName}/*",
              "arn:aws:s3:::${bucketName}"
            ]
        }
    ]
}

See the guide for configuring AWS credentials for more details.

Connect to Amazon S3

Go to the Destinations overview page and click the Add destination button. Select Amazon S3 and click Continue. You can then authenticate Hightouch to Amazon S3 by entering your Bucket Name. The Bucket Name should just be the name of the bucket, not a URL.

Bucket name in Amazon S3 Console

Sync configuration

Once you've set up your S3 destination and have a model to pull data from, you can set up your sync configuration to begin syncing data. Go to the Syncs overview page and click the Add sync button to begin. Then, select the relevant model and the S3 destination you want to sync to.

Select file format

Hightouch supports syncing JSON, CSV, and Parquet files to Amazon S3.

Enter filename

The filename or object key field lets you specify the parent directory and the name of the file you want to use for your results. You can include timestamp variables in the filename, surrounding each with {}. Hightouch supports these timestamp variables:

  • YYYY: Represents the full year in four digits.
  • YY: The last two digits of the year.
  • MM: Two-digit month format (01-12).
  • DD: Two-digit day format (01-31).
  • HH: Two-digit hour format in 24-hour clock (00-23).
  • mm: Two-digit minute format (00-59).
  • ss: Two-digit second format (00-59).
  • ms: Three-digit millisecond format.
  • X: Unix timestamp in seconds.
  • x: Unix timestamp in milliseconds.

All dates and times are UTC.

For example, you could enter upload/{YYYY}-{MM}-{DD}-{HH}-{mm}-result.json to dynamically include the year, month, date, hour, and minute in each uploaded file. Hightouch would insert each file in the upload directory, which would need to already exist in your bucket.

You can also use other variable values to include sync metadata in the filename:

  • {model.id}
  • {model.name}
  • {sync.id}
  • {sync.run.id}

If a file already exists at the path you entered at the time of a sync, Hightouch overwrites it. To keep different versions of the same results file, you can enable versioning in your bucket, or your app can copy the data to another location.

If you are using an audience and would like to include the audience name, you will still use {model.name}.

Set filename offset

By default, Hightouch uses the timestamp of the sync run to fill in timestamp variables. You can optionally include an offset in seconds. For example, if you want the filename's date to be 24 hours before the sync takes place, enter -86400 (24 hours * 60 minutes * 60 seconds). If you want the filename's data to be one hour after the sync takes place, you would enter 3600 (60 minutes * 60 seconds).

CSV options

If you're syncing to a CSV file, you have additional configuration options:

  • Delimiter: Your options are comma (,), semicolon (;), pipe (|), tilde (~), and tab
  • (Optional) Whether to include a CSV header row in the exported files
  • (Optional) Whether to include a byte order mark (BOM) in the exported files, the BOM is <U+FEFF>

Columns to sync

You can export all columns exactly as your model returns them or choose to export specific ones.

If you need to rename or transform any column values you're syncing, you can use the advanced mapper to do so. If you choose this option, Hightouch only syncs the fields you explicitly map.

Field mapping in the Hightouch UI

The preceding example shows how to selectively export the customer_id, email, first_name and last_name, columns. These columns are mapped to new fields in the destination file as id, email, and name—a templated concatenation of first_name and last_name—respectively. Hightouch exports these fields to new fields in the file and ignores all other columns from your results.

Batch size

By default, and depending on your sync mode, Hightouch sends one file to your S3 bucket for each export. (In diff mode, Hightouch sends three files.)

By enabling batching, you allow Hightouch to send multiple files if your model query results are large. This can help avoid file size errors from S3.

Empty file results

You can select how Hightouch should handle empty results files. Empty result files can occur if your model's query results haven't changed since the last sync.

You can select whether to skip empty files. If you skip empty files, it means Hightouch won't export any files if your model's query results haven't changed since your last sync.

Gzip compression

You can enable gzip compression to reduce the size of your exported files. Note that Hightouch does not automatically add the .gz extension to your files. You can add this or any other extension to your filename.

Tips and troubleshooting

Common errors

If you encounter an error or question not listed below and need assistance, don't hesitate to . We're here to help.

Access denied

The ACCESS_DENIED error is a permissions-related error. Ensure your Hightouch AWS credentials have read/write access to the S3 bucket specified in your destination. The minimal permissions required are s3:GetObject, s3:PutObject, and s3:ListBucket.

Sync alerts

Hightouch can alert you of sync issues via Slack, PagerDuty, SMS, or email. For details, please visit our article on alerting.

Ready to get started?

Jump right in or a book a demo. Your first destination is always free.

Book a demoSign upBook a demo

Need help?

Our team is relentlessly focused on your success. Don't hesitate to reach out!

Feature requests?

We'd love to hear your suggestions for integrations and other features.

Last updated: Jan 19, 2024

On this page

Supported syncingPrerequisitesConnect to Amazon S3Sync configurationSelect file formatEnter filenameSet filename offsetCSV optionsColumns to syncBatch sizeEmpty file resultsGzip compressionTips and troubleshootingCommon errorsSync alerts

Was this page helpful?