Search documentation...

K
ChangelogBook a demoSign up

Trino

Trino is an open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources.

Getting started

Hightouch lets you pull data stored in TrinoDB (previously PrestoSQL) and push it to downstream destinations. Most of the setup occurs in the Hightouch UI, but you need access to your Trino instance for information like your host, port, database name, and credentials.

You need to allowlist Hightouch's IP addresses to let our systems contact your warehouse. Reference our networking docs to determine which IPs you need to allowlist.

Connection configuration

To get started, go to the Sources overview page and click the Add source button. Select Trino and follow the steps below.

Choose connection type

Hightouch can connect directly to Trino over the public internet or via an SSH tunnel. Since data is encrypted in transit via TLS, a direct connection is suitable for most use cases. You may need to set up a tunnel if your Trino instance is on a private network or virtual private cloud (VPC).

Direct connection vs. SSH tunnel

Hightouch supports both standard and reverse SSH tunnels. To learn more about SSH tunneling, refer to Hightouch's tunneling documentation.

Configure your source

Enter the following fields into Hightouch:

  • Host: The hostname or IP address of your Trino coordinator server.
  • Port: The port number of your Trino coordinator server.
  • (Optional) Default catalog: This specifies the catalog that will be used when Hightouch executes queries in Trino.
  • Engine: Select whether to use the Trino or Presto engine to execute queries.
  • Connector: Select the connector that will be used to access the data source used by the sync engine.
  • (Optional) Disable encryption: Encryption is enabled by default. Check this option if your Trino instance doesn't support encrypted connections.

Choose your sync engine

For optimal performance, Hightouch tracks incremental changes in your data model—such as added, changed, or removed rows—and only syncs those records. You can choose between two different sync engines for this work.

The standard engine requires read-only access to Trino. Hightouch executes a query in your database, reads all query results, and then determines incremental changes using Hightouch's infrastructure. This engine is easier to set up since it requires read—not write—access to Trino.

The Lightning engine requires read and write access to Trino. The engine stores previously synced data in a separate schema in Trino managed by Hightouch. In other words, the engine uses Trino to track incremental changes to your data rather than performing these calculations in Hightouch. Therefore, these computations are completed more quickly.

Standard vs Lightning engine comparison
Hightouch recommends using the Lightning sync engine when syncing more than 100 thousand rows of data.

If you select the standard engine, you can switch to the Lightning engine later. Once you've configured the Lightning engine, you can't move back to the standard engine without recreating Trino as a source.

To learn more, including migration steps and tips, check out the Lightning sync engine docs.

Standard versus Lightning engine comparison

CriteriaStandard sync engineLightning sync engine
PerformanceSlowerQuicker
Ideal for large data models (over 100 thousand rows)NoYes
ReliabilityNormalHigh
Resilience to sync interruptionsNormalHigh
Extra featuresNoneWarehouse Sync Logs, Match Booster, Identity Resolution
Ease of setupSimplerMore involved
Location of change data captureHightouch infrastructureTrino schemas managed by Hightouch
Required permissions in TrinoRead-onlyRead and write
Ability to switchYou can move to the Lightning engine at any timeYou can't move to the standard engine once Lightning is configured

Lightning engine setup

To set up the Lightning engine, you need to grant Hightouch write access to your Trino connector. The steps to do so depend on the connector. These snippets serve as a reference and should be run on your Trino server.

Hive

CREATE SCHEMA IF NOT EXISTS <catalog>.hightouch_audit;
CREATE SCHEMA IF NOT EXISTS <catalog>.hightouch_planner;

PostgreSQL

CREATE USER hightouch_user WITH PASSWORD '********';
CREATE SCHEMA IF NOT EXISTS <catalog>.hightouch_audit;
CREATE SCHEMA IF NOT EXISTS <catalog>.hightouch_planner;
GRANT CREATE, USAGE ON SCHEMA <catalog>.hightouch_audit TO hightouch_user;
GRANT CREATE, USAGE ON SCHEMA <catalog>.hightouch_planner TO hightouch_user;

The snippet provisions two schemas (hightouch_planner and hightouch_audit) for storing logs of previously synced data and, in some cases, creates a dedicated SQL user. Hightouch must be able to read and write to these schemas, but the specific username and schema names might vary.

Provide credentials

Enter the following fields into Hightouch:

  • User: This can be your personal Trino login or a dedicated user for Hightouch. At minimum, this user must have read access to the data you wish to sync. If using the Lightning sync engine, you must also grant this user additional permissions as described above.
  • Password: The password for the user specified above.

Test your connection

When setting up a source for the first time, Hightouch validates the following:

  • Network connectivity
  • Trino credentials
  • Permission to list schemas and tables
  • Permission to write to hightouch_planner schema
  • Permission to write to hightouch_audit schema

All configurations must pass the first three, while those with the Lightning engine must pass all of them.

Some sources may initially fail connection tests due to timeouts. Once a connection is established, subsequent API requests should happen more quickly, so it's best to retry tests if they first fail. You can do this by clicking Test again.

If you've retried the tests and verified your credentials are correct but the tests are still failing, don't hesitate to .

Next steps

Once your source configuration has passed the necessary validation, your source setup is complete. Next, you can set up models to define which data you want to pull from Trino.

The Trino source supports these modeling methods:

You may also want to consider storing sync logs in Trino. Like using the Lightning sync engine versus the standard one, this feature lets you use Trino instead of Hightouch infrastructure. Rather than performance gains, it makes your sync log data available for more complex analysis. Refer to the warehouse sync logs docs to learn more.

You must enable the Lightning sync engine to store sync logs in your warehouse.

Data types

Hightouch supports all built-in Trino data types. Custom data types provided by plugins are represented as strings.

Tips and troubleshooting

To date, our customers haven't experienced any errors while using this source. If you run into any issues, please don't hesitate to . We're here to help.

Ready to get started?

Jump right in or a book a demo. Your first destination is always free.

Book a demoSign upBook a demo

Need help?

Our team is relentlessly focused on your success. Don't hesitate to reach out!

Feature requests?

We'd love to hear your suggestions for integrations and other features.

Last updated: Apr 11, 2024

On this page

Getting startedConnection configurationChoose connection typeConfigure your sourceChoose your sync engineProvide credentialsTest your connectionNext stepsData typesTips and troubleshooting

Was this page helpful?