Administration#

The primary tool for configuring Trel is the treladmin command, which is accessible from the Trel instance. Refer to the administrator instructions under First Time Use to learn how to log in to the Trel instance.

Starting the Trel platform#

Your Trel platform starts without credentials, so all processes are in SHUTDOWN mode.

The following credentials are required before the processes can be started.

  1. AWS credentials. Trel uses AWS S3 for multiple internal steps as well as for logs, and needs your credentials to store this data in your buckets.

  2. Credentials for SMTP access.

  3. OpenAI API key for AI-powered data discovery.

Insert the Trel AWS credentials under the key aws.access_key (see Credential management). If you want this credential to be used only by the Trel platform, with a different one for your AWS jobs, insert the platform credentials under the key aws.access_key.trel instead.

Add the credentials for SMTP (email, password, host) as JSON into default with the key smtp.
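
For example, the JSON value takes this shape (the address, password, and host below are placeholders):

{"email": "alerts@example.com", "password": "<smtp password>", "host": "smtp.example.com"}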

Additional credentials may be required for use with other clouds.

Now, you are ready to start the platform. First, take a look at the services that you have:

treladmin service

Assuming you have 15 services, run:

treladmin service_update RUN 1-15

Initial Status of Trel#

Assuming you chose the basic plan and the AWS cloud, the Trel platform has been configured, using the information you provided, with two repositories: dw, based on Athena, and dl, based on S3.

You have access to various sensors that can crawl or extract-load from data sources and add to the catalog. These sensors are created by filling in the details in the template after clicking the “Add Sensors” button on the sensors page.

You have access to four execution technologies: python, athena, emr_pyspark, and ec2. The first one runs on the Trel instance, while the rest run in the cloud. There are multiple templates for each, and you can create new jobs using these technologies by filling in the templates. Start by clicking “Add job” on the jobs page.

Adding data repositories#

In Trel, repositories are powerful entities that help you catalog and manage data across storage systems.

To start cataloging and processing data, you must add at least one repository. The file setup2_for_aws.sh provides some sample commands for AWS that you can modify and run.

Each repository may also need its own credentials. Follow the instructions for the specific Repository Plugins.

Now you can register and activate sensors and have them populate the catalog with datasets!

Configure the execution profile plugins#

Every job in Trel must be associated with an Execution Profile Plugin. Therefore, make sure all the configuration needed by the profile plugin is provided before any job uses that profile.

Once this is done, you are ready to register jobs with those profiles. Make sure to take a look at Design guidelines to learn more about designing jobs and sensors.

Important files#

Path                    Description
~/.trel.config          The main configuration for the Trel platform. After updates, restart Trel.
~/.trel.user.config     Required for command-line access for any user. The only config required for the Trel CLI.
/var/log/trel           Main Trel log. Covers the REST API and the scheduler.
/var/log/trel_sensor    Logs from the sensor subsystem manager.
/var/log/trel_worker    Logs from the workers.
/opt/trel_ws            The folder where code for attempts is checked out and run.
~/.trelcreds.cnf        MySQL credential file to access the DB. Used by treladmin to access the DB directly, bypassing the REST API.
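
The credential file is in the standard MySQL option-file format; a minimal sketch with placeholder values (your instance's file will already contain the real values):

[client]
user=trel
password=<generated password>
host=localhost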

trel_contrib#

Additionally, a collection of sensor, ETL, copy, and reverse-ETL code can be found in trel_contrib @ Github. This project also contains suitable code for the lifecycle of the supported storage platforms.

Adding users#

A user has already been added based on the e-mail that was used to sign up. The credentials are stored in ~/.trel.user.config on the Trel instance. If you are just exploring, you can access Trel directly from there.

To add additional users, we use treladmin as follows:

treladmin user_add <email>

For security purposes, the administrator decides when the user can generate credentials. To allow the user to log in, run:

treladmin user_update <email> --enable_authentication 48

This e-mails the user with the next steps and gives them 48 hours (2 days) to generate the keys needed to log in.
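
Putting the two commands together, adding and enabling a new user looks like this (the address is a placeholder):

treladmin user_add alice@example.com
treladmin user_update alice@example.com --enable_authentication 48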

Adding templates#

You can add job and sensor templates to the Trel platform to simplify the creation workflow for both. Here are some sample scripts you can use to download and install some common templates.

AWS Job templates:

REPO=https://raw.githubusercontent.com/cumulativedata/trel_contrib/main
cd /tmp/

wget $REPO/templates/athena.yml && \
trel template_set athena job "Athena jobs (python code)" athena.yml

wget $REPO/templates/athena_sql_wrapper.yml && \
trel template_set athena_sql_wrapper job "Wrapper for Athena SQL scripts" athena_sql_wrapper.yml

wget $REPO/templates/emr_pyspark.yml && \
trel template_set emr_pyspark job "EMR Pyspark scripts in a Cluster" emr_pyspark.yml

wget $REPO/templates/ec2.yml && \
trel template_set ec2 job "Python script in an EC2 instance" ec2.yml

wget $REPO/templates/databricks_nb.yml && \
trel template_set databricks_nb job "Run a Databricks notebook as a job" databricks_nb.yml

cd -

Sensor templates:

REPO=https://raw.githubusercontent.com/cumulativedata/trel_contrib/main
cd /tmp/

wget $REPO/templates/odbc_sensor.yml && \
trel template_set odbc sensor "Connect to ODBC data sources" odbc_sensor.yml

wget $REPO/templates/s3_path.yml && \
trel template_set s3_path sensor "Crawls S3 for valid datasets" s3_path.yml

wget $REPO/templates/s3_state.yml && \
trel template_set s3_state sensor "Polls state file in S3" s3_state.yml

cd -

Backup and Restore Mechanisms#

This section provides guidance on setting up and managing the backup and restore processes for the Trel platform. The user is responsible for configuring the necessary parameters in the trel.config file and for ensuring the availability of the required encrypted backup file for restoration.

Backup Process#

The backup mechanism is designed to create regular backups of the Trel database. The backup process occurs hourly and involves several steps to ensure the security and integrity of the backup files.

Steps in the Backup Process#

  1. Database Dump: The database is dumped into an SQL file using the mysqldump command. The filename follows the format trel.YYYYMMDD_HH.sql.

  2. File Compression: The SQL file is compressed using gzip to save storage space. Ownership of the compressed file is then assigned to the trel_admin user, with restrictive permissions, to ensure security.

  3. Backup Path Configuration: The trel.config file contains configuration parameters that define the paths for database backups and log backups. The paths are retrieved using Python scripts within the shell script.

  4. Local Cleanup: If cloud backup is not configured, the script cleans up old backups locally. The cleanup process follows a lifecycle management approach, where backups older than a specified duration are deleted to save space.

  5. Cloud Backup: If a cloud backup path is configured, the compressed backup file is encrypted using openssl and then uploaded to the specified S3 bucket. Once the upload is successful, local copies of the backup files are deleted. A sketch of this flow appears after this list.

  6. Log File Backup: The script also handles the backup of log files associated with the Trel platform. These logs are uploaded to the cloud and cleaned up locally after successful uploads.
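
For illustration only, here is a minimal sketch of the hourly database-backup flow described above. It assumes a database named trel, a staging directory of /home/trel_admin/backups, and the aws CLI for the upload; the actual script shipped with Trel may differ:

# Sketch of steps 1-5 (illustrative; DB name and paths are assumptions)
STAMP=$(date +%Y%m%d_%H)                                    # filename stamp: YYYYMMDD_HH
DUMP=/home/trel_admin/backups/trel.${STAMP}.sql             # assumed staging directory
mysqldump --defaults-extra-file=$HOME/.trelcreds.cnf trel > $DUMP   # 1. dump the database
gzip $DUMP                                                  # 2. compress -> trel.YYYYMMDD_HH.sql.gz
chown trel_admin: $DUMP.gz && chmod 600 $DUMP.gz            #    restrict ownership and permissions
# 3-5. with db_backup_path configured: encrypt, upload, then remove local copies
openssl enc -aes-256-cbc -pbkdf2 -salt -pass file:/etc/manage_api_key \
    -in $DUMP.gz -out $DUMP.gz.enc
aws s3 cp $DUMP.gz.enc "$DB_BACKUP_PATH/" && rm -f $DUMP.gz $DUMP.gz.enc   # DB_BACKUP_PATH stands for the db_backup_path value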

Additional Notes#

  • Ensure the trel.config file is properly configured with the correct paths for the database and log backups.

  • The openssl command used for encryption requires access to a key file located at /etc/manage_api_key. Ensure this file is securely managed.

Restore Process#

The restore mechanism is designed to recover the Trel database from a previously created and encrypted backup file. The restore process involves decrypting, decompressing, and then importing the SQL file into the database.

Steps in the Restore Process#

  1. Verify Encrypted Backup File: The process begins by verifying the existence of the encrypted backup file located at /home/trel_admin/db_restore_file.gz.enc. If the file is not found, the restore process cannot proceed.

  2. File Decryption: The encrypted backup file is decrypted using the openssl command. The decryption key is provided by the key file located at /etc/manage_api_key. The decrypted output is saved as /tmp/db_restore_file.gz.

  3. File Decompression: The decrypted file, which is still in a compressed format (.gz), is decompressed using the gunzip command. The resulting SQL file is stored as /tmp/db_restore_file.sql.

  4. Database Restoration: The decompressed SQL file is then used to restore the database. The restoration is performed using the mysql command, where the SQL file is piped into the MySQL command-line interface to recreate the database contents (see the sketch after this list).

  5. Cleanup: After the restore process is complete, any temporary files created during the process, such as the decrypted SQL file, are deleted to ensure no sensitive data is left on the system.
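
As with the backup, here is a minimal sketch of this flow, assuming a database named trel; the actual restore script may differ:

# Sketch of steps 1-5 (illustrative; DB name is an assumption)
ENC=/home/trel_admin/db_restore_file.gz.enc
test -f $ENC || { echo "restore file not found"; exit 1; }       # 1. verify the encrypted backup exists
openssl enc -d -aes-256-cbc -pbkdf2 -pass file:/etc/manage_api_key \
    -in $ENC -out /tmp/db_restore_file.gz                        # 2. decrypt
gunzip -c /tmp/db_restore_file.gz > /tmp/db_restore_file.sql     # 3. decompress
mysql --defaults-extra-file=$HOME/.trelcreds.cnf trel < /tmp/db_restore_file.sql   # 4. restore
rm -f /tmp/db_restore_file.gz /tmp/db_restore_file.sql           # 5. clean up temporary files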

Additional Notes#

  • Ensure the key file at /etc/manage_api_key is available and accessible only by authorized users.

  • The user is responsible for providing the encrypted backup file at /home/trel_admin/db_restore_file.gz.enc before initiating the restore process.

  • If any errors occur during the restore process, detailed error messages will be provided for troubleshooting.

Configuration Parameters#

The following parameters in the trel.config file must be properly set to ensure successful backup and restore operations:

  • db_backup_path: Defines the cloud storage path where database backups will be uploaded.

  • log_backup_path: Defines the cloud storage path where log files will be uploaded.

Ensure these parameters are set according to your organization’s cloud storage structure.
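
A hypothetical example (the bucket and prefixes are placeholders, and the exact trel.config syntax may differ from this sketch):

db_backup_path: s3://example-bucket/trel/db_backups
log_backup_path: s3://example-bucket/trel/log_backups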

Backup and Restore Summary#

  • Backup Frequency: Hourly.

  • Backup Storage: Local and optionally cloud (S3).

  • Encryption: AES-256-CBC with PBKDF2 key derivation.

  • Restore File: The encrypted file located at /home/trel_admin/db_restore_file.gz.enc.

This mechanism ensures that your database backups are securely stored and easily restorable in the event of a failure, minimizing downtime and data loss.