How to…#

Fix bad data due to Sensor bug#

If your sensor has execution time invariance, this is an easy fix.

  1. Shut down the sensor.

  2. Remove the bad entries from the catalog.

  3. Fix the sensor code and restart it.

The sensor should automatically realize that the entries are gone and replace them correctly.

If the sensor does not have execution time invariance, the fix depends on its ability to backfill. You should still do the steps above, but there is no guarantee that they will fix everything, and you will have to investigate manually what else needs to be done.
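
To see why this works, here is a minimal Python sketch of the self-healing property. The names (expected_entries, run_sensor, and the catalog represented as a set) are illustrative stand-ins, not Trel APIs: an execution-time-invariant sensor derives the same expected entries regardless of when it runs, so anything you remove from the catalog is simply recomputed and re-inserted on the next run.

```python
from datetime import datetime, timedelta


def expected_entries(now: datetime) -> set:
    # An execution-time-invariant sensor computes the same expected entries
    # for a given period no matter when it runs (here, one per day).
    start = datetime(2024, 1, 1)
    return {f"events/{(start + timedelta(days=d)).date()}"
            for d in range((now - start).days)}


def run_sensor(catalog: set, now: datetime) -> None:
    # Entries deleted from the catalog are recomputed and added back,
    # identical to what was there before the deletion.
    for entry in sorted(expected_entries(now) - catalog):
        catalog.add(entry)  # stands in for registering the dataset entry
```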

Fix bad data due to bad inputs#

Suppose we did not catch bad data in a dataset in time. As a result, the next job may have consumed it and produced more bad data. How do we fix this?

  1. Delete all the dataset entries that contain the bad data, both inputs and outputs.

  2. Produce the correct inputs.

  3. Set “hide from scheduler” for the tasks that processed the bad data.

  4. Trigger the scheduler for the job.

When triggered, the scheduler will realize that there are unprocessed inputs and start creating the same tasks you hid.
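
The mechanism is worth spelling out. In the toy model below (not Trel's scheduler code), a schedule instance counts as handled only if a visible task exists for it, so hiding the old tasks, rather than deleting them, preserves their history while making the scheduler treat those inputs as unprocessed and create fresh tasks.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    schedule_id: str
    hidden: bool = False  # "hide from scheduler"


@dataclass
class ToyScheduler:
    tasks: list = field(default_factory=list)

    def trigger(self, ready_schedule_ids):
        # Hidden tasks do not count as covering a schedule instance.
        visible = {t.schedule_id for t in self.tasks if not t.hidden}
        created = []
        for sid in ready_schedule_ids:
            if sid not in visible:
                created.append(Task(schedule_id=sid))
        self.tasks.extend(created)
        return created


# The hidden task no longer covers 2024-01-05, so a replacement is created.
scheduler = ToyScheduler(tasks=[Task("2024-01-05", hidden=True)])
assert [t.schedule_id for t in scheduler.trigger(["2024-01-05"])] == ["2024-01-05"]
```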

Fix bad data due to transformation bug#

If the inputs are correct, but the outputs are incorrect due to a bug (one that does not cause the job to fail), follow these steps:

  1. Deactivate the job.

  2. Fix the code.

  3. Delete all the output dataset entries.

  4. Set “hide from scheduler” for the tasks that ran the buggy code.

  5. Activate the job, activate the scheduler, and trigger the scheduler.

When triggered, the scheduler will realize that there are unprocessed inputs and start creating the same tasks you hid.
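
Note that the outputs you delete in step 3 and the tasks you hide in step 4 come from the same place: the tasks that ran the buggy code. A small illustrative sketch (hypothetical field names, not Trel objects) of deriving both lists:

```python
from dataclasses import dataclass


@dataclass
class FinishedTask:
    task_id: str
    output_entries: list  # catalog entries this task produced


def plan_cleanup(buggy_tasks):
    # Everything produced by the buggy runs is deleted (step 3), and the
    # runs themselves are hidden from the scheduler (step 4).
    entries_to_delete = [e for t in buggy_tasks for e in t.output_entries]
    tasks_to_hide = [t.task_id for t in buggy_tasks]
    return entries_to_delete, tasks_to_hide
```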

Use a sensor from trel_contrib#

If the behavior of the sensor is exactly what you need, download the registration file and make the modifications you need. These may include:

  • Picking a name for the sensor

  • Picking the correct output dataset class

  • Setting the cron constraint
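
For concreteness, here is a hypothetical sketch of those fields as a Python mapping. The actual registration file format and key names come from the trel_contrib sensor you downloaded, so treat everything below as an assumption:

```python
# Illustrative only: real key names and file format are defined by the
# downloaded registration file, not by this sketch.
sensor_registration = {
    "name": "orders_s3_sensor",            # a name you pick for the sensor
    "output_dataset_class": "orders_raw",  # the dataset class it should emit
    "cron_constraint": "0 * * * *",        # hourly; adjust to your cadence
}
```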

Then register the sensor and activate it. Done!

If the code needs to be modified, download it, commit it to your codebase, and make your changes there. Once done, repeat the steps above, along with one more:

  • Updating the code location to point to your codebase

Then register the sensor and activate it.

Delete data created by a sensor without having the sensor replace it#

If the data you wish to delete is older than the data you wish to keep,

  1. Update the parameter min_instance_ts in the sensor registration.

  2. Delete the unwanted data.

The sensor will not replace any dataset whose instance_ts is older than this parameter.

You can also use the max_instance_age_seconds parameter. The sensor will not insert any dataset whose instance_ts is more than this many seconds older than the current time.
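
As a rough illustration of how these two parameters gate insertion (the function below is a sketch using the documented parameter names, not the sensor's actual code):

```python
from datetime import datetime, timezone


def should_insert(instance_ts, min_instance_ts=None, max_instance_age_seconds=None):
    # A dataset is skipped if its instance_ts falls outside either limit.
    now = datetime.now(timezone.utc)
    if min_instance_ts is not None and instance_ts < min_instance_ts:
        return False  # older than min_instance_ts: never (re)inserted
    if (max_instance_age_seconds is not None
            and (now - instance_ts).total_seconds() > max_instance_age_seconds):
        return False  # too old relative to the current time
    return True
```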

As for the deletion itself, if there is a recurring need to delete datasets, you can use Lifecycle Management to delete the entries and the backing data automatically.

Keep storage costs low#

The Trel feature Lifecycle Management is specifically designed for this.

Safely add a new job to production#

Most likely, this new job generates a new dataset class. In such cases, adding the job to production is extremely safe.

  1. Verify once more that you really are generating a new dataset class.

  2. Verify that you are not accidentally modifying the inputs.

  3. Register the job.

  4. Enable the job, activate the scheduler, and trigger the scheduler.

The safety comes from the fact that every active job and sensor will ignore the output of this job. Also, based on the design guidelines, this job will not modify the inputs.
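
The first point can be pictured with a toy check (illustrative names only, not a Trel API): a dataset class that no active job or sensor lists among its inputs cannot trigger anything.

```python
# Entries of a dataset class that nothing consumes are ignored by every
# existing job, so the new job's output cannot disturb production.
existing_jobs = {
    "job_a": {"inputs": {"orders_raw"}},
    "job_b": {"inputs": {"orders_clean"}},
}

new_output_class = "orders_enriched"  # produced only by the new job

consumers = [name for name, spec in existing_jobs.items()
             if new_output_class in spec["inputs"]]
assert consumers == []  # no active job reacts to the new class
```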

Safely modify an existing job#

The process for this is more involved than for a new job. Assume we want to modify job1, and that we want to test only a single task rather than reproduce a set of tasks or a set of jobs.

In this case, we will commit changes to a branch, test using a different label, verify, and merge.

There are a few options for testing the job. Here, we will use the most elaborate one: making a new job for testing.

  1. Create a branch feature1 with the changes involved.

  2. Copy the production registration file and change the name to job1.qa.feature1 (a sketch of the changed fields follows this list).

  3. Change the branch in execution.source_code.main.

  4. Change the output label in execution.output_generator to feature1. Leave the input label unchanged.

  5. Register the job.

  6. Run the job using the schedule_id you want to test. This will generate output with label feature1.

  7. Verify that this dataset is as expected.

  8. Deactivate scheduling in job1.

  9. Merge the branch with production.

  10. Re-run any tasks needed to replace existing datasets using the new code.

  11. Update the schedule_ts_min if required.

  12. Enable scheduling for job1.

  13. Disable job1.qa.feature1.

  14. Delete the datasets with the label feature1.
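
Referring back to steps 2, 3, and 4, the sketch below shows the three values that change in the copied registration. The nesting and file format are illustrative guesses; only the field paths named in the steps (execution.source_code.main and execution.output_generator) come from the procedure itself.

```python
# Hypothetical layout; only the three changed values matter here.
qa_registration = {
    "name": "job1.qa.feature1",                            # step 2: new job name
    "execution": {
        "source_code": {"main": {"branch": "feature1"}},   # step 3: test branch
        "output_generator": {"label": "feature1"},         # step 4: QA output label
        # The input label is left unchanged, so the QA job reads production inputs.
    },
}
```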