Warehousing: Buckets Breakdown Changelog Monitoring

Updated by Emily Shreero

Rockerbox tables are updated in your data warehouse each time there is a change to your foundational dataset. This means that more frequent syncs to your data warehouse may occur throughout the day as Rockerbox receives new data from each of the sources it aggregates.

Buckets Breakdown (MTA Tiers) Monitoring

Rockerbox aims to make your Buckets Breakdown (MTA Tiers) dataset available as early as possible so you have the data you need for daily decision making and analysis. Because Rockerbox aggregates data from all of your marketing channels, there may be cases where data is not yet made available by a given platform (e.g., spend is not yet available from the TikTok API).

To ensure you are always aware of any missing data and can plan any downstream processes accordingly, Rockerbox's MTA Tiers Changelog Table will report on the status for each update to spend and conversions for all Buckets Breakdown (MTA Tiers) tables shared into your data warehouse. The schema of this changelog table can be found here.

Best Practices for Ensuring Completeness of Data
  • Ensure conversion_count > 0 per date per table before processing data
Rockerbox aggregates conversion data on a daily basis for each conversion segment. This means that once the conversion_count is greater than zero, then you know that all of your conversions for that date are included in your Buckets Breakdown dataset.
  • Check that spend is available for all critical platforms before processing data. You can check to see if the spend_platform_count and spend_platforms columns match your expectations based on when spend is expected to appear in Rockerbox for each platform.
Example:

Let's say you run a data pipeline at the first thing each morning and you run marketing across 3 platforms: (1) Google (2) Meta (3) Tatari OTT.

You always expect Google and Meta spend to be included in your Buckets Breakdown report as that data is immediately available for Rockerbox to pull from the Google and Meta APIs; however, Tatari OTT spend is not available at the first thing in the morning because Tatari data is delivered to Rockerbox on a 2-day lag.

In this case, you'll want to check that the latest version of your Buckets Breakdown data includes spend from just Google and Meta since you will always expect to see spend in your Buckets Breakdown dataset for these channels.

The sample queries below shows how you can execute this check against yesterday's data in a few different ways.

Sample Query 1

--Add a row to the case statement for each platform that you always expect to see in your Rockerbox reporting

SELECT
date,
total_spend,
total_spend_platform_count,
total_spend_platforms,
case
when total_spend_platforms not ilike '%adwords%' then true
when total_spend_platforms not ilike '%facebook%' then true
--add additional platforms here
else false
end as missing_spend_platforms
FROM MTA_TIERS_CHANGE_LOG
WHERE
most_recent = true
AND table_name = "{insert table name here}"
AND missing_spend_platforms = true
AND date = current_date - 1
ORDER BY updated_at DESC;

Sample Query 2

--Update the coalesce in the where clause and add a new case statement for each platform that you always expect to see in your Rockerbox reporting

SELECT
date,
total_spend,
total_spend_platform_count,
total_spend_platforms,
case when total_spend_platforms not ilike '%adwords%' then true end as missing_adwords_spend,
case when total_spend_platforms not ilike '%facebook%' then true end as missing_facebook_spend
--add a new case statement for additional platforms
FROM MTA_TIERS_CHANGE_LOG
WHERE
most_recent = true
AND table_name = "{insert table name here}"
AND date = current_date - 1
AND coalesce(
missing_adwords_spend,
missing_facebook_spend
--add reference to column for additional platforms
) is not null
ORDER BY updated_at DESC;
How to evaluate the latest version of your Buckets Breakdown dataset(s)?

Filter the table to produce all records where most_recent is true.

SELECT * 
FROM MTA_TIERS_CHANGE_LOG
WHERE most_recent = true
ORDER BY updated_at DESC;

How to identify all of the Rockerbox datasets that were updated in the last 48H?

The conversions, log level MTA, and MTA Tiers (Buckets Breakdown) datasets are all related for a single conversion segment + date. If there is an incremental update to conversion data in the MTA Tiers dataset per the status table, then you should also assume that there is an incremental update to the conversions and log level MTA dataset.

Here is a query to detect the all of the MTA Tiers datasets that were updated by Rockerbox in the last 48 hours

Ensure that both timestamps in the WHERE clause are in the UTC timezone.

Snowflake

SELECT 
table_name,
identifier,
date,
updated_at
FROM MTA_TIERS_CHANGE_LOG
WHERE
most_recent = true
AND table_name = "{insert table name here}"
AND updated_at >= dateadd(hour, -48, sysdate())
ORDER BY updated_at DESC;

Redshift

SELECT 
table_name,
identifier,
date,
updated_at
FROM MTA_TIERS_CHANGE_LOG
WHERE
most_recent = true
AND table_name = "{insert table name here}"
AND updated_at >= dateadd(hour, -48, getdate())
ORDER BY updated_at DESC;

BigQuery

SELECT 
table_name,
identifier,
date,
updated_at
FROM MTA_TIERS_CHANGE_LOG
WHERE
most_recent = true
AND table_name = "{insert table name here}"
AND updated_at >= datetime_add(current_datetime('Etc/UTC'), INTERVAL -48 HOUR)
ORDER BY updated_at DESC;


How did we do?