Schema: Data Warehouse - Clickstream Dataset

Updated by Emily Shreero

Rockerbox's Clickstream dataset is a user and event-level dataset that reports on every Page View and Conversion event tracked via Rockerbox pixel on your site. These events are then attributed back to click-based marketing touchpoints, with the Rockerbox tier structure and spend keys applied for standardization with your other Rockerbox conversion datasets.

  1. Clickstream Schema
  2. Clickstream FAQ
See our Clickstream Use Cases for more information on how Rockerbox customers are using this dataset for enhanced marketing analysis.

Clickstream Schema

Name

Description

Type

action

Name of the raw pixel event. This includes events for all conversion segments configured in Rockerbox + a page view.

str

advertiser

Rockerbox Account ID

str

base_id

Primary User ID

str

date

Date when the action occurred

date

engaged_session

Binary 0 or 1 indicating a session lasting > 10 seconds (session_max - session_min > 10).

int

event_id

A unique identifier for each action.

Can be used as the primary key.

str

hash_ip_events

Hashed IP address of user for a particular action

str

identifier

Advertiser-specific identifier

str

marketing_type

Type of marketing touchpoint

This will always be = onsite as this dataset reflects click-based marketing events only

str

onsite_count

The total number of actions seen against a given user within a given session. times a user was seen on your website

int

original_url

URL of the page landing page

str

rb_sync_id

Identifier used by Rockerbox to sync dataset to your warehouse

int

report

The name of the report

str

request_referrer

Page Referrer (the previous site where the user came from)

str

session_id

Identifier for a session, indicated by a timestamp.

A unique user session will be a combination of session_id|uid, or can be identified using session_start

str

session_max

Timestamp of the last time a user was seen on site during a session

timestamp

session_min

Timestamp of the first time a user was seen on site during a session

timestamp

session_start

Binary 0 or 1 indicating the first event of a session.

Can be used for session / visitor analysis by filtering for session_start = 1

int

spend_key

The ID used to pull spend from an advertisting platform. This is typically the Ad ID, but may differ based on your account setup.

str

tier_1

Aligns to the 5-tiered categorization structure available in the Rockerbox UI.

tier_1 = most broad categorization (level 1)

str

tier_2

Aligns to the 5-tiered categorization structure available in the Rockerbox UI.

tier_2 = level 2 categorization (more granular than level 1)

str

tier_3

Aligns to the 5-tiered categorization structure available in the Rockerbox UI.

tier_3 = level 3 categorization

str

tier_4

Aligns to the 5-tiered categorization structure available in the Rockerbox UI.

tier_4 = level 4 categorization

str

tier_5

Aligns to the 5-tiered categorization structure available in the Rockerbox UI.

tier_5 = level 5 categorization

str

timestamp_action

Timestamp of when the action occurred

timestamp

timestamp_event

Timestamp of when the marketing touchpoint occured.

This will only appear on the first action within a session, as the only action that will have a marketing touchpoint. The timestamp_event and timestamp_action will match.

timestamp

transform_table_id

ID associated with the Rockerbox table used to apply mappings and spend

*Closed beta feature only

int

uid

Rockerbox User ID cookie

str

updated_at

Time the cache record was updated most recently

timestamp

user_agent

Web identifier that includes characteristics like browser, device, operating system, and application

str

utm_campaign

utm_campaign value parsed from the landing page URL, if present

str

utm_content

utm_content value parsed from the landing page URL, if present

str

utm_id

utm_id value parsed from the landing page URL, if present

str

utm_medium

utm_medium value parsed from the landing page URL, if present

str

utm_source

utm_source value parsed from the landing page URL, if present

str

utm_term

utm_term value parsed from the landing page URL, if present

str

Clickstream FAQ

  • What are marketers using this dataset for?
  • What's the primary key for this table?
    • event_id
  • When should I expect this table to sync?
    • The Clickstream dataset is updated daily, after your other core conversion datasets are synced.
  • What's the backfill behavior for this table?
    • This dataset is not backfilled, as it represents a "live" stream of onsite events vs other Rockerbox datasets that are aggregated from various data sources. This means you should not expected historical data to be changed after the initial sync.
    • This also means that to your tier structure that might apply to click-based events reflected in this table will not be edited retroactively.
  • How does Rockerbox handle a session that spans two dates (UTC)
    • Rockerbox will break a session (generating a new session_id and date) if a user's session is active across two days. This may be common is users are active around UTC midnight.
  • How does Rockerbox define a session? How can I make sure a session is unique when I query against the dataset?
    • The session_id used by Rockerbox is a timestamp of the first event in a session, vs the session_id cached in your browser. This allows Rockerbox to maintain the same session_id when a user opens a new tab.
    • Because session_id is defined as a timestamp, it's not unique per user. To identify all unique user sessions, join session_id|uid or filter for session_start = 1 in your query.
    • A session "expires" after 30 minutes of inactivity. If the user is seen as active again, the session_id will be reset.
  • Why are some fields for a given row in the dataset blank?
    • Not all actions will have associated marketing context. Typically, only the first action in a session will pass along click-based marketing context in the URL. In your Rockerbox conversion data, the marketing context for each conversion event is carried over via this first action with marketing context. In this dataset, only the events that carry marketing context will have relevant fields populated to avoid any duplication.
  • Can I see non-click marketing context?
    • Non-click marketing context like view-based data from Linear TV, OTT, Display, and Social as well as other marketing context like promo code attribution or direct mail matching is not currently available in this dataset.
  • How can I identify a repeat visitor?
    • When a visitor returns to site, they may or make not have the same Rockerbox cookie ID (uid). To string together a user path when the uid is NOT the same, Rockerbox applies an identity resolution process to our conversion datasets. This is not currently available for this dataset.
    • Users with > 200 events / day are excluded from this dataset under the assumption that these are admin users or server-side cookie IDs.
  • What is an engaged session?
    • An engaged session is defined by a session where the differences of the session_max and session_min timestamps are > 10 seconds.
    • By logic, this means that a session with only 1 action cannot be an engaged session, since the session_min and session_max timestamps will be the same


How did we do?