Schema: Data Warehouse - Clickstream Dataset
Rockerbox's Clickstream dataset is a user and event-level dataset that reports on every Page View and Conversion event tracked via Rockerbox pixel on your site. These events are then attributed back to click-based marketing touchpoints, with the Rockerbox tier structure and spend keys applied for standardization with your other Rockerbox conversion datasets.
Clickstream Schema
Name | Description | Type |
action | Name of the raw pixel event. This includes events for all conversion segments configured in Rockerbox + a page view. | str |
advertiser | Rockerbox Account ID | str |
base_id | Primary User ID | str |
date | Date when the action occurred | date |
engaged_session | Binary 0 or 1 indicating a session lasting > 10 seconds (session_max - session_min > 10). | int |
event_id | A unique identifier for each action. Can be used as the primary key. | str |
hash_ip_events | Hashed IP address of user for a particular action | str |
identifier | Advertiser-specific identifier | str |
marketing_type | Type of marketing touchpoint This will always be = onsite as this dataset reflects click-based marketing events only | str |
onsite_count | The total number of actions seen against a given user within a given session. | int |
original_url | URL of the page landing page | str |
rb_sync_id | Identifier used by Rockerbox to sync dataset to your warehouse | int |
report | The name of the report | str |
request_referrer | Page Referrer (the previous site where the user came from) | str |
session_id | Identifier for a session, indicated by a timestamp. A unique user session will be a combination of session_id|uid, or can be identified using session_start | str |
session_max | Timestamp of the last time a user was seen on site during a session | timestamp |
session_min | Timestamp of the first time a user was seen on site during a session | timestamp |
session_start | Binary 0 or 1 indicating the first event of a session. Can be used for session / visitor analysis by filtering for session_start = 1 | int |
spend_key | The ID used to pull spend from an advertisting platform. This is typically the Ad ID, but may differ based on your account setup. | str |
tier_1 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_1 = most broad categorization (level 1) | str |
tier_2 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_2 = level 2 categorization (more granular than level 1) | str |
tier_3 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_3 = level 3 categorization | str |
tier_4 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_4 = level 4 categorization | str |
tier_5 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_5 = level 5 categorization | str |
timestamp_action | Timestamp of when the action occurred | timestamp |
timestamp_event | Timestamp of when the marketing touchpoint occured. This will only appear on the first action within a session, as the only action that will have a marketing touchpoint. The timestamp_event and timestamp_action will match. | timestamp |
transform_table_id | ID associated with the Rockerbox table used to apply mappings and spend *Closed beta feature only | int |
uid | Rockerbox User ID cookie | str |
updated_at | Time the cache record was updated most recently | timestamp |
user_agent | Web identifier that includes characteristics like browser, device, operating system, and application | str |
utm_campaign | utm_campaign value parsed from the landing page URL, if present | str |
utm_content | utm_content value parsed from the landing page URL, if present | str |
utm_id | utm_id value parsed from the landing page URL, if present | str |
utm_medium | utm_medium value parsed from the landing page URL, if present | str |
utm_source | utm_source value parsed from the landing page URL, if present | str |
utm_term | utm_term value parsed from the landing page URL, if present | str |
Clickstream FAQ
- What are marketers using this dataset for?
- For a quick start list of common use cases, visit our Use Case Library.
- What's the primary key for this table?
- event_id
- When should I expect this table to sync?
- The Clickstream dataset is updated daily, after your other core conversion datasets are synced.
- What's the backfill behavior for this table?
- This dataset is not backfilled, as it represents a "live" stream of onsite events vs other Rockerbox datasets that are aggregated from various data sources. This means you should not expected historical data to be changed after the initial sync.
- This also means that to your tier structure that might apply to click-based events reflected in this table will not be edited retroactively.
- How does Rockerbox handle a session that spans two dates (UTC)
- Rockerbox will break a session (generating a new session_id and date) if a user's session is active across two days. This may be common if users are active around UTC midnight.
- How does Rockerbox define a session? How can I make sure a session is unique when I query against the dataset?
- The session_id used by Rockerbox is a timestamp of the first event in a session, vs the session_id cached in your browser. This allows Rockerbox to maintain the same session_id when a user opens a new tab.
- Because session_id is defined as a timestamp, it's not unique per user. To identify all unique user sessions, join session_id|uid or filter for session_start = 1 in your query.
- A session "expires" after 30 minutes of inactivity. If the user is seen as active again, the session_id will be reset.
- A session is NOT re-set if new source information is provide (for example, a UTM on an internal page)
- Why are some fields for a given row in the dataset blank?
- Not all actions will have associated marketing context. Typically, only the first action in a session will pass along click-based marketing context in the URL. In your Rockerbox conversion data, the marketing context for each conversion event is carried over via this first action with marketing context. In this dataset, only the events that carry marketing context will have relevant fields populated to avoid any duplication.
- Can I see non-click marketing context?
- Non-click marketing context like view-based data from Linear TV, OTT, Display, and Social as well as other marketing context like promo code attribution or direct mail matching is not currently available in this dataset.
- How can I identify a repeat visitor?
- When a visitor returns to site, they may or make not have the same Rockerbox cookie ID (uid). To string together a user path when the uid is NOT the same, Rockerbox applies an identity resolution process to our conversion datasets. This is not currently available for this dataset.
- Users with > 200 events / day are excluded from this dataset under the assumption that these are admin users or server-side cookie IDs.
- What is an engaged session?
- An engaged session is defined by a session where the differences of the session_max and session_min timestamps are > 10 seconds.
- By logic, this means that a session with only 1 action cannot be an engaged session, since the session_min and session_max timestamps will be the same