Cheat sheet

BigQuery cheat sheet

Q: How does BigQuery charge for queries?

On-demand pricing charges per byte scanned, not per row, not per query. SELECT * scans every column even if you only need one; named column lists scan less. Partition filters (WHERE _PARTITIONDATE = ...) skip whole partitions. A dry run (bq query --dry_run or the query validator in the console) shows estimated bytes before you run; check it for expensive queries.

Q: How do I work with STRUCT and ARRAY in BigQuery?

STRUCT is a record/object; access fields with dot notation: user.email. ARRAY holds an ordered list of values (or structs). To turn an array into rows, use UNNEST: SELECT u.user_id, o.* FROM users u, UNNEST(u.orders) AS o. The STRUCT(...) constructor and ARRAY[...] literal build them inline.

Q: Why is BigQuery's DATE_TRUNC argument order different?

BigQuery uses DATE_TRUNC(date_col, MONTH): date first, then unit. Postgres / Snowflake use DATE_TRUNC('month', date_col), which is unit first, then date. There's no good reason for the divergence; just remember to swap when porting code.

Q: How do I write a query that returns all columns except a few?

SELECT EXCEPT (password, secret_key) FROM users. BigQuery-only feature; very useful when a table has 50+ columns and you want most of them. Combine with REPLACE to modify columns: SELECT REPLACE (UPPER(email) AS email) FROM users.

Q: How do I check the cost of a query before running it?

Run a dry run: the --dry_run flag in the bq CLI (or the dryRun API option) returns estimated bytes scanned without executing. The web UI also shows the estimate before you click Run. Approximate functions (APPROX_COUNT_DISTINCT, APPROX_QUANTILES) trade tiny accuracy loss for big cost savings on huge datasets.

BigQuery-specific syntax: STRUCT, ARRAY, UNNEST, partitioned tables, project.dataset.table qualification, SELECT EXCEPT, SELECT REPLACE, and the dialect quirks (DATE_TRUNC argument order, RE2 regex, per-byte pricing).

Naming and qualification #

sql

-- Table references need at minimum dataset.table; usually project.dataset.table.
SELECT * FROM `my-project.analytics.orders`;
SELECT * FROM `my-project.analytics`.orders;            -- dataset-qualified
SELECT * FROM analytics.orders;                          -- relies on default project

-- Datasets live inside a project; tables live inside a dataset.
-- "Dataset" in BigQuery = "schema" in Postgres.

-- Wildcard tables (event_20260401, event_20260402, ...).
SELECT * FROM `my-project.events.event_20260401`;
SELECT * FROM `my-project.events.event_*`
WHERE  _TABLE_SUFFIX BETWEEN '20260401' AND '20260430';

Data types and STRUCT #

sql

-- Types: INT64, FLOAT64, NUMERIC, BIGNUMERIC, BOOL, STRING, BYTES,
--        DATE, DATETIME, TIME, TIMESTAMP, GEOGRAPHY, JSON, ARRAY<T>, STRUCT.

-- STRUCT (record / nested object).
SELECT STRUCT(id, email, plan) AS user_record FROM users;
SELECT STRUCT('Ana' AS name, 30 AS age);                   -- inline literal

-- Access STRUCT fields with dot notation.
SELECT user_record.email FROM users_struct_table;

-- ARRAY of STRUCT (the BigQuery idiom for nested data).
SELECT
  user_id,
  ARRAY_AGG(STRUCT(order_id, total)) AS orders
FROM orders
GROUP BY user_id;

UNNEST: explode arrays into rows #

sql

-- Single-column array.
SELECT id, tag
FROM   posts, UNNEST(tags) AS tag;

-- Array of structs (most common BigQuery pattern).
SELECT
  u.user_id,
  o.order_id,
  o.total
FROM   user_orders u, UNNEST(u.orders) AS o;

-- Keep parents with empty arrays via LEFT JOIN UNNEST.
SELECT u.user_id, o.order_id
FROM   user_orders u
LEFT JOIN UNNEST(u.orders) AS o ON TRUE;

-- WITH OFFSET to get array index.
SELECT day, ARRAY_LENGTH(events) AS n, e, idx
FROM   daily_events, UNNEST(events) AS e WITH OFFSET AS idx;

SELECT * EXCEPT and REPLACE #

sql

-- Drop columns from SELECT *.
SELECT * EXCEPT (password, secret_key) FROM users;

-- Modify columns in SELECT *.
SELECT * REPLACE (UPPER(email) AS email) FROM users;

-- Combine.
SELECT * EXCEPT (raw_payload) REPLACE (UPPER(email) AS email) FROM users;

Date functions (BigQuery argument order) #

sql

CURRENT_DATE()                              -- today (note parens)
CURRENT_DATETIME()                          -- no timezone
CURRENT_TIMESTAMP()                         -- UTC

-- DATE_TRUNC: argument order is (date, part) — opposite of Postgres!
DATE_TRUNC(order_date, MONTH)               -- first of month
DATETIME_TRUNC(ts, DAY)
TIMESTAMP_TRUNC(ts, HOUR)

EXTRACT(YEAR FROM order_date)
DATE_DIFF(end, start, DAY)                  -- (end, start, unit)
DATE_ADD(d, INTERVAL 7 DAY)
DATE_SUB(d, INTERVAL 1 MONTH)

LAST_DAY(d, MONTH)                          -- with unit

-- Parse / format.
PARSE_DATE('%d/%m/%Y', '26/04/2026')
FORMAT_DATE('%Y-%m-%d', d)

Strings (BigQuery quirks) #

sql

-- Concatenation: CONCAT or the || operator (STRING, BYTES, ARRAY).
CONCAT('a', 'b', 'c')
'a' || 'b' || 'c'

-- Common functions.
SUBSTR(s, 1, 5)
LENGTH(s)                                   -- character count
SPLIT(s, ',')                               -- returns ARRAY<STRING>
UNNEST(SPLIT(s, ',')) AS part               -- explode to rows

REGEXP_CONTAINS(s, 'pattern')               -- match (RE2 syntax)
REGEXP_REPLACE(s, 'pattern', 'replacement')
REGEXP_EXTRACT(s, 'pattern')                -- first match
REGEXP_EXTRACT_ALL(s, 'pattern')            -- all matches as array

JSON #

sql

JSON_VALUE(payload, '$.user_id')                     -- scalar string
JSON_QUERY(payload, '$.user')                        -- nested JSON
JSON_EXTRACT_SCALAR(payload, '$.user_id')            -- legacy alias
UNNEST(JSON_EXTRACT_ARRAY(payload, '$.items')) AS i  -- array → rows
UNNEST(JSON_QUERY_ARRAY(payload, '$.items'))         -- new JSON-typed version

-- Build.
TO_JSON(some_struct)
JSON_OBJECT('id', id, 'email', email)

Partitioned and clustered tables #

sql

-- Create a partitioned table (filters by partition column become free).
CREATE TABLE `my-project.analytics.events`
PARTITION BY DATE(occurred_at)
CLUSTER BY user_id, event_type
AS SELECT * FROM source_events;

-- Query a single partition (cheap).
SELECT * FROM `my-project.analytics.events`
WHERE DATE(occurred_at) = DATE '2026-04-26';

-- Estimated bytes scanned (matters because BigQuery charges per byte):
-- use a dry run (bq query --dry_run, or the console's query validator).
-- BigQuery has no EXPLAIN statement.

Cost-aware querying #

sql

-- BigQuery charges by bytes scanned. SELECT * scans every column.
-- Always prefer column lists.
SELECT user_id, total FROM orders;

-- Filter to a partition to limit scanned bytes.
WHERE _PARTITIONDATE = DATE '2026-04-26'

-- LIMIT does NOT reduce bytes scanned (whole table is still read).

-- Approximate functions for big-data scale.
APPROX_COUNT_DISTINCT(user_id)
APPROX_QUANTILES(total, 100)[OFFSET(95)] AS p95
APPROX_TOP_COUNT(category, 10)

INFORMATION_SCHEMA: region-scoped #

sql

-- Tables in a dataset.
SELECT table_name
FROM   `my-project.my_dataset.INFORMATION_SCHEMA.TABLES`;

-- Column metadata.
SELECT column_name, data_type, is_nullable
FROM   `my-project.my_dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE  table_name = 'orders';

-- Region-scoped (works across all datasets in a region).
SELECT * FROM `region-us.INFORMATION_SCHEMA.JOBS`
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 1 HOUR);

FAQ #

How does BigQuery charge for queries?

On-demand pricing charges per byte scanned, not per row, not per query. SELECT * scans every column even if you only need one; named column lists scan less. Partition filters (WHERE _PARTITIONDATE = ...) skip whole partitions. A dry run (bq query --dry_run or the query validator in the console) shows estimated bytes before you run; check it for expensive queries.

How do I work with STRUCT and ARRAY in BigQuery?

STRUCT is a record/object; access fields with dot notation: user.email. ARRAY holds an ordered list of values (or structs). To turn an array into rows, use UNNEST: SELECT u.user_id, o.* FROM users u, UNNEST(u.orders) AS o. The STRUCT(...) constructor and ARRAY[...] literal build them inline.

Why is BigQuery's DATE_TRUNC argument order different?

BigQuery uses DATE_TRUNC(date_col, MONTH): date first, then unit. Postgres / Snowflake use DATE_TRUNC('month', date_col), which is unit first, then date. There's no good reason for the divergence; just remember to swap when porting code.

How do I write a query that returns all columns except a few?

SELECT EXCEPT (password, secret_key) FROM users. BigQuery-only feature; very useful when a table has 50+ columns and you want most of them. Combine with REPLACE to modify columns: SELECT REPLACE (UPPER(email) AS email) FROM users.

What's a partitioned table and when should I use one?

A partitioned table is split into segments by a date column (or integer range). Filtering on the partition column skips entire segments, making it much faster and much cheaper. Always partition tables that grow daily and are filtered by date. CREATE TABLE ... PARTITION BY DATE(occurred_at).

How do I check the cost of a query before running it?

Run a dry run: the --dry_run flag in the bq CLI (or the dryRun API option) returns estimated bytes scanned without executing. The web UI also shows the estimate before you click Run. Approximate functions (APPROX_COUNT_DISTINCT, APPROX_QUANTILES) trade tiny accuracy loss for big cost savings on huge datasets.

Tip

BigQuery's mental model is different from row-store databases. It's a columnar warehouse that charges per-byte-scanned, so the optimization patterns shift: prefer narrow column lists, partition on date columns, cluster on filters, use approximate functions for huge cardinalities. STRUCT and ARRAY are first-class types, so embrace them rather than fighting them with normalization.