Cheat sheet

Snowflake cheat sheet

Q: What is QUALIFY in Snowflake?

QUALIFY filters on a window function result without needing a subquery wrapper. Instead of SELECT FROM (SELECT , ROW_NUMBER() OVER (...) rn FROM t) WHERE rn = 1, write: SELECT * FROM t QUALIFY ROW_NUMBER() OVER (...) = 1. Available on Snowflake, BigQuery, DuckDB, Databricks, Teradata.

Q: What's a zero-copy CLONE?

CREATE TABLE orders_copy CLONE orders is instant, with no data copied initially. Both tables share the same underlying storage; modifications to either diverge them. Used for cheap test environments, dev databases that mirror prod, and snapshot-style "snapshot before this migration" patterns.

Q: How do I work with VARIANT JSON in Snowflake?

Path access: payload:user_id::INTEGER, payload:user.email::STRING. Cast is critical: without it, comparisons happen in VARIANT space and produce surprising results. To explode a JSON array into rows: SELECT f.value FROM events, LATERAL FLATTEN(input => events.payload:items) f.

Q: How do I bulk-load data into Snowflake?

Stage files in S3/Azure/GCS, then COPY INTO. CREATE STAGE my_stage URL='s3://bucket/path/'; COPY INTO orders FROM @my_stage FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1). Far faster than row-by-row INSERT for large datasets. ON_ERROR = 'CONTINUE' makes the load tolerant of bad rows.

Q: What's the difference between TIMESTAMP_NTZ, TIMESTAMP_TZ, and TIMESTAMP_LTZ?

_NTZ (no time zone) is a fixed wall-clock value, no TZ awareness. _TZ (with time zone) stores the offset alongside. _LTZ (local time zone) is TZ-aware but always rendered in the session's timezone. For event timestamps (audit columns, log entries): use _LTZ or _TZ. For "scheduled at 9am locally" semantics: _NTZ.

Snowflake-specific syntax: VARIANT and JSON via colon-path, MERGE, QUALIFY, time travel, COPY INTO, sequences, and the account → database → schema → table hierarchy.

Naming hierarchy and context #

sql

-- Hierarchy: ACCOUNT → DATABASE → SCHEMA → TABLE
SELECT * FROM my_db.public.orders;
SELECT * FROM "MY_DB"."PUBLIC"."ORDERS";       -- case-preserving identifiers (uppercased by default)

-- Set context.
USE ROLE     analyst;
USE WAREHOUSE my_wh;
USE DATABASE  my_db;
USE SCHEMA    my_db.public;

-- Show what you're in.
SELECT CURRENT_ACCOUNT(), CURRENT_DATABASE(), CURRENT_SCHEMA(), CURRENT_ROLE(), CURRENT_WAREHOUSE();

Data types #

sql

-- Numerics: NUMBER(38, s) is the universal — INT, BIGINT, DECIMAL all alias to NUMBER.
NUMBER(38, 0)                               -- integer
NUMBER(10, 2)                               -- money
FLOAT, DOUBLE, REAL                         -- all 64-bit IEEE 754 (no real distinction)

-- Strings: VARCHAR(n) — n is hint only; underlying storage same up to ~16 MB.
VARCHAR, TEXT, STRING                       -- all aliases for VARCHAR

-- Dates and timestamps.
DATE, TIME, TIMESTAMP_NTZ, TIMESTAMP_LTZ, TIMESTAMP_TZ
-- TIMESTAMP alone is configurable via TIMESTAMP_TYPE_MAPPING — be explicit.

-- Semi-structured: VARIANT, OBJECT, ARRAY.
VARIANT                                     -- can hold JSON, OBJECT, ARRAY, anything

-- Auto-increment.
id NUMBER AUTOINCREMENT START 1 INCREMENT 1 PRIMARY KEY
-- Or use a sequence:
CREATE SEQUENCE my_seq;
INSERT INTO t (id) VALUES (my_seq.NEXTVAL);

VARIANT and JSON access #

sql

-- Path access — colon for top-level, dot for nested.
SELECT payload:user_id::INTEGER             AS user_id,
       payload:user.email::STRING           AS email,
       payload:items::ARRAY                 AS items
FROM   events;

-- Get an array element (0-based).
SELECT payload:items[0]::STRING FROM events;

-- Test if a path exists.
SELECT * FROM events WHERE payload:event_type IS NOT NULL;

-- Build VARIANT.
SELECT OBJECT_CONSTRUCT('id', id, 'email', email) AS user_obj FROM users;
SELECT TO_VARIANT(some_value);

FLATTEN: explode arrays / objects #

sql

-- Explode an array column to one row per element.
SELECT e.id, f.value AS item
FROM   events e,
       LATERAL FLATTEN(input => e.payload:items) f;

-- Explode an object: one row per key/value pair.
SELECT e.id, f.key, f.value
FROM   events e, LATERAL FLATTEN(input => e.payload) f;

-- FLATTEN columns: SEQ, KEY, PATH, INDEX, VALUE, THIS.

QUALIFY (Snowflake's superpower) #

sql

-- Filter on window function results without a subquery wrapper.
SELECT *
FROM   orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) = 1;

-- Top 5 per region.
SELECT *
FROM   user_revenue
QUALIFY ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) <= 5;

MERGE (the canonical upsert) #

sql

MERGE INTO target AS t
USING source AS s
   ON t.id = s.id
WHEN MATCHED AND s.is_deleted THEN DELETE
WHEN MATCHED THEN UPDATE SET t.value = s.value, t.updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);

Time travel and zero-copy clones #

sql

-- Query a table as it was N minutes / hours / days ago.
SELECT * FROM orders AT(OFFSET => -60 * 5);                        -- 5 minutes ago
SELECT * FROM orders AT(TIMESTAMP => '2026-04-25 09:00:00'::TIMESTAMP);
SELECT * FROM orders BEFORE(STATEMENT => 'query_id_here');

-- Undo an accidental delete (within retention period).
INSERT INTO orders SELECT * FROM orders BEFORE(OFFSET => -300);     -- 5 minutes back

-- Zero-copy clone (instant, no storage cost initially).
CREATE TABLE orders_clone CLONE orders;
CREATE DATABASE prod_clone CLONE prod;

COPY INTO: bulk load and unload #

sql

-- Load from S3 / Azure / GCS via a stage.
CREATE STAGE my_stage URL='s3://bucket/path/' FILE_FORMAT = (TYPE = CSV);

COPY INTO orders
FROM @my_stage
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
ON_ERROR = 'CONTINUE';

-- Unload (export).
COPY INTO @my_stage/exports/orders.csv
FROM orders
FILE_FORMAT = (TYPE = CSV)
HEADER = TRUE;

Date functions #

sql

CURRENT_DATE, CURRENT_TIMESTAMP, SYSDATE()

DATE_TRUNC('MONTH', d)             -- standard arg order
DATEDIFF('day', start, end)        -- (unit, start, end)
DATEADD('day', 7, d)
LAST_DAY(d, 'MONTH')

YEAR(d), MONTH(d), DAY(d)          -- shortcuts
QUARTER(d), DAYOFWEEK(d)

-- Parse / format.
TO_DATE('26/04/2026', 'DD/MM/YYYY')
TO_CHAR(d, 'YYYY-MM-DD')

SHOW commands #

sql

SHOW DATABASES;
SHOW SCHEMAS;
SHOW TABLES;
SHOW TABLES IN SCHEMA my_db.public;
SHOW COLUMNS IN TABLE orders;
SHOW WAREHOUSES;
SHOW ROLES;
SHOW GRANTS TO ROLE analyst;
SHOW GRANTS ON TABLE orders;

-- Recent queries.
SELECT query_id, query_text, total_elapsed_time, bytes_scanned
FROM   TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER  BY start_time DESC
LIMIT  20;

Approximate functions (Snowflake at scale) #

sql

APPROX_COUNT_DISTINCT(col)                  -- HLL-based, fast
APPROX_PERCENTILE(col, 0.95)
APPROX_TOP_K(col, 10)                       -- approx top-N

-- Sample rows.
SELECT * FROM events SAMPLE (1);            -- 1% sample (system / random)
SELECT * FROM events SAMPLE (1000 ROWS);    -- fixed size

FAQ #

What is QUALIFY in Snowflake?

QUALIFY filters on a window function result without needing a subquery wrapper. Instead of SELECT FROM (SELECT , ROW_NUMBER() OVER (...) rn FROM t) WHERE rn = 1, write: SELECT * FROM t QUALIFY ROW_NUMBER() OVER (...) = 1. Available on Snowflake, BigQuery, DuckDB, Databricks, Teradata.

How does Snowflake's Time Travel work?

Snowflake retains historical data for the configured period (1 day by default, up to 90 with Enterprise). Query as it was: SELECT FROM orders AT(OFFSET => -300) (5 minutes ago). Recover dropped data: INSERT INTO orders SELECT FROM orders BEFORE(OFFSET => -300). The killer feature for "oh no I just deleted those rows."

What's a zero-copy CLONE?

CREATE TABLE orders_copy CLONE orders is instant, with no data copied initially. Both tables share the same underlying storage; modifications to either diverge them. Used for cheap test environments, dev databases that mirror prod, and snapshot-style "snapshot before this migration" patterns.

How do I work with VARIANT JSON in Snowflake?

Path access: payload:user_id::INTEGER, payload:user.email::STRING. Cast is critical: without it, comparisons happen in VARIANT space and produce surprising results. To explode a JSON array into rows: SELECT f.value FROM events, LATERAL FLATTEN(input => events.payload:items) f.

How do I bulk-load data into Snowflake?

Stage files in S3/Azure/GCS, then COPY INTO. CREATE STAGE my_stage URL='s3://bucket/path/'; COPY INTO orders FROM @my_stage FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1). Far faster than row-by-row INSERT for large datasets. ON_ERROR = 'CONTINUE' makes the load tolerant of bad rows.

What's the difference between TIMESTAMP_NTZ, TIMESTAMP_TZ, and TIMESTAMP_LTZ?

_NTZ (no time zone) is a fixed wall-clock value, no TZ awareness. _TZ (with time zone) stores the offset alongside. _LTZ (local time zone) is TZ-aware but always rendered in the session's timezone. For event timestamps (audit columns, log entries): use _LTZ or _TZ. For "scheduled at 9am locally" semantics: _NTZ.

Tip

Snowflake's three superpowers worth memorizing: QUALIFY (window-filter without subquery), Time Travel (free instant point-in-time queries within retention), and zero-copy CLONE (instant database/schema/table copies for testing). The combination makes "test a migration on a copy of prod" a 1-second operation.