Recipe

ORDER BY random: pick random rows

Short answer: ORDER BY RANDOM() (Postgres / DuckDB / SQLite / Snowflake) or ORDER BY RAND() (MySQL / BigQuery) or ORDER BY NEWID() (SQL Server). Combined with LIMIT, gives you "random N rows." The naive form sorts the entire table, which is slow on large tables. For sampling at scale, TABLESAMPLE (Postgres / SQL Server) or SAMPLE (Snowflake / BigQuery) is much faster.

Canonical SQL #

sql
-- Random N rows (slow on big tables — sorts entire table).
SELECT * FROM users ORDER BY RANDOM() LIMIT 10;        -- Postgres / DuckDB / SQLite / Snowflake
SELECT * FROM users ORDER BY RAND()    LIMIT 10;       -- MySQL / BigQuery
SELECT TOP 10 * FROM users ORDER BY NEWID();           -- SQL Server

-- Faster: TABLESAMPLE (system / bernoulli).
SELECT * FROM events TABLESAMPLE BERNOULLI(1);          -- Postgres: ~1% of rows, randomly
SELECT * FROM events SAMPLE (1000 ROWS);                -- Snowflake: 1000 random rows
SELECT * FROM events TABLESAMPLE (1 PERCENT);           -- SQL Server / BigQuery similar
-- TABLESAMPLE picks blocks/pages, not rows — much faster, slightly less uniform.

-- Reproducible random sample (seeded).
SELECT * FROM users TABLESAMPLE BERNOULLI(1) REPEATABLE(42);   -- Postgres

Example #

Loading SQL editor...

Common patterns #

sql
-- Random row per group (e.g., one random product per category).
SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY RANDOM()) AS rn
  FROM products
) ranked
WHERE rn = 1;

-- Stratified sample: 3 random customers per country.
SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY country ORDER BY RANDOM()) AS rn
  FROM ecom_customers
) ranked
WHERE rn <= 3;

-- Reproducible "random" assignment for A/B testing.
SELECT
  id AS user_id,
  CASE WHEN MOD(ABS(HASH(id)), 2) = 0 THEN 'control' ELSE 'treatment' END AS variant
FROM users;
-- Same user → same bucket forever. Re-runnable. See: modulo-and-remainder recipe.

-- Random N values from a generated series.
SELECT n FROM generate_series(1, 100) AS t(n) ORDER BY RANDOM() LIMIT 10;

Performance notes #

  • ORDER BY RANDOM() LIMIT N sorts the entire table to find the random N. On a billion-row table, that's hours. Use TABLESAMPLE instead for big tables.
  • TABLESAMPLE BERNOULLI(p) rolls a random number per row: uniform but still scans every page.
  • TABLESAMPLE SYSTEM(p) picks random pages; faster but produces clustered samples (rows on the same page tend to come from the same source). Fine for big-data sampling, bad for tiny populations.
  • For sampling that has to be reproducible, hash-and-mod (MOD(ABS(HASH(id)), 100) = 0) beats RANDOM: same input rows always produce the same sample.
  • Don't use ORDER BY RANDOM() for "random user signup display" in a high-traffic web app; it scans the entire users table on every page load. Pre-shuffle and store, or use a hash-based pseudo-random.

Dialect comparison #

EngineSyntaxGotcha
PostgreSQLORDER BY RANDOM()full-table sort; TABLESAMPLE BERNOULLI(1) samples without sorting
MySQLORDER BY RAND()no RANDOM(), no TABLESAMPLE; RAND(seed) for reproducibility
SQL ServerORDER BY NEWID()RAND() is evaluated once per query, giving the same value for every row
BigQueryORDER BY RAND()no RANDOM(); TABLESAMPLE SYSTEM (1 PERCENT) for cheap sampling
SnowflakeORDER BY RANDOM()SAMPLE (1000 ROWS) is far cheaper at scale
OracleORDER BY DBMS_RANDOM.VALUEpackage function, not built-in; SAMPLE(1) clause for fast sampling
SQLiteORDER BY RANDOM()RANDOM() returns a 64-bit integer, not a value in [0, 1)
DuckDBORDER BY RANDOM()USING SAMPLE 10 is the fast path for big tables

Dialect notes #

  • RANDOM() returns a value in [0, 1) on Postgres and DuckDB; on SQLite and Snowflake it returns a random 64-bit integer (fine for ORDER BY, wrong for "fraction of rows" filters). BigQuery has RAND() (in [0, 1)), not RANDOM().
  • TABLESAMPLE: Postgres (BERNOULLI / SYSTEM), SQL Server (SYSTEM-style), BigQuery (SYSTEM).
  • SAMPLE: Snowflake / Databricks have richer SAMPLE clauses with row-count or percent options.
  • Seeding: Postgres SETSEED(0.5) before RANDOM(); TABLESAMPLE ... REPEATABLE(seed) for reproducible samples. MySQL RAND(seed).
Tip

Three different "random" patterns, three different jobs. (1) ORDER BY RANDOM() LIMIT N: show me N random rows, and I don't care if it's slow because the table is small. (2) TABLESAMPLE: fast statistical sample of a huge table, OK with imperfect uniformity. (3) hash-and-mod: reproducible "random" assignment that's the same every run, perfect for A/B tests and stable sampling. Pick by the use case, not by what comes to mind first. Detail: Modulo and remainder.