Recipe

Regex in SQL — match, replace, extract

Short answer: REGEXP_REPLACE(str, pattern, replacement) to substitute, REGEXP_LIKE(str, pattern) (or str ~ pattern in Postgres) to match, REGEXP_SUBSTR(str, pattern) to extract. The pattern syntax is mostly POSIX-extended (PCRE-ish), with engine-specific dialect quirks. Most modern engines have native regex support — Postgres, MySQL (8.0+), Oracle, Snowflake, BigQuery, DuckDB. SQLite requires a REGEXP function to be registered (most clients ship one); SQL Server only added native REGEXP_* functions in very recent versions.

Canonical SQL #

sql
-- Match (does the string contain a pattern?)
SELECT REGEXP_LIKE('user-2138', '^user-[0-9]+$');                       -- TRUE
SELECT 'user-2138' ~ '^user-[0-9]+$';                                   -- Postgres / DuckDB shorthand
SELECT 'user-2138' SIMILAR TO 'user-[0-9]+';                            -- ANSI SIMILAR TO (limited)

-- Replace.
SELECT REGEXP_REPLACE('orders/2026/04/138', '[0-9]+', '?', 'g');        -- Postgres / DuckDB: 'orders/?/?/?'
SELECT REGEXP_REPLACE('orders/2026/04/138', '[0-9]+', '?');             -- MySQL / Oracle / Snowflake / BigQuery

-- Extract (first match).
SELECT REGEXP_SUBSTR('contact: ana@example.com', '[a-z]+@[a-z.]+');     -- 'ana@example.com'
SELECT (REGEXP_MATCHES('contact: ana@example.com', '([a-z]+)@([a-z.]+)'))[1]; -- Postgres: 'ana'

Example #

Loading SQL editor...

Common patterns #

sql
-- Validate an email-like format (good enough for most analytics).
SELECT REGEXP_LIKE(email, '^[^@\s]+@[^@\s]+\.[^@\s]+$') AS looks_like_email
FROM users;

-- Strip non-digits from a phone number.
SELECT REGEXP_REPLACE(phone, '[^0-9]', '', 'g') AS digits_only FROM users;

-- Extract a YYYY-MM-DD date from free text.
SELECT REGEXP_SUBSTR(notes, '[0-9]{4}-[0-9]{2}-[0-9]{2}') AS first_date_in_notes
FROM customer_notes;

-- Anchored pattern: only "https://" URLs (not "http://").
SELECT * FROM links WHERE url ~ '^https://';

-- Case-insensitive: ~* in Postgres / DuckDB; (?i) prefix in others.
SELECT * FROM products WHERE name ~* 'sql';
SELECT * FROM products WHERE REGEXP_LIKE(name, '(?i)sql');

Dialect comparison #

EngineSyntaxGotcha
PostgreSQLstr ~ 'pattern'REGEXP_MATCHES here is a set-returning extraction function, not a boolean
MySQLREGEXP_LIKE(str, 'pattern')8.0+ only; 5.7 has just the REGEXP / RLIKE match operators
SQL ServerREGEXP_LIKE(str, 'pattern')2025+ only; older versions have no native regex at all
BigQueryREGEXP_CONTAINS(str, r'pattern')RE2 syntax — no lookbehind
SnowflakeREGEXP_LIKE(str, 'pattern')pattern is implicitly anchored to the whole string
OracleREGEXP_LIKE(str, 'pattern')it's a condition, not a function — usable in WHERE, not in SELECT
SQLitestr REGEXP 'pattern'works only if the client registers a regexp() function
DuckDBREGEXP_MATCHES(str, 'pattern')returns BOOLEAN — unlike Postgres' function of the same name

Dialect notes #

  • Postgres: ~ (match), !~ (not match), ~ (case-insensitive), !~. Functions: REGEXP_REPLACE, REGEXP_MATCH (one row), REGEXP_MATCHES (one row per match), REGEXP_SPLIT_TO_TABLE, REGEXP_SPLIT_TO_ARRAY. POSIX-extended syntax. Postgres 15+ adds REGEXP_LIKE, REGEXP_SUBSTR, REGEXP_COUNT.
  • MySQL (8.0+): REGEXP_LIKE, REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_INSTR. Older 5.7: only REGEXP / RLIKE operators (match-only, no replace). ICU regex.
  • SQL Server: no native regex until very recently — LIKE with % and _ only. SQL Server 2025 adds REGEXP_LIKE / REGEXP_REPLACE / etc. For older versions: SQLCLR functions or pre-process in application code.
  • Oracle: REGEXP_LIKE, REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_COUNT, REGEXP_INSTR. POSIX-extended.
  • Snowflake: REGEXP_LIKE, REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_COUNT. Also RLIKE, REGEXP as match operators. Java regex syntax — slight differences from POSIX.
  • BigQuery: REGEXP_CONTAINS (match), REGEXP_REPLACE, REGEXP_EXTRACT, REGEXP_EXTRACT_ALL. RE2 syntax — no lookbehind, limited backreferences.
  • DuckDB: REGEXP_MATCHES (match), REGEXP_REPLACE, REGEXP_EXTRACT, REGEXP_EXTRACT_ALL. RE2-style.
  • SQLite: no native regex; REGEXP operator requires the application to register a function. Most clients ship one; raw sqlite3 does not.
Warning

Regex against an indexed column kills the index. WHERE email ~ '^ana@' cannot use a B-tree index on email even though the pattern is anchored to the prefix. Some engines (Postgres with text_pattern_ops, others with prefix indexes) can optimize anchored prefix matches, but in general regex predicates force a sequential scan. For prefix matching, prefer LIKE 'ana@%'. For exact extraction, store the extracted value in a separate column at write time so analytics queries hit it directly. Regex is the right tool for ad-hoc work — wrong tool for hot paths in production.