Regex in SQL — match, replace, extract
Short answer: REGEXP_REPLACE(str, pattern, replacement) to substitute, REGEXP_LIKE(str, pattern) (or str ~ pattern in Postgres) to match, REGEXP_SUBSTR(str, pattern) to extract. The pattern syntax is mostly POSIX-extended (PCRE-ish), with engine-specific dialect quirks. Most modern engines have native regex support — Postgres, MySQL (8.0+), Oracle, Snowflake, BigQuery, DuckDB. SQLite requires a REGEXP function to be registered (most clients ship one); SQL Server only added native REGEXP_* functions in very recent versions.
Canonical SQL #
-- Match (does the string contain a pattern?)
SELECT REGEXP_LIKE('user-2138', '^user-[0-9]+$'); -- TRUE
SELECT 'user-2138' ~ '^user-[0-9]+$'; -- Postgres / DuckDB shorthand
SELECT 'user-2138' SIMILAR TO 'user-[0-9]+'; -- ANSI SIMILAR TO (limited)
-- Replace.
SELECT REGEXP_REPLACE('orders/2026/04/138', '[0-9]+', '?', 'g'); -- Postgres / DuckDB: 'orders/?/?/?'
SELECT REGEXP_REPLACE('orders/2026/04/138', '[0-9]+', '?'); -- MySQL / Oracle / Snowflake / BigQuery
-- Extract (first match).
SELECT REGEXP_SUBSTR('contact: ana@example.com', '[a-z]+@[a-z.]+'); -- 'ana@example.com'
SELECT (REGEXP_MATCHES('contact: ana@example.com', '([a-z]+)@([a-z.]+)'))[1]; -- Postgres: 'ana'Example #
Common patterns #
-- Validate an email-like format (good enough for most analytics).
SELECT REGEXP_LIKE(email, '^[^@\s]+@[^@\s]+\.[^@\s]+$') AS looks_like_email
FROM users;
-- Strip non-digits from a phone number.
SELECT REGEXP_REPLACE(phone, '[^0-9]', '', 'g') AS digits_only FROM users;
-- Extract a YYYY-MM-DD date from free text.
SELECT REGEXP_SUBSTR(notes, '[0-9]{4}-[0-9]{2}-[0-9]{2}') AS first_date_in_notes
FROM customer_notes;
-- Anchored pattern: only "https://" URLs (not "http://").
SELECT * FROM links WHERE url ~ '^https://';
-- Case-insensitive: ~* in Postgres / DuckDB; (?i) prefix in others.
SELECT * FROM products WHERE name ~* 'sql';
SELECT * FROM products WHERE REGEXP_LIKE(name, '(?i)sql');Dialect comparison #
| Engine | Syntax | Gotcha |
|---|---|---|
| PostgreSQL | str ~ 'pattern' | REGEXP_MATCHES here is a set-returning extraction function, not a boolean |
| MySQL | REGEXP_LIKE(str, 'pattern') | 8.0+ only; 5.7 has just the REGEXP / RLIKE match operators |
| SQL Server | REGEXP_LIKE(str, 'pattern') | 2025+ only; older versions have no native regex at all |
| BigQuery | REGEXP_CONTAINS(str, r'pattern') | RE2 syntax — no lookbehind |
| Snowflake | REGEXP_LIKE(str, 'pattern') | pattern is implicitly anchored to the whole string |
| Oracle | REGEXP_LIKE(str, 'pattern') | it's a condition, not a function — usable in WHERE, not in SELECT |
| SQLite | str REGEXP 'pattern' | works only if the client registers a regexp() function |
| DuckDB | REGEXP_MATCHES(str, 'pattern') | returns BOOLEAN — unlike Postgres' function of the same name |
Dialect notes #
- Postgres:
~(match),!~(not match),~(case-insensitive),!~. Functions:REGEXP_REPLACE,REGEXP_MATCH(one row),REGEXP_MATCHES(one row per match),REGEXP_SPLIT_TO_TABLE,REGEXP_SPLIT_TO_ARRAY. POSIX-extended syntax. Postgres 15+ addsREGEXP_LIKE,REGEXP_SUBSTR,REGEXP_COUNT. - MySQL (8.0+):
REGEXP_LIKE,REGEXP_REPLACE,REGEXP_SUBSTR,REGEXP_INSTR. Older 5.7: onlyREGEXP/RLIKEoperators (match-only, no replace). ICU regex. - SQL Server: no native regex until very recently —
LIKEwith%and_only. SQL Server 2025 addsREGEXP_LIKE/REGEXP_REPLACE/ etc. For older versions: SQLCLR functions or pre-process in application code. - Oracle:
REGEXP_LIKE,REGEXP_REPLACE,REGEXP_SUBSTR,REGEXP_COUNT,REGEXP_INSTR. POSIX-extended. - Snowflake:
REGEXP_LIKE,REGEXP_REPLACE,REGEXP_SUBSTR,REGEXP_COUNT. AlsoRLIKE,REGEXPas match operators. Java regex syntax — slight differences from POSIX. - BigQuery:
REGEXP_CONTAINS(match),REGEXP_REPLACE,REGEXP_EXTRACT,REGEXP_EXTRACT_ALL. RE2 syntax — no lookbehind, limited backreferences. - DuckDB:
REGEXP_MATCHES(match),REGEXP_REPLACE,REGEXP_EXTRACT,REGEXP_EXTRACT_ALL. RE2-style. - SQLite: no native regex;
REGEXPoperator requires the application to register a function. Most clients ship one; rawsqlite3does not.
Regex against an indexed column kills the index. WHERE email ~ '^ana@' cannot use a B-tree index on email even though the pattern is anchored to the prefix. Some engines (Postgres with text_pattern_ops, others with prefix indexes) can optimize anchored prefix matches, but in general regex predicates force a sequential scan. For prefix matching, prefer LIKE 'ana@%'. For exact extraction, store the extracted value in a separate column at write time so analytics queries hit it directly. Regex is the right tool for ad-hoc work — wrong tool for hot paths in production.