Cheat sheet

SQL string functions cheat sheet

Q: How do I extract a substring from a string in SQL?

SUBSTRING(s FROM start FOR length) is the ANSI form; SUBSTRING(s, start, length) is the function form supported on most engines. SUBSTR is an alias on MySQL, SQLite, Oracle. LEFT(s, n) and RIGHT(s, n) are convenient shortcuts on Postgres, MySQL, SQL Server, DuckDB. Position is 1-based on every major engine.

Q: Why does my CONCAT return NULL?

On Postgres, DuckDB, SQLite, Oracle, Snowflake, Redshift: (and CONCAT on most) returns NULL if any operand is NULL. Use CONCAT_WS(' ', a, b, c) which skips NULLs, or wrap each side in COALESCE(col, ''). MySQL's CONCAT also propagates NULL; SQL Server's CONCAT (2012+) treats NULL as empty string.

Q: How do I split a comma-separated string into rows?

Postgres / DuckDB: UNNEST(STRING_TO_ARRAY(s, ',')). SQL Server: STRING_SPLIT(s, ','). BigQuery: UNNEST(SPLIT(s, ',')). Snowflake: LATERAL FLATTEN(input => SPLIT(s, ',')). MySQL has no native function; use a recursive CTE or numbers table.

Q: How do I do case-insensitive matching?

Three options. (1) LOWER(col) = LOWER('value'): portable but defeats indexes (use a functional index on LOWER(col)). (2) ILIKE, the Postgres / DuckDB shorthand for case-insensitive LIKE. (3) A COLLATE clause, which pushes the case-fold into the type system and lets the engine optimize.

Q: What's the difference between LIKE and REGEXP?

LIKE supports two wildcards: % (zero-or-more chars) and _ (single char). REGEXP (or ~ on Postgres) supports full regex: character classes, quantifiers, alternation, captures. Use LIKE for simple prefix/contains; reach for regex when the pattern is more complex. Both can defeat indexes when wrapping the column.

Common string operations across the major engines: length, case, trim, search, replace, split, concat, regex. Use this when the function name you remember from one engine isn't in the other.

Length, case, trim #

sql

LENGTH(s)                              -- character count (most engines)
LEN(s)                                 -- SQL Server (LEN ignores trailing spaces)
CHAR_LENGTH(s)                         -- ANSI alias
OCTET_LENGTH(s)                        -- byte count

UPPER(s)            LOWER(s)
INITCAP(s)                             -- Postgres / Oracle / Snowflake — Title Case Each Word

TRIM(s)                                -- both sides
LTRIM(s)            RTRIM(s)
TRIM(BOTH 'x' FROM s)                  -- trim a specific character (ANSI)

Substring extraction #

sql

SUBSTRING(s FROM 1 FOR 5)              -- ANSI form
SUBSTRING(s, 1, 5)                     -- function form (most engines)
SUBSTR(s, 1, 5)                        -- MySQL / SQLite / Oracle / DuckDB / Snowflake alias

LEFT(s, 5)                             -- first 5 chars
RIGHT(s, 5)                            -- last 5 chars (most engines, not Oracle)

-- Find a position.
POSITION('@' IN email)                 -- ANSI
STRPOS(s, 'sub')                       -- Postgres / DuckDB
LOCATE('sub', s)                       -- MySQL
CHARINDEX('sub', s)                    -- SQL Server
INSTR(s, 'sub')                        -- Oracle / SQLite / MySQL / DuckDB

Concatenate #

sql

a || b                                 -- ANSI (Postgres / DuckDB / SQLite / Oracle / Snowflake / Redshift)
CONCAT(a, b, c)                        -- portable function form
CONCAT_WS(', ', city, state, zip)      -- with separator, skips NULLs (most engines)
a + b                                  -- SQL Server (only)

-- NULL trap on most engines:
a || NULL                              -- returns NULL — every concat with NULL becomes NULL
                                       -- use CONCAT_WS or wrap each side in COALESCE(col, '')

Replace and pad #

sql

REPLACE(s, 'old', 'new')               -- replace ALL occurrences
TRANSLATE(s, 'abc', 'xyz')             -- char-for-char remap (Postgres / Oracle / Snowflake / DuckDB)

LPAD(s, 10, '0')                       -- pad left to length 10 with '0'
RPAD(s, 10, ' ')                       -- pad right
REPEAT(s, 3)                           -- 'abcabcabc'

REVERSE(s)                             -- 'abc' -> 'cba' (Postgres / MySQL / SQL Server / Snowflake / DuckDB)

Split by delimiter #

sql

-- Get one piece by index.
SPLIT_PART('a,b,c', ',', 2)            -- 'b' (Postgres / DuckDB / Snowflake / Redshift)
SUBSTRING_INDEX('a,b,c', ',', 2)       -- 'a,b' (MySQL — note: returns up-to-N)
REGEXP_SUBSTR('a,b,c', '[^,]+', 1, 2)  -- 'b' (Oracle, regex)

-- Explode into rows.
UNNEST(STRING_TO_ARRAY('a,b,c', ','))  -- Postgres / DuckDB
STRING_SPLIT('a,b,c', ',')             -- SQL Server 2016+
UNNEST(SPLIT('a,b,c', ','))            -- BigQuery
LATERAL FLATTEN(input => SPLIT('a,b,c', ',')) -- Snowflake

Aggregate strings (rows → list) #

sql

STRING_AGG(col, ', ' ORDER BY col)                         -- Postgres / SQL Server 2017+ / DuckDB / Snowflake / BigQuery
LISTAGG(col, ', ') WITHIN GROUP (ORDER BY col)             -- Oracle / Snowflake / Redshift
GROUP_CONCAT(col ORDER BY col SEPARATOR ', ')              -- MySQL / SQLite

-- Watch the truncation defaults: MySQL 1024 bytes, Oracle 4000 bytes.
SET SESSION group_concat_max_len = 1000000;                -- raise on MySQL

LIKE patterns #

sql

WHERE s LIKE 'abc%'                    -- starts with 'abc'
WHERE s LIKE '%abc'                    -- ends with 'abc'
WHERE s LIKE '%abc%'                   -- contains 'abc'
WHERE s LIKE 'a_c'                     -- 'a' + any single char + 'c'
WHERE s LIKE 'abc\%' ESCAPE '\'       -- literal '%' (escape required)

-- Case-insensitive.
WHERE LOWER(s) LIKE 'abc%'             -- portable (defeats index)
WHERE s ILIKE 'abc%'                   -- Postgres / DuckDB

-- Anchored: prefix LIKE 'abc%' uses index; '%abc' or '%abc%' usually doesn't.

Regex #

sql

-- Match.
REGEXP_LIKE(s, '^abc[0-9]+$')          -- most engines
s ~ '^abc'                             -- Postgres / DuckDB
s ~* '^abc'                            -- Postgres / DuckDB case-insensitive

-- Replace.
REGEXP_REPLACE(s, '[0-9]+', '?')       -- replace first match (or all, depending on engine)
REGEXP_REPLACE(s, '[0-9]+', '?', 'g')  -- explicitly all (Postgres / DuckDB)

-- Extract.
REGEXP_SUBSTR(s, '[a-z]+@[a-z.]+')     -- first email-like substring
(REGEXP_MATCHES(s, '([a-z]+)@([a-z.]+)'))[1]   -- Postgres: capture group 1
REGEXP_EXTRACT(s, '[0-9]{4}')          -- BigQuery / DuckDB

Type casts to/from string #

sql

CAST(value AS VARCHAR)                 -- portable
value::VARCHAR                         -- Postgres / DuckDB shorthand
CAST(s AS INTEGER)                     -- string → number
CAST(s AS DATE)                        -- string → date

-- Safe casts (return NULL on failure instead of erroring).
TRY_CAST(s AS INTEGER)                 -- DuckDB / SQL Server / Snowflake
SAFE_CAST(s AS INT64)                  -- BigQuery
TO_NUMBER(s, '999.99')                 -- Oracle / Postgres with format

Common patterns #

sql

-- Pull domain out of email.
SELECT SPLIT_PART(email, '@', 2) AS domain FROM users;

-- Strip non-digits from a phone number.
SELECT REGEXP_REPLACE(phone, '[^0-9]', '', 'g') FROM users;

-- Build a full name (NULL-safe).
SELECT CONCAT_WS(' ', first_name, last_name) FROM users;

-- Case-insensitive search with index-friendliness on Postgres.
WHERE email ILIKE 'ana@%';
-- Or: lower-cased index: CREATE INDEX users_email_lower_idx ON users (LOWER(email));
--     WHERE LOWER(email) = LOWER('Ana@example.com');

-- Truncate a long string for display.
SELECT
  CASE WHEN LENGTH(description) > 100
       THEN SUBSTRING(description, 1, 100) || '...'
       ELSE description END AS short_desc
FROM products;

Common mistakes #

NULL poisoning with ||: any NULL in the chain returns NULL for the whole expression. Use CONCAT_WS or COALESCE(col, '').
Wrapping an indexed column in a function: WHERE LOWER(email) = ... defeats a plain index. Use ILIKE (Postgres / DuckDB) or a functional index on LOWER(email).
MySQL's ||: defaults to logical OR, not concat, so 'a' || 'b' is 1. Use CONCAT for portability.
POSITION is 1-based, not 0-based. POSITION('@' IN 'a@b') returns 2.
String-typed JSON extraction: (payload ->> 'amount') > 100 does string comparison ("9" > "100"). Always cast: (payload ->> 'amount')::DECIMAL.

FAQ #

How do I extract a substring from a string in SQL?

SUBSTRING(s FROM start FOR length) is the ANSI form; SUBSTRING(s, start, length) is the function form supported on most engines. SUBSTR is an alias on MySQL, SQLite, Oracle. LEFT(s, n) and RIGHT(s, n) are convenient shortcuts on Postgres, MySQL, SQL Server, DuckDB. Position is 1-based on every major engine.

Why does my CONCAT return NULL?

On Postgres, DuckDB, SQLite, Oracle, Snowflake, Redshift: || (and CONCAT on most) returns NULL if any operand is NULL. Use CONCAT_WS(' ', a, b, c) which skips NULLs, or wrap each side in COALESCE(col, ''). MySQL's CONCAT also propagates NULL; SQL Server's CONCAT (2012+) treats NULL as empty string.

How do I split a comma-separated string into rows?

Postgres / DuckDB: UNNEST(STRING_TO_ARRAY(s, ',')). SQL Server: STRING_SPLIT(s, ','). BigQuery: UNNEST(SPLIT(s, ',')). Snowflake: LATERAL FLATTEN(input => SPLIT(s, ',')). MySQL has no native function; use a recursive CTE or numbers table.

How do I do case-insensitive matching?

Three options. (1) LOWER(col) = LOWER('value'): portable but defeats indexes (use a functional index on LOWER(col)). (2) ILIKE, the Postgres / DuckDB shorthand for case-insensitive LIKE. (3) A COLLATE clause, which pushes the case-fold into the type system and lets the engine optimize.

What's the difference between LIKE and REGEXP?

LIKE supports two wildcards: % (zero-or-more chars) and _ (single char). REGEXP (or ~ on Postgres) supports full regex: character classes, quantifiers, alternation, captures. Use LIKE for simple prefix/contains; reach for regex when the pattern is more complex. Both can defeat indexes when wrapping the column.

How do I aggregate multiple rows into a single delimited string?

Postgres / SQL Server (2017+) / DuckDB / Snowflake / BigQuery: STRING_AGG(col, ', ' ORDER BY col). Oracle / Snowflake / Redshift: LISTAGG(col, ', ') WITHIN GROUP (ORDER BY col). MySQL / SQLite: GROUP_CONCAT(col ORDER BY col SEPARATOR ', '). Watch the truncation defaults: MySQL caps at 1024 bytes by default.

Tip

**For string manipulation that doesn't need to happen in SQL, clean it up at ingest, not at every query.** Splitting strings, parsing dates from text, normalizing case: each one repeated across a hundred queries is real cost. Doing it once at the ingest layer (or on materialized columns) makes downstream SQL simpler and faster. Detail pages: Substring, Split string, Regex, Concatenate, Concatenate rows.