String Functions
AdvancedConcept
String functions manipulate text data: extracting substrings, changing case, trimming whitespace, searching for patterns, and concatenating values. In data-heavy applications, string functions are essential for cleaning data, formatting output, parsing semi-structured text, and building dynamic values.
Common functions include UPPER/LOWER for case conversion, TRIM/LTRIM/RTRIM for whitespace removal, SUBSTRING for extracting parts of a string, LENGTH for character count, REPLACE for substitution, and CONCAT or || for joining strings together. POSITION (or STRPOS) finds the location of a substring within a string.
Regular expression functions like REGEXP_MATCHES (PostgreSQL) or REGEXP_LIKE (Oracle/MySQL) provide powerful pattern matching beyond what LIKE can do. However, regex in SQL can be slow on large datasets because it typically cannot use indexes. Use it for data cleaning and validation rather than high-volume filtering.
Syntax
-- Case conversion and trimming
SELECT
UPPER(first_name) AS upper_name,
LOWER(email) AS lower_email,
TRIM(address) AS clean_address
FROM customers;
-- Substring and length
SELECT
name,
SUBSTRING(name FROM 1 FOR 10) AS short_name,
LENGTH(name) AS name_length
FROM products;
-- Concatenation and replacement
SELECT
CONCAT(first_name, ' ', last_name) AS full_name,
REPLACE(phone, '-', '') AS clean_phone
FROM customers;Practical Example
Using the ecommerce schema (customers, orders, order_items, products)
-- Clean and standardize customer email domains
SELECT
email,
LOWER(SUBSTRING(email FROM POSITION('@' IN email) + 1)) AS domain,
COUNT(*) AS customer_count
FROM customers
GROUP BY
email,
LOWER(SUBSTRING(email FROM POSITION('@' IN email) + 1))
ORDER BY customer_count DESC;
-- Search products with flexible name matching
SELECT
product_id,
name,
category,
price
FROM products
WHERE LOWER(name) LIKE '%' || LOWER('wireless') || '%'
OR LOWER(category) LIKE '%' || LOWER('wireless') || '%'
ORDER BY name;
-- Format order summary with string building
SELECT
o.order_id,
'Order #' || o.order_id || ' - ' ||
TO_CHAR(o.order_date, 'Mon DD, YYYY') || ' - $' ||
TO_CHAR(o.total_amount, 'FM999,999.00') AS order_summary
FROM orders o
ORDER BY o.order_date DESC
LIMIT 10;Common Pitfalls & Tips
- 1String function names vary widely across databases. PostgreSQL uses || for concatenation; MySQL uses CONCAT(); SQL Server uses +. Check your database's documentation.
- 2LIKE and string comparisons may be case-sensitive depending on the database and collation. Use LOWER() or ILIKE (PostgreSQL) for case-insensitive searches.
- 3String operations on large tables are slow because they typically cannot use indexes. For full-text search on large datasets, use dedicated full-text search features like tsvector/tsquery in PostgreSQL.