Regular Expressions

Overview

Regular expressions (regex) are patterns used to match character combinations in strings. They are supported by virtually every programming language and many command-line tools (e.g. grep, sed, awk). A regex engine scans the input string and checks whether (and where) the pattern matches.

Basics

Character Classes

Match a single character from a defined set.

Syntax	Meaning
`.`	Any character except newline
`[abc]`	One of `a`, `b`, or `c`
`[^abc]`	Any character except `a`, `b`, or `c`
`[a-z]`	Any lowercase letter
`[0-9]`	Any digit
`\d`	Digit (`[0-9]`)
`\D`	Non-digit (`[^0-9]`)
`\w`	Word character (`[a-zA-Z0-9_]`)
`\W`	Non-word character (`[^a-zA-Z0-9_]`)
`\s`	Whitespace (`[ \t\n\r\f\v]`)
`\S`	Non-whitespace (`[^ \t\n\r\f\v]`)

Examples

Pattern: [A-Z]\w+
Input:   "Hello World 123"
Matches: Hello, World

[A-Z] matches one uppercase letter, and \w+ then matches one or more word characters after it. 123 does not start with an uppercase letter, so it is skipped. \w does include digits, but [A-Z] restricts the first character to uppercase letters.

Pattern: \d\d\d
Input:   "Call 555-1234"
Matches: 555, 123

Each match consumes three consecutive digits. 555 matches first. The - interrupts the digit run, so the engine resumes at 1234, where 123 matches. Matches do not overlap, so the search continues at 4, and a single digit is not enough for another match.
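
The two examples above can be reproduced in JavaScript. This is a minimal sketch that uses match() with the g flag (both covered later on this page); the variable names are only illustrative.

```javascript
// The g flag asks match() for every match rather than only the first.
const capitalized = "Hello World 123".match(/[A-Z]\w+/g);
console.log(capitalized); // ["Hello", "World"]

// Each match consumes three digits, so the trailing 4 is left over.
const threeDigits = "Call 555-1234".match(/\d\d\d/g);
console.log(threeDigits); // ["555", "123"]
```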

Quantifiers

Control how many times the preceding element must occur.

Syntax	Meaning
`*`	0 or more (greedy)
`+`	1 or more (greedy)
`?`	0 or 1 (optional)
`{n}`	Exactly n times
`{n,}`	n or more times
`{n,m}`	Between n and m times

Examples

Pattern: colou?r
Input:   "color and colour"
Matches: color, colour

The ? makes the u optional, so both color (no u) and colour (one u) match.

Pattern: \d{2,4}
Input:   "1 22 333 4444 55555"
Matches: 22, 333, 4444, 5555

Matches between 2 and 4 consecutive digits. 1 is too short. 55555 yields 5555 (greedy, so the engine takes the maximum 4) and the remaining 5 is too short for another match.
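
A short JavaScript sketch of the quantifier examples above. The last two lines are an added illustration (not from the examples) of how {n} and {n,} differ on the same input.

```javascript
// ? makes the preceding u optional.
const spellings = "color and colour".match(/colou?r/g);
console.log(spellings); // ["color", "colour"]

// {2,4} is greedy: it takes up to four digits per match.
const digitRuns = "1 22 333 4444 55555".match(/\d{2,4}/g);
console.log(digitRuns); // ["22", "333", "4444", "5555"]

// {2} takes exactly two digits per match; {2,} takes the whole run.
const pairs = "55555".match(/\d{2}/g);
const wholeRun = "55555".match(/\d{2,}/g);
console.log(pairs, wholeRun); // ["55", "55"] ["55555"]
```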

Anchors

Match a position rather than a character.

Syntax	Meaning
`^`	Start of string (or of each line with the `m` flag)
`$`	End of string (or of each line with the `m` flag)
`\b`	Word boundary
`\B`	Non-word boundary

Examples

Pattern: \bcat\b
Matches: "the cat sat"    => cat
No match: "concatenate"

\b matches at the position between a word character and a non-word character (or at the start or end of the string). In concatenate, cat is surrounded by other letters, so there is no word boundary on either side and the pattern does not match.

Pattern: ^\d+
Input:   "42 is the answer"
Match:   42

^ anchors the match to the start of the string. \d+ then matches one or more digits from that position. Since 42 is at the very beginning, it matches.

Pattern: \.$
Input:   "End of sentence."
Match:   .

$ anchors the match to the end of the string. \. matches a literal dot (escaped because . normally means "any character"). Together they match a dot at the end of the string.
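
The anchor examples above, sketched in JavaScript. The final lines add an illustrative multi-line input (an assumption, not part of the examples) to show how the m flag changes ^.

```javascript
// \b requires a word boundary on both sides of cat.
const wordCat = /\bcat\b/;
const inSentence = wordCat.test("the cat sat"); // true
const inLongerWord = wordCat.test("concatenate"); // false

// ^ pins the match to the start, $ to the end.
const leadingNumber = "42 is the answer".match(/^\d+/)[0]; // "42"
const endsWithDot = /\.$/.test("End of sentence."); // true

// Without m, ^ matches only at the start of the string;
// with m, it also matches at the start of each line.
const lines = "1 one\n2 two";
const firstOnly = lines.match(/^\d+/g); // ["1"]
const everyLine = lines.match(/^\d+/gm); // ["1", "2"]

console.log(inSentence, inLongerWord, leadingNumber, endsWithDot, firstOnly, everyLine);
```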

Groups and Alternation

Parentheses () create groups that capture the matched substring.

Pattern: (foo)(bar)
Input:   foobar
Group 1: foo
Group 2: bar

Each pair of () creates a numbered group. The full match is foobar, but the groups let you access foo and bar individually (e.g. for search-and-replace or extraction).

The pipe | acts as a logical OR.

Pattern: cat|dog
Matches: cat, dog

The engine tries cat first, and if that fails at the current position, it tries dog.

Pattern: (\d{3})-(\d{4})
Input:   "555-1234"
Group 1: 555
Group 2: 1234

Groups can capture parts of a structured string separately. Here the three-digit and four-digit parts of the number are split into two groups, while the - is matched but not captured.
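
A minimal JavaScript sketch of the group and alternation examples above. The replace() call illustrates the search-and-replace use mentioned earlier; the input sentence for cat|dog and the variable names are assumptions made for illustration.

```javascript
// Index 0 is the full match; indexes 1 and 2 are the captured groups.
const phone = "555-1234".match(/(\d{3})-(\d{4})/);
console.log(phone[0], phone[1], phone[2]); // 555-1234 555 1234

// In a replacement string, $1 and $2 refer to the captured groups.
const swapped = "555-1234".replace(/(\d{3})-(\d{4})/, "$2-$1");
console.log(swapped); // 1234-555

// Alternation: either word may match at any position.
const pets = "a cat and a dog".match(/cat|dog/g);
console.log(pets); // ["cat", "dog"]
```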

Flags

Flags modify how the pattern is applied.

Flag	Name	Effect
`g`	Global	Find all matches, not just the first
`i`	Case-insensitive	Ignore upper/lower case
`m`	Multiline	`^` and `$` match start/end of each line
`s`	Dotall	`.` also matches newline characters
`u`	Unicode	Treat pattern and input as Unicode

Examples

Pattern (no flag): /hello/
Input:   "Hello World"
No match

Pattern (with i): /hello/i
Input:   "Hello World"
Match:   Hello

Without the i flag, hello does not match Hello because the H is uppercase. With the i flag, case is ignored and the match succeeds.
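
The flags can be compared side by side in JavaScript. This sketch covers i, m, and s; the sample strings other than "Hello World" are assumptions chosen for illustration.

```javascript
// i: ignore case
const caseSensitive = /hello/.test("Hello World"); // false
const caseInsensitive = /hello/i.test("Hello World"); // true

// m: ^ also matches at the start of each line
const text = "first\nsecond";
const withoutM = text.match(/^\w+/g); // ["first"]
const withM = text.match(/^\w+/gm); // ["first", "second"]

// s: . also matches a newline
const dotDefault = /a.b/.test("a\nb"); // false
const dotAll = /a.b/s.test("a\nb"); // true

console.log(caseSensitive, caseInsensitive, withoutM, withM, dotDefault, dotAll);
```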

Advanced Patterns

Greedy vs. Lazy

Greedy (default): matches as much as possible
Lazy (append ?): matches as little as possible

Syntax	Meaning
`*?`	0 or more (lazy)
`+?`	1 or more (lazy)
`??`	0 or 1 (lazy)

Examples

Input:   <b>bold</b> and <b>more</b>

Greedy:  <.*>   => 1 match:  <b>bold</b> and <b>more</b>
Lazy:    <.*?>  => 4 matches: <b>, </b>, <b>, </b>

Greedy .* expands as far as possible, so it matches from the first < to the very last > and consumes the entire string in one match. Lazy .*? stops at the earliest possible >, so each tag is matched individually.
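
The same comparison as a JavaScript sketch, using the input from the example above:

```javascript
const html = "<b>bold</b> and <b>more</b>";

// Greedy: one match spanning from the first < to the last >.
const greedy = html.match(/<.*>/g);
console.log(greedy); // ["<b>bold</b> and <b>more</b>"]

// Lazy: each match stops at the first > it can reach.
const lazy = html.match(/<.*?>/g);
console.log(lazy); // ["<b>", "</b>", "<b>", "</b>"]
```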

Non-Capturing Groups

Use (?:...) when grouping is needed but capturing is not.

Pattern: (?:foo|bar)baz
Matches: foobaz, barbaz
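
To see the difference in practice, the sketch below compares a capturing and a non-capturing version of the same pattern in JavaScript. The capturing variant (foo|bar)baz is added here only for contrast.

```javascript
// Capturing group: the result holds the full match plus group 1.
const capturing = "foobaz".match(/(foo|bar)baz/);
console.log(capturing[0], capturing[1], capturing.length); // foobaz foo 2

// Non-capturing group: same match, but no group entry is stored.
const nonCapturing = "foobaz".match(/(?:foo|bar)baz/);
console.log(nonCapturing[0], nonCapturing.length); // foobaz 1
```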

Named Groups

Use (?<name>...) to assign a name to a group.

Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Input:   2026-03-18
year:    2026
month:   03
day:     18
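
In JavaScript, named groups are exposed on the groups property of the match result. A minimal sketch of the date example above (variable names are illustrative):

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = "2026-03-18".match(datePattern);

// Named groups can be read by name...
const { year, month, day } = dateMatch.groups;
console.log(year, month, day); // 2026 03 18

// ...and are still available by number.
console.log(dateMatch[1]); // 2026
```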

Backreferences

Refer to a previously captured group with \1, \2, etc.

Pattern: (\w+)\s\1
Matches: "hello hello"    => hello hello
No match: "hello world"
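
A JavaScript sketch of the backreference example. The "this is" input and the \b-anchored variant are additions for illustration: because \w+ can match only part of a word, the unanchored pattern also fires when one word merely ends with the text that follows it.

```javascript
const repeated = /(\w+)\s\1/;
const sameWordTwice = repeated.test("hello hello"); // true
const differentWords = repeated.test("hello world"); // false

// Group 1 can be the tail of a word: "is" (from "this") followed by "is".
const partialWord = repeated.test("this is"); // true

// Word boundaries restrict the check to whole words.
const repeatedWholeWord = /\b(\w+)\s\1\b/;
const wholeWordOnly = repeatedWholeWord.test("this is"); // false

console.log(sameWordTwice, differentWords, partialWord, wholeWordOnly);
```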

Lookaround

Lookaround assertions check for a pattern without consuming characters.

Syntax	Name	Meaning
`(?=...)`	Positive lookahead	Followed by ...
`(?!...)`	Negative lookahead	Not followed by ...
`(?<=...)`	Positive lookbehind	Preceded by ...
`(?<!...)`	Negative lookbehind	Not preceded by ...

Examples

Pattern: \d+(?= USD)
Input:   "100 USD and 200 EUR"
Match:   100

Pattern: \b\w+\b(?!\.com)
Input:   "test.com and example.org"
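Matches: com, and, example, org

Matches whole words that are not immediately followed by .com, so test is skipped. Note that com itself still matches, because nothing after it spells .com.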

Pattern: (?<=\$)\d+
Input:   "Price: $50"
Match:   50

Pattern: (?<!un)happy
Input:   "happy and unhappy"
Match:   happy (first one only)
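
The four lookaround examples above, sketched in JavaScript with the g flag so that every match is listed. The lookaround text itself never appears in the results because it is not consumed.

```javascript
// Positive lookahead: digits followed by " USD"
const usdAmounts = "100 USD and 200 EUR".match(/\d+(?= USD)/g);
console.log(usdAmounts); // ["100"]

// Negative lookahead: whole words not followed by ".com"
const notDotCom = "test.com and example.org".match(/\b\w+\b(?!\.com)/g);
console.log(notDotCom); // ["com", "and", "example", "org"]

// Positive lookbehind: digits preceded by "$"
const prices = "Price: $50".match(/(?<=\$)\d+/g);
console.log(prices); // ["50"]

// Negative lookbehind: "happy" not preceded by "un"
const happyOnly = "happy and unhappy".match(/(?<!un)happy/g);
console.log(happyOnly); // ["happy"]
```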

Common Patterns

Email (simplified):     [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
IPv4 address:           \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
ISO date (YYYY-MM-DD):  \d{4}-\d{2}-\d{2}
Hex color code:         #[0-9a-fA-F]{3,8}
URL (simplified):       https?://[^\s]+
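
These patterns check the shape of the text, not its meaning. The IPv4 pattern, for instance, accepts 999.1.1.1 because \d{1,3} allows any one to three digits. One possible approach, sketched below, is to pair the regex with a numeric range check. The helper name isIPv4 is hypothetical, and the sketch assumes the whole string should be an address, so it uses ^ and $ instead of \b.

```javascript
// Shape check only: four groups of 1 to 3 digits separated by dots.
const ipv4Shape = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;

// Hypothetical helper: shape check plus a 0-255 range check per part.
function isIPv4(text) {
  if (!ipv4Shape.test(text)) return false;
  return text.split(".").every((part) => Number(part) <= 255);
}

console.log(ipv4Shape.test("999.1.1.1")); // true (shape only)
console.log(isIPv4("999.1.1.1")); // false
console.log(isIPv4("192.168.0.1")); // true
```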

JavaScript Functions

JavaScript provides two main ways to apply a regex to a string: the RegExp method test() and the String method match().

`test()`

Returns true or false. Use it when you only need to know whether a pattern matches.

const pattern = /\d{3}/;
pattern.test("abc 123"); // true
pattern.test("no digits"); // false

`match()`

Returns the matched substrings (or null). Use it when you need to extract data from the string.

Without the g flag, match() returns the first match plus captured groups:

const result = "2026-03-18".match(/(\d{4})-(\d{2})-(\d{2})/);
// result[0] => "2026-03-18"  (full match)
// result[1] => "2026"        (group 1)
// result[2] => "03"          (group 2)
// result[3] => "18"          (group 3)

With the g flag, match() returns all matches but no captured groups:

"cat bat sat".match(/[a-z]at/g);
// => ["cat", "bat", "sat"]

If nothing matches, match() returns null, not an empty array:

"hello".match(/\d+/); // null

When to Use Which

Goal	Function
Check if a pattern matches	`test()`
Extract the matched strings	`match()`
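
The sketch below puts both functions side by side. Because match() returns null when nothing matches, a common precaution is to fall back to an empty array before reading length or looping. The variable names are illustrative.

```javascript
// Yes/no question: test()
const hasDigits = /\d/.test("abc 123");
console.log(hasDigits); // true

// Extraction: match(), with a fallback for the null case
const digitsFound = "hello".match(/\d+/g) || [];
console.log(digitsFound.length); // 0

const wordsFound = "cat bat sat".match(/[a-z]at/g) || [];
console.log(wordsFound.length); // 3
```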
