Regular Expressions

Overview

Regular expressions (regex) are patterns used to match character combinations in strings. They are supported by virtually every programming language and many command-line tools (e.g. grep, sed, awk). A regex engine scans the input string and checks whether (and where) the pattern matches.

Basics

Character Classes

Match a single character from a defined set.

| Syntax | Meaning |
| --- | --- |
| . | Any character except newline |
| [abc] | One of a, b, or c |
| [^abc] | Any character except a, b, or c |
| [a-z] | Any lowercase letter |
| [0-9] | Any digit |
| \d | Digit ([0-9]) |
| \D | Non-digit ([^0-9]) |
| \w | Word character ([a-zA-Z0-9_]) |
| \W | Non-word character ([^a-zA-Z0-9_]) |
| \s | Whitespace ([ \t\n\r\f\v]) |
| \S | Non-whitespace ([^ \t\n\r\f\v]) |

Examples

Pattern: [A-Z]\w+
Input: "Hello World 123"
Matches: Hello, World

[A-Z] matches one uppercase letter, then \w+ matches one or more word characters after it. \w does include digits, but [A-Z] requires the first character to be an uppercase letter, so 123 is skipped.

Pattern: \d\d\d
Input: "Call 555-1234"
Matches: 555, 123

Three consecutive digits. Matches never overlap: after 555, the - cannot be part of a match, so the engine resumes at 1234 and matches 123. Only 4 remains, which is too short for another three-digit match.
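
The two examples above can be reproduced with a minimal JavaScript sketch. The variable names are illustrative and not part of the original examples.

```javascript
// The g flag makes match() return every match instead of only the first.
const capitalized = "Hello World 123".match(/[A-Z]\w+/g);
console.log(capitalized); // ["Hello", "World"]

// Matches are found left to right and never overlap.
const triples = "Call 555-1234".match(/\d\d\d/g);
console.log(triples); // ["555", "123"]
```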

Quantifiers

Control how many times the preceding element must occur.

| Syntax | Meaning |
| --- | --- |
| * | 0 or more (greedy) |
| + | 1 or more (greedy) |
| ? | 0 or 1 (optional) |
| {n} | Exactly n times |
| {n,} | n or more times |
| {n,m} | Between n and m times |

Examples

Pattern: colou?r
Input: "color and colour"
Matches: color, colour

The ? makes the u optional, so both color (0 times u) and colour (1 time u) match.

Pattern: \d{2,4}
Input: "1 22 333 4444 55555"
Matches: 22, 333, 4444, 5555

Matches between 2 and 4 consecutive digits. 1 is too short. 55555 yields 5555 (greedy, so the engine takes the maximum 4) and the remaining 5 is too short for another match.
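
A short sketch of both quantifier examples in JavaScript, assuming the g flag so that all matches are collected (variable names are illustrative):

```javascript
// ? makes the preceding u optional.
const spellings = "color and colour".match(/colou?r/g);
console.log(spellings); // ["color", "colour"]

// {2,4} is greedy: it takes up to 4 digits, leaving a single 5 unmatched.
const digitRuns = "1 22 333 4444 55555".match(/\d{2,4}/g);
console.log(digitRuns); // ["22", "333", "4444", "5555"]
```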

Anchors

Match a position rather than a character.

| Syntax | Meaning |
| --- | --- |
| ^ | Start of string (or line with m) |
| $ | End of string (or line with m) |
| \b | Word boundary |
| \B | Non-word boundary |

Examples

Pattern: \bcat\b
Matches: "the cat sat" => cat
No match: "concatenate"

\b marks a position where a word character meets a non-word character or the start or end of the string. In concatenate, cat is surrounded by other letters, so \b does not match at those positions.

Pattern: ^\d+
Input: "42 is the answer"
Match: 42

^ anchors the match to the start of the string. \d+ then matches one or more digits from that position. Since 42 is at the very beginning, it matches.

Pattern: \.$
Input: "End of sentence."
Match: .

$ anchors the match to the end of the string. \. matches a literal dot (escaped because . normally means "any character"). Together they match a dot at the end of the string.
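
The anchor examples can be checked with a small JavaScript sketch. The multi-line input at the end is an added illustration (not from the examples above) of how the m flag changes ^.

```javascript
const wordCat = /\bcat\b/;
const inSentence = wordCat.test("the cat sat"); // true
const inWord = wordCat.test("concatenate");     // false

const leading = "42 is the answer".match(/^\d+/)[0]; // "42"
const endsWithDot = /\.$/.test("End of sentence.");  // true

// Without m, ^ matches only at the start of the whole string.
const lines = "one\ntwo\nthree";
const firstOnly = lines.match(/^\w+/g);  // ["one"]
// With m, ^ also matches right after each line break.
const everyLine = lines.match(/^\w+/gm); // ["one", "two", "three"]
```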

Groups and Alternation

Parentheses () create groups that capture the matched substring.

Pattern: (foo)(bar)
Input: foobar
Group 1: foo
Group 2: bar

Each pair of () creates a numbered group. The full match is foobar, but the groups let you access foo and bar individually (e.g. for search-and-replace or extraction).

The pipe | acts as a logical OR.

Pattern: cat|dog
Matches: cat, dog

The engine tries cat first, and if that fails at the current position, it tries dog.

Pattern: (\d{3})-(\d{4})
Input: "555-1234"
Group 1: 555
Group 2: 1234

Groups can capture parts of a structured string separately. Here the first three digits and the last four digits are captured as two groups, while the - is matched but not captured.
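
A minimal JavaScript sketch of groups and alternation. The replace() call is an added illustration of using captured groups in a replacement string; the sample sentence is made up.

```javascript
// Index 0 is the full match; indexes 1 and 2 are the captured groups.
const parts = "555-1234".match(/(\d{3})-(\d{4})/);
console.log(parts[0], parts[1], parts[2]); // 555-1234 555 1234

// Alternation: either branch can match at a given position.
const pets = "a cat and a dog".match(/cat|dog/g);
console.log(pets); // ["cat", "dog"]

// $1 and $2 refer to the groups inside a replacement string.
const swapped = "foobar".replace(/(foo)(bar)/, "$2$1");
console.log(swapped); // "barfoo"
```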

Flags

Flags modify how the pattern is applied.

| Flag | Name | Effect |
| --- | --- | --- |
| g | Global | Find all matches, not just the first |
| i | Case-insensitive | Ignore upper/lower case |
| m | Multiline | ^ and $ match start/end of each line |
| s | Dotall | . also matches newline characters |
| u | Unicode | Treat pattern and input as Unicode |

Examples

Pattern (no flag): /hello/
Input: "Hello World"
No match

Pattern (with i): /hello/i
Input: "Hello World"
Match: Hello

Without the i flag, hello does not match Hello because the H is uppercase. With the i flag, case is ignored and the match succeeds.
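
In JavaScript, flags follow the closing slash of a regex literal and can be combined. The sketch below repeats the i example and adds two illustrative cases (the s flag and a gi combination) that are not part of the examples above.

```javascript
const caseSensitive = /hello/.test("Hello World");    // false
const caseInsensitive = /hello/i.test("Hello World"); // true

// By default . does not match a newline; the s flag changes that.
const withoutDotall = /a.b/.test("a\nb"); // false
const withDotall = /a.b/s.test("a\nb");   // true

// Flags combine: g collects all matches, i ignores case.
const allCats = "Cat cat CAT".match(/cat/gi); // ["Cat", "cat", "CAT"]
```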

Advanced Patterns

Greedy vs. Lazy

  • Greedy (default): matches as much as possible
  • Lazy (append ?): matches as little as possible

| Syntax | Meaning |
| --- | --- |
| *? | 0 or more (lazy) |
| +? | 1 or more (lazy) |
| ?? | 0 or 1 (lazy) |

Examples

Input: <b>bold</b> and <b>more</b>

Greedy: <.*> => 1 match: <b>bold</b> and <b>more</b>
Lazy: <.*?> => 4 matches: <b>, </b>, <b>, </b>

Greedy .* expands as far as possible, matching from the first < to the very last >, the entire string in one match. Lazy .*? stops at the earliest possible >, so each tag is matched individually.
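
The difference is easy to see in a short JavaScript sketch. The third pattern, a negated character class, is an added alternative that gives the same result as the lazy version for this input; it is not part of the example above.

```javascript
const html = "<b>bold</b> and <b>more</b>";

// Greedy: one match spanning from the first < to the last >.
const greedy = html.match(/<.*>/g);
console.log(greedy.length); // 1

// Lazy: stops at the first > after each <.
const lazy = html.match(/<.*?>/g);
console.log(lazy); // ["<b>", "</b>", "<b>", "</b>"]

// Alternative: "anything except >" cannot run past the closing bracket.
const negated = html.match(/<[^>]*>/g);
console.log(negated); // ["<b>", "</b>", "<b>", "</b>"]
```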

Non-Capturing Groups

Use (?:...) when grouping is needed but capturing is not.

Pattern: (?:foo|bar)baz
Matches: foobaz, barbaz
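
One way to see the effect in JavaScript is to compare the result arrays of a capturing and a non-capturing version of the same pattern (a sketch; variable names are illustrative):

```javascript
const capturing = "foobaz".match(/(foo|bar)baz/);
const nonCapturing = "foobaz".match(/(?:foo|bar)baz/);

// Both match the same text ...
console.log(capturing[0], nonCapturing[0]); // foobaz foobaz

// ... but only the capturing version adds a group entry to the result.
console.log(capturing.length);    // 2 (full match + group 1)
console.log(nonCapturing.length); // 1 (full match only)
```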

Named Groups

Use (?<name>...) to assign a name to a group.

Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Input: 2026-03-18
year: 2026
month: 03
day: 18
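
In JavaScript, named groups are exposed on the groups property of the match result. A minimal sketch using the date example above:

```javascript
const dateMatch = "2026-03-18".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);

// Named groups can be destructured by name; the values are strings.
const { year, month, day } = dateMatch.groups;
console.log(year, month, day); // 2026 03 18

// Named groups are still numbered, so index access keeps working.
console.log(dateMatch[1]); // "2026"
```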

Backreferences

Refer to a previously captured group with \1, \2, etc.

Pattern: (\w+)\s\1
Matches: "hello hello" => hello hello
No match: "hello world"
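
A JavaScript sketch of the backreference example. The replace() call is an added illustration with a made-up sentence; it wraps the pattern in \b so that the repeated word is not matched inside longer words.

```javascript
const doubled = /(\w+)\s\1/;
const repeated = doubled.test("hello hello"); // true
const different = doubled.test("hello world"); // false

// Collapse an accidentally repeated word into a single occurrence.
const cleaned = "this is is a test".replace(/\b(\w+)\s\1\b/g, "$1");
console.log(cleaned); // "this is a test"
```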

Lookaround

Lookaround assertions check for a pattern without consuming characters.

| Syntax | Name | Meaning |
| --- | --- | --- |
| (?=...) | Positive lookahead | Followed by ... |
| (?!...) | Negative lookahead | Not followed by ... |
| (?<=...) | Positive lookbehind | Preceded by ... |
| (?<!...) | Negative lookbehind | Not preceded by ... |

Examples

Pattern: \d+(?= USD)
Input: "100 USD and 200 EUR"
Match: 100

Pattern: \b\w+\b(?!\.com)
Input: "test.com and example.org"
Matches: com, and, example, org (test is skipped because it is followed by .com)

Pattern: (?<=\$)\d+
Input: "Price: $50"
Match: 50

Pattern: (?<!un)happy
Input: "happy and unhappy"
Match: happy (first one only)
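
The four lookaround examples can be run as a JavaScript sketch. It assumes an engine with lookbehind support (current Node.js and modern browsers); the g flag is added to collect every match.

```javascript
// Lookahead: the digits match, " USD" is checked but not consumed.
const usd = "100 USD and 200 EUR".match(/\d+(?= USD)/g);
console.log(usd); // ["100"]

// Negative lookahead: "test" is rejected, but "com" itself still matches.
const notDotCom = "test.com and example.org".match(/\b\w+\b(?!\.com)/g);
console.log(notDotCom); // ["com", "and", "example", "org"]

// Lookbehind: the $ must precede the digits but is not part of the match.
const price = "Price: $50".match(/(?<=\$)\d+/g);
console.log(price); // ["50"]

// Negative lookbehind: the "happy" inside "unhappy" is rejected.
const happy = "happy and unhappy".match(/(?<!un)happy/g);
console.log(happy); // ["happy"]
```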

Common Patterns

Email (simplified): [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
IPv4 address: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
ISO date (YYYY-MM-DD): \d{4}-\d{2}-\d{2}
Hex color code (loose, also accepts 5- and 7-digit values): #[0-9a-fA-F]{3,8}
URL (simplified): https?://[^\s]+
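
These patterns check the shape of the input, not its value. The sketch below illustrates that, assuming ^ and $ are added when a whole string should be validated rather than searched. isValidIPv4 is a hypothetical helper written for this example, not a built-in function.

```javascript
// Anchored so the whole string must be a date, not just contain one.
const isoDate = /^\d{4}-\d{2}-\d{2}$/;
const goodDate = isoDate.test("2026-03-18");   // true
const shortMonth = isoDate.test("2026-3-18");  // false
const absurdDate = isoDate.test("2026-99-99"); // true: right shape, invalid date

// The IPv4 pattern accepts any 1-3 digit group, including values above 255.
const ipv4 = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/;
const looksLikeIp = ipv4.test("999.999.999.999"); // true

// Hypothetical helper: check the shape with a regex, the range with code.
function isValidIPv4(text) {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(text)) return false;
  return text.split(".").every((octet) => Number(octet) <= 255);
}

console.log(isValidIPv4("192.168.0.1"));     // true
console.log(isValidIPv4("999.999.999.999")); // false
```

Splitting the work this way keeps the regex readable; encoding the 0-255 range in the pattern itself is possible but considerably longer.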

JavaScript Functions

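JavaScript offers several ways to apply a regex to a string. Two of the most common are test(), which is called on the regex (pattern.test(string)), and match(), which is called on the string (string.match(pattern)).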

test()

Returns true or false - use it when you only need to know whether a pattern matches.

const pattern = /\d{3}/;
pattern.test("abc 123"); // true
pattern.test("no digits"); // false
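
One pitfall worth knowing: a regex with the g flag remembers where its last match ended (in its lastIndex property), so repeated test() calls on the same regex object can alternate between true and false. A minimal sketch with illustrative names:

```javascript
const globalPattern = /\d{3}/g;
const input = "abc 123";

const first = globalPattern.test(input);  // true, lastIndex is now 7
const second = globalPattern.test(input); // false, the search started at index 7; lastIndex resets to 0
const third = globalPattern.test(input);  // true again

console.log(first, second, third); // true false true
```

For simple yes/no checks, leaving off the g flag avoids this behavior.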

match()

Returns the matched substrings (or null) - use it when you need to extract data from the string.

Without the g flag, match() returns the first match plus captured groups:

const result = "2026-03-18".match(/(\d{4})-(\d{2})-(\d{2})/);
// result[0] => "2026-03-18" (full match)
// result[1] => "2026" (group 1)
// result[2] => "03" (group 2)
// result[3] => "18" (group 3)

With the g flag, match() returns all matches but no captured groups:

"cat bat sat".match(/[a-z]at/g);
// => ["cat", "bat", "sat"]
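
When both all matches and their groups are needed, String.prototype.matchAll() is one option. It requires the g flag and returns an iterator of full match results. A sketch with a made-up input string:

```javascript
const text = "2026-03-18 and 2027-01-05";

// Spread the iterator into an array of match results.
const allDates = [...text.matchAll(/(\d{4})-(\d{2})-(\d{2})/g)];

// Each entry has the full match at index 0 and the groups after it.
const years = allDates.map((m) => m[1]);
console.log(years); // ["2026", "2027"]
```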

If nothing matches, match() returns null and not an empty array:

"hello".match(/\d+/); // null

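Because the result can be null, code that reads or iterates over matches usually needs a guard. Two common idioms, sketched with illustrative variable names and assuming an environment that supports ?? and ?. (ES2020):

```javascript
// Fall back to an empty array so array methods are always safe to call.
const digitRuns = "hello".match(/\d+/g) ?? [];
console.log(digitRuns.length); // 0

// Optional chaining yields undefined instead of throwing on null.
const firstRun = "hello".match(/\d+/)?.[0];
console.log(firstRun); // undefined
```
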
When to Use Which

| Goal | Function |
| --- | --- |
| Check if a pattern matches | test() |
| Extract the matched strings | match() |