Regular Expressions
Overview
Regular expressions (regex) are patterns used to match character combinations in strings. They are supported by virtually every programming language and many command-line tools (e.g. grep, sed, awk). A regex engine scans the input string and checks whether (and where) the pattern matches.
Basics
Character Classes
Match a single character from a defined set.
| Syntax | Meaning |
|---|---|
| . | Any character except newline |
| [abc] | One of a, b, or c |
| [^abc] | Any character except a, b, or c |
| [a-z] | Any lowercase letter |
| [0-9] | Any digit |
| \d | Digit ([0-9]) |
| \D | Non-digit ([^0-9]) |
| \w | Word character ([a-zA-Z0-9_]) |
| \W | Non-word character ([^a-zA-Z0-9_]) |
| \s | Whitespace ([ \t\n\r\f\v]) |
| \S | Non-whitespace ([^ \t\n\r\f\v]) |
Examples
Pattern: [A-Z]\w+
Input: "Hello World 123"
Matches: Hello, World
[A-Z] matches one uppercase letter, \w+ then matches one or more word characters after it. 123 has no uppercase letter at the start, so it is skipped. \w does include digits, but [A-Z] restricts the first character to letters only.
Pattern: \d\d\d
Input: "Call 555-1234"
Matches: 555, 123
Matches three consecutive digits. Matches do not overlap: after 555, the - breaks the digit run, and in 1234 the engine matches 123 and then resumes at 4, which alone is too short for another match.
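The two examples above can be reproduced in JavaScript. This is a minimal sketch that uses String.prototype.match() with the g flag to collect every match; the variable names are illustrative only.

```javascript
// [A-Z]\w+ : one uppercase letter followed by one or more word characters.
const capitalizedWords = "Hello World 123".match(/[A-Z]\w+/g);
console.log(capitalizedWords); // => ["Hello", "World"]

// \d\d\d : matches never overlap, so after 555 and 123 the lone 4 is too short.
const digitTriples = "Call 555-1234".match(/\d\d\d/g);
console.log(digitTriples); // => ["555", "123"]
```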
Quantifiers
Control how many times the preceding element must occur.
| Syntax | Meaning |
|---|---|
| * | 0 or more (greedy) |
| + | 1 or more (greedy) |
| ? | 0 or 1 (optional) |
| {n} | Exactly n times |
| {n,} | n or more times |
| {n,m} | Between n and m times (inclusive) |
Examples
Pattern: colou?r
Input: "color and colour"
Matches: color, colour
The ? makes the u optional, so both color (no u) and colour (one u) match.
Pattern: \d{2,4}
Input: "1 22 333 4444 55555"
Matches: 22, 333, 4444, 5555
Matches between 2 and 4 consecutive digits. 1 is too short. 55555 yields 5555 (greedy, so the engine takes the maximum 4) and the remaining 5 is too short for another match.
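A short JavaScript sketch of both quantifier examples, assuming the g flag so that every match is returned; the variable names are illustrative.

```javascript
// ? makes the preceding u optional.
const spellings = "color and colour".match(/colou?r/g);
console.log(spellings); // => ["color", "colour"]

// {2,4} is greedy: from 55555 it takes four digits and leaves a single 5.
const digitRuns = "1 22 333 4444 55555".match(/\d{2,4}/g);
console.log(digitRuns); // => ["22", "333", "4444", "5555"]
```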
Anchors
Match a position rather than a character.
| Syntax | Meaning |
|---|---|
| ^ | Start of string (or of each line with the m flag) |
| $ | End of string (or of each line with the m flag) |
| \b | Word boundary |
| \B | Not a word boundary |
Examples
Pattern: \bcat\b
Matches: "the cat sat" => cat
No match: "concatenate"
\b matches at the boundary between a word character and a non-word character, or at the start or end of the string next to a word character. In concatenate, cat is surrounded by other letters, so \b does not match at those positions.
Pattern: ^\d+
Input: "42 is the answer"
Match: 42
^ anchors the match to the start of the string. \d+ then matches one or more digits from that position. Since 42 is at the very beginning, it matches.
Pattern: \.$
Input: "End of sentence."
Match: .
$ anchors the match to the end of the string. \. matches a literal dot (escaped because . normally means "any character"). Together they match a dot at the end of the string.
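The anchor examples can be checked with a few lines of JavaScript. This is a sketch only; the second input for ^\d+ is an added counterexample that does not appear in the examples above.

```javascript
// \b : cat must stand alone as a word.
const standaloneCat = /\bcat\b/.test("the cat sat"); // => true
const embeddedCat = /\bcat\b/.test("concatenate");   // => false

// ^ : digits must sit at the very start of the string.
const leadingNumber = "42 is the answer".match(/^\d+/)[0]; // => "42"
const trailingNumber = "The answer is 42".match(/^\d+/);   // => null

// $ : a literal dot at the very end of the string.
const endsWithDot = /\.$/.test("End of sentence."); // => true

console.log(standaloneCat, embeddedCat, leadingNumber, trailingNumber, endsWithDot);
```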
Groups and Alternation
Parentheses () create groups that capture the matched substring.
Pattern: (foo)(bar)
Input: foobar
Group 1: foo
Group 2: bar
Each pair of () creates a numbered group. The full match is foobar, but the groups let you access foo and bar individually (e.g. for search-and-replace or extraction).
The pipe | acts as a logical OR.
Pattern: cat|dog
Matches: cat, dog
The engine tries cat first, and if that fails at the current position, it tries dog.
Pattern: (\d{3})-(\d{4})
Input: "555-1234"
Group 1: 555
Group 2: 1234
Groups can capture parts of a structured string separately. Here the first three digits and the last four digits are captured in separate groups, while the - is matched but not captured.
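As a sketch of how groups and alternation behave in JavaScript: match() exposes numbered groups as array elements, and replace() can refer to them as $1, $2 in the replacement string. The inputs other than 555-1234 and foobar are made up for illustration.

```javascript
// Numbered groups are available as elements 1..n of the match result.
const phoneMatch = "555-1234".match(/(\d{3})-(\d{4})/);
console.log(phoneMatch[0]); // => "555-1234" (full match)
console.log(phoneMatch[1]); // => "555"
console.log(phoneMatch[2]); // => "1234"

// In a replacement string, $1 and $2 refer to the numbered groups.
const swapped = "foobar".replace(/(foo)(bar)/, "$2$1");
console.log(swapped); // => "barfoo"

// Alternation with the g flag finds either word.
const pets = "one cat, two dogs".match(/cat|dog/g);
console.log(pets); // => ["cat", "dog"]
```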
Flags
Flags modify how the pattern is applied.
| Flag | Name | Effect |
|---|---|---|
| g | Global | Find all matches, not just the first |
| i | Case-insensitive | Ignore upper/lower case |
| m | Multiline | ^ and $ match start/end of each line |
| s | Dotall | . also matches newline characters |
| u | Unicode | Treat pattern and input as Unicode |
Examples
Pattern (no flag): /hello/
Input: "Hello World"
No match
Pattern (with i): /hello/i
Input: "Hello World"
Match: Hello
Without the i flag, hello does not match Hello because the H is uppercase. With the i flag, case is ignored and the match succeeds.
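The following JavaScript sketch illustrates the i, m, and s flags side by side. The multi-line and a\nb inputs are invented for the demonstration and are not taken from the examples above.

```javascript
// i: ignore case.
const caseSensitive = /hello/.test("Hello World");    // => false
const caseInsensitive = /hello/i.test("Hello World"); // => true

// m: ^ and $ also match at the start and end of each line.
const text = "one\ntwo\nthree";
const wholeStringOnly = text.match(/^\w+$/g); // => null (\w+ cannot cross the newlines)
const perLine = text.match(/^\w+$/gm);        // => ["one", "two", "three"]

// s: the dot also matches a newline.
const dotDefault = /a.b/.test("a\nb"); // => false
const dotAll = /a.b/s.test("a\nb");    // => true

console.log(caseSensitive, caseInsensitive, wholeStringOnly, perLine, dotDefault, dotAll);
```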
Advanced Patterns
Greedy vs. Lazy
- Greedy (default): matches as much as possible
- Lazy (append ?): matches as little as possible
| Syntax | Meaning |
|---|---|
| *? | 0 or more (lazy) |
| +? | 1 or more (lazy) |
| ?? | 0 or 1 (lazy) |
Examples
Input: <b>bold</b> and <b>more</b>
Greedy: <.*> => 1 match: <b>bold</b> and <b>more</b>
Lazy: <.*?> => 4 matches: <b>, </b>, <b>, </b>
Greedy .* expands as far as possible, from the first < to the very last >, so the entire string is a single match. Lazy .*? stops at the earliest possible >, so each tag is matched individually.
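The difference is easy to observe in JavaScript. A minimal sketch using the same input, with illustrative variable names:

```javascript
const html = "<b>bold</b> and <b>more</b>";

// Greedy: .* runs to the last > in the string.
const greedyTags = html.match(/<.*>/g);
console.log(greedyTags); // => ["<b>bold</b> and <b>more</b>"]

// Lazy: .*? stops at the first > it can reach.
const lazyTags = html.match(/<.*?>/g);
console.log(lazyTags); // => ["<b>", "</b>", "<b>", "</b>"]
```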
Non-Capturing Groups
Use (?:...) when grouping is needed but capturing is not.
Pattern: (?:foo|bar)baz
Matches: foobaz, barbaz
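One way to see the effect in JavaScript is to compare the length of the match() result with and without capturing. This is a sketch; the capturing variant (foo|bar)baz is added here only for contrast.

```javascript
// Capturing group: the result holds the full match plus group 1.
const withCapture = "foobaz".match(/(foo|bar)baz/);
console.log(withCapture.length); // => 2
console.log(withCapture[1]);     // => "foo"

// Non-capturing group: only the full match is stored.
const withoutCapture = "foobaz".match(/(?:foo|bar)baz/);
console.log(withoutCapture.length); // => 1
console.log(withoutCapture[0]);     // => "foobaz"
```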
Named Groups
Use (?<name>...) to assign a name to a group.
Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Input: 2026-03-18
year: 2026
month: 03
day: 18
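In JavaScript, named groups appear on the groups property of the match result. A minimal sketch of the example above; the variable name dateMatch is illustrative.

```javascript
const dateMatch = "2026-03-18".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);

// Named groups are collected on the .groups object.
console.log(dateMatch.groups.year);  // => "2026"
console.log(dateMatch.groups.month); // => "03"
console.log(dateMatch.groups.day);   // => "18"

// They remain reachable by number as well.
console.log(dateMatch[1]); // => "2026"
```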
Backreferences
Refer to a previously captured group with \1, \2, etc.
Pattern: (\w+)\s\1
Matches: "hello hello" => hello hello
No match: "hello world"
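A short JavaScript sketch of the backreference pattern. The final replace() call is an added illustration of a typical use (collapsing a doubled word), not part of the example above.

```javascript
const doubledWord = /(\w+)\s\1/;

// \1 must repeat exactly what group 1 captured.
const repeatsWord = doubledWord.test("hello hello"); // => true
const differsWord = doubledWord.test("hello world"); // => false

// Replace the doubled word with a single copy of group 1.
const collapsed = "hello hello world".replace(doubledWord, "$1");
console.log(repeatsWord, differsWord, collapsed); // => true false "hello world"
```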
Lookaround
Lookaround assertions check for a pattern without consuming characters.
| Syntax | Name | Meaning |
|---|---|---|
| (?=...) | Positive lookahead | Followed by ... |
| (?!...) | Negative lookahead | Not followed by ... |
| (?<=...) | Positive lookbehind | Preceded by ... |
| (?<!...) | Negative lookbehind | Not preceded by ... |
Examples
Pattern: \d+(?= USD)
Input: "100 USD and 200 EUR"
Match: 100
Pattern: \b\w+\b(?!\.com)
Input: "test.com and example.org"
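Matches: com, and, example, org
test is rejected because it is directly followed by .com. Every other word, including com itself, is not followed by .com and therefore matches.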
Pattern: (?<=\$)\d+
Input: "Price: $50"
Match: 50
Pattern: (?<!un)happy
Input: "happy and unhappy"
Match: happy (first one only)
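The four lookaround examples can be run in JavaScript as follows. This sketch assumes an engine with lookbehind support (ES2018 or later) and uses the g flag to list every match.

```javascript
// Positive lookahead: digits followed by " USD" (the suffix is not part of the match).
const usdAmounts = "100 USD and 200 EUR".match(/\d+(?= USD)/g);
console.log(usdAmounts); // => ["100"]

// Negative lookahead: whole words not directly followed by ".com".
const notBeforeCom = "test.com and example.org".match(/\b\w+\b(?!\.com)/g);
console.log(notBeforeCom); // => ["com", "and", "example", "org"]

// Positive lookbehind: digits preceded by a dollar sign.
const dollarAmounts = "Price: $50".match(/(?<=\$)\d+/g);
console.log(dollarAmounts); // => ["50"]

// Negative lookbehind: "happy" not preceded by "un".
const plainHappy = "happy and unhappy".match(/(?<!un)happy/g);
console.log(plainHappy); // => ["happy"]
```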
Common Patterns
Email (simplified): [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
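IPv4 address (format only, octets are not range-checked, so 999.999.999.999 also matches): \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
ISO date (YYYY-MM-DD, format only, so 2026-13-99 also matches): \d{4}-\d{2}-\d{2}
Hex color code (loose, any length from 3 to 8 hex digits): #[0-9a-fA-F]{3,8}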
URL (simplified): https?://[^\s]+
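These patterns check the shape of the input, not its validity. The JavaScript sketch below illustrates that with two of them; the anchored ISO date variant (^...$) and all test inputs are assumptions added for the demonstration.

```javascript
const ipv4 = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/;

const validAddress = ipv4.test("192.168.0.1");       // => true
const outOfRange = ipv4.test("999.999.999.999");     // => true (digit counts only, no 0-255 check)
const tooFewParts = ipv4.test("192.168.0");          // => false

// Anchoring with ^ and $ (an added variant) forces the whole string to match.
const isoDateWhole = /^\d{4}-\d{2}-\d{2}$/;
const plainDate = isoDateWhole.test("2026-03-18");       // => true
const impossibleDate = isoDateWhole.test("2026-13-99");  // => true (shape only)
const dateInSentence = isoDateWhole.test("on 2026-03-18"); // => false

console.log(validAddress, outOfRange, tooFewParts, plainDate, impossibleDate, dateInSentence);
```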
JavaScript Functions
JavaScript offers several ways to apply a regex to a string. Two of the most common are test(), a method of the regex, and match(), a method of the string.
test()
Returns true or false - use it when you only need to know whether a pattern matches.
const pattern = /\d{3}/;
pattern.test("abc 123"); // true
pattern.test("no digits"); // false
match()
Returns the matched substrings (or null) - use it when you need to extract data from the string.
Without the g flag, match() returns the first match plus captured groups:
const result = "2026-03-18".match(/(\d{4})-(\d{2})-(\d{2})/);
// result[0] => "2026-03-18" (full match)
// result[1] => "2026" (group 1)
// result[2] => "03" (group 2)
// result[3] => "18" (group 3)
With the g flag, match() returns all matches but no captured groups:
"cat bat sat".match(/[a-z]at/g);
// => ["cat", "bat", "sat"]
If nothing matches, match() returns null and not an empty array:
"hello".match(/\d+/); // null
When to Use Which
| Goal | Function |
|---|---|
| Check if a pattern matches | test() |
| Extract the matched strings | match() |
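The table can be summed up in a short JavaScript sketch: test() for a yes/no check, and match() with a fallback for extraction, since match() returns null rather than an empty array when nothing matches. The || [] guard and the sample inputs are illustrative choices, not the only way to handle this.

```javascript
// Yes/no check: test() is enough.
const hasDigits = /\d+/.test("order 66"); // => true

// Extraction: guard against null so the result is always an array.
const foundNumbers = "order 66".match(/\d+/g) || [];
const noNumbers = "hello".match(/\d+/g) || [];
console.log(hasDigits, foundNumbers, noNumbers.length); // => true ["66"] 0

// Without the guard, reading a property of the null result throws a TypeError.
let threwTypeError = false;
try {
  "hello".match(/\d+/).length;
} catch (err) {
  threwTypeError = err instanceof TypeError;
}
console.log(threwTypeError); // => true
```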