Introduction to Regular Expressions — Common Patterns and Practical Examples
What Are Regular Expressions?
Regular expressions (regex) are a special notation for describing patterns in strings. They are used in virtually every situation that requires string processing, including text searching, replacement, and validation.
Nearly all programming languages support regular expressions, so once you learn them, you can apply them across JavaScript, Python, Java, PHP, Go, and more.
Basic Metacharacters
Regular expressions are composed of ordinary characters and metacharacters (characters with special meanings). Let's start with the basic metacharacters.
Character Classes
| Metacharacter | Meaning | Example | Matches |
|---|---|---|---|
. |
Any single character (except newline) | a.c |
abc, a1c, a-c |
\d |
Digit (0-9) | \d{3} |
123, 456 |
\D |
Non-digit | \D+ |
abc, --- |
\w |
Alphanumeric and underscore | \w+ |
hello_123 |
\W |
Non-word character | \W |
@, #, |
\s |
Whitespace | \s+ |
, \t, \n |
\S |
Non-whitespace | \S+ |
hello |
Anchors
Anchors match positions rather than characters themselves.
| Metacharacter | Meaning | Example | Description |
|---|---|---|---|
^ |
Start of line | ^Hello |
Matches "Hello" at the start |
$ |
End of line | end$ |
Matches "end" at the end |
\b |
Word boundary | \bcat\b |
Matches "cat", not "catch" |
Character Sets
Square brackets [] let you define custom character classes.
[abc] # Any of a, b, or c
[a-z] # Any lowercase letter
[A-Za-z] # Any letter (upper or lower)
[0-9] # Any digit (equivalent to \d)
[^0-9] # Any non-digit (^ means negation)
[a-zA-Z0-9_] # Alphanumeric and underscore (equivalent to \w)
Quantifiers
Quantifiers specify how many times the preceding element should repeat.
| Quantifier | Meaning | Example | Matches |
|---|---|---|---|
* |
0 or more | ab*c |
ac, abc, abbc |
+ |
1 or more | ab+c |
abc, abbc |
? |
0 or 1 | colou?r |
color, colour |
{n} |
Exactly n | \d{4} |
2026 |
{n,} |
n or more | \d{2,} |
12, 123, 1234 |
{n,m} |
Between n and m | \d{2,4} |
12, 123, 1234 |
Greedy vs. Lazy Matching
By default, quantifiers are "greedy" — they match as many characters as possible. Adding ? after a quantifier makes it "lazy," matching as few as possible.
# Input: <div>hello</div>
<.*> # Greedy: matches <div>hello</div> entirely
<.*?> # Lazy: matches <div> only
Lazy matching is particularly useful when extracting HTML tags or quoted strings.
Grouping and Capturing
Parentheses () group patterns together and capture (extract) the matched portions.
Basic Grouping
(abc)+ # One or more repetitions of "abc"
(red|blue) # "red" or "blue"
Using Capture Groups
const pattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2026-03-08".match(pattern);
console.log(match[1]); // "2026" (year)
console.log(match[2]); // "03" (month)
console.log(match[3]); // "08" (day)
Named Capture Groups
The (?<name>...) syntax lets you assign names to groups, greatly improving readability.
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-03-08".match(pattern);
console.log(match.groups.year); // "2026"
console.log(match.groups.month); // "03"
console.log(match.groups.day); // "08"
Non-Capturing Groups
When you don't need to save the match result, (?:...) creates a non-capturing group. This offers a slight performance advantage.
(?:http|https):// # Groups the protocol part without capturing it
Lookahead and Lookbehind
Lookaheads and lookbehinds specify conditions about surrounding text without including it in the match result.
| Syntax | Name | Description |
|---|---|---|
(?=...) |
Positive lookahead | Position followed by the pattern |
(?!...) |
Negative lookahead | Position not followed by the pattern |
(?<=...) |
Positive lookbehind | Position preceded by the pattern |
(?<!...) |
Negative lookbehind | Position not preceded by the pattern |
# Match digits followed by "px" (without including "px" in the match)
\d+(?=px)
# Input: "100px 200em 300px"
# Matches: "100", "300"
# Match digits preceded by "$"
(?<=\$)\d+
# Input: "$100 200 $300"
# Matches: "100", "300"
Practical Pattern Examples
Email Address Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This pattern validates common email address formats. However, full RFC 5321 compliance would require a much more complex pattern. In practice, a simple format check combined with a confirmation email is the recommended approach.
Phone Number (US Format)
^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
This pattern handles various US phone number formats with or without country code, parentheses, and different separators.
URL Validation
^https?:\/\/[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+([\/\w.-]*)*\/?$
ZIP Code (US)
^\d{5}(-\d{4})?$
Matches both "12345" and "12345-6789" formats.
Password Strength Check
Validates that a password is at least 8 characters long and contains at least one uppercase letter, one lowercase letter, and one digit.
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d@$!%*?&]{8,}$
This pattern uses three positive lookaheads to independently check for the presence of each character type.
Regular Expression Performance
Regular expressions are powerful, but poorly written patterns can cause performance issues.
Patterns to Avoid
- Catastrophic backtracking: Nested quantifiers like
(a+)+bcan cause exponentially increasing processing time with certain inputs - Unnecessary capturing: Use
(?:...)for groups where capturing is not needed - Overly complex patterns: Instead of handling everything in a single regex, split the logic into multiple steps
Recommendations
- Use regex visualization and debugging tools
- Start with simple patterns and add conditions incrementally
- Test with real input data and cover edge cases
Conclusion
Regular expressions are a versatile tool for text processing. By mastering metacharacters, quantifiers, grouping, and lookaheads, you can handle most practical challenges. While they may seem complex at first, learning the most commonly used patterns gradually will build your confidence. Using a regex tester to verify pattern behavior in real-time is the most efficient way to learn.
