Skip to content

Syntax Fundamentals

Every character in a regex is either a literal (matches itself) or a metacharacter (has special meaning).

Only 12 metacharacters need escaping outside character classes:

\ ^ $ . | ? * + ( ) [ ] { }

Literals and Escaping

Most characters match themselves. To use a metacharacter literally, prefix with \.

hello\.world    # matches "hello.world" literally
                 # unescaped . would match "helloXworld" too

The Dot .

Matches any single character except a newline (unless the s / DOTALL flag is set).

Most Overused Token

The dot is the most overused token in regex. .* is greedy, slow on backtracking, and matches far more than you usually intend.
Use [^,]* or a specific character class wherever possible.


Escape Sequences

Sequence Matches Example
\n Newline (LF) "line1\nline2"
\r Carriage return Windows line endings
\t Tab TSV fields
\f Form feed Page breaks
\v Vertical tab Old format characters
\0 Null byte Binary data
\xHH Hex codepoint \x41A
\uHHHH Unicode (4-hex) \u00e9é
\u{HHHHH} Unicode (ES6 / u flag) \u{1F600} → 😀
\a Bell (PCRE) Terminal bell
\e Escape char (PCRE) ANSI escape sequences

The Structure of a Regex

Pattern:   /^(\d{1,3}\.){3}\d{1,3}$/i
            │  └──────────┘    └──────┘ └ flags
            │  group + quant   literal
            anchor

Every regex is a sequence of:

  • Atoms — a character, class, or group
  • Quantifiers — how many times an atom repeats
  • Anchors — zero-width position assertions
  • Alternation — the | OR operator between sequences

Syntax Across Flavors

Most engines share PCRE-ish syntax. Key divergences:

Feature JS Python Go PCRE
Named groups (?<name>…) (?P<name>…) (?P<name>…) (?<name>…)
Named backref \k<name> (?P=name) N/A \k<name>
Atomic group (?>…)
Lookbehind ES2018+
Unicode props /u flag \p{} via regex \p{}