NFA vs. DFA Engines¶

The two main engine architectures have fundamentally different performance characteristics and feature sets.

Comparison Table¶

Attribute	NFA (Backtracking)	DFA (Thompson/RE2)
Algorithm	Depth-first search with backtracking	Simulates all states in parallel
Worst case	O(2^n) — exponential	O(n) — linear in string length
Backreferences	✓ Supported	✗ Not supported
Lookarounds	✓ Supported	✗ Not supported (or limited)
Full captures	✓	Limited (leftmost-longest)
Match semantics	Leftmost, first-match	Leftmost-longest (POSIX)
Used by	PCRE, JS, Python, Java, .NET, Ruby	RE2 (Go), Rust regex, Hyperscan
ReDoS vulnerable	Yes	No — by design

How NFA Works¶

The NFA (Non-deterministic Finite Automaton) engine explores match paths depth-first. When it hits an ambiguity (quantifier, alternation), it chooses one path and backtracks if it fails.

Pattern: (a|ab)c
Input:   "abc"

Path 1: match 'a', try 'c' at 'b' → fail
Backtrack: try 'ab', try 'c' at 'c' → success
Result: "abc"

NFA engines are feature-rich (backreferences, lookarounds, atomic groups) but can exhibit exponential behavior on adversarial input.

How DFA / RE2 Works¶

The DFA (Deterministic Finite Automaton) engine — or Thompson NFA simulation — processes the input left to right, one character at a time, tracking all possible states simultaneously.

Pattern: (a|ab)c
Input:   "abc"

At each position, ALL possible states are tracked in parallel.
No backtracking — the engine never re-reads a character.
Time: O(n) regardless of pattern complexity.

The cost: no backreferences, no lookarounds. Features that require "remembering" what was matched at a specific point are fundamentally incompatible with the DFA model.

POSIX vs. Perl Semantics¶

Pattern: (foo|foobar) against "foobar"

Perl/PCRE/JS → "foo"    (first alternative wins — leftmost)
POSIX        → "foobar" (longest overall match — leftmost-longest)

Most modern tools (grep, sed, awk in POSIX mode) use leftmost-longest semantics. Most programming language regex libraries use Perl/PCRE semantics.

Go's RE2 Engine¶

Go's regexp package uses RE2, which guarantees O(n) time. The tradeoff: no backreferences, no lookaheads/lookbehinds.

// Will compile only if pattern is RE2-safe
r, err := regexp.Compile(`(?P<year>\d{4})-(?P<month>\d{2})`)
// Pattern with backreference → compile error
r, err = regexp.Compile(`(\w+)\s+\1`)    // error: invalid backreference

For security-sensitive applications processing untrusted input, RE2/linear engines are the right choice.

Rust's `regex` Crate¶

Also O(n) — uses NFA simulation (no backtracking). Same limitations as RE2: no backreferences, no lookarounds. Extremely fast in practice.

Hybrid Engines¶

Some engines detect patterns that can be compiled to a DFA and use the fast path, falling back to NFA for patterns requiring backtracking features. .NET's regex engine, Oniguruma, and some PCRE2 configurations do this.

Choosing an Engine¶

Situation	Recommended engine
Untrusted input, must not hang	RE2 / Go / Rust
Need backreferences	PCRE, Java, .NET
Need lookarounds	PCRE, Python, Java, .NET, JS
High throughput, simple patterns	RE2, Hyperscan
Complex patterns, trusted input	PCRE, Java