Skip to content

NFA vs. DFA Engines

The two main engine architectures have fundamentally different performance characteristics and feature sets.


Comparison Table

Attribute NFA (Backtracking) DFA (Thompson/RE2)
Algorithm Depth-first search with backtracking Simulates all states in parallel
Worst case O(2^n) — exponential O(n) — linear in string length
Backreferences ✓ Supported ✗ Not supported
Lookarounds ✓ Supported ✗ Not supported (or limited)
Full captures Limited (leftmost-longest)
Match semantics Leftmost, first-match Leftmost-longest (POSIX)
Used by PCRE, JS, Python, Java, .NET, Ruby RE2 (Go), Rust regex, Hyperscan
ReDoS vulnerable Yes No — by design

How NFA Works

The NFA (Non-deterministic Finite Automaton) engine explores match paths depth-first. When it hits an ambiguity (quantifier, alternation), it chooses one path and backtracks if it fails.

Pattern: (a|ab)c
Input:   "abc"

Path 1: match 'a', try 'c' at 'b' → fail
Backtrack: try 'ab', try 'c' at 'c' → success
Result: "abc"

NFA engines are feature-rich (backreferences, lookarounds, atomic groups) but can exhibit exponential behavior on adversarial input.


How DFA / RE2 Works

The DFA (Deterministic Finite Automaton) engine — or Thompson NFA simulation — processes the input left to right, one character at a time, tracking all possible states simultaneously.

Pattern: (a|ab)c
Input:   "abc"

At each position, ALL possible states are tracked in parallel.
No backtracking — the engine never re-reads a character.
Time: O(n) regardless of pattern complexity.

The cost: no backreferences, no lookarounds. Features that require "remembering" what was matched at a specific point are fundamentally incompatible with the DFA model.


POSIX vs. Perl Semantics

Pattern: (foo|foobar) against "foobar"

Perl/PCRE/JS → "foo"    (first alternative wins — leftmost)
POSIX        → "foobar" (longest overall match — leftmost-longest)

Most modern tools (grep, sed, awk in POSIX mode) use leftmost-longest semantics. Most programming language regex libraries use Perl/PCRE semantics.


Go's RE2 Engine

Go's regexp package uses RE2, which guarantees O(n) time. The tradeoff: no backreferences, no lookaheads/lookbehinds.

// Will compile only if pattern is RE2-safe
r, err := regexp.Compile(`(?P<year>\d{4})-(?P<month>\d{2})`)
// Pattern with backreference → compile error
r, err = regexp.Compile(`(\w+)\s+\1`)    // error: invalid backreference

For security-sensitive applications processing untrusted input, RE2/linear engines are the right choice.


Rust's regex Crate

Also O(n) — uses NFA simulation (no backtracking). Same limitations as RE2: no backreferences, no lookarounds. Extremely fast in practice.


Hybrid Engines

Some engines detect patterns that can be compiled to a DFA and use the fast path, falling back to NFA for patterns requiring backtracking features. .NET's regex engine, Oniguruma, and some PCRE2 configurations do this.


Choosing an Engine

Situation Recommended engine
Untrusted input, must not hang RE2 / Go / Rust
Need backreferences PCRE, Java, .NET
Need lookarounds PCRE, Python, Java, .NET, JS
High throughput, simple patterns RE2, Hyperscan
Complex patterns, trusted input PCRE, Java