An experiment in implementing static checks of regular expressions in Typescript using Brzozowski derivatives.
This is also an experiment in what an API for what a RegEx-validated string type might look like. There has already been discussion of the use cases and potential API in the Typescript RegEx-validated string types issue.
This allows you to make assertions about string constants:
import type { Compile, Exec, Recognize } from "brzozowski-ts/src";
type HexStrRE = Compile<"(?<hex>[0-9A-Fa-f]{5,})">;
type Match = Exec<HexStrRE, "abc123">;
const captures: Match["captures"] = ["abc123"];
const groups: Match["groups"] = { hex: "abc123" };
type HexStr<S extends string> = Recognize<HexStrRE, S>;
type NominalHex = string & { readonly isHex: unique symbol };
const castSpell = <S extends string>(hex: HexStr<S> | NominalHex) => hex;
const spell = castSpell("00dead00" as const); // ok!
const spellCheck: typeof spell = "00dead00"; // ok!
// @ts-expect-error
castSpell("xyz");
let dynamicHex: string = "a5df0";
castSpell(dynamicHex as NominalHex); // ok!
RecognizePattern<RE, Str>
uses a limited and naively-implemented regular expression language:- no lookaround:
- no positive or negative lookahead:
(?=no)
(?!nope)
- no positive or negative lookbehind:
(?<=no)
(?<!nope)
- no positive or negative lookahead:
- matches always start from the start of the string:
/^always/
- string recognition is implemented as a series of potentially-nested commands rather than state transitions within a finite automaton.
- no flags:
- no case-insensitive matching:
/NOPE/i
- no multiline mode:
nope$
- no case-insensitive matching:
- no lookaround:
- Using these types likely slows down builds
This is pre-alpha software: the API will change without warning, the implementation is brittle and incomplete, and none of this code has been optimized for memory usage or speed.
If you're brave, you can:
pnpm add -D "git+https://github.com/skalt/brzozowski-ts.git"
and import the types like
import type { Compile, Exec } from "brzozowski-ts/src";
Compile-time parsing follows this general algorithm:
- Given a constant string type
S
and a regular expression string typeR
- take the derivative of
R
with respect the start ofS
to produce a shorter regular expressionr
and a shorter strings
- recur using
s
andr
- when
r
is empty, the entire regular expression has been matched. - if
s
is empty andr
is not, the expression has not been matched
These DFA-based pure-type RegEx implementations were an inspiration! brzozowski_ts
adds the ability to compile regular expressions, but uses a naive backtracking algorithm based on Brzozowski derivatives rather than DFAs.