This is a regular expression library. It is based on a decoupling of the surface syntax used to specify regular expressions (the front end) from the engine that implements the matcher (the back end). An abstract syntax is used to communicate between the front end and the back end of the system.
Given a structure S1 describing a surface syntax and a structure S2 describing a matching engine, a regular expression package can be defined by applying the functor RegExpFn like so:
RegExpFn (structure P=S1 structure E=S2)
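As a concrete sketch, assuming the AwkSyntax front end and the BackTrackEngine back end that ship with the SML/NJ Library (these two structure names, and the name RE, are not taken from the text above), the functor application might look like this:

```sml
(* Assumes the SML/NJ regexp library has been loaded, for example with
 *   CM.make "$/regexp-lib.cm";
 * AwkSyntax and BackTrackEngine are assumed to come from that library. *)
structure RE = RegExpFn (structure P = AwkSyntax
                         structure E = BackTrackEngine)
```

Swapping in a different engine structure for E should leave code written against RE unchanged, which is the point of the front end / back end split.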
To match a regular expression, one first needs to compile a representation in the surface syntax. The type of a compiled regular expression is given in the REGEXP signature as:
type regexp
Once a regular expression has been compiled, three functions are provided to perform the matching: find, prefix, and match. These functions operate on readers as defined in the StringCvt structure of the Basis Library. A reader of type ('a,'b) reader is a function 'b -> ('a * 'b) option taking a stream of type 'b and returning an element of type 'a and the remainder of the stream, or NONE if the end of the stream is reached.
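To make the reader idea concrete, here is a small sketch that uses only the Basis Library. The names listReader, getc, first, and atEnd are illustrative and are not part of the regular expression library.

```sml
(* A hand-written reader over a list of characters: it returns the head
 * of the list together with the tail, or NONE when the list is empty. *)
fun listReader ([] : char list) = NONE
  | listReader (c :: rest) = SOME (c, rest)

(* Substring.getc from the Basis Library is already a character reader
 * whose stream type is substring. *)
val getc : (char, substring) StringCvt.reader = Substring.getc

(* Reading one character yields that character and the remaining stream. *)
val first = getc (Substring.full "abc")

(* Reading from an exhausted stream yields NONE. *)
val atEnd = getc (Substring.full "")
```

Any function of this shape can serve as the stream argument to the matching functions, so the same compiled regular expression can be run over strings, lists, or input streams.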
The function find returns a reader that searches a stream and attempts to match the given regular expression. The function prefix returns a reader that attempts to match the regular expression at the current position in the stream. The function match takes a list of regular expressions and functions and returns a reader that attempts to match one of the regular expressions at the current position in the stream. The function corresponding to the matched regular expression is invoked on the matching information.
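The difference between find and prefix can be sketched as follows. This assumes the SML/NJ Library's AwkSyntax and BackTrackEngine structures and the compileString function of the REGEXP signature as found in current SML/NJ releases. The names RE, digits, input, found, and atStart are illustrative. The match function is left out here because the form of its argument list may differ between library versions.

```sml
(* Assumes the SML/NJ regexp library is loaded (e.g. "$/regexp-lib.cm"). *)
structure RE = RegExpFn (structure P = AwkSyntax
                         structure E = BackTrackEngine)

(* Compile a regular expression written in the AWK-style surface syntax. *)
val digits = RE.compileString "[0-9]+"

(* The stream is a substring, read with the Basis reader Substring.getc. *)
val input = Substring.full "abc 123"

(* find scans forward through the stream until a match is found. *)
val found = isSome (RE.find digits Substring.getc input)

(* prefix only tries the current position, where the input starts with
 * a letter rather than a digit. *)
val atStart = isSome (RE.prefix digits Substring.getc input)
```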
Once a match is found, it is returned as a value of the match_tree datatype. This is a hierarchical structure describing the matches of the various subexpressions appearing in the matched regular expression. A match for an expression is a record containing the position of the match and its length. The root of the structure always describes the outermost match (the whole string matched by the regular expression).
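A sketch of reading a match tree follows. It assumes the MatchTree structure of current SML/NJ releases, where MatchTree.root returns the outermost match, MatchTree.nth (tree, i) returns the i-th match in preorder, and each node holds an optional {pos, len} record; older releases may store the record directly. The names RE, re, text, and fields are illustrative.

```sml
(* Assumes the SML/NJ regexp library is loaded (e.g. "$/regexp-lib.cm"). *)
structure RE = RegExpFn (structure P = AwkSyntax
                         structure E = BackTrackEngine)

(* Two parenthesized subexpressions: a name and a number. *)
val re = RE.compileString "([a-z]+)=([0-9]+)"

(* Turn a match record into the matched text. pos is the stream at the
 * start of the match, so the first len characters are the match itself. *)
fun text (SOME {pos, len}) =
      Substring.string (Substring.slice (pos, 0, SOME len))
  | text NONE = ""

(* The root describes the whole match; nodes 1 and 2 describe the two
 * parenthesized subexpressions. *)
val fields =
  case RE.find re Substring.getc (Substring.full "x: key=42;") of
      SOME (tree, _) =>
        SOME (text (MatchTree.root tree),
              text (MatchTree.nth (tree, 1)),
              text (MatchTree.nth (tree, 2)))
    | NONE => NONE
```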
Last Modified June 1, 1998
Comments to John Reppy
Copyright © 1998 Bell Labs, Lucent Technologies