TRE API reference manual

The regcomp() functions

#include <tre/regex.h>

int regcomp(regex_t *preg, const char *regex, int cflags);
int regncomp(regex_t *preg, const char *regex, size_t len, int cflags);
int regwcomp(regex_t *preg, const wchar_t *regex, int cflags);
int regwncomp(regex_t *preg, const wchar_t *regex, size_t len, int cflags);

The regcomp() function compiles the regex string pointed to by regex to an internal representation and stores the result in the pattern buffer structure pointed to by preg. The regncomp() function is like regcomp(), but regex is not terminated with the null byte. Instead, the len argument is used to give the length of the string, and the string may contain null bytes. The regwcomp() and regwncomp() functions work like regcomp() and regncomp(), respectively, but take a wide character (wchar_t) string instead of a byte string.

The cflags argument is the bitwise inclusive OR of zero or more of the following flags (defined in the header <tre/regex.h>):

REG_EXTENDED
Use POSIX Extended Regular Expression (ERE) compatible syntax when compiling regex. The default syntax is the POSIX Basic Regular Expression (BRE) syntax, but it is considered obsolete.
REG_ICASE
Ignore case. Subsequent searches with the regexec family of functions using this pattern buffer will be case insensitive.
REG_NOSUB
Do not report submatches. Subsequent searches with the regexec family of functions will only report whether a match was found or not and will not fill the submatch array.
REG_NEWLINE
Normally the newline character is treated as an ordinary character. When this flag is used, the newline character ('\n', ASCII code 10) is treated specially as follows:
  1. The match-any-character operator (dot "." outside a bracket expression) does not match a newline.
  2. A non-matching list ([^...]) not containing a newline does not match a newline.
  3. The match-beginning-of-line operator ^ matches the empty string immediately after a newline as well as the empty string at the beginning of the string (but see the REG_NOTBOL regexec() flag below).
  4. The match-end-of-line operator $ matches the empty string immediately before a newline as well as the empty string at the end of the string (but see the REG_NOTEOL regexec() flag below).
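The effect of the compile flags can be checked with a small helper. The following is a minimal sketch, not part of TRE: it assumes the POSIX <regex.h> header so that it builds without TRE installed (with TRE, the synopsis above suggests including <tre/regex.h> instead). The helper name regex_matches() is hypothetical, and regfree() is the standard POSIX cleanup call, which is not described in this section.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches pattern, 0 if it does
 * not, and -1 if the pattern fails to compile. */
int regex_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed here. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re); /* POSIX call that releases the pattern buffer */
    return rc == 0 ? 1 : 0;
}
```

For example, regex_matches("^b", "a\nb", REG_EXTENDED) is expected to report no match, while adding REG_NEWLINE to the flags lets ^ match after the newline (rule 3 above).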

The regex_t structure has the following fields that the application can read:

size_t re_nsub
Number of parenthesized subexpressions in regex.
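As an illustration of reading re_nsub after a successful compilation, here is a minimal sketch. It assumes the POSIX <regex.h> header (with TRE, <tre/regex.h> as in the synopsis above); the function name count_subexpressions() is hypothetical, and regfree() is the standard POSIX cleanup call rather than something documented in this section.

```c
#include <regex.h>

/* Hypothetical helper: returns the number of parenthesized
 * subexpressions in an ERE pattern, or -1 if it does not compile. */
long count_subexpressions(const char *pattern)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    n = (long)re.re_nsub; /* filled in by regcomp() */
    regfree(&re);
    return n;
}
```

A caller might use this value to decide how many regmatch_t elements to allocate before calling one of the regexec functions.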

The regcomp() functions return zero if the compilation was successful, or one of the following error codes if there was an error:

REG_BADPAT
Invalid regexp. TRE returns this only if a multibyte character set is used in the current locale, and regex contained an invalid multibyte sequence.
REG_ECOLLATE
Invalid collating element referenced. TRE returns this whenever equivalence classes or multicharacter collating elements are used in bracket expressions (they are not supported yet).
REG_ECTYPE
Unknown character class name in [[:name:]].
REG_EESCAPE
The last character of regex was a backslash (\).
REG_ESUBREG
Invalid back reference; number in \digit invalid.
REG_EBRACK
[] imbalance.
REG_EPAREN
\(\) or () imbalance.
REG_EBRACE
\{\} or {} imbalance.
REG_BADBR
{} content invalid: not a number, more than two numbers, first larger than second, or number too large.
REG_ERANGE
Invalid character range, e.g. ending point is earlier in the collating order than the starting point.
REG_ESPACE
Out of memory.
REG_BADRPT
Invalid use of repetition operator. TRE never returns this.
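A caller will usually want to turn a nonzero return value into something readable. The sketch below maps the codes listed above to their names; it assumes the POSIX <regex.h> header (with TRE, <tre/regex.h>), and both try_compile() and regcomp_error_name() are hypothetical helper names, not TRE functions. The exact code returned for a given malformed pattern may differ between implementations.

```c
#include <regex.h>

/* Hypothetical helper: compiles pattern and returns the regcomp()
 * result, releasing the pattern buffer if compilation succeeded. */
int try_compile(const char *pattern, int cflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc == 0)
        regfree(&re); /* standard POSIX cleanup call */
    return rc;
}

/* Hypothetical helper: maps a regcomp() return value to its name. */
const char *regcomp_error_name(int code)
{
    switch (code) {
    case 0:            return "success";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_BADBR:    return "REG_BADBR";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_BADRPT:   return "REG_BADRPT";
    default:           return "unknown";
    }
}
```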

The regexec() functions

#include <tre/regex.h>

int regexec(const regex_t *preg, const char *string, size_t nmatch,
            regmatch_t pmatch[], int eflags);
int regnexec(const regex_t *preg, const char *string, size_t len,
             size_t nmatch, regmatch_t pmatch[], int eflags);
int regwexec(const regex_t *preg, const wchar_t *string, size_t nmatch,
             regmatch_t pmatch[], int eflags);
int regwnexec(const regex_t *preg, const wchar_t *string, size_t len,
              size_t nmatch, regmatch_t pmatch[], int eflags);

The regexec() function matches the null-terminated string against the compiled regexp preg, initialized by a previous call to any one of the regcomp functions. The regnexec() function is like regexec(), but string is not terminated with a null byte. Instead, the len argument is used to give the length of the string, and the string may contain null bytes. The regwexec() and regwnexec() functions work like regexec() and regnexec(), respectively, but take a wide character (wchar_t) string instead of a byte string. The eflags argument is a bitwise OR of zero or more of the following flags:

REG_NOTBOL

When this flag is used, the match-beginning-of-line operator ^ does not match the empty string at the beginning of string. If REG_NEWLINE was used when compiling preg, the empty string immediately after a newline character will still be matched.

REG_NOTEOL

When this flag is used, the match-end-of-line operator $ does not match the empty string at the end of string. If REG_NEWLINE was used when compiling preg, the empty string immediately before a newline character will still be matched.

These flags are useful when different portions of a string are passed to regexec and the beginning or end of the partial string should not be interpreted as the beginning or end of a line.
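The typical case is a loop that finds successive matches by calling regexec() on the remainder of the string. The following is a minimal sketch of that idea, assuming the POSIX <regex.h> header (with TRE, <tre/regex.h>); count_matches() and its use_notbol parameter are hypothetical names chosen for illustration, and regfree() is the standard POSIX cleanup call.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts non-overlapping matches of an ERE in text.
 * When use_notbol is nonzero, every search after the first passes
 * REG_NOTBOL, because the remainder does not start at a line beginning.
 * Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *text, int use_notbol)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = text;
    int eflags = 0;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, m, eflags) == 0) {
        count++;
        if (m[0].rm_eo > m[0].rm_so) {
            p += m[0].rm_eo;          /* continue after the match */
        } else {
            if (p[m[0].rm_eo] == '\0')
                break;                /* empty match at end of string */
            p += m[0].rm_eo + 1;      /* step over one byte to make progress */
        }
        if (use_notbol)
            eflags = REG_NOTBOL;      /* p is now in the middle of the line */
    }

    regfree(&re);
    return count;
}
```

With the pattern "^a" and the text "aaa", the loop is expected to find one match when use_notbol is set and three when it is not, since without the flag every remainder looks like the start of a line.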

If REG_NOSUB was used when compiling preg, nmatch is zero, or pmatch is NULL, then the pmatch argument is ignored. Otherwise, the submatches corresponding to the parenthesized subexpressions are filled in the elements of pmatch, which must be dimensioned to have at least nmatch elements.

The regmatch_t structure contains at least the following fields:

regoff_t rm_so
Byte offset from start of string to start of substring.
regoff_t rm_eo
Byte offset from start of string to the first character after the substring.

The length of a submatch in bytes can be computed by subtracting rm_so from rm_eo. If a parenthesized subexpression did not participate in a match, the rm_so and rm_eo fields for the corresponding pmatch element are set to -1. When a multibyte character set is in effect, the submatch offsets are given as byte offsets, not character offsets.

The regexec() functions return zero if a match was found, otherwise they return REG_NOMATCH to indicate no match, or REG_ESPACE to indicate that enough temporary memory could not be allocated to complete the matching operation.
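To show how the pmatch offsets are used in practice, here is a minimal sketch that copies one submatch into a caller-supplied buffer. It assumes the POSIX <regex.h> header (with TRE, <tre/regex.h>) and the POSIX convention that pmatch[0] describes the whole match and pmatch[i] the i-th parenthesized subexpression. The names copy_submatch() and MAX_SUBMATCHES are hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBMATCHES 10 /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: copies submatch number `group` of the first match
 * of an ERE in text into out. Returns the submatch length in bytes, or
 * -1 on compile error, no match, non-participating group, or if out is
 * too small. */
long copy_submatch(const char *pattern, const char *text, size_t group,
                   char *out, size_t outsize)
{
    regex_t re;
    regmatch_t pm[MAX_SUBMATCHES];
    long len = -1;

    if (group >= MAX_SUBMATCHES || outsize == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, MAX_SUBMATCHES, pm, 0) == 0
        && pm[group].rm_so != -1) {
        /* Length in bytes is rm_eo minus rm_so. */
        size_t n = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (n < outsize) {
            memcpy(out, text + pm[group].rm_so, n);
            out[n] = '\0';
            len = (long)n;
        }
    }

    regfree(&re); /* standard POSIX cleanup call */
    return len;
}
```

For the pattern "([a-z]+)=([0-9]+)" and the text "set width=120 now", group 1 should yield "width" and group 2 should yield "120".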

The approximate matching functions

#include <tre/regex.h>

typedef struct {
  int cost_ins;
  int cost_del;
  int cost_subst;
  int max_cost;
} regaparams_t;

typedef struct {
  size_t nmatch;
  regmatch_t *pmatch;
  int cost;
} regamatch_t;

int regaexec(const regex_t *preg, const char *string,
             regamatch_t *match, regaparams_t params, int eflags);
int reganexec(const regex_t *preg, const char *string, size_t len,
              regamatch_t *match, regaparams_t params, int eflags);
int regawexec(const regex_t *preg, const wchar_t *string,
              regamatch_t *match, regaparams_t params, int eflags);
int regawnexec(const regex_t *preg, const wchar_t *string, size_t len,
               regamatch_t *match, regaparams_t params, int eflags);

The regaexec() function searches for the best match in string against the compiled regexp preg, initialized by a previous call to any one of the regcomp functions.

The reganexec() function is like regaexec(), but string is not terminated by a null byte. Instead, the len argument is used to tell the length of the string, and the string may contain null bytes. The regawexec() and regawnexec() functions work like regaexec() and reganexec(), respectively, but take a wide character (wchar_t) string instead of a byte string.

The eflags argument has the same meaning as for the regexec() functions.

The params struct controls the approximate matching parameters:

int cost_ins
The cost of an inserted character, that is, an extra character in string.
int cost_del
The cost of a deleted character, that is, a character missing from string.
int cost_subst
The cost of a substituted character.
int max_cost
The maximum allowed cost of a match. If this is set to zero, an exact match is searched for, and the results are equivalent to those returned by the regexec() functions.

The match argument points to a regamatch_t structure. The nmatch and pmatch fields must be filled in by the caller. If REG_NOSUB was used when compiling the regexp, or match->nmatch is zero, or match->pmatch is NULL, the match->pmatch field is ignored. Otherwise, the submatches corresponding to the parenthesized subexpressions are filled in the elements of match->pmatch, which must be dimensioned to have at least match->nmatch elements. The match->cost field is set to the cost of the match found.

The regaexec() functions return zero if a match with a cost no greater than params.max_cost was found, otherwise they return REG_NOMATCH to indicate no match, or REG_ESPACE to indicate that enough temporary memory could not be allocated to complete the matching operation.
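To make the roles of cost_ins, cost_del and cost_subst concrete, the sketch below computes the cheapest way to turn a literal pattern into a given string using those three costs. This is a plain weighted edit distance over whole strings, written only as an illustration; it is not TRE's matching algorithm, it does not handle regex operators or substring search, and the function name literal_match_cost() is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Hypothetical helper: weighted edit distance between a literal pattern
 * and the whole of string. An insertion is an extra character in string,
 * a deletion is a pattern character missing from string. Returns -1 if
 * memory cannot be allocated. */
int literal_match_cost(const char *pattern, const char *string,
                       int cost_ins, int cost_del, int cost_subst)
{
    size_t m = strlen(pattern), n = strlen(string);
    int *prev = malloc((n + 1) * sizeof *prev);
    int *curr = malloc((n + 1) * sizeof *curr);
    int result;

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    /* Empty pattern: every character of string is an insertion. */
    for (size_t j = 0; j <= n; j++)
        prev[j] = (int)j * cost_ins;

    for (size_t i = 1; i <= m; i++) {
        /* Empty string: every pattern character so far is a deletion. */
        curr[0] = (int)i * cost_del;
        for (size_t j = 1; j <= n; j++) {
            int same = pattern[i - 1] == string[j - 1];
            int subst = prev[j - 1] + (same ? 0 : cost_subst);
            int del = prev[j] + cost_del;
            int ins = curr[j - 1] + cost_ins;
            curr[j] = min3(subst, del, ins);
        }
        int *tmp = prev; /* the finished row becomes the previous row */
        prev = curr;
        curr = tmp;
    }

    result = prev[n];
    free(prev);
    free(curr);
    return result;
}
```

With all costs set to 1, the pattern "color" against the string "colour" costs 1 (one inserted character). Under this model, such a pair would be accepted with max_cost set to 1 and rejected with max_cost set to 0.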