ReDoS

Since the use of RegEx, including in filters for processing user input, is very common, performing a comparison of a dangerous regular expression against an externally controlled sequence can lead to server failure due to the long processing of the malicious request.

Any regular expression whose comparison with a special string can result in an exponential increase in the number of steps to complete the operation is a potential ReDoS sink. Such growth is a consequence of a catastrophic return.

One of the operating principles of RegEx engines is backtracking. Typically, it occurs when the string being compared does not match an expression consisting of successive quantified groups, with the goal of finding a match by covering all possible combinations with the algorithm. Thus, dangerous RegEx necessarily contain a quantified capture pattern or is unoptimized.

Selecting a line to provoke a catastrophic return involves using the RegEx debugger. Manual analysis of a regular expression depends on the quantified pattern principle. For example, with a vulnerable part of a regular expression that includes a greedy quantifier, to reproduce catastrophic backtracking, the step-by-step behavior of the RegEx engine must include a repeatedly repeated process: the capture of a long sequence by the algorithm and its subsequent multi-stage rollback.

For example:

\A[0-9a-zA-Z][0-9a-zA-Z_.-]*[0-9a-zA-Z\^]\.{2,3}[0-9a-zA-Z][0-9a-zA-Z_.-]*[0-9a-zA-Z\^]\z

The comparison of the patterned sequence w..w..w..w..w................... against the aforementioned RegEx causes a catastrophic backtracking. The The problem here lies in (1) the portion [0-9a-zA-Z\^]\z, which compels the algorithm to regress to the initial match and (2) the group [0-9a-zA-Z_.-], supported by the greedy quantifier *, freely spanning the rest of the line.

The exponential increase in the number of comparison steps when adding the same number of characters in this example is not so fast, but is sufficient to crash the server if the threshold for the number of characters allowed per input is high enough. For 800 characters, the engine will need over 130,000 steps to process the expression.

For comparative purposes, the PCRE2 engine, when coupled with the expression (a*)*b\z and comparing the string aaaaaaaaaaaaaa, consisting of 14 characters, against it, takes more than 160,000 steps. Here, the dependence of resource consumption on the number of characters in the compared string differs significantly from that in the first example: the expression contains a nesting of groups with identical capture targets, which greatly increases the number of possible combinations.