Catastrophic backtracking mitigation/prevention
Execution time limitation, recursion limitation, text engines
The simplest solution would be to introduce a limit on the execution time of the comparison operation, for example, .NET allows you to pass the function of setting a new object of the Regex class property MatchTimeout
:
AppDomain domain = AppDomain.CurrentDomain;
// Set a timeout interval of 2 seconds.
domain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromSeconds(2));
Object timeout = domain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT");
Console.WriteLine("Default regex match timeout: {0}", timeout == null ? "<null>" : timeout);
Regex rgx = new Regex("[aeiouy]");
Console.WriteLine("Regular expression pattern: {0}", rgx.ToString());
Console.WriteLine("Timeout interval for this regex: {0} seconds", rgx.MatchTimeout.TotalSeconds);
Furthermore, the Ruby Regexp class also affords the option of introducing a timeout
parameter when creating a regex object, as demonstrated below:
re = Regexp.new("(b|a+)*c", timeout: 4)
q = 'a' * 25 + 'd' + 'a' * 4 + 's' #=> "aaaaaaaaaaaaaaaaaaaaaaaaaadaaaas"
/(b|a+)*s/ =~ q #=> Regexp::TimeoutError
An additional strategy to lower server load is to introcuce a recursion limit. PCRE engines provide the capability to restrict recursion during comparisons through the (*LIMIT_RECURSION=<LIMIT_SET>)
modifier, with LIMIT_SET
signifying the number of recursive iterations.
Moreover, the use of text-based engines fundamentally avoids backtracking due to their linear iteration principle and reduced complexity. Such engines have a finite set of capture patterns available for use within an expression, always return a single and the largest match, and yield lower performance compared to classical RegEx-directed engines. Patterns such as Lazy/Possessive quantifiers, atomic grouping, and backreferences are inaccessible for processing by text-based engines. This principle is infrequently employed and is more commonly integrated into hybrid engines with dynamic paradigm shift in match selection.
Possessive quantifiers, atomic grouping, secure context
The recommended method would be to change the expression being compared. If a catastrophic backtracking condition occurs in the case of nested quantified patterns, it is necessary to reduce the expression to the form of mutually exclusive conditions so that the engine does not end up in a state of a long recursive iteration. Possessive quantifiers/atomic groups do not backtrack after an unsuccessful match. This is generaly used to improve performance.
initial_ex = /(x+)+y/
possessive_ex = /(x+)++y/
atomic_ex = /(?>x+)+y/
Depending on the expression's intended purpose, it may make sense to embed it within a secure context, ensuring it meets the expression's requirements. For instance, the expression ((([a-z]{0,254}[a-z])((?<!\.s)))\.){0,20}(([a-z]{0,254}[a-z]))(&)
can be securely evaluated by appending \A
before the problematic regex, as follows:
ruby initial_ex = /((([a-z]{0,254}[a-z])((?<!\.s)))\.){0,20}(([a-z]{0,254}[a-z]))(&)/ safe_context_ex = /\A#{initial_ex}/
Another example can be found here: ???