K 10
svn:author
V 6
kevans
K 8
svn:date
V 27
2020-12-30T01:14:06.720961Z
K 7
svn:log
V 1918
regex(3): Interpret many escaped ordinary characters as EESCAPE

MFC NOTE: This only merged the infrastructure back; the new regcomp symbol that actually interprets these as EESCAPE was *dropped*. This is purely to make future commits for libregex easier to merge back so that we can choose to use it.

In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows for any character to be escaped, but "ORD_CHAR preceded by an unescaped <backslash> character [gives undefined results]".

Historically, we've interpreted an escaped ordinary character as the ordinary character itself. This becomes problematic when some extensions give special meanings to an otherwise ordinary character (e.g. GNU's \b, \s, \w), meaning we may have two different valid interpretations of the same sequence.

To make this easier to deal with and given that the standard calls this undefined, we should throw an error (EESCAPE) if we run into this scenario to ease transition into a state where some escaped ordinaries are blessed with a special meaning -- it will either error out or have extended behavior, rather than have two entirely different versions of undefined behavior that leave the consumer of regex(3) guessing as to what behavior will be used or leaving them with false impressions.

This change bumps the symbol version of regcomp to FBSD_1.6 and provides the old escape semantics for legacy applications, just in case one has an older application that would immediately turn into a pumpkin because of an extraneous escape that's embedded or otherwise critical to its operation.

This is the final piece needed before enhancing libregex with GNU extensions and flipping the switch on bsdgrep.

[1] http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/

(cherry picked from commit adeebf4cd47c3e85155d92f386bda5e519b75ab2)

Git Hash: 70233fc21258ab4347fd07401297d3d409a9c4a8
Git Author: kevans@FreeBSD.org
END
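The escape policy in the log message can be sketched as a small classifier for a backslash followed by a character `c`. This is a hypothetical illustration, not the libregex code. The names `classify_escape` and `is_blessed_extension`, as well as the `legacy` and `extensions` flags, are invented here. Treating exactly the alphabetic characters as the rejected "ordinary" escapes is an assumption chosen to fit the GNU examples (`\b`, `\s`, `\w`); the real rule may differ.

```c
#include <ctype.h>
#include <stdbool.h>

/* Possible outcomes for "\c" (hypothetical names, not libregex's). */
enum esc_outcome {
	ESC_LITERAL,	/* the escape yields c itself */
	ESC_EXTENSION,	/* the escape has a blessed, extended meaning */
	ESC_EESCAPE	/* compilation fails with REG_EESCAPE */
};

/*
 * Hypothetical extension table: only the GNU examples named in the
 * log message (\b, \s, \w) are treated as blessed here.
 */
bool
is_blessed_extension(int c)
{
	return (c == 'b' || c == 's' || c == 'w');
}

/*
 * Classify "\c" for a character c that the BRE/ERE grammar has not
 * already claimed (for example \( and \{ in a BRE, or \1 through \9).
 * Assumption: escaped alphabetic characters are the "ordinary" escapes
 * that get rejected.
 */
enum esc_outcome
classify_escape(int c, bool legacy, bool extensions)
{
	/* Legacy consumers keep the historical behavior: \z means z. */
	if (legacy)
		return (ESC_LITERAL);
	/* An enabled extension gives the escape a defined meaning. */
	if (extensions && is_blessed_extension(c))
		return (ESC_EXTENSION);
	/* Undefined per POSIX: refuse rather than guess. */
	if (isalpha((unsigned char)c))
		return (ESC_EESCAPE);
	/* Escaped punctuation such as \% still means the character. */
	return (ESC_LITERAL);
}
```

Under these assumptions, `\z` is a literal `z` for a legacy consumer but fails with `REG_EESCAPE` otherwise. A later release can then give `\z` an extended meaning without any pattern silently switching between two interpretations: the pattern either errors out or gets the extended behavior.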