Andy Chu — 2018/6/7
What are regexes good for?
I define UNIX as 30 definitions of regular expressions living under one roof.
— Donald Knuth
(Examples: grep, awk, Perl, Python, vim, emacs, ...)
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
— Jamie Zawinski
(related to tip #4: follow-on talk)
Use extended regular expressions (ERE) at the shell to minimize the amount of syntax you need to remember (grep/sed/awk vs. Perl/Python/JS/Ruby)
[demo]
grep -E
(egrep
)sed -r
or --regexp-extended
(GNU)awk
(no flags necessary)Always write unit tests for your regular expressions.
myregex(s: str) -> bool
myregex(s: str) -> (group1: str, group2: str, ...)
[demo]
Use re.VERBOSE
in Python for readability (/x
in Perl; JavaScript doesn't
have this)
[demo]
Use regular expressions only to match regular languages (a mathematical idea).
Q: What's good about both "regexes" and regular languages?
A: They are a compact notation for pattern matching on strings (e.g. without
writing an if statement for every character). [example]
[A-Za-z_][A-Za-z0-9_]* # recognize a variable name
# (programming language world)
^(http|https)://.*\.(png|gif|jpg|jpeg)$ # recognize certain URLs
# ("big data" and system
# administration worlds)
Excerpt from Crafting Interpreters
private boolean isDigit(char c) {
return c >= '0' && c <= '9';
}
private boolean isAlpha(char c) {
return (c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
c == '_';
}
private boolean isAlphaNumeric(char c) {
return isAlpha(c) || isDigit(c);
}
private void identifier() {
while (isAlphaNumeric(peek())) advance();
addToken(IDENTIFIER);
}
default:
...
} else if (isAlpha(c)) {
identifier();
...
Perl-style "regexes" | Regular Languages | |
Mental Model | Machine-like / Imperative (order matters, debugging through time) | Mathematical / Algebraic / Linguistic (sets, alphabets, functional programming) |
Execution Model | Exhaustive backtracking search. Potentially exponential. | NFA/DFA. Guaranteed O(n) time, O(1) space. |
Example Constructs |
Everything in regular languages plus: |
Repetition: |
Interpreter Implementations | The ones in Perl, Python, Ruby, JavaScript, Java. PCRE. | libc, awk, re2 |
Compiler Implementations | - | re2c, lex/flex |
People | Larry Wall, Friedl (O'Reilly Book) |
Theory: Kleene, Chomsky (Type 3 Grammars) |