I want to determine the cardinal direction I am facing
I can model this problem with English
I can model this problem with a subset of English
Subset:{Turn Left 90 degrees, Turn Right 90 degrees}
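A minimal sketch of why the two-instruction subset is enough to compute the direction. The starting direction (north) and the names `DIRECTIONS` and `facing` are assumptions made for illustration, not part of the lecture.

```python
# Cardinal directions listed in clockwise order.
DIRECTIONS = ["N", "E", "S", "W"]

def facing(instructions, start="N"):
    """Return the direction faced after following a list of turn instructions."""
    index = DIRECTIONS.index(start)
    for instruction in instructions:
        if instruction == "Turn Right 90 degrees":
            index = (index + 1) % 4   # one step clockwise
        elif instruction == "Turn Left 90 degrees":
            index = (index - 1) % 4   # one step counterclockwise
        else:
            raise ValueError("not in the language: " + instruction)
    return DIRECTIONS[index]

print(facing(["Turn Left 90 degrees",
              "Turn Left 90 degrees",
              "Turn Right 90 degrees"]))  # W
```

Anything outside the two instructions is rejected, which is the point of restricting English to a small subset.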
There is a minimum language needed to compute a problem
Different classes of languages exist
Regular languages can compute very simple problems
A regular language is any language that can be defined by a regular expression
Regular Expression: A pattern that describes a set of strings
Regular Expressions are used to describe regular languages (future lecture)
For now: a tool used to search for text
A pattern that describes a set of strings
How to define the set?
We write a pattern or a regular expression
Our first pattern
"a"
Describes the set {"a"}
Our second pattern
"hello"
Describes the set {"hello"}
Boring
"hello|hi"
Describes the set {"hello", "hi"}
Boolean Or
"this|that"
Describes the set {"this", "that"}
The or operator's scope extends to the start or end of the pattern, or until another |
"this|that|the other thing"
Describes the set {"this", "that", "the other thing"}
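One possible way to check the set a pattern describes, sketched with Python's `re` module. The helper name `in_set` is made up for this example; `fullmatch` is used so that the whole string has to match, not just a prefix.

```python
import re

pattern = re.compile("this|that|the other thing")

def in_set(text):
    # fullmatch succeeds only if the entire string matches the pattern
    return pattern.fullmatch(text) is not None

for candidate in ["this", "that", "the other thing", "the", "thisthat"]:
    print(candidate, in_set(candidate))
# this True
# that True
# the other thing True
# the False
# thisthat False
```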
Precedence
"cliff|clyff"
{"cliff","clyff"}
A lot of shared characters
"cl(i|y)ff"
Describes the same set
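A small sketch of what the parentheses change. The candidate strings and the helper `described_set` are assumptions for illustration; the helper only reports which of the candidates each pattern accepts.

```python
import re

candidates = ["cliff", "clyff", "cli", "yff", "clff"]

def described_set(pattern):
    # Keep the candidates that the pattern matches in full.
    return sorted(c for c in candidates if re.fullmatch(pattern, c))

print(described_set("cliff|clyff"))  # ['cliff', 'clyff']
print(described_set("cl(i|y)ff"))    # ['cliff', 'clyff']
print(described_set("cli|yff"))      # ['cli', 'yff']
```

Without the parentheses, the | splits the whole pattern into "cli" and "yff", so the described set is different.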
Quantification
"0|1|2|3|4|5|6|7|8|9"
{"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}
What about two digit strings?
"(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)"
Cringe
"(0|1|2|3|4|5|6|7|8|9){2}"
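A quick sketch, assuming Python's `re` module, that the {2} form accepts the same strings as the written-out form. The test strings are chosen for illustration.

```python
import re

long_form = re.compile("(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)")
short_form = re.compile("(0|1|2|3|4|5|6|7|8|9){2}")

for text in ["42", "07", "7", "123", "4a"]:
    in_long = long_form.fullmatch(text) is not None
    in_short = short_form.fullmatch(text) is not None
    print(text, in_long, in_short)
# 42 True True
# 07 True True
# 7 False False
# 123 False False
# 4a False False
```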
What about infinite repetition?
Kleene Operator
"(ha)*"
{"", "ha", "haha", "hahaha",...}
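A short sketch of the Kleene operator in Python's `re` module. Zero repetitions are allowed, so the empty string is in the set; the rejected strings are extra examples added here.

```python
import re

laugh = re.compile("(ha)*")

for text in ["", "ha", "haha", "hahaha", "h", "hah", "haah"]:
    print(repr(text), laugh.fullmatch(text) is not None)
# '' True
# 'ha' True
# 'haha' True
# 'hahaha' True
# 'h' False
# 'hah' False
# 'haah' False
```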
Bracket Expressions
"0|1|2|3|4|5|6|7|8|9"
"[0-9]"
"[a-z]"
Any ASCII range works, and ranges can be combined (or'ed) inside one bracket
"[a-zA-Z]"
Any lowercase or uppercase letter
"[aeiou]"
{"a", "e", "i", "o", "u"}
Can also negate a bracket expression with ^
"[^aeiou]"
Any single character except a, e, i, o, u
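A sketch of the bracket expressions above using Python's `re` module. The helper name `matches` and the test characters are assumptions for illustration.

```python
import re

def matches(pattern, text):
    # True only if the whole string matches the pattern
    return re.fullmatch(pattern, text) is not None

print(matches("[0-9]", "7"))       # True
print(matches("[a-z]", "Q"))       # False: uppercase is outside a-z
print(matches("[a-zA-Z]", "Q"))    # True
print(matches("[aeiou]", "e"))     # True
print(matches("[^aeiou]", "e"))    # False
print(matches("[^aeiou]", "x"))    # True
print(matches("[^aeiou]", "7"))    # True: negation admits any other character, not only letters
print(matches("[^aeiou]", "xy"))   # False: a bracket expression matches exactly one character
```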
Other helpful symbols
Note: these symbols must be escaped with a backslash to be matched literally
"1 \+ 2"
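A sketch of what goes wrong without the escape, assuming Python's `re` module. Unescaped, the + is a repetition operator applied to the space before it. `re.escape` is a standard-library helper that adds the backslashes for you.

```python
import re

unescaped = re.compile("1 + 2")     # + means "one or more of the preceding space"
escaped = re.compile(r"1 \+ 2")     # \+ means a literal plus sign

print(unescaped.fullmatch("1 + 2") is not None)   # False
print(unescaped.fullmatch("1   2") is not None)   # True: 1, spaces, 2
print(escaped.fullmatch("1 + 2") is not None)     # True

# re.escape builds a pattern that matches the given text literally.
auto_escaped = re.compile(re.escape("1 + 2"))
print(auto_escaped.fullmatch("1 + 2") is not None)  # True
```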
Regular expressions in Python need the re module
#regex.py
import re
Create a re:
#regex2.py
import re
my_re = re.compile(r"[0-9]+\.[0-9]+")  # raw string keeps the backslash in \. intact
Matching
Check if a string matches the re
#regex3.py
import re
my_re = re.compile(r"[0-9]+\.[0-9]+")
# match() anchors at the start of the string only
if my_re.match("12.3"):
    print("successfully matched")
else:
    print("unsuccessfully matched")
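A brief sketch of how `match` differs from two related methods in the `re` module, `fullmatch` and `search`. The input strings are made up for illustration.

```python
import re

my_re = re.compile(r"[0-9]+\.[0-9]+")

# match() only requires the pattern to match at the start of the string.
print(my_re.match("12.3abc") is not None)      # True
# fullmatch() requires the whole string to match.
print(my_re.fullmatch("12.3abc") is not None)  # False
# search() looks for a match anywhere in the string.
print(my_re.match("abc12.3") is not None)      # False
print(my_re.search("abc12.3") is not None)     # True
```

If the question is whether a string belongs to the set the pattern describes, `fullmatch` is the closest fit.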
Grouping
Searching is great, parsing is better
#regex4.py
import re
my_re = re.compile(r"[0-9]+\.([0-9]+)")
m = my_re.match("12.3")
if m:
    # group(1) is the text captured by the first pair of parentheses
    print("the decimal is " + m.group(1))
else:
    print("unsuccessfully matched")
Parentheses show precedence AND capture
#regex5.py
import re
my_re = re.compile(r"(([0-9]){3})-([0-9]{3}-[0-9]{4})")
m = my_re.match("123-456-7890")
if m:
    # group(2) is the inner ([0-9]), so the rest of the number is group(3)
    print("the area code is " + m.group(1))
    print("the rest is " + m.group(3))
else:
    print("unsuccessfully matched")
Group numbers are determined by the order of the opening parentheses
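A sketch that prints every group of the phone number pattern, so the numbering can be checked against the opening parentheses. `groups()` is the standard `re` match method that returns all captured groups as a tuple.

```python
import re

my_re = re.compile(r"(([0-9]){3})-([0-9]{3}-[0-9]{4})")
m = my_re.match("123-456-7890")

print(m.group(0))  # 123-456-7890  (group 0 is the whole match)
print(m.groups())  # ('123', '3', '456-7890')
```

Group 2 is opened second, inside group 1. Because it is repeated by {3}, it holds only the last digit it matched.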