Everything is a file (at least in UNIX, for the most part)
Programs live in RAM
Hard drives are for long term storage
Programs read files to pick up where they left off
Used in save or configuration files
Some files are binary files
Some have text
P1b loads data from file
How to do this?
#load_file.rb
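f = File.open("file.txt")
line = f.gets        # read one line (f.readline is similar, but raises EOFError at end of file)
lines = f.readlines  # read all remaining lines into an array
f.close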
Commonly we need to go line by line
#line_by_line.rb
f = File.open("file.txt")
line = f.gets
while line             # f.gets returns nil at end of file, which ends the loop
  # do something with line
  line = f.gets
end
f.close
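The same loop can be written more compactly with File.foreach, which opens the file, yields each line, and closes the file when done. The sketch below is only an illustration: the file name and the temporary sample file are assumptions made so it runs on its own.

```ruby
# line_by_line_foreach.rb (hypothetical file name)
require "tempfile"

# Build a small throwaway file so the sketch is self-contained.
sample = Tempfile.new("sample")
sample.write("first\nsecond\nthird\n")
sample.close

# File.foreach yields one line at a time and closes the file afterwards.
cleaned = []
File.foreach(sample.path) do |line|
  cleaned << line.chomp   # chomp strips the trailing newline
end

puts cleaned.inspect
sample.unlink             # delete the temporary file
```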
#fun_strings.rb
# remove the trailing newline
a = "hello\n"
puts a.chomp
a.chomp!
puts a
# substrings
a = "hello"
puts a[1]
puts a[-1]
puts a[1..]
puts a[0..4]
puts a[2..4]
puts a[2...4]
# substitution
a = "hello world"
a["world"] = "cliff"
puts a
a.sub!("hello", "bye")
puts a
a.gsub!("f", "ph") # gsub! changes a in place; plain gsub only returns a new string
puts a
# searching
a = "hello world"
puts a.include?("hello")
puts a.index("o")
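Several of the string methods above come in two flavors: the plain version returns a new string, and the version ending in ! changes the string in place. A small sketch of the difference (the file name and sample values are made up for illustration):

```ruby
# bang_methods.rb (hypothetical file name)
a = "cliff\n"

copy = a.chomp            # returns a new string; a still ends in "\n"
puts a.inspect            # "cliff\n"
puts copy.inspect         # "cliff"

a.chomp!                  # modifies a in place
replaced = a.gsub("f", "ph")
puts a                    # cliff   (gsub left a unchanged)
puts replaced             # cliphph

a.gsub!("f", "ph")        # now a itself changes
puts a                    # cliphph
```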
Not helpful if we want to find alternative spellings
Color vs Colour
Grey vs Gray
Cliff vs Kliff vs Clyff vs Klyff vs Qulyph
Regular Expression: A pattern that describes a set of strings
Regular Expressions are used to describe regular languages (future lecture)
For now: a tool used to search for text
Regexp Syntax: /pattern/
#regex1.rb
puts /pattern/.class
Check to see if a string contains a pattern
#regex2.rb
puts "pattern" =~ /pattern/
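It may help to see what =~ actually returns: the index where the match starts, or nil if there is no match. Since nil is falsy and every integer (including 0) is truthy in Ruby, the result can be used directly in an if. The strings below are made-up examples.

```ruby
# match_result.rb (hypothetical file name)
hit  = "a pattern here" =~ /pattern/   # index where the match starts
miss = "nothing here"   =~ /pattern/   # nil when there is no match

puts hit.inspect    # 2
puts miss.inspect   # nil

# 0 is truthy in Ruby, so a match at the very start still counts
puts "found" if "pattern" =~ /pattern/

# match? answers only true or false
yes = "a pattern here".match?(/pattern/)
puts yes            # true
```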
# regex3.rb
line = gets
# string literal
if line =~ /cliff/
  puts "line contains 'cliff'"
end
# one string or another
if line =~ /cliff|kliff/
  puts "line contains 'cliff' or 'kliff'"
end
# we can add parentheses
if line =~ /(c|k)liff/
  puts "line contains 'cliff' or 'kliff'"
end
# and common letter grouping
if line =~ /[a-z]liff/
  puts "line contains a lowercase letter followed by 'liff'"
end
# repetition makes things easy
if line =~ /[a-z]{5}/
  puts "line contains 5 lowercase letters in a row"
end
# digits can be matched too
if line =~ /[0-9]{1,3}/
  puts "line contains a run of 1, 2, or 3 digits"
end
# combining adds complexity
if line =~ /[A-Za-z0-9]+/
  puts "alphanumeric with length 1+"
end
# can match some fun things
if line =~ /-?[0-9]+/
  puts "any integer"
end
# can match some useful things
if line =~ /-?[0-9]+(,-?[0-9]+)*/
  puts "a list of numbers"
end
# can be tricky
if line =~ /^start \d* end$/
  puts "start number end"
end
# can be really tricky: [^start] is one character that is not s, t, a, or r,
# and the unescaped . matches any character
if line =~ /[^start]\$\d+.\d{2}/
  puts "dollar amount such as $10.50, preceded by a character other than s, t, a, or r"
end
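With these pieces, the alternative spellings from earlier (Color vs Colour, Grey vs Gray, Cliff vs Kliff) can each be covered by one pattern. This is one possible sketch, not the only way to write them; the constant names, the helper method, and the i (ignore case) flag are additions that the notes do not cover.

```ruby
# spellings.rb (hypothetical file name)
COLOR = /colou?r/i      # the "u" is optional; i ignores case
GREY  = /gr[ae]y/i      # either "a" or "e"
CLIFF = /[ck]l[iy]ff/i  # c or k, then l, then i or y, then ff

# Hypothetical helper: true if the word contains the pattern.
def spelled_like?(word, pattern)
  word.match?(pattern)
end

puts spelled_like?("Colour", COLOR)   # true
puts spelled_like?("gray", GREY)      # true
puts spelled_like?("Klyff", CLIFF)    # true
puts spelled_like?("Qulyph", CLIFF)   # false
```

Qulyph is left unmatched on purpose: covering it would need a much looser pattern that also accepts many unrelated words.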
Restriction: can only look forward, not predictive
Grouping
Searching is great, parsing is better
#regex4.rb
line = gets
if line =~ /\(\d{3}\) \d{3}-\d{4}/
  puts "is phone number"
end
How can we find out the area code?
Parenthesized parts are stored as global variables
Refer to them using $1, $2, $3, etc.
#back_reference.rb
line = gets
if line =~ /\((\d{3})\) \d{3}-\d{4}/
  puts $1
end
Issue: each match will reset the groupings
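One way around the reset is to keep the result of String#match, which returns a MatchData object holding its own groups (or nil if nothing matched). The sketch below assumes made-up phone numbers and a made-up constant name.

```ruby
# match_data.rb (hypothetical file name)
PHONE = /\((\d{3})\) (\d{3})-(\d{4})/

first  = "(555) 123-4567".match(PHONE)
second = "(800) 555-0199".match(PHONE)

# Each MatchData keeps its own groups, even after a later match.
puts first[1]    # 555
puts second[1]   # 800
puts first[0]    # (555) 123-4567  (index 0 is the whole match)

none = "no number".match(PHONE)   # nil when nothing matches
puts none.inspect                 # nil
```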
String's scan method matches and groups
Will return array
# scanning.rb
line = gets
arr = line.scan(/\d+/)
puts arr
arr = line.scan(/[a-zA-Z0-9]{2}/)
puts arr
# parentheses give an array of arrays (p shows the nesting; puts would flatten it)
arr = line.scan(/(\d)(\d)/)
p arr
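To make the shape of the results concrete, here is a sketch with a fixed input string instead of gets (the input is made up). It prints with p rather than puts, because puts prints nested arrays one element per line and hides the nesting.

```ruby
# scan_shapes.rb (hypothetical file name)
line = "10,20 and 305"

flat = line.scan(/\d+/)         # no groups: array of matched strings
pairs = line.scan(/(\d)(\d)/)   # groups: one inner array per match
numbers = flat.map(&:to_i)      # matches are strings until converted

p flat      # ["10", "20", "305"]
p pairs     # [["1", "0"], ["2", "0"], ["3", "0"]]
p numbers   # [10, 20, 305]
```

The trailing 5 in 305 is missing from pairs because the pattern needs two digits and only one is left.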
scan can also take a code block
# scanning2.rb
line = gets
# with a block, scan returns the original string, not an array, so there is nothing useful to assign
line.scan(/\d+/) { |number|
  puts number.to_i * number.to_i
}
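If the squared values are needed later rather than just printed, they can be collected inside the block, or the array form of scan can be combined with map. A sketch with a made-up input line:

```ruby
# scanning_collect.rb (hypothetical file name)
line = "3 apples and 12 pears"

# Block form: collect results by hand.
squares = []
result = line.scan(/\d+/) { |number| squares << number.to_i ** 2 }

p squares          # [9, 144]
p result == line   # true: the block form hands back the original string

# Array form plus map gives the same values in one expression.
squares_again = line.scan(/\d+/).map { |number| number.to_i ** 2 }
p squares_again    # [9, 144]
```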