I had a log file containing a sequence of operation descriptions like this:

```
/*
DBLoadSchema
2013-06-21T15:01:46.222-04:00
. . .
```

and I needed to extract just the name (‘DBLoadSchema’) and date-time of each operation, plus its line number within the log file.
The ‘standard’ way would be a loop that processes the lines one by one and uses a few variables to keep track of the current state (‘found /*’, ‘found operation name XXX’, etc.): in other words, a state machine.
These days my scripting language is mostly F#, and keeping track of the state with mutable variables like that feels ‘wrong’ in a functional language; a solution without mutable variables would be preferable.
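For contrast, here is a minimal sketch of what that mutable-variable loop might look like in F#. The ‘ParseState’ type and the ‘filterLogImperative’ function are hypothetical names that are not part of the original code, and the sketch reads from a ‘TextReader’ (the base class of ‘StreamReader’) so that it can also be fed an in-memory ‘StringReader’:

```fsharp
open System
open System.IO

// Hypothetical state type for the imperative version; the names are illustrative.
type ParseState =
    | Start                      // looking for a "/*" line
    | StartComment               // "/*" seen, next line is the operation name
    | Proc of procName: string   // name seen, next line should be the date-time

// Sketch of the 'standard' loop: mutable variables hold the current state,
// the line counter and the collected (line number, name, date-time) results.
let filterLogImperative (log: TextReader) =
    let results = ResizeArray<int * string * DateTime>()
    let mutable state = Start
    let mutable lineNumber = 0
    let mutable line = log.ReadLine()
    while not (isNull line) do
        lineNumber <- lineNumber + 1
        match state with
        | Start ->
            if line = "/*" then state <- StartComment
        | StartComment ->
            state <- Proc (line.Trim())
        | Proc procName ->
            let ok, dateTime = DateTime.TryParse(line.Trim())
            if ok then results.Add((lineNumber, procName, dateTime))
            state <- Start
        line <- log.ReadLine()
    List.ofSeq results
```

All of the bookkeeping lives in ‘state’, ‘lineNumber’ and ‘results’, which are overwritten on every iteration; that is exactly the kind of code the functional version below avoids.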
Half-remembering something I read in a blog (or maybe it was a book?), I came up with using separate functions, each representing one state of the state machine, that simply call each other to transition between states:
```fsharp
open System
open System.IO

// State 1: skip lines until a comment opener "/*" is found.
let rec processStart (log: StreamReader) lineNumber =
    if log.EndOfStream then
        Seq.empty
    else
        let line = log.ReadLine()
        let lineNumber = lineNumber + 1
        if line = "/*" then
            processStartComment log lineNumber
        else
            processStart log lineNumber

// State 2: the line right after "/*" holds the operation name.
and processStartComment (log: StreamReader) lineNumber =
    if log.EndOfStream then
        Seq.empty
    else
        let line = log.ReadLine()
        let lineNumber = lineNumber + 1
        processProc (line.Trim()) log lineNumber

// State 3: the next line should hold the date-time; emit a result if it
// parses, then go back to the start state.
and processProc procName (log: StreamReader) lineNumber =
    if log.EndOfStream then
        Seq.empty
    else
        let line = log.ReadLine()
        let lineNumber = lineNumber + 1
        let dateTimeString = line.Trim()
        let (ok, dateTime) = DateTime.TryParse(dateTimeString)
        seq {
            if ok then
                yield (lineNumber, procName, dateTime)
            yield! processStart log lineNumber
        }
```
The source stream and the current line number are passed as parameters to each function, and so are the state variables: in this case just the name of the operation, which is passed from ‘processStartComment’ to ‘processProc’. The functions return the result as a sequence of triples (line number, operation name, operation date-time), where the line number is that of the date-time line. To process a log file, simply open it and call ‘processStart’:
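```fsharp
// The body is wrapped in seq { } so that the reader stays open until the
// sequence has been fully enumerated; a plain 'use' would dispose it as soon
// as filterLog returned, before the lazy sequence is read.
// The counter starts at 0 because each state function increments it before
// handling the line it just read, so the first line gets number 1.
let filterLog (path: string) =
    seq {
        use log = new StreamReader(path)
        yield! processStart log 0
    }
```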
All in all, this ‘functional’ way takes a little getting used to when coming from a procedural background, but it seems simpler to work with, especially as the state machine gets more complex: each state is a function, and the data it depends on is explicit in its parameters. The use of sequences is nice as well: once you have a sequence of parsed values, it is easy to build pipelines for further processing.
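As an illustration of such a pipeline, here is a small sketch that counts how often each operation occurs and when it last ran. The input is a hand-written list in the same (line number, name, date-time) shape that ‘filterLog’ produces; the values, including the ‘DBLoadData’ operation, are made up for the example and do not come from a real log:

```fsharp
open System

// Hypothetical parsed records, same shape as the state functions' output.
let parsed : (int * string * DateTime) list =
    [ (3, "DBLoadSchema", DateTime(2013, 6, 21, 15, 1, 46))
      (9, "DBLoadData", DateTime(2013, 6, 21, 15, 2, 10))
      (15, "DBLoadSchema", DateTime(2013, 6, 21, 16, 0, 0)) ]

// Group by operation name, then compute the run count and latest date-time.
let summary =
    parsed
    |> Seq.groupBy (fun (_, name, _) -> name)
    |> Seq.map (fun (name, runs) ->
        let lastRun = runs |> Seq.map (fun (_, _, dt) -> dt) |> Seq.max
        name, Seq.length runs, lastRun)
    |> Seq.sortBy (fun (name, _, _) -> name)
    |> List.ofSeq

summary
|> List.iter (fun (name, count, lastRun) ->
    printfn "%s: %d run(s), last at %O" name count lastRun)
```

With a real file, ‘parsed’ would simply be replaced by a call such as ‘filterLog path’, and the rest of the pipeline stays the same.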