Sequence expressions and side-effects in F#

September 6, 2013 at 4:11 PM, Michele Mottini

I discovered an interesting problem when using a sequence expression to read text. The original case was more complex, but here is a simple example that reproduces the problem: a function that reads all the lines from a text stream and returns them as a sequence of strings, using a sequence expression:

let readlines (reader: System.IO.TextReader) = 
  seq {
    let line = ref (reader.ReadLine())
    while !line <> null do 
      yield !line
      line := reader.ReadLine()
  }

Reading a two-line text:

readlines (new System.IO.StringReader("A\nB"))

produces as expected:

val it : seq<string> = seq ["A"; "B"]

Now define a simple function that checks if a sequence is empty and returns either the string “Empty” or the string “XX items”, where XX is the number of elements:

let test s = 
  if Seq.isEmpty s then 
    "Empty"
  else
    sprintf "%d items" (Seq.length s)

Apply it to the same sequence as above:

readlines (new System.IO.StringReader("A\nB")) |> test

and the result is

val it : string = "1 items"

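which is plainly wrong: the sequence contains 2 lines, not 1.

The problem is that a sequence expression is evaluated again each time the sequence is enumerated. The call to Seq.isEmpty starts one enumeration and reads the first line, moving the text reader to the second line (i.e. causing a side-effect). Seq.length then starts a second enumeration on the same reader, which is already past the first line, so it sees only the second one.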

In this case the problem is with a text reader, but any call causing side-effects within a sequence expression is bound to cause the same unexpected behavior.
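The same effect can be shown without any reader at all. The following is a minimal sketch, assuming a hypothetical counter named starts that records how many times the body of the sequence expression begins executing:

```fsharp
// Hypothetical counter: incremented every time the sequence body starts running.
let starts = ref 0

let numbers =
  seq {
    starts.Value <- starts.Value + 1   // side effect, runs once per enumeration
    yield 1
    yield 2
  }

let isEmpty = Seq.isEmpty numbers   // first enumeration
let count = Seq.length numbers      // second enumeration: the body runs again

printfn "empty=%b count=%d starts=%d" isEmpty count starts.Value
// prints "empty=false count=2 starts=2"
```

Here the side effect is harmless, so the count is still correct, but the body has clearly run twice. With a text reader the second run starts from wherever the first one left the reader.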

One possible solution is to cache the resulting sequence:

let readlines (reader: System.IO.TextReader) = 
  seq {
    let line = ref (reader.ReadLine())
    while !line <> null do 
      yield !line
      line := reader.ReadLine()
  } |> Seq.cache

This ensures that each element is read from the reader only once, however many times the sequence is enumerated, which fixes the problem.
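As a quick check, the cached version can be run through the same test function. This is a self-contained sketch: the name readlinesCached is only used here to keep it apart from the uncached version, and it uses .Value on the ref cell instead of the ! and := operators, which newer F# compilers flag as deprecated.

```fsharp
open System.IO

// Same logic as the cached readlines above, under a hypothetical name.
let readlinesCached (reader: TextReader) =
  seq {
    let line = ref (reader.ReadLine())
    while not (isNull line.Value) do
      yield line.Value
      line.Value <- reader.ReadLine()
  } |> Seq.cache

let test s =
  if Seq.isEmpty s then
    "Empty"
  else
    sprintf "%d items" (Seq.length s)

let result = readlinesCached (new StringReader("A\nB")) |> test
printfn "%s" result   // prints "2 items"
```

Note that the cache keeps every line read so far in memory, and the reader itself is still consumed: calling the function a second time on the same reader would start from wherever the first sequence stopped reading.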

Another solution is to create the reader within the sequence expression:

let readlinesStr str = 
  seq {
    use reader = new System.IO.StringReader(str)
    let line = ref (reader.ReadLine())
    while !line <> null do 
      yield !line
      line := reader.ReadLine()
  }

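This causes a new reader to be created each time the sequence is enumerated, so every enumeration starts from the first line and the side-effects of one enumeration are not visible to the next.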
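The same idea carries over to files. The sketch below assumes a hypothetical function readlinesFile that takes a path and opens a StreamReader inside the sequence expression; the temporary file is there only to make the example runnable.

```fsharp
open System.IO

// Hypothetical file-based variant: the reader is created (and disposed)
// inside the sequence expression, once per enumeration.
let readlinesFile (path: string) =
  seq {
    use reader = new StreamReader(path)
    let line = ref (reader.ReadLine())
    while not (isNull line.Value) do
      yield line.Value
      line.Value <- reader.ReadLine()
  }

// Write a two-line temporary file to try it on.
let path = Path.GetTempFileName()
File.WriteAllText(path, "A\nB")

let lines = readlinesFile path
let empty = Seq.isEmpty lines   // opens the file, reads one line, closes it
let count = Seq.length lines    // opens the file again and reads from the start

printfn "empty=%b count=%d" empty count   // prints "empty=false count=2"
File.Delete(path)
```

The price is that the file is opened once per enumeration. For plain line-by-line reading, System.IO.File.ReadLines in .NET also returns a lazily evaluated sequence of lines.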

Posted in: Programming

State machine using recursive functions

July 8, 2013 at 7:58 AM, Michele Mottini

I had a log file containing sequences of operation descriptions like this:

/*
  DBLoadSchema
  2013-06-21T15:01:46.222-04:00
  . . .

and I needed to extract just the name (‘DBLoadSchema’) and date-time of each operation, plus their line number within the log file.

The ‘standard’ way would be to use a loop processing all the lines one by one, using some variables to keep track of the current state: ‘found /*’, ‘found operation name XXX’ etc. – a state machine.
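For comparison, that loop-based approach might look roughly like this. It is only a sketch: the function name parseImperative, the numeric state encoding and the sample input are assumptions made for illustration, not code from the original script.

```fsharp
open System
open System.IO

// Hypothetical imperative version: one loop plus mutable state variables.
let parseImperative (log: TextReader) =
  let results = ResizeArray<int * string * DateTime>()
  // 0 = looking for "/*", 1 = expecting the name, 2 = expecting the date-time
  let mutable state = 0
  let mutable procName = ""
  let mutable lineNumber = 0
  let mutable line = log.ReadLine()
  while not (isNull line) do
    lineNumber <- lineNumber + 1
    match state with
    | 0 -> if line = "/*" then state <- 1
    | 1 ->
      procName <- line.Trim()
      state <- 2
    | _ ->
      let ok, dateTime = DateTime.TryParse(line.Trim())
      if ok then results.Add((lineNumber, procName, dateTime))
      state <- 0
    line <- log.ReadLine()
  List.ofSeq results

// Made-up sample input in the format shown above.
let sample = "header\n/*\n  DBLoadSchema\n  2013-06-21T15:01:46.222-04:00\nmore\n"
let records = parseImperative (new StringReader(sample))

for (n, name, _) in records do
  printfn "%d %s" n name   // prints "4 DBLoadSchema"
```

It works, but every new state adds another branch and usually another mutable variable that all the branches can see.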

The scripting language I use now is mostly F# – but using mutable variables to keep track of the state like that feels ‘wrong’ in a functional language like F# – a solution without mutable variables would be preferable.

Half-remembering something I read in a blog (or maybe it was a book?), I came up with using separate functions, each representing a state of the state machine, that simply call each other to transition between states:

open System
open System.IO

let rec processStart (log: StreamReader) lineNumber =
  if log.EndOfStream then
    Seq.empty
  else
    let line = log.ReadLine()
    let lineNumber = lineNumber + 1
    if line = "/*" then
      processStartComment log lineNumber
    else 
      processStart log lineNumber
and processStartComment (log: StreamReader) lineNumber =
  if log.EndOfStream then
    Seq.empty
  else
    let line = log.ReadLine()
    let lineNumber = lineNumber + 1
    processProc (line.Trim()) log lineNumber
and processProc procName (log: StreamReader) lineNumber =
  if log.EndOfStream then
    Seq.empty
  else
    let line = log.ReadLine()
    let lineNumber = lineNumber + 1
    let dateTimeString = line.Trim()
    let (ok, dateTime) = DateTime.TryParse(dateTimeString)
    seq {
      if ok then
        yield (lineNumber, procName, dateTime)
      yield! processStart log lineNumber
    }

The source stream and the current line number are passed as parameters to each function, and the state variables are parameters as well: in this case just the name of the operation, which is passed from ‘processStartComment’ to ‘processProc’. The functions return the result as a sequence of triplets (line number, operation name, operation date-time). To process a log file simply open it and call ‘processStart’:

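let filterLog (path: string) =
  seq {
    // Open the reader inside the sequence: the result is lazy, so the reader
    // must stay open while it is enumerated, not just until filterLog returns
    use log = new StreamReader(path)
    // Start from 0: each function adds 1 after reading a line
    yield! processStart log 0
  }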

All in all this ‘functional’ way takes a little getting used to when coming from a procedural background, but it looks much simpler to use, especially if the state machine gets more complex. The use of sequences is nice as well: once you have a sequence of the parsed values it is easy to create pipelines to do further processing.
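As an illustration of such a pipeline, the sketch below counts how many times each operation appears. The records are made-up sample values standing in for the parsed output (the operation name ‘DBSaveSchema’ and the line numbers are hypothetical):

```fsharp
open System

// Hypothetical parsed records: (line number, operation name, date-time).
let parsed =
  seq {
    yield (3, "DBLoadSchema", DateTime(2013, 6, 21, 15, 1, 46))
    yield (10, "DBSaveSchema", DateTime(2013, 6, 21, 15, 2, 10))
    yield (17, "DBLoadSchema", DateTime(2013, 6, 21, 15, 3, 5))
  }

// Count the occurrences of each operation name, sorted by name.
let countsByName =
  parsed
  |> Seq.countBy (fun (_, name, _) -> name)
  |> Seq.sortBy fst
  |> List.ofSeq

for (name, count) in countsByName do
  printfn "%s: %d" name count
// prints "DBLoadSchema: 2" then "DBSaveSchema: 1"
```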

Posted in: Programming