I discovered an interesting problem when using a sequence expression to read text. The original case was more complex, but here is a simple example that reproduces the problem: a function that reads all the lines from a text reader and returns them as a sequence of strings, using a sequence expression:
let readlines (reader: System.IO.TextReader) =
    seq {
        let line = ref (reader.ReadLine())
        while !line <> null do
            yield !line
            line := reader.ReadLine()
    }
Reading a two-line text:
readlines (new System.IO.StringReader("A\nB"))
produces, as expected:
val it : seq<string> = seq ["A"; "B"]
Now define a simple function that checks whether a sequence is empty and returns either the string “Empty” or the string “N items”, where N is the number of elements:
let test s =
    if Seq.isEmpty s then
        "Empty"
    else
        sprintf "%d items" (Seq.length s)
Apply it to the same sequence as above:
readlines (new System.IO.StringReader("A\nB")) |> test
and the result is
val it : string = "1 items"
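which is wrong: the sequence contains 2 lines, not 1.
The problem is that a sequence expression is evaluated lazily, and again from the start each time the sequence is enumerated. Seq.isEmpty enumerates the sequence far enough to read the first line, which moves the text reader to the second line (a side effect). Seq.length then starts a second enumeration on the same reader, so it skips the first line.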
In this case the problem is with a text reader, but any call inside a sequence expression that has side effects on state shared between enumerations is bound to cause the same unexpected behavior.
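To show that this is not specific to text readers, here is a minimal sketch of the same effect with a queue. The `drain` helper and the `numbers` value are hypothetical names made up for this illustration; the only assumption is that the queue is created outside the sequence expression, so every enumeration shares it.

```fsharp
open System.Collections.Generic

// Hypothetical helper: yields the items of a queue that lives outside the sequence expression.
let drain (queue: Queue<int>) =
    seq {
        while queue.Count > 0 do
            // Dequeue is a side effect on state shared by every enumeration.
            yield queue.Dequeue()
    }

let numbers = drain (Queue<int>([ 1; 2; 3 ]))

// First enumeration: Seq.isEmpty pulls one element, which removes 1 from the queue.
let emptyCheck = Seq.isEmpty numbers

// Second enumeration: only the items still in the queue are left.
let remaining = Seq.toList numbers
```

Here `emptyCheck` is false and `remaining` is [2; 3]: the element consumed by the emptiness check is lost, just like the first line of the text.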
One possible solution is to cache the resulting sequence:
let readlines (reader: System.IO.TextReader) =
    seq {
        let line = ref (reader.ReadLine())
        while !line <> null do
            yield !line
            line := reader.ReadLine()
    } |> Seq.cache
Seq.cache stores each element the first time it is read, so later enumerations replay the stored elements instead of calling the reader again, which fixes the problem.
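As a quick check, the sketch below runs the cached version through the same emptiness test. The names `readlinesCached`, `describe` and `cachedResult` are made up for this example so that it stands on its own, and it uses the `.Value` property of the ref cell, which behaves the same as `!` and `:=`.

```fsharp
open System.IO

// Same reader-based sequence as above, wrapped in Seq.cache.
let readlinesCached (reader: TextReader) =
    seq {
        let line = ref (reader.ReadLine())
        while line.Value <> null do
            yield line.Value
            line.Value <- reader.ReadLine()
    }
    |> Seq.cache

// Same logic as the test function: enumerates the sequence twice.
let describe s =
    if Seq.isEmpty s then "Empty" else sprintf "%d items" (Seq.length s)

let cachedResult = describe (readlinesCached (new StringReader("A\nB")))
```

With the cache in place `cachedResult` is "2 items". The trade-off is that the cached sequence keeps every element it has produced in memory, which may matter for very large inputs.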
Another solution is to create the reader within the sequence expression:
let readlinesStr str =
    seq {
        use reader = new System.IO.StringReader(str)
        let line = ref (reader.ReadLine())
        while !line <> null do
            yield !line
            line := reader.ReadLine()
    }
This creates a new reader each time the sequence is enumerated, so enumerations no longer share any state, and the reader is disposed when each enumeration ends.
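The same idea might be generalised beyond strings by passing a function that opens the reader, rather than the reader itself. This is only a sketch: `readlinesFrom` and its `openReader` parameter are hypothetical names, not part of any library.

```fsharp
open System.IO

// Hypothetical generalisation: the caller supplies a factory, and the
// sequence expression calls it once per enumeration.
let readlinesFrom (openReader: unit -> TextReader) =
    seq {
        use reader = openReader ()
        let line = ref (reader.ReadLine())
        while line.Value <> null do
            yield line.Value
            line.Value <- reader.ReadLine()
    }

let lines = readlinesFrom (fun () -> new StringReader("A\nB") :> TextReader)

// Each enumeration opens its own reader, so both see every line.
let firstPass = Seq.toList lines
let secondPass = Seq.toList lines
```

Both `firstPass` and `secondPass` contain "A" and "B". Unlike the cached version, nothing is kept in memory between enumerations, at the cost of reading the source again each time.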