Project Rosalind

November 6, 2013 at 3:38 PM, Michele Mottini

I was never particularly attracted to biology – my bent is definitely more towards physics and math. Genetics has always piqued my curiosity though, as a biology subject that looks a lot like math or computer science. Having said that, I know next to nothing about genetics, and the little I know is pretty confused.

Looking online for some enlightenment I came across Rosalind, a set of bioinformatics problems that can be solved directly online.

Each problem includes an explanation of the biological background (with links to further reading and resources), a description of the problem itself and a test data set. A button on the page downloads a new input data set; you then have to post the corresponding answer, which is checked automatically. The problems (100+) are organized in a tree: solving the easier ones ‘unlocks’ the more complex ones.

It is a very nice resource: you learn some biology/genetics, and you solve related problems at the same time – learning specific programming techniques along the way.

The language of choice is Python (the site includes a Python tutorial), but the problems can be solved in any language because the site checks only the results. (Users can optionally upload their program code along with the results, but it is not required.)

I started working on the Rosalind problems back in the summer and so far I have solved 26 of them. I am using F# – its interactive mode is perfect for quickly writing and testing code, and most problems lend themselves quite well to a functional solution.
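
To give a flavour of what a functional solution can look like, here is a minimal sketch of the kind of task the first Rosalind problem poses: counting how many times each of the four nucleotides occurs in a DNA string. It is written in Python, the site's language of choice, rather than in F#, and it is only an illustration: the function name count_nucleotides and the sample string are made up for this example.

```python
from collections import Counter


def count_nucleotides(dna):
    """Return the number of A, C, G and T symbols in a DNA string, in that order.

    Hypothetical helper written for illustration only.
    """
    counts = Counter(dna.upper())
    # A Counter returns 0 for a missing key, so absent bases are handled too.
    return tuple(counts[base] for base in "ACGT")


if __name__ == "__main__":
    # Illustrative input; the real site hands out a fresh data set on each attempt.
    print(*count_nucleotides("AGCTTTTCATTCTGAC"))
```

The function is a pure mapping from the input string to a tuple of counts, with no mutable state of its own, which is what makes problems like this a natural fit for a functional language.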

Recently I posted the code I have written so far on GitHub. Solving the problems requires writing not only bioinformatics-specific functions but also functions for graph manipulation, combinatorics/probability and general string handling. I tried to make these functions as generic as possible and placed them in separate modules, in the hope that they will be useful beyond the specific Rosalind problems.

Posted in: Programming
