All research should be reproducible. This gets drilled into every aspiring researcher, and for very good reason. Reproducible research can be tested or improved by people in a different lab, maybe at the other end of the world, and reproducing it can reveal mistakes made during the research that may have changed the results of the study altogether. I’d like to say that peer review is, in theory, a very efficient mechanism for ensuring that all published research is reproducible. As the Retraction Watch blog shows us time and time again, though, it doesn’t always work, and there are some bogus papers out there (and some researchers seem to be running some sort of retraction leaderboard).
Therefore it is even more important that we ourselves make sure our research is always reproducible by anyone. I’m sure I am not the only PhD student who starts out with the best intentions of making all their research reproducible. From talking to other people, however, I find that these good intentions are forgotten far too often as soon as reproducibility requires steps that might delay the publication of the next paper by five minutes, because those five minutes would be spent documenting your code, your software versions or your thought process.
In the hope that it helps other people put a useful workflow together, I thought I’d share mine. One of the key items in my workflow is my lab notebook, which is still on paper, although I wanted to stop using paper completely this year. However, I work a lot with long equations, which are much faster and easier to scribble and manipulate on paper than on a tablet (and some research facilities may require you to keep a paper log book anyway). The first thing I do when I arrive at my desk in the morning is take it out, open it to the next empty page and put it next to my computer. This way I can see all day whether I have actually written something down, and if I haven’t, it nudges me to do so. Since adopting this simple technique I haven’t spent a day at the office without writing in my lab book. If you prefer a digital notebook, I used Evernote as my lab notebook during my project on eutrophication, and Python users might want to have a look at the IPython Notebook.
The second tool I use in my workflow is Git on Bitbucket, to track every single step in the development of both code and paper. I much prefer Bitbucket over GitHub because it can do everything that GitHub does but also allows you unlimited private repositories. While that may at first sound completely counterproductive on the way to more reproducible research, I do want my papers to be inaccessible until I have published them, to avoid “theft”. After a paper is submitted I’ll switch the repository over to public. Bitbucket has a really good and self-explanatory tutorial on setting up Git, which takes you through all the necessary steps. After setting up Git on my system I created some aliases in my shell, which speed up the whole process:
alias ga='git add'
alias gc='git commit'
alias gp='git push'
alias gs='git status'
Just save those commands in your .bashrc file. When using Git I try to make sure that every commit contains only a single piece of the workflow: in one commit I might have worked on the paper, and in another I may have fixed a bug. Even if I did both without committing in between, I always commit the paper first and then the bug fix, never the two together. That way you won’t re-introduce a bug if you want to go back to an old version of your paper.
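To make the one-change-per-commit habit concrete, here is a minimal sketch in a throwaway repository. The file names paper.tex and model.py, the demo identity and the commit messages are placeholders I made up for illustration, and the full git commands are spelled out because shell aliases are normally not expanded inside scripts.

```shell
# Throwaway repository so the demo cannot touch a real project.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # local identity, only for this demo repo
git config user.email "demo@example.com"

# Two unrelated changes made in one sitting, without committing in between.
# Both file names are hypothetical.
echo "Draft of the introduction" > paper.tex
echo "print('bug fixed')" > model.py

# Commit the paper on its own first...
git add paper.tex
git commit -q -m "Paper: draft introduction"

# ...then the bug fix as a separate commit.
git add model.py
git commit -q -m "Fix bug in model"

# Show the resulting history: one line per commit.
git log --oneline
```

At an interactive prompt the same steps would be ga paper.tex, gc -m "...", and so on. If you later want only an older draft back, git checkout <commit> -- paper.tex restores that one file and leaves the bug fix alone.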
Last but not least there are Makefiles, which I have written about earlier. For each project I’ve set one up that compiles and runs all the code necessary to produce the figures I need in my paper. Doing so is extremely simple: you only have to add the corresponding files to the dependency list of the rule that creates your PDF document.
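As a rough illustration of that dependency idea, the sketch below writes a two-rule Makefile from the shell and asks make for a dry run. Everything named here is an assumption of mine rather than my actual project layout: paper.tex, figure1.pdf, plot_figure1.py, results.csv, and the use of pdflatex and python as the build commands.

```shell
# Hypothetical project in a scratch directory; the source files are empty placeholders.
proj=$(mktemp -d)
cd "$proj" || exit 1
touch paper.tex plot_figure1.py results.csv

# Write the Makefile. Recipe lines must begin with a tab, which printf's \t supplies.
{
  # The PDF depends on the LaTeX source and on the figure it includes.
  printf 'paper.pdf: paper.tex figure1.pdf\n'
  printf '\tpdflatex paper.tex\n'
  printf '\n'
  # The figure depends on the script that draws it and on the data it reads.
  printf 'figure1.pdf: plot_figure1.py results.csv\n'
  printf '\tpython plot_figure1.py\n'
} > Makefile

# Dry run: print the commands make would execute, without running any of them.
make -n paper.pdf
```

Because figure1.pdf is listed as a dependency of paper.pdf, the dry run lists the plotting command before the pdflatex command. Changing results.csv or the script later would cause make to rebuild the figure and then the paper.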
These changes have helped me a lot in making my research more reproducible, and I hope they will be helpful for other people out there, too.