I currently work in method development for proteomics, and I think about this al...

FrojoS · on Jan 11, 2013

> Finally, it takes significant time and experience to produce (and maintain) code that others can use and expect to work most of the time. That's a waste of time/money for a research lab, and an inefficient use of public funds.

I haven't seen people asking for maintained, (re)usable code. We just want the crappy code that was used to produce the results. There is even an appropriate license, the Community Research and Academic Programming License (CRAPL) [1].

[1] http://matt.might.net/articles/crapl/

bostonpete · on Jan 11, 2013

I would think most people would be unwilling to make their "crappy" code public because no matter how many disclaimers they provide with it, they will be judged by others on it.

beagle3 · on Jan 11, 2013

Why on earth would anyone trust descriptions that they cannot verify?

Trusting without the ability to verify goes against everything scientific.

If you think your code is too "crappy" for publication, why do you believe it is bug-free enough to produce dependable answers?

dagw · on Jan 11, 2013

Re-running their crappy code and getting the same result they got doesn't really prove or verify anything. Re-implementing the algorithm they describe in the paper and getting the same result (or not) is far more interesting.

beagle3 · on Jan 11, 2013

> Re-running their crappy code and getting the same result they got doesn't really prove or verify anything.

Yes it does.

Very often, the data selected for publication is cherry-picked. Running the same crappy code on a more complete data set (or, alternatively, on a partial one) would give a very quick indication of how robust the results are, and unlike re-implementing, it might be doable in a day rather than taking months of effort.

Furthermore, when you actually re-implement (if you do), it is extremely helpful to compare intermediate results, which is impossible unless you have the original everything.

> Re-implementing the algorithm they describe in the paper and getting the same result (or not) is far more interesting.

Yes, but very rarely done in fields that are not CS or EE (and not very common in these either). Usually, results are just taken as gospel.

Also, there is a ridiculous amount of negligence (and even fraud) in publications. Just running the crappy code, seeing the results, and having a cursory look at the code and data would reveal a lot of it.

bostonpete · on Jan 12, 2013

> Why on earth would anyone trust descriptions that they cannot verify?
>
> Trusting without the ability to verify goes against everything scientific.

Hasn't this always been true about scientific papers? Descriptions can be verified by reproducing the experiment. Why is a paper any less trustworthy just because there's code involved?

jerf · on Jan 12, 2013

The need for reproducibility in experiments is an accident of the fact that our universe is horrifically complicated and true reproducibility is a myth, thus we must make a deliberate, conscious effort to come as close as possible, or no progress can be made. When that is no longer true and it becomes possible to run (under certain constrained circumstances) fully deterministic experiments that can be freely replicated to the bit by anybody, it's time to rethink the assumptions made lo these many centuries ago.

People arguing against source code release often argue as if those of us in favor think that re-running the original simulation is the end-all, be-all of reproducibility. Clearly that is not the case. No one simulation can truly prove anything, and independent reverification will always have a place. But since we do have the source artifacts and original data, why not release them and show exactly what was done and how it was done? Again, the idea that papers need not do so is merely an artifact of the fact that a scientific paper could only be 10 or so very expensive pages in a journal; why carry unexamined assumptions based on that now-outdated fact forward into the future?

Accidents of the past are nothing more than accidents of the past, not holy writ. And I'm not aware of a good argument against releasing source code that doesn't, when deeply examined, boil down to "well, that's just not how we do it."

beagle3 · on Jan 12, 2013

> Hasn't this always been true about scientific papers? Descriptions can be verified by reproducing the experiment. Why is a paper any less trustworthy just because there's code involved?

It was always true to an extent.

Code is a force multiplier that makes it significantly harder to evaluate the paper without it (and without reproducing an equivalent).

I'm not in academia myself, but I've heard from friends more than once that when they actually received the code (and/or data) they had requested from an author, the code turned out not to match its description in the paper precisely, and the data had often been massaged to fit in ways that weren't precisely described either.

The question shouldn't be "why aren't you satisfied with what was good 20 years ago?", but rather "when sharing the bits that make everything reproducible is a 'git push' away, why isn't it considered mandatory?"

It is a common misconception that science is about proving things; the scientific method is actually about trying to disprove things and failing to do so. If what you want to do is science, why don't you make it as easy as possible to disprove your results?

jmilloy · on Jan 13, 2013

I just want to emphasize the difference between "code that was used to produce the results" and an algorithm that is the key element of the paper.

If I can't reproduce the results because you're using methods that I can't reasonably find/replicate, then the paper should not be accepted. That's still true for code, and sometimes providing source code is the missing piece. That's definitely not what the OP is about.

bendmorris · on Jan 11, 2013

> If you can't code yourself, leave code production to those who do it well and don't whine that you can't just plug in whatever hot new method just came out without any effort or proper understanding.

This is missing the point. I can code, and I'm not promoting code sharing to more easily use other people's research. What I want is to be able to verify that the research I read is accurate, and there's simply not enough time available to reproduce everything I read from scratch. Reproducibility is important; I shouldn't have to just accept the authors' word that their results are exactly as described. Independent verification should be happening at the review stage at the very least, and authors should be required to make that possible if they want to publish.

tokipin · on Jan 11, 2013

I would imagine that the algorithm would gain a lot more traction if it actually came with an implementation.