Open Science

The Open Science movement is rapidly gathering pace. I'm not actively involved in it, but my fiancée is: she's a Panton Fellow whose duty it is to spread the word to the existing academic community and, in her particular case, to develop a graduate training program to help encourage ideal working practices in the next generation of scientists. My contribution so far has been to help film and produce some introductory videos. Still, it means that I learn, albeit second hand, about the problems, and I have an opinion on them. To emphasise: these are my own opinions, and not endorsed by any official body that I may represent in a professional capacity.

Publishing in Journals

The central problem with science is that its working practices haven't developed sufficiently in the internet age. We're still stuck with outdated publishing practices: we pay to publish our work in journals, which everyone else then has to pay to access. We submit our papers electronically because that's easiest for us scientists, and the journals are happy to accept them because it massively reduces their copy-editing workload. So what work remains? The bulk of the work in the publication process is the refereeing, used to judge the quality of a paper. This is expected of the same academics who submit to the journal, and it's expected to happen for free. We're expected to take time out from our research to review other people's work on, often, rather tangential subjects. The result is thoroughly inconsistent quality of review where, a lot of the time, decisions are made based on whether known names are among the authors, or on the authors' institution. Referee reports are commonly short and unhelpful, often little more than a yes or no. For my part, I do my best to go through a paper in sufficient detail to really determine its quality, and a rejection contains a long list of detailed reasons which at least form a critique the authors can learn from. That, however, is extremely time consuming, and I refuse point blank to review research articles that are more than about 20 pages long. The time demands are just too great.

We're also trapped in a vicious cycle. Say we don't like the publishing practices, so we decide to boycott publishing in journals. The problem is that your next job is decided in competition with other researchers who are still publishing, and the basic metric of who is better derives from how much you're publishing, and in which journals. So we can't afford to boycott anything. Government-level intervention can help, but only marginally: science is so international that UK-based researchers will still be competing with external candidates for their jobs.

Towards a Solution

Actually, within quantum information, I would argue that we're only a few short steps away from a good solution. More so than some branches of science, we have adopted the arXiv wholeheartedly. Almost all papers appear as preprints on the arXiv when they are submitted to a journal, so, effectively, all our papers are already open access. When I'm travelling and don't have IP authentication on a journal's website, do I bother to log in and get the paper? No, I just get it from the arXiv. When I send someone a link to a paper, I send the arXiv link, knowing that they can definitely access it. That relegates a journal to the role of aggregator, and there's no sense in a journal charging for access. They might as well make themselves open access, and there's no justification for charging higher rates for publication either, as the new wave of open access journals are attempting. Instead, article fees can be used to pay referees. Personally, I think a referee should have the choice of being paid x·p in cash or y·p in article credits (with y > x), where p is the length of the paper in pages. Given that you want 3 referees per paper, an approximately sustainable scheme would set y = 1/3, or referees can take a bit less in cash instead. The New Journal of Physics does this to a limited degree, although it's a fixed article credit per paper you referee, independent of its length, and is only a relatively small fraction.
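The arithmetic behind that sustainability constraint can be sketched in a few lines. This is purely illustrative: the rate constants below are hypothetical, and only the constraint itself (3 referees per paper, so y = 1/3 balances credits earned against credits spent) comes from the argument above.

```python
# Illustrative sketch of the referee-payment arithmetic. The rates are
# hypothetical; only the sustainability constraint is taken from the text.

REFEREES_PER_PAPER = 3

def referee_payment(pages, cash_rate_x, credit_rate_y):
    """Return the two payment options for refereeing a paper of `pages`
    pages: x*p in cash, or y*p in article credits, with y > x so that
    credits are the more generous option."""
    assert credit_rate_y > cash_rate_x, "credits should be worth more than cash"
    return {"cash": cash_rate_x * pages, "credits": credit_rate_y * pages}

def is_sustainable(credit_rate_y, referees_per_paper=REFEREES_PER_PAPER):
    """With `referees_per_paper` referees each earning y*p credits per
    p-page paper, the scheme breaks even when publishing a p-page paper
    costs p credits, i.e. when referees_per_paper * y == 1."""
    return abs(referees_per_paper * credit_rate_y - 1.0) < 1e-9

# A 20-page paper, with a (hypothetical) cash rate of 0.25 per page
# against the break-even credit rate of 1/3 per page:
print(referee_payment(20, cash_rate_x=0.25, credit_rate_y=1/3))
print(is_sustainable(1/3))  # True: 3 referees x 1/3 credit = 1 credit per page
```

The point of the y > x gap is simply to make credits the attractive option, so the system keeps feeding itself.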

If you adopted such a process, you'd probably want to manage the refereeing differently. Every referee should be able to access a list of recently submitted manuscripts and choose to take responsibility for refereeing as few or as many papers as they wish. There would still need to be oversight, to stop cliques or vendettas arising and skewing the process, but it would let me referee the papers that I want to read anyway, not the ones I have very little interest in. Perhaps the fraction y would shift depending on a paper's popularity.

There is also a strong case for the refereeing process to be much more open. Whether that means journals being more transparent about how they encourage good refereeing practice, or actually making referee reports available alongside the paper, anonymously or otherwise, I'm undecided.

The other issue the internet lets us tackle is reproducibility. For starters, length limits on papers are ridiculous, and simply encourage authors to cut out important parts of the working. But we shouldn't stop there. Sites such as the arXiv offer the perfect opportunity to act as data repositories where we can deposit all of our experimental data, computer code, output and so on. If I want to verify somebody's numerics, I should be able to download the code they used, run it, and get similar results (there may be some random sampling involved, which limits exact reproducibility), and then it's down to me to pick through the code and check that it does the calculation it ought to. In practice, few people are likely to do that, but even the possibility of it happening is useful.
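The caveat about random sampling is worth making concrete. A minimal sketch, using a standard Monte Carlo estimate of pi as a stand-in for "somebody's numerics": two runs only agree approximately unless the random seed is recorded and shared alongside the code.

```python
# Sketch of the reproducibility point: a Monte Carlo calculation only
# reproduces exactly if the random seed is published with the code.
import random

def estimate_pi(n_samples, seed=None):
    """Estimate pi by sampling points in the unit square and counting
    the fraction that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n_samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_samples

# Without a seed, independent runs give similar but not identical answers;
# with a published seed, a reader can reproduce the figure exactly.
print(estimate_pi(100_000, seed=42) == estimate_pi(100_000, seed=42))  # True
```

Publishing the seed (or the full code and data) turns "I got roughly the same number" into "I got exactly the same number", which is a much stronger check.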

I'm sure a common fear would be this: you spend a long time building, say, a new computer model of a system, and after the first paper you submit on the subject you want the chance to capitalise on that time investment and generate more results before everyone else can get at the code and, if they work quicker than you do, beat you to the punch with it. That's totally reasonable, which is why, for data integrity (and just to make sure it actually happens), you might require that the code be uploaded at the same time as the paper, but provide an embargo facility so that it isn't released publicly for a fixed period, perhaps a year.

I've been told about a group in cancer biology (reference needed) that attempted to reproduce the experimental results of a range of papers in their field. Apparently they succeeded in only 10-20% of cases. That is pitiful, and really calls into question the reliability of results that are the basis for future careers, research directions and, potentially in the long term, clinical decisions. Presumably most of this is just obfuscation: authors not reporting the full details of their protocols, keeping certain trade secrets for themselves and their next project. But the issue here is about confidence in the results.

Still, I don't go as far as others. The Panton Principles would have you release everything under a completely unrestricted licence: anybody (including the commercial sector) can use your material directly, without attribution. Most of us have egos, and while I have no problem with my results being put to use (after all, that's what we're in it for), acknowledgement via attribution is a bare minimum. I would even suggest that royalties are appropriate if the core of a product is one of your ideas. Not much, necessarily - after all, you didn't have the idea or drive to commercialise it - but something.

As I've outlined above, I have some distinct ideas about how I would like the scientific dissemination of data to evolve. The practicalities of doing away with journals seem, for now, insurmountable, but a paradigm shift in which they are merely aggregators of already freely available information seems more realistic. Not that I know how to achieve this, except by communicating this opinion, raising awareness of these ideas within the broader community, and hoping that this generates some pressure where required. I'd like to think it's a more positive attitude than simply saying "we don't like the present system" - I rarely hear those who say that offering any alternatives (those actively involved in open science clearly have their own ideas as well, but that's a small community for now).