Better Science Through Better Data #scidata17

Doughnuts!

Better Science through Better Doughnuts · Jez Cope

Update: fixed the link to the slides so it works now!

Last week I had the honour of giving my first ever keynote talk, at an event entitled Better Science Through Better Data hosted jointly by Springer Nature and the Wellcome Trust. It was nerve-wracking but exciting and seemed to go down fairly well. I even got accidentally awarded a PhD in the programme — if only it was that easy! The slides for the talk, "Supporting Open Research: The role of an academic library", are available online (doi:10.15131/shef.data.5537269), and the whole event was video'd for posterity and viewable online.

I got some good questions too, mainly from the clever online question system. I didn't get to answer all of them, so I'm thinking of doing a blog post or two to address a few more.

There were loads of other great presentations as well, both keynotes and 7-minute lightning talks, so I'd encourage you to take a look at at least some of it. I'll pick out a few of my highlights.

Dr Aled Edwards (University of Toronto)

There's a major problem with science funding that I hadn't really thought about before. The available funding pool for research is divided up into pots by country, and often by funding body within a country. Each of these pots has robust processes to award funding to the most important problems and most capable researchers. The problem comes because there is no coordination between these pots, so researchers all over the world end up getting funded to research the most popular problems, leading to a lot of duplication of effort.

Industry funding suffers from a similar problem, particularly the pharmaceutical industry. Because there is no sharing of data or negative results, multiple companies spend billions researching the same dead ends chasing after the same drugs. This is where the astronomical costs of drug development come from.

Dr Edwards presented one alternative, modelled by a company called M4K Pharma. The idea is to use existing IP laws to try and give academic researchers a reasonable, morally-justifiable and sustainable profit on drugs they develop, in contrast to the current model where basic research is funded by governments while large corporations hoover up as much profit as they possibly can. This new model would develop drugs all the way to human trial within academia, then license the resulting drugs to companies to manufacture with a price cap to keep the medicines affordable to all who need them.

Core to this effort is openness with data, materials and methodology, and Dr Edwards presented several examples of how this approach benefited academic researchers, industry and patients compared with a closed, competitive focus.

Dr Kirstie Whitaker (Alan Turing Institute)

This was a brilliant presentation: a practical how-to guide to doing reproducible research, from one researcher to another. I suggest you take a look at her slides yourself: Showing your working: a how-to guide to reproducible research.

Dr Whitaker briefly addressed a number of common barriers to reproducible research:

  • Is not considered for promotion: so it should be!
  • Held to higher standards than others: reviewers should be discouraged from nitpicking just because the data/code/whatever is available (true unbiased peer review of these would be great though)
  • Publication bias towards novel findings: it is morally wrong to not publish reproductions, replications etc. so we need to address the common taboo on doing so
  • Plead the 5th: if you share, people may find flaws, but if you don't they can't — if you're worried about this you should ask yourself why!
  • Support additional users: some (much?) of the burden should reasonably fall on the reuser, not the sharer
  • Takes time: this is only true if you hack it together after the fact; if you do it from the start, the whole process will be quicker!
  • Requires additional skills: important to provide training, but also to judge PhD students on their ability to do this, not just on their thesis & papers

The rest of the presentation, the "how-to" guide of the title, was a well-chosen and passionately delivered set of recommendations, but the thing that really stuck out for me is how good Dr Whitaker is at making the point that you only have to do one of these things to improve the quality of your research. It's easy to get the impression at the moment that you have to be fully, perfectly open or not at all, but it's actually OK to get there one step at a time, or even not to go all the way at all!

Anyway, I think this is a slide deck that speaks for itself, so I won't say any more!

Lightning talk highlights

There was plenty of good stuff in the lightning talks, which were constrained to 7 minutes each, but a few of the things that stood out for me were, in no particular order:

  • Code Ocean — share and run code in the cloud
  • dat project — peer-to-peer data synchronisation tool
    • Can automate metadata creation, data syncing, versioning
    • Set up a secure data sharing network that keeps the data in sync but off the cloud
  • Berlin Institute of Health — open science course for students
  • InterMine — taking the pain out of data cleaning & analysis
  • Nix/NixOS as a component of a reproducible paper
  • BoneJ (ImageJ plugin for bone analysis) — developed by a scientist, used a lot, now has a Wellcome-funded RSE to develop next version
  • ESASky — amazing live, online archive of masses of astronomical data

Coda

I really enjoyed the event (and the food was excellent too). My thanks go out to:

  • The programme committee for asking me to come and give my take — I hope I did it justice!
  • The organising team who did a brilliant job of keeping everything running smoothly before and during the event
  • The University of Sheffield for letting me get away with doing things like this!
