I’m just back from IDCC20, so here are a few reflections on this year’s conference. You can find all the available slides and links to shared notes on the conference programme. There’s also a list of all the posters and an overview of the Unconference.
Skills for curation of diverse datasets
Here in the UK and elsewhere, you’re unlikely to find many institutions claiming to apply a deep level of curation to every dataset, software package or other object deposited with them. There are so many different kinds of data, and so few people in any one institution doing “curation”, that it’s impossible to do this for everything. Without the knowledge and skills required to fully evaluate an object, the best that can usually be done is a sense check of the metadata, flagging potential high-level issues with the depositor, such as accidental disclosure of sensitive personal information.
The Data Curation Network in the United States is aiming to address this issue by pooling expertise across multiple organisations. The pilot has been highly successful and they’re now looking to obtain funding to continue this work. The Swedish National Data Service is experimenting with a similar model, also with a lot of success.
As well as sharing individual expertise, the DCN collaboration has also produced some excellent online quick-reference guides for curating common types of data.
We had some further discussion as part of the Unconference on the final day about what it would look like to introduce this model in the UK. There was general agreement that this was a good idea and a way to make optimal use of scarce resources. There were also very valid concerns that, in the current financial climate, it would be difficult for anyone to justify doing work for another organisation, apparently for free.
In my mind there are two ways around this, which are not mutually exclusive by any stretch of the imagination. The first is to Just Do It: form an informal network of curators around something simple like a mailing list, and give it a try. The second is for one or more trusted organisations to provide some coordination and structure. There are several candidates for this, including DCC, Jisc, DPC and the British Library; we all have complementary strengths in this area, so it’s my hope that we’ll be able to collaborate around it. In the meantime, I hope the discussion continues.
Artificial intelligence, machine learning et al
As you might expect at any tech-oriented conference, there was a strong theme of AI running through many presentations, starting from the very first keynote from Francine Berman. Her talk, “The Internet of Things: Utopia or Dystopia?”, used self-driving cars as a case study to unpack some of the ethical and privacy implications of AI. For example, driverless cars can potentially increase efficiency, both through route-planning and driving technique and by allowing fewer vehicles to be shared by more people. However, a shared vehicle is not a private space in the way your own car is: anything you say or do while in that space is potentially open to surveillance.
Aside from this, there were some interesting ideas under discussion, particularly around the possibility of using machine learning to automate increasingly complex actions and workflows such as data curation and metadata enhancement. I didn’t get the impression that anyone is doing this in the real world yet, but I’ve previously seen theoretical concepts discussed at IDCC make it into practice, so watch this space!
Training is always a major IDCC theme, and this year two of the most popular conference submissions described games used to help teach digital curation concepts and skills.
Mary Donaldson and Matt Mahon of the University of Glasgow presented their use of Lego to teach the concept of sufficient metadata. Participants build simple models, document the process and break the models down again. Then everyone has to use someone else’s documentation to try to recreate the models, learning important lessons about assumptions and about including sufficient detail.

Kirsty Merrett and Zosia Beckles from the University of Bristol brought along their card game “Researchers, Impact and Publications (RIP)”, based on the popular “Cards Against Humanity”. RIP encourages players to examine some of the reasons for and against data sharing, with plenty of humour thrown in. Both games were trialled by many of the attendees during Thursday’s Unconference.
I realised in Dublin that it’s been 8 years since I attended my first IDCC, held at the University of Bristol in December 2011 while I was still working at the nearby University of Bath. While I haven’t been every year, I’ve been to every one held in Europe since then, and it’s interesting to see what has and hasn’t changed. We’re no longer discussing data management plans, data scientists and various other things as abstract concepts that we’d like to encourage; instead we’re dealing with their real-world consequences.
The conference has also grown over the years: this year was the biggest yet, boasting over 300 attendees. There has been especially big growth in attendees from North America, Australasia, Africa and the Middle East. That’s great for the diversity of the conference, as it brings in more voices and viewpoints than ever. With more people around to interact with, I have to work harder to manage my energy levels, but I think that’s a small price to pay.