Naked Fairphone
I’ve had my eye on the Fairphone 2 for a while now, and when my current phone, an aging Samsung Galaxy S4, started playing up, I decided it was time to take the plunge. A few people have asked for my thoughts on the Fairphone, so here are a few notes.

Why I bought it

The thing that sparked my interest, and the main reason for buying the phone really, was the ethical stance of the manufacturer. The small Dutch company have gone to great lengths to ensure that both labour and materials are sourced as responsibly as possible. They regularly inspect the factories where the parts are made and assembled to ensure fair treatment of the workers, and they source all the raw materials carefully to minimise the environmental impact and the use of conflict minerals.

Another side to this ethical stance is a focus on longevity of the phone itself. This is not a product with an intentionally limited lifespan. Instead, it’s designed to be modular and as repairable as possible, by the owner themselves. Spares are available for all of the parts that commonly fail in phones (including screen and camera), and at the time of writing the Fairphone 2 is the only phone to receive 10/10 for repairability from iFixit. There are plans to allow hardware upgrades, including an expansion port on the back so that NFC or wireless charging could be added with a new case, for example.

What I like

So far, the killer feature for me is the dual SIM card slots. I have both a personal and a work phone, and the latter was always getting left at home or in the office, or running out of charge. Now I have both SIMs in the one phone: I can receive calls on either number, turn them on and off independently, and choose which account to use when sending a text or making a call.

The OS is very close to “standard” Android, which is nice, and I really don’t miss all the extra bloatware that came with the Galaxy S4. It also has twice the storage of that phone, which is hardly unique but is still nice to have.

Overall, it seems like a solid, reliable phone, though it’s not going to outperform anything else at the same price point. It certainly feels nice and snappy for everything I want to use it for. I’m no mobile gamer, but there is that distant promise of upgradability on the horizon if you are.

What I don’t like

I only have two bugbears so far. Once or twice it’s locked up and become unresponsive, requiring a “manual reset” (removing and replacing the battery) to get going again. It also lacks NFC, which isn’t really a deal breaker, but I was just starting to make occasional use of it on the S4 (mostly experimenting with my Yubikey NEO) and it would have been nice to try out Android Pay when it finally arrives in the UK.


It’s definitely a serious contender if you’re looking for a new smartphone and aren’t bothered about serious mobile gaming. You do pay a premium for the ethical sourcing and modularity, but I feel that’s worth it for me. I’m looking forward to seeing how it works out as a phone.


XKCD: automation

I’m a nut for automating repetitive tasks, so I was dead pleased a few years ago when I discovered that IFTTT let me plug different bits of the web together. I now use it for tasks such as:

  • Syndicating blog posts to social media
  • Creating scheduled/repeating todo items from a Google Calendar
  • Making a note to revisit an article I’ve starred in Feedly

I’d probably only be half-joking if I said that I spend more time automating things than I save not having to do said things manually. Thankfully it’s also a great opportunity to learn, and recently I’ve been thinking about reimplementing some of my IFTTT workflows myself to get to grips with how it all works.
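
To give a flavour of what reimplementing one of these workflows looks like, here’s a minimal sketch of the syndication one: check the blog’s RSS feed for posts we haven’t seen before. The feed contents and function name are made up for illustration, and the “post to social media” step is just a print:

```python
import xml.etree.ElementTree as ET

def new_posts(feed_xml, seen_links):
    """Return (title, link) pairs for feed items we haven't handled yet."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            posts.append((title, link))
    return posts

# In the real script the XML would be fetched from the blog's feed URL;
# this canned snippet stands in for it.
feed_xml = """<rss><channel>
  <item><title>Naked Fairphone</title><link>https://example.com/fairphone</link></item>
  <item><title>Old post</title><link>https://example.com/old</link></item>
</channel></rss>"""

for title, link in new_posts(feed_xml, seen_links={"https://example.com/old"}):
    print(f"Would syndicate: {title} ({link})")
```

The real versions mostly just add an HTTP fetch on one end and an API call on the other, which is exactly the plumbing I want to learn.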

There are some interesting open source projects designed to offer a lot of this functionality, such as Huginn, but I decided to go for a simpler option for two reasons:

  1. I want to spend my time learning about the APIs of the services I use and how to wire them together, rather than learning how to use another big framework; and
  2. I only have a small Amazon EC2 server to play with, and a heavy Ruby on Rails app like Huginn (plus web server) needs more memory than I have.

Instead I’ve gone old-school with a little collection of individual scripts, each doing one particular job. I’m using the built-in scheduling functionality of systemd, which is already part of any modern Linux operating system, to run them periodically. It also means I can vary the language I use to write each one depending on the needs of the job at hand and what I want to learn or feel like using at the time. Currently it’s all done in Python, but I want to have a go at Lisp sometime, and there are some interesting new languages like Go and Julia that I’d like to get my teeth into as well.
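
For example, a systemd service/timer pair along these lines is enough to run one script every hour. The unit names and the script path here are illustrative, not my actual setup:

```ini
# ~/.config/systemd/user/feedly-notes.service
[Unit]
Description=Make notes from starred Feedly articles

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 %h/automation/feedly_notes.py

# ~/.config/systemd/user/feedly-notes.timer
[Unit]
Description=Run feedly-notes hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```

Running `systemctl --user enable --now feedly-notes.timer` activates it; a timer starts the service with the same name by default, and `Persistent=true` means a missed run fires on the next boot.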

You can see my code on GitHub as it develops. Comments and contributions are welcome (if not expected), and do let me know if you find any of the code useful.

Image credit: xkcd #1319, Automation


I admit it: I’m a grammar nerd. I know the difference between ‘who’ and ‘whom’, and I’m proud.

I used to be pretty militant, but these days I’m more relaxed. I still take joy in the mechanics of the language, but I also believe that English is defined by its usage, not by a set of arbitrary rules. I’m just as happy to abuse it as to use it, although I still think it’s important to know what rules you’re breaking and why.

My approach now boils down to this: language is like clothing. You (probably) wouldn’t show up to a job interview in your pyjamas¹, but neither are you going to wear a tuxedo or ballgown to the pub.

Getting commas and semicolons in the right place is like getting your shirt buttons done up right. Getting it wrong doesn’t mean you’re an idiot. Everyone will know what you meant. It will affect how you’re perceived, though, and that will affect how your message is perceived.

And there are former rules² that some still enforce but which are nonetheless dropping out of regular usage. There was a time when everyone in an office job wore formal clothing. Then it became acceptable just to wear a blouse, or a shirt and tie. Then the tie became optional, and now there are many professions where perfectly well-respected and competent people are expected to show up wearing nothing smarter than jeans and a t-shirt.

One such rule IMHO is that ‘data’ is a plural and should take pronouns like ‘they’ and ‘these’. The origin of the word ‘data’ is in the Latin plural of ‘datum’, and that idea has clung on for a considerable period. But we don’t speak Latin and the English language continues to evolve: ‘agenda’ also began life as a Latin plural, but we don’t use the word ‘agendum’ any more. It’s common everyday usage to refer to data with singular pronouns like ‘it’ and ‘this’, and it’s very rare to see someone referring to a single datum (as opposed to ‘data point’ or something).

If you want to get technical, I tend to think of ‘data’ as a mass noun, like ‘water’ or ‘information’. It’s uncountable: talking about ‘a water’ or ‘an information’ doesn’t make much sense, but a mass noun takes singular pronouns, as in ‘this information’. If you’re interested, the Oxford English Dictionary also takes this position, while Chambers leaves the choice of singular or plural up to you.

There is absolutely nothing wrong, in my book, with referring to data in the plural as many people still do. But it’s no longer a rule and for me it’s weakened further from guideline to preference.

It’s like wearing a bow-tie to work. There’s nothing wrong with it and some people really make it work, but it’s increasingly outdated and even a little eccentric.

  1. or maybe you’d totally rock it.

  2. Like not starting a sentence with a conjunction…


Well, I did a great job of blogging the conference for a couple of days, but then I was hit by the bug that’s been going round and didn’t have a lot of energy for anything other than paying attention and making notes during the day! I’ve now got round to reviewing my notes so here are a few reflections on day 2.

Day 2 was the day of many parallel talks! So many great and inspiring ideas to take in! Here are a few of my take-home points.

Big science and the long tail

The first parallel session had examples of practical data management in the real world. Jian Qin & Brian Dobreski (School of Information Studies, Syracuse University) worked on reproducibility with one of the research groups involved with the recent gravitational wave discovery. “Reproducibility” for this work (as with much of physics) mostly equates to computational reproducibility: tracking the provenance of the code and its input and output is key. They also found that in practice the scientists’ focus was on making the big discovery, and ensuring reproducibility was seen as secondary. This goes some way to explaining why current workflows and tools don’t really capture enough metadata.

Milena Golshan & Ashley Sands (Center for Knowledge Infrastructures, UCLA) investigated the use of Software-as-a-Service (SaaS, such as Google Drive, Dropbox or more specialised tools) as a way of meeting the needs of long-tail science research such as ocean science. This research is characterised by small teams, diverse data, dynamic local development of tools, local practices and difficulty disseminating data. This results in a need for researchers to be generalists, as opposed to “big science” research areas, where they can afford to specialise much more deeply. Such generalists tend to develop their own isolated workflows, which can differ greatly even within a single lab. Long-tail research also often suffers from a lack of dedicated IT support. They found that use of SaaS could help to meet these challenges, but at a high cost to cover the necessary guarantees of security and stability.

Education & training

This session focussed on the professional development of library staff. Eleanor Mattern (University of Pittsburgh) described the immersive training introduced to improve librarians’ understanding of the data needs of their subject areas, as part of their RDM service delivery model. The participants each conducted a “disciplinary deep dive”, shadowing researchers and then reporting back to the group on their discoveries with a presentation and discussion.

Liz Lyon (also University of Pittsburgh, formerly UKOLN/DCC) gave a systematic breakdown of the skills, knowledge and experience required in different data-related roles, obtained from an analysis of job adverts. She identified distinct roles of data analyst, data engineer and data journalist, and as well as each role’s distinctive skills, pinpointed common requirements of all three: Python, R, SQL and Excel. This work follows on from an earlier phase which identified an allied set of roles: data archivist, data librarian and data steward.

Data sharing and reuse

This session gave an overview of several specific workflow tools designed for researchers. Marisa Strong (University of California Curation Centre/California Digital Library) presented Dash, a highly modular tool for manual data curation and deposit by researchers. It’s built on their flexible backend, Stash, and though it’s currently optimised to deposit in their Merritt data repository, it could easily be hooked up to other repositories. It captures DataCite metadata and a few other fields, and is integrated with ORCID to uniquely identify people.

In a different vein, Eleni Castro (Institute for Quantitative Social Science, Harvard University) discussed some of the ways that Harvard’s Dataverse repository is streamlining deposit by enabling automation. It provides a number of standardised endpoints such as OAI-PMH for metadata harvest and SWORD for deposit, as well as custom APIs for discovery and deposit. Interesting use cases include:

  • An addon for the Open Science Framework to deposit in Dataverse via SWORD
  • An R package to enable automatic deposit of simulation and analysis results
  • Integration with publisher workflows, such as Open Journal Systems
  • A growing set of visualisations for deposited data
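
As a flavour of what those standardised endpoints make possible, here’s a minimal sketch of an OAI-PMH metadata harvest. The endpoint URL and the response snippet are made up for illustration, and a real harvester would also page through resumption tokens:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def harvest_url(base_url, metadata_prefix="oai_dc"):
    """Build a standard OAI-PMH ListRecords request URL."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})

def titles_from_response(xml_text):
    """Pull Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC_NS + "title")]

# Hypothetical endpoint; a real harvester would fetch this URL.
print(harvest_url("https://dataverse.example.edu/oai"))

# Trimmed example of what a ListRecords response looks like.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example survey dataset</dc:title>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""
print(titles_from_response(sample))
```

Because the protocol is a standard, the same few lines work against any OAI-PMH-speaking repository, not just Dataverse, which is exactly the point of exposing these endpoints.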

In the future they’re also looking to integrate with DMPtool to capture data management plans and with Archivematica for digital preservation.

Andrew Treloar (Australian National Data Service) gave us some reflections on the ANDS “applications programme”, a series of 25 small funded projects intended to address the fourth of their strategic transformations: single use → reusable. He observed that essentially these projects worked because they were able to throw money at a problem until they found a solution: not very sustainable. Some of them stuck to a traditional “waterfall” approach to project management, resulting in “the right solution 2 years late”. Every researcher’s needs are “special” and communities are still constrained by old ways of working. The conclusions from this programme were that:

  • “Good enough” is fine most of the time
  • Adopt/Adapt/Augment is better than Build
  • Existing toolkits let you focus on the 10% functionality that’s missing
  • Successful projects involved research champions who can: 1) articulate their community’s requirements; and 2) promote project outcomes


All in all, it was a really exciting conference, and I’ve come home with loads of new ideas and plans to develop our services at Sheffield. I noticed a continuation of some of the trends I spotted at last year’s IDCC, especially an increasing focus on “second-order” problems: we’re no longer spending most of our energy just convincing researchers to take data management seriously and are able to spend more time helping them to do it better and get value out of it. There’s also a shift in emphasis (identified by closing speaker Cliff Lynch) from sharing to reuse, and making sure that data is not just available but valuable.


The main conference opened today with an inspiring keynote by Barend Mons, Professor in Biosemantics, Leiden University Medical Center. The talk had plenty of great stuff, but two points stood out for me.

First, Prof Mons described a newly discovered link between Huntington’s Disease and a previously unconsidered gene. No-one had recognised this link before, but on mining the literature, an indirect link was identified in more than 10% of the roughly 1 million scientific claims analysed. This is knowledge for which we already had more than enough evidence, but which could never have been discovered without such a wide-ranging computational study.

Second, he described a number of behaviours which should be considered “malpractice” in science:

  • Relying on supplementary data in articles for data sharing: the majority of this is trash (paywalled, embedded in bitmap images, missing)
  • Using the Journal Impact Factor to evaluate science and ignoring altmetrics
  • Not writing data stewardship plans for projects (he prefers this term to “data management plan”)
  • Obstructing tenure for data experts by assuming that all highly-skilled scientists must have a long publication record

A second plenary talk from Andrew Sallans of the Center for Open Science introduced a number of interesting-looking bits and bobs, including the Transparency & Openness Promotion (TOP) Guidelines, which set out a pathway to help funders, publishers and institutions move towards more open science.

The rest of the day was taken up with a panel on open data, a poster session, some demos and a birds-of-a-feather session on sharing sensitive/confidential data. There was a great range of posters, but a few that stood out to me were:

  • Lessons learned about ISO 16363 (“Audit and certification of trustworthy digital repositories”) certification from the British Library
  • Two separate posters (from the Universities of Toronto and Colorado) about disciplinary RDM information & training for liaison librarians
  • A template for sharing psychology data developed by a psychologist-turned-information researcher from Carnegie Mellon University

More to follow, but for now it’s time for the conference dinner!


I’m at the International Digital Curation Conference 2016 (#IDCC16) in Amsterdam this week. It’s always a good opportunity to pick up some new ideas and catch up with colleagues from around the world, and I always come back full of new possibilities. I’ll try and do some more reflective posts after the conference but I thought I’d do some quick reactions while everything is still fresh.

Monday and Thursday are pre- and post-conference workshop days, and today I attended Developing Research Data Management Services. Joy Davidson and Jonathan Rans from the Digital Curation Centre (DCC) introduced us to the Business Model Canvas, a template for designing a business model on a single sheet of paper. The model prompts you to think about all of the key facets of a sustainable, profitable business, and can easily be adapted to the task of building a service model within a larger institution. The DCC used it as part of the Collaboration to Clarify Curation Costs (4C) project, whose output the Curation Costs Exchange is also worth a look.

It was a really useful exercise to be able to work through the whole process for an aspect of research data management (my table focused on training & guidance provision), both because of the ideas that came up and also the experience of putting the framework into practice. It seems like a really valuable tool and I look forward to seeing how it might help us with our RDM service development.

Tomorrow the conference proper begins, with a range of keynotes, panel sessions and birds-of-a-feather meetings so hopefully more then!


I like the personal kanban way of working. It satisfies my need to make lists and track everything in one place, while being flexible enough to evolve and adapt with minimal friction, and I like the feeling it gives of tasks flowing through my workflow. I also prefer digital tools in general, because (battery life permitting) I can generally use them wherever I am.

For online kanban-ing I really like Trello but recently I’ve been trying out another product, LeanKit, so I wanted to note down my thoughts on how they measure up.


Trello is fairly simple in concept, though probably inspired by the ideas of kanban. The overall structure is that you create cards, arrange them vertically into columns (“lists”) and group the lists into boards. That’s all the structure there is, but you can have any number of boards, lists and cards.

  • What I like:
    • Create cards via email (especially when combined with automation tool IFTTT)
    • Smooth interface with a sense of fun and amazing mobile apps
    • Can have lots of boards with different backgrounds
    • Flexible sharing features for individuals and organisations
    • Keyboard shortcuts for many features
  • What I don’t like:
    • Inflexibility of structure (partly overcome with multiple boards)
    • The lists all look the same so it’s hard to orient yourself when quickly glancing at one


Where Trello is all about simplicity, LeanKit is more about power. The overarching concept is similar: you’re still arranging cards in columns (“lanes” in LeanKit). The key difference is that LeanKit has much more flexibility in how you arrange your lanes: you can split them vertically or horizontally as many times as you like, allowing much more hierarchical grouping structures.

  • What I like:
    • Very flexible: you can freely split lanes vertically & horizontally to create your desired structure
    • It hides away old cards in a fully searchable archive
    • Bulk editing/moving of cards
    • Some premium features (e.g. sub-boards within cards, analytics)
  • What I don’t like:
    • The best features are paid-only:
      • Sharing boards
      • Moving stuff between boards
      • More than 3 boards
    • The interface feels stuck in the mid-2000s
    • Poor mobile support: only third-party apps are available and their support of some features is limited
    • Possibly too flexible: it tends to lead me down process-tweaking rabbit-holes when I should be getting things done

What am I doing now?

LeanKit was an interesting experiment, and I think it has a lot of value for those who need those more advanced features and are prepared to pay. At the end of the day though, I’m not one of those people so I’ve moved back to Trello.

I have, though, learned a lot about flexible use of boards from my experience with LeanKit and I’m experimenting a lot more with how I use them now I’m back in Trello. For example I’m increasingly creating separate boards for particular types of task (e.g. people I want to meet) and for larger projects.

In summary: if you can justify paying the cash and don’t mind the clunkiness, try LeanKit, but otherwise, just use Trello!

I’d be interested to know which company is the more profitable: does LeanKit’s focus on big enterprise customers pay off or hold them back by putting off individuals?


Funders, publishers, research institutions and many other groups are increasingly keen that researchers make more of their data more open. There are some very good reasons for doing this, but many researchers have legitimate concerns that must be dealt with before they can be convinced. This is the first in what I hope will be a series of posts exploring arguments against sharing data.

“We really want to share our data more widely, but we’re worried that it’s going to give the crackpots more opportunity to pick holes in our findings.”

A PhD student asked me something like this recently, and it’s representative of some very real concerns for a lot of researchers. While I answered the question, I didn’t feel satisfied with my response, so I wanted to unpack it a bit more in preparation for next time.

It seems to me that there are three parts to this. No-one likes to:

  • Have their time wasted
  • Be wrongfully and unfairly discredited
  • Have genuine flaws found in their work

Having genuine errors challenged is a very useful thing, but spurious challenges (i.e. those with no valid basis) can be a stressful time-sink. Such challenges may be made by someone with an interest in seeing you (or your results) discredited; they may also be made by someone who simply fails to understand a key concept of your research¹. Either way, they’re a nuisance and rightly to be avoided.

Perhaps the scariest aspect of this is the possibility that your critics might actually be on to something. No-one really enjoys finding out that they’ve made a mistake, and we naturally tend to avoid situations where an error we didn’t know was there might be brought to light.

If all this is so, why should you share your data? Ultimately, there will always be crackpots, or at least people with an axe to grind. Publishing your data won’t change this, but it will add weight to your own arguments. Firstly, it says that you’re confident enough in your work to put it out there. But secondly, it gives impartial readers the opportunity to verify your claims independently and come to their own judgement about any potential criticism. It’s much harder for the “crackpots” to pick holes in your work when your supporting evidence is available and the validity of your argument can be easily demonstrated.

There’s also a need to accept, and indeed seek out, valid criticism. None of us is perfect and everyone makes mistakes from time to time. When that happens it’s important to find out sooner rather than later and be ready to make corrections, learn and move on.

  1. Don’t forget Hanlon’s razor: “Never attribute to malice that which can adequately be explained by incompetence.”


It’s been a bit hectic lately because I’ve been finishing up my old job (at Imperial College) and getting started on my new one (Research Data Manager at the University of Sheffield), with a bit of a holiday in between. Hopefully things will calm down a bit now and get back to normal (whatever that looks like…).

In the meantime here are three things I will miss from Imperial:

  • Lovely, friendly, supportive, competent and professional colleagues
  • Lunchtime walks in Hyde Park
  • Imperial College Scifi & Fantasy library (part of the Students’ Union)

And three things I won’t miss:

  • Rude people & overcrowding on the tube/bus/etc
  • Masses of air and noise pollution
  • Travelling between Leeds & London all the time

And finally, three things I’m looking forward to in Sheffield:

  • Taking up a new challenge with a new set of disciplines to work with
  • Catching up with old friends and making a few new ones
  • Lunchtime walks in Weston Park, Crookes Valley Park & the Ponderosa