A couple of months ago,
I went to Oxford for an intensive, 2-day course
run by Software Carpentry and Data Carpentry
for prospective new instructors.
I’ve now had confirmation that I’ve completed the checkout procedure so it’s official:
I’m now a certified Data Carpentry instructor!
As far as I’m aware, the certification process is now combined,
so I’m also approved to teach Software Carpentry material.
And of course there’s Library Carpentry too…
For most of the last few years I've been lucky enough to attend the International Digital Curation Conference (IDCC). A large part of the audience is made up of people who, like me, work on research data management at universities around the world, and it's begun to feel like a sort of "home" conference to me. This year, IDCC was held at the Royal College of Surgeons in the beautiful city of Edinburgh.
For the last couple of years, my overall impression has been that, as a community, we're moving away from the "first-order" problem of trying to convince people (from PhD students to senior academics) to take RDM seriously and into a rich set of "second-order" problems around how to do things better and widen support to more people. This year has been no exception. Here are a few of my observations and takeaway points.
Go ahead and take a look. I’ll still be here when you come back.
According to my Twitter profile,
I joined in February 2009 as user #20,049,102.
It was nearing its 3rd birthday and,
though there were clearly a lot of people already signed up at that point,
it was still relatively quiet, especially in the UK.
OpenRefine is a great tool
for exploring and cleaning datasets prior to analysing them.
It also records an undo history of all actions
that you can export as a sort of script in JSON format.
One thing that bugs me though is that,
having spent some time interactively cleaning up your dataset,
you then need to fire up OpenRefine again
and do some interactive mouse-clicky stuff
to apply that cleaning routine to another dataset.
You can at least re-import the JSON undo history to make that as quick as possible,
but there’s no getting around the fact that
there’s no quick way to do it from a cold start.
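To see why the exported history is so close to being a reusable script, here's a small sketch that parses one and summarises the recorded actions. The history shown is a made-up example, but the real export is, as far as I can tell, a JSON array of operation objects with fields like `op`, `description` and `columnName`:

```python
import json

# A made-up (but representative) OpenRefine undo history: the real export
# is a JSON array of operation objects, each with an "op" naming the action.
history = json.loads("""
[
  {"op": "core/text-transform", "columnName": "Name",
   "expression": "value.trim()", "description": "Trim whitespace in Name"},
  {"op": "core/column-removal", "columnName": "Notes",
   "description": "Remove column Notes"}
]
""")

def summarise(operations):
    """Return one 'op: description' line per recorded action."""
    return ["{}: {}".format(o["op"], o.get("description", "")) for o in operations]

for line in summarise(history):
    print(line)
```

To actually replay a history from a cold start you'd need to talk to a running OpenRefine instance; I believe its HTTP API has an apply-operations command, and the third-party `openrefine-client` tool wraps it, but check the docs for your version before relying on either.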
I’ve been meaning to give Yesterbox a try for a while.
The general idea is
that each day you only deal with email that arrived
yesterday or earlier.
This forms your inbox for the day,
hence “yesterbox”.
Once you’ve emptied your yesterbox,
or at least got through some minimum number
(10 is recommended),
then you can look at emails from today.
Even then you only really want to be dealing with
things that are absolutely urgent.
Anything else can wait until tomorrow.
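The rule above is simple enough to sketch in a few lines. This is purely illustrative, with invented message data rather than any real mail API: everything received before today goes in the yesterbox (oldest first), and only urgent mail from today gets past the gate:

```python
from datetime import date, datetime

# Invented example messages; "received" and "urgent" are illustrative
# fields, not part of any real email library.
inbox = [
    {"subject": "Server down!",  "received": datetime(2023, 5, 2, 9, 0),  "urgent": True},
    {"subject": "Lunch?",        "received": datetime(2023, 5, 2, 8, 30), "urgent": False},
    {"subject": "Weekly report", "received": datetime(2023, 5, 1, 16, 0), "urgent": False},
]

def todays_queue(messages, today):
    """Yesterbox first (oldest first), then only today's urgent mail."""
    yesterbox = [m for m in messages if m["received"].date() < today]
    urgent_today = [m for m in messages
                    if m["received"].date() == today and m["urgent"]]
    return sorted(yesterbox, key=lambda m: m["received"]) + urgent_today

queue = todays_queue(inbox, today=date(2023, 5, 2))
```

With the data above, "Weekly report" (yesterday's mail) comes first, "Server down!" makes the cut as today's urgent item, and "Lunch?" waits until tomorrow.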
Whenever I’m involved in a discussion about how to encourage researchers to adopt new practices, eventually someone will come out with some variant of the following phrase:
“That’s all very well, but researchers will never do XYZ until it’s made a criterion in hiring and promotion decisions.”
With all the discussion of carrots and sticks I can see where this attitude comes from, and strongly empathise with it, but it raises two main problems:
“The single most important rule of testing is to do it.”
— Brian Kernighan and Rob Pike, The Practice of Programming (quote taken from SC Test page)
One of the trickiest aspects of developing software
is making sure that it actually does what it’s supposed to.
Sometimes failures are obvious:
you get completely unreasonable output
or even (shock!) a comprehensible error message.
But failures are often more subtle.
Would you notice if your result was out by a few percent,
or consistently ignored the first row of your input data?
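Here's a toy illustration of that second kind of failure. `column_mean` is a made-up function with a deliberate off-by-one bug: it silently drops the first row, so the answer is plausible but wrong, and only an explicit check against a known-good value would catch it:

```python
def column_mean(rows, col):
    """Mean of one column of a table of tuples."""
    total = 0.0
    for i in range(1, len(rows)):   # BUG: starts at 1, silently skipping row 0
        total += rows[i][col]
    return total / len(rows)

rows = [(1.0,), (2.0,), (3.0,)]
result = column_mean(rows, 0)

# The correct mean is 2.0, but the buggy version returns (2 + 3) / 3 ≈ 1.67.
# Nothing crashes and nothing looks absurd -- which is exactly why a test
# like `assert column_mean(rows, 0) == 2.0` is worth writing.
```

No error message, no unreasonable output: just a result that's quietly wrong, which is the case the Kernighan and Pike quote is really about.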
I really love Markdown. I love its simplicity; its readability; its plain-text nature. I love that it can be written and read with nothing more complicated than a text editor. I love how nicely it plays with version control systems. I love how easy it is to convert to different formats with Pandoc and how it’s become effectively the native text format for a wide range of blogging platforms.
This competition will be an opportunity for the next wave of developers to show their skills to the world — and to companies like ours.
— Dick Hardt, ActiveState (quote taken from SC Track page)
All code contains bugs,
and all projects have features that users would like
but which aren’t yet implemented.
Open source projects tend to get more of these
as their user communities grow and start requesting improvements to the product.
As your open source project grows,
it becomes harder and harder to keep track of and prioritise
all of these potential chunks of work.
What do you do?
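At its core, what an issue tracker maintains for you is a prioritised queue of work items. Here's a minimal sketch of that idea using Python's standard-library `heapq`; the issue titles and priority scheme are invented for illustration:

```python
import heapq

# A minimal priority queue of work items, of the sort an issue tracker
# keeps for you. Lower number = more urgent; the counter breaks ties so
# equal-priority items keep their filing order.
backlog = []
counter = 0

def file_issue(priority, title):
    global counter
    heapq.heappush(backlog, (priority, counter, title))
    counter += 1

file_issue(2, "Add CSV export")
file_issue(1, "Crash when input file is empty")
file_issue(3, "Dark mode")

# The most urgent item surfaces first, regardless of filing order.
next_up = heapq.heappop(backlog)[2]
```

Real trackers layer labels, assignees and triage workflows on top, but the underlying job is the same: keep the growing pile ordered so the next most important thing is always on top.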
Nine years ago, when I first released Python to the world, I distributed it with a Makefile for BSD Unix. The most frequent questions and suggestions I received in response to these early distributions were about building it on different Unix platforms. Someone pointed me to autoconf, which allowed me to create a configure script that figured out platform idiosyncrasies. Unfortunately, autoconf is painful to use – its grouping, quoting and commenting conventions don’t match those of the target language, which makes scripts hard to write and even harder to debug. I hope that this competition comes up with a better solution — it would make porting Python to new platforms a lot easier!
— Guido van Rossum, Technical Director, Python Consortium (quote taken from SC Config page)