eRambler

Jez Cope's blog on becoming a research technologist

Oxford Open Science meeting

On Wednesday 22 August 2012, I gave an invited presentation at the August meeting of Oxford Open Science, hosted at the Oxford e-Research Centre. The theme of the evening was “How do we prepare postgraduate research students for the era of big data?”

There were some interesting presentations around that subject:

  • Juliet Ralph and Oliver Bridle from the Bodleian discussed information seeking behaviour amongst students;
  • Open Knowledge Foundation’s Laura Newman told us about the School of Data, a project to produce learning resources for those working with data;
  • Anna Collins from DSpace Cambridge talked about “long tail in the shadow of big data”.

My own presentation discussed some of the work I’ve done providing social media and data management training for PGRs, and the slides are available online.

As an experiment, the LaTeX source of the slides is also available on GitHub. Let me know if it’s any use.


Scraping for gold at the Olympics

What if it wasn’t all about the gold medals? The Olympic medal table is always ranked in order of gold medals first, then silver, then bronze.

That seems reasonable, but if you had looked at the table at the end of 6 August, for example, you’d have seen that Germany had an impressive 22 medals, including 5 golds, but was ranked one place behind Kazakhstan, which had only 7 medals, 6 of them gold.
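The conventional ordering is a lexicographic sort: compare golds first, and only look at silvers (then bronzes) to break a tie. Here is a minimal Python sketch, assuming each row of the table is a simple record of a country and its three medal counts; the countries and figures are hypothetical, chosen so that a single gold outranks twenty lesser medals:

```python
from typing import NamedTuple


class MedalCount(NamedTuple):
    country: str
    gold: int
    silver: int
    bronze: int

    @property
    def total(self) -> int:
        return self.gold + self.silver + self.bronze


def rank_gold_first(table: list[MedalCount]) -> list[MedalCount]:
    """Sort by gold, breaking ties on silver and then bronze (highest first)."""
    return sorted(table, key=lambda row: (row.gold, row.silver, row.bronze), reverse=True)


# Hypothetical countries, not real Olympic data.
table = [
    MedalCount("Country B", gold=0, silver=10, bronze=10),
    MedalCount("Country A", gold=1, silver=0, bronze=0),
]
print([(row.country, row.total) for row in rank_gold_first(table)])
# [('Country A', 1), ('Country B', 20)]
```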

So I thought it was time to do a few things I’ve wanted to try for a while: scrape some publicly available data, do something interesting with it, and write a Ruby webapp and deploy it somewhere other than my desktop.

Finding the data

It just so happens that the BBC’s medal table is marked up with some nice semantic attributes:

  • Each <tr> tag has two attributes: data-country-name and data-country-code;
  • Each <td> tag uses the class gold, silver or bronze and contains only the number of medals of that type for that country.
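To illustrate why those attributes make scraping easy, here is a minimal sketch using Python’s standard-library `html.parser`, rather than the Ruby and parsing modules the real scraper used. The sample markup is a hypothetical fragment shaped like the description above, not a copy of the BBC page, and the class name `MedalTableParser` is my own:

```python
from __future__ import annotations

from html.parser import HTMLParser


class MedalTableParser(HTMLParser):
    """Collect one dict per <tr data-country-name=...> row, reading medal
    counts from <td class="gold|silver|bronze"> cells."""

    MEDAL_CLASSES = {"gold", "silver", "bronze"}

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[dict[str, object]] = []
        self._row: dict[str, object] | None = None  # row currently being read
        self._medal: str | None = None  # medal type of the current <td>, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "tr" and "data-country-name" in attributes:
            self._row = {
                "name": attributes["data-country-name"],
                "code": attributes.get("data-country-code"),
            }
        elif tag == "td" and self._row is not None:
            classes = set((attributes.get("class") or "").split())
            found = classes & self.MEDAL_CLASSES
            self._medal = found.pop() if found else None

    def handle_data(self, data):
        text = data.strip()
        if self._row is not None and self._medal and text:
            # Each medal cell contains only a number, so convert it directly.
            self._row[self._medal] = int(text)

    def handle_endtag(self, tag):
        if tag == "td":
            self._medal = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


sample = """
<table>
  <tr data-country-name="Country A" data-country-code="AAA">
    <td class="gold">1</td><td class="silver">0</td><td class="bronze">2</td>
  </tr>
</table>
"""
parser = MedalTableParser()
parser.feed(sample)
parser.close()
print(parser.rows)
# [{'name': 'Country A', 'code': 'AAA', 'gold': 1, 'silver': 0, 'bronze': 2}]
```

Because the parser keys on the semantic attributes rather than on column positions, it would survive cosmetic changes to the table layout, though not a renaming of those attributes.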

Just scraping by

I could have just scraped that data from within the webapp, but I wanted a) to have a bit more robustness if the source page changed format or disappeared; and b) to make the data easily available to others.

So I wrote this London 2012 medal table scraper in ScraperWiki. ScraperWiki lets you write scrapers in Ruby, Python or PHP using their API and some standard parsing modules to scrape data and store it in an SQLite table. The data is then available as JSON via a REST API, and remains so even if the source page vanishes (it just sends you a notification so you can fix your scraper).
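The store-then-serve pattern can be sketched with Python’s built-in `sqlite3` and `json` modules. This is not ScraperWiki’s own API; the table name `medals`, its columns and the two function names are assumptions for illustration. Keying rows on the country code means that re-running the scraper updates existing rows instead of duplicating them:

```python
import json
import sqlite3


def save_rows(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Create the (hypothetical) medals table if needed and upsert the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS medals (
               code   TEXT PRIMARY KEY,
               name   TEXT,
               gold   INTEGER,
               silver INTEGER,
               bronze INTEGER
           )"""
    )
    # INSERT OR REPLACE overwrites any existing row with the same country code,
    # so a fresh scrape refreshes the data rather than appending duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO medals (code, name, gold, silver, bronze) "
        "VALUES (:code, :name, :gold, :silver, :bronze)",
        rows,
    )
    conn.commit()


def medals_as_json(conn: sqlite3.Connection) -> str:
    """Serialise the whole table as a JSON array of objects, as an API might."""
    cursor = conn.execute("SELECT code, name, gold, silver, bronze FROM medals ORDER BY code")
    columns = [description[0] for description in cursor.description]
    return json.dumps([dict(zip(columns, row)) for row in cursor])


conn = sqlite3.connect(":memory:")
save_rows(conn, [{"code": "AAA", "name": "Country A", "gold": 1, "silver": 0, "bronze": 2}])
print(medals_as_json(conn))
```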

Let’s go Camping

I briefly thought about using Ruby on Rails, but that’s a pretty heavy solution to a very small problem, so instead I turned to Camping, a “web framework which consistently stays at less than 4kB of code.”

Camping follows the model–view–controller (MVC) pattern closely, but your whole app can live in a single file, like a simple CGI script.

Putting it all together

So, here’s my alternative Olympic medal table app, and here’s the code on GitHub.

What are the effects? Well, if you sort by total medals, there’s quite a big shake-up. Russia, with 41 medals (only 7 of them gold), shoots up from 6th to 3rd place, pushing Britain down to 4th. North Korea, on the other hand, drops from 8th to 24th.

Using a weighted sum of the medals (with a gold worth 3 points, silver 2 and bronze 1) yields a similar but less dramatic upheaval, with Russia still up and North Korea still down, but GB restored to 3rd place.
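Both alternatives amount to swapping the sort key. A minimal Python sketch, using hypothetical countries and counts rather than the real London 2012 figures; falling back on the usual gold/silver/bronze order to break ties is my own assumption, not necessarily what the app does:

```python
from typing import Callable, NamedTuple


class MedalCount(NamedTuple):
    country: str
    gold: int
    silver: int
    bronze: int


def total_medals(row: MedalCount) -> int:
    return row.gold + row.silver + row.bronze


def weighted_points(row: MedalCount) -> int:
    # 3 points for a gold, 2 for a silver, 1 for a bronze, as described above
    return 3 * row.gold + 2 * row.silver + 1 * row.bronze


def rank(table: list[MedalCount], score: Callable[[MedalCount], int]) -> list[MedalCount]:
    """Highest score first; ties fall back to the usual gold/silver/bronze order."""
    return sorted(
        table,
        key=lambda row: (score(row), row.gold, row.silver, row.bronze),
        reverse=True,
    )


# Hypothetical table whose conventional (gold-first) order is A, B, C.
table = [
    MedalCount("Country A", gold=2, silver=0, bronze=0),
    MedalCount("Country B", gold=1, silver=3, bronze=0),
    MedalCount("Country C", gold=0, silver=1, bronze=5),
]
print([row.country for row in rank(table, total_medals)])
# ['Country C', 'Country B', 'Country A']
print([row.country for row in rank(table, weighted_points)])
# ['Country B', 'Country C', 'Country A']
```

Passing the scoring function in as a parameter makes it cheap to try other schemes: any function from a row to a number will do.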

Can you think of a different way to sort the medals? Stick a feature request on the GitHub tracker, or fork it and have a go yourself.


Open Source #ioe12

This blog post is part of my contribution to the open online course Introduction to Openness in Education.

OK, so the last post was a bit long. Like, essay long. I started writing and kept on writing till I’d got it all out. I’m pretty happy with the content, but it took too long to write and it takes too long to read.

So here’s my pithy(ish) introduction to Open Source.

‘Open’, as you might expect, refers to the free sharing of stuff. The ‘Source’ part refers to source code: the human-readable form in which computer software is written. So we’re talking about software distributed in human-modifiable form, not the compiled, click-to-run executable most people are used to.

There are two key arguments in favour of Open Source: the moral one and the economic one.

The moral argument goes like this. In the beginning only a few dedicated hackers had computers. They put their craft first, worked together well and shared their developments with each other. They were able to learn from and build on each other’s code, and everyone was happy.

As the computer industry grew, the business types who started up companies to exploit new developments realised that they could make money by keeping the source code secret and only releasing the executable code to customers. So they made non-free software the norm and the world a poorer place for it.

But there are many people who feel this is naive and unrealistic. To convince them, you also need the economic argument.

Conventional wisdom has it that if you try to build software with a team that’s too large, you get bogged down in communication between team-members and the whole enterprise becomes unmanageable.

This is fairly accurate for closed-source software: the nature of commercial companies is that everything has to be managed in a certain way and everyone has to be in communication with everyone else.

Mathematicians may recognise this as a complete graph, in which every node is connected to every other node, and the problem is that the number of links grows much faster than the number of people: roughly with the square of the team size.

Open source projects, like Linux, involve huge numbers of people, so on paper they shouldn’t work. But on a large open source project, most people contribute only to a small part of the whole, only communicating with a few others. Only a small number, by dint of personality type or happenstance, coordinate with many others to keep the whole thing together.
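The difference is easy to quantify. With n people all talking to each other there are n(n-1)/2 links; if instead each person talks to only k others, there are about nk/2. A quick Python sketch (the value k = 5 is an arbitrary assumption, and the few highly connected coordinators are ignored):

```python
def complete_graph_links(n: int) -> int:
    """Links when each of n people talks to every other person: n(n-1)/2."""
    return n * (n - 1) // 2


def sparse_links(n: int, contacts_each: int) -> int:
    """Approximate links when each person talks to only `contacts_each` others.

    Every link is shared by two people, hence the division by two. This is a
    simplified model, not a measurement of any real project.
    """
    return n * contacts_each // 2


for n in (10, 100, 1000):
    print(n, complete_graph_links(n), sparse_links(n, 5))
# 10 45 25
# 100 4950 250
# 1000 499500 2500
```

Multiplying the team size by ten multiplies the complete-graph links by roughly a hundred, but the sparse links only by ten.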

And because these projects avoid those communication difficulties, they can capitalise on the much larger group of minds working on a problem.

Thanks to this effect, hobbyist programmers really can build high-quality software, and that’s why open source projects such as Linux and Apache between them dominate the modern web.

But why should we use open source software?

As Cory Doctorow points out in his recent talk “The coming war on general computation”, computers are fully general: there’s no program that they can’t, in theory, run.

That scares a lot of people: it means you can run whatever you like, even software that (shock horror!) makes it possible to break the law. So should governments or corporations be restricting what we can run?

Cars can be used to commit crime, but only a police state would try to restrict where you can drive to, or insist on fitting a LoJack tracker to every one. Open source software is controlled by the community, and so is naturally resistant to this type of centralised control. You may not agree, but I think that’s worth defending.

And as Benjamin Franklin once wrote, “Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety.”


Sharing and flaky butter buns

You know you’ve made it on the web when you’re asked to take something down.

The story

For Christmas I received, amongst other lovely presents, a copy of Dan Lepard’s book The Handmade Loaf. I really enjoy breadmaking, with all of the processes and the minor biological miracle that turns flour and water into a cohesive loaf.

Since then I’ve been trying out at least one, sometimes several, recipes from the book each weekend. Two weeks ago I made flaky butter buns, posting a photo of the result (delicious) on Twitter and Google+.

I was asked for the recipe, but as a) I’m fairly conscientious and b) I’ve been learning a lot about copyright recently, this raised a question: is it a breach of copyright to share someone else’s recipe?

A couple of online conversations and one Guardian article later, I had my answer: recipes are not protected by copyright under either UK or US law. A recipe is an idea, not the expression of an idea, and is therefore not copyrightable.

A recipe may be covered by a patent or trade secret, but for a patent to be granted it would have to differ significantly from any other previous recipe and, having been published in a book, it clearly cannot be a trade secret.

So conscience satisfied, I went ahead and posted the recipe for flaky butter buns on my other blog.

Rather than use Lepard’s own words, which would have infringed copyright, I wrote it in my own style, which tends to skip over steps — you either need to have some baking knowledge or take the hint and buy the book. I also raved (as I have done before) about the book itself, including an Amazon link so that readers could go ahead and buy it for themselves.

I felt in doing so that I behaved appropriately both legally and morally, and thought no more about it.

Several days later, I got an email notifying me of a comment on the post (this rarely happens). As it turns out, this comment was from a member of Lepard’s team accusing me of infringing his copyright.

Now as I’ve said, I don’t believe that I did infringe copyright (if you’re a lawyer, I’d love to hear a legal opinion on this), but since I respect Lepard as a professional and small businessman I chose to respect his wishes (or at least those of his employee) and remove the recipe anyway.

The point

In the end this isn’t a question about the law. It’s about whether sharing (and letting other people share) your stuff is a good idea or not.

Even the most cursory Google search for recipe titles suggests that, should I want to, I could recreate the entire collection for free. But one of the reasons I like this book is that it’s more than a collection of recipes. It’s a well crafted book about bread. In addition to the recipes it contains both photographs (by the author) and descriptions of encounters with bakers around the world.

Yes, you can get most if not all of those recipes for free online and not have to pay a penny, but anyone who’s going to do that was never going to buy the book in the first place. In fact you could get the whole experience of the book for free, just by going down to your local library.

On the other hand, I like to think that a few of my friends might have been motivated to buy their own copy of the book on the basis of my recommendation — word of mouth being the best form of advertising and free to boot.

So now there is no recipe, and no endorsement, and no link to buy the book on Amazon. I’ll probably think twice before recommending the book in the future (wait, didn’t I just do that again three paragraphs ago?).

I’d be kidding myself if I thought this would make the slightest difference to the book’s sales, but you have to wonder: if you have a book to sell, is it worth paying someone to spend time trawling the internet (which is a pretty big place) just to ensure your book is the only place its contents can be found?

People will still send recipes by email, or photocopy them, or pass them on by word of mouth. They will clip them out of the paper, note them down in notebooks and then post the clippings to loved ones.

This has always happened and always will, and though some instances are covered by copyright law, it’s completely unenforceable in such cases.

The internet makes this sharing more visible, but it presents an opportunity too. The classic example is YouTube: increasingly rights owners are taking the option to place ads around potentially infringing videos rather than blindly demand takedowns.

By the way, Martin Weller has made his whole book, The Digital Scholar, available online for free, and some mugs (me included) still seem to be paying for it. Perhaps we’re all just idiots.

All I really want to say is this: if you have a book to sell (or any other creative work), consider carefully the pros and cons of permitting parts to be shared freely.

Policing takes time and time is money, and even if the pros and cons balance out, all you’re doing is spending that money to achieve no net result. Perhaps that time would be better spent engaging with your readers in positive ways.


Open Licensing #ioe12

Image: “A copyright will protect you from PIRATES”

This blog post is part of my contribution to the open online course Introduction to Openness in Education.

At the heart of the various forms of “open” lies the concept of intellectual property: who owns it, who can use it and for what.

A physical object, such as the computer I’m writing this blog post on, is in one place at a time, and its ownership is pretty clear cut: I paid for it and it’s in my house, and if you took it without my permission we’d call that theft.

Things get trickier when you start talking about creative works. If I write a piece of music and you make a copy, I still have the piece of music, but so do you. I can take a photograph of a painting by Degas, and it stays hanging in the gallery, but in some sense I have a copy that I can enjoy independently of the original work.

If this situation goes unchecked, then there’s not a lot of incentive to become an artist, or a composer, or a writer. Even if you charge for your work there’s nothing to stop me buying one copy and then selling hundreds, for which you would see no profit whatsoever.

Under most modern legal systems, the concept of copyright exists to right this imbalance. It does this by allowing the creator of a work the opportunity to exploit that work in whatever way they see fit, effectively creating a monopoly.

As the creator of a work, it’s still possible to grant certain rights to third parties, and this is done by the granting of licenses. This is the mechanism which allows you to “sell” rights to a work in exchange for money or some other consideration.

Fair use/fair dealing

If you were to film an interview in the high street of your town, you might think that it would be difficult to infringe copyright in any way, and if you’re not infringing copyright, you don’t need to pay anyone for a license. Yet if, say, a TV set in the background was showing reruns of The Simpsons, you could well be in for a visit from lawyers representing the Fox Broadcasting Company.

Some jurisdictions include a concept of “fair use” (or fair dealing in the UK), which permits such incidental reuses under a specific set of circumstances. This can make documentary-making, for example, much easier.

However, many organisations (Fox being a common example) are quite happy to threaten legal action and demand that you pay tens or hundreds of thousands of pounds (or dollars, euros, etc.) for a license, even if you may in fact be covered by fair use rules. They are able to do this because most people are unaware of their legal rights or, even if they are aware, do not have the money to fight the ensuing lawsuit.

Even if the law gives you a fair use right to use some work or other, other organisations to which you might sell your own work may not be so forgiving. Because of the litigation culture surrounding copyright, a lot of organisations take a very paranoid approach and insist on rights being cleared and licenses purchased even if they’re not strictly necessary.

Orphaned works

The situation becomes worse when the holder of the rights that must be cleared cannot be found. This usually happens when no contact details can be found for the creator of a work, or when those that can be found are out of date. In many cases, it’s impossible even to know whether the rights holder is still alive, and works like this are referred to as “orphaned works”.

In the early days of copyright this would not have been a problem: for copyright to exist, the creator had to assert their rights explicitly and renew them periodically.

However, in both the US and the UK, copyright now exists automatically for the lifetime of the creator and for 70 years after their death. If the creator has died, their estate still owns the copyright, but the people who now hold it may be impossible to trace until they discover the breach.

For this reason, it is almost impossible to safely use orphaned works — if you do, you do so at your own risk.
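The arithmetic behind that term can be sketched in Python, assuming a simple life-plus-70 rule that, as in both UK and US law, runs to the end of the calendar year. It ignores the many exceptions (older US works among them), and the function names are my own. Note what happens for an orphaned work: without a death year there is nothing to calculate.

```python
from datetime import date
from typing import Optional

TERM_AFTER_DEATH = 70  # years after the creator's death (simplified UK/US rule)


def public_domain_from(death_year: int) -> date:
    """First day a work is out of copyright under a simple life + 70 rule.

    The term runs to the end of the calendar year, so protection lasts until
    31 December of (death_year + 70) and the work is free from the next 1 January.
    """
    return date(death_year + TERM_AFTER_DEATH + 1, 1, 1)


def copyright_status(death_year: Optional[int], today: date) -> str:
    if death_year is None:
        # Orphaned work: we don't know when (or whether) the creator died,
        # so there is no expiry date to compute.
        return "unknown"
    return "public domain" if today >= public_domain_from(death_year) else "in copyright"


print(public_domain_from(1950))                  # 2021-01-01
print(copyright_status(None, date(2012, 1, 1)))  # unknown
```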

Open licensing

As you can see, copyright creates incentives to create, but the way it’s currently implemented can also have a chilling effect on certain types of creation, especially those that involve mashing up existing content.

There’s not a lot most of us can do about the depredations of Fox and their ilk, other than lobbying our MPs for a change in the law. But thankfully we can make it easier for others to make use of our own works.

Open licensing gives creators legal tools to relinquish some or all of their rights over a piece of work, in the interests of supporting the creativity of others.

Creative Commons was set up to provide a set of open licenses which creators can use to make it very easy to understand what can and can’t be done with their work.

The key terms which can be applied by the standard Creative Commons licenses are:

  • Attribution: the creator of the work must be acknowledged in any works which incorporate it;
  • Share-alike: any work derived from this one must be released under the same license;
  • Non-commercial: the work may only be used if the user doesn’t profit financially from doing so;
  • No derivatives: the work may only be redistributed unchanged from its original form.

By combining these terms, it is possible to specify exactly what rights you want to retain on each individual work.
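In the current standard suite, attribution is part of every license, non-commercial is optional, and you can choose at most one of share-alike and no derivatives (share-alike governs derivative works, which no-derivatives forbids). A short Python sketch enumerating the combinations this leaves, using the familiar abbreviations BY, NC, SA and ND:

```python
from itertools import product


def cc_licenses() -> list[str]:
    """Enumerate the standard Creative Commons license codes.

    BY (attribution) is always present, NC (non-commercial) is optional, and
    at most one of SA (share-alike) or ND (no derivatives) may be added.
    """
    codes = []
    for non_commercial, adaptations in product([False, True], [None, "SA", "ND"]):
        parts = ["BY"]
        if non_commercial:
            parts.append("NC")
        if adaptations:
            parts.append(adaptations)
        codes.append("-".join(parts))
    return codes


print(cc_licenses())
# ['BY', 'BY-SA', 'BY-ND', 'BY-NC', 'BY-NC-SA', 'BY-NC-ND']
```

This yields the six familiar licenses, from the most permissive (BY) to the most restrictive (BY-NC-ND).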

In higher education, we often find ourselves needing a photo or video to illustrate a point in a class or at a conference, or increasingly in a blog post (like this one). Thanks to Creative Commons, finding content to be used legally in this way is as easy as doing a simple web search — no more excuses!

Conclusion

This was meant to be a short blog post, and it’s already longer than I intended! There’s a whole raft of other important issues, such as the creeping extension of copyright terms, which I haven’t had space to cover, but hopefully I’ll come back to those some other time.

For now, I hope you’ve got a good idea of why open licensing is necessary and how you can apply it to your own creative works. It’s worth noting that this whole blog is released under a CC license — just scroll to the bottom!

In writing this post, I made heavy use of this open licensing material, which I encourage you to take a look at if you want to learn more.

Photo credit: Ioan Sameli via Flickr


The Research Technologist part 2: research focus

This is the second part in my exploration of what it means to be a research technologist. If you haven’t already, check out part 1: proactivity and innovation.

Research focus

There’s another area where the role diverges from that of the typical member of IT staff: a focus on the unique needs of researchers. Network infrastructure, file storage and email are necessary but not sufficient to meet the needs of a modern researcher.

It’s vitally important to pay close attention to those needs and to find appropriate tools and techniques, adapting them to serve researchers as well as possible. Research is, after all, the primary business of a university, alongside teaching.

So we need to find ways to fulfil the needs not just of an institution’s researchers, but of a faculty’s researchers, or a department’s or even a single research group’s.

I actually think that once we start doing this well, there will be a lot more commonality than there appears to be right now. But first we’ve got to get there.

Serving the long tail

The much-abused Pareto Principle holds that in many circumstances 80% of your profit comes from 20% of the people/products/whatever. But we’re not looking to profit from our users; we’re looking to serve them. Questions of how to fund that notwithstanding, taking this attitude means you’re ignoring 80% of the people!

If there’s one thing we’ve learned from successes like eBay, Amazon and many more, it’s that if we’re smart we can use modern technology to efficiently provide large numbers of niche products and services without drowning in the overhead traditionally associated with trying to do so.

Research attitude

Again, this can be a problem for centralised IT services, because it’s seen as inefficient for them to put significant R&D time into things which may only ever be of use to a minority of their users.

In an academic department, however, the culture is different. Success in research demands innovation, which requires risk. Scientists and engineers, for example, intrinsically understand the need to experiment, and no-one questions the idea that many of those experiments will fail.

Notice that word fail. In this context failure is not a loss, it’s merely a failure to produce the anticipated results. Most researchers still don’t like failure — they’re human after all. But they learn not to get so hung up on it, because if you set up your experiment right (which is really the key to the whole enterprise) then you learn as much or more from failing as you do from succeeding.

And that’s really the point. We want to help our researchers to do their jobs even better than they already do, which means we need to learn, which in turn means we need to make mistakes. There are no lectures or degree courses to teach us about ideas which don’t exist yet.

So to steal one of those trite little phrases life coaches and the like love so much: fail early, fail often, fail smart and learn from it.


My first MOOC: Introduction to Open Education

I’ve decided to sign up and join David Wiley’s MOOC, Introduction to Open Education 2012. A MOOC (Massive Open Online Course) is an online course, typically run by a lecturer at a university, which is freely accessible and built around the ideas of connectivism and social learning.

The content of the course, which is about the various ‘kinds’ of openness currently practised in higher education, fits nicely with what I’m doing at the moment so I thought I’d give it a try.

Although I could theoretically find, study and blog about all of the content in this course on my own, I think that the social aspect and the defined set of objectives (in the form of “badges”) combined make it more likely that I will follow through.

Let’s see if that’s actually true…


Amazon Kindle — 12 months on

I’ve now had my Kindle for just over 12 months (it was last year’s Christmas gift from my wonderful wife), and I can quite honestly say that it’s completely changed the way I read.

I’ve always been a keen reader, but I sometimes found it difficult to have a book with me when I had time to read. I also tended to buy books only one at a time, when I was in a physical bookshop. As a consequence, most of my reading happened at home, either in bed or in the bath, and I would get through books at around one a month.

Since getting my Kindle (well, since first getting the Kindle app for iPhone 14 months ago) I have read 45 books. I never used to read non-fiction, but I have just finished my third non-fiction book in the last few months. I used to wait until I’d finished one book before deciding what to read next, but now I have 14 books waiting to be read and about another 20 on an Amazon wishlist waiting to be purchased.

What’s caused this change? As you might guess, it’s a combination of several things. Compared to a paper book, my Kindle weighs almost nothing, so I can slip it in a bag or a pocket. I can hold it in one hand while drinking tea, or lie on my back and read, both of which I found too tiring to do with paper books.

I also have iPhone and desktop Kindle apps, which are always in sync. I always have my current book with me, so I have many more opportunities to read.

When I finish a book, I can immediately start the next, whether I have one already lined up or I need to go online and buy one. I’ve basically turned into a chain-reader, going from book to book without pause.

Irritatingly, the prices do not reflect the near-zero marginal cost of distributing digital content — if you shift content in the volume that Amazon can, your income is almost pure profit.

However, digital books are still cheaper than the print editions. The difference for popular fiction is pretty small, but I appreciate it nonetheless. For specialist non-fiction, on the other hand, where low volumes make print copies prohibitively expensive, digital editions come at a significant discount — often half price or better in my experience.

I actually wrote the entire first draft of this post without mentioning either screen quality or battery life. Both are so good that it didn’t even occur to me to mention them.

There are downsides too. Because I’m locked into Amazon’s infrastructure, I can’t lend books to friends or family (this feature still hasn’t been enabled outside the US). I also can’t donate books to charity shops once I’ve finished them.

Both of these facts still make me uneasy, and I’m not sure that I want all my books to be controlled by a single company for the rest of my life. And I haven’t even started on the problem of how many books I need to read on Kindle to break even on the carbon footprint, or even whether that’s possible.

That said, my pragmatic side is winning at the moment. Reading on Kindle just works, and it seems to suit my lifestyle much better than books made of dead tree.

I know a lot of people have been given Kindles this Christmas, so I’d love to know if any of my readers have thoughts on this.


The Research Technologist part 1: proactivity and innovation

I began writing this a couple of months ago, shortly after ALT-C, the Association for Learning Technology’s annual conference. Then it turned into “one of those posts” that I had to perfect before I could publish it. And that’s silly, so I’m going to publish it now and continue it in further posts, because this is a blog, not a thesis.

Anyway, as is often the case at conferences when you meet a lot of people, I kept having to answer the question “What do you do?”. My actual job title is “ICT Project Manager”, which, while impressive-sounding, goes no way towards explaining what I do. In the end, I came up with the following stock response:

“I’m a research technologist: I have a very similar role to learning technologists, except that I support academics as researchers instead of as teachers.”

There are a few roles out there which sound similar or have similar names, so I thought I’d describe a few things that set this role apart from them. This post is the first part of a series exploring those aspects.

First, a disclaimer

I’m not foolish enough to think I’m the only person doing this type of job, or to pick out these features as important, or even to come up with that name. I’m quite certain there are people doing this in IT departments, in research development departments, certainly in academic departments and quite possibly in e-learning departments too. It’s more that there seems to be no standard position for this role (except where institutions have dedicated e-research teams) and I’m setting out to find other people in similar roles to share ideas with.

Proactivity and innovation

Although part of my role is to support existing systems and respond to queries from users, that’s not the whole of it. I feel it’s important to keep abreast of the latest technology innovations and explore how they can be used to support research. This contrasts with the typical approach of central university IT services, which generally have a core set of “supported” software and services with rigorous procedures and checks in place to control changes to that set.

I don’t wish to suggest that this centralised model is inappropriate: on the contrary it’s absolutely necessary. University IT services have the very challenging job of providing an acceptable and consistent standard of service to a huge and diverse user base. To do this efficiently it’s necessary to make sure that all IT staff have a reasonable understanding of every supported service, which just can’t happen if that set of services is too large.

The trouble is that as well as providing users with a very stable, high level of support for essential services (networking, email, payroll and so on), it also tends to stifle innovation. If a new service is to be offered, a lot of time and resources must be invested in doing so at the level of existing services; quite a risk if there’s no guarantee that the new service will succeed. That means that there’s no scope to start something small, with the option of either growing it organically if it takes off or letting it die peacefully if it’s not right.


I’ll be exploring this further soon, but for now I’d be interested in your take, especially if you disagree or recognise some of what I say in your own role.


Nose to the blogstone

Well, I’m just back from the launch meeting of the JISC Managing Research Data programme, of which our Research360 project at Bath is a part, and coming to terms with the fact that blogging is now an inescapable part of my job.

Looks like it’s time to get back into my blogging rhythm once more. Time to make a few tweaks that I’ve been planning to the layout too. Let me know what you think.
