eRambler

Jez Cope's blog on becoming a research technologist

#cfhe12: another attempt at MOOCing

It’s October, which means the autumn TV season has started, which means that Strictly Come Dancing is back on for another year, which means it’s time for a flurry of blog posts as I leave my wonderful other half to shout at the TV on weekend evenings.

I’ve decided to have another go at joining in with another MOOC to give me some blog fuel, and this time round it’s Current & Future State of Higher Education 2012.

My last MOOC attempt, IOE12, sort of fizzled out (my participation, not the course itself) as I didn’t really have the time to keep it going. Hopefully I’ll do better this time, but if not I’m sure I’ll learn something anyway.

So, hello fellow MOOCers and watch this space!

#altc2012 Part 2: Apps & networks

It’s been a little while since ALT-C 2012 now, so I thought I’d better write up the rest of my notes. Here’s day 2 in all its glory.

My day started off with James Clay’s workshop entitled “A few of my favourite things” — just an opportunity for gadget lovers to share some of their favourite apps (mostly iPad/iPhone, but a few Androids in there too).

There were a lot of popular apps in there, like the ever-present Evernote and Instagram, but there were a few interesting ones I hadn’t come across, or was able to see in a new light:

JotNot
Lets you take a photo of a page and semi-automatically straightens it and enhances it so you get a flat, high-contrast version — a scanner in your pocket. Looks like this is abandonware, but instead I discovered Genius Scan, which has many more features.
TunePal
One for lovers of traditional music: search for information on, and the dots (sheet music) for, a traditional tune by playing a bit of it into your phone.

Next followed an interesting session introducing some tools from projects on the JISC Digital Literacies programme. I particularly liked the digital literacies lens on the SCONUL Seven Pillars of Information Literacy. There’s a lot of (perhaps true but not very helpful) talk going round at the moment about “everyone having a different definition of digital literacy”, so it’s good to see a fairly concise high-level view of what we’re actually talking about on that subject.

As a recovering mathematician, I found Natasa Milic-Frayling’s keynote on network analysis fascinating. Her team at Microsoft Research have developed an Excel plugin, NodeXL, for analysing networks (and obtaining data from social networks to analyse).

She described some interesting work analysing voting patterns of US senators, and correlating connections in social networks with geographic distribution.

Students introduced to NodeXL were able to get straight into playing with network data, and quickly took on board the basic concepts (various ideas of the importance of a network node) without needing to grasp the underlying maths (such as the various equations for centrality).
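To give a flavour of the maths being hidden, here is a minimal Python sketch of the simplest of those measures, degree centrality (the fraction of the other nodes that a node is directly connected to). This is not how NodeXL does it; the function name and the toy network are my own made-up illustrations.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected graph.

    `edges` is an iterable of (a, b) pairs; n is the number of distinct nodes.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical toy network: "hub" is linked to everyone, the rest only to "hub".
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
centrality = degree_centrality(edges)
```

In this toy network the hub scores the maximum possible value, while each of the other nodes scores a third of that.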

My last session of the day was from Clive Young of University College London, talking about “blended” roles in e-learning. These are typically those people who provide general admin support to lecturers, and are increasingly being expected to manage VLE modules and other online elements of courses on behalf of the lecturers.

At UCL, these teaching administrators with blended roles had self-organised into a support network, as they were getting no targeted support on how to use Moodle from the e-learning team. This was, of course, rectified, and in the end 10% of the staff identified in blended roles went on to achieve CMALT status.

All interesting stuff, and I’ll be back to post my thoughts on day 3 soon.

#altc2012 Part 1: Bring on the data!

So today was day 1 of ALT-C 2012. Here are a few thoughts from the day.

The conference kicked off with an inspiring keynote from Eric Mazur. Eric is a physicist at Harvard, and when he’s not doing photonics research, he brings the scientific method to bear on his teaching practice.

He gave three examples that were interesting in their own right, but the key takeaway message was this: data is essential to improving teaching practice. Rather than coming up with anecdotes that go “well, my students seem to like it when I blah blah blah”, why not set up a simple experiment to actually test what helps those students learn?

After lunch, Cathy and I did a workshop on using research data for teaching, as part of the Research360 project. I won’t go into too much detail (it did what it said on the tin), other than to say that I felt like it went pretty well — all the attendees got into the exercises and some really productive discussions took place.

Take a look at the session page to see the slides and exercises.

After that, I saw a couple of demonstrations of some cool stuff (NoobLab, curatr), and caught up with a few of the JISC digital literacy projects.

So far, then, another interesting conference. The catering’s been pretty good too. A lot of carbohydrate, though: lunch was served with cous cous, chips and boiled potatoes (and bread rolls if you wanted) and dinner was equally carbalicious. Perhaps it’ll help me run faster in the morning.

It’s late. I’m wittering. Bye for now!

Oxford Open Science meeting

On Wednesday 22 August 2012, I gave an invited presentation at the August meeting of Oxford Open Science, hosted at the Oxford e-Research Centre. The theme of the evening was “How do we prepare postgraduate research students for the era of big data?”

There were some interesting presentations around that subject:

  • Juliet Ralph and Oliver Bridle from the Bodleian discussed information seeking behaviour amongst students;
  • Open Knowledge Foundation’s Laura Newman told us about the School of Data, a project to produce learning resources for those working with data;
  • Anna Collins from DSpace Cambridge talked about “long tail in the shadow of big data”.

My own presentation discussed some of the work I’ve done providing social media and data management training for PGRs, and the slides can be viewed here:

As an experiment, the LaTeX source of the slides is also available on github. Let me know if they’re any use.

Scraping for gold at the Olympics

What if it wasn’t all about the gold medals? The Olympic medal table is always ranked in order of gold medals first, then silver, then bronze.

That seems reasonable, but if you looked at the table at the end of 6 August, for example, you’d have seen that Germany had an impressive 22 medals, including 5 golds, but ranked one place behind Kazakhstan, who had only 7 medals, 6 of which were gold.

So I thought it was time to do a few things I’ve wanted to try for a while: scrape some publicly available data, do something interesting with it, and write and deploy a Ruby webapp beyond my desktop.

Finding the data

It just so happens that the BBC’s medal table is marked up with some nice semantic attributes:

  • Each <tr> tag has two attributes: data-country-name and data-country-code;
  • Each <td> tag uses the class gold, silver or bronze and contains only the number of medals of that type for that country.
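To show how little work that markup leaves you, here is a rough sketch in Python using only the standard library's html.parser. The real scraper isn't written this way; the class name MedalTableParser and the sample HTML are invented for illustration, and the silver/bronze split for Germany is made up (only the 5 golds and the total of 22 come from the real table).

```python
from html.parser import HTMLParser


class MedalTableParser(HTMLParser):
    """Collect one dict per <tr> that carries the data-country-* attributes."""

    MEDALS = ("gold", "silver", "bronze")

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # the country row currently being read, if any
        self._medal = None  # which medal cell we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and "data-country-code" in attrs:
            self._row = {
                "code": attrs["data-country-code"],
                "name": attrs.get("data-country-name"),
            }
        elif tag == "td" and self._row is not None:
            classes = (attrs.get("class") or "").split()
            self._medal = next((m for m in self.MEDALS if m in classes), None)

    def handle_data(self, data):
        text = data.strip()
        if self._row is not None and self._medal and text:
            self._row[self._medal] = int(text)

    def handle_endtag(self, tag):
        if tag == "td":
            self._medal = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented sample in the shape described above (medal split is illustrative).
SAMPLE = """
<table>
  <tr><th>Country</th><th>G</th><th>S</th><th>B</th></tr>
  <tr data-country-name="Germany" data-country-code="GER">
    <td class="gold">5</td><td class="silver">10</td><td class="bronze">7</td>
  </tr>
</table>
"""

parser = MedalTableParser()
parser.feed(SAMPLE)
rows = parser.rows
```

Header rows and any cells without a medal class are simply ignored, so the parser only depends on the attributes listed above.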

Just scraping by

I could have just scraped that data from within the webapp, but I wanted a) to have a bit more robustness if the source page changed format or disappeared; and b) to make the data easily available to others.

So I wrote this London 2012 medal table scraper in ScraperWiki. ScraperWiki lets you write scrapers in Ruby, Python or PHP using their API and some standard parsing modules to scrape data and store it in an SQLite table. The data is then available as JSON via a REST API, and remains so even if the source page vanishes (it just sends you a notification so you can fix your scraper).
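The store-then-serve idea can be sketched without ScraperWiki at all. The Python below is a stand-in using the standard sqlite3 and json modules, not ScraperWiki's own API; the table name, column names and the two helper functions are my assumptions, and Kazakhstan's silver/bronze split is made up for the example.

```python
import json
import sqlite3


def save_rows(conn, rows):
    """Insert or update one record per country, keyed on the country code."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS medals ("
        "code TEXT PRIMARY KEY, name TEXT, "
        "gold INTEGER, silver INTEGER, bronze INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO medals "
        "VALUES (:code, :name, :gold, :silver, :bronze)",
        rows,
    )
    conn.commit()


def rows_as_json(conn):
    """Return the stored table as a JSON array of objects."""
    cur = conn.execute(
        "SELECT code, name, gold, silver, bronze FROM medals ORDER BY code"
    )
    columns = [col[0] for col in cur.description]
    return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()])


conn = sqlite3.connect(":memory:")
# Illustrative row: 7 medals of which 6 gold; the remaining split is assumed.
save_rows(conn, [{"code": "KAZ", "name": "Kazakhstan",
                  "gold": 6, "silver": 0, "bronze": 1}])
payload = rows_as_json(conn)
```

Because the data lives in its own table, the last good copy stays available even when a later scrape fails.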

Let’s go Camping

I briefly thought about using Ruby on Rails, but that’s a pretty heavy solution to a very small problem, so instead I turned to Camping, a “web framework which consistently stays at less than 4kB of code.”

Camping is very MVC-based, but your whole app can live in a single file, like a simple CGI script.

Putting it all together

So, here’s my alternative Olympic medal table app, and here’s the code on GitHub.

What are the effects? Well, if you sort by total medals, there’s quite a big shake-up. Russia with 41 medals (only 7 gold) shoot up from 6th to 3rd place, pushing Britain down to 4th. North Korea, on the other hand, drop down from 8th to 24th.

Using a weighted sum of the medals (with a gold worth 3 points, silver 2 and bronze 1) yields a similar but less dramatic upheaval, with Russia still up and North Korea still down, but GB restored to 3rd place.
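For the curious, the three orderings boil down to three sort keys. This is a Python sketch rather than the app's actual Ruby code; the function names are mine, and the silver/bronze splits in the sample rows are invented (only the gold counts and totals for Germany and Kazakhstan come from the 6 August table).

```python
def gold_first(row):
    """Official ordering: golds, then silvers, then bronzes."""
    return (row["gold"], row["silver"], row["bronze"])


def total(row):
    """Every medal counts the same."""
    return row["gold"] + row["silver"] + row["bronze"]


def weighted(row, weights=(3, 2, 1)):
    """Gold worth 3 points, silver 2 and bronze 1 by default."""
    g, s, b = weights
    return g * row["gold"] + s * row["silver"] + b * row["bronze"]


def rank(rows, key):
    """Country names ordered from best to worst under the given key."""
    return [r["name"] for r in sorted(rows, key=key, reverse=True)]


# Illustrative rows: the silver/bronze splits are assumptions.
table = [
    {"name": "Germany", "gold": 5, "silver": 10, "bronze": 7},
    {"name": "Kazakhstan", "gold": 6, "silver": 0, "bronze": 1},
]

by_gold = rank(table, gold_first)
by_total = rank(table, total)
by_points = rank(table, weighted)
```

With these sample rows Kazakhstan comes first under the gold-first ordering, while Germany comes first under both of the alternatives.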

Can you think of a different way to sort the medals? Stick a feature request on the GitHub tracker, or fork it and have a go yourself.

Open Source #ioe12

This blog post is part of my contribution to the open online course Introduction to Openness in Education.

Ok, so the last post was a bit long. Like essay long. I started writing and then I kept on writing til I’d got it all out. I’m pretty happy with the content, but it took too long to write and it takes too long to read.

So here’s my pithy(ish) introduction to Open Source.

‘Open’, as you might expect, refers to the free sharing of stuff. The ‘Source’ part refers to source code: the human-readable form in which computer software is written. So we’re talking about software distributed in human-modifiable form, not the compiled, click-to-run executable most people are used to.

There are two key arguments in favour of Open Source: the moral one and the economic one.

The moral argument goes like this. In the beginning only a few dedicated hackers had computers. They put their craft first, worked together well and shared their developments with each other. They were able to learn from and build on each other’s code, and everyone was happy.

As the computer industry grew, the business types who started up companies to exploit new developments realised that they could make money by keeping the source code secret and only releasing the executable code to customers. So they made non-free software the norm and the world a poorer place for it.

But there are many people who feel this is naive and unrealistic. To convince them, you also need the economic argument.

Conventional wisdom has it that if you try to build software with a team that’s too large, you get bogged down in communication between team-members and the whole enterprise becomes unmanageable.

This is fairly accurate for closed-source software: the nature of commercial companies is that everything has to be managed in a certain way and everyone has to be in communication with everyone else.

Mathematicians may recognise this as a complete graph (in which every node is connected to every other node), and the problem is that the number of links grows much more quickly than the number of people.
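To put numbers on that: a complete graph on n nodes has n(n-1)/2 links, so the count grows roughly with the square of the team size. A quick Python sketch (the function name is just mine):

```python
def complete_graph_links(people):
    """Number of pairwise links when everyone talks to everyone else."""
    return people * (people - 1) // 2


small_team = complete_graph_links(5)     # 10 links
department = complete_graph_links(50)    # 1225 links
big_project = complete_graph_links(500)  # 124750 links
```

Multiplying the team size by ten multiplies the number of links by roughly a hundred.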

Open source projects, like Linux, involve huge numbers of people, so on paper they shouldn’t work. But on a large open source project, most people contribute only to a small part of the whole, only communicating with a few others. Only a small number, by dint of personality type or happenstance, coordinate with many others to keep the whole thing together.

And because these projects don’t suffer from the communication difficulties, they can capitalise on the much larger group of minds working on a problem.

Thanks to this effect, hobbyist programmers really can build high quality software, and that’s why open source projects like Linux and Apache dominate the modern web between them.

But why should we use open source software?

As Cory Doctorow points out in his recent talk “The coming war on general computation”, computers are fully general: there’s no program that they can’t in theory run.

That scares a lot of people: it means you can run whatever you like, even software that (shock horror!) makes it possible to break the law. So should governments or corporations be restricting what we can run?

Cars can be used to commit crime, but only a police state would try to restrict where you can drive to, or insist on LoJacking each one. Open source software is controlled by the community, and so is naturally resistant to this type of centralised control. You may not agree, but I think that’s worth defending.

And as Benjamin Franklin once wrote, “Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety.”

Sharing and flaky butter buns

You know you’ve made it on the web when you’re asked to take something down.

The story

For Christmas I received, amongst other lovely presents, a copy of Dan Lepard’s book The Handmade Loaf. I really enjoy breadmaking, with all of the processes and the minor biological miracle that turns flour and water into a cohesive loaf.

Since then I’ve been trying out at least one, sometimes several, recipes from the book each weekend. Two weeks ago I made flaky butter buns, posting a photo of the result (delicious) on Twitter and Google+.

I was asked for the recipe, but as a) I’m fairly conscientious and b) I’ve been learning a lot about copyright recently, this raised a question: is it a breach of copyright to share someone else’s recipe?

A couple of online conversations and one Guardian article later, I had my answer: recipes are not protected by copyright under either UK or US law. A recipe is an idea, not the expression of an idea, and is therefore not copyrightable.

A recipe may be covered by a patent or trade secret, but for a patent to be granted it would have to differ significantly from any other previous recipe and, having been published in a book, it clearly cannot be a trade secret.

So conscience satisfied, I went ahead and posted the recipe for flaky butter buns on my other blog.

Rather than use Lepard’s own words, which would have infringed copyright, I wrote it in my own style, which tends to skip over steps — you either need to have some baking knowledge or take the hint and buy the book. I also raved (as I have done before) about the book itself, including an Amazon link so that readers could go ahead and buy it for themselves.

I felt in doing so that I behaved appropriately both legally and morally, and thought no more about it.

Several days later, I got an email notifying me of a comment on the post (this rarely happens). As it turns out, this comment was from a member of Lepard’s team accusing me of infringing his copyright.

Now as I’ve said, I don’t believe that I did infringe copyright (if you’re a lawyer, I’d love to hear a legal opinion on this), but since I respect Lepard as a professional and small businessman I chose to respect his wishes (or at least those of his employee) and remove the recipe anyway.

The point

In the end this isn’t a question about the law. It’s about whether sharing (and letting other people share) your stuff is a good idea or not.

Even the most cursory Google search for recipe titles suggests that, should I want to, I could recreate the entire collection for free. But one of the reasons I like this book is that it’s more than a collection of recipes. It’s a well crafted book about bread. In addition to the recipes it contains both photographs (by the author) and descriptions of encounters with bakers around the world.

Yes, you can get most if not all of those recipes for free online and not have to pay a penny, but anyone who’s going to do that was never going to buy the book in the first place. In fact you could get the whole experience of the book for free, just by going down to your local library.

On the other hand, I like to think that a few of my friends might have been motivated to buy their own copy of the book on the basis of my recommendation — word of mouth being the best form of advertising and free to boot.

So now there is no recipe, and no endorsement, and no link to buy the book on Amazon. I’ll probably think twice before recommending the book in the future (wait, didn’t I just do that again three paragraphs ago?).

I’d be kidding myself if I thought this would make the slightest difference to the book’s sales, but you have to wonder: if you have a book to sell, is it worth paying someone to spend time trawling the internet (which is a pretty big place) just to ensure your book is the only place the contents can be found?

People will still send recipes by email, or photocopy them, or pass them on by word of mouth. They will clip them out of the paper, note them down in notebooks and then post the clippings to loved ones.

This has always happened and always will, and though some instances are covered by copyright law, it’s completely unenforceable in such cases.

The internet makes this sharing more visible, but it presents an opportunity too. The classic example is YouTube: increasingly rights owners are taking the option to place ads around potentially infringing videos rather than blindly demand takedowns.

By the way, Martin Weller’s made his whole book, The Digital Scholar, available online for free, and some mugs (me included) still seem to be paying for it. Perhaps we’re all just idiots.

All I really want to say is this: if you have a book to sell (or any other creative work), consider carefully the pros and cons of permitting parts to be shared freely.

Policing takes time and time is money, and even if the pros and cons balance out all you’re doing is spending that money to achieve zero result. Perhaps that time would be better spent engaging with your readers in positive ways.

Open Licensing #ioe12

[Image: A copyright will protect you from PIRATES]

This blog post is part of my contribution to the open online course Introduction to Openness in Education.

At the heart of the various forms of “open” lies the concept of intellectual property: who owns it, who can use it and for what.

A physical object, such as the computer I’m writing this blog post on, is in one place at a time, and its ownership is pretty clear cut: I paid for it and it’s in my house, and if you took it without my permission we’d call that theft.

Things get trickier when you start talking about creative works. If I write a piece of music and you make a copy, I still have the piece of music, but so do you. I can take a photograph of a painting by Degas, and it stays hanging in the gallery, but in some sense I have a copy that I can enjoy independently of the original work.

If this situation goes unchecked, then there’s not a lot of incentive to become an artist, or a composer, or a writer. Even if you charge for your work there’s nothing to stop me buying one copy and then selling hundreds, for which you would see no profit whatsoever.

Under most modern legal systems, the concept of copyright exists to right this imbalance. It does this by allowing the creator of a work the opportunity to exploit that work in whatever way they see fit, effectively creating a monopoly.

As the creator of a work, it’s still possible to grant certain rights to third parties, and this is done by the granting of licenses. This is the mechanism which allows you to “sell” rights to a work in exchange for money or some other consideration.

Fair use/fair dealing

If you were to film an interview in the high street of your town, you might think that it would be difficult to infringe copyright in any way. If you’re not infringing copyright, you don’t need to pay anyone for a license. Yet if, say, a TV set in the background was showing reruns of The Simpsons, then you could well be in for a visit from lawyers representing the Fox Broadcasting Company.

Some jurisdictions include a concept of “fair use” (or fair dealing in the UK), which permits such incidental reuses under a specific set of circumstances. This can make documentary-making, for example, much easier.

However, many organisations (Fox being a common example) are quite happy to threaten legal action and demand that you pay tens or hundreds of thousands of pounds(/dollars/euros/etc.) for a license, even if you may in fact be covered by fair use rules. They are able to do this because most people are unaware of their legal rights or, even if they are aware, do not have the money to fight the ensuing lawsuit.

Even if the law gives you a fair use right to use some work or other, other organisations to which you might sell your own work may not be so forgiving. Because of the litigation culture surrounding copyright, a lot of organisations take a very paranoid approach and insist on rights being cleared and licenses purchased even if they’re not strictly necessary.

Orphaned works

The situation becomes worse when the holder of the rights that must be cleared cannot be found. This usually happens when no contact details can be found for the creator of a work, or when those that can be found are out of date. In many cases, it’s impossible even to know whether the rights holder is still alive, and works like this are referred to as “orphaned works”.

In the early days of copyright this would not have been a problem: for copyright to exist it was necessary for the creator to explicitly assert their rights, and to renew them periodically.

However it is now the case in the US and the UK that copyright automatically exists for the lifetime of the creator and for 70 years after their death. If the creator has passed away, their estate still owns the copyright, but may be impossible to trace until they discover the breach.

For this reason, it is almost impossible to safely use orphaned works — if you do, you do so at your own risk.

Open licensing

As you can see, copyright creates incentives to create, but the way it’s currently implemented can also have a chilling effect on certain types of creation, especially those that involve mashing up existing content.

There’s not a lot most of us can do about the depredations of Fox and their ilk, other than lobbying our MPs for a change in the law. But thankfully we can make it easier for others to make use of our own works.

Open licensing gives creators legal tools to relinquish some or all of their rights over a piece of work, in the interests of supporting the creativity of others.

Creative Commons was set up to provide a set of open licenses which creators can use to make it very easy to understand what can and can’t be done with their work.

The key terms which can be applied by the standard Creative Commons licenses are:

  • Attribution: the creator of the work must be acknowledged in any works which incorporate it;
  • Share-alike: the work can only be used if the resulting work is released under the same license;
  • Non-commercial: the work may only be used if the user doesn’t profit financially from doing so;
  • No derivatives: the work may only be redistributed unchanged from its original form.

By combining these terms, it is possible to specify exactly what rights you want to retain on each individual work.
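As a rough illustration of how the terms combine, here is a small Python sketch that builds the short code for a license from the options you choose. The function is my own invention, not an official tool. It assumes that attribution is always included (as it is in the current standard licenses), and it rejects share-alike together with no-derivatives, since a work that can't be adapted produces nothing to share alike.

```python
def cc_license_code(share_alike=False, non_commercial=False, no_derivatives=False):
    """Return a short code such as 'CC BY-NC-SA' for the chosen terms."""
    if share_alike and no_derivatives:
        raise ValueError("share-alike and no-derivatives cannot be combined")
    parts = ["BY"]  # assumption: attribution is part of every standard license
    if non_commercial:
        parts.append("NC")
    if share_alike:
        parts.append("SA")
    if no_derivatives:
        parts.append("ND")
    return "CC " + "-".join(parts)


most_open = cc_license_code()
copyleft = cc_license_code(share_alike=True)
most_restrictive = cc_license_code(non_commercial=True, no_derivatives=True)
```

Working through the valid combinations gives six licenses in all, from plain attribution up to attribution plus non-commercial plus no-derivatives.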

In higher education, we often find ourselves needing a photo or video to illustrate a point in a class or at a conference, or increasingly in a blog post (like this one). Thanks to Creative Commons, finding content to be used legally in this way is as easy as doing a simple web search — no more excuses!

Conclusion

This was intended to be a short blog post, and it’s already longer than I intended! There is a whole raft of other important issues, such as the creeping extension of copyright terms, which I haven’t had space to cover, but hopefully I’ll come back to those some other time.

For now, I hope you’ve got a good idea of why open licensing is necessary and how you can apply it to your own creative works. It’s worth noting that this whole blog is released under a CC license — just scroll to the bottom!

In writing this post, I made heavy use of this open licensing material, which I encourage you to take a look at if you want to learn more.

Photo credit: Ioan Sameli via Flickr

The Research Technologist part 2: research focus

This is the second part in my exploration of what it means to be a research technologist. If you haven’t already, check out part 1: proactivity and innovation.

Research focus

There’s another area where the role diverges from the typical member of IT staff: a focus on the unique needs of researchers. Network infrastructure, file storage, email are necessary but not sufficient to meet the needs of a modern researcher.

It’s vitally important to pay close attention to the unique needs of researchers and to find and adapt appropriate tools and techniques to serve those needs as well as possible. Research is after all the primary business of a university, alongside teaching.

So we need to find ways to fulfil the needs not just of an institution’s researchers, but of a faculty’s researchers, or a department’s or even a single research group’s.

I actually think that once we start doing this well, there will be a lot more commonality than there appears to be right now. But first we’ve got to get there.

Serving the long tail

The much abused Pareto Principle holds that in many circumstances 80% of your profit comes from 20% of the people/products/whatever. But we’re not looking to profit from our users, we’re looking to serve them. Questions of how to fund that notwithstanding, taking this attitude means you’re ignoring 80% of the people!

If there’s one thing we’ve learned from successes like eBay, Amazon and many more, it’s that if we’re smart we can use modern technology to efficiently provide large numbers of niche products and services without drowning in the overhead traditionally associated with trying to do so.

Research attitude

Again, this can be a problem for centralised IT services, because it’s seen as inefficient for them to put significant R&D time into things which may only ever be of use to a minority of their users.

In an academic department, however, the culture is different. Success in research demands innovation, which requires risk. Scientists and engineers, for example, intrinsically understand the need to experiment, and no-one questions the idea that many of those experiments will fail.

Notice that word fail. In this context failure is not a loss, it’s merely a failure to produce the anticipated results. Most researchers still don’t like failure — they’re human after all. But they learn not to get so hung up on it, because if you set up your experiment right (which is really the key to the whole enterprise) then you learn as much or more from failing as you do from succeeding.

And that’s really the point. We want to help our researchers to do their jobs even better than they already do, which means we need to learn, which in turn means we need to make mistakes. There are no lectures and degree courses to teach us about ideas which don’t exist yet.

So to steal one of those trite little phrases life coaches and the like love so much: fail early, fail often, fail smart and learn from it.
