If I’m going to train an algorithm to read my weird & awful writing, I’m going to need a decent-sized training set to work with. And since one of the main things I want to do with it is to blog “by hand”, it makes sense to focus on that type of material for training. In other words, I need to write out a bunch of blog posts on paper, scan them, and transcribe them as ground truth. The added bonus of this plan is that after transcribing, I also end up with some digital text I can use as an actual post: multitasking!
So, by the time you read this, I will have already run this post through a manual transcription process using Transkribus to add it to my training set, and copy-pasted it into Emacs for posting. This is a fun little project because it means I can:
- Write more by hand with one of my several nice fountain pens, which I enjoy
- Learn more about the operational process some of my colleagues go through when digitising manuscripts
- Learn more about the underlying technology & maths, and how to tune the process
- Produce more lovely content! For you to read! Yay!
- Write in a way that forces me to put off editing until after a first draft is done and focus more on getting the whole of what I want to say down
That’s it for now — I’ll keep you posted as the project unfolds.
Tee hee! I’m actually just enjoying the process of writing stuff by hand in long-form prose. It’ll be interesting to see how the accuracy turns out and whether I need to be more careful about neatness. Will it be better or worse than the big but generic models used by Samsung Notes or OneNote? Maybe I should include some stylus-written text for comparison.
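For the curious, here is a rough sketch of how that accuracy could be measured. It assumes a character error rate (CER): the edit distance between my manual transcription and whatever the model produces, divided by the length of the manual transcription. The function names and the sample strings are made up for illustration, and nothing here is tied to how any particular tool calculates its own figures.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, recognised: str) -> float:
    """Edit distance normalised by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ground_truth, recognised) / len(ground_truth)


# Hypothetical example: one wrong character out of twelve.
truth = "fountain pen"
guess = "fountain pan"
print(f"CER: {character_error_rate(truth, guess):.1%}")
```

Running the same function over the output of each recogniser, against the same hand-made transcription, would give a like-for-like number for the comparison.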