Ideas to tackle online media captioning

By obie

For years I’ve thought about how best to deal with captioning for webcasts and now podcasts in a cost-effective and scalable way. I believe that the time is right to tackle this with a combination of Wikis and a slick transcribing interface modeled on dotSUB, a collaborative subtitling program.

Beyond the critical need to make webcasts and podcasts accessible to the hearing-impaired, there are additional benefits:

  • Written transcript often easier to peruse and annotate
  • Keyword/phrase search results synchronized to exact moments in the media
  • Same language subtitling helps literacy and/or learning scientific terminology
  • Learning disabled accessibility
  • English as a second language learning
  • Transcripts can be translated to extend the media to a worldwide audience
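To make the synchronized-search benefit concrete, here is a minimal sketch of how a keyword search over a timecoded transcript could return the moments to jump to. The `Caption` structure, the `find_moments` function, and the sample captions are all hypothetical, made up for illustration; this is not how any particular service implements it.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def find_moments(captions, phrase):
    """Return the start time of every caption whose text contains the phrase."""
    needle = phrase.lower()
    return [c.start for c in captions if needle in c.text.lower()]


# Hypothetical sample data, not taken from a real webcast.
captions = [
    Caption(0.0, 4.5, "Welcome to today's lecture."),
    Caption(4.5, 9.0, "We begin with the structure of DNA."),
    Caption(9.0, 14.0, "DNA replication comes next."),
]

print(find_moments(captions, "dna"))
```

A player could then seek to each returned start time, which is what makes a transcript search land on the exact moment rather than just on the right file.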

Challenge #1: Producing the transcript
Hiring a professional transcriber costs about $90/hour. The CastingWords service leverages MTurk to bring the price down to about $0.42/minute, or roughly $25/hour. This has proven cost-effective and accurate enough for folks like Jon Udell to go for it. I’ve heard some reports about inaccuracies that lead to further clean-up.
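For a back-of-the-envelope comparison, the arithmetic can be sketched as follows. The two rates are the figures quoted above; the assumption that both are charged per hour of recorded media, and the 100-hour archive, are mine and purely illustrative.

```python
PROFESSIONAL_PER_HOUR = 90.00    # quoted above; assumed to be per hour of media
CASTINGWORDS_PER_MINUTE = 0.42   # quoted above


def per_hour(rate_per_minute):
    """Convert a per-minute rate into a per-hour rate."""
    return round(rate_per_minute * 60, 2)


def transcription_cost(media_hours, rate_per_hour):
    """Total cost of transcribing the given number of media hours."""
    return round(media_hours * rate_per_hour, 2)


castingwords_per_hour = per_hour(CASTINGWORDS_PER_MINUTE)

archive_hours = 100  # hypothetical archive size
professional_total = transcription_cost(archive_hours, PROFESSIONAL_PER_HOUR)
castingwords_total = transcription_cost(archive_hours, castingwords_per_hour)

print(f"CastingWords rate: ${castingwords_per_hour:.2f}/hour")
print(f"Professional:      ${professional_total:,.2f}")
print(f"CastingWords:      ${castingwords_total:,.2f}")
```

None of this accounts for the clean-up time that inaccurate transcripts may require, so treat the totals as a lower bound.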

I believe that volunteers can be motivated to transcribe educational media. This belief is inspired by the work of OOPS (Opensource Opencourseware Prototyping System). Based in Taiwan, its volunteers translate Creative Commons licensed work, primarily from MIT’s OpenCourseWare, in order to bring the knowledge to Taiwan and to learn English.

How it works, from their site:

As a volunteer-based project, we ask you to adopt the file(s) you wish to transcribe… Once you have located the lecture(s) you wish to transcribe, simply click on the “Edit” link on the top right-hand side of the page… Once you have finished transcribing, come back to this site and copy and paste the text into a placeholder that will have been setup for you.

As you can see from the list of courses, however, it may take a while for all of the courses to be adopted and finalized. OOPS also does not seem to use a Wiki interface for their work. At the end of the day, CastingWords is probably the most direct route, but I’m still fascinated by the Wiki.

I’ve looked into Wikia but am amazed to not find one for transcribing (and I don’t want to kick it off!). Please let me know if one exists. Perhaps we should simply set up our own private wiki for this. Ideas welcome.

Challenge #2: Synchronizing the media
Transcript in hand, the next step is to produce timecodes for captioning. At Berkeley we love Automatic Sync Technologies for this. Their software automatically takes a transcript and the media (an MP3 of the video is preferred) and quickly produces timecoded transcripts in a variety of formats. This still costs roughly $60 per hour of content, though less if hours are bought in bulk.

Again, I believe that with the right interface volunteers can chunk the transcripts, too.

Transcribing interface
Ideally, we would have a UI layered on top for people to transcribe while viewing or listening to the media. The UI would have quickkeys for controlling the playback (start, stop, pause). The input fields would be pre-chunked to facilitate the timecoding.
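One way to picture the pre-chunked input fields is as a list of empty slots, each already carrying its start and end time, so that whatever a volunteer types into a slot is timecoded for free. This is only a sketch: the `make_chunks` function and the fixed five-second chunk length are my own assumptions, not how dotSUB or any other tool does it.

```python
def make_chunks(duration_seconds, chunk_seconds=5.0):
    """Split a recording into fixed-length, empty transcript slots.

    Each slot is a dict with 'start' and 'end' (seconds) and an empty 'text'
    field for the volunteer to fill in while listening to that stretch.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")

    chunks = []
    start = 0.0
    while start < duration_seconds:
        # The last chunk is shortened so it never runs past the end.
        end = min(start + chunk_seconds, duration_seconds)
        chunks.append({"start": start, "end": end, "text": ""})
        start = end
    return chunks


# A hypothetical 12-second clip yields three slots: 0-5, 5-10 and 10-12.
for chunk in make_chunks(12.0):
    print(chunk)
```

Fixed-length chunks are the simplest choice; a friendlier interface would probably let volunteers nudge the boundaries so that sentences are not cut in half.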

This UI was inspired by dotSUB, which I first saw demoed during Andrew Baron’s keynote at the Podcast Expo. dotSUB is still in beta, and their focus is collaborative translation of existing timecoded transcripts.

According to dotSUB’s film submission page, they require 1) the video to be uploaded; 2) the timecoded transcript.

Regarding #1, frankly I’m not keen on sending my videos to yet another 3rd-party. Furthermore, there is a Filmmaker Agreement which presents an instant hurdle.

Many of our videos are already up on Google Video, and many other people host theirs on YouTube, Blip.tv, etc. All of these services provide HTML snippets to embed videos into a webpage. dotSUB’s business model may include making the subtitled videos available as hi-res and/or portable distribution (DVD, etc.). If dotSUB accepted 3rd-party embeds, they’d have a lot more people using their system. I wonder if there is a technical issue with syncing up timecodes with videos being served from an external source.

Regarding #2, I briefly chatted with them and there is a possibility for opening up the interface for pure transcribing.

Exporting timecoded transcripts
Finally, I would love to be able to export the timecoded transcripts. I could then submit them alongside my Google Videos for closed captioning, an important feature that they recently announced. iTunes U can now search on keywords, but I don’t believe the results bring you to those moments in the media. Of course we’d like to add these to our local webcast.berkeley site where we do have caption search implemented.
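As an illustration of what an export could look like, here is a minimal sketch that writes timecoded chunks out in SubRip (.srt) form: a running index, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the caption text, and a blank line between entries. I picked SubRip only because it is a simple, widely used format; I’m not assuming it is what Google Video, dotSUB, or our own site expects. The function names and sample chunks are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(chunks):
    """Render a list of {'start', 'end', 'text'} dicts as SRT text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start = format_timestamp(chunk["start"])
        end = format_timestamp(chunk["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{chunk['text']}\n")
    # Entries are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical sample chunks, not taken from a real transcript.
chunks = [
    {"start": 0.0, "end": 4.5, "text": "Welcome to today's lecture."},
    {"start": 4.5, "end": 9.0, "text": "We begin with the structure of DNA."},
]

print(to_srt(chunks))
```

Once the transcript lives in a structure like this, emitting other caption formats is mostly a matter of swapping the timestamp and layout rules.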

Summary
Adopt-a-podcast feature + Wiki for collaborative transcribing and finalization + dotSUB-like transcribing interface that also produces timecodes. It would be an interesting experiment, with the first step being to set up a Wiki.

17 Responses to “Ideas to tackle online media captioning”

  1. edutechblog.com » Ideas to tackle online media captioning Says:

    [...] Obadiah writes, “For years I’ve thought about how best to deal with captioning for webcasts and now podcasts in a cost-effective and scalable way. I believe that the time is right to tackle this with a combination of…” [...]

  2. Thor Says:

    If you have a look at the dotSUB staging server (http://stage.dotsub.com), I think you’ll see most of what you’re looking for in terms of synchronizing and transcribing. We also provide a complete API for getting subtitles and time-stamps back out. All work in progress, but we’re getting there.

  3. Cole Says:

    This is a huge issue. We are just now getting our heads around all of it. I think it is worth a team of people across multiple Universities attacking it. Would it pay to create such an opportunity?

  4. obie Says:

    Thor – This is great! Thanks.

    Looking at the quickkey controls and thinking about how timecode gets sync’d up, does this preclude the embedding of videos from a 3rd-party like GooTube or Blip.tv?

    For anyone wishing to try the pure transcribing interface, 1) Register/login; 2) Click the film “1st Avenue Machine” which so far has no captions; 3) Click “Transcribe Film” under the Advanced Tools box in the right-hand column.

  5. obie Says:

    Cole – It could definitely pay, especially as a multi-campus collaboration. My first instinct is to create a wiki and an “adopt-a-podcast” program. Just as a social experiment. My sense from talking to MIT about OOPS (gotta love that acronym for this) is that it takes toooo long. But with good visibility and interface this could be quite interesting. Let’s talk.

  6. Michael Says:

    Obie

    We are in discussions about an AI-based engine that allows audio podcasts (with or without video) to be converted into text and searched with 80 – 95% accuracy. The dotSUB browser-based transcription tool can allow final editing and fine-tuning to whatever accuracy level the person doing it wishes to devote the time to.

    That transcript is automatically converted into our subtitling/translation format ready to go.

    I think we have in place much of what you are looking for – let’s get started on a trial program.

    Michael

  7. Thor Says:

    > Looking at the quickkey controls and thinking about
    > how timecode gets sync’d up, does this preclude the
    > embedding of videos from a 3rd-party like GooTube
    > or Blip.tv?

    Nope, not at all. The underlying video file can be anywhere (well any http:// url). We’re working on adding to the “Post A Film” function so that you can either upload a file or paste in a URL (with some automagic handling of YouTube/Google url’s).

  8. Matt Pasiewicz Says:

    CastingWords might be another option to explore …
    http://castingwords.com/

    It doesn’t provide captioning, but would provide a transcript of a recording … right now, they only accept audio.

    Sorry we didn’t catch up in Dallas ….

    Matt

  9. MattP Says:

    Another post of interest here.

  10. Media Movers, Inc. Says:

    We specialize in foreign language Dubbing & Subtitling. It was interesting to read your blog/ideas, which can definitely be implemented with a team effort.
    Do keep us posted on this and let us know if we can be a part of it at any stage.

    thanks,
    Lawrence Vishnu
    CEO
    Media Movers, Inc.

  11. Kevin Says:

    If we could get our act together (as educational institutions) to tackle this collaboratively, I think this is something that Mellon would consider funding.

  12. More about the media captioning idea « Obadiah Tarzan Greenberg’s Weblog Says:

    [...] When I last threw out some ideas about online media captioning there was a lot of interest.  Must be onto something. [...]

  13. wholeshebang Says:

    This is a wonderful idea. I am not in education but I can see the applications for transcribing coursework as well as open-licensed media and private media to share with people of other languages. I hope this goes well and that it spreads to many applications.

  14. the whole shebang Says:

    Community-based media captioning and translation

    This is the next big benefit to society of the Internet Age – Ideas to tackle online media captioning (and translation). There are efforts about to collaborate on an application and method to allow not only the closed-captioning/text-captioning of audi…

  15. Community-based media captioning and translation « the whole shebang Says:

    [...] This is the next big benefit to society of the Internet Age – Ideas to tackle online media captioning (and translation). There are efforts about to collaborate on an application and method to allow not only the closed-captioning/text-captioning of audio and video by the community but to do so in multiple languages. The effort is mainly aimed at education, as current captioning activities are disjointed and poor countries try to access a wealth of higher-education knowledge. [...]

  16. ninjageek Says:

    I am a ‘turker’, a transcriptor who does transcription for castingwords.com.

    Basically, the MP3 file that you submit to them gets chopped up into five- or eight-minute segments. These are then placed online for anyone to download as a mturk.com HIT. The payment for each segment varies according to how quickly the client needs the transcription, how difficult the audio is to listen to, and how accurate the resulting transcription is. Transcribing five minutes of audio can take one to two hours depending on how many people are speaking. For that amount of work, a ‘turker’ will get paid around $2 to $3.

    Once returned, the transcription is graded – a ‘turker’ gets paid about $0.50 to do this. If further cleanup is required, another ‘turker’ gets to improve the quality – a payment of $1.00 to $2.00. Finally, an editor is required to splice the different segments together and make sure that the speaker names are accurate and that there are no spelling mistakes or misinterpretations of words (e.g. innuit vs. innate). This isn’t easy when the speakers have had four or five glasses of wine during an interview or restaurant get-together. This pays around $2 to $3.

    ‘Turkers’ also get graded according to the quality of their work. Poor grades mean that they will be unable to work on the high-paying expedited transcription work.

    For me, it’s interesting – everything from film director interviews, to lawyer conferences, conference calls and career interviews. I do the work more for the shared knowledge than the money.

  17. sandrar Says:

    Hi! I was surfing and found your blog post… nice! I love your blog. :) Cheers! Sandra. R.
