Crowdsourcing to closed-caption videos with Amara

pictograms used by the United States National Park Service. A package containing all NPS symbols is available at the Open Icon Library (Photo credit: Wikipedia)

Yesterday’s Hangout On Air on American Sign Language (ASL) and Deaf culture is now a video on YouTube, and that video is being crowdsourced for subtitles at Amara. If you’ve never heard of Amara (I hadn’t until yesterday), it is a website dedicated to crowdsourcing the captioning of videos: anyone can embed a video on Amara, and anyone can caption it on a volunteer basis. Captioning is very time-consuming. It involves transcription, line division, and time coding. The average rate of speech is somewhere around 5 syllables per second (Kendall, 2009, p. 145), so you have to listen to a few seconds of a video, pause it, type what you just heard, and repeat the process. The transcription then has to be divided into lines, usually of about 32 characters each[1], and time-coded; i.e., each line has to be matched with the time it is spoken in the video, so that’s time-consuming too. For these reasons, when it comes to help with closed-captioning, the more the merrier, especially because so many people make videos pro bono. This video is over 48 minutes long, and of course it’s pro bono. If you would like to closed-caption a few lines of it on Amara, please do. A little work by a lot of people will get the job done.
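To make the line-division step concrete, here is a minimal Python sketch. It is not how Amara works; it simply assumes the commonly quoted limit of 32 characters per line (see the footnote) and a hypothetical limit of two lines per caption, and it uses the standard library's textwrap module so that lines break only between words. The function name divide_into_captions is made up for illustration.

```python
import textwrap


def divide_into_captions(transcript, chars_per_line=32, lines_per_caption=2):
    """Split a transcript into caption blocks (lists of lines).

    chars_per_line=32 follows the often-quoted figure; lines_per_caption=2
    is an assumed readability limit, not a documented standard.
    """
    # Break only at spaces: never inside a word or at a hyphen. A single
    # word longer than chars_per_line would stay whole on its own line.
    lines = textwrap.wrap(
        transcript,
        width=chars_per_line,
        break_long_words=False,
        break_on_hyphens=False,
    )
    # Group consecutive lines into blocks of at most lines_per_caption lines.
    return [
        lines[i:i + lines_per_caption]
        for i in range(0, len(lines), lines_per_caption)
    ]


if __name__ == "__main__":
    text = ("Captioning is very time-consuming. It involves "
            "transcription, line division, and time coding.")
    for block in divide_into_captions(text):
        print("\n".join(block))
        print()
```

Each block would still need a start and end time, which is the time-coding step that a volunteer has to do by listening.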

Footnotes

1. I don’t like to repeat statistics without sources, but the figures 32 and 35 characters per line appeared often on web pages. Screen Subtitling’s white paper “Closed caption subtitling” [PDF] said “the number of characters per line or row is a set limitation” (Screen, 2008, p. 2), with no specification of the limit and no reference to the authority that sets it. I searched the Internet for this “set limitation” on characters per line and found the same numbers repeated in different places with no traceable references. AutoCaption.com’s “Closed captioning defined” page said, “the features of traditional captioning are: … 32 characters per line,” with no citation. The Welstech wiki said the Department of Education required 35 characters per line, yet when I searched the US Department of Education website, I could find no such specification. CPC.com’s Closed Captioning FAQ answered the question “What features are supported by CEA-608 closed captions for standard definition?” thus: “[…] A caption block can have up to 4 lines and up to 32 characters per line, although for accessibility reasons, it is recommended not to exceed 2 lines and 26 characters per line […].” I searched “CEA-608” to find the source and found the Consumer Electronics Association (CEA) CEA-608-E Standard Details page. Unfortunately, the standard is published in a printed book that costs $300 ($225 for members). Can anyone quote the source of authority? If so, please leave a comment.
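The limits in CPC's FAQ can be expressed as a simple check on a caption block. This Python sketch takes its numbers straight from the FAQ as quoted above, not from the CEA-608-E text itself (which I haven't seen), and the function name check_caption_block is hypothetical. It counts characters with len(), which ignores CEA-608's own character set.

```python
from typing import List

# Limits as quoted from CPC's Closed Captioning FAQ; not verified against
# the CEA-608-E standard itself.
CEA608_MAX_LINES = 4
CEA608_MAX_CHARS = 32
RECOMMENDED_MAX_LINES = 2
RECOMMENDED_MAX_CHARS = 26


def check_caption_block(lines: List[str]) -> str:
    """Classify a caption block against the quoted limits.

    Returns "recommended" if it fits the accessibility recommendation,
    "allowed" if it only fits the hard limits, otherwise "too long".
    """
    # len() counts Python characters, which is only an approximation.
    longest = max((len(line) for line in lines), default=0)
    if len(lines) <= RECOMMENDED_MAX_LINES and longest <= RECOMMENDED_MAX_CHARS:
        return "recommended"
    if len(lines) <= CEA608_MAX_LINES and longest <= CEA608_MAX_CHARS:
        return "allowed"
    return "too long"
```

For example, a two-line block whose second line is 27 characters long would be "allowed" but would exceed the recommended 26 characters.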

References

Kendall, Tyler S. (2009). Speech rate, pause, and linguistic variation: An examination through the sociolinguistic archive and analysis project (Doctoral dissertation). Retrieved from http://ncslaap.lib.ncsu.edu/kendall/kendall-dissertation-final.pdf

Screen. (2008, July). Closed caption subtitling [White paper]. Retrieved from http://www.screen.subtitling.com/downloads/Closed%20Caption%20subtitling.pdf



Comments

9 responses to “Crowdsourcing to closed-caption videos with Amara”

  1. Christian Vogler

    You can find the origin of the 32-character limit in the rules for the Telecommunications Act of 1996: http://www.gpo.gov/fdsys/pkg/CFR-2011-title47-vol1/xml/CFR-2011-title47-vol1-sec15-119.xml

    CEA-608 is based on these rules.

    Quoting 47 CFR 15.119:

    (d) Screen format. The display area for captioning and text shall fall approximately within the safe caption area as defined in paragraph (n)(12) of this section. This display area will be further divided into 15 character rows of equal height and 32 columns of equal width, to provide accurate placement of text on the screen. Vertically, the display area begins on line 43 and is 195 lines high, ending on line 237 on an interlaced display. All captioning and text shall fall within these established columns and rows. The characters must be displayed clearly separated from the video over which they are placed. In addition, the user must have the capability to select a black background over which the captioned letters are displayed.


    1. Daniel Greene

      Thank you, Christian!

      Now that we know the source of the 32-character-per-line rule, I’m still curious: do you think it applies as a “best practice” for people who caption their YouTube videos?


      1. Christian Vogler

        Oh boy. You like opening cans of worms, don’t you? Short answer: I think that there is an optimum line length for captions, but I don’t know what it is, and it might not necessarily be 32 characters per line. I am not sure anyone has quantified this in hard research. As far as I know, the 32-character-per-line limit for CEA-608 is more a function of limited TV screen resolution and sharpness than anything based on cognitive factors. I may be wrong, though.

        Slightly longer answer: Research for print seems to indicate an optimum line length for reading, which depending on what reference you look up varies between 40 and 75 characters per line. However, these numbers may not apply to reading on a screen. Another factor to consider is that the brain has to do more work trying to focus on both a moving picture *and* the caption text than when it is reading plain printed text, so this might affect the optimum line length.

        There is a new captioning standard in development called WebVTT, and the folks there seem to think that the majority of manual line breaks in captions are an anachronism to begin with. They think that line breaks should normally be dependent on font size (which must be adjustable by the viewer) and screen characteristics, rather than something determined by the author, except when a line break comes at a logical place (such as the dialogue switching to another person).
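To show what time-coded captions look like in WebVTT, here is a small Python sketch that writes (start, end, text) cues into a WebVTT document: a "WEBVTT" header, then one cue per caption with a timing line of the form hh:mm:ss.mmm --> hh:mm:ss.mmm. In keeping with the WebVTT view described above, it writes no manual line breaks and leaves wrapping to the player. The function names are hypothetical, and the sketch assumes cue text contains no blank lines or "-->".

```python
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp, hh:mm:ss.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Tuple[float, float, str]]) -> str:
    """Serialize (start, end, text) cues as a WebVTT document.

    Text is written without manual line breaks so the player can wrap it
    to the viewer's font size and screen. Assumes the text contains no
    blank lines and no "-->".
    """
    parts = ["WEBVTT", ""]
    for start, end, text in cues:
        parts.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        parts.append(text)
        parts.append("")  # a blank line ends each cue
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_webvtt([(1.0, 4.5, "Captioning is very time-consuming.")]))
```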


        1. Daniel Greene

          Again, thank you so much for sharing all this information!


  2. Claude Almansi

    Hi Daniel,
    At one point I was on the warpath against this axiomatic length limit – see the French captions of Fondation Agalma – Entretien avec Claudia Mejìa Quijano. Well, there I was also a bit irked by their having 2 or 3 camera operators producing something that looked like TV 40 years ago, when all 3 debaters, being university teachers, were talking in written style, with no body language to speak of.
    So I kept thinking as I was captioning, “why the heck don’t they just do audio recordings themselves and give a plain transcript too that will stay put for people to read at leisure, instead of having their snake-like sentences split into bouncing captions?”. Hence the choice to have one caption per sentence, except when the caption would have covered their faces.
    I’ve relented a bit since, thanks to the deaf participants in the CCAC mailing list. At first I was baffled by their insistence on having captions for signed videos that were actually interpretations of written “texts that stay put” etc. Then I understood that it was not just a matter of accessing the content, but of participating in a flowing-in-time communication. And that gets rendered by the bouncy reading of shortish captions.
    Nevertheless, I still think that the 32-character rule may make sense on TV, where viewers have no control over the flow, but is too rigid for digital – and particularly for online – videos, which viewers can easily stop and rewind if they want or need to reread a caption.


  3. Natalie Williams (@nataliewms)

    Ugh, I read this and it bugs me that I can’t remember the standard length per line. I did a small workshop for our district 7 years or so ago about captioning standards and such. Question is – where did I put that file? Ha ha! We were captioning a bunch of school videos and wanted to make sure we were consistent in our captioning.

    So I pulled up the Described and Captioned Media Program (DCMP). They have a TON of information re: standards, software – you name it. A great starter is their “Caption It Yourself™” section – http://www.dcmp.org/ciy/. The use of 32 characters per line is mentioned on this page – http://www.dcmp.org/captioningkey/text.html – but it doesn’t say where that standard comes from.

    Bonus – DCMP has a whole library of captioned materials that are mailed out with postage and return postage paid, for use by deaf students and by teachers with deaf students in their classes. The website has more information.

    PS – thanks for sharing this resource! :o)


    1. Daniel Greene

      Thank you for the information, Natalie. Good to know about the resource, and good to know there’s another source that agrees on the 32-character-per-line rule. I can only hope this oft-quoted “fact” stems from a real standard and not just from what “they say.” Apparently, when it comes to YouTube, 32 characters per line is not a limitation, though it could be a “rule of thumb” for readability. I changed my CC viewing settings on YouTube to display captions at the smallest font size available. When I did this, I got one line that was 64 characters long, and there was still room for more, since the format was HD and the aspect ratio 16:9. I took a screenshot:

      CC 64 character line wrap

      And this line wrapped at 47 characters when I increased the font size:

      CC 47 character line wrap


  4. Claude Almansi

    Hi Daniel,
    Thanks a bunch for your interpretation in the Hangout (btw, did you interpret both ways or just into signing?) and for encouraging your readers to collaborate in its captioning on Amara.
    Re caption length: I think a fixed length is more of an old-media thing, from before CC, when you had a captive audience who couldn’t switch captions off, and it kind of stuck even when TVs moved to CC: a few months ago, a Swiss TV journalist told me 35 signs. But if you look up CC’d videos on YouTube (it’s one of the criteria for advanced search), you’ll often find longer captions.
    The Amara transcribing box gets redder and redder in the face as a caption gets longer, until it busts a gasket and cuts it: but that’s around 100 signs, if I remember correctly. And you can still add to it in the “done” captions list if a couple more words make a more sensible caption. Just don’t overdo it 😀
    Claude


    1. Daniel Greene

      Thanks, Claude. I interpreted both from ENG-ASL and ASL-ENG. Interesting to know about the warning method on Amara! I haven’t used it yet myself. (I’m guessing when you say “signs” you mean characters, right?).

      I Googled “MovieCaptioner ‘characters per line’” just now, because that’s the software I licensed and use, and one of their webpages lists 32 characters per line for the Sonic Scenarist (SCC) captions standard.


Comments welcome