by Angela Guess
Natasha Lomas reports in TechCrunch, “For human lip readers, context is key in deciphering words stripped of the full nuance of their audio cues. But a technology model for lip-reading developed at the University of East Anglia in the UK has been shown to be able to interpret mouthed words with a greater degree of accuracy than human lip readers, thanks to the application of machine learning tech to classify the visual aspect of sounds. And the kicker is the algorithm doesn’t need to know the context of what you’re discussing to be able to identify the words you’re using.”
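The classification step Lomas describes, mapping the visual appearance of mouth movements to sound classes without any linguistic context, can be illustrated with a minimal, purely hypothetical sketch. This is not the University of East Anglia model; the feature vectors and viseme classes below are synthetic stand-ins, and the classifier choice is an assumption made only for illustration.

```python
# Hypothetical sketch: classify "visemes" (visual sound classes) from
# mouth-region feature vectors, with no linguistic context.
# All data here is synthetic; this is not the UEA research model.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend each video frame of the mouth has been reduced to a 20-dim
# feature vector; each viseme class gets a slightly different distribution.
n_classes, n_per_class, n_features = 5, 200, 20
X = np.vstack([
    rng.normal(loc=c, scale=1.0, size=(n_per_class, n_features))
    for c in range(n_classes)
])
y = np.repeat(np.arange(n_classes), n_per_class)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A simple classifier maps visual features to viseme labels directly,
# echoing the article's point that no conversational context is needed.
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(f"viseme classification accuracy: {clf.score(X_test, y_test):.2f}")
```

In a real system the synthetic features would be replaced by learned representations of mouth movements, and the predicted viseme sequence would then be decoded into words.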
Lomas goes on, “While the model remains a piece of research at this stage, there are scores of potential applications for technology that could automagically transform visual cues into accurate speech — whether it’s helping people who have audio impairments, or enhancing audio-less security video footage with additional speech data — or even to try to figure out exactly what charged word one footballer spat at another in the heat of a match.”
She continues, “Such a tech could also be applied as a fallback for poor audio quality on a mobile or video call. Or for automating subtitles. Or even perhaps to power a front-facing camera-based mobile ‘voice’ assistant which you wouldn’t actually have to speak to but could just discreetly mouth commands at (how cool would that be?). Safe to say, the list of applications-in-waiting for machine powered lip-reading is as long as the dictionary is deep. So there’s bags of future potential if only researchers can deliver the goods.”
Photo credit: Flickr/Denise Coronel