The latest tech buzz to get me excited is the news that Google is developing mobile speech-to-speech translation: a real-life Babel fish! The software would bring real-time, speech-to-speech translation to a smartphone, and will hopefully be ready in two years. The reason I particularly love this crossover between sci-fi and reality is that I spent nearly a year at uni developing a very basic, research-driven system along these lines. The problem I experienced, as I imagine Google did too, is that two of the three main components of such a system, speech recognition and natural (human) language translation, have their own significant and complex challenges.
Speech recognition is terribly difficult once you take into account the number of varied accents in which each language is spoken. As we sometimes have difficulty deciphering what one another are saying, it’s easy to imagine the trouble a machine would have doing this! To succeed at this with computers, one has to take a step back from the recognised ways in which we develop software systems.
It’s this step that leads to the most exciting element of this system for me: the development of learning technology. Both the speech recognition and translation components have to learn from previous uses to become useful and accurate, which is why I’m so optimistic that Google will be successful when they bring out their software in a couple of years.
Their speech recognition component is already available on the Nexus One smartphone in the form of SMS and email dictation and web search queries. The system records a voice input and sends it to Google’s speech recognition servers to transcribe. Part of the reason this component can be so successful is that mobile phones are, by their nature, personal to almost exclusively one user. The phone should therefore be able to improve its prediction of what you’re saying according to the way in which you’ve spoken in the past.
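To make the idea of a phone learning from its owner a little more concrete, here is a toy sketch in Python. It is purely my own illustration and not how Google's system works: the `PersonalReranker` class and its methods are hypothetical names, and it assumes the recogniser hands back several candidate transcriptions, which are then re-ordered by how often the user has said those words before.

```python
from collections import Counter


class PersonalReranker:
    """Toy per-user re-ranker (hypothetical; not Google's actual method)."""

    def __init__(self):
        # How often this user has been heard saying each word.
        self.word_counts = Counter()

    def learn(self, confirmed_text):
        # Record the words of a transcription the user accepted.
        self.word_counts.update(confirmed_text.lower().split())

    def score(self, candidate):
        # A candidate scores higher when it is made of familiar words.
        return sum(self.word_counts[w] for w in candidate.lower().split())

    def best(self, candidates):
        # max() keeps the first of any tied candidates, so the
        # recogniser's original ordering breaks ties.
        return max(candidates, key=self.score)


reranker = PersonalReranker()
reranker.learn("meet me at the beach")
reranker.learn("the beach was great")

# Two acoustically similar guesses; the user's history favours "beach".
choice = reranker.best(["send it to the beech", "send it to the beach"])
print(choice)
```

A real system would lean on acoustic and language models rather than simple word counts, but the principle is the same: the more one person uses the phone, the more evidence there is for picking between similar-sounding guesses.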
Google also have an advantage with their translation software, as it can combine the more basic elements of such systems, like algorithms for grammatical construction, with its database of translated websites and documents. As with the recognition element, the more information that can be collated for translation, the better the quality will be.
It’s this abundance of information available to Google that makes me think they may just achieve this fantastic goal. Although a number of experts believe this to be an impossible task with current technology, I’m sure that Google are well placed to overcome the impossibilities. Besides, isn’t it always the impossible ideas that drive technology forward?