A great advancement for Linux speech recognition.

This is great news! Peter Grasch has announced a new speech recognition project called simon on the KDE
accessibility mailing list,(1) and it will support the recently announced D-Bus port of AT-SPI. That port allows both
Qt/KDE and GTK/GNOME to use the same accessibility infrastructure, which means accessibility for both the KDE and GNOME desktops.
Rock on!!

Links
(1) http://lists.kde.org/?l=kde-accessibility&m=120070568122006&w=2

Submitted by Anonymous (not verified) on Sun, 01/20/2008 - 01:34.

That Julius is a most interesting project. It seems quite competent. It's a shame that it is worthless for people who do not want to chatter with their computers in English or Japanese. No one has taken on teaching it the basics of other languages... That's a big shame, and quite a common one.

I recently wrote a thesis about ubicomp, and one of the things I touched on was speech recognition. At the end of the day, nearly no one wants to chat with their computer. The most important reason is that in (semi-)public places a lot of the content is perceived as private. Another reason is that forming speech is a major task for the brain and blocks many other functions. Typing and other forms of communication are lighter and feel more convenient.

Amusingly, this actually means that, for instance, the EU forcing car drivers to use headsets while talking on the phone removes only part of the distraction from driving. To really improve safety, the use of cell phones while driving should be forbidden, and there already seems to be quite convincing research supporting that.

Anyways... It's nice to have high-quality speech recognition around, but it's mostly just an accessibility feature and not something most people will want to use constantly.

Submitted by Peter Grasch (not verified) on Sun, 01/20/2008 - 05:54.

Hi!

Wow, that is a first - seeing something I announced spread over the internet :D.

Glad you are interested!

Sadly I have to say that - as stated in my mail - AT-SPI integration is not _yet_ working. But we are working on it! :)

-- bedahr

Submitted by Nathan Dbb (not verified) on Thu, 01/24/2008 - 07:39.

While this is good news, we are putting the cart before the horse. In this case the horse is the engine/library and the cart is the interaction with the applications and frameworks.

I think that speech recognition on Linux will be in demand next year because of Mobile Internet Devices (MIDs). These small computers will have the processing power of an outdated laptop, but most will have no keyboard (or a super-small one).

We still need a working engine and speech sample libraries for the engine to run over. Sphinx is working, as far as I can tell, but is not ready for MID use. While the interpreted Java speech recognition is slow, the compiled commercial recognition products run faster than real time on a 550 MHz PIII.

Julius/Julian (written in C) may be faster than Sphinx (Java), but I could not find numbers. Also, the different Julius sites list different licenses (GNU General Public License (GPL) / Revised BSD / Other) depending on where you look.

Right now, we need people to be recording samples, as any engine will need a large selection of samples.
https://wiki.ubuntu.com/SpeechRecognition/SpeechMaker

That is just a page about getting more submissions to this project:
http://www.voxforge.org/

Submitted by The_Engineer (not verified) on Thu, 06/18/2009 - 20:56.

As mentioned, speech recognition probably has limited or just gimmicky value in normal PC use, except where it can be used to overcome a practical problem such as a manual disability or where a keyboard is impractical.

However, there might well be a case for using such software to command robots. At present robots are at a very early stage of development and are arguably still rather useless to most people. Still, such work is one more step towards getting real robots working!

Therefore the speech systems could well eventually be used in ways not originally conceived.
