Not this Sphinx; naming really is an important matter. Anyway, I tried playing with CMU Sphinx. Its pocketsphinx package provides a Python binding, though with no real documentation. There are two ways to do recognition: on the fly, or on a block of data. With the default models, on-the-fly recognition gives useless results. I don’t know whether it would do better after some training, and I have no idea how to do that anyway. Decoding a block of data gives acceptable results.

What actually caught my attention was gnome-voice-control. It does work, but it also crashes. I checked out the repository (I couldn’t compile version 0.3) and installed sphinxbase 0.4.1 and pocketsphinx 0.5.1.

Since it crashes every time, I wanted to write a simple, similar tool in Python. Unfortunately, the results aren’t good. Reducing the vocabulary, slicing the audio into words myself, and decoding by block might improve the accuracy, but I think that is a lot of effort, and I don’t have much knowledge of speech recognition. I stopped here.

I still put together a simple script, which uses pyalsaaudio 0.4 to capture audio. It records until you press Ctrl+C, then does recognition.
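For illustration, here is a minimal sketch of the capture half only. It is not the original script. The 16 kHz mono 16-bit format, the period size, the file name `capture.wav`, and all function names are my assumptions. The decoding step is left out on purpose, since the pocketsphinx binding is undocumented and I don’t want to guess its call names; the saved block is what you would hand to the decoder.

```python
import wave

RATE = 16000       # Hz; assumed to match the default acoustic model
CHANNELS = 1       # mono
SAMPLE_WIDTH = 2   # bytes per sample (signed 16-bit little-endian)


def open_capture():
    """Open the default ALSA capture device using pyalsaaudio's setter methods."""
    import alsaaudio  # imported lazily so the helpers below work without it
    pcm = alsaaudio.PCM(alsaaudio.PCM_CAPTURE)
    pcm.setchannels(CHANNELS)
    pcm.setrate(RATE)
    pcm.setformat(alsaaudio.PCM_FORMAT_S16_LE)
    pcm.setperiodsize(160)  # frames per read; an arbitrary small value
    return pcm


def record_until_interrupt(read):
    """Call read() until Ctrl+C and return everything captured as bytes.

    read() must return a (length, data) pair, as PCM.read() does.
    Reads with a non-positive length (no data or an overrun) are skipped.
    """
    chunks = []
    try:
        while True:
            length, data = read()
            if length > 0:
                chunks.append(data)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the recording, not the program
    return b"".join(chunks)


def save_wav(path, pcm_bytes):
    """Write the raw samples to a WAV file as one block for later decoding."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(RATE)
        wav.writeframes(pcm_bytes)


def main(path="capture.wav"):
    """Record from the microphone until Ctrl+C, then save the block."""
    pcm = open_capture()
    print("Recording, press Ctrl+C to stop")
    save_wav(path, record_until_interrupt(pcm.read))
```

Calling `main()` records from the default device and writes the file. Passing the reader in as a function keeps the loop testable without a sound card.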

You can also try this Python script [1], which is a GUI and uses Sphinx’s GStreamer plugin.

[1] http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/GStreamer is gone.