Audio Recognition

I want to input commands to my computer using speech recognition. This would let me rest my arms and hopefully enable me to heal better. I looked at some speech recognition programs: I tried out CMU Sphinx, but the demo program was not any good. There is also JULIUS, but I didn't try it out. I had a go with Google's voice recognition API and, with 'playing go' in mind, I said "K6"; it thought I said 'kisses' and then 'gay sex'.

Reading the AIMA book's treatment of speech recognition after all this really made me wonder why they bothered writing it. Beyond some basics you could pick up elsewhere, I'm very skeptical that there's much to be learned from it.

So I'm leaving aside general speech recognition: I think it isn't ready yet. But if you just have a small set of sounds (speech or not), surely a computer could distinguish between them with nearly perfect accuracy.


STFT

I think a very reasonable way to solve the problem of classifying a short sound sample into one of several buckets (given lots of example recordings) is to first perform a short-time Fourier transform (STFT) to turn the raw audio into data that clearly shows its frequency components over time. Then a neural network, or some other tool that can be trained on a data set, should be able to perform the classification successfully.
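As a very rough sketch of what I have in mind (assuming the recordings are already loaded as equal-length numpy arrays at a known sample rate; the function names and the nearest-centroid stand-in for the trained classifier are just mine for illustration):

    # Sketch: STFT -> log-magnitude features -> trivial nearest-centroid classifier.
    # Assumes each recording is a 1-D numpy array of the same length, sampled at FS Hz.
    import numpy as np
    from scipy.signal import stft

    FS = 16000  # assumed sample rate in Hz

    def features(samples, fs=FS):
        # The STFT gives a (frequency x time) grid of complex coefficients;
        # the log of the magnitude is the "frequency components over time" picture.
        _, _, Z = stft(samples, fs=fs, nperseg=256)
        return np.log1p(np.abs(Z)).ravel()

    def train(recordings_by_label):
        # recordings_by_label: dict mapping a command label to a list of recordings.
        return {label: np.mean([features(r) for r in recs], axis=0)
                for label, recs in recordings_by_label.items()}

    def classify(sample, centroids):
        # Pick the label whose average feature vector is closest.
        f = features(sample)
        return min(centroids, key=lambda label: np.linalg.norm(f - centroids[label]))

A real classifier (a neural network, say) would replace the nearest-centroid step, but the feature extraction half would look roughly like this.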


The short-time Fourier transform stuff is very interesting. There are a few important points to know about it. First of all, digital sound recordings are made by sampling at some fixed rate, and the Nyquist frequency is an important concept to be aware of: <http://mathworld.wolfram.com/NyquistFrequency.html>. There is a great picture demonstrating the aliasing you get from subsampling at <https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem>.
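A quick numerical illustration of aliasing (the frequencies here are arbitrary, just chosen to make the arithmetic easy to see):

    # A 7 kHz tone sampled at 10 kHz is indistinguishable from a 3 kHz tone,
    # because 7 kHz is above the Nyquist frequency of 5 kHz and folds back to 10 - 7 = 3 kHz.
    import numpy as np

    fs = 10_000                      # sampling rate in Hz, so Nyquist = 5 kHz
    t = np.arange(0, 0.01, 1 / fs)   # 10 ms of sample times

    tone_7k = np.sin(2 * np.pi * 7_000 * t)
    tone_3k = np.sin(2 * np.pi * 3_000 * t)

    # The sampled signals agree exactly, up to a sign flip of the phase:
    print(np.allclose(tone_7k, -tone_3k))   # True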

The sampling rate must be at least double the highest frequency component you want to be able to observe. I tried to measure the frequency of my voice using the spectrogram in Audacity, but I got a completely wrong answer.
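For a steady tone, the crude way to measure the frequency is to take an FFT and read off the biggest bin, something like the sketch below (the function name is mine, and `samples`/`fs` are assumed to come from an existing recording). For voiced speech, the strongest bin is often a harmonic or formant rather than the fundamental, which might be why the Audacity reading surprised me.

    # Rough estimate of the dominant frequency of a short, steady recording:
    # take the FFT magnitude and find the largest bin.
    import numpy as np

    def dominant_frequency(samples, fs):
        spectrum = np.abs(np.fft.rfft(samples))
        freqs = np.fft.rfftfreq(len(samples), d=1 / fs)
        return freqs[np.argmax(spectrum)]

    # Sanity check with a 220 Hz test tone:
    fs = 16_000
    t = np.arange(0, 1, 1 / fs)
    print(dominant_frequency(np.sin(2 * np.pi * 220 * t), fs))   # ~220.0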


There is an uncertainty principle in the STFT (as you would expect for anything involving waves): <https://en.wikipedia.org/wiki/Short-time_Fourier_transform#Resolution_issues> <https://en.wikipedia.org/wiki/Uncertainty_principle#Signal_processing>. The trade-off is between time and frequency: you can't tell exactly when a sound starts and ends if you are very precise about its frequency, and vice versa. It will be important to tune the window length in the program to get a good trade-off here.
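Concretely, in an STFT the window length is the knob: longer windows give finer frequency bins but a coarser idea of when things happen. A small sketch of the trade-off (window sizes picked arbitrarily):

    # Compare the frequency resolution and time step of the STFT for two window sizes.
    import numpy as np
    from scipy.signal import stft

    fs = 8_000
    t = np.arange(0, 1, 1 / fs)
    signal = np.sin(2 * np.pi * 440 * t)

    for nperseg in (64, 1024):
        f, tt, Z = stft(signal, fs=fs, nperseg=nperseg)
        print(f"window={nperseg:4d}: "
              f"frequency bin ~{f[1] - f[0]:.1f} Hz, "
              f"time step ~{(tt[1] - tt[0]) * 1000:.1f} ms")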


An important point about sounds is that our perception of them is not linear: the loudness (and pitch) we perceive follows a roughly logarithmic scale: <https://en.wikipedia.org/wiki/Time%E2%80%93frequency_analysis_for_music_signals>. Applying a logarithmic scale to the data should improve the classifier's ability to differentiate sounds the same way we do (more precisely, to recognize sounds we would consider equal as the same).
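The simplest version of this is just to convert the STFT magnitudes to decibels; a log-frequency axis (something like the constant-Q or mel scales discussed in the article above) would do the analogous thing for pitch. A minimal sketch (the function name is mine):

    # Convert STFT magnitudes to a decibel (log) scale, roughly matching perceived loudness.
    # The small constant avoids taking log(0).
    import numpy as np
    from scipy.signal import stft

    def log_spectrogram(samples, fs, nperseg=256, eps=1e-10):
        _, _, Z = stft(samples, fs=fs, nperseg=nperseg)
        return 20 * np.log10(np.abs(Z) + eps)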


I saw a great video of an FFT here: <https://www.youtube.com/watch?v=WKGsNnFIzcc>! It's running on a 20 MHz AVR, and he kindly posted all his code for it online. It's in assembly.


Undergrad Commutative Algebra

I've started reading this book: Reid, Undergraduate Commutative Algebra.

I got really stuck on finding a 'nice' proof of a specific theorem in my algebraic number theory work, and it was getting a bit silly, so I've decided to pull myself away from that for a bit. This book actually culminates in that result ("the" DVR theorem), and it is very, very short! So I think I might try to do a minimum of one chapter a week and get through it. If I manage that, I will have reason to be proud.

Today I worked on the exercises in chapter 0. I did up to exercise 0.6.