Source Code – https://github.com/ehelin/SpeechRecognition
I am sorry I have not been around for a while, but my new job is keeping me busy! I love my team and there are lots of interesting projects. While I have been busy though, I have been working on my next blog post. A couple of months ago, I started working on what I had hoped would be a speech recognition module I could use with The Globe In My Bucket Application (TGIMBA). Didn’t turn out that way.
First, it turned out to be far more complicated than I had thought (but that is ok…that is why I am doing this…to learn :)). I started with a Windows 10 Store Application because I hadn’t worked on Windows 10 yet. I also assumed grabbing audio from the existing microphone on my laptop would be easy. I was wrong. Part of what made this complicated is that the Store Application application programming interfaces (APIs) are very different from the .NET 4.6.x APIs I use at work. Plus, I had been playing with Amazon Lambda and getting the TGIMBA Android mobile application published prior to this (i.e. my head was elsewhere). Secondly, my work requirements are changing and I need to start focusing on NodeJS a lot more than I had planned at this point. So, I have run out of time for this project.
So, this blog post will be about getting audio recorded in a Windows 10 Store Application in a loop, breaking it into separate files by words and then playing them back. For example, this application records a phrase like ‘I am here’ as one audio file, splits it into three separate audio files for ‘I’ + ‘am’ + ‘here’ and then plays all of the files back in sequence and graphs a rough wave plot (based heavily on reference #7).
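To make the word-splitting step concrete, here is a minimal sketch of one common way to do it: treat quiet stretches as gaps between words. It is written in Python rather than the C# used in the repository, and the function name, the amplitude threshold and the `min_silence` parameter are all assumptions for illustration, not the project's actual code.

```python
from typing import List, Tuple


def split_on_silence(samples: List[float],
                     threshold: float = 0.05,
                     min_silence: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of sound.

    A sample counts as silent when its absolute amplitude is below
    `threshold`. A word ends once `min_silence` silent samples occur in a
    row; shorter quiet gaps are kept as part of the word.
    """
    segments = []
    start = None      # index where the current word began, or None
    silent_run = 0    # consecutive silent samples seen inside a word
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                # Close the word, leaving out the trailing silence.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # The recording ended mid-word; trim any trailing silence.
        segments.append((start, len(samples) - silent_run))
    return segments
```

Each returned pair could then be written out as its own audio file. With real recordings the threshold and gap length would need tuning, since background noise and fast speech both blur the boundaries.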
I based a lot of the Windows 10 code on what other posters had done (see the references, notably #1). It is not perfect, but if I had the time, my plan was to take this and train a Perceptron (the class already exists) to recognize certain words as commands. Then, when the system ‘heard’ one of these words, it would execute the command associated with it. I have had promising results in other Perceptron projects I have worked on in the past.
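For readers who have not met a Perceptron before, the sketch below shows the classic single-layer training rule in Python. It is not the class from the repository (that one is still an empty shell), and the idea of feeding it a short list of numeric features per word is an assumption for illustration only.

```python
from typing import List, Sequence


class Perceptron:
    """Single-layer perceptron with a step activation (illustrative only)."""

    def __init__(self, n_inputs: int, learning_rate: float = 0.1):
        self.weights: List[float] = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: Sequence[float]) -> int:
        """Return 1 if the weighted sum is positive, otherwise 0."""
        activation = self.bias + sum(
            w * x for w, x in zip(self.weights, features))
        return 1 if activation > 0 else 0

    def train(self, samples: Sequence[Sequence[float]],
              labels: Sequence[int], epochs: int = 20) -> None:
        """Nudge the weights toward each misclassified sample."""
        for _ in range(epochs):
            for features, label in zip(samples, labels):
                error = label - self.predict(features)
                if error:
                    step = self.learning_rate * error
                    self.weights = [
                        w + step * x
                        for w, x in zip(self.weights, features)]
                    self.bias += step
```

A label of 1 would mean "this is the command word" and 0 "anything else". A single perceptron can only separate classes that are linearly separable in the chosen features, so how the audio is turned into features matters more than the training loop itself.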
In working through this, I encountered a number of issues. Specifically:
- Threads in Store Applications don’t really exist any more. They have been replaced with the ‘async’ method modifier, the Task<x> return type and the ‘await’ keyword, which suspends the calling method until the awaited operation completes. Most of this application is sequential and depends on things executing in series. However, many of these functions do not return data, and if a function is declared with a void return type, an error will occur when you compile: it says that you cannot await a void return type. There seemed to be no easy way around this since everything is asynchronous. In theory, asynchronous programming is much better than blocking the GUI thread, but in my case it was a little frustrating. My way around it was to return a Boolean from everything that launched as an asynchronous operation (noted by //HACK!). The more usual fix is to declare such methods as returning a plain Task instead of void, which makes them awaitable without a dummy return value.
- I still get periodic exceptions that cannot seem to be traced. The specific error talks about an unhandled exception with very little to go on. Most of the internet resources that talk about this error seem to point to it being a general error masking another issue. I will be able to run the program fine for a while and then it will happen. I am guessing that I may be causing an issue with my use of async/await.
- One issue I knew would be complicated before I started was parsing the audio file. I had done an audio recording/playing application in Java in the past. Prior to starting, I dug that old code out and started reviewing it, and I then remembered that .wav files can be complicated (byte size, sampling rate, etc). I had hoped that the Windows 10 Store Application APIs would provide audio management classes. I was a little dismayed when they didn’t jump out during my research on what was available. I did find an impressive third-party set of audio related classes called ‘NAudio’ (see reference #12). I ended up using ‘AudioFileReader’ and ‘WaveFileWriter’. ‘AudioFileReader’ provides metadata about an audio file that I then fed into ‘WaveFileWriter’, which created a file. ‘NAudio’ does not appear to be compatible with Store Applications. To use it, I had to create a separate class library.
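The read-the-metadata-then-write-a-new-file pattern from the last point can be sketched with Python's standard `wave` module. This is only an analogue of the ‘AudioFileReader’ and ‘WaveFileWriter’ pairing, not NAudio itself, and the helper names `write_test_wav` and `copy_wav_slice` are hypothetical.

```python
import struct
import wave


def write_test_wav(path: str, n_frames: int, framerate: int = 8000) -> None:
    """Write a mono 16-bit .wav file with a simple ramp as stand-in audio."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        out.setframerate(framerate)
        out.writeframes(b"".join(
            struct.pack("<h", (i * 100) % 30000) for i in range(n_frames)))


def copy_wav_slice(src_path: str, dst_path: str,
                   start_frame: int, end_frame: int):
    """Copy frames [start_frame, end_frame) into a new file.

    The source file's metadata (channels, sample width, frame rate) is
    read first and reused for the output so the slice plays back correctly.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)
    return params
```

Carrying the format parameters across is the important part: if the output file claimed a different sample width or rate than the bytes it holds, the word would play back distorted or at the wrong speed.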
When launched, the software will cycle through the recording, file creation and playing steps.
When and if I pick this project up again, I intend to fill in the ‘Perceptron’ class with code that trains it to recognize ‘fred’ and then indicates when it ‘hears’ that word during a recording loop. If that works, I want to take commands like ‘Add’ and ‘Delete’ that can be used with TGIMBA. This is my hope 🙂