In my last AI blog post, I started to explore k-means clustering as an algorithm for machine learning. Since it seems to work (see reference #2), I am now wondering how to structure my data. That seems to be the ‘key’ to any effective machine learning algorithm.

In that post, I replicated (mostly) what reference #3 does. That example shares the premise I had been trying to use with the basic perceptron algorithm: separate my data into two groups – the ‘recognized’ group and the ‘not recognized’ group. Initially, I thought k-means clustering was similar to this, since the first example (reference #3) used two clusters. Reading further into the next example from the same blog series (reference #6), it became apparent that this was not the case. Their pattern recognition example (reference #2) uses 4 clusters (one for each corner of the 2D X/Y chart). So, I am still left with figuring out how I can use k-means clustering in sound wave pattern recognition. How do you decide how many clusters to have? Reference #2 picked 4 to represent each corner of the 2D graph. Reference #3 (the one I did my last blog post on) picked 2 (arbitrarily?). Centroids (the centers of the clusters) define the groupings you want your data to fall into so it can be identified. For a wave form, do I want to use 2 or 4?
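
To make the "pick a number of clusters" question concrete, here is a minimal k-means sketch in JavaScript. This is my own illustration, not code from the referenced examples: `k` is the number you choose up front, centroids are seeded naively from the first `k` points (a real implementation would seed randomly or use k-means++), and the loop alternates the classic assignment and update steps on 2D points.

```javascript
// Euclidean distance between two 2D points.
function distance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Minimal k-means on 2D points. k must be chosen up front --
// exactly the question raised above.
function kmeans(points, k, iterations = 10) {
  // Naive seeding: copy the first k points as starting centroids.
  let centroids = points.slice(0, k).map(p => p.slice());
  let assignments = [];

  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: each point joins its nearest centroid.
    assignments = points.map(p => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (distance(p, centroids[c]) < distance(p, centroids[best])) best = c;
      }
      return best;
    });

    // Update step: move each centroid to the mean of its members.
    for (let c = 0; c < k; c++) {
      const members = points.filter((_, i) => assignments[i] === c);
      if (members.length > 0) {
        centroids[c] = [
          members.reduce((s, p) => s + p[0], 0) / members.length,
          members.reduce((s, p) => s + p[1], 0) / members.length,
        ];
      }
    }
  }
  return { centroids, assignments };
}
```

Running it with `k = 2` on points that form two obvious clumps puts each clump in its own cluster; the open question for a wave form is whether 2 or 4 clumps is the right model of the data.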

Additionally, reviewing a list of machine learning algorithms (reference #6), I also learned that k-means clustering is intended for unsupervised learning (aka finding patterns in unknown data sets). Since I am trying to get my speech recognition program to recognize when a person says a word the computer has been trained on, I am now thinking supervised machine learning may be more my cup of tea for this project. So, I am going to regroup and pick another algorithm, and maybe return to k-means when I have a few more algorithms under my belt. To start this journey, I reviewed this list (reference #4).
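
The distinction comes down to the shape of the training data. A quick sketch (the feature values and labels here are made up for illustration):

```javascript
// Unsupervised: just feature vectors, no answers.
// An algorithm like k-means has to find structure on its own.
const unlabeled = [
  [0.2, 0.7],
  [0.8, 0.1],
  [0.3, 0.6],
];

// Supervised: each sample pairs its features with the known answer,
// e.g. "was this the trained word or not?"
const labeled = [
  { features: [0.2, 0.7], label: "recognized" },
  { features: [0.8, 0.1], label: "not-recognized" },
];
```

Since I already know which recordings contain the trained word, I have labels, which is what points me toward the supervised camp.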

One possibility is Naive Bayes classification. It uses probability to decide what an unrecognized input is. A super excellent example is in reference #8. Another possibility is a neural net (reference #10). I am a little leery of a neural net since its smaller cousin, the perceptron, has not performed well for me on anything other than the 01, 11, 00, & 10 examples.
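
To get a feel for the "uses probability" part, here is a tiny Naive Bayes sketch in JavaScript. The feature names and labels are made up; the mechanics are the standard ones: count how often each feature appears per label, then classify by picking the label with the highest log-probability score, with Laplace smoothing so an unseen feature doesn't zero out a whole class.

```javascript
// Train on samples of the form { features: ["word", ...], label: "yes" }.
// Returns a classify(features) function.
function trainNaiveBayes(samples) {
  const counts = {};        // counts[label][feature] = occurrences
  const labelTotals = {};   // samples per label
  const featureTotals = {}; // total feature occurrences per label
  const vocab = new Set();

  for (const { features, label } of samples) {
    labelTotals[label] = (labelTotals[label] || 0) + 1;
    featureTotals[label] = (featureTotals[label] || 0) + features.length;
    counts[label] = counts[label] || {};
    for (const f of features) {
      counts[label][f] = (counts[label][f] || 0) + 1;
      vocab.add(f);
    }
  }

  return function classify(features) {
    let best = null;
    let bestScore = -Infinity;
    for (const label of Object.keys(labelTotals)) {
      // log P(label) + sum of log P(feature | label), Laplace-smoothed.
      let score = Math.log(labelTotals[label] / samples.length);
      for (const f of features) {
        const c = (counts[label][f] || 0) + 1;
        score += Math.log(c / (featureTotals[label] + vocab.size));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  };
}
```

The appeal for speech recognition is that nothing here needs to be linearly separable, which is exactly where the perceptron let me down.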

I will read up and report back once I have decided which one to pursue.

Stay tuned!

References

1. https://www.datascience.com/blog/introduction-to-k-means-clustering-algorithm-learn-data-science-tutorials

2. https://mnemstudio.org/clustering-k-means-example-3.htm

3. https://mnemstudio.org/clustering-k-means-example-1.htm

4. http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html

5. http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html/2

6. https://mnemstudio.org/clustering-k-means-example-2.htm

7. https://stackoverflow.com/questions/10059594/a-simple-explanation-of-naive-bayes-classification

8. http://blog.aylien.com/naive-bayes-for-dummies-a-simple-explanation/

9. http://matwbn.icm.edu.pl/ksiazki/amc/amc15/amc15211.pdf

10. https://medium.com/@ageitgey/machine-learning-is-fun-part-6-how-to-do-speech-recognition-with-deep-learning-28293c162f7a
