JavaScript – Machine Learning – Linear Regression – Another Algorithm with Same DataSets

GitHub Code

In this blog post (reference 3), I mentioned that I had been pointed to Andrew Ng’s free Coursera course on machine learning.  The first machine learning algorithm it covers is linear regression.  Even though Professor Ng is an excellent teacher, I did not immediately understand the subtle nuances of the single-variable algorithm.  So I took a step back and researched other examples online.  While doing that research, I stumbled upon Dr. Jason Brownlee’s worked math example.  In that same blog post (and in two succeeding posts, references 4 & 5), I implemented his algorithm in JavaScript and got it working with two simple datasets and one more advanced house price dataset.

With that under my belt, I went back to Dr. Ng’s lecture notes (references 1 & 2) and tried again.  Conceptually, Dr. Brownlee’s and Dr. Ng’s algorithms use a very similar hypothesis function.  However, they differ in how they arrive at B0 and B1.

Dr. Brownlee’s Algorithm (reference 6)

y = B0 + B1 * x
B1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)
B0 = mean(y) - B1 * mean(x)

Dr. Ng’s Algorithm (references 1 & 2)

y = B0 + B1 * x
B0 = 1/(2m) * sum((B1 * xi - yi)^2)

I will only talk about Dr. Ng’s algorithm here, since the previous blog posts cover Dr. Brownlee’s.  What confused me earlier with Dr. Ng’s algorithm is that you must guess values for B1 in order to find B0.  In doing so, you keep the lowest B0 value and use it in the hypothesis function.  Data set 2 is one from Dr. Ng’s lecture, so I did it first.

[Screenshot: data set 2]
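The code screenshot did not survive in this archive, but from Dr. Ng’s lecture and the output the post reports (lowest B0 of 0 at B1 = 1), data set 2 appears to be the points (1, 1), (2, 2), (3, 3).  A sketch of that dataset (the variable name and the { x, y } point shape are my assumptions, not the post’s actual code):

```javascript
// Data set 2, assumed from Dr. Ng's lecture: points along the line y = x.
// The { x, y } object shape is an assumption for illustration.
const dataset2 = [
  { x: 1, y: 1 },
  { x: 2, y: 2 },
  { x: 3, y: 3 },
];
```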

To make Dr. Ng’s algorithm work, I put together a couple of functions.  The first returns the largest x in a dataset.

[Screenshot: function that returns the largest x in a dataset]
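Since the screenshot is missing, here is a minimal sketch of such a function, assuming the { x, y } point shape used above (the function name is my own):

```javascript
// Sketch: scan the dataset and return the largest x value.
// Assumes each point is an object like { x: 1, y: 1 }.
function largestX(dataset) {
  let largest = dataset[0].x;
  for (const point of dataset) {
    if (point.x > largest) {
      largest = point.x;
    }
  }
  return largest;
}
```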

The second function takes the largest x and returns an array of B1 values to be used in calculating the lowest B0.

[Screenshot: function that builds the array of candidate B1 values]
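Again as a sketch, since the original code is not visible here: one way to generate the candidates is to step from 0 up to the largest x in fixed increments.  The step size is not shown in the post, so 0.5 is a guess:

```javascript
// Sketch: build an array of candidate B1 values from 0 up to the
// largest x, in fixed steps (step size 0.5 is an assumption).
function candidateB1Values(largestX, step = 0.5) {
  const values = [];
  for (let b1 = 0; b1 <= largestX; b1 += step) {
    values.push(b1);
  }
  return values;
}
```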

When run, the output shows that the lowest B0 is 0 and its corresponding B1 value is 1.

[Screenshot: program output showing the lowest B0 is 0 at B1 = 1]
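My reading of the procedure described above, as a sketch: for each candidate B1, compute B0 = 1/(2m) * sum((B1 * xi - yi)^2) and keep the candidate with the lowest B0 (the function name and return shape are my assumptions):

```javascript
// Sketch: try each candidate B1, compute the cost
// B0 = 1/(2m) * sum((B1 * xi - yi)^2), and keep the lowest B0.
function lowestCost(dataset, b1Candidates) {
  const m = dataset.length;
  let best = { b0: Infinity, b1: null };
  for (const b1 of b1Candidates) {
    let sum = 0;
    for (const point of dataset) {
      sum += Math.pow(b1 * point.x - point.y, 2);
    }
    const b0 = sum / (2 * m);
    if (b0 < best.b0) {
      best = { b0: b0, b1: b1 };
    }
  }
  return best;
}
```

On the points (1, 1), (2, 2), (3, 3), this keeps B1 = 1 with B0 = 0, matching the output the post reports.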

Using B0 and B1, predictions can be made.

[Screenshot: predictions made with B0 and B1]
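The prediction step is just the hypothesis function from the top of the post, y = B0 + B1 * x.  A minimal sketch (the function name is mine):

```javascript
// Sketch: apply the hypothesis y = B0 + B1 * x to predict y for a new x.
function predict(b0, b1, x) {
  return b0 + b1 * x;
}
```

With B0 = 0 and B1 = 1, a new x of 4 predicts y = 4.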

The other dataset that this algorithm works with is the one Dr. Brownlee used.

[Screenshot: results for Dr. Brownlee’s dataset]

The dataset this algorithm does not work with is the housing price one.

[Screenshot: results for the house price dataset]

Conclusions

I find Dr. Brownlee’s algorithm a lot simpler and easier to use than Dr. Ng’s, and it works with all three datasets.  Dr. Ng’s requires a lot of guesswork: you find B0 by guessing at values for B1.  At least in this implementation, that doesn’t seem to work for the house price dataset.  I am sure it is something I am doing wrong, and my next blog post will deal with finding the source of the issue.

Stay tuned!

References

  1. https://www.coursera.org/learn/machine-learning/lecture/N09c6/cost-function-intuition-i
  2. https://www.coursera.org/learn/machine-learning/lecture/db3jS/model-representation
  3. https://erichelin.wordpress.com/2017/10/29/javascript-machine-learning-linear-regression/
  4. https://erichelin.wordpress.com/2017/10/29/javascript-machine-learning-linear-regression-w2-new-datasets-how-did-we-do/
  5. https://erichelin.wordpress.com/2017/11/02/javascript-machine-learning-linear-regression-visual-results/
  6. https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/