JavaScript – Machine Learning – Linear Regression – Attempt to fix dataset 3…not so good.

GitHub Code

After yesterday’s blog post, I pondered why my implementation of Dr. Ng’s algorithm performed so poorly on the home price dataset.  I started debugging line by line and noticed that the B1 value was still descending as it moved through the method that finds the lowest B0 and B1 values.  The method that worked for the first two datasets was to get the largest x and run the cost function for each pair of candidate values, starting with 0,0 and increasing in increments of 0.25 up to the maximum x value.  In this case, that maximum x value is 4478.
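Here is a minimal sketch of the kind of search described above. It is not the code from the repository: the function names `cost` and `gridSearch` and the toy data are made up for illustration. It assumes the cost is the halved mean squared error from Dr. Ng’s course, and that B0 and B1 are both scanned from 0 up to the same maximum.

```javascript
// Halved mean squared error for the line y = b0 + b1 * x.
function cost(points, b0, b1) {
  let sum = 0;
  for (const [x, y] of points) {
    const error = b0 + b1 * x - y;
    sum += error * error;
  }
  return sum / (2 * points.length);
}

// Try every (b0, b1) pair on a grid from 0 to max in steps of `step`.
// Only the best pair seen so far is kept; no cost values are stored.
function gridSearch(points, max, step) {
  let best = { b0: 0, b1: 0, cost: Infinity };
  const steps = Math.floor(max / step);
  for (let i = 0; i <= steps; i++) {
    for (let j = 0; j <= steps; j++) {
      const b0 = i * step;
      const b1 = j * step;
      const c = cost(points, b0, b1);
      if (c < best.cost) {
        best = { b0, b1, cost: c };
      }
    }
  }
  return best;
}

// Hypothetical toy data lying exactly on y = 1 + 2x.
const toyPoints = [[1, 3], [2, 5], [3, 7], [4, 9]];
const maxX = Math.max(...toyPoints.map(([x]) => x));
const result = gridSearch(toyPoints, maxX, 0.25);
console.log(result); // { b0: 1, b1: 2, cost: 0 }
```

On the toy data the true B0 and B1 both fall inside the 0 to maximum-x range, so the search finds them.  A search like this can only return values that lie on the grid, so if the best pair sits on the edge of the range, the range is probably too small for at least one of the two parameters.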

So, I increased the maximum x value to 100000 and then B0 started getting smaller.  My first thought was that the range was not big enough.  However, much larger numbers produce out-of-memory errors.  So, for comparison, I ran my implementation of Dr. Brownlee’s algorithm and got these values for B0 and B1:

B0:  132895.7055152212

B1:  97.0552698578078
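For reference, here is a short sketch of a closed-form fit.  It assumes Dr. Brownlee’s algorithm is the standard least squares estimate for simple linear regression (B1 is the covariance of x and y divided by the variance of x, and B0 is mean(y) minus B1 times mean(x)).  The names `mean` and `fitLine` and the toy data are made up for illustration and are not taken from the repository.

```javascript
// Arithmetic mean of an array of numbers.
function mean(values) {
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Closed-form least squares fit of y = b0 + b1 * x.
function fitLine(xs, ys) {
  const meanX = mean(xs);
  const meanY = mean(ys);
  let covariance = 0;
  let variance = 0;
  for (let i = 0; i < xs.length; i++) {
    covariance += (xs[i] - meanX) * (ys[i] - meanY);
    variance += (xs[i] - meanX) ** 2;
  }
  const b1 = covariance / variance;
  const b0 = meanY - b1 * meanX;
  return { b0, b1 };
}

// Hypothetical toy data lying exactly on y = 1 + 2x.
const fit = fitLine([1, 2, 3, 4], [3, 5, 7, 9]);
console.log(fit); // { b0: 1, b1: 2 }
```

Because this computes B0 and B1 directly, it needs no search range at all, which makes it a handy yardstick for the guessing approach.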

The lowest B0 and B1 values returned when using the dataset’s largest x value of 4478 are:

B0: 35801763334.11702

B1: 4478

So, I widened the range of possible values to run from -10,000 to 10,000,000.  This produced a lower B0 value and a different B1, but both are still far from the ones Dr. Brownlee’s algorithm produced.

B0: 23963439835.81915

B1: 10059.5

So, assuming that I was searching the wrong part of the negative/positive spectrum, I set the range from -10,000,000 to 10,060, since the last B1 was 10059.5.  This effort produced the same result.  Clearly, I am not guessing correctly at the values for B0.
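The out-of-memory problems mentioned earlier make sense once the size of the grid is worked out.  The sketch below is only arithmetic; the helper `gridValues` is made up for illustration, and it assumes B0 and B1 are both scanned over the same range with the same 0.25 increment.

```javascript
// Number of values visited when stepping from min to max (inclusive) by `step`.
function gridValues(min, max, step) {
  return Math.floor((max - min) / step) + 1;
}

// Original range: 0 up to the largest x in the house data.
const smallPerParameter = gridValues(0, 4478, 0.25);
const smallPairs = smallPerParameter * smallPerParameter;

// Widened range: -10,000 up to 10,000,000.
const largePerParameter = gridValues(-10000, 10000000, 0.25);
const largePairs = largePerParameter * largePerParameter;

console.log(smallPerParameter, smallPairs); // 17913 320875569
console.log(largePerParameter, largePairs); // 40040001 and roughly 1.6e15 pairs
```

Under those assumptions the widened range has about five million times as many (B0, B1) pairs as the original one, so storing a cost for every pair is not practical, and even just visiting every pair would take a very long time.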

So, I went back over Dr. Ng’s course material, and it is still not apparent to me what I am doing wrong.  Unfortunately, I am either using this algorithm incorrectly for the house dataset or it is not meant for the house dataset.  He also teaches a variation of this algorithm called gradient descent, which I will deal with in another blog post.  For now, I will assume I missed something in the implementation.  My suspicion is that I am not guessing B0 and B1 correctly for the larger dataset.

I did, however, add the cost function visual.  The specific runs are as follows:

  • Data Set One Plot w/prediction line

Screen Shot 2017-11-12 at 7.12.35 PM

  • Data Set One Cost Function

Screen Shot 2017-11-12 at 7.12.54 PM

  • Data Set Two Plot w/prediction line

Screen Shot 2017-11-12 at 7.13.19 PM

  • Data Set Two Cost Function

Screen Shot 2017-11-12 at 7.13.35 PM

  • Data Set Three Plot w/prediction line

Screen Shot 2017-11-12 at 7.13.54 PM

  • Data Set Three Cost Function

Screen Shot 2017-11-12 at 7.14.11 PM

Stay tuned!


