After yesterday’s blog post, I kept wondering why my implementation of Dr. Ng’s algorithm performed so poorly on the home price dataset. I started debugging line by line and noticed that the B1 value was still descending as it moved through the method that finds the lowest-cost B0 and B1 values. The method that worked for the first two datasets was to take the largest x and run the cost function for each candidate pair of values, starting at 0,0 and increasing in increments of .25 up to the maximum x value. In this case, that maximum x value is 4478.
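The search described above can be sketched roughly as follows. This is a minimal illustration, not my actual code: it assumes the cost function is the squared-error cost from Dr. Ng’s course, J(B0, B1) = (1/2m) Σ (B0 + B1·x − y)², and the names `cost`, `frange`, and `grid_search` are made up for the example. It keeps only the best pair seen so far instead of storing every result.

```python
def cost(b0, b1, xs, ys):
    """Squared-error cost (assumed form): (1 / 2m) * sum((b0 + b1*x - y)^2)."""
    m = len(xs)
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def frange(start, stop, step):
    """Yield start, start + step, ... without going past stop."""
    count = int((stop - start) / step + 1e-9)
    for i in range(count + 1):
        yield start + i * step


def grid_search(xs, ys, start, stop, step=0.25):
    """Return (cost, b0, b1) for the lowest-cost pair on the grid.

    Only the best candidate is kept, so memory use does not grow
    with the size of the range.
    """
    best = None
    for b0 in frange(start, stop, step):
        for b1 in frange(start, stop, step):
            c = cost(b0, b1, xs, ys)
            if best is None or c < best[0]:
                best = (c, b0, b1)
    return best


# Made-up sample data that follows y = 1 + 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Search from 0 up to the largest x, as in the method described above.
best_cost, best_b0, best_b1 = grid_search(xs, ys, 0.0, max(xs))
print(best_cost, best_b0, best_b1)
```

Note that a search like this can only ever return a pair that lies on the grid, so if the true B0 or B1 falls outside the range, the result will sit at the edge of the range rather than at the real minimum.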
I increased the maximum x value to 100000, and B0 started getting smaller, so my first thought was that the range was not big enough. However, much larger ranges cause out-of-memory errors. For comparison, I ran my implementation of Dr. Brownlee’s algorithm and got these values for B0 and B1:
The lowest-cost B0 and B1 values returned when using the dataset’s largest x value of 4478 are:
I then widened the range of possible values to run from -10,000 to 10,000,000. This produced lower B0 and B1 values, but they were still far from the ones Dr. Brownlee’s example produced.
Assuming that I was searching the wrong part of the negative/positive spectrum, I set the range from -10,000,000 to 10060, since the last B1 was 10059.5. This produced the same result. Clearly, I am not guessing correctly at the values for B0.
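One way to check whether a search range can contain the answer at all is to compute the least-squares fit directly and compare it with the range bounds. The sketch below assumes the standard mean-and-covariance formulas for simple linear regression. It is not copied from Dr. Brownlee’s code or from mine, and the function name `least_squares` and the sample data are made up for illustration.

```python
def least_squares(xs, ys):
    """Closed-form simple linear regression (assumed standard formulas).

    b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    b0 = mean_y - b1 * mean_x
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    b1 = covariance / variance
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up sample data, not the home price dataset.
xs = [1.0, 2.0, 4.0, 3.0, 5.0]
ys = [1.0, 3.0, 3.0, 2.0, 5.0]

b0, b1 = least_squares(xs, ys)

# Would a search from 0 up to the largest x be able to reach this fit?
inside_range = 0.0 <= b0 <= max(xs) and 0.0 <= b1 <= max(xs)
print(b0, b1, inside_range)
```

If `inside_range` comes back false for a dataset, no step size will rescue a search over that range, because the best pair is simply not among the candidates.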
I went back over Dr. Ng’s course material, and it is still not apparent to me what I am doing wrong. Either I am using this algorithm incorrectly for the home price dataset, or it is not meant for that dataset. He also teaches a variation of this algorithm called gradient descent, which I will deal with in another blog post. For now, I will assume I missed something in my implementation. My suspicion is that I am not guessing B0 and B1 correctly for the larger dataset.
I did, however, add the cost function visual. The specific runs are as follows:
- Data Set One Plot w/prediction line
- Data Set One Cost Function
- Data Set Two Plot w/prediction line
- Data Set Two Cost Function
- Data Set Three Plot w/prediction line
- Data Set Three Cost Function
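A cost function plot like the ones listed above can be built from a table of (B1, cost) points. The sketch below is one hedged way to produce such a table: it assumes the squared-error cost from Dr. Ng’s course and holds B0 fixed while B1 varies, which may not match how my own charts were generated. The helper name `cost_curve` and the sample data are hypothetical.

```python
def cost(b0, b1, xs, ys):
    """Squared-error cost (assumed form): (1 / 2m) * sum((b0 + b1*x - y)^2)."""
    m = len(xs)
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def cost_curve(xs, ys, b0, b1_values):
    """Return a list of (b1, cost) points for a fixed b0, ready for plotting."""
    return [(b1, cost(b0, b1, xs, ys)) for b1 in b1_values]


# Made-up sample data that follows y = 1 + 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Candidate B1 values from 0 to 4 in steps of .25.
b1_values = [i * 0.25 for i in range(17)]
curve = cost_curve(xs, ys, 1.0, b1_values)

# The lowest point on the curve is the best B1 for this fixed B0.
lowest = min(curve, key=lambda point: point[1])
print(lowest)
```

Feeding the points in `curve` to any charting tool gives the familiar bowl shape, with the bottom of the bowl at the B1 that fits best.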