After yesterday’s blog post, I pondered why my implementation of Dr. Ng’s algorithm performed so poorly on the home price dataset. So, I started debugging line by line and noticed that the B1 value was still descending as it moved through the method that finds the lowest B0/B1 values. The approach that worked for the first two datasets was to take the largest x and run the cost function for each x,y value, starting at 0,0 and increasing in increments of .25 up to the maximum x value. In this case, that maximum x value is 4478.
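For reference, the sweep described above boils down to something like this sketch (function names are mine, not the original code; in the post’s terminology, the cost computed for each candidate B1 is the “B0” value being minimized):

```javascript
// Squared-error cost for a candidate slope b1 (intercept fixed at 0 here,
// mirroring the sweep described above; names are illustrative).
function cost(data, b1) {
  let sum = 0;
  for (const [x, y] of data) {
    const error = b1 * x - y;
    sum += error * error;
  }
  return sum / (2 * data.length);
}

// Sweep candidate b1 values from 0 up to the largest x in 0.25 steps,
// keeping the candidate with the lowest cost.
function sweep(data) {
  const maxX = Math.max(...data.map(([x]) => x));
  let best = { b1: 0, cost: Infinity };
  for (let b1 = 0; b1 <= maxX; b1 += 0.25) {
    const c = cost(data, b1);
    if (c < best.cost) best = { b1, cost: c };
  }
  return best;
}
```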

So, I increased the x value to 100000 and B0 started getting smaller. My first thought was that the range was not big enough; however, much larger numbers produce out-of-memory issues. So, for comparison, I ran my implementation of Dr. Brownlee’s algorithm and got these values for B0 and B1:

B0: 132895.7055152212

B1: 97.0552698578078

The lowest B0 and B1 values returned when using the dataset’s largest x value of 4478 are:

B0: 35801763334.11702

B1: 4478

So, I increased the range of possible values to run from -10,000 to 10,000,000, and this produced lower B0 and B1 values, but still far from the ones Dr. Brownlee’s example produced.

B0: 23963439835.81915

B1: 10059.5

So, assuming that I was searching the wrong part of the negative/positive spectrum, I changed the range to run from -10,000,000 to 10,060, since the last B1 was 10059.5. This effort produced the same result. Clearly, I am not guessing the values for B0 correctly.

So, I went back over Dr. Ng’s course material, and it is still not apparent to me what I am doing wrong. Unfortunately, I am either using this algorithm incorrectly for the house dataset or it is not meant for the house dataset. He also teaches a variation of this algorithm called gradient descent, which I will cover in another blog post. For now, I will assume I missed something in my implementation. My suspicion is that I am not correctly guessing B0 and B1 for the larger dataset.

I did, however, add the cost function visual. The specific runs are as follows:

- Data Set One Plot w/prediction line

- Data Set One Cost Function

- Data Set Two Plot w/prediction line

- Data Set Two Cost Function

- Data Set Three Plot w/prediction line

- Data Set Three Cost Function

Stay tuned!

References

- https://www.coursera.org/learn/machine-learning/lecture/N09c6/cost-function-intuition-i
- https://www.coursera.org/learn/machine-learning/lecture/db3jS/model-representation
- https://github.com/anirudhjayaraman/Machine-Learning


In this blog post (reference #3), I stated that I had been pointed to Andrew Ng’s free Coursera course on Machine Learning. The first machine learning algorithm it covers is Linear Regression. Even though Professor Ng is an excellent teacher, I did not immediately understand the subtle nuances of the single variable algorithm. So, I took a step back and researched other examples online. While doing that research, I stumbled upon Dr. Jason Brownlee’s math example. In that same blog post (and two succeeding blog posts, references 4 & 5), I implemented his algorithm in JavaScript and got it working with two simple datasets and one more advanced house price dataset.

With that under my belt, I went back to Dr. Ng’s lecture notes (references 1 & 2) and tried again. Conceptually, Dr. Brownlee’s and Dr. Ng’s approaches use the same hypothesis function; however, they differ in how B0 and B1 are found.

**Dr. Brownlee’s Algorithm (reference 6)**

y = B0 + B1 * x

B1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)

B0 = mean(y) - B1 * mean(x)

**Dr. Ng’s Algorithm (references 1 & 2)**

y = B0 + B1 * x

J(B0, B1) = 1/(2m) * sum((B0 + B1 * xi - yi)^2) (the cost to be minimized)

I will only talk about Dr. Ng’s algorithm here, since the previous blog posts cover Dr. Brownlee’s. What confused me earlier with Dr. Ng’s algorithm is that you must guess B1 values to find B0. In doing so, you keep the lowest B0 value and use it in the hypothesis function. Data set 2 is one from Dr. Ng’s lecture, so I did it first.

To make Dr. Ng’s algorithm work, I put together a couple of functions. The first returns the largest x for a dataset.

The second function takes the largest x and returns an array of B1 values to be used in calculating the lowest B0.
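A minimal sketch of those two helpers might look like this (the 0.25 step comes from the posts; the function names and exact code are mine):

```javascript
// Returns the largest x in a dataset of [x, y] pairs.
function largestX(data) {
  return Math.max(...data.map(([x]) => x));
}

// Builds the array of candidate B1 values: 0 up to the largest x,
// in 0.25 increments.
function candidateB1s(data, step = 0.25) {
  const candidates = [];
  const max = largestX(data);
  for (let b1 = 0; b1 <= max; b1 += step) {
    candidates.push(b1);
  }
  return candidates;
}
```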

When run, the output shows that the lowest B0 is 0 and its corresponding B1 value is 1.

Using B0 and B1, predictions can be made.

The other dataset that this algorithm works with is the one Dr. Brownlee used.

The dataset this algorithm does not work with is the housing price one.

**Conclusions**

I find Dr. Brownlee’s algorithm a lot simpler and easier to use than Dr. Ng’s, and it works with all three datasets. Dr. Ng’s requires a lot of guesswork: finding B0 depends on guessing values for B1. At least in this implementation, that doesn’t seem to work for the house price dataset. I am sure it is something I am doing wrong, and my next blog post will deal with finding the source of the issue.

Stay tuned!

References

- https://www.coursera.org/learn/machine-learning/lecture/N09c6/cost-function-intuition-i
- https://www.coursera.org/learn/machine-learning/lecture/db3jS/model-representation
- https://erichelin.wordpress.com/2017/10/29/javascript-machine-learning-linear-regression/
- https://erichelin.wordpress.com/2017/10/29/javascript-machine-learning-linear-regression-w2-new-datasets-how-did-we-do/
- https://erichelin.wordpress.com/2017/11/02/javascript-machine-learning-linear-regression-visual-results/
- https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/


It’s not perfect, but I was able to add an https library (reference #1) and have it call the https://www.tgimba.com site after the deploy is complete. When the code pipeline runs, this is the log entry after the site is deployed and the verification code is called.

It is implemented in two parts:

- JavaScript Module – Contains code to verify the site is up after the code pipeline runs.

- Buildspec.yml – I added the ‘npm run verifyDeploy’ command.

Stay tuned!

References

- https://www.npmjs.com/package/https
- https://stackoverflow.com/questions/5998694/how-to-create-an-https-server-in-node-js


The second of three Continuous Integration (CI) pipelines has been completed for TGIMBA.

The first was the TypeScript Node JS Application Programming Interface (API) I plan to use to replace the current .NET 4.6.X Windows Communication Foundation (WCF) service. With .NET Core in wider use every day, TGIMBA needs to leave its older technology suites behind. Given some Node JS applications I am working on now at work, it seemed appropriate to embrace that with TGIMBA. That post is here.

However, before I can do that upgrade, I want to create CI pipelines so that everything is automatically delivered to the user. Doing that for the .NET 4.6.X TGIMBA website was a lot more complicated than I thought.

First, .NET 4.6.X is considered ‘Classic .NET’. This was very evident when I looked through AWS’s environment options in the CodeBuild suite: the only .NET option is .NET Core 1. So, to learn my options, I did some reading on the ways you can compile .NET 4.6.X applications. What I learned is that outside of Visual Studio and TFS, there are not a lot of easy options. So, given that I will eventually update the .NET part of TGIMBA (and there will always be a .NET component) to .NET Core 1 or 2, I opted to pursue this CI pipeline in a rather unusual manner.

Like the TypeScript API CI pipeline, this pipeline pulls from GitHub. So, I decided that any git commits (I am using a GitHub Visual Studio plug-in) for this project will be compiled and ready to deploy. This means adding the bin directory to the repository. Not great, but it works. I will change this when I start the .NET Core update. The other interesting thing is that I opted to use a Node JS environment for the CodeBuild part of this pipeline.

The following steps complete the process:

- buildspec.yml
  - Remove portions of the Git repository files not needed for the deployment
  - Install three node libraries
    - FTP – Used to delete and upload the files to my website.
    - fs/fs-finder – These two libraries are used to update the web.config with the production database connection string.
- ftpLogin.js – This is a JavaScript module I wrote to delete old files, upload the new files and update the web.config.
  - The FTP credentials and database connection string are obtained from environment variables inside CodeBuild.

While not ideal, it does work. One issue I noticed is that if someone tries to access the site while a deploy is in progress, it can cause problems. Given this is a fun project, I don’t know if I will do anything about that for now. If I do, I will most likely put up a maintenance page and take the site offline. I am not sure I can automate this in my current setup, but I will eventually.

The third CI Pipeline will be for the TGIMBA Android Application.

Stay tuned!


To follow up on my last two blog posts, I used some Google JavaScript Charts to visualize the results. At first, I was concerned that the housing example dataset from Professor Andrew Ng’s Machine Learning class (reference #4) gave questionable results with my JavaScript implementation of Jason Brownlee’s single variable Linear Regression example (reference #2). However, after reviewing the results, it seems to work. Judge for yourself. For reference, the previous posts are:

- https://erichelin.wordpress.com/2017/10/29/javascript-machine-learning-linear-regression-w2-new-datasets-how-did-we-do/
- https://erichelin.wordpress.com/2017/10/29/javascript-machine-learning-linear-regression/

Results

- Using Jason Brownlee’s numbers

- Using a simple example from Professor Ng’s class

- Using the housing example from Professor Ng’s class

Stay tuned!

References

- https://stackoverflow.com/questions/6859298/how-to-draw-google-line-charts-when-some-of-the-values-are-missing
- https://www.coursera.org/learn/machine-learning
- https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/
- https://github.com/girishkuniyal/Predict-housing-prices-in-Portland


Ok, I added two data sets from Professor Andrew Ng’s course (see references) and the results are good…I think. The first dataset is a simple one, and the results were as expected:

The second data set was more complex and is the one Professor Ng uses in his class. The results were not what I was expecting, but close.

First, thetaZero and thetaOne were much larger. Second, the prediction for 4215 was not close to the y value provided in the dataset; the same is true for 852. But the results were plausible.

My next step is to try other single variable datasets with known outcomes to ascertain how good the working example Mr. Brownlee provided (see references) is and/or how poor my understanding of the process still is.

Stay tuned!

References

- https://www.coursera.org/learn/machine-learning
- https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/
- https://github.com/girishkuniyal/Predict-housing-prices-in-Portland


So many topics, so little time. I had intended to focus only on setting up AWS code pipelines for each component of TGIMBA. However, as with many things in life, you need to take advantage of opportunities when they present themselves. One such opportunity presented itself in the form of a Machine Learning course; more specifically, Professor Andrew Ng’s Machine Learning course on Coursera (see reference #1). I am taking the course at my own pace (tough to keep up while working full time). The first algorithm covered is Linear Regression.

Conceptually, it is pretty straightforward. Given a list of x,y coordinates, the algorithm (once computed) produces an f(x) slope formula for the line that best fits the available x,y coordinates. With that line, you can predict the y value for any given x.

However, after I started working through the example, the math didn’t add up. I worked through what I thought was the correct implementation of the algorithm, but it wasn’t. I posted some questions, but I was not able to get a concrete example of the algorithm. So, I looked around online and found an excellent one. In the article “Simple Linear Regression Tutorial for Machine Learning”, Jason Brownlee illustrates a complete math example of single variable linear regression.

In this blog post, I will walk you through my JavaScript implementation of his example.

To run:

- Make a get request

- View Results

The ‘structure’ of the program follows the article’s math flow.

The process is

- Get the data sets

- Calculate the mean X and Y values

- Calculate the numerator

- Calculate Numerator – Part One

- Calculate Numerator – Part Two

- Calculate the Denominator

- Make a prediction
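The steps above can be sketched in JavaScript roughly as follows (a minimal version of the math flow, not the post’s exact code):

```javascript
// Simple linear regression following the steps above: mean of x and y,
// numerator sum((xi - mean(x)) * (yi - mean(y))), denominator
// sum((xi - mean(x))^2), then B1, B0, and a prediction function.
function mean(values) {
  return values.reduce((a, b) => a + b, 0) / values.length;
}

function fit(data) {
  const xs = data.map(([x]) => x);
  const ys = data.map(([, y]) => y);
  const meanX = mean(xs);
  const meanY = mean(ys);

  let numerator = 0;   // sum((xi - mean(x)) * (yi - mean(y)))
  let denominator = 0; // sum((xi - mean(x))^2)
  for (const [x, y] of data) {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) ** 2;
  }

  const b1 = numerator / denominator;
  const b0 = meanY - b1 * meanX;
  return { b0, b1, predict: (x) => b0 + b1 * x };
}
```

For the small dataset worked in Mr. Brownlee’s article, this yields B1 = 0.8 and B0 = 0.4.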

What was really cool about this experience was having a complete example. My next blog post will deal with taking a data set from Professor Ng’s class and seeing if Mr. Brownlee’s math holds up. If so, I will expand it to multiple variables.

Stay tuned!

References

- https://www.coursera.org/learn/machine-learning
- https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/


I apologize for being absent, but work and other priorities have kept me busy. That said, I am getting back to TGIMBA! There are a ton of things I want to update and change, but I want to start from a firm foundation. To that end, I have created the first of three Continuous Integration/Continuous Deployment (CI/CD) pipelines for the TGIMBA system. The current system consists of an Android Application, an HTML 5/JavaScript website, a .NET WCF/Web API backend and a SQL Server database.

This pipeline is for a TypeScript Node JS API I created a while back. My long-term plan is to ‘break up’ the API backend a bit. More specifically, the HTML 5/JavaScript website will go through the WCF/Web API, which will access the SQL Server database via the TypeScript Node JS API. The Android Application will use the TypeScript Node JS API directly. I want to try this because I really had to hack the Android Application to work with the WCF/Web API. In the more distant future, the iOS Application will also use the TypeScript Node JS API.

AWS CodePipeline is pretty easy to use out of the box. The one I created here pulls from GitHub, builds using AWS CodeBuild and then deploys to a staging AWS Elastic Beanstalk environment.

I opted for a staging environment because I plan to add a production environment once I am ready to have TGIMBA start using it.

The most difficult part of this was knowing how to construct the build artifacts, which requires a buildspec.yml. Mine looks like this:
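The original file didn’t survive the export, but a minimal buildspec.yml for a Node JS build might look something like this (the phases and commands are illustrative assumptions, not the author’s actual file):

```yaml
version: 0.2

phases:
  install:
    commands:
      - npm install
  build:
    commands:
      - npm run build
artifacts:
  files:
    - '**/*'
```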

Stay tuned!


In my post on Machine Learning (ML) (see below), I was ‘looking’ for the best way to update my speech recognition program to recognize certain words.

To that end, I focused on supervised learning and read Adam Geitgey’s blog series (see reference #3). Using the house example in the first part, I created a JavaScript version of his algorithm and tried running with it. About halfway in, it dawned on me that he hadn’t provided a complete example (you can apparently get the step-by-step by subscribing to his class (see reference #4)). I tried completing it without reviewing the video to see how I would do. I didn’t get very far. I will be signing up for his class shortly.

So, I then decided to try the more basic perceptron example I have seen in many forms across the web. I chose a version I found on Coding Vision (see reference #2) and was able to quickly replicate that author’s success.

My application structure looks like this:

To view:

- Make a GET call

- View The Results

This JavaScript version of the author’s code consists of a train method and a run method. As described on the author’s site, the weights are ‘trained’ on an expected range of input. Then, when inputs are presented to the run method with the trained weights, the specified elements are picked out as ‘recognized’ from the others.
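A minimal sketch of that train/run structure (simplified to a step activation with a bias weight; this is my own illustration, not the post’s exact code):

```javascript
// Minimal perceptron. run() classifies an input vector with the learned
// weights using a step activation; train() nudges the weights toward the
// expected output for each labeled example over a number of epochs.
function run(inputs, weights) {
  // weights[0] is the bias; the remaining weights pair with the inputs.
  const sum = weights[0] + inputs.reduce((acc, x, i) => acc + x * weights[i + 1], 0);
  return sum >= 0 ? 1 : 0;
}

function train(examples, epochs = 50, learningRate = 0.1) {
  const weights = new Array(examples[0].inputs.length + 1).fill(0);
  for (let e = 0; e < epochs; e++) {
    for (const { inputs, expectedOutput } of examples) {
      const error = expectedOutput - run(inputs, weights);
      weights[0] += learningRate * error; // bias update
      inputs.forEach((x, i) => { weights[i + 1] += learningRate * error * x; });
    }
  }
  return weights;
}
```

This only converges when the two classes are linearly separable, which is exactly the limitation discussed below.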

After completing that part, I wanted to see if it could be applied to the house example, and the answer is…well, kinda. Let me explain.

The house example has three parameters: bedroom count, square footage and the city it is located in. I plugged this directly into the same perceptron code to see if it would work; however, instead of 2 inputs, there were 3. Perceptrons work by dividing two groups of points that are linearly separable (as the reference #2 author pointed out), and 3 inputs imply a 3-dimensional graph with x, y and z axes.

I never got it to work.

So, I removed the location city from the list of variables to make it essentially the same problem reference #2’s author presented. It works…kinda, and only sometimes.

- First issue – If you look at my training data, you will notice that the ‘recognized’ element is last (i.e. ‘expectedOutput: 1’). If I placed it at the start or middle of the training set, I got false positives.

- Second issue – If you look at my run data, you will notice that the number of bedrooms and the square footage are not greater than those of the ‘recognized’ element. If I added either to a non-recognized element, I got false positives. It doesn’t seem to matter whether the larger data appears in the run set or the training set.

As long as I stayed within these caveats (ish), it seems to work.

However, this just reaffirms to me that the perceptron is good for small, academic, limited problems. I have played with the same algorithm in the past. For example, I tried to get it to recognize the color blue. More specifically, following another blogger’s post (he couldn’t get it to work either), I created a 3-input perceptron to process various RGB colors, with each input an RGB channel value of 0-255 divided by 255. It never worked for me.

For an algorithm to solve the house problem with more than two inputs reliably, it must be a lot more robust than this effort. Or my knowledge of how to correctly model my data needs to improve. I think both are true.

So, my next post in this series will (hopefully) feature a robust, supervised machine learning algorithm that can complete the house problem.

Stay tuned!

References

- http://aass.oru.se/~lilien/ml/seminars/2007_02_01b-Janecek-Perceptron.pdf
- http://www.codingvision.net/miscellaneous/c-perceptron-tutorial
- https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471
- https://www.lynda.com/Data-Science-tutorials/Machine-Learning-Essential-Training-Value-Estimations/548594-2.html?lpk35=9149&utm_medium=ldc-partner&utm_source=CMPRC&utm_content=524&utm_campaign=CD20575&bid=524&aid=CD20575
