Here is a simple example of an ensemble learning method called bagging:

- Randomly pick 5 subsets of 5 data points each from your training set.
- Fit a 3rd-order polynomial to each of the 5 subsets.
- Average the 5 polynomials to get the final model.

The result is that the averaged model does a better job of predicting new (test) data than a single 3rd- or 4th-degree polynomial fit over the whole training set. The graph below shows housing data over time:

In the graph above, the red Xs show the training data and the green X shows a test data point. The red line comes from the bagging algorithm, and the blue line comes from fitting a 4th-order polynomial over the whole training set. The red line predicts the test point more closely than the blue line, which indicates that in this situation bagging gives a slightly better outcome than a single regression over the whole training set.
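The three bagging steps can be sketched in Python with NumPy's `np.polyfit`. This is a minimal sketch, not the code behind the graph. The toy data is made up, and the subset count, subset size and random seed are assumptions. Each subset is drawn without replacement so every cubic sees 5 distinct points. Classic bagging instead draws bootstrap samples with replacement.

```python
import numpy as np


def bagged_polyfit(x, y, n_models=5, subset_size=5, degree=3, seed=0):
    """Fit `n_models` polynomials, each on a random subset of (x, y), and average them."""
    rng = np.random.default_rng(seed)
    all_coeffs = []
    for _ in range(n_models):
        # Draw `subset_size` distinct points at random from the training set.
        idx = rng.choice(len(x), size=subset_size, replace=False)
        # Fit a polynomial of the given degree to just that subset.
        all_coeffs.append(np.polyfit(x[idx], y[idx], degree))
    # Averaging polynomial predictions is the same as averaging their coefficients.
    return np.mean(all_coeffs, axis=0)


# Made-up toy data for illustration: a noisy cubic trend.
rng = np.random.default_rng(42)
x_train = np.linspace(0, 10, 25)
y_train = 0.05 * x_train**3 - 0.5 * x_train**2 + x_train + rng.normal(0, 1, x_train.size)

bagged = bagged_polyfit(x_train, y_train)
single = np.polyfit(x_train, y_train, 4)  # one 4th-order fit over the whole training set

x_test = 10.5
print("bagged prediction:", np.polyval(bagged, x_test))
print("single 4th-order prediction:", np.polyval(single, x_test))
```

Averaging the coefficient vectors works because a polynomial is linear in its coefficients. The averaged polynomial therefore makes exactly the average of the 5 models' predictions.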

**Coming Soon.**


The Nearest Neighbor algorithm is an instance-based learning algorithm that can handle both classification and regression. It is one of the simplest algorithms, because the value of any data point can be inferred from its nearest neighbor. See the diagram below: let's say you have a graph of red and blue points and we want to figure out what color the green point is.

We can see that the nearest point to the green circle is red, so we could reasonably assume the green point is also red. There is also the k-Nearest Neighbors (KNN) method, which takes into account not just the single closest point but the k closest points. For example, in the same graph, if we use k = 3, the three nearest neighbors of the green circle are two reds and one blue. What should the green circle be then? If we are doing classification, we use a voting system: count the reds and blues among the neighbors and take the majority, which here is red. If we are doing regression, we take the mean of the neighbors' values.
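Here is a small sketch of KNN voting and averaging in plain Python. The point coordinates, colors and house prices are hypothetical. Distances are Euclidean, which is an assumption, since KNN can use other distance measures. There is no training step: the "model" is just the stored list of points.

```python
import math
from collections import Counter


def k_nearest(points, query, k):
    """Return the k (position, value) pairs whose positions are closest to `query`."""
    return sorted(points, key=lambda point: math.dist(point[0], query))[:k]


def knn_classify(points, query, k=3):
    """Classification: majority vote among the labels of the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(points, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(points, query, k=3):
    """Regression: mean of the values of the k nearest neighbors."""
    neighbors = k_nearest(points, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical red/blue points; "training" is just keeping this list around.
colored = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((3.0, 3.5), "blue"),
           ((5.0, 5.0), "blue"), ((4.5, 4.0), "blue")]
green = (2.0, 2.0)
print(knn_classify(colored, green, k=1))  # red: the single nearest point is red
print(knn_classify(colored, green, k=3))  # red: two reds outvote one blue

# Hypothetical house prices (in $1000s) at map positions.
houses = [((1.0, 1.0), 200.0), ((1.5, 2.0), 250.0), ((3.0, 3.5), 300.0), ((5.0, 5.0), 400.0)]
print(knn_regress(houses, green, k=3))    # 250.0 = (250 + 200 + 300) / 3
```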

This algorithm is helpful when you are looking at a graph with geographical information, such as finding out what a house costs in a given neighborhood, or when mapping a classification onto a graph of data. For example, if we want to assign credit scores according to the financial data of people in a database, we can graph them and find each person's nearest neighbors.

This brings us to the idea of instance-based learning, or lazy learning. KNN is considered an instance-based method because learning takes relatively little time while querying takes relatively longer. KNN does not produce a function to predict outputs the way a regression does. Instead, the training data itself is the model, which makes training extremely fast. However, KNN takes longer at query time, because it has to compute distances to find the neighbors, and it has to keep all the data points in memory. The opposite of this are eager learning methods, such as regression, decision trees and neural networks, which build a model up front.

KNN can also be described as a non-parametric method because it does not learn a fixed function that maps inputs to outputs. It makes no assumptions about the form of the function behind the data; the data is the model.

**That’s It for Today and the K-Nearest Neighbors Algorithm**

This week we will explore neural networks, a form of supervised learning that takes in multiple inputs and predicts a continuous output. Their name comes from biological neurons, which they were modeled after.

Much like regression, a neural network attempts to predict a continuous output from given inputs. A basic building block of a neural network is the perceptron: an algorithm that combines its inputs with a set of weights and outputs 0 or 1.

**Basic Perceptron Example**

The formula goes like this: given inputs x1 … xn and corresponding weights w1 … wn, if the sum of the products of the weights and inputs (the activation) is greater than the threshold θ, the result is true (1); otherwise it is false (0). Perceptron units can be combined to build any boolean logic function. For example, an AND statement can be written as follows:

Suppose the inputs x1 and x2 can only be 0 or 1. If they are both 1, the output is 1; if they are both 0, the output is 0. If you work out the truth table for every combination, you get exactly AND. If you graph this you get the following:
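Here is a minimal sketch of a single perceptron computing AND. The weights w1 = w2 = 0.5 and threshold θ = 0.75 are hand-picked assumptions; many other values work too. The activation only exceeds θ when both inputs are 1.

```python
def perceptron(inputs, weights, theta):
    """Return 1 if the weighted sum of the inputs is greater than theta, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > theta else 0


# Hand-picked values (one of many that work): w1 = w2 = 0.5 and theta = 0.75.
AND_WEIGHTS = (0.5, 0.5)
AND_THETA = 0.75

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron((x1, x2), AND_WEIGHTS, AND_THETA))
```

Keeping the same weights but lowering θ to 0.25 turns the same unit into OR, because then a single 1 input is enough to exceed the threshold.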

A single perceptron draws a linear boundary, so it only works for data that is linearly separable, meaning there is a line that separates the true outputs from the false ones. If the data is not linearly separable, we have to use another method called gradient descent.

This is linearly separable:

This is not:

**Gradient Descent**

Gradient descent is used when we model more complicated data that is not linearly separable. It seeks to minimize the error over the training set D:

Without going into too much detail, we use calculus (taking the derivative of the error with respect to each weight) to turn this formula into:

Here y_d is the target output for training example d and w · x_d is its activation (from the perceptron model). Their difference is multiplied by the input x. The final formula looks a lot like the earlier perceptron model. Because it uses the raw activation instead of the thresholded 0/1 output, it still works when the data is not linearly separable.
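This sketch assumes the usual squared error with activation a_d = w · x_d. Under that assumption the error is E(w) = ½ Σ_{d∈D} (y_d − a_d)², and its derivative is ∂E/∂w_i = −Σ_{d∈D} (y_d − a_d) x_{i,d}. Each step then updates w_i ← w_i + η Σ_d (y_d − a_d) x_{i,d}.

The NumPy code below implements that update. The learning rate η, the epoch count and the XOR toy data are assumptions for illustration. XOR is not linearly separable, yet gradient descent still settles on the best-fitting weights instead of looping forever.

```python
import numpy as np


def gradient_descent(X, y, learning_rate=0.05, epochs=1000):
    """Batch gradient descent on E(w) = 1/2 * sum_d (y_d - w . x_d)^2."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        activation = X @ w              # a_d = w . x_d for every training example d
        error = y - activation          # (y_d - a_d)
        gradient = -X.T @ error         # dE/dw_i = -sum_d (y_d - a_d) * x_{i,d}
        w -= learning_rate * gradient   # take a small step downhill
    return w


# XOR truth table with a leading column of 1s for the bias weight.
# XOR is not linearly separable, so no weights classify it perfectly.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

w = gradient_descent(X, y)
print(w)  # approximately [0.5, 0, 0], the least-squares weights
```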

**Sigmoid**

The sigmoid is an S-shaped function, σ(z) = 1 / (1 + e^(-z)), that gets applied to the activation z. As z increases toward +∞ the sigmoid approaches 1, and as z decreases toward -∞ it approaches 0. It sits between the input and the output of each unit: the inputs are weighted and summed, then put through the sigmoid function. Unlike the perceptron's hard threshold, the sigmoid is smooth and differentiable, and because it is non-linear, using sigmoids lets us model more complex, non-linear functions.
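Here is a minimal sketch of a sigmoid unit. The helper names are illustrative, not from any library. The derivative σ(z)(1 − σ(z)) is included because its simple form is what makes the sigmoid convenient for gradient-based training.

```python
import math


def sigmoid(z):
    """S-shaped function: tends to 0 as z -> -inf and to 1 as z -> +inf."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Slope of the sigmoid, sigma(z) * (1 - sigma(z)), used by gradient descent."""
    s = sigmoid(z)
    return s * (1.0 - s)


def sigmoid_unit(inputs, weights):
    """Weight and sum the inputs to get the activation z, then squash it with the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


for z in (-10, -2, 0, 2, 10):
    print(z, round(sigmoid(z), 4))

print(sigmoid_unit((1.0, 0.0), (0.5, -0.5)))  # activation z = 0.5
```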

**Putting it all Together**

Putting it all together, the diagram above shows a neural network. We take in a bunch of inputs and we get an output. Between the inputs and the output we have sigmoid units, which are the circles. Each input is weighted and put through the sigmoid function, and the result is passed to the next layer. The layers in between are called hidden layers because we never directly see their inputs or outputs. Each layer passes its outputs forward to the next until we reach the end of the network.

This brings us to the idea of backpropagation. The error at the output is sent backward through the network, layer by layer, to work out how much each weight contributed to it. Each weight is then adjusted to reduce that error. This is possible because the sigmoid units are differentiable, so the error is a differentiable function of the weights and we can compute its gradient. We repeat this over the training data. Since the network seeks to minimize error, each pass makes our errors smaller and gets us closer to the result we want, and that is where the learning takes place.
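Here is a minimal NumPy sketch of the forward pass and backpropagation for a network with one hidden layer of sigmoid units. Several details are assumptions for illustration: the hidden-layer size, learning rate, epoch count and random seed; the bias terms; the squared-error loss; and the XOR data.

Depending on the random starting weights, XOR training can occasionally stall in a local minimum, so the printed predictions may not be clean 0s and 1s.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_network(X, y, hidden=4, learning_rate=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer sigmoid network with backpropagation on squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], hidden))  # input -> hidden weights
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))           # hidden -> output weights
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: weight the inputs, apply the sigmoid, pass the result onward.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - y
        losses.append(0.5 * float(np.sum(err ** 2)))
        # Backward pass: send the output error back through the layers (chain rule),
        # using sigmoid'(z) = s * (1 - s) at each unit.
        delta_out = err * out * (1.0 - out)
        delta_hidden = (delta_out @ W2.T) * h * (1.0 - h)
        # Gradient descent step on every weight and bias.
        W2 -= learning_rate * (h.T @ delta_out)
        b2 -= learning_rate * delta_out.sum(axis=0)
        W1 -= learning_rate * (X.T @ delta_hidden)
        b1 -= learning_rate * delta_hidden.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Run only the forward pass with trained parameters."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)


# XOR: a small problem a single perceptron cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

params, losses = train_network(X, y)
print("error before training:", losses[0])
print("error after training:", losses[-1])
print(predict(params, X).round(2))
```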

Last week we built a linear regression and a polynomial regression from historical Bitcoin prices: http://jkurokawa.com/2018/04/25/learning-ml-part-v-creating-a-model-using-scikit-learn/. This week we will make more models from historical prices of other cryptocurrencies like Litecoin, Ethereum and Ripple. Let's see if we can produce a linear model out of the data first. I am only analyzing this year's prices, to eliminate some of the volatility the market experienced last year, which would make it hard to come up with an accurate model.

Like last week, we will loop through the rows in the extracted CSV using loadtxt, put the dates and corresponding prices into arrays, and store them as variables.
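The author's script is not reproduced in this post. Here is a rough, hypothetical sketch of the same steps: it turns (coin, date, closing price) rows into timestamp and price arrays and fits a straight line per coin.

It uses NumPy's `np.polyfit` as a stand-in for scikit-learn's `LinearRegression`. For brevity, it scores each fit on the same points it was trained on rather than on a separate test set. The rows are placeholders, so its printed numbers will not match the console output below.

```python
from datetime import datetime, timezone

import numpy as np

# Hypothetical (coin, date, closing price) rows; placeholders, not real market data.
rows = [
    ("LTC", "2018-01-01", 10.0), ("LTC", "2018-01-02", 12.0), ("LTC", "2018-01-03", 14.0),
    ("ETH", "2018-01-01", 700.0), ("ETH", "2018-01-02", 690.0), ("ETH", "2018-01-03", 720.0),
]


def to_timestamp(date_string):
    """Convert a 'YYYY-MM-DD' string to a Unix timestamp in seconds (UTC)."""
    return datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp()


def fit_linear(rows, coin):
    """Fit price = slope * timestamp + intercept for one coin; return (mse, slope)."""
    dates = np.array([to_timestamp(date) for name, date, _ in rows if name == coin])
    prices = np.array([price for name, _, price in rows if name == coin])
    slope, intercept = np.polyfit(dates, prices, 1)
    mse = float(np.mean((prices - (slope * dates + intercept)) ** 2))
    return mse, slope


for coin in ("LTC", "ETH"):
    mse, slope = fit_linear(rows, coin)
    print(f"The MSE for {coin} is: {mse}")
    print(f"The coef. for {coin} is: {slope}")
```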

The console output is:

The MSE for LTC is: 145.5845122372679

The coef. for LTC is: [-7.26416342e-06]

The MSE for ETH is: 16752.089816773823

The coef. for ETH is: [-7.52252359e-05]

The MSE for RIP is: 0.12567067740217847

The coef. for RIP is: [-2.76329702e-07]

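We see that the error for ETH is high, so the linear model does not predict the price of Ethereum well. The errors for the LTC and RIP models are much smaller, which suggests they follow a more linear path. Keep in mind, though, that MSE is measured in squared price units. A coin that trades at a much higher price, like ETH, will naturally have a larger MSE, so a scale-free measure such as R² gives a fairer comparison between coins.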
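To see why MSE alone can mislead when comparing coins, here is a small sketch with made-up prices. The same 1% miss on a cheap series and an expensive one gives MSEs a million times apart, while R² is identical. The `mse` and `r_squared` helpers are hand-written for illustration; scikit-learn provides equivalents as `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score`.

```python
import numpy as np


def mse(actual, predicted):
    """Mean squared error, in squared units of the data."""
    return float(np.mean((actual - predicted) ** 2))


def r_squared(actual, predicted):
    """Fraction of the variance in `actual` explained by `predicted` (1.0 = perfect)."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return float(1 - ss_res / ss_tot)


# Made-up prices: the same shape at two price levels, each predicted 1% too high.
cheap = np.array([1.0, 2.0, 3.0, 4.0])
expensive = cheap * 1000

print("MSE:", mse(cheap, cheap * 1.01), "vs", mse(expensive, expensive * 1.01))
print("R^2:", r_squared(cheap, cheap * 1.01), "vs", r_squared(expensive, expensive * 1.01))
```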

The graph above shows the linear model plotted against the testing data.

We see that the LTC prices do not follow the model very well. Let's try examining LTC prices using polynomial regression:

Just like last week, we will cycle through the various polynomial degrees and make a model for each one. At every degree we will print the MSE and plot the model on our graph.
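Here is a hypothetical sketch of that degree loop. It uses `np.polyfit` as a stand-in for the author's code and sine-shaped placeholder prices rather than real LTC data, so its numbers will not match the output below. Plotting is omitted.

One design choice is worth calling out: the timestamps are rescaled to days since the first date. With raw Unix timestamps around 1.5 × 10⁹, x⁶ is around 10⁵⁵. The high-power coefficients then have to be extremely small, like the e-46 values in the output, and the fit becomes numerically fragile.

```python
import numpy as np


def fit_polynomials(days, prices, degrees):
    """Fit one polynomial per degree; return {degree: (coefficients, mse)}."""
    results = {}
    for degree in degrees:
        coeffs = np.polyfit(days, prices, degree)
        mse = float(np.mean((prices - np.polyval(coeffs, days)) ** 2))
        results[degree] = (coeffs, mse)
        print(f"the coefficients for the degree {degree} is: {coeffs}")
        print(f"the mean square error for degree {degree} is: {mse}")
    return results


# Hypothetical daily Unix timestamps and placeholder prices (not real LTC data).
timestamps = 1514764800 + 86400 * np.arange(60)
prices = 200 + 30 * np.sin(np.arange(60) / 10.0)

# Rescale to "days since the first date" to keep the powers of x manageable.
days = (timestamps - timestamps[0]) / 86400.0

results = fit_polynomials(days, prices, range(2, 7))
```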

the coefficients for the degree 2 is: [ 1.38845504e+05 -1.74472013e-04 5.47667364e-14]

the mean square error for degree 2 is: 1024.9864388458207

the coefficients for the degree 3 is: [ 3.62199542e+02 4.07698308e-03 -5.36497513e-12 1.76491388e-21]

the mean square error for degree 3 is: 924.1754899788522

the coeficients for the degree 4 is: [ 1.33493162e+03 1.16188108e-30 2.68213895e-12 -3.53062801e-21 1.16166650e-30]

the mean square error for degree 4 is: 924.6724109907932

the coeficients for the degree 5 is: [ 6.82213034e+02 1.61019028e-02 -2.10148658e-11 1.38657496e-21 7.12197734e-30 -2.31815648e-39]

the mean square error for degree 5 is: 921.2089860046844

the coefficients for the degree 6 is: [-6.10565773e-01 -8.30229727e-46 3.78262115e-36 2.90815712e-18 -5.74468520e-27 3.78262283e-36 -8.30229711e-46]

the mean square error for degree 6 is: 515.7131605348862

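As you can see, the MSE generally decreases as the degree goes up (degree 4 is a slight exception), so higher-degree polynomials fit the data more and more closely. The polynomial of degree 6 seems to be the best fit, although a lower error from a higher degree can also be a sign of overfitting.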

After importing the data from the CSV, we take the date from each row and convert it to a datetime object. We then convert the datetime object to a Unix timestamp and put it in an array, **dates**. For prices, we take the closing value and put it in an array, **prices**.
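Here is a sketch of that conversion using Python's `csv` and `datetime` modules instead of `np.loadtxt`. The three-column layout (coin name, date, close) and the `YYYY-MM-DD` date format are assumptions about the extracted data. The CSV text and prices are placeholders, and `load_dates_and_prices` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime, timezone

import numpy as np

# Hypothetical stand-in for the columns kept from crypto-markets.csv
# (coin name, date, closing price); the prices are placeholders.
CSV_TEXT = """name,date,close
Litecoin,2018-01-01,10.0
Litecoin,2018-01-02,12.5
Litecoin,2018-01-03,11.0
"""


def load_dates_and_prices(csv_file):
    """Return (dates, prices) arrays: Unix timestamps and closing prices."""
    dates, prices = [], []
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for _name, date_string, close in reader:
        # Parse the date text into a datetime, then convert it to a Unix timestamp (UTC).
        moment = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        dates.append(moment.timestamp())
        prices.append(float(close))
    return np.array(dates), np.array(prices)


dates, prices = load_dates_and_prices(io.StringIO(CSV_TEXT))
print(dates)
print(prices)

# scikit-learn estimators expect a 2-D feature matrix, so reshape before calling fit().
X = dates.reshape(-1, 1)
```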

We will use linear regression first to try it out. You can see a good example here: Linear Reg. Example

The plot of the polynomial fits is above. The code is as follows:

Here we enumerate over degrees 3 to 6 and plot the results. The degree 6 plot is the closest to the scatter plot, so we will use that. The coefficients are: [-5.40037231e+06, 5.23410271e-01, -1.44786669e-09, 1.51985693e-18, -7.11067134e-28, 1.24872305e-37]

Remember that polynomial regressions are of the form y = b₀ + b₁x + b₂x² + … + bₙxⁿ.

So the equation is of the form:

To get the MSE for the degree 6 plot we call:

`mean_squared_error(y_train, y_pred)`

and we get 1095563.04, which is a very high number, so this fit does not guarantee the accuracy of our model. Next week we will see if we can get our model closer, and whether any other digital currencies can be modeled using linear or polynomial regression.

>= Python 3.6. You can look up how to install Python from other resources or the Python homepage. After doing so, follow the next steps:

`pip install -U scikit-learn`

However, you will also need the other dependencies, such as NumPy and SciPy. The easiest way to get scikit-learn and all its dependencies is to install Anaconda, a scientific distribution of libraries and software used by data scientists. It also comes with useful tools like JupyterLab and Jupyter Notebook, so it is worth downloading: https://www.anaconda.com

In order to work in Python, I recommend using an IDE instead of just the terminal. I use PyCharm by JetBrains, since the community version is free and it's easy to use. It can be found at https://www.jetbrains.com/pycharm/. To install it, you can use the installation package or the command `sudo apt-get install pycharm-community`

Once you are done installing, open the IDE, start a new project, and click **File -> Settings**, then **Project Interpreter**.

Then, from the drop-down at the top, select **Show All**. This should open a new window.

Click the **green + sign on the right** and select **Conda Environment**. Make sure the Python version in the drop-down is correct; if not, change it. Click OK, and you should see your Anaconda environment loaded into your interpreter list with your version of Python. For my project I used 2.7.

Select your new Anaconda Python interpreter, click the **green + sign**, find the package you want, and click **Install Package** to install it. Once done, go back and repeat the process for the following libraries: **numpy, pandas, and scipy**

When considering your first project, it can often be difficult to figure out what you will be doing. Go to kaggle.com/datasets to browse and experiment with datasets. Kaggle is a great resource for data scientists in general. I would highly recommend looking at a few different datasets before starting a project. Since the project will take over a month to complete, I highly suggest picking a topic you find interesting in order to keep your motivation alive. I find that you do better on side projects you find interesting than on topics you choose at random. Here are the steps I took to come up with an idea for my project:

- Figure out an interesting result to predict. For example, can you predict the price of a home given its location and square footage? Can you find other indicators, like the number of bedrooms and garages, that correlate positively with price?
- Can you attempt to predict the price of a certain stock?
- Can you calculate the strength of concrete given a dataset of water content related to strength?
- In the case of predicting home prices, we would use multiple (multivariate) regression, because the outcome is a continuous price that depends on several variables.
- In the case of stock prices, you can use regression because it is based on just two variables: time and price.

For my first Machine Learning project I will use regression to predict the future price of different digital currencies. The dataset I will be using is this: All Cryptocurrency Data

Once you have picked out a good dataset, we will import it into Python using numpy or pandas.

Now that we have all the libraries installed in PyCharm, we can start writing code. You can use either the numpy library or pandas.

Import numpy with `import numpy as np`

or pandas with `import pandas as pd`

The first few statements use numpy, so let's deconstruct them:

Using the `loadtxt` function, open crypto-markets.csv:

`data = np.loadtxt("crypto-markets.csv",delimiter=",",dtype=object, skiprows=1, usecols=[2,3,8])`

Here is a breakdown of the arguments:

- `delimiter` is set to a comma for CSV.
- `dtype` is set to `object` because the CSV has mixed data types (strings and numbers).
- `skiprows` is set to 1 to skip the first row, which only holds headers.
- `usecols` is set to a list because we only need columns 2, 3, and 8; we do not need the rest of the data.

In order to reference this data, we just index `data` as if it were an array:

`print(data[3])`

And the code for pandas is similar if you choose to use it over numpy:

`data = pd.read_csv("crypto-markets.csv", usecols=[2,3,8])`

`print(data.head())`

This is Part II of my blog series on machine learning algorithms. The core of my learning will be based on Georgia Tech's ML class, which can be found on Udacity (https://www.udacity.com/course/machine-learning--ud262). It is the same class that students take to earn credit in their online master's program.

Regression, in terms of machine learning, is a form of supervised learning. It uses example inputs and outputs to come up with a generalization that can be used to predict future outcomes. One thing to note is that the output being predicted is continuous, not discrete.

The most common type of regression is linear regression. If you remember back to algebra, we will first plot all the points on a graph and find the best fit line with the old

y = mx + b.

From there, you can expand on the idea to include polynomial fits (e.g. quadratic, cubic, octic, etc.). Recall from algebra that functions of different degrees exhibit different behaviors.

So let's say we have a set of points: we can inspect them and eyeball which polynomial function they approximate. If it looks like a parabola, it's probably a degree 2 function; if it looks linear, it's probably a degree 1 function.

Can you generalize this in a mathematical way, other than eyeballing the data? Yes: you can use linear algebra and least squares, which finds the coefficients that minimize the sum of squared errors. https://en.wikipedia.org/wiki/Polynomial_regression

Given a matrix of inputs X and a vector of outputs y, we can solve for the coefficients B. The vector of coefficients B can be written as B = (XᵀX)⁻¹Xᵀy.
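Here is a sketch of that least-squares solution in NumPy. Each row of X holds the powers [1, x, x², …] of one input value (a Vandermonde matrix). Instead of forming the inverse explicitly, the sketch solves the equivalent system (XᵀX)B = Xᵀy, which is the usual numerically safer choice. The data points are made up.

```python
import numpy as np


def polynomial_coefficients(x, y, degree):
    """Least-squares coefficients B = (X^T X)^-1 X^T y, lowest power first."""
    # Row k of X is [1, x_k, x_k^2, ..., x_k^degree].
    X = np.vander(x, degree + 1, increasing=True)
    # Solving (X^T X) B = X^T y gives the same B without forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x**2            # made-up points lying exactly on a parabola
print(polynomial_coefficients(x, y, 2))    # approximately [1.0, 2.0, 0.5]

# Degree 1 recovers b and m of y = mx + b (intercept first, then slope).
b, m = polynomial_coefficients(x, 3.0 * x - 1.0, 1)
print(m, b)                                # approximately 3.0 -1.0
```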

The concept of cross-validation is to use a smaller portion of the data as a test set and the majority as a training set. The training set is used to come up with the regression model, and the test set is used to validate that model by checking its error. This assumes that the data is independent and identically distributed (i.i.d.), i.e. all of it comes from the same source.

You can split your data into four parts. Use parts 1 through 3 as the training set and part 4 as the test set. Next, use parts 1, 2, and 4 as the training set and part 3 as the test set, and so on. After each part has been used as the test set, average the four test errors. The kind of model (for example, the polynomial degree) with the lowest average error is the one to use. It is just a way of checking the error and picking the best model.
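Here is a sketch of four-fold cross-validation for choosing a polynomial degree. It uses NumPy's `np.polyfit` for each fit. The data is made up. Shuffling before splitting is an assumption: without it, sorted x values would put whole stretches of the curve into one test fold.

```python
import numpy as np


def cross_validation_error(x, y, degree, folds=4, seed=0):
    """Average test MSE of a polynomial of `degree` over `folds` train/test splits."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), folds)  # shuffle, then split
    errors = []
    for i, test_idx in enumerate(parts):
        # Every other part together forms the training set for this round.
        train_idx = np.concatenate([part for j, part in enumerate(parts) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        predictions = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((y[test_idx] - predictions) ** 2))
    return float(np.mean(errors))


# Made-up noisy data that follows a parabola.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 3 + 0.5 * x**2 + rng.normal(0, 2, x.size)

scores = {degree: cross_validation_error(x, y, degree) for degree in range(1, 6)}
best_degree = min(scores, key=scores.get)
print(scores)
print("best degree:", best_degree)
```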

Okay, that's it for this week. Next week I plan on tackling my first machine learning project, where I build something from the concepts we have learned so far using scikit-learn!

-Joe Kurokawa

Making a decision tree is fairly simple. The condition A AND B is expressed as below:

The condition A OR B is expressed as below:

ID3, an algorithm for building decision trees:

Loop:

- Find the best attribute, A
- Assign A as the decision attribute for the node
- For each value of A, create a descendant of the node
- Sort the training examples to the leaves
- If the examples are perfectly classified, stop
- Else iterate over the leaves

Information gain measures how well an attribute splits the data. A split has low information gain when the classifications (Yes or No) in the resulting groups are still evenly mixed. It has high information gain when the resulting groups are less evenly mixed; they are more "opinionated". To actually calculate this value we use entropy.

Entropy is a measure of how mixed (impure) a dataset is. For a binary classification it ranges from 0 to log2(2) = 1, where 0 means all the data in the set has the same class and 1 means the two classes are split evenly.

Entropy formula: H(S) = −Σᵢ pᵢ log₂(pᵢ), where pᵢ is the proportion of examples in S that belong to class i.

For a more specific case, the entropy formula becomes this:
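Here is a minimal sketch of entropy and information gain in plain Python. The rows and labels are hypothetical. Information gain is computed as the entropy before a split minus the size-weighted entropy of each branch after it.

```python
import math
from collections import Counter


def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i), where p_i is the share of examples in class i."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """How much splitting on `attribute` lowers entropy (weighted by branch size)."""
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        branch = [label for row, label in zip(rows, labels) if row[attribute] == value]
        remainder += len(branch) / len(labels) * entropy(branch)
    return entropy(labels) - remainder


print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0: all the same class
print(entropy(["yes", "yes", "no", "no"]))    # 1.0: evenly split

# Hypothetical examples: "outlook" separates the labels perfectly, "wind" not at all.
rows = [{"outlook": "sunny", "wind": "weak"}, {"outlook": "sunny", "wind": "strong"},
        {"outlook": "rain", "wind": "weak"}, {"outlook": "rain", "wind": "strong"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0: high gain
print(information_gain(rows, labels, "wind"))     # 0.0: low gain
```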

After the root is found, the tree should look like this:

The process should then be repeated (i.e. find the gain for each remaining attribute, take the highest, and set it as the root of the subtree), but this time for the subset of the training data whose outlook is Sunny. That is the basic idea of decision trees and ID3. Look out next week for more machine learning algorithms.
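Here is a compact sketch of the ID3 loop. It picks the highest-gain attribute, creates one branch per value, sorts the examples into the branches, and recurses until each branch is perfectly classified.

The nested-dict tree format, the helper names and the toy "play outside?" data are all hypothetical. When no attributes remain, the sketch falls back to a majority vote.

```python
import math
from collections import Counter


def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class proportions in `labels`."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy of each branch after it."""
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        branch = [label for row, label in zip(rows, labels) if row[attribute] == value]
        remainder += len(branch) / len(labels) * entropy(branch)
    return entropy(labels) - remainder


def id3(rows, labels, attributes):
    """Build a tree as nested dicts {attribute: {value: subtree or class label}}."""
    if len(set(labels)) == 1:          # perfectly classified: stop with a leaf
        return labels[0]
    if not attributes:                 # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    # A <- the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # One descendant per value of A; sort the examples into it and recurse.
    for value in set(row[best] for row in rows):
        branch_rows = [row for row in rows if row[best] == value]
        branch_labels = [label for row, label in zip(rows, labels) if row[best] == value]
        tree[best][value] = id3(branch_rows, branch_labels, remaining)
    return tree


def classify(tree, row):
    """Follow the branches that match `row` until reaching a leaf label."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree


# Hypothetical "play outside?" examples.
rows = [
    {"outlook": "sunny", "wind": "weak"},
    {"outlook": "sunny", "wind": "strong"},
    {"outlook": "overcast", "wind": "weak"},
    {"outlook": "rain", "wind": "weak"},
    {"outlook": "rain", "wind": "strong"},
    {"outlook": "overcast", "wind": "strong"},
]
labels = ["no", "no", "yes", "yes", "no", "yes"]

tree = id3(rows, labels, ["outlook", "wind"])
print(tree)
```

Here "outlook" becomes the root. The "rain" branch is still mixed, so the sketch repeats the process on just those examples and splits them on "wind".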


Welcome to the first post in my series on machine learning. Each week I will post what I learned about machine learning. I have no prior knowledge in this area and will attempt to learn things as I go using online tools. The core of my learning will be based on Georgia Tech's ML class, which can be found on Udacity (https://www.udacity.com/course/machine-learning--ud262). It is the same class that students take to earn credit in their online master's program. This week, I touch on the different types of machine learning. There are three types of machine learning algorithms:

- **Supervised learning**
- **Unsupervised learning**
- **Reinforcement learning**