## Learning ML Part VII. Neural Networks


This week we will explore neural networks, a form of supervised learning that takes in multiple inputs and predicts a continuous output. The name comes from biological neurons, which the model is loosely based on.

Much like regression, a neural network attempts to predict a continuous output for given inputs. A basic example of a neural network is the perceptron: an algorithm that combines its inputs with a set of weights and outputs either 0 or 1.

**Basic Perceptron Example**

The formula goes like this: given inputs X1 .. Xn and corresponding weights W1 .. Wn, if the sum of the products of each weight and its input is bigger than θ (the threshold), the result is true (1); if not, it is false (0). The weighted sum itself is called the activation. Perceptron units can be combined to build any logic function. For example, an AND statement can be written with a single perceptron.

Suppose the inputs X1 and X2 can only be 0 or 1. If they are both 1, the output is 1; if they are both 0, the output is 0. If you solve the truth table for every combination, you get exactly AND. If you graph the four input combinations, a straight line separates the point (1, 1) from the other three.
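As a minimal sketch of the AND example, the snippet below implements a single perceptron unit and prints its truth table. The weights (0.5, 0.5) and the threshold 0.75 are hypothetical values picked for illustration; they are not taken from the original figure, and many other choices would work just as well.

```python
def perceptron(inputs, weights, theta):
    """Return 1 if the weighted sum of the inputs is bigger than theta, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > theta else 0

# Hypothetical weights and threshold, chosen only for illustration.
AND_WEIGHTS = [0.5, 0.5]
AND_THETA = 0.75

# Walk through every combination of the two binary inputs.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron([x1, x2], AND_WEIGHTS, AND_THETA))
```

Only the input (1, 1) produces a weighted sum (1.0) above the threshold, so only that row outputs 1.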

Perceptrons are linear separators, meaning there is a line that separates true from false. If the data is not linearly separable, we have to use another method called gradient descent.

This is linearly separable:

This is not:
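To make the distinction concrete, here is a small sketch that searches a grid of candidate weights and thresholds for a single perceptron that reproduces a given truth table. XOR is used here as the non-separable example; that choice, the helper name `find_separating_line`, and the grid itself are assumptions made for this illustration, not something from the original post.

```python
from itertools import product

def perceptron(inputs, weights, theta):
    """Return 1 if the weighted sum of the inputs is bigger than theta, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > theta else 0

def find_separating_line(truth_table, grid):
    """Try every (w1, w2, theta) on the grid; return the first match, else None."""
    for w1, w2, theta in product(grid, repeat=3):
        if all(perceptron(x, (w1, w2), theta) == y for x, y in truth_table.items()):
            return (w1, w2, theta)
    return None

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Candidate values from -2.0 to 2.0 in steps of 0.25 (an arbitrary grid).
GRID = [i / 4 for i in range(-8, 9)]

print("AND:", find_separating_line(AND, GRID))  # some (w1, w2, theta) triple
print("XOR:", find_separating_line(XOR, GRID))  # None: no single line works
```

The search finds weights for AND but comes back empty for XOR. A finer grid would not help, because no single straight line can put (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.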

**Gradient Descent**

Gradient descent is used when we model more complicated data that is not linearly separable. It seeks to minimize the error over the examples d in the training set.

Without going into too much detail, we use calculus to take the derivative of this error with respect to the weights. The result, for each training example d, is the output y_d minus the activation w*x_d (from the perceptron model), times the input x_d. The final formula looks a lot like the previous perceptron model, but because it works on the activation directly rather than on the thresholded 0 or 1 output, it still behaves well when the data is not linearly separable.
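The update described above can be sketched in a few lines. This example assumes the error is the usual sum of squared errors, E(w) = 1/2 * sum over d of (y_d - w*x_d)^2, whose derivative with respect to each weight is -sum over d of (y_d - w*x_d) * x_d. The learning rate, the number of epochs, and the tiny XOR-style data set (with a constant 1 added as a bias input) are all assumptions made for illustration.

```python
def predict(weights, x):
    """Activation of a linear unit: the weighted sum w*x."""
    return sum(w * xi for w, xi in zip(weights, x))

def squared_error(weights, data):
    """Assumed error measure: half the sum of squared differences."""
    return 0.5 * sum((y - predict(weights, x)) ** 2 for x, y in data)

def gradient_descent(data, learning_rate=0.1, epochs=200):
    n = len(data[0][0])
    weights = [0.0] * n
    for _ in range(epochs):
        # dE/dw_i = -sum_d (y_d - w*x_d) * x_d[i]
        gradient = [0.0] * n
        for x, y in data:
            error = y - predict(weights, x)
            for i in range(n):
                gradient[i] -= error * x[i]
        # Step against the gradient to reduce the error.
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights

# XOR labels, which are not linearly separable. The leading 1 is a bias input.
DATA = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]

weights = gradient_descent(DATA)
print("error at start:", squared_error([0.0, 0.0, 0.0], DATA))
print("error at end:  ", squared_error(weights, DATA))
print("weights:", [round(w, 3) for w in weights])
```

Even though no line separates this data, the procedure still settles on the weights with the smallest squared error instead of failing to converge. The error cannot reach zero here, which is part of the motivation for the sigmoid units and hidden layers that follow.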

**Sigmoid**

An S-shaped function that gets applied to the activation. As the value of the activation (z) increases towards +inf, the sigmoid goes towards 1, and as z goes towards -inf, it goes towards 0. The sigmoid acts as a middle step between the input and the output of each unit: the input is weighted, then put through the sigmoid function. By using sigmoids we can model more complex functions that are non-linear.
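For reference, the standard sigmoid is 1 / (1 + e^(-z)). The short sketch below evaluates it at a few points and wraps it in a unit that weights its inputs first. The function name `sigmoid_unit` and the sample weights are hypothetical, used only to show the idea.

```python
import math

def sigmoid(z):
    """Squash any real activation z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_unit(inputs, weights):
    """Weight the inputs, sum them into an activation, then apply the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)

# The output approaches 0 for very negative z and 1 for very positive z.
for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))

# Hypothetical weights, just to show a unit in use.
print(sigmoid_unit([1, 0], [0.8, -0.4]))
```

Unlike the hard 0 or 1 threshold of the perceptron, the sigmoid changes smoothly, which is what lets calculus-based methods such as gradient descent work on it.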

**Putting it all Together**

So putting it all together, the above diagram shows a neural network. We take in a bunch of inputs and we get an output. But between the inputs and the output we have sigmoid units, which are the circles. Basically, an input is weighted, put through the sigmoid function, and the output is passed to the next level. Each level in between is called a hidden layer because you can’t directly see its inputs or outputs. As we move through the neural network, each level passes its outputs to the next level until we get to the end.

This brings us to the idea of back propagation, where the error at the output is passed backward through the network, from the output toward the inputs, to work out how much each weight contributed to it. This is made possible because the sigmoid units are differentiable, meaning the weights can be adjusted in the direction that brings the output closer to the one we want. Since the network seeks to minimize error, every time we repeat this the errors get smaller and we get closer to the result we want, and that is where the learning takes place.
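The whole pipeline can be sketched in plain Python: a forward pass through one hidden layer of sigmoid units, followed by a back propagation step that pushes the output error back to the weights. Everything specific here is an assumption for illustration: the network size (three hidden units), the learning rate, the number of epochs, the random seed, the squared-error measure, and the XOR training data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_w, output_w):
    """Inputs -> hidden sigmoid units -> one sigmoid output unit."""
    xb = list(x) + [1.0]  # append a constant bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in hidden_w]
    hb = hidden + [1.0]   # bias input for the output unit
    out = sigmoid(sum(w * v for w, v in zip(output_w, hb)))
    return xb, hb, out

def total_error(data, hidden_w, output_w):
    return 0.5 * sum((y - forward(x, hidden_w, output_w)[2]) ** 2 for x, y in data)

def train(data, n_hidden=3, learning_rate=0.5, epochs=10000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0]) + 1
    hidden_w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    output_w = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            xb, hb, out = forward(x, hidden_w, output_w)
            # Error at the output, scaled by the sigmoid's slope out * (1 - out).
            delta_out = (y - out) * out * (1 - out)
            # Pass the error backward to each hidden unit through its outgoing weight.
            delta_hidden = [delta_out * output_w[j] * hb[j] * (1 - hb[j])
                            for j in range(n_hidden)]
            # Nudge every weight in the direction that reduces the error.
            output_w = [w + learning_rate * delta_out * h for w, h in zip(output_w, hb)]
            for j in range(n_hidden):
                hidden_w[j] = [w + learning_rate * delta_hidden[j] * v
                               for w, v in zip(hidden_w[j], xb)]
    return hidden_w, output_w

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

start_hidden, start_output = train(XOR, epochs=0)  # same seed, no training
hidden_w, output_w = train(XOR)

print("error before training:", total_error(XOR, start_hidden, start_output))
print("error after training: ", total_error(XOR, hidden_w, output_w))
for x, y in XOR:
    print(x, "target", y, "output", round(forward(x, hidden_w, output_w)[2], 3))
```

The weights here are updated after every example rather than once per pass over the data; either variant follows the same idea of repeatedly going forward, measuring the error, and sending it backward.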