AI Neural Network Summary

꿈더꿔 2023. 10. 4. 11:35

This post summarizes neural network basics such as the learning rate, feedforward, backpropagation, and gradient descent, with Python code. Let's start!

 

For a summary written in Korean, please refer to the post below.

 

 

 

 

Neural Network Summary

1. Neural Network

A neural network is made up of several kinds of layers.

Input layer

The data points or real-world input data (generally a vector, matrix, or tensor).

Hidden layer

This is the important layer in a neural network. With hidden layers we can solve problems such as XOR, which is not linearly separable and so cannot be solved without a hidden layer, and problems that require understanding many features. With more hidden layers, we can solve more complicated problems.

Output layer

This is where the prediction of the target comes out, for example a class label, a sequence embedding, or a stock price.
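To make the three layers concrete, here is a minimal sketch of a forward pass through a 2-2-1 network on the XOR problem. It uses only the Python standard library. The weights are hand-picked for illustration (they are an assumption, not trained values), and the names `dense` and `forward` are hypothetical helpers.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(W x + b), written with plain lists."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not trained) weights, chosen only for this illustration.
# Hidden neuron 1 behaves like OR, hidden neuron 2 behaves like NAND.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
# The output neuron behaves like AND of the two hidden neurons, which gives XOR.
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def forward(x):
    """Input layer -> hidden layer -> output layer."""
    hidden = dense(x, HIDDEN_W, HIDDEN_B)
    output = dense(hidden, OUTPUT_W, OUTPUT_B)
    return output[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x)))
```

If you remove the hidden layer, no single set of weights can produce this output, which is why XOR is the classic example for hidden layers.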

 

 

If you want the Korean version, refer to this post.

 

 

 

 

2. Loss Function

Each type of dependent variable (continuous, binary, categorical) uses a different form of loss function.

 

Continuous

Generally, MSE (Mean Squared Error) is used. It is the mean of the squared differences between the predictions and the ground truth. It is commonly used for regression problems (where the output layer dimension is typically 1).

Binary

Generally, the class label is A or B, so binary cross entropy is widely used for a binary dependent variable.

Categorical

This is a generalization of the binary case. Softmax is used in the output layer, and categorical cross entropy is used as the loss function.
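The three loss functions above can be sketched in plain Python as follows. This is a minimal illustration, not a library implementation. The function names and the `eps` clipping value are my own assumptions, added to avoid taking `log(0)`.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error for continuous targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Labels are 0 or 1; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def softmax(logits):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def categorical_cross_entropy(true_index, logits, eps=1e-12):
    """Loss for one sample whose correct class is true_index."""
    probs = softmax(logits)
    return -math.log(max(probs[true_index], eps))


print(mse([1.0, 2.0], [1.0, 4.0]))
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
print(categorical_cross_entropy(2, [0.1, 0.2, 3.0]))
```

In all three cases a better prediction gives a smaller loss, which is what lets gradient descent use the loss as a training signal.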

 

3. Stochastic Gradient Descent

Gradient descent is a very important part of ML and DL. When updating weights, SGD is usually used. The gradient method adjusts the weights toward their optimal values by moving them in the direction opposite to the gradient of the loss. The word "stochastic" comes from the mini-batch: because gradient descent is applied to a randomly sampled subset of the data (a mini-batch) instead of the whole dataset, the method is called stochastic gradient descent.
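As a rough sketch of the idea, the code below fits a line y = w * x + b with mini-batch SGD using only the standard library. The toy data, the function name `sgd_linear_regression`, and the hyperparameters (learning rate, batch size, epochs) are assumptions chosen for this example.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y = w * x + b by mini-batch SGD on the MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # the "stochastic" part: random mini-batches
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * error * xs[i]  # d(error^2)/dw
                grad_b += 2.0 * error          # d(error^2)/db
            # Step against the average gradient of this mini-batch only.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# Toy data generated from y = 2x + 1 without noise.
xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(w, b)
```

Each update only looks at 4 samples, so one step is cheap, and over many steps the weights should still approach w = 2 and b = 1.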

 

4. Backpropagation

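To update the weights, we need calculus (partial derivatives and the chain rule). The feedforward pass computes the output and the loss. What we need next is the change in loss with respect to a change in each weight (∂loss/∂weight). Because the loss is computed at the output, the chain rule is applied from the output layer back toward the input layer, in the opposite direction to feedforward. This is backpropagation.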
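Here is a small sketch of the chain rule in action for a tiny network with one hidden neuron: x -> h = sigmoid(w1 * x + b1) -> y_hat = w2 * h + b2, with a squared-error loss. The network shape and all names are assumptions for illustration. The hand-written gradients are compared against numerical gradients as a sanity check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x, y):
    """Feedforward: returns the loss and the intermediate values."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1
    h = sigmoid(z)
    y_hat = w2 * h + b2
    loss = (y_hat - y) ** 2
    return loss, (z, h, y_hat)


def backward(params, x, y):
    """Backpropagation: apply the chain rule from the loss back to each weight."""
    w1, b1, w2, b2 = params
    _, (z, h, y_hat) = forward(params, x, y)
    d_yhat = 2.0 * (y_hat - y)   # dL/dy_hat
    d_w2 = d_yhat * h            # dL/dw2 = dL/dy_hat * dy_hat/dw2
    d_b2 = d_yhat                # dL/db2
    d_h = d_yhat * w2            # dL/dh
    d_z = d_h * h * (1.0 - h)    # sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x               # dL/dw1
    d_b1 = d_z                   # dL/db1
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_gradient(params, x, y, eps=1e-6):
    """Central differences, used only to check the analytic gradients."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((forward(plus, x, y)[0] - forward(minus, x, y)[0]) / (2 * eps))
    return grads


params = [0.5, -0.3, 0.8, 0.1]
print(backward(params, x=1.5, y=2.0))
print(numerical_gradient(params, x=1.5, y=2.0))
```

The two printed lists should agree closely, which suggests the chain rule steps in `backward` are right.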

 

5. Update Function

updated weight = original weight - learning rate * gradient (a high-level formula)
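The formula maps almost directly to code. A minimal sketch, where `update_weights` is a hypothetical helper name and the numbers are made up:

```python
def update_weights(weights, gradients, learning_rate):
    """Return new weights: original weight - learning rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


weights = [0.5, -1.0]
gradients = [2.0, -4.0]
print(update_weights(weights, gradients, learning_rate=0.1))
```

A positive gradient pushes the weight down and a negative gradient pushes it up, so the loss decreases in both cases for a small enough step.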

 

6. GPU vs CPU

A CPU is used for general-purpose processing, while a GPU's key feature is parallel processing. In DL, models have a lot of parameters, so updating them requires a lot of resources, time, and computing power. That is why parallel processing is very important.

 

7. Learning rate

If the gradient is very large, training becomes unstable, so we want slow, stable training. Changing the weights by the right amount is a hard problem, so in the update function we multiply the gradient by a small number, which helps keep a large gradient from causing an overly large update. This small number is called the 'learning rate'. Generally, the learning rate is in the range 0.0001 to 0.01.
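To see why the size of the learning rate matters, here is a minimal sketch that runs gradient descent on the toy loss f(w) = w^2 (gradient 2w, minimum at w = 0). The toy loss and the three learning rates are assumptions picked to show the effect, not recommended values.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w^2 starting from `start`; returns the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w                # derivative of w^2
        w = w - learning_rate * gradient  # the update function
    return w


for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With 0.01 the weight moves toward 0 slowly, with 0.1 it gets very close to 0, and with 1.1 each step overshoots the minimum by more than the last one, so the weight grows instead of shrinking.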