I will summarize neural network concepts such as learning rate, feed forward, backpropagation, and gradient descent, with Python code. Let's start!
A Korean version of this summary is available in the post below.
Neural Network Summary
1. Neural Network
A neural network is made up of several kinds of layers.
Input layer
A data point or real-world input (generally a vector, matrix, or tensor).
Hidden layer
The important layers in a neural network. With hidden layers we can solve problems such as XOR, or problems that require learning many features. The more hidden layers we stack, the more complicated the problems we can solve.
Output layer
This is where we read off the target: for example, a class label, a sequence embedding, or a stock price.
If you want the Korean version, refer to this post.
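The three layers above can be sketched as a minimal NumPy forward pass; all layer sizes here are illustrative, not prescribed by the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer: a batch of 4 data points, each a 2-dimensional vector
x = rng.normal(size=(4, 2))

# Hidden layer: maps 2 -> 8 with a nonlinearity (tanh here)
W1 = rng.normal(size=(2, 8))
b1 = np.zeros(8)
h = np.tanh(x @ W1 + b1)

# Output layer: maps 8 -> 1, e.g. a single regression target
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)
y_hat = h @ W2 + b2

print(y_hat.shape)  # one prediction per input: (4, 1)
```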
2. Loss Function
Each type of dependent variable (continuous, binary, categorical) calls for a different form of loss function.
Continuous
Generally, MSE (Mean Squared Error) is used: the mean of the squared differences between prediction and truth. It is commonly used for regression problems (output layer dimension equal to 1).
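As a quick sketch with made-up numbers, MSE is just this mean of squared differences:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# MSE: mean of the squared differences between prediction and truth
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # 0.375
```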
Binary
Generally, the class label is A or B, so binary cross entropy is used a lot for a binary dependent variable.
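A minimal sketch of binary cross entropy, with illustrative labels and predicted probabilities:

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0])  # class A = 1, class B = 0
p = np.array([0.9, 0.2, 0.7])       # predicted probability of class A

# Binary cross entropy: -mean[y*log(p) + (1-y)*log(1-p)]
bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(round(bce, 4))  # ≈ 0.2284
```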
Categorical
This is a generalization of the binary case. Softmax is applied in the output layer, and categorical cross entropy is used as the loss function.
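The softmax-plus-cross-entropy pair can be sketched like this (logits and the one-hot label are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # raw output-layer scores for 3 classes
y_true = np.array([1.0, 0.0, 0.0])  # one-hot label: true class is class 0

p = softmax(logits)                 # probabilities summing to 1
# Categorical cross entropy: -sum(y * log(p))
cce = -np.sum(y_true * np.log(p))
print(p, cce)
```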
3. Stochastic Gradient Descent
Gradient descent is a very important technique in ML and DL. When updating weights, SGD is usually used. A gradient method adjusts the weight values toward the optimum using the gradient. The "stochastic" part is associated with mini-batches: because gradient descent is applied to a randomly sampled mini-batch rather than the full dataset, the method is called stochastic gradient descent.
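A minimal mini-batch SGD loop on a one-parameter regression problem; the data, learning rate, and batch size here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = 3x + small noise
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=256)

w, lr, batch_size = 0.0, 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))          # random sampling -> the "stochastic" part
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]  # one mini-batch
        y_hat = w * X[b, 0]
        grad = 2 * np.mean((y_hat - y[b]) * X[b, 0])  # dMSE/dw on this batch
        w -= lr * grad                     # gradient descent step

print(round(w, 2))  # close to the true slope 3.0
```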
4. Backpropagation
To update the weights, we need calculus (partial derivatives, the chain rule). After the feed-forward pass, the loss is computed; we then need the change of the loss with respect to each weight (change of loss / change of weight). This calculation runs in the opposite direction to the feed-forward pass. This is backpropagation.
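The chain rule applied right-to-left can be sketched on a tiny two-weight network (all values here are made up), with a numerical gradient as a sanity check:

```python
import numpy as np

# Tiny network: x -> (w1) -> tanh -> (w2) -> y_hat, loss = (y_hat - y)^2
x, y = 0.5, 1.0
w1, w2 = 0.8, -0.3

# Feed forward (left to right)
h = np.tanh(w1 * x)
y_hat = w2 * h
loss = (y_hat - y) ** 2

# Backpropagation (right to left, chain rule)
dloss_dyhat = 2 * (y_hat - y)
dloss_dw2 = dloss_dyhat * h                           # dy_hat/dw2 = h
dloss_dh = dloss_dyhat * w2                           # dy_hat/dh = w2
dloss_dw1 = dloss_dh * (1 - np.tanh(w1 * x) ** 2) * x # d tanh(u)/du = 1 - tanh(u)^2

# Sanity check: finite-difference gradient for w1 should agree
eps = 1e-6
loss_plus = (w2 * np.tanh((w1 + eps) * x) - y) ** 2
num_grad = (loss_plus - loss) / eps
print(dloss_dw1, num_grad)
```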
5. Update Function
updated weight = original weight - learning rate * gradient (high-level formula)
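The formula above is a one-liner in code (the function name and values are illustrative):

```python
def sgd_update(weight, grad, lr=0.01):
    # updated weight = original weight - learning rate * gradient
    return weight - lr * grad

print(sgd_update(1.0, 5.0, lr=0.1))  # 0.5
```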
6. GPU vs CPU
A CPU is used for general processing; a GPU's key feature is parallel processing. In DL, models have a lot of parameters, so updating them takes a lot of resources, time, and computing power. That is why parallel processing is so important.
7. Learning rate
If the gradient is a very large value, training will be unstable, so we want slow, stable training. But directly modifying the weights is a hard problem, so in the update function we multiply the gradient by a small value to keep it from exploding. That small value is called the 'learning rate'. Generally, the learning rate ranges from 0.0001 to 0.01.
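The effect of the learning rate can be sketched with gradient descent on the toy function f(w) = w^2 (the two rates below are chosen to contrast stable and unstable training):

```python
def descend(lr, steps=20, w=5.0):
    # Gradient descent on f(w) = w^2, whose gradient is 2w
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(descend(0.01))  # small lr: slow but stable, w shrinks toward the minimum at 0
print(descend(1.1))   # lr too large: w overshoots, oscillates, and explodes
```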