VGGNet stacks networks deeper than AlexNet, up to 16 or 19 layers; the paper notes that going deeper than this was pointless in their experiments because performance saturated. Thanks to its relatively simple structure and strong performance it is still widely used today, and since each block ends with a max-pooling layer it is often reused, with little tuning, as the feature extractor in U-Net-style architectures.
1. Setup
```python
import random
import time

import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Fix every source of randomness so runs are reproducible
seed = 2022
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
cudnn.benchmark = False      # disable the cuDNN autotuner, which can vary between runs
cudnn.deterministic = True   # force deterministic cuDNN kernels
```
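The effect of seeding is easy to see with Python's built-in random module alone (a minimal sketch of the same idea the torch/cuda/numpy calls above apply to tensor ops): re-seeding with the same value replays the identical sequence.

```python
import random

random.seed(2022)
first_run = [random.random() for _ in range(3)]

random.seed(2022)  # re-seed with the same value
second_run = [random.random() for _ in range(3)]

# Identical sequences: this is what seeding buys for weight init,
# data shuffling, and augmentation in the training code.
print(first_run == second_run)  # True
```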
2. Network
The basic structure is implemented as conv3-64 - maxpool - conv3-128 - maxpool - conv3-256 - maxpool - conv3-512 - maxpool - conv3-512 - maxpool - fc 4096 - fc 4096 - fc (num_classes) - softmax.
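As a quick sanity check on this layout (a pure-Python sketch, assuming a 224x224 input): each of the five max-pool stages halves the spatial size, so the feature map entering the first fc layer is 7x7x512.

```python
size = 224
for _ in range(5):  # five MaxPool2d(kernel_size=2, stride=2) stages
    size //= 2
print(size)  # 7

flat_features = 512 * size * size
print(flat_features)  # 25088, the in_features of the first fc 4096 layer
```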
```python
def conv(stride=1, padding=1, kernel_size=3, in_channels=3, out_channels=64):
    """A basic VGG conv block: Conv2d -> BatchNorm2d -> ReLU."""
    conv_l = nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                       kernel_size=kernel_size, stride=stride, padding=padding)
    bn = nn.BatchNorm2d(num_features=out_channels)
    act = nn.ReLU()
    return nn.Sequential(conv_l, bn, act)
```
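The helper defaults to kernel 3, stride 1, padding 1, which is "same" padding: the spatial size is unchanged, so only the max-pool layers shrink the feature map. The standard convolution output-size formula confirms this (a pure-Python sketch):

```python
def conv_out_size(n, kernel_size=3, stride=1, padding=1):
    # Standard conv output-size formula: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - kernel_size) // stride + 1

print(conv_out_size(224))  # 224: a 3x3 conv with stride 1, pad 1 preserves size
print(conv_out_size(32))   # 32
```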
```python
class VGGNet(nn.Module):
    def __init__(self, layers=16, in_channels=3, num_classes=10, size=224):
        super(VGGNet, self).__init__()
        self.in_channels = in_channels
        self.num_classes = num_classes
        self.layers = layers
        self.size = size
        # Blocks 1 and 2 are identical for VGG-16 and VGG-19
        self.conv1 = nn.Sequential(
            conv(in_channels=3, out_channels=64),
            conv(in_channels=64, out_channels=64),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.conv2 = nn.Sequential(
            conv(in_channels=64, out_channels=128),
            conv(in_channels=128, out_channels=128),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        if self.layers == 16:
            # VGG-16: three convs in each of blocks 3-5
            self.conv3 = nn.Sequential(
                conv(in_channels=128, out_channels=256),
                conv(in_channels=256, out_channels=256),
                conv(in_channels=256, out_channels=256),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            self.conv4 = nn.Sequential(
                conv(in_channels=256, out_channels=512),
                conv(in_channels=512, out_channels=512),
                conv(in_channels=512, out_channels=512),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            self.conv5 = nn.Sequential(
                conv(in_channels=512, out_channels=512),
                conv(in_channels=512, out_channels=512),
                conv(in_channels=512, out_channels=512),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
        else:
            # VGG-19: four convs in each of blocks 3-5
            self.conv3 = nn.Sequential(
                conv(in_channels=128, out_channels=256),
                conv(in_channels=256, out_channels=256),
                conv(in_channels=256, out_channels=256),
                conv(in_channels=256, out_channels=256),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            self.conv4 = nn.Sequential(
                conv(in_channels=256, out_channels=512),
                conv(in_channels=512, out_channels=512),
                conv(in_channels=512, out_channels=512),
                conv(in_channels=512, out_channels=512),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            self.conv5 = nn.Sequential(
                conv(in_channels=512, out_channels=512),
                conv(in_channels=512, out_channels=512),
                conv(in_channels=512, out_channels=512),
                conv(in_channels=512, out_channels=512),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
        self.features = nn.Sequential(self.conv1, self.conv2, self.conv3, self.conv4, self.conv5)
        self.flat = nn.Flatten()
        # Five 2x2 max-pools halve the spatial size five times, hence // (2**5 * 2**5)
        # The paper places ReLU and dropout between the fc layers; without them the
        # three Linear layers would collapse into a single linear map.
        self.classifiers = nn.Sequential(
            nn.Linear(in_features=512 * size * size // ((2 ** 5) * (2 ** 5)), out_features=4096, bias=True),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=4096, out_features=4096, bias=True),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=4096, out_features=self.num_classes, bias=True),
        )
        # Note: drop this softmax when training with nn.CrossEntropyLoss,
        # which applies log-softmax internally.
        self.soft = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.features(x)
        x = self.flat(x)
        x = self.classifiers(x)
        x = self.soft(x)
        return x
```
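As a cross-check against the paper, the parameter count of the plain VGG-16 configuration (no batch norm, 1000 classes, 224x224 input, as in the original paper rather than the BN/10-class variant above) can be tallied by hand; the well-known total is 138,357,544.

```python
def conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k + c_out  # weights + bias

def fc_params(f_in, f_out):
    return f_in * f_out + f_out

cfg = [(3, 64), (64, 64),                   # block 1
       (64, 128), (128, 128),               # block 2
       (128, 256), (256, 256), (256, 256),  # block 3
       (256, 512), (512, 512), (512, 512),  # block 4
       (512, 512), (512, 512), (512, 512)]  # block 5

total = sum(conv_params(i, o) for i, o in cfg)
total += fc_params(512 * 7 * 7, 4096)       # fc6
total += fc_params(4096, 4096)              # fc7
total += fc_params(4096, 1000)              # fc8
print(total)  # 138357544
```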
3. Train
The full code has been uploaded to the following GitHub link. Compared with AlexNet, training yields noticeably higher accuracy.

At epoch 20:
AlexNet: Train acc 0.66, Test acc 0.67
VGGNet: Train acc 0.888, Test acc 0.817

Paper: https://arxiv.org/abs/1409.1556