
VGGNet stacks the network deeper than AlexNet, to 16 or 19 layers; the paper reports that stacking even deeper brings no further gain because performance saturates. Thanks to its relatively simple structure and strong accuracy it is still widely used today, and since every block ends with a max-pooling layer it also serves, with little tuning, as the feature extractor of U-Net-style architectures.
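
As a quick illustration of the last point, the convolutional stack of a pretrained VGG-16 from torchvision can be reused directly as a backbone. This is only a minimal sketch; on torchvision versions older than 0.13 the weights argument is pretrained=True instead of the weights enum.

import torch
import torchvision

# Reuse the pretrained VGG-16 convolutional stack (conv + maxpool blocks only)
# as a feature extractor, e.g. the encoder of a U-Net-style model.
backbone = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1).features

x = torch.randn(1, 3, 224, 224)
feats = backbone(x)
print(feats.shape)   # torch.Size([1, 512, 7, 7]) - feature map before the classifier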

 

1. Setup

 

import torch.nn as nn
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.optim as optim
import time
import numpy as np

import random
import torch.backends.cudnn as cudnn

# Fix every source of randomness for reproducible runs.
seed = 2022
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
cudnn.benchmark = False      # no autotuning of cuDNN kernels
cudnn.deterministic = True   # force deterministic cuDNN algorithms
random.seed(seed)

2. Network

The basic structure is implemented as conv3-64 block - maxpool - conv3-128 block - maxpool - conv3-256 block - maxpool - conv3-512 block - maxpool - conv3-512 block - maxpool - fc 4096 - fc 4096 - fc (num_classes) - softmax, where the first two blocks stack two 3x3 convolutions and the last three blocks stack three (VGG-16) or four (VGG-19).

# 3x3 convolution -> BatchNorm -> ReLU. BatchNorm is not in the original VGG paper,
# but it is added here to make training more stable.
def conv(stride = 1, padding = 1, kernel_size = 3, in_channels = 3, out_channels = 64):
    conv_l = nn.Conv2d(in_channels = in_channels, out_channels = out_channels, kernel_size = kernel_size, stride = stride, padding = padding)
    batch_l = nn.BatchNorm2d(num_features = out_channels)
    act_l = nn.ReLU()
    return nn.Sequential(conv_l, batch_l, act_l)


class VGGNet(nn.Module):
    def __init__(self, layers = 16, in_channels = 3, num_classes = 10, size = 224):
        super(VGGNet, self).__init__()
        self.in_channels = in_channels
        self.num_classes = num_classes
        self.layers = layers
        self.size = size

        self.conv1 = nn.Sequential(
            conv(stride = 1, padding = 1, kernel_size=3, in_channels = 3, out_channels=64),
            conv(stride = 1, padding = 1, kernel_size=3, in_channels = 64, out_channels=64),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.conv2 = nn.Sequential(
            conv(stride=1, padding=1, kernel_size=3, in_channels=64, out_channels=128),
            conv(stride=1, padding=1, kernel_size=3, in_channels=128, out_channels=128),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        if self.layers == 16:  # VGG-16: three convs in each of blocks 3-5
            self.conv3 = nn.Sequential(
                conv(stride=1, padding=1, kernel_size=3, in_channels=128, out_channels=256),
                conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            self.conv4 = nn.Sequential(
                conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            self.conv5 = nn.Sequential(
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
        else:  # VGG-19: four convs in each of blocks 3-5
            self.conv3 = nn.Sequential(
                conv(stride=1, padding=1, kernel_size=3, in_channels=128, out_channels=256),
                conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            self.conv4 = nn.Sequential(
                conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            self.conv5 = nn.Sequential(
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
        self.features = nn.Sequential(self.conv1, self.conv2, self.conv3, self.conv4, self.conv5)
        self.flat = nn.Flatten()
        # ReLU between the FC layers as in the paper; without it the three
        # stacked Linear layers collapse into a single linear map.
        self.classifiers = nn.Sequential(
            nn.Linear(in_features=512 * size * size // ((2 ** 5) * (2 ** 5)), out_features=4096, bias=True),
            nn.ReLU(),
            nn.Linear(in_features=4096, out_features=4096, bias=True),
            nn.ReLU(),
            nn.Linear(in_features=4096, out_features=self.num_classes, bias=True),
        )
        # Drop this softmax when training with nn.CrossEntropyLoss (log-softmax is applied internally).
        self.soft = nn.Softmax(dim = 1)

    def forward(self, x):
        x = self.features(x)
        x = self.flat(x)
        x = self.classifiers(x)
        x = self.soft(x)

        return x
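
A quick shape check of the model defined above, using a hypothetical batch of two 224 x 224 RGB images:

model = VGGNet(layers = 16, in_channels = 3, num_classes = 10, size = 224)
dummy = torch.randn(2, 3, 224, 224)
out = model(dummy)
print(out.shape)        # torch.Size([2, 10])
print(out.sum(dim = 1)) # each row sums to 1 because of the final softmax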

3. Train

The full code has been uploaded to the GitHub link below. Compared with AlexNet, the training results show clearly higher accuracy.
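
The loop below is only a rough sketch of such a training run, not the exact configuration of the linked repository; the dataset (CIFAR-10 resized to 224 x 224), optimizer, and hyperparameters are assumptions.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)

model = VGGNet(layers=16, in_channels=3, num_classes=10, size=224).to(device)
# nn.CrossEntropyLoss applies log-softmax internally, so the softmax in
# forward() should be removed when training with it.
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

for epoch in range(20):
    model.train()
    start = time.time()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f'epoch {epoch + 1}: {time.time() - start:.1f}s, last batch loss {loss.item():.4f}')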

At epoch 20:

AlexNet - Train acc : 0.66 Test acc : 0.67

VGGNet - Train acc : 0.888 Test acc : 0.817
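
The accuracy values above come from the full script; a minimal way to measure test accuracy looks like the following, assuming a hypothetical test_loader built like train_loader but with train=False:

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f'Test acc : {correct / total:.3f}')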

 

Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition (https://arxiv.org/abs/1409.1556)

 
