    VGGNet stacks the network deeper than AlexNet, up to 16 / 19 layers; the paper reports that stacking beyond this gave no further gain in their experiments because performance saturates. With its relatively simple structure and strong accuracy, it is still a commonly used model. Since every block ends with a max-pooling layer, it is also often used, with little tuning, as the feature extractor of a U-Net-style architecture.
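
    As a side note, here is a minimal sketch (my own illustration, not part of the original post) of reusing the pretrained VGG16 convolutional stack from torchvision as a frozen feature extractor, e.g. as the encoder of a U-Net-style model. Newer torchvision versions use the weights= argument instead of pretrained=.

    import torch
    import torchvision

    backbone = torchvision.models.vgg16(pretrained = True).features   # the conv + maxpool stack only
    for p in backbone.parameters():
        p.requires_grad = False                                        # freeze the encoder

    x = torch.randn(1, 3, 224, 224)
    feats = backbone(x)
    print(feats.shape)   # torch.Size([1, 512, 7, 7]); a real U-Net would also tap intermediate blocks for skip connections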

     

    1. Setup

     

    import torch.nn as nn
    import torch
    import torchvision
    import torchvision.transforms as transforms
    from torch.utils.data import DataLoader
    import torch.optim as optim
    import time
    import numpy as np
    
    import random
    import torch.backends.cudnn as cudnn
    
    # Fix every source of randomness (PyTorch, CUDA, NumPy, Python) and force
    # deterministic cuDNN behaviour so that runs are reproducible.
    seed = 2022
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    cudnn.benchmark = False
    cudnn.deterministic = True
    random.seed(seed)
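
    The data pipeline is not shown in this section; below is a minimal sketch of what it could look like, assuming CIFAR-10 resized to 224x224 (the dataset, normalization values, and batch size are my assumptions, not from the post):

    transform = transforms.Compose([
        transforms.Resize(224),                                   # VGG-style networks expect 224x224 inputs
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    train_set = torchvision.datasets.CIFAR10(root = './data', train = True, download = True, transform = transform)
    test_set = torchvision.datasets.CIFAR10(root = './data', train = False, download = True, transform = transform)

    train_loader = DataLoader(train_set, batch_size = 64, shuffle = True, num_workers = 2)
    test_loader = DataLoader(test_set, batch_size = 64, shuffle = False, num_workers = 2)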

    2. Network

    The basic structure is implemented as conv3-64 - MaxPool - conv3-128 - MaxPool - conv3-256 - MaxPool - conv3-512 - MaxPool - conv3-512 - MaxPool - FC 4096 - FC 4096 - FC (num_classes) - softmax, where conv3-N denotes a block of 3x3 convolutions with N output channels.

    # 3x3 Conv -> BatchNorm -> ReLU block. The original VGG paper has no BatchNorm;
    # it is added here (as in torchvision's vgg16_bn variant) to stabilise training.
    def conv(stride = 1, padding = 1, kernel_size = 3, in_channels = 3, out_channels = 64):
        conv_l = nn.Conv2d(in_channels = in_channels, out_channels = out_channels, kernel_size = kernel_size, stride = stride, padding = padding)
        batch_1 = nn.BatchNorm2d(num_features = out_channels)
        act_1 = nn.ReLU()
        return nn.Sequential(conv_l, batch_1, act_1)
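
    A quick check of what one helper call produces (my own illustration):

    block = conv(in_channels = 3, out_channels = 64)   # Conv2d(3, 64, 3, padding=1) -> BatchNorm2d(64) -> ReLU
    y = block(torch.randn(1, 3, 224, 224))
    print(y.shape)                                     # torch.Size([1, 64, 224, 224]); padding 1 keeps the spatial size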
    
    
    class VGGNet(nn.Module):
        def __init__(self, layers = 16, in_channels = 3, num_classes = 10, size = 224):
            super(VGGNet, self).__init__()
            self.in_channels = in_channels
            self.num_classes = num_classes
            self.layers = layers
            self.size = size
    
            self.conv1 = nn.Sequential(
                conv(stride = 1, padding = 1, kernel_size=3, in_channels = 3, out_channels=64),
                conv(stride = 1, padding = 1, kernel_size=3, in_channels = 64, out_channels=64),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
    
            self.conv2 = nn.Sequential(
                conv(stride=1, padding=1, kernel_size=3, in_channels=64, out_channels=128),
                conv(stride=1, padding=1, kernel_size=3, in_channels=128, out_channels=128),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
    
            if self.layers == 16:
                self.conv3 = nn.Sequential(
                    conv(stride=1, padding=1, kernel_size=3, in_channels=128, out_channels=256),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                    nn.MaxPool2d(kernel_size=2, stride=2)
                )
                self.conv4 = nn.Sequential(
                    conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    nn.MaxPool2d(kernel_size=2, stride=2)
                )
                self.conv5 = nn.Sequential(
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    nn.MaxPool2d(kernel_size=2, stride=2)
                )
            else:
                self.conv3 = nn.Sequential(
                    conv(stride=1, padding=1, kernel_size=3, in_channels=128, out_channels=256),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=256),
                    nn.MaxPool2d(kernel_size=2, stride=2)
                )
                self.conv4 = nn.Sequential(
                    conv(stride=1, padding=1, kernel_size=3, in_channels=256, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    nn.MaxPool2d(kernel_size=2, stride=2)
                )
                self.conv5 = nn.Sequential(
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    conv(stride=1, padding=1, kernel_size=3, in_channels=512, out_channels=512),
                    nn.MaxPool2d(kernel_size=2, stride=2)
                )
            self.features = nn.Sequential(self.conv1, self.conv2, self.conv3, self.conv4, self.conv5)
            self.flat = nn.Flatten()
            # Five max-pooling layers halve the spatial size 5 times (size // 2**5),
            # so the flattened feature size is 512 * (size // 32) ** 2 = 25088 for size = 224.
            self.classifiers = nn.Sequential(
                nn.Linear(in_features=512 * size * size // ((2 ** 5) * (2 ** 5)), out_features=4096, bias=True),
                nn.Linear(in_features=4096, out_features=4096, bias=True),
                nn.Linear(in_features=4096, out_features=self.num_classes, bias=True),
            )
            # Softmax turns the logits into class probabilities. Note that nn.CrossEntropyLoss
            # expects raw logits, so the Softmax is typically omitted when that loss is used for training.
            self.soft = nn.Softmax(dim = 1)
    
        def forward(self, x):
            x = self.features(x)
            x = self.flat(x)
            x = self.classifiers(x)
            x = self.soft(x)
    
            return x
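
    A quick sanity check of the forward pass (my own addition, not in the original post):

    model = VGGNet(layers = 16, in_channels = 3, num_classes = 10, size = 224)
    out = model(torch.randn(2, 3, 224, 224))
    print(out.shape)         # torch.Size([2, 10])
    print(out.sum(dim = 1))  # each row sums to 1 because of the final Softmax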

    3. Train

    The full code has been uploaded to the following GitHub link. Compared with AlexNet, training shows clearly higher accuracy; a rough training-loop sketch follows the numbers below.

    At epoch 20:

    AlexNet - Train acc : 0.66 Test acc : 0.67

    VGGNet - Train acc : 0.888 Test acc : 0.817
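
    The full training loop lives in the repo; this is only a minimal sketch of how such numbers could be produced (the optimizer, learning rate, and the train_loader from the Setup sketch are my assumptions):

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = VGGNet(layers = 16, num_classes = 10, size = 224).to(device)
    # nn.CrossEntropyLoss expects raw logits; with the Softmax kept in forward()
    # the gradients are slightly dampened, so dropping self.soft during training is common.
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum = 0.9)

    for epoch in range(20):
        model.train()
        start = time.time()
        running_loss, correct, total = 0.0, 0, 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            correct += (outputs.argmax(dim = 1) == labels).sum().item()
            total += labels.size(0)
        print(f'epoch {epoch}: loss {running_loss / len(train_loader):.3f}, '
              f'train acc {correct / total:.3f}, {time.time() - start:.1f}s')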

     

    Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition, https://arxiv.org/abs/1409.1556

     
