
    A Simple AlexNet Implementation in PyTorch

     

    Rather than the two-GPU split proposed in the paper, AlexNet is implemented here as a single, slightly simpler flow:

    11x11 Conv - Pooling - 5x5 Conv - Pooling - 3x3 Conv - 3x3 Conv - 3x3 Conv - Pooling - fc 4096 - fc 4096 - classifier

    1. Intro

    The AlexNet structure presented in the paper, broken down block by block:

    1. 11x11 Conv

    stride = 4, padding = 0, out_channels = 96 + Max Pooling (stride = 2, kernel = 3)

    The paper splits this into 48 + 48 across two GPUs, but it is implemented here as a single flow

    input size = 227 (the paper says 224, but the arithmetic only works out with 227, so transforms.Resize is used to get there)

    output size = 55 after the conv, 27 after pooling (the arithmetic for every block is verified in the sketch after this list)

     

    2. 5x5 Conv

    stride = 1, padding = 2, out_channels = 256 + Max Pooling (stride = 2, kernel = 3)

    The paper splits this into 128 + 128, but it is implemented here as a single flow

    input size = 27

    output size = 27 after the conv, 13 after pooling

     

    3. 3x3 Conv

    stride = 1, padding = 1, out_channels = 384 (no pooling in this block)

    The paper splits this into 192 + 192, but it is implemented here as a single flow

    input size = 13

    output size = 13

     

    4. 3x3 Conv

    stride = 1, padding = 1, out_channels = 384

    The paper splits this into 192 + 192, but it is implemented here as a single flow

    input size = 13

    output size = 13

     

    5. 3x3 Conv

    stride = 1, padding = 1, out_channels = 256 + Max Pooling (stride = 2, kernel = 3)

    The paper splits this into 128 + 128, but it is implemented here as a single flow

    input size = 13

    output size = 13 after the conv, 6 after pooling

     

    6. Fully Connected Layer

    4096 - 4096 - num_classes (10 in the code), fed by the flattened 6 x 6 x 256 feature map (size check below)
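
    All of the spatial sizes above follow the standard formula output = (input - kernel + 2*padding) / stride + 1, which covers both the convolutions and the 3x3 / stride-2 max pools. A quick sketch that checks the whole chain (conv_out is a throwaway helper written just for this check, not part of the post's code):

    def conv_out(n, kernel, stride, padding = 0):
        # conv/pool output size; PyTorch floors the division
        return (n - kernel + 2 * padding) // stride + 1

    n = 227
    n = conv_out(n, 11, 4); n = conv_out(n, 3, 2)      # block 1: 227 -> 55 -> 27
    n = conv_out(n, 5, 1, 2); n = conv_out(n, 3, 2)    # block 2: 27 -> 27 -> 13
    n = conv_out(n, 3, 1, 1)                           # block 3: 13 -> 13
    n = conv_out(n, 3, 1, 1)                           # block 4: 13 -> 13
    n = conv_out(n, 3, 1, 1); n = conv_out(n, 3, 2)    # block 5: 13 -> 13 -> 6
    print(n, n * n * 256)                              # 6 9216, the fc in_features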

     

     

     

    2. Code

    1. Setup

    import torch
    import torchvision
    import torch.nn as nn
    import torchvision.transforms as transforms
    from torch.utils.data import DataLoader, sampler
    import torch.optim as optim
    import time
    import numpy as np
    
    import random
    import torch.backends.cudnn as cudnn
    
    # pin every RNG in play (PyTorch CPU/GPU, NumPy, Python) for reproducible runs
    seed = 2022
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    cudnn.benchmark = False      # no autotuning, so the same conv algorithms are picked every run
    cudnn.deterministic = True
    random.seed(seed)
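
    One caveat: the seeds above cover the main process only; DataLoader worker processes carry their own random state. A hedged sketch of the usual extra step (seed_worker and the generator wiring below are a common pattern, not part of the original post):

    def seed_worker(worker_id):
        # derive each worker's seed from the torch seed it was launched with
        worker_seed = torch.initial_seed() % 2**32
        np.random.seed(worker_seed)
        random.seed(worker_seed)

    g = torch.Generator()
    g.manual_seed(seed)
    # then pass worker_init_fn = seed_worker and generator = g to each DataLoader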

     

     

    2. Dataset Preparation

    Code for running the model on CIFAR-10:

    transform = transforms.Compose(
        [transforms.Resize(227),     # CIFAR-10 images are 32x32; AlexNet expects 227x227
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )

    batch_size = 8

    train_data = torchvision.datasets.CIFAR10(root = '../data', train = True, transform = transform, download = True)
    train_loader = DataLoader(train_data, shuffle = True, batch_size = batch_size, num_workers = 1)

    test_data = torchvision.datasets.CIFAR10(root = '../data', train = False, transform = transform, download = True)
    test_loader = DataLoader(test_data, shuffle = False, batch_size = batch_size, num_workers = 1)

    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
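
    As a quick sanity check (not in the original post), one batch from the loader should show the resized AlexNet input shape:

    images, labels = next(iter(train_loader))
    print(images.shape)        # torch.Size([8, 3, 227, 227]) with batch_size = 8
    print(classes[labels[0]])  # class name of the first sample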

     

     

    3. AlexNet

    class Alexnet(nn.Module):
        def __init__(self, in_channels = 3, classes = 1000):
            super().__init__()
            self.classes = classes
            self.in_channels = in_channels
            # block 1: 3 -> 96 channels, 227 -> 55 (conv) -> 27 (pool)
            self.conv_1 = nn.Sequential(nn.Conv2d(in_channels = self.in_channels,
                                                   out_channels = 96,
                                                   kernel_size = 11,
                                                   stride = 4,
                                                   padding = 0),
                                         nn.BatchNorm2d(num_features=96),
                                         nn.ReLU(),
                                         nn.Dropout2d(),
                                         nn.MaxPool2d(kernel_size = 3, stride = 2))
            # block 2: 96 -> 256 channels, 27 -> 27 (conv) -> 13 (pool)
            self.conv_2 = nn.Sequential(nn.Conv2d(in_channels = 96,
                                                   out_channels = 256,
                                                   kernel_size = 5,
                                                   stride = 1,
                                                   padding = 2),
                                         nn.BatchNorm2d(num_features=256),
                                         nn.ReLU(),
                                         nn.Dropout2d(),
                                         nn.MaxPool2d(kernel_size = 3, stride = 2))
            # block 3: 256 -> 384 channels, 13 -> 13 (no pooling)
            self.conv_3 = nn.Sequential(nn.Conv2d(in_channels = 256,
                                                   out_channels = 384,
                                                   kernel_size = 3,
                                                   stride = 1,
                                                   padding = 1),
                                         nn.BatchNorm2d(num_features=384),
                                         nn.ReLU(),
                                         nn.Dropout2d())
            # block 4: 384 -> 384 channels, 13 -> 13
            self.conv_4 = nn.Sequential(nn.Conv2d(in_channels = 384,
                                                   out_channels = 384,
                                                   kernel_size = 3,
                                                   stride = 1,
                                                   padding = 1),
                                         nn.BatchNorm2d(num_features=384),
                                         nn.ReLU(),
                                         nn.Dropout2d())
            # block 5: 384 -> 256 channels, 13 -> 13 (conv) -> 6 (pool)
            self.conv_5 = nn.Sequential(nn.Conv2d(in_channels = 384,
                                                   out_channels = 256,
                                                   kernel_size = 3,
                                                   stride = 1,
                                                   padding = 1),
                                         nn.BatchNorm2d(num_features=256),
                                         nn.ReLU(),
                                         nn.Dropout2d(),
                                         nn.MaxPool2d(kernel_size = 3, stride = 2))
            self.flat = nn.Flatten()
            # classifier head: 6 * 6 * 256 = 9216 flattened features -> 4096 -> 4096 -> classes
            self.linear1 = nn.Linear(in_features = 6 * 6 * 256, out_features = 4096)
            self.act1 = nn.ReLU()
            self.linear2 = nn.Linear(in_features = 4096, out_features = 4096)
            self.act2 = nn.ReLU()
            self.linear3 = nn.Linear(in_features = 4096, out_features = self.classes)
    
    
        def forward(self, x):
            x = self.conv_1(x)
            x = self.conv_2(x)
            x = self.conv_3(x)
            x = self.conv_4(x)
            x = self.conv_5(x)
            x = self.flat(x)
            x = self.linear1(x)
            x = self.act1(x)
            x = self.linear2(x)
            x = self.act2(x)
            x = self.linear3(x)
            # return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
            # so an explicit nn.Softmax here would squash the logits twice
            return x

    A bit of refactoring would help, but the implementation is intentionally naive; the normalization step (Local Response Normalization in the paper) is replaced with Batch Normalization.

    All activation functions are ReLU, and Dropout is applied in every block.
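
    A minimal smoke test of the class, assuming the code above has run: a dummy batch of 227 x 227 inputs should come out as one raw logit vector per sample (most of the parameters sit in the fully connected layers).

    model = Alexnet(classes = 10)
    dummy = torch.randn(2, 3, 227, 227)   # two fake RGB images
    out = model(dummy)
    print(out.shape)                      # torch.Size([2, 10])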

     

     

     

    4. Train Code

    EPOCHS = 20

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    model = Alexnet(classes = 10)
    model.to(device)

    criterion = nn.CrossEntropyLoss()   # expects raw logits; applies log-softmax internally
    optimizer = optim.SGD(params = model.parameters(), lr = 0.001, momentum = 0.9, weight_decay = 0.0005)
    
    def eval_model(model, data):
        model.eval()
        with torch.no_grad():
            t_count = 0   # total number of samples
            a_count = 0   # number of correct predictions
            t_loss = 0
            for j, data2 in enumerate(data):
                inputs, labels = data2[0].to(device), data2[1].to(device)
                t_count += len(inputs)
                preds = model(inputs)
                loss = criterion(preds, labels)
                output = torch.argmax(preds, dim = 1)
                a_count += (output == labels).sum().item()
                t_loss += loss.item()
        model.train()
        val_loss = t_loss / t_count
        val_acc = a_count / t_count
        return val_loss, val_acc
        
    if __name__ == "__main__":
        for epoch in range(EPOCHS):
            for dset in ['train', 'test']:
                if dset == 'train':
                    model.train()
                else:
                    model.eval()
                a_count = 0
                t_count = 0
                t_loss = 0
                if dset == 'train':
                    start = time.time()
                    for i, data in enumerate(train_loader):
                        inputs, labels = data[0].to(device), data[1].to(device)
                        t_count += len(inputs)
                        optimizer.zero_grad()
                        preds = model(inputs)
                        loss = criterion(preds, labels)
                        loss.backward()
                        optimizer.step()
    
                        output = torch.argmax(preds, dim = 1)
                        a_count += (output == labels).sum().item()
                        t_loss += loss.item()   # .item() so per-batch loss tensors are not kept alive
                        if (i + 1) % 1000 == 0:
                            print(f"loss : {t_loss / t_count:.5f} acc : {a_count / t_count:.3f}")
                    time_delta = time.time() - start
                    print(f"Final Train Acc : {a_count / t_count:.3f}")
                    print(f"Final Train Loss : {t_loss / t_count:.5f}")
                    print(f'Train Finished in {time_delta // 60:.0f} mins {time_delta % 60:.0f} secs')
                else:
                    val_loss, val_acc = eval_model(model, test_loader)
                    print(f"val loss : {val_loss:.5f} val acc : {val_acc:.3f}")

    With EPOCHS = 20, accuracy reaches 66.4 (measured on the test set).

    The eval_model function was written separately to compute the metrics during the validation step.

    The loss is CrossEntropy since this is a classification task, and the optimizer is stochastic gradient descent.
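
    One detail worth spelling out: nn.CrossEntropyLoss already combines LogSoftmax and NLLLoss, which is why the model returns raw logits rather than softmax outputs. A standalone check of that equivalence (not from the original post):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 10)               # fake predictions: 4 samples, 10 classes
    labels = torch.randint(0, 10, (4,))

    a = F.cross_entropy(logits, labels)
    b = F.nll_loss(F.log_softmax(logits, dim = 1), labels)
    print(torch.allclose(a, b))               # True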