2.2.1. Image Segmentation

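Rather than training a segmentation model from scratch, this project uses a pretrained model suited to the task.

1️⃣ Model used

The model is DeepLabV3 with a ResNet101 backbone, one of the pretrained models provided through PyTorch Hub.
- It reports a mean IoU of 67.4 and a global pixelwise accuracy of 92.4, which is better overall than the provided models based on ResNet50 or MobileNetV3.
- The model was trained on a subset of the COCO train2017 dataset restricted to the 20 categories of Pascal VOC, and its performance was measured on the COCO val2017 dataset.
- It can classify 20 categories (plus a background class), listed below:
  - aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, TV/monitor, bird, cat, cow, dog, horse, sheep, and person
- Because dog is one of the categories, the model is suitable for this project.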
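
The code in this section selects the dog class by its numeric index, so it helps to see how category names map to output channels. The sketch below assumes the Pascal VOC ordering that torchvision uses for this model: background at index 0, followed by the 20 categories in alphabetical order. The names `VOC_CLASSES` and `class_index` are illustrative and do not appear in the original code.

```python
# Assumed label order for the 21 output channels (background + 20 VOC categories).
VOC_CLASSES = [
    "__background__", "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
    "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def class_index(name):
    """Return the channel index of a category name (hypothetical helper)."""
    return VOC_CLASSES.index(name)


print(len(VOC_CLASSES))    # 21
print(class_index("dog"))  # 12
```

Looking the index up by name avoids hard-coding a magic number when a different category is needed.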
Related paper (attached file): Rethinking Atrous Convolution for Semantic Image Segmentation.pdf
Original code source: PyTorch

2️⃣ Code

# Import libraries and modules
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import torch
from torchvision import transforms, models
import cv2

# Path to the input image file
image_path = 'path/to/image/file'

# Load the pretrained model from torchvision and switch it to inference mode
# (newer torchvision releases deprecate `pretrained=True` in favor of a `weights` argument)
model = models.segmentation.deeplabv3_resnet101(pretrained=True).eval()

# Build a color palette for visualizing the segmentation result
cmap = plt.get_cmap('tab20c')
colors = (cmap(np.arange(cmap.N)) * 255).astype(int)[:, :3].tolist()
colors.insert(0, [0, 0, 0])  # index 0 (anything outside the 20 categories) is black
colors = np.array(colors, dtype=np.uint8)

# Segmentation function
def segment(net, img):
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        ),
    ])

    input_tensor = preprocess(img)  # convert the image to a normalized tensor
    input_batch = input_tensor.unsqueeze(0)  # add a batch dimension for the model input

    if torch.cuda.is_available():  # use the GPU when one is available
        input_batch = input_batch.to('cuda')
        net.to('cuda')

    with torch.no_grad():  # inference only, so no gradients are needed
        output = net(input_batch)['out'][0]  # scores with shape (21, height, width)

    # Pick the highest-scoring category for each pixel -> (height, width)
    output_predictions = output.argmax(0).byte().cpu().numpy()

    # Convert the class map to a palettized image
    # (nearest-neighbor resampling keeps class indices intact if a resize happens)
    r = Image.fromarray(output_predictions).resize((img.shape[1], img.shape[0]), Image.NEAREST)
    r.putpalette(colors.flatten().tolist())

    return r, output_predictions

# Run segmentation and inspect the result
img = np.array(Image.open(image_path).convert('RGB'))  # load the image as an RGB NumPy array
fg_h, fg_w, _ = img.shape
segment_map, pred = segment(model, img)  # run segmentation
background = np.ones((fg_h, fg_w, 3)) * 255.0  # white background to draw the result on
mask = (pred == 12).astype(float) * 255  # class index 12 is dog
_, alpha = cv2.threshold(mask, 0, 255, cv2.THRESH_BINARY)  # build the alpha channel
alpha = cv2.GaussianBlur(alpha, (7, 7), 0).astype(float)  # soften the mask edges
alpha = alpha / 255.
alpha = np.repeat(np.expand_dims(alpha, axis=2), 3, axis=2)  # expand to 3 channels
foreground = cv2.multiply(alpha, img.astype(float))  # keep only the dog pixels
background = cv2.multiply(1. - alpha, background.astype(float))  # white everywhere else
result = cv2.add(foreground, background).astype(np.uint8)
plt.imshow(result)  # display the result
plt.show()
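
The last few lines of the code composite the dog onto a white background: for each pixel, result = alpha * image + (1 - alpha) * white. As a minimal sketch of that formula only, the NumPy-only example below applies it to a tiny synthetic image. The helper `composite_on_white` is hypothetical, and the Gaussian blur step is omitted for brevity.

```python
import numpy as np


def composite_on_white(img, mask):
    """Blend img over a white background; mask has shape (H, W), values in [0, 1]."""
    alpha = mask.astype(float)[..., None]   # (H, W, 1), broadcasts over the RGB channels
    white = np.full(img.shape, 255.0)       # white background, same shape as img
    blended = alpha * img.astype(float) + (1.0 - alpha) * white
    return blended.astype(np.uint8)


# 1x2 RGB image: the first pixel is "dog" (mask 1), the second is not (mask 0).
img = np.array([[[10, 20, 30], [40, 50, 60]]], dtype=np.uint8)
mask = np.array([[1, 0]])

out = composite_on_white(img, mask)
print(out.tolist())  # [[[10, 20, 30], [255, 255, 255]]]
```

Blurring the mask beforehand, as the code above does, produces fractional alpha values along the boundary, so the edge fades into white instead of cutting off sharply.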

3️⃣ Process diagram and results

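The overall backdrop is shown in gray so that the white background can be distinguished.
In the alpha image, the blurred edges of the mask are visible.
One might expect the foreground image to show the dog, but it is a float array with values in the 0-255 range. plt.imshow clips float RGB data to [0, 1], so nearly every nonzero pixel saturates and the dog is not recognizable. This is expected.
The final result, which is cast to uint8, displays correctly.
All images are raw outputs of the code, with no post-processing applied.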
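
The display behavior of the float foreground array can be reproduced without plotting. The sketch below mimics the [0, 1] clipping that matplotlib applies to float RGB input and shows one way to make such an array displayable. The helper `as_displayable` is a hypothetical name, and it assumes the values are on a 0-255 scale.

```python
import numpy as np

# A single float RGB pixel on the 0-255 scale, like the entries of `foreground`.
foreground = np.array([[[0.0, 120.0, 255.0]]])

# For float RGB data, imshow clips values to [0, 1], so 120 and 255 both become 1.0.
clipped = np.clip(foreground, 0.0, 1.0)
print(clipped.tolist())  # [[[0.0, 1.0, 1.0]]]


def as_displayable(arr):
    """Return a uint8 copy suitable for plt.imshow (assumes values in 0-255)."""
    return np.clip(arr, 0, 255).astype(np.uint8)


print(as_displayable(foreground).tolist())  # [[[0, 120, 255]]]
```

Dividing the float array by 255 before display would work equally well. The original code takes the uint8 route for the final result.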