Classifying Cats and Dogs with a CNN (Keras)

티에스윤 2022. 10. 23. 21:48

This is an example of applying a convolutional neural network to the classification of cats and dogs.

Following on from the previous example, let's work through a somewhat different example of classifying cats and dogs with Keras.

 

https://tsyoon.tistory.com/31

 

 

import numpy as np
import pandas as pd 
from keras.preprocessing.image import ImageDataGenerator, load_img
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import random
import os
print(os.listdir("./catndog"))

 

 

Result:

['sampleSubmission.csv', 'test1.zip', 'train.zip']

 

The data files are available here: https://www.kaggle.com/competitions/dogs-vs-cats/data

There are three files; download them and put them in the Jupyter Notebook's ./catndog folder.
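The rest of the code assumes the two archives have already been extracted so that the images sit in ./catndog/train/train and ./catndog/test1/test1, the paths used below. A minimal sketch of that setup step, assuming each archive contains a top-level train/ or test1/ folder:

import zipfile

# Hypothetical setup step: unpack the Kaggle archives so the images end up
# under ./catndog/train/train and ./catndog/test1/test1.
for name in ["train", "test1"]:
    with zipfile.ZipFile(f"./catndog/{name}.zip") as zf:
        zf.extractall(f"./catndog/{name}")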

 

FAST_RUN = False
IMAGE_WIDTH=128
IMAGE_HEIGHT=128
IMAGE_SIZE=(IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS=3

 

The image size is 128 × 128, with 3 color channels.

 

filenames = os.listdir("./catndog/train/train")
categories = []
for filename in filenames:
    category = filename.split('.')[0]
    if category == 'dog':
        categories.append(1)
    else:
        categories.append(0)

df = pd.DataFrame({
    'filename': filenames,
    'category': categories
})

 

The category for dog is 1, so cat is 0.

 

df.head()

df.tail()

df['category'].value_counts().plot.bar()

 

 

 

sample = random.choice(filenames)
image = load_img("./catndog/train/train/"+sample)
plt.imshow(image)

 

 

 

 

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Activation, BatchNormalization

model = Sequential()

model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax')) # 2 because we have cat and dog classes
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

model.summary()

 

The activation function is ReLU, and the optimizer is RMSprop.

 

 

Output:

 

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 126, 126, 32)      896       
_________________________________________________________________
batch_normalization_1 (Batch (None, 126, 126, 32)      128       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 63, 63, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 63, 63, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 61, 61, 64)        18496     
_________________________________________________________________
batch_normalization_2 (Batch (None, 61, 61, 64)        256       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 30, 30, 64)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 30, 30, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 28, 28, 128)       73856     
_________________________________________________________________
batch_normalization_3 (Batch (None, 28, 28, 128)       512       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 128)       0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 14, 14, 128)       0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 25088)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               12845568  
_________________________________________________________________
batch_normalization_4 (Batch (None, 512)               2048      
_________________________________________________________________
dropout_4 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 1026      
=================================================================
Total params: 12,942,786
Trainable params: 12,941,314
Non-trainable params: 1,472
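As a sanity check on this summary, the parameter counts can be reproduced by hand: a Conv2D layer has (kernel height × kernel width × input channels + 1 bias) × filters parameters, a BatchNormalization layer has 4 parameters per channel (gamma, beta, and the non-trainable moving mean and variance), and a Dense layer has inputs × outputs + outputs parameters. A quick sketch of the arithmetic:

# Reproducing the parameter counts from model.summary()
conv2d_1  = (3*3*3   + 1) * 32         #        896
conv2d_2  = (3*3*32  + 1) * 64         #     18,496
conv2d_3  = (3*3*64  + 1) * 128        #     73,856
dense_1   = 14*14*128 * 512 + 512      # 12,845,568  (Flatten output: 14*14*128 = 25,088)
dense_2   = 512*2 + 2                  #      1,026
batchnorm = 4 * (32 + 64 + 128 + 512)  #      2,944  (1,472 trainable + 1,472 non-trainable)
total = conv2d_1 + conv2d_2 + conv2d_3 + dense_1 + dense_2 + batchnorm  # 12,942,786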

 

from keras.callbacks import EarlyStopping, ReduceLROnPlateau

 

earlystop = EarlyStopping(patience=10)

 

learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', 
                                            patience=2, 
                                            verbose=1, 
                                            factor=0.5, 
                                            min_lr=0.00001)

 

callbacks = [earlystop, learning_rate_reduction]

df["category"] = df["category"].replace({0: 'cat', 1: 'dog'}) 

train_df, validate_df = train_test_split(df, test_size=0.20, random_state=42)
train_df = train_df.reset_index(drop=True)
validate_df = validate_df.reset_index(drop=True)

train_df['category'].value_counts().plot.bar()

 

The class distribution of the training split is shown as a bar plot, followed by the validation split below.

 

validate_df['category'].value_counts().plot.bar()

 

total_train = train_df.shape[0]
total_validate = validate_df.shape[0]
batch_size=15

 

 

train_datagen = ImageDataGenerator(
    rotation_range=15,
    rescale=1./255,
    shear_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1
)

 

This defines the augmentation applied to the training images; rescale maps the pixel values into the range 0 to 1.

 

 

train_generator = train_datagen.flow_from_dataframe(
    train_df, 
    "./catndog/train/train/", 
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)

 

The generator reads images using the filename column as the input and the category column as the label.

 

validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_dataframe(
    validate_df, 
    "./catndog/train/train/", 
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)

 

Defines the validation data (rescaling only, no augmentation).

 

example_df = train_df.sample(n=1).reset_index(drop=True)
example_generator = train_datagen.flow_from_dataframe(
    example_df, 
    "./catndog/train/train/", 
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical'
)

 

plt.figure(figsize=(12, 12))
for i in range(0, 15):
    plt.subplot(5, 3, i+1)
    for X_batch, Y_batch in example_generator:
        image = X_batch[0]
        plt.imshow(image)
        break
plt.tight_layout()
plt.show()

 

 

epochs=3 if FAST_RUN else 50  # recommended: change this to epochs=3 if FAST_RUN else 5 to keep the run short
history = model.fit_generator(
    train_generator, 
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=total_validate//batch_size,
    steps_per_epoch=total_train//batch_size,
    callbacks=callbacks
)

 

Training runs for up to 50 epochs, and the computation takes a long time.

It took about two hours to reach epoch 11 on a GTX 1050. Unless you have an RTX 3080 or better, reduce the number of epochs.
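Note that fit_generator (and predict_generator, used further below) is deprecated in recent TensorFlow/Keras releases; model.fit and model.predict accept the generators directly. A hedged equivalent, assuming TensorFlow 2.x:

# In TF 2.x Keras, model.fit accepts a generator directly (fit_generator is deprecated).
history = model.fit(
    train_generator,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=total_validate//batch_size,
    steps_per_epoch=total_train//batch_size,
    callbacks=callbacks
)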

 

 

model.save_weights("model.h5")

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 12))
ax1.plot(history.history['loss'], color='b', label="Training loss")
ax1.plot(history.history['val_loss'], color='r', label="validation loss")
ax1.set_xticks(np.arange(1, epochs, 1))
ax1.set_yticks(np.arange(0, 1, 0.1))

ax2.plot(history.history['acc'], color='b', label="Training accuracy")
ax2.plot(history.history['val_acc'], color='r',label="Validation accuracy")
ax2.set_xticks(np.arange(1, epochs, 1))

legend = plt.legend(loc='best', shadow=True)
plt.tight_layout()
plt.show()

 

An error occurs at this point; it is related to the 'acc' key. Try to fix it yourself (a sketch of the fix follows below).
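The most likely cause, assuming a TensorFlow 2.x Keras installation: when the model is compiled with metrics=['accuracy'], the history keys are 'accuracy' and 'val_accuracy' rather than the older 'acc' and 'val_acc', and the same applies to the monitor argument of ReduceLROnPlateau. A minimal sketch of the fix:

# TF 2.x Keras records 'accuracy' / 'val_accuracy' instead of 'acc' / 'val_acc'.
ax2.plot(history.history['accuracy'], color='b', label="Training accuracy")
ax2.plot(history.history['val_accuracy'], color='r', label="Validation accuracy")

# The learning-rate callback should monitor the matching key as well:
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy',
                                            patience=2,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)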

 

test_filenames = os.listdir("./catndog/test1/test1")
test_df = pd.DataFrame({
    'filename': test_filenames
})
nb_samples = test_df.shape[0]

 

 

test_gen = ImageDataGenerator(rescale=1./255)
test_generator = test_gen.flow_from_dataframe(
    test_df, 
    "./catndog/test1/test1/", 
    x_col='filename',
    y_col=None,
    class_mode=None,
    target_size=IMAGE_SIZE,
    batch_size=batch_size,
    shuffle=False
)

 

 

Result:

Found 12500 validated image filenames

predict = model.predict_generator(test_generator, steps=np.ceil(nb_samples/batch_size))

 

test_df['category'] = np.argmax(predict, axis=-1)

 

label_map = dict((v,k) for k,v in train_generator.class_indices.items())
test_df['category'] = test_df['category'].replace(label_map)
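For reference, train_generator.class_indices holds the class-name-to-index mapping the generator built from the category column (expected to be {'cat': 0, 'dog': 1}, since class names are ordered alphabetically, though that ordering is an assumption worth checking); label_map simply inverts it so the argmax indices can be turned back into class names.

# Check the mapping the generator actually inferred before trusting the predictions.
print(train_generator.class_indices)  # e.g. {'cat': 0, 'dog': 1} (assumed order)
print(label_map)                      # inverted: {0: 'cat', 1: 'dog'}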

 

test_df['category'] = test_df['category'].replace({ 'dog': 1, 'cat': 0 })

 

test_df['category'].value_counts().plot.bar()

 

 

 

 

sample_test = test_df.head(18)
sample_test.head()
plt.figure(figsize=(12, 24))
for index, row in sample_test.iterrows():
    filename = row['filename']
    category = row['category']
    img = load_img("./catndog/test1/test1/"+filename, target_size=IMAGE_SIZE)
    plt.subplot(6, 3, index+1)
    plt.imshow(img)
    plt.xlabel(filename + '(' + "{}".format(category) + ')' )
plt.tight_layout()
plt.show()

 

 

 

 

 

 

submission_df = test_df.copy()
submission_df['id'] = submission_df['filename'].str.split('.').str[0]
submission_df['label'] = submission_df['category']
submission_df.drop(['filename', 'category'], axis=1, inplace=True)
submission_df.to_csv('submission.csv', index=False)
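submission.csv now has two columns, id and label, with label set to 1 for dog and 0 for cat as constructed above. A quick sanity check before uploading to Kaggle:

# Read the file back and confirm the id/label columns look right before submitting.
print(pd.read_csv('submission.csv').head())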

 

Attached notebook: catndog.ipynb (3.06 MB)

 

 

 

Source sites:

 

 

https://www.kaggle.com/code/uysimty/keras-cnn-dog-or-cat-classification

https://www.kaggle.com/code/kanncaa1/convolutional-neural-network-cnn-tutorial

 
