SegNet Explained, with a TensorFlow 2.0 Implementation

tfwechat · February 18, 2021, 2:27pm

This article is a community submission by AI 菌. Reposted from: https://github.com/Keyird/

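Introduction

Today we bring you a featured pick from a community author: "Deep Learning -> Semantic Segmentation in Practice (1): SegNet Explained, with a TensorFlow 2.0 Implementation". Starting from the SegNet algorithm, CSDN blog expert @AI 菌 walks through building a SegNet network with TensorFlow 2.0 to segment the objects in a scene.

As before, this article has a theory part and a hands-on part. The theory part covers the essentials of the SegNet algorithm. The hands-on part builds a SegNet network with the TensorFlow 2.0 framework to segment the target (ore piles) in a scene. The segmentation results are shown below: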

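I. SegNet in Detail

1. SegNet Overview

SegNet was proposed in 2015 by Vijay Badrinarayanan, Alex Kendall, et al. It is a deep fully convolutional neural network architecture for pixel-wise semantic segmentation, made up of an encoder network, a corresponding decoder network, and a pixel-wise classification layer.

What is novel about SegNet is the way the decoder upsamples its low-resolution input feature maps. Specifically, the decoder performs nonlinear upsampling using the pooling indices computed in the max-pooling step of the corresponding encoder. SegNet is aimed mainly at scene understanding applications. It has significantly fewer trainable parameters than competing architectures, and it can be trained end to end with stochastic gradient descent. Evaluations show that, compared with other architectures, SegNet performs well in both inference time and memory usage.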

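2. SegNet Network Structure

(1) Overall structure. As shown in the figure below, SegNet consists of an encoder network, a corresponding decoder network, and a pixel-wise classification layer. The encoder uses VGG for feature extraction; the decoder performs 3 rounds of nonlinear upsampling; the pixel-wise classification layer uses a convolutional layer to turn the network output into the output we need.

Source: SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation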

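(2) Encoder network. In detail, the encoder is built from the 13 convolutional layers of VGG16, and what is fed to the decoder is the feature map output by the fourth VGG16 convolutional block (Conv_block). The input image has shape (416,416,3), so after 4 convolutional blocks (that is, 4 rounds of downsampling, each halving the height and width) the encoder output has shape (26,26,512).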
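The shape arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper name encoder_feature_shapes is made up for illustration, and it assumes every block ends with a 2x2, stride-2 max pooling while the 3x3 'same' convolutions keep the spatial size.

```python
# Channel widths of the five VGG16 convolutional blocks.
VGG16_BLOCK_CHANNELS = [64, 128, 256, 512, 512]


def encoder_feature_shapes(height, width, n_blocks):
    """Return the (H, W, C) shape after each of the first n_blocks blocks.

    Hypothetical helper: each block is assumed to end with a 2x2 max pooling
    of stride 2, which halves H and W.
    """
    shapes = []
    for channels in VGG16_BLOCK_CHANNELS[:n_blocks]:
        height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    return shapes


shapes = encoder_feature_shapes(416, 416, n_blocks=4)
for i, shape in enumerate(shapes, start=1):
    print("block", i, "->", shape)
```

The last entry is the feature map handed to the decoder; asking for 5 blocks instead shows the 13x13 map that the fifth block would produce.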

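(3) Decoder network. Decoding is essentially an upsampling process, and SegNet's novelty lies in how it upsamples. As shown in the figure below, the decoder performs nonlinear upsampling using the pooling indices computed in the max-pooling step of the corresponding encoder.

Source: SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation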
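To make the idea of pooling indices concrete, here is a minimal pure-Python sketch on a single 4x4 channel. The function names are hypothetical, and the sketch assumes even height and width; it illustrates the mechanism only and is not the paper's implementation.

```python
def max_pool_2x2_with_indices(fmap):
    """2x2 max pooling with stride 2 on a 2D list; returns (pooled, indices).

    indices[i][j] is the (row, col) position in fmap of the maximum of window (i, j).
    """
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_with_indices(pooled, indices, height, width):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [6, 2, 3, 4]]
pooled, indices = max_pool_2x2_with_indices(fmap)
restored = unpool_with_indices(pooled, indices, 4, 4)
for row in restored:
    print(row)
```

The restored map is sparse: each maximum returns to where it came from, and the following convolutions densify it. Note that the TensorFlow code later in this article upsamples with UpSampling2D rather than with stored pooling indices.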

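(4) Pixel-wise classification layer. This layer is designed for per-pixel classification. It uses a convolutional layer to change the number of channels of the decoder output tensor. For example, for n_class classes, the output of this convolutional layer must have shape (208,208,n_class).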
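How an (H, W, n_class) score tensor becomes a label map can be sketched in plain Python: apply softmax to each pixel's scores and take the most probable class. The names softmax and label_map and the toy scores below are illustrative assumptions, not part of the project code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_map(score_map):
    """score_map[row][col] holds n_class scores; return the class index per pixel."""
    labels = []
    for row in score_map:
        label_row = []
        for pixel_scores in row:
            probs = softmax(pixel_scores)
            label_row.append(probs.index(max(probs)))
        labels.append(label_row)
    return labels


# A toy 2x2 "image" with 2 classes per pixel.
scores = [[[2.0, 0.5], [0.1, 1.2]],
          [[0.3, 0.2], [-1.0, 3.0]]]
labels = label_map(scores)
print(labels)
```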

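3. SegNet Experimental Results

(1) Test results on daytime and nighttime samples from the CamVid dataset. SegNet's predictions are visibly closer to the ground truth, so SegNet performs better on this dataset than the networks it is compared with.

(2) The two tables below show SegNet's performance on the CamVid and SUN RGB-D datasets. SegNet's overall accuracy is better than that of the other segmentation networks.

(3) The table below shows that SegNet keeps good accuracy while still holding an advantage in inference time and memory usage.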

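II. Building SegNet with TF2.0 for Semantic Segmentation

Only the core code of this project is explained below. I have uploaded the complete code to my GitHub; feel free to download it if you need it, and stars are welcome!

1. Preparing the Dataset

For how to create a semantic segmentation dataset and its labels, see: "labelme installation and usage tutorial: building your own semantic segmentation dataset". Once the dataset is ready, use the make_txt script to save the file names of all images and their corresponding labels. The code is as follows: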

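# coding:utf-8
import os

imgs_path = '/home/fmc/WX/Segmentation/SegNet-tf2/dataset/jpg'  # directory containing the images

# Write one "image;label" line per image, e.g. "0001.jpg;0001.png".
# The file is opened once; the with-block closes it, so no explicit close() is needed.
with open("train.txt", "a") as f:
    for file_name in sorted(os.listdir(imgs_path)):
        print(file_name)
        stem = os.path.splitext(file_name)[0]
        f.write(file_name + ';' + stem + '.png\n')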
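The training code later slices a list called lines into a training part and a validation part. A minimal sketch of how such lines could be parsed and split is shown here; the helper names, the 10% validation ratio, and the fixed seed are my assumptions for illustration, not values taken from the project.

```python
import random


def parse_annotation_line(line):
    """Split 'image.jpg;label.png' into (image_name, label_name)."""
    image_name, label_name = line.strip().split(';')
    return image_name, label_name


def split_train_val(lines, val_ratio=0.1, seed=0):
    """Shuffle reproducibly and return (train_lines, val_lines).

    val_ratio and seed are assumed values for this sketch.
    """
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    num_val = int(len(lines) * val_ratio)
    num_train = len(lines) - num_val
    return lines[:num_train], lines[num_train:]


# Stand-in for the content of train.txt.
lines = ["%04d.jpg;%04d.png\n" % (i, i) for i in range(20)]
train_lines, val_lines = split_train_val(lines)
print(len(train_lines), len(val_lines))
print(parse_annotation_line(train_lines[0]))
```

In a real run, lines would come from reading train.txt with readlines().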

2. Building the Network

(1) Encoder network

import tensorflow as tf
from tensorflow.keras import layers, optimizers

def vggnet_encoder(input_height=416, input_width=416, pretrained='imagenet'):

    img_input = tf.keras.Input(shape=(input_height, input_width, 3))

    # 416,416,3 -> 208,208,64
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
    f1 = x

    # 208,208,64 -> 104,104,128
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
    f2 = x

    # 104,104,128 -> 52,52,256
    x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
    f3 = x

    # 52,52,256 -> 26,26,512
    x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)
    f4 = x

    # 26,26,512 -> 13,13,512
    x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)
    f5 = x

    return img_input, [f1, f2, f3, f4, f5]

(2) Decoder network and pixel-wise classification layer

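# Assumed data layout: channels_last, matching the (H, W, C) shapes in the comments
IMAGE_ORDERING = 'channels_last'

# Decoder
def decoder(feature_input, n_classes, n_upSample):
    # feature_input is the output feature map of the fourth VGG convolutional block
    # 26,26,512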
    output = (layers.ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(feature_input)
    output = (layers.Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(output)
    output = (layers.BatchNormalization())(output)

    # One UpSampling2D; height and width are now 1/8 of the input
    # 52,52,256
    output = (layers.UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(output)
    output = (layers.ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(output)
    output = (layers.Conv2D(256, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(output)
    output = (layers.BatchNormalization())(output)

    # One UpSampling2D per iteration (a single iteration when n_upSample=3); height and width are now 1/4 of the input
    # 104,104,128
    for _ in range(n_upSample - 2):
        output = (layers.UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(output)
        output = (layers.ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(output)
        output = (layers.Conv2D(128, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(output)
        output = (layers.BatchNormalization())(output)

    # One UpSampling2D; height and width are now 1/2 of the input
    # 208,208,64
    output = (layers.UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(output)
    output = (layers.ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(output)
    output = (layers.Conv2D(64, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(output)
    output = (layers.BatchNormalization())(output)

    # Pixel-wise classification layer
    # The output is now h_input/2, w_input/2, n_classes
    # 208,208,2
    output = layers.Conv2D(n_classes, (3, 3), padding='same', data_format=IMAGE_ORDERING)(output)

    return output

(3) Overall structure

# The SegNet semantic segmentation network
def SegNet(input_height=416, input_width=416, n_classes=2, n_upSample=3, encoder_level=3):

    img_input, features = vggnet_encoder(input_height=input_height, input_width=input_width)
    feature = features[encoder_level]  # (26,26,512)
    output = decoder(feature, n_classes, n_upSample)

    # Flatten the spatial dimensions: (H/2, W/2, n_classes) -> (H/2 * W/2, n_classes).
    # n_classes is used instead of a hard-coded 2 so other class counts also work.
    output = layers.Reshape(((input_height // 2) * (input_width // 2), n_classes))(output)
    output = layers.Softmax()(output)

    model = tf.keras.Model(img_input, output)

    return model

3. Compiling and Training the Model

(1) Compiling the model

model.compile(loss=loss_function,  # cross-entropy loss
              optimizer=optimizers.Adam(learning_rate=1e-3),  # optimizer
              metrics=['accuracy'])  # evaluation metric
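The loss_function passed to compile is not shown in this excerpt. As a rough sketch of what a per-pixel cross-entropy computes, assuming one-hot labels and softmax outputs flattened to one vector per pixel, the following plain-Python version may help; the function name and the eps clamp are my own choices, not the project's.

```python
import math


def pixelwise_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over pixels.

    y_true: one one-hot vector per pixel.
    y_pred: one predicted probability vector per pixel.
    eps clamps probabilities away from 0 so log() stays finite (assumed value).
    """
    total = 0.0
    for target, probs in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(y_true)


y_true = [[1, 0], [0, 1]]          # two pixels, two classes
y_pred = [[0.9, 0.1], [0.2, 0.8]]  # predicted probabilities
loss = pixelwise_cross_entropy(y_true, y_pred)
print(round(loss, 4))
```

Only the probability assigned to the true class of each pixel contributes, so confident correct predictions drive the loss toward zero.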

(2) Training the model

# Start training
# Note: Model.fit_generator is deprecated since TF 2.1; Model.fit accepts generators with the same arguments
model.fit_generator(generate_arrays_from_file(lines[:num_train], batch_size),  # training set
                        steps_per_epoch=max(1, num_train // batch_size),  # number of steps per epoch
                        validation_data=generate_arrays_from_file(lines[num_train:], batch_size),  # validation set
                        validation_steps=max(1, num_val // batch_size),
                        epochs=50,
                        initial_epoch=0,
                        callbacks=[checkpoint_period, reduce_lr, early_stopping])  # callbacks
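generate_arrays_from_file is not listed in this excerpt. The batching logic of such a generator can be sketched as below; batch_generator, load_pair, and fake_load_pair are hypothetical names, and the stand-in loader returns file names where a real one would return decoded image and label arrays.

```python
def batch_generator(lines, batch_size, load_pair):
    """Yield (images, labels) batches forever, cycling through lines.

    load_pair is a caller-supplied function (hypothetical here) that turns one
    'image;label' line into an (image, label) pair.
    """
    i = 0
    n = len(lines)
    while True:
        images, labels = [], []
        for _ in range(batch_size):
            image, label = load_pair(lines[i])
            images.append(image)
            labels.append(label)
            i = (i + 1) % n  # wrap around at the end of the list
        yield images, labels


def fake_load_pair(line):
    # Stand-in loader: returns the file names instead of decoded arrays.
    image_name, label_name = line.strip().split(';')
    return image_name, label_name


gen = batch_generator(["a.jpg;a.png", "b.jpg;b.png", "c.jpg;c.png"], 2, fake_load_pair)
first = next(gen)
second = next(gen)
print(first)
print(second)
```

Because a generator like this never terminates, the training call has to be told how many batches make up one epoch, which is what steps_per_epoch and validation_steps do.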

4. Test Results

Each image in the img_test folder is tested in turn, and the segmentation result is saved to the img_out folder.

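import copy
import numpy as np
from PIL import Image

# imgs, model, WIDTH, HEIGHT, NCLASSES and class_colors are defined elsewhere in the project
for jpg in imgs: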

    img = Image.open("./img_test/"+jpg)
    old_img = copy.deepcopy(img)
    orininal_h = np.array(img).shape[0]
    orininal_w = np.array(img).shape[1]

    img = img.resize((WIDTH,HEIGHT))
    img = np.array(img)
    img = img/255
    img = img.reshape(-1,HEIGHT,WIDTH,3)

    pr = model.predict(img)[0]
    pr = pr.reshape((int(HEIGHT/2), int(WIDTH/2), NCLASSES)).argmax(axis=-1)

    seg_img = np.zeros((int(HEIGHT/2), int(WIDTH/2), 3))
    colors = class_colors

    for c in range(NCLASSES):
        seg_img[:,:,1] += ((pr[:,: ] == c )*( colors[c][1] )).astype('uint8')

    # Image.fromarray converts the array to an image
    seg_img = Image.fromarray(np.uint8(seg_img)).resize((orininal_w, orininal_h))
    # Blend the two images into one
    image = Image.blend(old_img, seg_img, 0.3)
    image.save("./img_out/"+jpg)
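The coloring and blending steps in the loop above can be illustrated on single pixels. This is a sketch with made-up colors and pixel values; blend_pixel uses the weighted average image1 * (1 - alpha) + image2 * alpha that Image.blend is documented to compute, with rounding added here for readability.

```python
def colorize(label_map, colors):
    """Map each class index to its RGB color."""
    return [[colors[c] for c in row] for row in label_map]


def blend_pixel(p1, p2, alpha):
    """Blend two RGB pixels as p1 * (1 - alpha) + p2 * alpha."""
    return tuple(round(a * (1 - alpha) + b * alpha) for a, b in zip(p1, p2))


colors = [(0, 0, 0), (0, 255, 0)]  # hypothetical palette: background black, target green
labels = [[0, 1],
          [1, 0]]
seg = colorize(labels, colors)

photo_pixel = (200, 101, 50)       # made-up pixel from the original photo
blended = blend_pixel(photo_pixel, seg[0][1], 0.3)
print(blended)
```

With alpha = 0.3 the original photo dominates, and pixels predicted as the target class pick up a tint of the class color.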

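The final test results are shown below:

Resources

SegNet paper explained: Deep Learning, Semantic Segmentation (1): SegNet Paper Explained

Chinese: TensorFlow WeChat official account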