Understanding How a CNN Convolution Layer Works

This article uses the TensorFlow deep learning framework to demonstrate how a CNN convolution layer works. First, import the dependencies and check the TensorFlow version.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
In [2]:
tf.__version__
Out[2]:
'2.7.4'

Single-channel 2D convolution

Define the weight matrix

In [3]:
W = np.array([[0, 0, -1], [0, 1, 0], [-2, 0, 2]], dtype=np.int32)
W
Out[3]:
array([[ 0,  0, -1],
       [ 0,  1,  0],
       [-2,  0,  2]], dtype=int32)

Define the bias term

In [4]:
b = np.array([1], dtype=np.int32)
b
Out[4]:
array([1], dtype=int32)

Define a Sequential model that contains a single 2D convolution layer, and initialize the layer's parameters with W and b. The input_shape is supplied so that the model summary can be printed. The summary shows a total of 10 parameters, which is exactly the number of values in W and b combined (3 × 3 weights plus 1 bias). Since padding defaults to 'valid', the 7 × 7 input produces a 5 × 5 output.

In [5]:
model = models.Sequential(
    [
        layers.Conv2D(
            input_shape=(7, 7, 1),
            filters=1,
            kernel_size=[3, 3],
            kernel_initializer=tf.constant_initializer(W),
            bias_initializer=tf.constant_initializer(b),
        )
    ]
)
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d (Conv2D)             (None, 5, 5, 1)           10

=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
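As a sanity check, the 10 parameters reported by the summary can be counted by hand: each filter holds kernel_height × kernel_width × in_channels weights plus one bias. A minimal sketch of that arithmetic (not part of the original notebook):

kernel_params = 3 * 3 * 1 * 1       # kernel height * kernel width * input channels * filters
bias_params = 1                     # one bias per filter
print(kernel_params + bias_params)  # 10, matching Total params above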

Testing the convolution operation

Simulate a single-channel image, such as a grayscale image. Before passing it to the model, its dimensions must be expanded, because the layer expects an input of at least rank 4: the first dimension is the batch and the last is the channel.

In [6]:
image = np.array(
    [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 1, 2, 1, 0],
        [0, 0, 2, 2, 0, 1, 0],
        [0, 1, 1, 0, 2, 1, 0],
        [0, 0, 2, 1, 1, 0, 0],
        [0, 2, 1, 1, 2, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ],
    dtype=np.int32,
)
image.shape
Out[6]:
(7, 7)
In [7]:
image = np.expand_dims(image, axis=0)
image.shape
Out[7]:
(1, 7, 7)
In [8]:
image = np.expand_dims(image, axis=-1)
image.shape
Out[8]:
(1, 7, 7, 1)
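The two expand_dims calls can also be written as a single indexing step. A small alternative sketch (the names image_2d and image_4d are illustrative, not from the original code):

image_2d = np.squeeze(image)                        # recover the original 7x7 array
image_4d = image_2d[np.newaxis, :, :, np.newaxis]   # add batch and channel axes in one go
image_4d.shape                                      # (1, 7, 7, 1)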

Pass the image to the model to perform one convolution, then squeeze the dimensions of the result to make it easier to display.

In [9]:
output = model(image)
tf.squeeze(output)
Out[9]:
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 6.,  5., -2.,  1.,  2.],
       [ 3.,  0.,  3.,  2., -2.],
       [ 4.,  2., -1.,  0.,  0.],
       [ 2.,  1.,  2., -1., -3.],
       [ 1.,  1.,  1.,  3.,  1.]], dtype=float32)>
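To see exactly what the layer computed, the same 5 × 5 result can be reproduced with an explicit sliding-window loop (a minimal sketch; note that Keras Conv2D performs cross-correlation, i.e. the kernel is not flipped):

manual = np.zeros((5, 5), dtype=np.int32)
for i in range(5):
    for j in range(5):
        # elementwise product of the 3x3 window with W, summed, plus the bias
        manual[i, j] = np.sum(image[0, i : i + 3, j : j + 3, 0] * W) + b[0]
manual  # identical to tf.squeeze(output) above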

Three-channel convolution

Now suppose the image is in color, e.g. with three RGB channels, and consider a convolution layer with 2 filters, each of size 3 × 3 × 3 (the kernel spans all 3 input channels). The parameter count is therefore (3 × 3 × 3 + 1) × 2 = 56. Because there are 2 filters, the last dimension of the output after convolution is 2.

In [10]:
model = models.Sequential(
    [layers.Conv2D(input_shape=(7, 7, 3), filters=2, kernel_size=[3, 3])]
)
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d_1 (Conv2D)           (None, 5, 5, 2)           56

=================================================================
Total params: 56
Trainable params: 56
Non-trainable params: 0
_________________________________________________________________
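Since there are 2 filters, the output has 2 channels, one feature map per filter. A quick shape check with random input (rgb_image is illustrative data, not from the original notebook):

rgb_image = np.random.rand(1, 7, 7, 3).astype(np.float32)  # a batch of one 7x7 "RGB" image
model(rgb_image).shape  # TensorShape([1, 5, 5, 2]): one 5x5 feature map per filter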
