Understanding How a CNN Convolution Layer Works

This article uses the TensorFlow deep learning framework to demonstrate how a CNN convolution layer works. First, import the dependencies and check the TensorFlow version.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
In [2]:
tf.__version__
Out[2]:
'2.7.4'

Single-channel 2D convolution

Define the weight matrix

In [3]:
W = np.array([[0, 0, -1], [0, 1, 0], [-2, 0, 2]], dtype=np.int32)
W
Out[3]:
array([[ 0,  0, -1],
       [ 0,  1,  0],
       [-2,  0,  2]], dtype=int32)

Define the bias term

In [4]:
b = np.array([1], dtype=np.int32)
b
Out[4]:
array([1], dtype=int32)

Define a Sequential model that contains a single 2D convolution layer, and initialize the layer's parameters with W and b. The input_shape is supplied so that the model summary can be printed. The summary shows a total of 10 parameters, which is exactly the number of values in W and b combined (3 × 3 weights plus 1 bias). Since padding defaults to 'valid', the 7 × 7 input produces a 5 × 5 output.

In [5]:
model = models.Sequential(
    [
        layers.Conv2D(
            input_shape=(7, 7, 1),
            filters=1,
            kernel_size=[3, 3],
            kernel_initializer=tf.constant_initializer(W),
            bias_initializer=tf.constant_initializer(b),
        )
    ]
)
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d (Conv2D)             (None, 5, 5, 1)           10

=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
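As a sanity check, the 10 parameters reported by the summary can be counted by hand: each filter holds kernel_height × kernel_width × in_channels weights plus one bias. A minimal sketch of that arithmetic (not part of the original notebook):

kernel_params = 3 * 3 * 1 * 1       # kernel height * kernel width * input channels * filters
bias_params = 1                     # one bias per filter
print(kernel_params + bias_params)  # 10, matching Total params above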

Testing the convolution operation

Simulate a single-channel image, such as a grayscale image. Before passing it to the model, its dimensions must be expanded, because the layer expects an input of at least rank 4: the first dimension is the batch and the last is the channel.

In [6]:
image = np.array(
    [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 1, 2, 1, 0],
        [0, 0, 2, 2, 0, 1, 0],
        [0, 1, 1, 0, 2, 1, 0],
        [0, 0, 2, 1, 1, 0, 0],
        [0, 2, 1, 1, 2, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ],
    dtype=np.int32,
)
image.shape
Out[6]:
(7, 7)
In [7]:
image = np.expand_dims(image, axis=0)
image.shape
Out[7]:
(1, 7, 7)
In [8]:
image = np.expand_dims(image, axis=-1)
image.shape
Out[8]:
(1, 7, 7, 1)
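The two expand_dims calls can also be written as a single indexing step. A small alternative sketch (the names image_2d and image_4d are illustrative, not from the original code):

image_2d = np.squeeze(image)                        # recover the original 7x7 array
image_4d = image_2d[np.newaxis, :, :, np.newaxis]   # add batch and channel axes in one go
image_4d.shape                                      # (1, 7, 7, 1)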

Pass the image to the model to perform one convolution, then squeeze the dimensions of the result to make it easier to display.

In [9]:
output = model(image)
tf.squeeze(output)
Out[9]:
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 6.,  5., -2.,  1.,  2.],
       [ 3.,  0.,  3.,  2., -2.],
       [ 4.,  2., -1.,  0.,  0.],
       [ 2.,  1.,  2., -1., -3.],
       [ 1.,  1.,  1.,  3.,  1.]], dtype=float32)>
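To see exactly what the layer computed, the same 5 × 5 result can be reproduced with an explicit sliding-window loop (a minimal sketch; note that Keras Conv2D performs cross-correlation, i.e. the kernel is not flipped):

manual = np.zeros((5, 5), dtype=np.int32)
for i in range(5):
    for j in range(5):
        # elementwise product of the 3x3 window with W, summed, plus the bias
        manual[i, j] = np.sum(image[0, i : i + 3, j : j + 3, 0] * W) + b[0]
manual  # identical to tf.squeeze(output) above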

Three-channel convolution

Now suppose the image is in color, e.g. with three RGB channels, and consider a convolution layer with 2 filters, each of size 3 × 3 × 3 (the kernel spans all 3 input channels). The parameter count is therefore (3 × 3 × 3 + 1) × 2 = 56. Because there are 2 filters, the last dimension of the output after convolution is 2.

In [10]:
model = models.Sequential(
    [layers.Conv2D(input_shape=(7, 7, 3), filters=2, kernel_size=[3, 3])]
)
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d_1 (Conv2D)           (None, 5, 5, 2)           56

=================================================================
Total params: 56
Trainable params: 56
Non-trainable params: 0
_________________________________________________________________
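Since there are 2 filters, the output has 2 channels, one feature map per filter. A quick shape check with random input (rgb_image is illustrative data, not from the original notebook):

rgb_image = np.random.rand(1, 7, 7, 3).astype(np.float32)  # a batch of one 7x7 "RGB" image
model(rgb_image).shape  # TensorShape([1, 5, 5, 2]): one 5x5 feature map per filter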
