ResNet50 Architecture: Image Classification with ResNet50
I downloaded an image dataset from the lifelong competition. The goal is to classify objects photographed under different lighting conditions and from different viewpoints; each image contains a single object, and there are 51 classes in total (knife, stapler, cup, spoon, etc.). So I decided to use ResNet50 for the image classification and to study the principles behind ResNet along the way.
Paper reading: Residual Learning
A few sample images from the dataset:
Before ResNet
In theory, adding more layers to a neural network should let it extract more complex features, but experiments show that deeper networks run into the degradation problem: as network depth increases, the training error actually goes up instead of down.
Figure taken from the paper Deep Residual Learning for Image Recognition
Residual Learning
If we keep stacking new layers to build a deeper network, an extreme case is that the added layers learn nothing at all and simply perform an identity mapping. Residual learning proposes a structure that, compared with earlier designs, introduces a shortcut connection so that the stacked layers only have to learn the residual term. Learning the residual is easier than learning the original mapping directly: if the learned residual is 0, the block reduces to an identity mapping, so at the very least the network's performance will not get worse.
Figure: a building block
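To make the idea concrete, here is a minimal sketch of a residual connection in TensorFlow (my own illustration of the figure above, not the implementation used later in this post; the two plain 3x3 convolutions stand in for an arbitrary residual function F):

    # Minimal sketch of y = f(F(x) + x): the stacked layers only need to learn the residual F(x).
    import tensorflow as tf

    def tiny_residual_block(x, channels, scope):
        # F(x): two 3x3 convolutions (illustrative; the real bottleneck blocks come later)
        with tf.variable_scope(scope):
            f = tf.layers.conv2d(x, channels, 3, padding="same", activation=tf.nn.relu, name="conv1")
            f = tf.layers.conv2d(f, channels, 3, padding="same", activation=None, name="conv2")
        # shortcut connection: add the input back, then apply the activation
        return tf.nn.relu(f + x)

If the convolutions end up learning F(x) = 0, the block simply passes x through (up to the final ReLU), which is exactly the identity-mapping intuition described above.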
Residual learning has less content to learn, so the learning task is easier. From a mathematical point of view, a residual unit can be written as

$$y_l = h(x_l) + F(x_l, W_l), \qquad x_{l+1} = f(y_l)$$

where $x_l$ and $x_{l+1}$ denote the input and output of the $l$-th residual unit (note that each residual unit generally contains a multi-layer structure), $F$ is the residual function, i.e. the learned residual, $h(x_l) = x_l$ denotes the identity mapping, and $f$ is the ReLU activation. Based on the equations above, the feature learned from a shallow layer $l$ to a deeper layer $L$ is

$$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)$$

Using the chain rule, the gradient in the backward pass is

$$\frac{\partial\,\mathrm{loss}}{\partial x_l} = \frac{\partial\,\mathrm{loss}}{\partial x_L} \cdot \frac{\partial x_L}{\partial x_l} = \frac{\partial\,\mathrm{loss}}{\partial x_L} \cdot \left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i)\right)$$

The first factor, $\frac{\partial\,\mathrm{loss}}{\partial x_L}$, is the gradient of the loss reaching layer $L$; the 1 inside the parentheses shows that the shortcut mechanism propagates the gradient without loss, while the other term, the residual gradient, has to pass through layers with weights, so that part of the gradient is not passed on directly. The residual gradients will hardly ever all happen to be -1, and even when they are small, the presence of the 1 keeps the gradient from vanishing. That is why residual learning is easier. Note that the derivation above is not a rigorous proof.
The content above is excerpted (slightly abridged) from: 你必须要知道CNN模型:ResNet - ⼩⼩将的⽂章 - 知乎
(I will fill in the full mathematical derivation behind ResNet when I have time.)
You can also take a look at another expert's interpretation of ResNet from a different angle: 对ResNet本质的⼀些思考 - 黄⼆⼆的⽂章 - 知乎
Network Architecture
The ResNet network takes VGG19 as a reference and modifies it, adding residual units through the shortcut mechanism, as shown in Figure 5. The main changes are that ResNet directly uses stride-2 convolutions for downsampling and replaces the fully connected layers with a global average pooling layer. An important design principle of ResNet is that when the feature map size is halved, the number of feature maps is doubled, which keeps the per-layer complexity constant. As Figure 5 shows, compared with a plain network, ResNet adds a shortcut between every two layers, which forms the residual learning; the dashed lines indicate places where the number of feature maps changes. Figure 5 shows a 34-layer ResNet; deeper networks can also be built, as listed in Table 1. From the table, the 18-layer and 34-layer ResNets perform residual learning across two layers, while the deeper networks perform residual learning across three layers, with convolution kernels of 1x1, 3x3, and 1x1. One thing worth noting is that the number of feature maps in the hidden (bottleneck) layers is relatively small: 1/4 of the number of output feature maps.
ResNet structure
Different residual units
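For orientation, the 50-layer variant built later in this post follows the stage layout of Table 1, summarized here as a small Python structure of my own (for reference only, not code from the original post):

    # ResNet-50 bottleneck stages from Table 1: 3 + 4 + 6 + 3 = 16 blocks of 3 conv layers each,
    # plus the initial conv and the final fc layer, giving roughly 50 weighted layers.
    # Each tuple: (bottleneck channels of the 1x1/3x3 convs, output channels of the last 1x1, number of blocks)
    resnet50_stages = [
        (64,  256,  3),   # conv2_x
        (128, 512,  4),   # conv3_x
        (256, 1024, 6),   # conv4_x
        (512, 2048, 3),   # conv5_x
    ]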
Code blocks:
1 Convolution block
import tensorflow as tf

def conv_op(x, name, n_out, training, useBN, kh=3, kw=3, dh=1, dw=1, padding="SAME", activation=tf.nn.relu):
    '''
    x: input tensor
    kh, kw: convolution kernel size
    n_out: number of output channels
    dh, dw: strides
    name: name of the op
    '''
    n_in = x.get_shape()[-1].value
    with tf.name_scope(name) as scope:
        w = tf.get_variable(scope + "w", shape=[kh, kw, n_in, n_out], dtype=tf.float32,
                            initializer=tf.contrib.layers.xavier_initializer_conv2d())
        b = tf.get_variable(scope + "b", shape=[n_out], dtype=tf.float32,
                            initializer=tf.constant_initializer(0.01))
        conv = tf.nn.conv2d(x, w, [1, dh, dw, 1], padding=padding)
        z = tf.nn.bias_add(conv, b)
        if useBN:
            z = tf.layers.batch_normalization(z, trainable=training)
        if activation:
            z = activation(z)
    return z
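A quick usage sketch for conv_op (the input shape and names here are illustrative, not from the original code):

    # A 3x3, stride-1 convolution with BN and ReLU; SAME padding keeps the spatial size.
    demo_x = tf.placeholder(tf.float32, [None, 224, 224, 3], name="demo_input")
    demo_y = conv_op(demo_x, "demo_conv", n_out=64, training=True, useBN=True, kh=3, kw=3, dh=1, dw=1)
    print(demo_y.get_shape())   # expected: (?, 224, 224, 64)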
2 Max pooling and average pooling layers
def max_pool_op(x, name, kh=2, kw=2, dh=2, dw=2, padding="SAME"):
    return tf.nn.max_pool(x,
                          ksize=[1, kh, kw, 1],
                          strides=[1, dh, dw, 1],
                          padding=padding,
                          name=name)

def avg_pool_op(x, name, kh=2, kw=2, dh=2, dw=2, padding="SAME"):
    return tf.nn.avg_pool(x,
                          ksize=[1, kh, kw, 1],
                          strides=[1, dh, dw, 1],
                          padding=padding,
                          name=name)
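A quick shape check for the pooling helpers (illustrative shapes, assuming the functions above):

    # With the default 2x2 kernel and stride 2, the spatial size is halved and the channels are unchanged.
    demo_feat = tf.placeholder(tf.float32, [None, 56, 56, 64], name="demo_feat")
    demo_pooled = max_pool_op(demo_feat, "demo_pool")
    print(demo_pooled.get_shape())   # expected: (?, 28, 28, 64)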
3 Fully connected layer
def fc_op(x, name, n_out, activation=tf.nn.relu):
    n_in = x.get_shape()[-1].value
    with tf.name_scope(name) as scope:
        w = tf.get_variable(scope + "w", shape=[n_in, n_out],
                            dtype=tf.float32,
                            initializer=tf.contrib.layers.xavier_initializer())
        b = tf.get_variable(scope + "b", shape=[n_out], dtype=tf.float32,
                            initializer=tf.constant_initializer(0.01))
        fc = tf.matmul(x, w) + b
        out = activation(fc)
    return fc, out
For classification, the network ends with a fully connected layer, which produces a [batch_size, class_number] score matrix. This result is compared with the ground truth to compute the final loss, but what is needed here is not out; what is fed into the loss is fc, the pre-activation result (using the out matrix against the ground truth actually keeps the training error from going down).
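A minimal sketch of this point (the tensor names and shapes here are illustrative):

    # The loss is computed from the raw logits (fc), not from the softmax output (out),
    # because softmax_cross_entropy_with_logits applies the softmax internally.
    demo_fc_in = tf.placeholder(tf.float32, [None, 2048], name="demo_fc_in")
    demo_labels = tf.placeholder(tf.float32, [None, 51], name="demo_fc_labels")   # one-hot ground truth
    demo_logits, demo_prob = fc_op(demo_fc_in, "demo_fc", 51, activation=tf.nn.softmax)
    demo_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=demo_logits, labels=demo_labels))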
4 res block
With the sub-blocks above in place, we can assemble the res block (the residual unit shown in the figure above).
def res_block_layers(x, name, n_out_list, change_dimension=False, block_stride=1):
    if change_dimension:
        short_cut_conv = conv_op(x, name + "_ShortcutConv", n_out_list[1], training=True, useBN=True, kh=1, kw=1,
                                 dh=block_stride, dw=block_stride,
                                 padding="SAME", activation=None)
    else:
        short_cut_conv = x

    block_conv_1 = conv_op(x, name + "_lovalConv1", n_out_list[0], training=True, useBN=True, kh=1, kw=1,
                           dh=block_stride, dw=block_stride,
                           padding="SAME", activation=tf.nn.relu)
    block_conv_2 = conv_op(block_conv_1, name + "_lovalConv2", n_out_list[0], training=True, useBN=True, kh=3, kw=3,
                           dh=1, dw=1,
                           padding="SAME", activation=tf.nn.relu)
    block_conv_3 = conv_op(block_conv_2, name + "_lovalConv3", n_out_list[1], training=True, useBN=True, kh=1, kw=1,
                           dh=1, dw=1,
                           padding="SAME", activation=None)
    block_res = tf.add(short_cut_conv, block_conv_3)
    res = tf.nn.relu(block_res)
    return res
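A quick sanity check of the block (illustrative shapes; a dimension-changing block with stride 2):

    # Spatial size is halved and the channel count goes from 256 to 512.
    demo_block_in = tf.placeholder(tf.float32, [None, 56, 56, 256], name="demo_block_in")
    demo_block_out = res_block_layers(demo_block_in, "demo_block", [128, 512], change_dimension=True, block_stride=2)
    print(demo_block_out.get_shape())   # expected: (?, 28, 28, 512)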
5 Building the ResNet
def bulid_resNet(x, num_class, training=True, usBN=True):
    conv1 = conv_op(x, "conv1", 64, training, usBN, 3, 3, 1, 1)
    pool1 = max_pool_op(conv1, "pool1", kh=3, kw=3)

    block1_1 = res_block_layers(pool1, "block1_1", [64, 256], True, 1)
    block1_2 = res_block_layers(block1_1, "block1_2", [64, 256], False, 1)
    block1_3 = res_block_layers(block1_2, "block1_3", [64, 256], False, 1)

    block2_1 = res_block_layers(block1_3, "block2_1", [128, 512], True, 2)
    block2_2 = res_block_layers(block2_1, "block2_2", [128, 512], False, 1)
    block2_3 = res_block_layers(block2_2, "block2_3", [128, 512], False, 1)
    block2_4 = res_block_layers(block2_3, "block2_4", [128, 512], False, 1)

    block3_1 = res_block_layers(block2_4, "block3_1", [256, 1024], True, 2)
    block3_2 = res_block_layers(block3_1, "block3_2", [256, 1024], False, 1)
    block3_3 = res_block_layers(block3_2, "block3_3", [256, 1024], False, 1)
    block3_4 = res_block_layers(block3_3, "block3_4", [256, 1024], False, 1)
    block3_5 = res_block_layers(block3_4, "block3_5", [256, 1024], False, 1)
    block3_6 = res_block_layers(block3_5, "block3_6", [256, 1024], False, 1)

    block4_1 = res_block_layers(block3_6, "block4_1", [512, 2048], True, 2)
    block4_2 = res_block_layers(block4_1, "block4_2", [512, 2048], False, 1)
    block4_3 = res_block_layers(block4_2, "block4_3", [512, 2048], False, 1)

    pool2 = avg_pool_op(block4_3, "pool2", kh=7, kw=7, dh=1, dw=1, padding="SAME")
    shape = pool2.get_shape()
    fc_in = tf.reshape(pool2, [-1, shape[1].value * shape[2].value * shape[3].value])
    logits, prob = fc_op(fc_in, "fc1", num_class, activation=tf.nn.softmax)
    # what goes into the loss function are the logits, which have not passed through the activation
    return logits, prob
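A quick graph-construction sketch (the input size is illustrative; 51 matches the number of classes in my dataset):

    # Build the graph once and check that the classifier head has the expected shape.
    demo_images = tf.placeholder(tf.float32, [None, 224, 224, 3], name="demo_images")
    demo_net_logits, demo_net_probs = bulid_resNet(demo_images, num_class=51)
    print(demo_net_logits.get_shape())   # expected: (?, 51)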
6 Building the training procedure
def training_pro():
    train_data_path, train_label = loadCSVfile(train_path)      # load the image paths and image labels
    batch_index = []
    # split the training data into batches
    for i in range(train_data_path.shape[0]):
        if i % batch_size == 0:
            batch_index.append(i)
    if batch_index[-1] != train_data_path.shape[0]:
        batch_index.append(train_data_path.shape[0])

    input = tf.placeholder(dtype=tf.float32, shape=[None, img_size, img_size, channel], name="input")
    # output = tf.placeholder(dtype=tf.float32, shape=[None, num_classes], name="output")
    output = tf.placeholder(dtype=tf.int64, shape=[None], name="output")
    # one-hot encode the label values
    one_hot_labels = tf.one_hot(indices=tf.cast(output, tf.int32), depth=51)

    # what is passed to softmax_cross_entropy_with_logits must be y_pred without the activation applied
    y_pred, _ = bulid_resNet(input, num_classes)
    y_pred = tf.reshape(y_pred, shape=[-1, num_classes])
    tf.add_to_collection('output_layer', y_pred)

    # loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_pred, labels=output))
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_pred, labels=one_hot_labels))    # this API does three things: 1. y_pred -> softmax  2. y -> one_hot  3. loss = y*log(y_pred)
    tf.summary.scalar('loss', loss)

    # this part computes the accuracy: first get the index of the largest value
    # accuracy