An RNN-Based Classifier for the MNIST Dataset

This post uses TensorFlow to implement a classifier for the MNIST handwritten-digit dataset based on a recurrent neural network (RNN).


1. Hyperparameter Definitions

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# hyperparameter definitions
LEARNING_RATE = 0.001
TRAINING_ITER = 10000
BATCH_SIZE = 128

INPUT_DIMS = 28
HIDDEN_DIMS = 128
CLASSES_NUM = 10

TIME_STEPS = 28

For the hyperparameters above:

INPUT_DIMS : the dimensionality of one input step; here 28, i.e. the number of pixels in one row of the image

HIDDEN_DIMS : the dimensionality of the state inside the RNN cell, i.e. the number of hidden units

CLASSES_NUM : the number of output classes; 10 in total

TIME_STEPS : the length of the whole sequence, set here to the number of rows in each image
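To make the relationship between these numbers concrete, here is a minimal sketch (NumPy only; the variable names are just for illustration) of how one flattened 784-pixel image becomes a sequence of TIME_STEPS rows with INPUT_DIMS pixels each:

import numpy as np

image = np.arange(784, dtype=np.float32)  # one flattened 28x28 MNIST image

# row-major reshape: TIME_STEPS rows of INPUT_DIMS pixels each
sequence = image.reshape(28, 28)          # (TIME_STEPS, INPUT_DIMS)

x_t = sequence[0]                         # the input at step t=0: one image row
print(sequence.shape, x_t.shape)          # (28, 28) (28,)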

2. Network Structure

    This section implements the forward pass of the RNN.

# initializers for the weights and biases, reused several times below
def weight_init(shape):
    return tf.Variable(tf.random_normal(shape=shape, stddev=0.1))


def bias_init(shape):  # initialize the biases to a small but non-zero value
    return tf.Variable(tf.zeros(shape=shape) + 0.01)


# build the network structure and run the forward pass
def rnn_inference(x):
    weight_xa = weight_init([INPUT_DIMS, HIDDEN_DIMS])
    bias_xa = bias_init([HIDDEN_DIMS])
    weight_ay = weight_init([HIDDEN_DIMS, CLASSES_NUM])
    bias_ay = bias_init([CLASSES_NUM])

    x = tf.reshape(x, [-1, INPUT_DIMS])
    x_to_a = tf.matmul(x, weight_xa) + bias_xa
    x_to_a = tf.reshape(x_to_a, [-1, TIME_STEPS, HIDDEN_DIMS])

    rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=HIDDEN_DIMS)
    # state_init = rnn_cell.zero_state()
    # explicit state initialization is optional; dynamic_rnn creates a
    # zero initial state itself when dtype is given
    output, last_state = tf.nn.dynamic_rnn(rnn_cell, x_to_a, dtype=tf.float32)
    # use only the output of the last time step for classification;
    # return raw logits here, since the loss function used later
    # (softmax_cross_entropy_with_logits_v2) applies softmax itself
    logits = tf.matmul(output[:, -1, :], weight_ay) + bias_ay
    return logits

Note that the RNN cell at every time step shares the same weights and biases, much like the convolution kernels in a CNN, so we only need to define weight_xa, bias_xa, weight_ay and bias_ay. From the recurrence of an RNN, the state update is:

\begin{align*}
a^{<t>} &= f\left( W_f \cdot \left[ a^{<t-1>}, x^{<t>} \right] + b_f \right) \\
        &= f\left( a^{<t-1>} W_{aa} + x^{<t>} W_{xa} + b_f \right)
\end{align*}

In this example the state dimension is HIDDEN_DIMS. $W_f$ splits into $W_{xa}$ and $W_{aa}$, and we only have to define $W_{xa}$ ourselves; as I understand it, TensorFlow's cell already maintains and updates its internal variables (including $W_{aa}$) on its own. It follows that $W_{xa}$ has shape $[\,input\_dims,\ hidden\_dims\,]$, so we first reshape the input x to $[\,batch\_size \times image\_rows,\ input\_dims\,]$, compute $x\_to\_a = x^{<t>} W_{xa}$, and then reshape the result into the input format TensorFlow's RNN cells expect, namely $[\,batch\_size,\ time\_steps,\ hidden\_dims\,]$. For details on TensorFlow's rnn_cell, dynamic_rnn() and rnn_cell.call(), see the official documentation.
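The reshape-project-reshape pipeline can be verified with a quick shape trace; this is a sketch with a made-up batch size of 4 and dummy zero tensors, not part of the model itself:

import tensorflow as tf

x = tf.zeros([4, 784])                       # (batch, 28*28)
w_xa = tf.zeros([28, 128])                   # (INPUT_DIMS, HIDDEN_DIMS)

x = tf.reshape(x, [-1, 28])                  # (batch*28, INPUT_DIMS)
x_to_a = tf.matmul(x, w_xa)                  # (batch*28, HIDDEN_DIMS)
x_to_a = tf.reshape(x_to_a, [-1, 28, 128])   # (batch, TIME_STEPS, HIDDEN_DIMS)

print(x_to_a.shape)                          # (4, 28, 128)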

The output is:

\begin{align*}
y^{<t>} = g\left( W_g \cdot a^{<t>} + b_g \right)
\end{align*}
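In the code above, $W_g$ corresponds to weight_ay and $b_g$ to bias_ay. Only the hidden state of the last time step, output[:, -1, :], is projected to the 10 class scores, since the label describes the whole image rather than any single row.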

 

3. Training

# training function
def train():
    # load the MNIST data
    mnist_data = input_data.read_data_sets("MNIST_data", one_hot=True)
    print(mnist_data.train.images.shape)

    # placeholders for the inputs and the training labels
    x_in = tf.placeholder(dtype=tf.float32, shape=[None, 784], name="x_inputs")
    y_label = tf.placeholder(dtype=tf.float32, shape=[None, CLASSES_NUM], name="y_labels")

    y_out = rnn_inference(x_in)
    # cross-entropy loss
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_label, logits=y_out))
    correct_pred = tf.equal(tf.argmax(y_label, 1), tf.argmax(y_out, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    # optimize with Adam
    optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(loss)

    init_op = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init_op)
        for i in range(TRAINING_ITER):
            batch_x, batch_label = mnist_data.train.next_batch(BATCH_SIZE)
            sess.run(
                optimizer,
                feed_dict={
                    x_in: batch_x,
                    y_label: batch_label
                })
            if i % 100 == 0:
                train_loss = loss.eval(feed_dict={x_in: batch_x, y_label: batch_label})
                test_accuracy = accuracy.eval(feed_dict={x_in: mnist_data.test.images,
                                                         y_label: mnist_data.test.labels})
                print("steps {0} , loss : {1} , accuracy : {2}".format(i, train_loss, test_accuracy))
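For completeness, a standard entry point makes the script directly runnable (this is an addition; it assumes the two imports shown in section 1):

if __name__ == "__main__":
    train()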

 

4. Training Results

steps 0 , loss : 2.299897 , accuracy : 0.1486
steps 100 , loss : 1.7092526 , accuracy : 0.7546
steps 200 , loss : 1.5534915 , accuracy : 0.9055
steps 300 , loss : 1.5672985 , accuracy : 0.9162
steps 400 , loss : 1.5232389 , accuracy : 0.9295
steps 500 , loss : 1.4930712 , accuracy : 0.9482
steps 600 , loss : 1.520176 , accuracy : 0.9547
steps 700 , loss : 1.497025 , accuracy : 0.9545
steps 800 , loss : 1.4983183 , accuracy : 0.9562
steps 900 , loss : 1.5114753 , accuracy : 0.9644
steps 1000 , loss : 1.498097 , accuracy : 0.9687
steps 1100 , loss : 1.4864463 , accuracy : 0.9672
steps 1200 , loss : 1.5137687 , accuracy : 0.9637
steps 1300 , loss : 1.4695554 , accuracy : 0.9712
steps 1400 , loss : 1.4954376 , accuracy : 0.9706
steps 1500 , loss : 1.4733143 , accuracy : 0.9724
steps 1600 , loss : 1.4818668 , accuracy : 0.9722
steps 1700 , loss : 1.4771671 , accuracy : 0.9669
steps 1800 , loss : 1.4712167 , accuracy : 0.9741
steps 1900 , loss : 1.4700866 , accuracy : 0.9742
steps 2000 , loss : 1.4893398 , accuracy : 0.9721
steps 2100 , loss : 1.48666 , accuracy : 0.9742
steps 2200 , loss : 1.4840587 , accuracy : 0.9748
steps 2300 , loss : 1.488863 , accuracy : 0.9765
steps 2400 , loss : 1.4716485 , accuracy : 0.9735
steps 2500 , loss : 1.4644128 , accuracy : 0.9749


5. Conclusion

As the results show, training converges quickly: roughly 1,500 gradient-descent steps are enough to reach 97% test accuracy. After the full 10,000 iterations, however, the final accuracy only reaches about 98%, still a fair distance behind the roughly 99.2% achievable with a CNN. Two possible improvements: 1) replace the basic cell with an LSTM (long short-term memory) cell, as in the appendix; 2) use a multi-layer RNN, i.e. a deep recurrent neural network, as sketched below.
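A minimal sketch of improvement 2), stacking basic cells with tf.nn.rnn_cell.MultiRNNCell; the helper name deep_rnn_cell and the choice of two layers are illustrative assumptions, not part of the original code:

def deep_rnn_cell(num_layers=2):
    # stack several BasicRNNCells into one deep (multi-layer) cell;
    # the result can be passed to tf.nn.dynamic_rnn unchanged
    cells = [tf.nn.rnn_cell.BasicRNNCell(num_units=HIDDEN_DIMS)
             for _ in range(num_layers)]
    return tf.nn.rnn_cell.MultiRNNCell(cells)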




Appendix

Using an LSTM

def lstm_inference(x):
    weight_xa = weight_init([INPUT_DIMS, HIDDEN_DIMS])
    bias_xa = bias_init([HIDDEN_DIMS])
    weight_ay = weight_init([HIDDEN_DIMS, CLASSES_NUM])
    bias_ay = bias_init([CLASSES_NUM])

    x = tf.reshape(x, [-1, INPUT_DIMS])
    x_in = tf.matmul(x, weight_xa) + bias_xa

    x_in = tf.reshape(x_in, [-1, TIME_STEPS, HIDDEN_DIMS])
    # define the LSTM cell
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(HIDDEN_DIMS)
    output, last_state = tf.nn.dynamic_rnn(lstm_cell, x_in, dtype=tf.float32)
    # take the output of the last time step and project it to class scores;
    # as above, softmax is left to the loss function
    logits = tf.matmul(output[:, -1, :], weight_ay) + bias_ay
    return logits
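One note on BasicLSTMCell: the last_state returned by tf.nn.dynamic_rnn is an LSTMStateTuple (c, h), and for fixed-length sequences such as these, last_state.h holds the same values as output[:, -1, :], so either tensor can be fed to the output layer.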

If you spot any mistakes, please point them out. Many thanks!

