
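Journal of Shaanxi Normal University (Natural Science Edition)

Special Topic on Pattern Recognition and Intelligent Control

Image description based on GoogLeNet and double-layer GRU

ZHANG Jieqing, GUO Min*, XIAO Bing

(School of Computer Science, Shaanxi Normal University, Xi'an 710119, Shaanxi, China)

GUO Min, female, professor and doctoral supervisor, works mainly on digital signal processing and pattern recognition. E-mail: guominmail@sina.com.cn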

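Abstract:

Methods based on convolutional neural networks (CNN) and long short-term memory (LSTM) networks suffer from high computational complexity, slow convergence, and long training times. To address these problems, this paper proposes an image description model based on GoogLeNet and a double-layer GRU. The adaptive moment estimation (Adam) optimization algorithm is used in the training stage, which speeds up the convergence of the overall model and improves its performance. Experimental results on the MSCOCO and Flickr30K datasets show that the proposed model outperforms commonly used image description models: the sentences it generates are more accurate, and it exceeds other commonly used image description models on multiple evaluation metrics.

Keywords:

image description; GoogLeNet; gated recurrent unit; adaptive moment estimation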

Received:

2019-10-15

CLC number:

TP391

Document code:

Article ID:

1672-4291(2021)01-0068-06

Funding:

National Natural Science Foundation of China (61401265)

DOI:

Image description based on GoogLeNet and double-layer GRU

ZHANG Jieqing, GUO Min*, XIAO Bing

(School of Computer Science, Shaanxi Normal University, Xi'an 710119, Shaanxi, China)

Abstract:

Methods based on CNN and LSTM are currently the mainstream approach to image description. Although they have made great progress, they still suffer from high computational complexity, slow convergence, and long training times. To address these problems, an image description model based on GoogLeNet and a double-layer GRU is proposed. The Adam optimization algorithm is used in the training stage to accelerate the convergence of the overall model and improve its performance. Experimental results on the MSCOCO and Flickr30K datasets show that the proposed model outperforms commonly used image description models: the sentences it generates are more accurate, and it exceeds other commonly used models on multiple evaluation metrics.
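As a rough illustration of what a double-layer GRU computes, the sketch below stacks two standard GRU cells so that the hidden state of the first layer is the input of the second. It is not the authors' implementation: the abstract does not say how the two layers are wired, so the stacking scheme, the dimensions (8 and 16), the random initialisation, and all function names are assumptions made for illustration. In a captioning model the inputs would come from GoogLeNet image features and word embeddings, which are replaced here by random vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Return W x + U h + b as a list with one entry per hidden unit."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def init_gru(input_dim, hidden_dim, rng):
    """Small random weights for one GRU layer (hypothetical initialisation)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
        params["W" + gate] = mat(hidden_dim, input_dim)
        params["U" + gate] = mat(hidden_dim, hidden_dim)
        params["b" + gate] = [0.0] * hidden_dim
    return params


def gru_step(p, x, h_prev):
    """One time step of a standard GRU cell."""
    z = [sigmoid(v) for v in affine(p["Wz"], p["Uz"], p["bz"], x, h_prev)]
    r = [sigmoid(v) for v in affine(p["Wr"], p["Ur"], p["br"], x, h_prev)]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(v) for v in affine(p["Wh"], p["Uh"], p["bh"], x, gated)]
    # One common convention: z close to 1 favours the new candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def two_layer_gru(layer1, layer2, inputs):
    """Run a sequence through two stacked GRU layers; return top-layer states."""
    h1 = [0.0] * len(layer1["bz"])
    h2 = [0.0] * len(layer2["bz"])
    outputs = []
    for x in inputs:
        h1 = gru_step(layer1, x, h1)   # first layer reads the input
        h2 = gru_step(layer2, h1, h2)  # second layer reads the first layer's state
        outputs.append(h2)
    return outputs


rng = random.Random(0)
layer1 = init_gru(input_dim=8, hidden_dim=16, rng=rng)
layer2 = init_gru(input_dim=16, hidden_dim=16, rng=rng)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
outputs = two_layer_gru(layer1, layer2, inputs)
```

A GRU cell has two gates and no separate memory cell, so it carries fewer parameters per hidden unit than an LSTM cell, which is consistent with the abstract's concern about computational cost.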

Keywords:

image description; GoogLeNet; gated recurrent unit (GRU); adaptive moment estimation
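The abstract names adaptive moment estimation (Adam) as the training optimizer. The sketch below shows the standard Adam update rule on a toy two-parameter quadratic, purely to illustrate the mechanism. The objective, the function names `adam_minimize` and `quad_grad`, and the hyperparameter values are assumptions for illustration; the abstract does not report the settings used in the paper.

```python
import math


def adam_minimize(grad, theta, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a function with Adam, given a callable returning its gradient."""
    theta = list(theta)
    m = [0.0] * len(theta)  # running mean of gradients (first moment)
    v = [0.0] * len(theta)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            # Bias correction compensates for the zero initialisation of m and v.
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def quad_grad(theta):
    """Gradient of the toy objective (x - 3)^2 + (y + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


first_step = adam_minimize(quad_grad, [0.0, 0.0], steps=1, lr=0.05)
result = adam_minimize(quad_grad, [0.0, 0.0], steps=2000, lr=0.05)
```

Because each parameter's step is scaled by its own gradient history, the first update moves every parameter by roughly `lr` regardless of the raw gradient magnitude. This per-parameter scaling is the usual explanation for Adam's fast early convergence.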