ZHANG Jieqing, GUO Min*, XIAO Bing
(School of Computer Science, Shaanxi Normal University, Xi'an 710119, Shaanxi, China)
Abstract:
Methods that combine a convolutional neural network (CNN) with a long short-term memory (LSTM) network are currently the mainstream approach to image description. Although these methods have made great progress, they still suffer from high computational complexity, slow convergence, and long training time. To address these problems, an image description model based on GoogLeNet and a double-layer GRU is proposed. The Adam optimization algorithm is used in the training stage to accelerate the convergence of the overall model and improve its performance. Experimental results on the MSCOCO and Flickr30K datasets show that the proposed model generates more accurate sentences and outperforms commonly used image description models on multiple evaluation metrics.
Keywords:
image description; GoogLeNet; gated recurrent unit (GRU); adaptive moment estimation (Adam)
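
For readers unfamiliar with adaptive moment estimation, the update rule behind the Adam optimizer named above can be sketched in a few lines. This is a minimal illustration of the standard Adam update on a toy scalar objective, not the training code used for the proposed model. The function names `adam_step` and `minimize_quadratic`, the quadratic objective, and the hyperparameter values are assumptions chosen for the example.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero-initialized m
    v_hat = v / (1 - beta2 ** t)               # bias correction for the zero-initialized v
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps=2000, lr=0.01):
    """Minimize the toy objective f(x) = (x - 3)^2 (hypothetical stand-in for a real loss)."""
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (x - 3.0)                 # analytic gradient of the toy objective
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x


if __name__ == "__main__":
    print(minimize_quadratic())
```

Because each step is scaled by the ratio of the two bias-corrected moment estimates, the first update has magnitude close to the learning rate regardless of the raw gradient scale, which is one reason Adam is often chosen when fast early convergence is wanted.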