Article citations

Yang Xiaochun, Hou Jixiang, Zheng Han, Wang Bin. An image description generation method based on multiple attention mechanisms and external knowledge [P]. Liaoning Province: CN112784848A, 2021-05-11.

has been cited by the following article:

Article

Design of a Picture-seeing and Talking System Based on Attention Mechanism

1 College of Electrical and Automation Engineering, Nanjing Normal University, Nanjing, China


American Journal of Electrical and Electronic Engineering. 2022, Vol. 10 No. 1, 6-23
DOI: 10.12691/ajeee-10-1-2
Copyright © 2022 Science and Education Publishing

Cite this paper:
Kexin Ye, Lei Zhang, Tianran Li. Design of a Picture-seeing and Talking System Based on Attention Mechanism. American Journal of Electrical and Electronic Engineering. 2022; 10(1):6-23. doi: 10.12691/ajeee-10-1-2.

Correspondence to: Kexin Ye, College of Electrical and Automation Engineering, Nanjing Normal University, Nanjing, China. Email: 947985637@qq.com

Abstract

This paper proposes an image caption generation model based on a deep recurrent architecture. The model combines recent advances in computer vision and machine translation, and can generate natural sentences that accurately describe uncomplicated physical images. It is trained to maximize the accuracy of the target description sentence given the training image. Training relied mainly on the MSCOCO data set; in the later adjustment stage, some feature pictures that I specifically collected from the Internet were added for improvement. Test experiments on several data sets verify that the model has a basic ability to describe images accurately. The model is usually more accurate on uncomplicated physical pictures, which I have verified both qualitatively and quantitatively. The final system takes a qualifying image as input and outputs a natural language sentence describing the main content of the image.

Keywords