In Artificial Intelligence (AI), automatically generating a description of an image's content draws on both computer vision and Natural Language Processing (NLP). This work designs a generative neural model that builds on techniques from computer vision and machine translation and outputs natural-language sentences describing an image in spoken form. The proposed model combines three components: VGG16, a pre-trained Convolutional Neural Network (CNN) from the Visual Geometry Group; Long Short-Term Memory (LSTM), an extension of the Recurrent Neural Network (RNN); and Google Text-to-Speech (gTTS). The CNN extracts features from the image, the LSTM generates the sentence, and gTTS converts the sentence to speech. The model is tested on the Flickr_8k dataset. Experimental results show that, for a given input image, the model frequently produces approximately correct descriptions.
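The sentence-generation stage can be illustrated with a minimal sketch. The abstract does not state how words are selected at each step, so greedy decoding (always taking the highest-scoring next word) is an assumption here, as are the `startseq`/`endseq` marker tokens and the names `greedy_caption` and `toy_scorer`. The scorer is a hypothetical stand-in for the trained LSTM, which in the proposed model would be conditioned on the VGG16 image features.

```python
from typing import Callable, Dict, List, Sequence

# Assumed marker tokens; the source does not name the ones it uses.
START, END = "startseq", "endseq"

# A scorer maps (image features, words so far) to a score for each candidate next word.
Scorer = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_caption(features: Sequence[float], next_word_scores: Scorer,
                   max_len: int = 20) -> str:
    """Build a caption one word at a time, always taking the top-scoring word."""
    words = [START]
    for _ in range(max_len):
        scores = next_word_scores(features, words)
        best = max(scores, key=scores.get)
        if best == END:
            break
        words.append(best)
    # Drop the start marker before returning the sentence.
    return " ".join(words[1:])


def toy_scorer(features: Sequence[float], words: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the LSTM: scores depend only on the last word."""
    table = {
        "startseq": {"a": 0.9, "dog": 0.1},
        "a": {"dog": 0.8, "endseq": 0.2},
        "dog": {"runs": 0.7, "endseq": 0.3},
        "runs": {"endseq": 1.0},
    }
    return table[words[-1]]


caption = greedy_caption([0.0, 0.0, 0.0, 0.0], toy_scorer)
print(caption)
```

In a full pipeline, the returned string would then be handed to a text-to-speech step such as gTTS to produce the audio output; that step is omitted here because it requires network access.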
Volume 12 | Issue 2