ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering

    xiaoxiao  2021-03-25

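      Applications and challenges of VQA: VQA is of great importance to many applications, including image retrieval, early education, and navigation for blind people, as it provides user-specific information through the understanding of both natural language questions and image content. VQA is a highly challenging problem because it requires the machine to understand natural language queries, extract semantic content from images, and relate the two in a unified framework.

      This paper proposes the question-guided attention map (QAM). The QAM is treated as latent information, so the maps do not need explicit labels for every possible query. A QAM is generated by searching the spatial image feature map for visual features that match the semantics of the question. The search is carried out by a configurable convolutional neural network, which convolves the feature map with a configurable convolutional kernel. The configurable convolutional kernel is a special convolution kernel that is generated by projecting the question vector into the visual feature space.

      Framework diagram:

      The model consists of four main modules: image feature extraction, question feature extraction, attention extraction, and answer generation.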
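      The attention step described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the paper's implementation: the projection matrix W, the bias b, the 3x3 kernel size, the sigmoid applied to the projected question vector, the spatial softmax, and all tensor sizes are assumptions chosen for the example, and the function name question_guided_attention is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def question_guided_attention(feature_map, question_vec, W, b, ksize=3):
    """Sketch of a question-guided attention map (QAM).

    feature_map:  (C, H, W) spatial image features
    question_vec: (D,) question embedding
    W, b:         assumed projection into the visual feature space,
                  shapes (C * ksize * ksize, D) and (C * ksize * ksize,)
    """
    channels, height, width = feature_map.shape

    # Project the question vector into the visual feature space and
    # reshape it into a convolution kernel (the "configurable" kernel).
    kernel = sigmoid(W @ question_vec + b).reshape(channels, ksize, ksize)

    # Zero-pad so the attention map keeps the spatial size of the input.
    pad = ksize // 2
    padded = np.pad(feature_map, ((0, 0), (pad, pad), (pad, pad)))

    # Slide the kernel over the feature map (cross-correlation, as in
    # most deep learning libraries) to score each spatial location.
    scores = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            window = padded[:, i:i + ksize, j:j + ksize]
            scores[i, j] = np.sum(window * kernel)

    # Spatial softmax (assumed normalisation) so the map sums to 1.
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()

# Toy usage with random data; all sizes are illustrative.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 5, 5))          # C=8, H=W=5
question = rng.standard_normal(16)             # D=16
W = rng.standard_normal((8 * 3 * 3, 16)) * 0.1
b = np.zeros(8 * 3 * 3)
att = question_guided_attention(fmap, question, W, b)
print(att.shape)
```

      Because the kernel is computed from the question, the same image yields a different attention map for each question, and no attention labels are needed: the map is learned only through the answer loss.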

