    Image Captioning [3]

    Speech Recognition and Semantic Processing
    • 188****7632

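      This time I'll introduce a fairly recent paper:

      RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
      Overall, the paper makes two contributions; for me, the second one was the more inspiring.
      The model is shown below: [image not preserved]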

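      I didn't understand it on the first read, since I had forgotten most of what I knew about BERT.

      I only got it after reading it again carefully yesterday.

      Put simply, the second contribution is still a modification of the Q/K/V attention computation.

      Instead of predicting the next word directly from h_t, the model lets h_t choose for itself whether to rely on visual or textual information.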

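      The core code is as follows: [image not preserved]

      Understanding this one point is enough. I hope it works well in my experiments.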
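
The idea in the post, that h_t itself decides how much weight goes to visual versus language information, can be sketched in a few lines. This is a minimal single-head illustration and not the paper's or the poster's code: the function name `adaptive_attention`, the toy vectors, and the omission of learned Q/K/V projections are all assumptions made for illustration. It appends one language vector to the visual features and runs ordinary scaled dot-product attention over the combined set, so the softmax weight on the last slot can be read as how much the current step leans on language rather than on the image.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_attention(h_t, visual_feats, language_signal):
    """Illustrative single-step attention over visual features plus one language slot.

    h_t             : decoder hidden state used as the query (list of d floats)
    visual_feats    : n visual feature vectors, each a list of d floats
    language_signal : one language vector of d floats (hypothetical stand-in
                      for the output of a language model)
    Returns (context, visual_weights, language_weight).
    Assumption: inputs are already projected; no learned W_q / W_k / W_v here.
    """
    d = len(h_t)
    # The language signal competes with the visual features as one extra candidate.
    candidates = visual_feats + [language_signal]
    scores = [dot(h_t, c) / math.sqrt(d) for c in candidates]
    weights = softmax(scores)
    # Context vector: weighted sum over all candidates (visual and language).
    context = [sum(w * c[j] for w, c in zip(weights, candidates)) for j in range(d)]
    return context, weights[:-1], weights[-1]


# Toy example: the query aligns with the language vector, not with the image features.
h_t = [1.0, 0.0]
visual = [[0.0, 1.0], [0.0, -1.0]]
language = [4.0, 0.0]
context, visual_weights, language_weight = adaptive_attention(h_t, visual, language)
print("visual weights:", visual_weights)
print("language weight:", language_weight)
```

In this toy input the query is orthogonal to both visual vectors, so most of the attention mass lands on the language slot; with a query that matches a visual feature the balance would shift the other way. A real model would use multiple heads, learned projections, and batched tensors.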

      Reference:
      https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_RSTNet_Captioning_With_Adaptive_Attention_on_Visual_and_Non-Visual_Words_CVPR_2021_paper.html
