    Image Captioning [2]

    Speech Recognition and Semantic Processing
    • 188****7632

      Hierarchical Attention Network for Image Captioning
      AAAI 2019
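      This paper was cited by another work I was reading, so I took a quick look to see whether it offers any useful ideas.
      The model architecture is as follows: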

      First, it introduces four kinds of image features.

      At first I assumed the patch features were ViT-style patches, but they are really just grid features.
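To make the distinction concrete: ViT cuts the raw image into pixel patches and embeds each one, whereas grid features are simply the spatial cells of a CNN's output feature map, with each cell treated as one feature vector. Below is a minimal sketch of that flattening step. The toy sizes and the name `to_grid_features` are assumptions for illustration only and are not taken from the paper.

```python
# Hypothetical toy sizes; a real CNN feature map might be e.g. 2048 x 7 x 7.
C, H, W = 3, 2, 2

# feature_map[c][i][j]: value of channel c at grid cell (i, j).
# The values are synthetic so that each cell is easy to recognise.
feature_map = [[[c * 100 + i * 10 + j for j in range(W)] for i in range(H)]
               for c in range(C)]


def to_grid_features(fmap):
    """Flatten a C x H x W map into H*W grid vectors, each of length C."""
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


grid = to_grid_features(feature_map)
print(len(grid), len(grid[0]))  # number of grid cells, channels per cell
```

Each entry of `grid` then plays the role of one "patch" that the attention module can attend over.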

      The overall architecture is as follows:

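      I looked for quite a while but could not find where V_g comes from. I may simply have been short on time, since I need to finish modifying my code today.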

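      Next is the attention. Looking at it now, this attention is not the QKV (query/key/value) form either; the interaction between features is fairly simple.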
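For readers unfamiliar with non-QKV attention, here is a minimal sketch of one common simple form, additive (soft) attention, where each feature is scored against the decoder state by a one-layer MLP. This is an assumption about what "simple" means here, not the paper's exact formula; the names `W_v`, `W_h`, `w` and all sizes are hypothetical.

```python
import math
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(features, hidden, W_v, W_h, w):
    """score_i = w . tanh(W_v v_i + W_h h); context = sum_i alpha_i v_i."""
    proj_h = matvec(W_h, hidden)
    scores = []
    for v in features:
        proj_v = matvec(W_v, v)
        act = [math.tanh(a + b) for a, b in zip(proj_v, proj_h)]
        scores.append(sum(wi * ai for wi, ai in zip(w, act)))
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(a * v[d] for a, v in zip(weights, features))
               for d in range(dim)]
    return weights, context


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
D, K, N = 4, 3, 5  # feature dim, attention dim, number of features (toy values)
features = rand_matrix(N, D)
hidden = [random.uniform(-1, 1) for _ in range(D)]
W_v, W_h = rand_matrix(K, D), rand_matrix(K, D)
w = [random.uniform(-1, 1) for _ in range(K)]

weights, context = additive_attention(features, hidden, W_v, W_h, w)
print(weights)
```

Unlike QKV attention, there are no separate key and value projections: the raw features are both scored and averaged directly.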

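      One more point: the paper uses gates to balance the contributions of the different features.
      A CVPR 2021 paper focuses on exactly this, choosing between text features and image features. I did not understand it at the time, so I will write about it next time.
      I am leaving a placeholder here for now and will fill it in later.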
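The gating idea mentioned above can be sketched as follows: a sigmoid gate computed from both inputs decides, per dimension, how much of each feature to keep. This is one common form of gated fusion and only an assumption about the mechanism; the function `gated_fusion` and its parameters `W_g` and `bias` are hypothetical and not taken from either paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(a, b, W_g, bias):
    """Blend two same-sized feature vectors with an element-wise gate.

    gate = sigmoid(W_g @ [a; b] + bias)
    fused = gate * a + (1 - gate) * b
    """
    concat = a + b  # list concatenation, i.e. [a; b]
    gate = [sigmoid(sum(w * x for w, x in zip(row, concat)) + b_i)
            for row, b_i in zip(W_g, bias)]
    fused = [g * x + (1.0 - g) * y for g, x, y in zip(gate, a, b)]
    return gate, fused


a = [1.0, 2.0]  # e.g. one kind of image feature (toy values)
b = [3.0, 4.0]  # e.g. another kind of feature (toy values)

# With zero weights and zero bias the gate is 0.5 everywhere,
# so the result is the plain average of the two inputs.
W_zero = [[0.0] * 4 for _ in range(2)]
gate, fused = gated_fusion(a, b, W_zero, [0.0, 0.0])
print(gate, fused)  # [0.5, 0.5] [2.0, 3.0]
```

In a trained model `W_g` and `bias` would be learned, letting the gate lean toward whichever feature is more useful for the current word.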

      References:
      https://ojs.aaai.org//index.php/AAAI/article/view/4924
