    [1 Results] What boy doesn't want a pre-trained model of his own (shedding tears of poverty)

    • 183****0229

      Training an ELECTRA-small model on the openwebtext dataset with rotary position embedding added, using a single RTX 3090.
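
For readers new to rotary position embedding, here is a minimal sketch of the idea in pure Python. It follows the interleaved-pair formulation from the RoFormer paper and is not the training code used in this post; the function names `rotary_embed` and `dot`, the base of 10000, and the toy vectors are assumptions chosen for illustration.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate each consecutive pair of `vec` by an angle that grows with `position`.

    Hypothetical helper for illustration only; real implementations work on
    batched tensors and cache the sin/cos tables.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(dim // 2):
        # Pair i rotates with frequency base^(-2i/dim).
        theta = position * base ** (-2.0 * i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * cos_t - y * sin_t)
        out.append(x * sin_t + y * cos_t)
    return out


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


# Toy query/key vectors (made-up values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (query is 2 positions after the key), different absolute positions.
score_a = dot(rotary_embed(q, 3), rotary_embed(k, 1))
score_b = dot(rotary_embed(q, 12), rotary_embed(k, 10))
```

The two scores agree up to floating-point error, which is the property that makes the attention score depend only on the relative distance between tokens. The rotation also leaves the length of each vector unchanged.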

      1. Reproduction results (dev dataset)

      Model                              CoLA   SST    MRPC   STS    QQP    MNLI   QNLI   RTE    Avg.
      ELECTRA-Small-OWT (original)       56.8   88.3   87.4   86.8   88.3   78.9   87.9   68.5   80.36
      ELECTRA-RoFormer-Small-OWT (this)  55.76  90.45  87.3   86.64  89.61  81.17  88.85  62.71  80.31
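
As a quick sanity check on the table, the sketch below recomputes the Avg. column as the plain mean of the eight task scores and lists the per-task difference between the two models. The numbers are copied from the table; the variable names are just for illustration.

```python
tasks = ["CoLA", "SST", "MRPC", "STS", "QQP", "MNLI", "QNLI", "RTE"]

# Scores copied from the results table above.
scores = {
    "ELECTRA-Small-OWT (original)": [56.8, 88.3, 87.4, 86.8, 88.3, 78.9, 87.9, 68.5],
    "ELECTRA-RoFormer-Small-OWT (this)": [55.76, 90.45, 87.3, 86.64, 89.61, 81.17, 88.85, 62.71],
}

# Unweighted mean over the eight tasks.
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

# Per-task difference: this model minus the original.
original = scores["ELECTRA-Small-OWT (original)"]
roformer = scores["ELECTRA-RoFormer-Small-OWT (this)"]
deltas = {task: r - o for task, o, r in zip(tasks, original, roformer)}

for name, avg in averages.items():
    print(f"{name}: {avg:.4f}")
for task, delta in deltas.items():
    print(f"{task}: {delta:+.2f}")
```

The averages are nearly identical, but the per-task view shows that the gains on SST, QQP, MNLI and QNLI are offset mainly by a lower RTE score.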
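      3. Training details
      • Dataset: openwebtext
      • Training batch size (batch_size): 256
      • Learning rate (lr): 5e-4
      • Maximum sequence length (max_seqlen): 128
      • Total training steps: 500k (50W)
      • GPU: RTX 3090
      • Total training time: 55 h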
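
To put these settings in perspective, here is a rough back-of-the-envelope calculation of the training budget. It assumes that "50W" means 500,000 steps, that every sequence fills all 128 tokens (so the token count is an upper bound), and that the 55 hours are pure training wall-clock time. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical container for the hyperparameters listed above."""
    batch_size: int = 256
    max_seqlen: int = 128
    total_steps: int = 500_000   # "50W" read as 50 x 10,000
    hours: float = 55.0

    @property
    def sequences_seen(self) -> int:
        return self.total_steps * self.batch_size

    @property
    def max_tokens_seen(self) -> int:
        # Upper bound: assumes no sequence is shorter than max_seqlen.
        return self.sequences_seen * self.max_seqlen

    @property
    def steps_per_second(self) -> float:
        return self.total_steps / (self.hours * 3600)


run = TrainingRun()
print(f"sequences seen:  {run.sequences_seen:,}")
print(f"tokens (max):    {run.max_tokens_seen:,}")
print(f"steps / second:  {run.steps_per_second:.2f}")
print(f"sequences / sec: {run.steps_per_second * run.batch_size:.0f}")
```

Under these assumptions the run processes 128 million sequences, at most about 16.4 billion tokens, at roughly 2.5 steps per second on the single GPU.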
      4. wandb logs for roformer_electra
      • Pre-training log
      • GLUE fine-tuning log

      https://huggingface.co/junnyu/electra_small_discriminator
      https://huggingface.co/junnyu/roformer_small_discriminator

      Tips:
      • To be continued.
      • Alice_恒源云

        For the details of how the model was trained, head over to the next post, which is even more interesting:

        [2 Data download + model training] What boy doesn't want a pre-trained model of his own (shedding tears of poverty)
