    【3】 Introduction to transformers


      The previous section showed this diagram of the full pipeline (tokenizer → model → post-processing).

      This time we focus mainly on the model part.
      Through the tokenizer we obtained the encoded inputs (input_ids and attention_mask).

      We can load a pretrained model with from_pretrained and feed the tokenizer's output into the model, as sketched below.

      There are many other model classes available as well.
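      A minimal sketch of these two steps, assuming the DistilBERT SST-2 checkpoint used in the course (the checkpoint name and example sentence here are only illustrative):

      from transformers import AutoTokenizer, AutoModel

      checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
      tokenizer = AutoTokenizer.from_pretrained(checkpoint)
      model = AutoModel.from_pretrained(checkpoint)

      raw_inputs = ["I've been waiting for a HuggingFace course my whole life."]
      # The tokenizer returns input_ids and attention_mask as PyTorch tensors
      inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

      # Unpack the tokenizer output directly into the model
      outputs = model(**inputs)
      print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, 768)
      print(model)                            # prints the architecture shown below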

      Printing the model, the AutoModel architecture looks like this:

      DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (1): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (2): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (3): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (4): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (5): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
          )
        )
      )
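
      To get a model with a task head instead of bare hidden states, we can load the same (hypothetical) checkpoint with AutoModelForSequenceClassification, reusing the tokenized inputs from above:

      from transformers import AutoModelForSequenceClassification

      model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

      outputs = model(**inputs)
      print(outputs.logits.shape)  # (batch_size, 2) -- one raw score per label
      print(model)                 # prints the architecture shown below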
      

      The architecture of AutoModelForSequenceClassification, by contrast, looks like this:

      DistilBertForSequenceClassification(
        (distilbert): DistilBertModel(
          (embeddings): Embeddings(
            (word_embeddings): Embedding(30522, 768, padding_idx=0)
            (position_embeddings): Embedding(512, 768)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (transformer): Transformer(
            (layer): ModuleList(
              (0): TransformerBlock(
                (attention): MultiHeadSelfAttention(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (q_lin): Linear(in_features=768, out_features=768, bias=True)
                  (k_lin): Linear(in_features=768, out_features=768, bias=True)
                  (v_lin): Linear(in_features=768, out_features=768, bias=True)
                  (out_lin): Linear(in_features=768, out_features=768, bias=True)
                )
                (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (ffn): FFN(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (lin1): Linear(in_features=768, out_features=3072, bias=True)
                  (lin2): Linear(in_features=3072, out_features=768, bias=True)
                )
                (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              )
              (1): TransformerBlock(
                (attention): MultiHeadSelfAttention(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (q_lin): Linear(in_features=768, out_features=768, bias=True)
                  (k_lin): Linear(in_features=768, out_features=768, bias=True)
                  (v_lin): Linear(in_features=768, out_features=768, bias=True)
                  (out_lin): Linear(in_features=768, out_features=768, bias=True)
                )
                (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (ffn): FFN(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (lin1): Linear(in_features=768, out_features=3072, bias=True)
                  (lin2): Linear(in_features=3072, out_features=768, bias=True)
                )
                (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              )
              (2): TransformerBlock(
                (attention): MultiHeadSelfAttention(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (q_lin): Linear(in_features=768, out_features=768, bias=True)
                  (k_lin): Linear(in_features=768, out_features=768, bias=True)
                  (v_lin): Linear(in_features=768, out_features=768, bias=True)
                  (out_lin): Linear(in_features=768, out_features=768, bias=True)
                )
                (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (ffn): FFN(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (lin1): Linear(in_features=768, out_features=3072, bias=True)
                  (lin2): Linear(in_features=3072, out_features=768, bias=True)
                )
                (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              )
              (3): TransformerBlock(
                (attention): MultiHeadSelfAttention(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (q_lin): Linear(in_features=768, out_features=768, bias=True)
                  (k_lin): Linear(in_features=768, out_features=768, bias=True)
                  (v_lin): Linear(in_features=768, out_features=768, bias=True)
                  (out_lin): Linear(in_features=768, out_features=768, bias=True)
                )
                (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (ffn): FFN(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (lin1): Linear(in_features=768, out_features=3072, bias=True)
                  (lin2): Linear(in_features=3072, out_features=768, bias=True)
                )
                (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              )
              (4): TransformerBlock(
                (attention): MultiHeadSelfAttention(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (q_lin): Linear(in_features=768, out_features=768, bias=True)
                  (k_lin): Linear(in_features=768, out_features=768, bias=True)
                  (v_lin): Linear(in_features=768, out_features=768, bias=True)
                  (out_lin): Linear(in_features=768, out_features=768, bias=True)
                )
                (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (ffn): FFN(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (lin1): Linear(in_features=768, out_features=3072, bias=True)
                  (lin2): Linear(in_features=3072, out_features=768, bias=True)
                )
                (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              )
              (5): TransformerBlock(
                (attention): MultiHeadSelfAttention(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (q_lin): Linear(in_features=768, out_features=768, bias=True)
                  (k_lin): Linear(in_features=768, out_features=768, bias=True)
                  (v_lin): Linear(in_features=768, out_features=768, bias=True)
                  (out_lin): Linear(in_features=768, out_features=768, bias=True)
                )
                (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (ffn): FFN(
                  (dropout): Dropout(p=0.1, inplace=False)
                  (lin1): Linear(in_features=768, out_features=3072, bias=True)
                  (lin2): Linear(in_features=3072, out_features=768, bias=True)
                )
                (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              )
            )
          )
        )
        (pre_classifier): Linear(in_features=768, out_features=768, bias=True)
        (classifier): Linear(in_features=768, out_features=2, bias=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      

      The difference lies in the last three modules (pre_classifier, classifier, and dropout), which form the sequence classification head.
      We can save the model with save_pretrained.
      It writes two files: the model configuration and the weights.
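      For example (the directory name here is arbitrary):

      # save_pretrained writes the configuration and the weights to the directory:
      #   config.json        -- model configuration
      #   pytorch_model.bin  -- model weights (model.safetensors in newer versions)
      model.save_pretrained("saved_model")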


      Running the model before and after saving and reloading gives identical results.
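      A quick way to verify this, reloading from the directory saved above and comparing the logits (eval() disables dropout so the comparison is exact):

      import torch
      from transformers import AutoModelForSequenceClassification

      reloaded = AutoModelForSequenceClassification.from_pretrained("saved_model")

      model.eval()
      reloaded.eval()
      with torch.no_grad():
          original_logits = model(**inputs).logits
          reloaded_logits = reloaded(**inputs).logits

      print(torch.allclose(original_logits, reloaded_logits))  # True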

      Reference:
      https://huggingface.co/course/chapter2/3?fw=pt
