Mixed Precision Training: Accelerating Training with Automatic Mixed Precision (AMP)

189****6672:


The basic code flow for using AMP:

import torch
from torch import optim

# AMP relies on the Tensor Core architecture, so the model's parameters must be CUDA tensors
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)
amp = True  # toggles whether AMP is used
# a GradScaler object performs gradient scaling automatically
scaler = torch.cuda.amp.GradScaler() if amp else None

for epoch in range(epochs):
    model.train()
    for input, target in dataloader:
        optimizer.zero_grad()
        if amp and scaler is not None:
            # run the forward pass inside the autocast-enabled region
            with torch.cuda.amp.autocast():
                # autocast-eligible ops run in FP16 during the forward pass
                output = model(input)
                loss = loss_fn(output, target)
            # scale the loss, then backward to obtain scaled (FP16) gradients
            scaler.scale(loss).backward()
            # step() first unscales the gradients automatically, then updates the
            # parameters; steps whose gradients contain inf/NaN are skipped
            scaler.step(optimizer)
            # update the scale factor for the next iteration
            scaler.update()
        else:
            output = model(input)
            loss = loss_fn(output, target)
            loss.backward()
            optimizer.step()

    if (epoch + 1) % val_interval == 0:  # how often to run validation
        model.eval()
        with torch.no_grad():
            for input, target in valdataloader:
                if amp:
                    with torch.cuda.amp.autocast():
                        output = model(input)
                else:
                    output = model(input)
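As a quick sanity check on what autocast actually does: eligible ops such as matrix multiplies produce FP16 outputs inside the region, while the inputs stay FP32. A minimal standalone sketch (the tensor shapes here are arbitrary):

import torch

a = torch.randn(8, 16, device="cuda")
b = torch.randn(16, 32, device="cuda")

with torch.cuda.amp.autocast():
    c = a @ b  # matmul is autocast-eligible, so it runs in FP16

print(a.dtype)  # torch.float32 -- the inputs are unchanged
print(c.dtype)  # torch.float16 -- the output is half precision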
              
      

Memory usage comparison

Using AMP saves CUDA memory.

[Screenshot: CUDA memory usage without AMP]

[Screenshot: CUDA memory usage with AMP]
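If you want to reproduce this comparison on your own model, PyTorch already tracks peak allocations per device. A minimal sketch, where train_one_epoch is a hypothetical helper wrapping the training loop above:

import torch

torch.cuda.reset_peak_memory_stats()
train_one_epoch(model, dataloader)  # hypothetical helper wrapping the loop above
peak_mib = torch.cuda.max_memory_allocated() / 1024 ** 2
print(f"peak CUDA memory: {peak_mib:.1f} MiB")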

Loss and validation metric curves

Using AMP speeds up convergence.

Total training time and time per epoch

Using AMP greatly shortens the training time.
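To measure the speedup on your own setup, time each epoch with explicit CUDA synchronization, since kernel launches are asynchronous and the CPU clock would otherwise stop before the GPU finishes. A minimal sketch, reusing the hypothetical train_one_epoch helper from above:

import time
import torch

torch.cuda.synchronize()  # make sure pending GPU work is done before starting the clock
start = time.time()
train_one_epoch(model, dataloader)  # hypothetical helper wrapping the loop above
torch.cuda.synchronize()  # wait for queued kernels to finish before reading the clock
print(f"epoch time: {time.time() - start:.1f} s")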

Alice_恒源云:

Dongdong is getting more and more prolific~ let's keep improving together!
