    Gpushare.com

    Gradient-Based Optimization of a 2D Function

    Speech Recognition and Semantic Processing
    • 155****7220

      First, we define a function of two variables:
      $$
      f(x,y)=(x^2+y-11)^2+(x+y^2-7)^2
      $$
      This is Himmelblau's function, a standard benchmark designed specifically for testing how well an optimizer performs. Its plot is shown below.

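      As the figure shows, the four blue circles mark the function's local minima; the plot on the right shows the same function in the plane. Although there are four distinct minima, the function value at each of them is 0: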

      • $f(3.0, 2.0) = 0.0$
      • $f(-2.805118, 3.131312) = 0.0$
      • $f(-3.779310, -3.283186) = 0.0$
      • $f(3.584428, -1.848126) = 0.0$

      We can judge an optimizer by testing whether it is able to find these four minima.
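As a quick sanity check on the four minima listed above, the following sketch evaluates the function at each of them in plain Python. The list name `minima` is only an illustrative choice. Because three of the coordinates are rounded to six decimals, the computed values should come out as very small positive numbers rather than exactly 0; only the point (3.0, 2.0) is exact.

```python
def himmelblau(x, y):
    # Same definition as the formula above
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# The four minima as quoted in the post (rounded to six decimals)
minima = [
    (3.0, 2.0),
    (-2.805118, 3.131312),
    (-3.779310, -3.283186),
    (3.584428, -1.848126),
]

values = [himmelblau(x, y) for x, y in minima]
for (x, y), val in zip(minima, values):
    print(f"f({x}, {y}) = {val:.3e}")
```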
      import torch
      import torch.nn.functional as F
      import numpy as np
      import matplotlib.pyplot as plt
      from mpl_toolkits.mplot3d import Axes3D
      
      def himmelblau(x, y):
          return (x**2 + y - 11)**2 + (x + y**2 - 7)**2
      
      x = np.arange(-6, 6, 0.1) # range of the x axis
      y = np.arange(-6, 6, 0.1) # range of the y axis
      X, Y = np.meshgrid(x, y)
      Z = himmelblau(X, Y)
      
      fig = plt.figure('himmelblau')
      ax = fig.add_subplot(projection='3d') # fig.gca(projection='3d') no longer works in recent Matplotlib releases
      ax.plot_surface(X, Y, Z)
      ax.view_init(60, -30)
      ax.set_xlabel('x')
      ax.set_ylabel('y')
      plt.show()
      

      The resulting plot:

      If the meshgrid() function is unfamiliar, this blog post explains it well and is worth a look.
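For a quick picture of what `meshgrid()` does, here is a minimal sketch on a tiny grid. The small arrays are made up for illustration; the plotting code above does the same thing on a 120 x 120 grid.

```python
import numpy as np

x = np.array([0, 1, 2])   # 3 points along the x axis
y = np.array([10, 20])    # 2 points along the y axis

# X repeats x along rows, Y repeats y along columns,
# so (X[i, j], Y[i, j]) enumerates every (x, y) pair on the grid.
X, Y = np.meshgrid(x, y)

print(X)        # [[0 1 2]
                #  [0 1 2]]
print(Y)        # [[10 10 10]
                #  [20 20 20]]
print(X.shape)  # (2, 3): one row per y value, one column per x value
```

Evaluating `himmelblau(X, Y)` on such arrays applies the formula element-wise, which is why `Z` ends up with the same shape as `X` and `Y`.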

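      Next, we use gradient-based optimization (the code below uses the Adam optimizer) to try to find a minimum of Himmelblau's function. Note that the quantity being minimized here is the function value pred itself, not an error (loss) term.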
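For reference, the gradient that automatic differentiation computes for this function can also be written out by hand. Applying the chain rule to the definition of $f$ given earlier yields the following (a derivation added here for illustration):

$$
\frac{\partial f}{\partial x} = 4x(x^2+y-11) + 2(x+y^2-7), \qquad
\frac{\partial f}{\partial y} = 2(x^2+y-11) + 4y(x+y^2-7)
$$

At the starting point $(0, 0)$ this gives $\nabla f = (-14, -22)$, so a descent step moves toward larger $x$ and $y$. At $(3, 2)$ both bracketed terms are zero, so the gradient vanishes, as expected at a minimum.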

      x = torch.tensor([0., 0.], requires_grad=True) # starting point (0, 0)
      optimizer = torch.optim.Adam([x], lr=1e-3)
      # The optimizer updates x, using a learning rate of 0.001
      
      for step in range(20000):
          pred = himmelblau(x[0], x[1])
          optimizer.zero_grad() # clear the gradients from the previous step
          pred.backward()
          optimizer.step() # one optimizer update: adjust x[0] and x[1] using the gradients
          
          if step % 2000 == 0:
              print("step{}: x={}, f(x) = {}".format(step, x.tolist(), pred.item()))
      

      Output:

      step0: x=[0.0009999999310821295, 0.0009999999310821295], f(x) = 170.0
      step2000: x=[2.3331806659698486, 1.9540694952011108], f(x) = 13.730916023254395
      step4000: x=[2.9820079803466797, 2.0270984172821045], f(x) = 0.014858869835734367
      step6000: x=[2.999983549118042, 2.0000221729278564], f(x) = 1.1074007488787174e-08
      step8000: x=[2.9999938011169434, 2.0000083446502686], f(x) = 1.5572823031106964e-09
      step10000: x=[2.999997854232788, 2.000002861022949], f(x) = 1.8189894035458565e-10
      step12000: x=[2.9999992847442627, 2.0000009536743164], f(x) = 1.6370904631912708e-11
      step14000: x=[2.999999761581421, 2.000000238418579], f(x) = 1.8189894035458565e-12
      step16000: x=[3.0, 2.0], f(x) = 0.0
      step18000: x=[3.0, 2.0], f(x) = 0.0
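The `step0` line may look odd: after one update both coordinates are almost exactly 0.001, even though the two gradient components differ. The sketch below reproduces that first step in plain Python, assuming PyTorch's default Adam hyperparameters (`betas=(0.9, 0.999)`, `eps=1e-8`). The helper names `himmelblau_grad` and `adam_step` are hypothetical, and this is a simplified illustration rather than PyTorch's actual implementation.

```python
import math

def himmelblau_grad(x, y):
    # Hand-derived partial derivatives of Himmelblau's function
    a = x**2 + y - 11
    b = x + y**2 - 7
    return 4 * x * a + 2 * b, 2 * a + 4 * y * b

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update; m and v are the running moment estimates (updated in place)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1**t)   # bias-corrected first moment
        v_hat = v[i] / (1 - beta2**t)   # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

params = [0.0, 0.0]
m, v = [0.0, 0.0], [0.0, 0.0]
grads = himmelblau_grad(*params)          # (-14.0, -22.0) at the origin
params = adam_step(params, grads, m, v, t=1)
print(params)                             # both entries are very close to 0.001
```

On the first step the bias-corrected moments reduce to `g` and `g**2`, so the update is `lr * g / (|g| + eps)`, which is roughly `lr` times the sign of the gradient regardless of its magnitude.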
      

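      As the output shows, by step 16000 the optimizer has converged to the minimum at (3.0, 2.0). Changing the starting point may change the result, for example moving it from (0, 0) to (4, 0).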
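To experiment with different starting points without needing PyTorch, the sketch below swaps Adam for plain gradient descent with a hand-derived gradient. The helper names (`himmelblau_grad`, `gradient_descent`), the step size of 0.001, and the list of starting points are assumptions chosen for illustration, so the trajectories will not match the Adam run above step for step.

```python
def himmelblau(x, y):
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(x, y):
    # Partial derivatives of Himmelblau's function, derived by the chain rule
    a = x**2 + y - 11
    b = x + y**2 - 7
    return 4 * x * a + 2 * b, 2 * a + 4 * y * b

def gradient_descent(x, y, lr=1e-3, steps=20000):
    # Plain gradient descent: repeatedly step against the gradient
    for _ in range(steps):
        gx, gy = himmelblau_grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# Hypothetical starting points to compare
starts = [(0.0, 0.0), (4.0, 0.0), (-4.0, 0.0)]
results = {start: gradient_descent(*start) for start in starts}

for start, (x, y) in results.items():
    print(f"start={start}: x={x:.6f}, y={y:.6f}, f={himmelblau(x, y):.3e}")
```

Comparing the printed coordinates with the four minima listed earlier shows which basin each starting point falls into.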
