Gpushare.com
前沿分享 (Frontier Sharing)

    • 189****6672

      Reposted from: 爱可可-爱生活 (Zhihu)
      Summary: transformation invariance and covariance contrast for self-supervised visual representation learning; masked visual pre-training for video prediction; insights into pre-training via simpler synthetic tasks; multilingual NLG benchmarking in a single line of code; the privacy onion effect of memorization; learning neuro-symbolic skills for bilevel planning; goal misgeneralization in deep reinforcement learning; robust blind face restoration with a codebook lookup Transformer; discovering and reaching goals without supervision.

      1、[CV] TiCo: Transformation Invariance and Covariance Contrast for Self-Supervised Visual Representation Learning

      J Zhu, R M. Moraes, S Karakulak, V Sobol, A Canziani, Y LeCun
      [New York University & Viasat, Inc]
      We present Transformation Invariance and Covariance Contrast (TiCo) for self-supervised visual representation learning. Similar to other recent self-supervised learning methods, our method is based on maximizing the agreement among embeddings of different distorted versions of the same image, which pushes the encoder to produce transformation invariant representations. To avoid the trivial solution where the encoder generates constant vectors, we regularize the covariance matrix of the embeddings from different images by penalizing low rank solutions. By jointly minimizing the transformation invariance loss and covariance contrast loss, we get an encoder that is able to produce useful representations for downstream tasks. We analyze our method and show that it can be viewed as a variant of MoCo [16] with an implicit memory bank of unlimited size at no extra memory cost. This makes our method perform better than alternative methods when using small batch sizes. TiCo can also be seen as a modification of Barlow Twins [35]. By connecting the contrastive and redundancy-reduction methods together, TiCo gives us new insights into how joint embedding methods work.
      https://arxiv.org/abs/2206.10698
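The joint objective in the abstract can be sketched numerically. Below is a minimal NumPy sketch of a TiCo-style loss, not the paper's exact formulation: the function name, the moving-average covariance update, and the hyperparameters `beta` and `rho` are illustrative assumptions.

```python
import numpy as np

def tico_loss(z1, z2, C_prev, beta=0.9, rho=8.0):
    """Schematic TiCo-style loss; names and hyperparameters are illustrative.

    z1, z2 : (B, D) L2-normalized embeddings of two augmented views.
    C_prev : (D, D) running estimate of the embedding covariance.
    """
    B = z1.shape[0]
    # Exponential-moving-average covariance of first-view embeddings.
    C = (1.0 - beta) * C_prev + beta * (z1.T @ z1) / B
    # Transformation-invariance term: maximize agreement between views.
    invariance = -np.mean(np.sum(z1 * z2, axis=1))
    # Covariance-contrast term: penalize embeddings concentrated along a
    # few directions, i.e. low-rank (collapsed) solutions.
    contrast = rho * np.mean(np.sum((z1 @ C) * z1, axis=1))
    return invariance + contrast, C
```

The running covariance estimate is what plays the role of the implicit, unbounded memory bank mentioned in the abstract: it summarizes past batches at the cost of a single D × D matrix.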


      2、[CV] MaskViT: Masked Visual Pre-Training for Video Prediction

      A Gupta, S Tian, Y Zhang, J Wu, R Martín-Martín, L Fei-Fei
      [Stanford University & Salesforce AI]
      The ability to predict future visual observations conditioned on past observations and motor commands can enable embodied agents to plan solutions to a variety of tasks in complex environments. This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling. Our approach, named MaskViT, is based on two simple design decisions. First, for memory and training efficiency, we use two types of window attention: spatial and spatiotemporal. Second, during training, we mask a variable percentage of tokens instead of a fixed mask ratio. For inference, MaskViT generates all tokens via iterative refinement where we incrementally decrease the masking ratio following a mask scheduling function. On several datasets we demonstrate that MaskViT outperforms prior works in video prediction, is parameter efficient, and can generate high-resolution videos (256 × 256). Further, we demonstrate the benefits of inference speedup (up to 512×) due to iterative decoding by using MaskViT for planning on a real robot. Our work suggests that we can endow embodied agents with powerful predictive models by leveraging the general framework of masked visual modeling with minimal domain knowledge.
      https://arxiv.org/abs/2206.11894
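The iterative-refinement decoding described in the abstract can be illustrated with a toy sketch. This assumes a cosine mask schedule (one common choice; the paper describes mask scheduling functions generically) and a dummy per-position confidence function; `predict_fn` and the committed token value are placeholders.

```python
import numpy as np

def cosine_schedule(t, T):
    """Fraction of tokens still masked after step t of T."""
    return np.cos(0.5 * np.pi * t / T)

def iterative_decode(num_tokens, T, predict_fn):
    """Toy iterative refinement: start fully masked, then at each step
    commit the most confident predictions until the schedule is met."""
    tokens = np.full(num_tokens, -1)            # -1 marks a masked position
    for t in range(1, T + 1):
        conf = predict_fn(tokens)               # per-position confidence (dummy here)
        n_keep_masked = int(np.floor(cosine_schedule(t, T) * num_tokens))
        masked = np.where(tokens == -1)[0]
        n_unmask = len(masked) - n_keep_masked
        if n_unmask > 0:
            order = masked[np.argsort(-conf[masked])]
            tokens[order[:n_unmask]] = 1        # stand-in for a predicted token id
    return tokens
```

Because many tokens are committed per step, T can be far smaller than the number of tokens, which is the source of the inference speedup over one-token-at-a-time autoregressive decoding.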


      3、[LG] Insights into Pre-training via Simpler Synthetic Tasks

      Y Wu, F Li, P Liang
      [Stanford University & UC Berkeley]
      Pre-training produces representations that are effective for a wide range of downstream tasks, but it is still unclear what properties of pre-training are necessary for effective gains. Notably, recent work shows that even pre-training on synthetic tasks can achieve significant gains in downstream tasks. In this work, we perform three experiments that iteratively simplify pre-training and show that the simplifications still retain much of its gains. First, building on prior work, we perform a systematic evaluation of three existing synthetic pre-training methods on six downstream tasks. We find the best synthetic pre-training method, LIME, attains an average of 67% of the benefits of natural pre-training. Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the Set function achieves 65% of the benefits, almost matching LIME. Third, we find that 39% of the benefits can be attained by using merely the parameter statistics of synthetic pre-training. We release the source code at https://github.com/felixzli/synthetic_pretraining.
      https://arxiv.org/abs/2206.10139
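One plausible reading of a Set-function pre-training task is mapping an input sequence to its unique tokens. The generator below is an illustrative sketch under that assumption, not the paper's exact specification; the function name and defaults are invented for illustration.

```python
import random

def make_set_example(vocab_size=100, seq_len=20, seed=0):
    """One (input, target) pair for a Set-style synthetic task: the target
    is the input with duplicates removed, in first-occurrence order."""
    rng = random.Random(seed)
    inputs = [rng.randrange(vocab_size) for _ in range(seq_len)]
    target = list(dict.fromkeys(inputs))  # dedupe, preserving order
    return inputs, target
```

The appeal of such a task is that it requires no natural-language data at all, yet still exercises token identity and copying, which may explain why it recovers a large share of the benefits of natural pre-training.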


      4、[CL] GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

      S Gehrmann, A Bhattacharjee…
      [Allen Institute for AI & Amazon Alexa AI & Bangladesh University of Engineering and Technology & CMU & …]
      Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
      https://arxiv.org/abs/2206.11249


      5、[LG] The Privacy Onion Effect: Memorization is Relative

      N Carlini, M Jagielski, C Zhang, N Papernot, A Terzis, F Tramer
      [Google Research]
      Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, the outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the “layer” of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack. We perform several experiments to study this effect, and understand why it occurs. The existence of this effect has various consequences. For example, it suggests that proposals to defend against memorization without training with rigorous privacy guarantees are unlikely to be effective. Further, it suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users.
      https://arxiv.org/abs/2206.10469
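The layered structure the abstract describes can be illustrated with a toy 1-D simulation. Here a point's "vulnerability" is proxied by its distance to the nearest remaining point, an assumption made purely for illustration; the paper's actual experiments use membership-inference attacks, not this proxy.

```python
import numpy as np

def vulnerability(points):
    """Toy proxy for privacy risk: distance to the nearest other point
    (isolated outliers are easiest to single out)."""
    d = np.abs(points[:, None] - points[None, :])
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1)

rng = np.random.default_rng(0)
data = np.sort(rng.normal(size=200))
v1 = vulnerability(data)

# Remove the most vulnerable "layer" (the top 10% of points).
keep = np.sort(np.argsort(v1)[: int(0.9 * len(data))])
remaining = data[keep]
v2 = vulnerability(remaining)
# Under this proxy, every surviving point is at least as exposed as
# before: a previously-safe inner layer becomes the new outer layer.
```

Removing a point can only increase its neighbors' isolation under this proxy, which mirrors the paper's observation that removing the most vulnerable points exposes previously-safe ones.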


      Several other papers worth noting:

      [RO] Learning Neuro-Symbolic Skills for Bilevel Planning

      T Silver, A Athalye, J B. Tenenbaum, T Lozano-Perez, L P Kaelbling
      [MIT]
      https://arxiv.org/abs/2206.10680


      [LG] Goal Misgeneralization in Deep Reinforcement Learning

      L Langosco, J Koch, L Sharkey, J Pfau, L Orseau, D Krueger
      [University of Cambridge & University of Tubingen & University of Edinburgh & DeepMind]
      https://arxiv.org/abs/2105.14111


      [CV] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

      S Zhou, K C.K. Chan, C Li, C C Loy
      [Nanyang Technological University]
      https://arxiv.org/abs/2206.11253


      [LG] Walk the Random Walk: Learning to Discover and Reach Goals Without Supervision

      L Mezghani, S Sukhbaatar, P Bojanowski, K Alahari
      [Meta AI & Inria]
      https://arxiv.org/abs/2206.11733

