All you need to know about Normalization in NLP

Review Adam with me

5 minute read

Published: September 28, 2021

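Adam is an adaptive optimization algorithm that combines Momentum and RMSprop: it uses momentum as the direction of the parameter update, and it keeps an exponentially weighted average of the squared gradients. Adam is widely used in deep learning and, according to Nature Index and Google Scholar, is the most-cited scientific paper of the past five years, which has earned it the joking nickname "AI paper counter".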
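To make the Momentum plus RMSprop combination concrete, here is a minimal sketch of a single Adam update in plain Python. The names `adam_step` and `minimize_quadratic` are made up for this illustration, the default hyperparameters are the commonly quoted ones (lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8), and the bias-correction step comes from the standard formulation of Adam rather than from the summary above.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on lists of floats; returns (new_params, new_m, new_v).

    Illustrative helper, not a library API. `t` is the 1-based step count.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Momentum part: exponentially weighted average of the gradients.
        m_i = beta1 * m_i + (1 - beta1) * g
        # RMSprop part: exponentially weighted average of the squared gradients.
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction, because both averages start at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Move along the momentum direction, scaled per parameter by sqrt(v_hat).
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.01):
    """Toy usage: minimize f(x, y) = (x - 3)^2 + (y + 1)^2 starting from (0, 0)."""
    params, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2 * (params[0] - 3), 2 * (params[1] + 1)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient's scale. That is one way to see the "adaptive" part of the algorithm.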

Are Pre-trained Convolutions Better than Pre-trained Transformers?

4 minute read

Published: July 15, 2021

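This paper is from ACL 2021. In contrast to the recent trend of Transformers sweeping through CV unchallenged, the CNN, long the dominant architecture in CV, is now fighting its way back into NLP. The Google Research team replaces the Transformer with convolutions in the pre-train-fine-tune paradigm and obtains results on 7 NLP tasks that are no worse than the Transformer's. The paper also raises three questions about whether the success of giant pre-trained models comes from the pre-training schemes or from the model architectures; let's read the paper with these questions in mind.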

Improve Code Quality With GitLab

2 minute read

Published: April 22, 2021

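Code quality checking plays a very important role in large engineering projects: efficient, reliable checks can substantially improve the robustness and readability of the code. Code is typically checked in two ways: manual review and tool-based automated checking.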

Data Augmentation with CoDA

4 minute read

Published: March 19, 2021

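Data augmentation has successfully improved large-scale neural-network-based models. However, most existing work targets computer vision (CV) tasks. Thanks to the way image data is structured, a dataset can be enlarged with operations such as cropping, flipping, and scaling. Because natural language is discrete, such simple transformations, which must be label-preserving while also helping the model generalize, are exceptionally hard to design for text sequences. On the model side, giant pre-trained language models rely on large amounts of compute and are fed prior knowledge from massive amounts of unlabeled text. But when they are applied to downstream tasks with few examples, the lack of data often keeps them from showing their full capability. To address this, Microsoft Dynamics 365 AI and UIUC propose CoDA in this work, looking further for effective data augmentation strategies.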

Zheyu Ye

All you need to know about Normalization in NLP

[PMLR 2015] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

[ArXiv 2016] Layer Normalization

[ECCV 2016] Identity Mappings in Deep Residual Networks

[ACL 2018] The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

[ACL 2019] Learning Deep Transformer Models for Machine Translation

[ICLR 2019] Fixup Initialization: Residual Learning Without Normalization

[NeurIPS 2019] Root Mean Square Layer Normalization

[ICML 2020] On Layer Normalization in the Transformer Architecture

[EMNLP 2020] Understanding the Difficulty of Training Transformers

[ArXiv 2020] RealFormer: Transformer Likes Residual Attention
