All You Need to Know About Normalization in NLP

[ICML 2015] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

[ArXiv 2016] Layer Normalization

[ECCV 2016] Identity Mappings in Deep Residual Networks

[ACL 2018] The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

[ACL 2019] Learning Deep Transformer Models for Machine Translation

[ICLR 2019] Fixup Initialization: Residual Learning Without Normalization

[NeurIPS 2019] Root Mean Square Layer Normalization

[ICML 2020] On Layer Normalization in the Transformer Architecture

[EMNLP 2020] Understanding the Difficulty of Training Transformers

[ArXiv 2020] RealFormer: Transformer Likes Residual Attention