[ICML 2015] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
[arXiv 2016] Layer Normalization
[ECCV 2016] Identity Mappings in Deep Residual Networks
[ACL 2018] The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
[ICLR 2019] Fixup Initialization: Residual Learning Without Normalization
[NeurIPS 2019] Root Mean Square Layer Normalization