colossalai.kernel.cuda_native.layer_norm

This code is adapted from NVIDIA apex (https://github.com/NVIDIA/apex), with some changes.