Choose the title of the paper that first proposed the "Transformer" architecture.
1. Attention Mechanisms in Neural Networks.
2. Attention Is All You Need.
3. The Transformer Model.
4. Neural Machine Translation by Jointly Learning to Align and Translate.