1. Input Token Embeddings
Shape: [2048, 12,288]
Parameters: ~614M (50,000 × 12,288)
2. Add Positional Encoding
Shape: [2048, 12,288]
Parameters: ~25M (2048 × 12,288, learned)
4. De-embedding & Softmax
Shape: [2048, 50,000]
Parameters: ~614M (12,288 × 50,000)
Output Tokens
[]