Brand-New Attention Chapter
We have added the brand-new chapter, Attention Mechanisms:
- Attention Cues
  - Attention Cues in Biology
  - Queries, Keys, and Values
  - Visualization of Attention
- Attention Pooling: Nadaraya-Watson Kernel Regression
  - Generating the Dataset
  - Average Pooling
  - Nonparametric Attention Pooling (see the sketch after this outline)
  - Parametric Attention Pooling
- Attention Scoring Functions
  - Masked Softmax Operation
  - Additive Attention
  - Scaled Dot-Product Attention (see the sketch after this outline)
- Bahdanau Attention
  - Model
  - Defining the Decoder with Attention
  - Training
- Multi-Head Attention
  - Model
  - Implementation
- Self-Attention and Positional Encoding
  - Self-Attention
  - Comparing CNNs, RNNs, and Self-Attention
  - Positional Encoding (see the sketch after this outline)
- Transformer
  - Model
  - Positionwise Feed-Forward Networks
  - Residual Connection and Layer Normalization
  - Encoder
  - Decoder
  - Training
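To give a flavor of the new chapter, here is a minimal sketch of scaled dot-product attention with a masked softmax, assuming PyTorch; the names masked_softmax and dot_product_attention are illustrative, not necessarily the book's exact API.

```python
import math
import torch

def masked_softmax(scores, valid_lens=None):
    """Softmax over the last axis, masking out positions beyond valid_lens."""
    if valid_lens is None:
        return torch.softmax(scores, dim=-1)
    # scores: (batch, no. of queries, no. of keys); valid_lens: (batch,)
    mask = (torch.arange(scores.shape[-1], device=scores.device)[None, None, :]
            >= valid_lens[:, None, None])
    # A large negative score becomes ~0 after the softmax.
    return torch.softmax(scores.masked_fill(mask, -1e6), dim=-1)

def dot_product_attention(queries, keys, values, valid_lens=None):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = queries.shape[-1]
    scores = torch.bmm(queries, keys.transpose(1, 2)) / math.sqrt(d)
    return torch.bmm(masked_softmax(scores, valid_lens), values)

# Toy usage: batch of 2, one query, ten key-value pairs of dimension 4.
q = torch.normal(0.0, 1.0, (2, 1, 4))
k = torch.normal(0.0, 1.0, (2, 10, 4))
v = torch.normal(0.0, 1.0, (2, 10, 4))
print(dot_product_attention(q, k, v, torch.tensor([2, 6])).shape)  # (2, 1, 4)
```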
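The Nadaraya-Watson section treats kernel regression as attention pooling: each test input (query) attends to the training inputs (keys) and predicts a weighted average of the training labels (values). A minimal sketch with a Gaussian kernel follows; the synthetic dataset is only an assumed toy example, not the book's exact one.

```python
import torch

# Toy training data: a noisy nonlinear function of x (assumed for illustration).
n = 50
x_train, _ = torch.sort(torch.rand(n) * 5)
y_train = 2 * torch.sin(x_train) + x_train**0.8 + torch.normal(0.0, 0.5, (n,))
x_test = torch.arange(0, 5, 0.1)

# Nonparametric attention pooling: a Gaussian kernel over query-key distances
# yields attention weights; predictions are weighted averages of the values.
diffs = x_test[:, None] - x_train[None, :]        # (no. of queries, no. of keys)
attention = torch.softmax(-diffs**2 / 2, dim=1)   # each row sums to 1
y_hat = attention @ y_train                       # weighted average of labels
print(y_hat.shape)  # torch.Size([50])
```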
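Since self-attention is permutation-invariant, the chapter also covers injecting order information into the inputs. Below is a sketch of the standard sinusoidal positional encoding, again assuming PyTorch; the class name PositionalEncoding is illustrative.

```python
import torch
from torch import nn

class PositionalEncoding(nn.Module):
    """Add fixed sine/cosine position signals to token embeddings
    (num_hiddens assumed even)."""
    def __init__(self, num_hiddens, max_len=1000):
        super().__init__()
        pe = torch.zeros(1, max_len, num_hiddens)
        pos = torch.arange(max_len, dtype=torch.float32)[:, None]
        div = torch.pow(10000.0, torch.arange(0, num_hiddens, 2,
                                              dtype=torch.float32) / num_hiddens)
        pe[0, :, 0::2] = torch.sin(pos / div)  # even dimensions
        pe[0, :, 1::2] = torch.cos(pos / div)  # odd dimensions
        self.register_buffer('pe', pe)

    def forward(self, x):  # x: (batch, seq_len, num_hiddens)
        return x + self.pe[:, :x.shape[1], :]

# Toy usage: 2 sequences of 60 tokens with 32 hidden units each.
print(PositionalEncoding(32)(torch.zeros(2, 60, 32)).shape)  # (2, 60, 32)
```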
PyTorch Adaptation Completed
We have completed the PyTorch implementations for Vol. 1 (Chapters 1–15).
Towards v1.0
The following chapters have been significantly improved for v1.0:
- Introduction
- Modern Recurrent Neural Networks
Chinese Translation
The following chapters have been translated into Chinese (d2l-zh v2 Git repo, Web preview):
- Introduction
- Preliminaries
- Linear Neural Networks
- Multilayer Perceptrons
- Deep Learning Computation
- Convolutional Neural Networks
- Modern Convolutional Neural Networks
Turkish Translation
The community is translating the book into Turkish (d2l-tr Git repo, Web preview). The first draft of Chapters 1–7 is complete.