Natural Language Processing (NLP) is one of the most important fields in Artificial Intelligence (AI). It has become crucial in the information age because most information exists as unstructured text. NLP technologies are applied almost everywhere, since people communicate mostly in language: language translation, web search, customer support, emails, forums, advertising, and radiology reports, to name a few.

There are a number of core NLP tasks and machine learning models behind NLP applications. Deep learning, a sub-field of machine learning, has recently brought a paradigm shift from traditional task-specific feature engineering to end-to-end systems and has obtained high performance across many different NLP tasks and downstream applications. Tech companies like Google, Baidu, Alibaba, Apple, Amazon, Facebook, Tencent, and Microsoft are now actively working on deep learning methods to improve their products. For example, Google recently replaced its traditional statistical machine translation and speech-recognition systems with systems based on deep learning methods.

**Optional Textbooks**

- Deep Learning by Goodfellow, Bengio, and Courville (free online)
- Machine Learning — A Probabilistic Perspective by Kevin Murphy (online)
- Natural Language Processing by Jacob Eisenstein (free online)
- Speech and Language Processing by Dan Jurafsky and James H. Martin (3rd ed. draft)

In this course, students will learn state-of-the-art deep learning methods for NLP. Through lectures and practical assignments, students will learn the necessary tricks for making their models work on practical problems. They will learn to implement, and possibly invent, their own deep learning models using available deep learning libraries such as PyTorch.

**Our Approach**

- *Thorough and Detailed*: How to write, debug, and train deep neural models from scratch.
- *State of the art*: Most lecture materials come from research in the past 1-5 years.
- *Practical*: Focus on practical techniques for training models, and on GPUs.
- *Fun*: Cover exciting new advancements in NLP (e.g., Transformer, BERT).

**Weekly Workload**

- Every two-hour lecture will be accompanied by practice problems implemented in PyTorch.
- There will be a 30-min office hour per week to discuss assignments and the project.
- There will be some invited talks from NLP researchers (see the schedule).
- Class participation will count for `5%` of the total assessment.

**Assignments (individually graded)**

- There will be three (3) assignments contributing `3 * 15% = 45%` of the total assessment. Assignments will be posted on NTU-Learn (see the schedule).

Late day policy

- 2 free late days; afterwards, `10%` off per day late
- Not accepted after 3 late days

Students will be graded individually on the assignments. They may discuss the homework with one another, but each student must submit individual write-ups and coding exercises.

**Final Project (Group work but individually graded)**

There will be a final project contributing the remaining `50%` of the total course-work assessment.

- `1–3` people per group
- Project proposal: `5%`, presentation: `10%`, report: `35%`

Instructions for the project proposal and final report will be posted on NTU-Learn (see the schedule).

The project can be done in a group or individually, depending on the student's preference. Students will be graded individually. The final project presentation will ensure each student's understanding of the project.

**Prerequisites**

- Proficiency in Python (using numpy and PyTorch). There is a lecture for those who are not familiar with Python.
- College Calculus, Linear Algebra
- Basic Probability and Statistics
- Machine Learning basics

Project final report guidelines

- Project presentation: 10-12 min/group

**Assignment 3 in**

`Invited talk` on Unsupervised MT by Xuan Phi

- Multi-task learning
- Fine-tuning for transfer learning
- Meta-learning problem
- Two views of the meta-learning problem
- Black-box meta-learning (GPT-3)
- Optimization-based meta-learning (MAML)
- Non-parametric meta-learning (prototypical nets)

Slides on adversarial attacks (prepared by Samson@Amazon)

Slides on deep generative models

`Invited talk` on **Text Generation** by Lin Xiang

**Lecture Content**

- Generative adversarial nets (GANs)
- Domain adversarial nets (DANs)
- Adversarial attacks in NLP

Defense:

- Training with adversarial examples
- Consistency regularization
- Cross-view consistency

Variational inference

Autoencoders

Variational autoencoders

Conditional VAEs

Vector Quantized VAEs

Variational Generative adversarial nets
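
To make the variational pieces above concrete, here is a minimal VAE sketch in PyTorch; the architecture and dimensions are illustrative, not the course's reference implementation:

```python
# Minimal VAE sketch (illustrative architecture and dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def elbo_loss(x_hat, x, mu, logvar):
    # Negative ELBO = reconstruction term + KL(q(z|x) || p(z))
    recon = F.binary_cross_entropy_with_logits(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

x = torch.rand(8, 784)                  # toy batch with values in [0, 1]
x_hat, mu, logvar = VAE()(x)
loss = elbo_loss(x_hat, x, mu, logvar)
```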

**Suggested Readings**

**Assignment 2 in**

**Assignment 3 out**

Project final report guidelines

**Lecture Content**

Pre-training and fine-tuning paradigm

- CoVe
- TagLM
- ELMo
- GPT
- ULMfit
- BERT (+ mBERT)
- XLM
- XL-Net
- BART (+ mBART)
- T5 (+ mT5)
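
As a concrete illustration of the pre-train/fine-tune paradigm, here is a minimal sketch assuming the Hugging Face `transformers` library; the model name, toy batch, and hyperparameters are placeholders:

```python
# Fine-tuning a pre-trained encoder on a toy classification task.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # task head is randomly initialized

batch = tokenizer(["a great movie", "a dull movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# Fine-tuning step: all pre-trained weights are updated on the task loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```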

Evaluation benchmarks

- GLUE
- SQuAD
- NER
- SuperGLUE
- XNLI

TA: Mathieu

**Suggested Readings**

**Project Proposal due**

**Lecture Content**

Seq2Seq Variants (Pointer nets, Pointer Generator Nets)

- Machine Translation
- Summarization
- Parsing
- Image/video captioning

Transformer architecture

- Self-attention
- Positional encoding
- Multi-head attention
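
A minimal sketch of single-head scaled dot-product self-attention, the core operation of the Transformer; shapes and projection matrices below are illustrative:

```python
# Scaled dot-product self-attention, single head (illustrative shapes).
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model); w_*: (d_model, d_k) projections
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)  # each position attends to all others
    return weights @ v

x = torch.randn(2, 5, 64)
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # (2, 5, 64)
```

Multi-head attention runs several such heads in parallel on lower-dimensional projections and concatenates the results.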

**Practical exercise with Pytorch**

TA: Bosheng Ding

**Suggested Readings**

**Project Proposal due**

**Lecture Content**

- Information bottleneck issue with vanilla Seq2Seq
- Attention to the rescue
- Details of attention mechanism
- Attention variants
- Morphology in MT
- Subword level models
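
How attention relieves the information bottleneck can be sketched in a few lines; this uses simple dot-product scoring, and all shapes are illustrative:

```python
# One attention step in a Seq2Seq decoder (dot-product scoring).
import torch

def attend(decoder_state, encoder_states):
    # decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)
    alpha = torch.softmax(scores, dim=1)   # attention distribution over source
    context = torch.bmm(alpha.unsqueeze(1), encoder_states).squeeze(1)
    return context, alpha  # context replaces the single bottleneck vector

dec = torch.randn(3, 128)
enc = torch.randn(3, 7, 128)
context, alpha = attend(dec, enc)          # (3, 128), (3, 7)
```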

**Practical exercise with Pytorch**

**Suggested Readings**

**Lecture Content**

Machine translation

- Early days (1950s)
- Statistical machine translation or SMT (1990-2010)
- Alignment in SMT
- Decoding in SMT

Neural machine translation or NMT (2014 - )

Encoder-decoder model for NMT

Advantages and disadvantages of NMT

Greedy vs. beam-search decoding
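
A toy contrast of the two decoding strategies; the `step` function below is a hypothetical stand-in for a real decoder that returns next-token log-probabilities:

```python
# Greedy vs. beam-search decoding over a deterministic toy "model".
import torch

def step(prefix, vocab_size=5):
    # Toy stand-in for a decoder: log-probs depend only on the prefix.
    torch.manual_seed(hash(tuple(prefix)) % (2 ** 31))
    return torch.log_softmax(torch.randn(vocab_size), dim=0)

def greedy(max_len=4):
    prefix = []
    for _ in range(max_len):
        prefix.append(int(step(prefix).argmax()))  # locally best token
    return prefix

def beam_search(beam_size=3, max_len=4):
    beams = [([], 0.0)]  # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            logp = step(prefix)
            for tok, lp in enumerate(logp.tolist()):
                candidates.append((prefix + [tok], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]  # highest-scoring hypothesis found

# Beam search can recover a higher-probability sequence than greedy decoding.
print(greedy(), beam_search())
```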

MT evaluation

Other applications of Seq2Seq

**Suggested Readings**

**Assignment 1 in**

**Assignment 2 out (in NTU-Learn)**

**Lecture Content**

- Basic RNN structures
- Language modeling with RNNs
- Backpropagation through time
- Text generation with RNN LM
- Issues with Vanilla RNNs
- Exploding gradient
- Gated Recurrent Units (GRUs) and LSTMs
- Bidirectional RNNs
- Multi-layer RNNs
- Sequence labeling with RNNs
- Sequence classification with RNNs
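
A minimal RNN language model sketch in PyTorch (an LSTM variant; all hyperparameters are illustrative):

```python
# RNN language model: predict each next token from the prefix.
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))  # (batch, seq_len, hidden_dim)
        return self.out(h)                   # logits over the next token

model = RNNLM()
tokens = torch.randint(0, 1000, (8, 20))     # toy batch of token ids
logits = model(tokens[:, :-1])
# Cross-entropy against the shifted sequence = next-token prediction
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1))
loss.backward()  # backpropagation through time happens here
```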

TA: Saiful Bari

**Practical exercise with Pytorch**

**Suggested Readings**

**Lecture Content**

- Classification tasks in NLP
- Window-based Approach for language modeling
- Window-based Approach for NER, POS tagging, and Chunking
- Convolutional Neural Net for NLP
- Max-margin Training
- Scaling Softmax (Adaptive input & output)
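
A minimal convolutional text classifier sketch in the spirit of the topics above; filter sizes and dimensions are illustrative:

```python
# Convolutional text classifier: n-gram filters + max-over-time pooling.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, n_filters=32, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One conv per window size; each filter detects a local n-gram pattern
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in (3, 4, 5)])
        self.fc = nn.Linear(3 * n_filters, n_classes)

    def forward(self, tokens):
        x = self.embed(tokens).transpose(1, 2)  # (batch, emb_dim, seq_len)
        # Max-over-time pooling collapses variable-length inputs
        feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))

logits = TextCNN()(torch.randint(0, 1000, (4, 30)))  # (4, 2)
```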

**Practical exercise with Pytorch**

**Suggested Readings**

**Assignment 1 out (in NTU-Learn)**

**Lecture Content**

- Word meaning
- Denotational semantics
- Distributed representation of words
- Word2Vec models (Skip-gram, CBOW)
- Negative sampling
- GloVe
- FastText

Evaluating word vectors

- Intrinsic evaluation
- Extrinsic evaluation

Cross-lingual word vectors
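
A minimal sketch of skip-gram with negative sampling; the vocabulary, batch, and negative-sampling distribution (uniform here, rather than the unigram-based distribution used in practice) are simplified for illustration:

```python
# Skip-gram with negative sampling (SGNS), heavily simplified.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, emb_dim = 1000, 100
center_emb = nn.Embedding(vocab_size, emb_dim)   # "input" word vectors
context_emb = nn.Embedding(vocab_size, emb_dim)  # "output" word vectors

def sgns_loss(center, context, k=5):
    # center, context: (batch,) ids of observed (center, context) pairs
    c = center_emb(center)                                # (batch, emb_dim)
    pos = context_emb(context)                            # (batch, emb_dim)
    neg = context_emb(torch.randint(0, vocab_size,        # k random negatives
                                    (center.size(0), k)))
    pos_score = F.logsigmoid((c * pos).sum(-1))           # pull true pair together
    neg_score = F.logsigmoid(-torch.bmm(neg, c.unsqueeze(2)).squeeze(2)).sum(-1)
    return -(pos_score + neg_score).mean()                # push negatives apart

loss = sgns_loss(torch.randint(0, 1000, (16,)), torch.randint(0, 1000, (16,)))
loss.backward()
```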

TA: Mathieu

**Practical exercise with Pytorch**

Visualization

**Suggested Readings**

- Word2Vec Tutorial - The Skip-Gram Model (blog)
- Efficient Estimation of Word Representations in Vector Space - Original word2vec paper
- Distributed Representations of Words and Phrases and their Compositionality - negative sampling paper
- GloVe: Global Vectors for Word Representation
- FastText: Enriching Word Vectors with Subword Information
- Linguistic Regularities in Sparse and Explicit Word Representations
- Neural Word Embeddings as Implicit Matrix Factorization
- Improving Distributional Similarity with Lessons Learned from Word Embeddings

- Linear Algebraic Structure of Word Senses, with Applications to Polysemy

**Lecture Content**

Why Deep Learning for NLP?

From Logistic Regression to Feed-forward NN

- Activation functions

SGD with Backpropagation

Adaptive SGD (Adagrad, Adam, RMSProp)

Regularization (Weight Decay, Dropout, Batch normalization, Gradient clipping)
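
These regularization tools map directly onto a few lines of PyTorch; the model and values below are illustrative:

```python
# Weight decay, dropout, batch normalization, and gradient clipping together.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.BatchNorm1d(64),   # batch normalization
    nn.ReLU(),
    nn.Dropout(p=0.5),    # dropout
    nn.Linear(64, 10),
)
# Weight decay (L2 regularization) lives in the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

x, y = torch.randn(32, 100), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
optimizer.step()
```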

Introduction to Word Vectors

TA: Mathieu

**Practical exercise with Pytorch**

- Backpropagation
- Dropout
- Batch normalization
- Initialization
- Gradient clipping

**Suggested Readings**

**Lecture Content**

- What is Machine Learning?
- Supervised vs. unsupervised learning
- Linear Regression
- Logistic Regression
- Multi-class classification
- Parameter estimation (MLE & MAP)
- Gradient-based optimization & SGD
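
A minimal sketch of logistic regression trained with SGD in PyTorch on toy data, previewing the practical exercise:

```python
# Logistic regression as a one-layer network trained with SGD.
import torch
import torch.nn as nn

x = torch.randn(100, 2)
y = (x.sum(dim=1) > 0).float()          # toy binary labels

model = nn.Linear(2, 1)                 # computes w.x + b
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):
    optimizer.zero_grad()
    logits = model(x).squeeze(1)
    # Binary cross-entropy = negative log-likelihood (the MLE view)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    loss.backward()                     # gradients via autograd
    optimizer.step()                    # one SGD update
```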

TA: Mathieu

**Practical exercise with Pytorch**

- Deep learning with PyTorch
- Linear Regression
- Logistic Regression

[Supplementary]

- Numerical programming with Pytorch - Pytorch intro

**Lecture Content**

- What is Natural Language Processing?
- Why is language understanding difficult?
- What is Deep Learning?
- Deep learning vs. other machine learning methods?
- Why deep learning for NLP?
- Applications of deep learning to NLP
- Knowing the target group (background, field of study, programming experience)
- Expectation from the course

**Python & PyTorch Basics**

Programming in Python

- Jupyter Notebook and Google Colab
- Introduction to Python
- Deep Learning Frameworks
- Why PyTorch?
- Deep learning with PyTorch

[Supplementary]

- Numerical programming with numpy/scipy - Numpy intro
- Numerical programming with Pytorch - Pytorch intro

**Lecture Content**

- What is RL?
- Key concepts: Rewards, Policy, Value Function
- What is Deep RL?

Policy-based Deep RL

- Deep Policy Network
- Policy Gradient
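
A minimal REINFORCE-style policy-gradient sketch on a toy one-step problem; the environment and reward function are hypothetical placeholders:

```python
# Policy gradient (REINFORCE) on a toy one-step decision problem.
import torch
import torch.nn as nn

policy = nn.Linear(4, 3)                 # maps a state to action logits

def reward(action):                      # hypothetical reward function
    return 1.0 if action == 2 else 0.0

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
for _ in range(200):
    state = torch.randn(4)
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    # Increase log-prob of sampled actions in proportion to their reward
    loss = -dist.log_prob(action) * reward(action.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```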

Deep Q-Learning

Applications of Deep RL in NLP

- Abstractive summarization
- Dialogue generation
- Question answering
- Multimodal (image and video captioning)
- Machine translation