Course Objectives

Natural Language Processing (NLP) is one of the most important fields in Artificial Intelligence (AI). It has become crucial in the information age because most of the world's information exists as unstructured text. NLP technologies are applied everywhere because people communicate mostly in language: language translation, web search, customer support, email, forums, advertising, and radiology reports, to name a few.

There are a number of core NLP tasks and machine learning models behind NLP applications. Deep learning, a sub-field of machine learning, has recently brought a paradigm shift from traditional task-specific feature engineering to end-to-end systems and has obtained high performance across many different NLP tasks and downstream applications. Tech companies like Google, Baidu, Alibaba, Apple, Amazon, Facebook, Tencent, and Microsoft are now actively working on deep learning methods to improve their products. For example, Google recently replaced its traditional statistical machine translation and speech-recognition systems with systems based on deep learning methods.

Optional Textbooks

  • Deep Learning by Goodfellow, Bengio, and Courville (free online)
  • Machine Learning — A Probabilistic Perspective by Kevin Murphy (online)
  • Natural Language Processing by Jacob Eisenstein (free online)
  • Speech and Language Processing by Dan Jurafsky and James H. Martin (3rd ed. draft)

Intended Learning Outcomes

In this course, students will learn state-of-the-art deep learning methods for NLP. Through lectures and practical assignments, students will learn the tricks necessary to make their models work on practical problems. They will learn to implement, and possibly invent, their own deep learning models using existing deep learning libraries such as PyTorch.

Our Approach

  • Thorough and detailed: how to write, debug, and train deep neural models from scratch.

  • State of the art: most lecture material draws on research from the past 1-5 years.

  • Practical: focus on practical techniques for training models, including training on GPUs.

  • Fun: cover exciting new advancements in NLP (e.g., Transformer, BERT).

Assessment Approach

Weekly Workload

  • Every two-hour lecture will be accompanied by practice problems implemented in PyTorch.
  • There will be a 30-minute office hour per week to discuss assignments and the project.
  • Class participation accounts for 5% of the total assessment.

Assignments (individually graded)

  • There will be three (3) assignments contributing to 3 * 15% = 45% of the total assessment.
  • Late day policy
    • 2 free late days; afterwards, 10% off per late day
    • Not accepted after 3 late days
  • Students are graded individually on the assignments. They may discuss the homework with one another, but each student must submit individual write-ups and coding exercises.

Final Project (Group work but individually graded)

  • There will be a final project contributing to the remaining 50% of the total course-work assessment.
    • 1–3 people per group
    • Project proposal: 5%, update: 5%, presentation: 10%, report: 30%
  • The project may be done individually or in a group, depending on the student's preference. Students are graded individually; the final project presentation is used to confirm each student's understanding of the project.

Course Prerequisites

  • Proficiency in Python (using numpy and PyTorch). There is a lecture for those who are not familiar with Python.
  • College Calculus, Linear Algebra
  • Basic Probability and Statistics
  • Machine Learning basics

Schedule & Course Content


Week 13: In-class Project presentation

02:30 PM - 5:30 PM 15 April 2020 LT13, NTU, Singapore
  • Project presentation: 15 min/group

Week 12: Adversarial NLP

02:30 PM - 5:30 PM 8 April 2020 LT13, NTU, Singapore

Slides on adversarial nets

Slides on adversarial attacks (prepared by Samson@NUS-Salesforce)

Lecture Recording

Lecture Content

  • Generative adversarial nets (GANs)
  • Domain adversarial nets (DANs)
  • Transfer learning with DANs (see the gradient-reversal sketch after this list)
  • Adversarial attacks in NLP

  • Defense:

    • Training with adversarial examples
    • Consistency regularization
    • Cross-view consistency
  • Limits & Future of Deep NLP

    • Multi-sentence processing
    • Multi-task learning
    • Multimodal learning
    • Model interpretability
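
The gradient-reversal layer at the heart of domain-adversarial training fits in a few lines of PyTorch. Below is a minimal sketch, assuming the standard DANN formulation (Ganin & Lempitsky); the scaling factor lambd and the layer sizes are illustrative, not part of the course materials.

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The reversed gradient trains the feature extractor to *confuse* the
        # domain classifier, encouraging domain-invariant features.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Features flow unchanged to the task head, but gradient-reversed to the domain head.
features = torch.randn(8, 32, requires_grad=True)
domain_head = nn.Linear(32, 2)
domain_logits = domain_head(grad_reverse(features, lambd=0.5))
```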

Suggested Readings


Week 11: Semi-supervised Learning I – Self-supervised Learning

02:30 PM - 5:30 PM 1 April 2020 LT13, NTU, Singapore

Lecture Slide

Lecture Recording

Lecture Content

  • Why semi-supervised learning?
  • Dimensions of semi-supervised learning
  • Pre-training and fine-tuning methods (see the fine-tuning sketch after this list)

    • CoVe
    • TagLM
    • ELMo
    • GPT
    • ULMFiT
    • BERT (+ mBERT)
    • XLM
    • XLNet
    • BART (+ mBART)
  • Evaluation benchmarks

    • GLUE
    • SQuAD
    • NER
    • SuperGLUE
    • XNLI
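
To make the pre-train-then-fine-tune recipe concrete, here is a minimal one-step sketch of fine-tuning BERT for binary sentence classification. It assumes the Hugging Face transformers library; the checkpoint name, example texts, and learning rate are illustrative.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

texts = ["a delightful film", "a tedious mess"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # passing labels makes the model return a loss
outputs.loss.backward()                  # fine-tunes all pre-trained weights end to end
optimizer.step()
```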

Assignment 3 due

Suggested Readings


Week 10: Seq2Seq Variants and Transformer

02:30 PM - 5:30 PM 25 March 2020 LT13, NTU, Singapore

Lecture Slide

Lecture Recording

Lecture Content

  • Seq2Seq Variants (Pointer Nets, Pointer-Generator Nets)

    • Machine Translation
    • Summarization
    • Parsing
    • Image/video captioning
  • Transformer architecture

    • Self-attention (see the sketch after this list)
    • Positional encoding
    • Multi-head attention
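
Scaled dot-product self-attention, the building block of the Transformer, fits in a few lines. A minimal single-head sketch (dimensions and weight initialization are illustrative; a real Transformer adds multiple heads, learned projections, residual connections, and layer normalization):

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention; x has shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)                   # each row sums to 1
    return weights @ v                                        # weighted sum of value vectors

seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)                        # (5, 16)
```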

Invited talk on Tree-transformer

Suggested Readings


Week 9: Seq2Seq Models with Attention

02:30 PM - 5:30 PM 18 March 2020 LT13, NTU, Singapore

Week 8: Machine Translation and Seq2Seq Models

02:30 PM - 5:30 PM 11 March 2020 LT13, NTU, Singapore

Lecture Slide

Video Lecture

Lecture Content

  • Machine translation

    • Early days (1950s)
    • Statistical machine translation or SMT (1990-2010)
    • Alignment in SMT
    • Decoding in SMT
  • Neural machine translation or NMT (2014 - )

  • Encoder-decoder model for NMT

  • Advantages and disadvantages of NMT

  • Greedy vs. beam-search decoding

  • Byte-pair encoding (see the BPE sketch after this list)

  • MT evaluation

  • Other applications of Seq2Seq
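
Byte-pair encoding learns a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. A minimal sketch of the learning loop, following the reference algorithm of Sennrich et al. (2016) on their toy vocabulary:

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count how often each adjacent symbol pair occurs across the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its merged symbol."""
    bigram = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {bigram.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: each word is a sequence of symbols ending in an end-of-word marker.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for step in range(10):
    stats = get_pair_stats(vocab)
    best = max(stats, key=stats.get)   # most frequent pair becomes a new symbol
    vocab = merge_pair(best, vocab)
    print(step, best)
```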

Practical exercise with PyTorch

Suggested Readings


Recess Week

02:30 PM - 5:30 PM 4 March 2020 LT13, NTU, Singapore

No Lecture

Assignment 2 due


Week 7: Recursive Neural Nets & Parsing

02:30 PM - 5:30 PM 26 February 2020 LT13, NTU, Singapore

Project Proposal Instructions in NTU Learn (inside Content)

Lecture Slide

Video Lecture - Part 1

Video Lecture - Part 2

Project Proposal due

Lecture Content

  • Compositionality in language & recursion (see the composition sketch after this list)
  • Recursive vs. recurrent NN
  • Parsing with tree-structured recursive NN
  • Tree LSTMs
  • Backpropagation through tree
  • Other applications of recursive NN

    • Fine-grained sentiment analysis
    • Semantic relationship identification
  • Modern parsers
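
The core of a recursive network is a single composition function applied bottom-up over a parse tree. A minimal sketch (the class name, tiny tree, and dimensions are illustrative; Tree LSTMs replace the tanh composition with gated cells):

```python
import torch
import torch.nn as nn

class RecursiveComposer(nn.Module):
    """Composes two child vectors into one parent vector with shared weights."""
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)

    def forward(self, left, right):
        return torch.tanh(self.compose(torch.cat([left, right], dim=-1)))

dim = 8
net = RecursiveComposer(dim)
# Tiny parse tree ((the cat) sat): leaves are word vectors, composed bottom-up.
the, cat, sat = (torch.randn(dim) for _ in range(3))
noun_phrase = net(the, cat)
sentence = net(noun_phrase, sat)   # the root vector can feed, e.g., a sentiment classifier
```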

Practical exercise with PyTorch

  • Sentiment treebank
  • Subject-Verb Agreement

Suggested Readings


Week 6: Recurrent Neural Nets

02:30 PM - 5:30 PM 19 February 2020 LT13, NTU, Singapore

Lecture Slide

Video Lecture - Part 1

Video Lecture - Part 2

Lecture Content

  • Basic RNN structures
  • Language modeling with RNNs
  • Backpropagation through time
  • Text generation with RNN LM
  • Issues with Vanilla RNNs
  • Exploding gradient
  • Gated Recurrent Units (GRUs) and LSTMs
  • Bidirectional RNNs
  • Multi-layer RNNs
  • Sequence labeling with RNNs
  • Sequence classification with RNNs (see the sketch below)
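
A minimal PyTorch sketch of sequence classification with an LSTM (the vocabulary size, dimensions, and names are illustrative):

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Embed tokens, run an LSTM, classify from the final hidden state."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return self.out(h_n[-1])               # final hidden state of the last layer

model = LSTMClassifier(vocab_size=1000, embed_dim=32, hidden_dim=64, num_classes=2)
logits = model(torch.randint(0, 1000, (4, 12)))  # batch of 4 sequences -> (4, 2)
```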

Assignment 2 out

Practical exercise with PyTorch

Suggested Readings


Week 5: Cross-lingual Word Vectors & CNNs

02:30 PM - 5:30 PM 12 February 2020 LT13, NTU, Singapore

Lecture Slide

Slides with recording

Slides with video

Lecture Content

  • Cross-lingual word embeddings
  • Classification tasks in NLP
  • Window-based Approach for language modeling
  • Window-based Approach for NER, POS tagging, and Chunking
  • Convolutional Neural Net for NLP (see the sketch after this list)
  • Max-margin Training
  • Scaling Softmax (Adaptive input & output)
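
A minimal sketch of a convolutional text classifier in PyTorch, in the spirit of Kim (2014): convolve over the embedded token sequence, max-pool over time, then classify. Sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_filters, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.out = nn.Linear(num_filters, num_classes)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))             # (batch, num_filters, seq_len)
        x = x.max(dim=2).values                  # max-over-time pooling
        return self.out(x)

model = TextCNN(vocab_size=1000, embed_dim=32, num_filters=16, num_classes=2)
logits = model(torch.randint(0, 1000, (4, 20)))  # (4, 2)
```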

Assignment 1 due

Invited talk on cross-lingual word vectors

Practical exercise with PyTorch

Suggested Readings


Week 4: Word Vectors

02:30 PM - 5:30 PM 5 February 2020 LT13, NTU, Singapore

Week 3: Neural Network & Optimization Basics

02:30 PM - 5:30 PM 29 January 2020 LT13, NTU, Singapore

Lecture Slide

Lecture Content

  • Why Deep Learning for NLP?

  • From Logistic Regression to Feed-forward NN

    • Activation functions
  • SGD with Backpropagation (see the training-step sketch after this list)

  • Adaptive SGD (AdaGrad, Adam, RMSProp)

  • Regularization (Weight Decay, Dropout, Batch normalization, Gradient clipping)

  • Introduction to Word Vectors
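
The pipeline above (a feed-forward net, SGD with backpropagation, and several regularizers) fits in one short PyTorch training step. A minimal sketch with illustrative sizes and hyperparameters:

```python
import torch
import torch.nn as nn

# Adding a hidden layer on top of logistic regression yields a feed-forward net.
net = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),   # hidden layer with a nonlinear activation
    nn.Dropout(p=0.5),              # dropout regularization
    nn.Linear(32, 2),
)
opt = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=1e-4)  # weight decay
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
loss = loss_fn(net(x), y)
opt.zero_grad()
loss.backward()                     # backpropagation computes all gradients
torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0)  # gradient clipping
opt.step()                          # one SGD update
```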

Assignment 1 out

Practical exercise with PyTorch

NumPy notebook and PyTorch notebook

  • Backpropagation
  • Dropout
  • Batch normalization
  • Initialization
  • Gradient clipping

Suggested Readings


Week 2: Machine Learning Basics

02:30 PM - 5:30 PM 22 January 2020 LT13, NTU, Singapore

Lecture Slide

Lecture Content

  • What is Machine Learning?
  • Supervised vs. unsupervised learning
  • Linear Regression
  • Logistic Regression
  • Multi-class classification
  • Parameter estimation (MLE & MAP)
  • Gradient-based optimization & SGD
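
To tie these pieces together, here is a minimal NumPy sketch of logistic regression fit by MLE, i.e. gradient descent on the average negative log-likelihood. The synthetic data and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w > 0).astype(float)      # synthetic binary labels

w, lr = np.zeros(3), 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ w))    # sigmoid: predicted probabilities
    grad = X.T @ (p - y) / len(y)       # gradient of the average negative log-likelihood
    w -= lr * grad                      # gradient-descent step
```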

Practical exercise with PyTorch


Week 1: Introduction

02:30 PM - 5:30 PM 15 January 2020 LT13, NTU, Singapore

Lecture Slide

Lecture Content

  • What is Natural Language Processing?
  • Why is language understanding difficult?
  • What is Deep Learning?
  • Deep learning vs. other machine learning methods?
  • Why deep learning for NLP?
  • Applications of deep learning to NLP
  • Knowing the target group (background, field of study, programming experience)
  • Expectation from the course

Python & PyTorch Basics


Further Reading: Deep Reinforcement Learning for NLP

1 January 2019 LT13, NTU, Singapore
  • What is RL?
  • Key concepts: Rewards, Policy, Value Function
  • What is Deep RL?
  • Policy-based Deep RL

    • Deep Policy Network
    • Policy Gradient (see the REINFORCE sketch after this list)
  • Deep Q-Learning

  • Applications of Deep RL in NLP

    • Abstractive summarization
    • Dialogue generation
    • Question answering
    • Multimodal (image and video captioning)
    • Machine translation
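
A minimal sketch of one REINFORCE (policy-gradient) update in PyTorch: raise the log-probability of a sampled action in proportion to the reward it earned. The state, reward, and network sizes are stand-ins for a real environment.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

state = torch.randn(1, 4)                        # stand-in for an environment state
dist = torch.distributions.Categorical(logits=policy(state))
action = dist.sample()
reward = 1.0                                     # stand-in for an observed reward
loss = (-dist.log_prob(action) * reward).mean()  # policy-gradient surrogate loss
opt.zero_grad()
loss.backward()
opt.step()
```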

[Supplementary reading] Variational Methods for Deep NLP

  • Deep learning meets graphical models
  • Variational autoencoders (see the reparameterization sketch after this list)
  • Variational Generative adversarial nets (GANs)
  • Applications
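
The key mechanism in a variational autoencoder is the reparameterization trick: sample z = mu + sigma * eps so that gradients can flow through the stochastic sampling step. A minimal sketch (names and sizes illustrative; a full VAE adds a decoder and a reconstruction loss):

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Maps an input to a latent sample plus the KL term used in the VAE loss."""
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.log_var = nn.Linear(in_dim, z_dim)

    def forward(self, x):
        mu, log_var = self.mu(x), self.log_var(x)
        eps = torch.randn_like(mu)                  # noise is the only stochastic part
        z = mu + torch.exp(0.5 * log_var) * eps     # reparameterized sample
        # KL divergence to the standard normal prior (closed form for Gaussians).
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=-1)
        return z, kl

enc = GaussianEncoder(in_dim=10, z_dim=4)
z, kl = enc(torch.randn(8, 10))                     # z: (8, 4), kl: (8,)
```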