Discourse Processing and Its Applications in Text Mining --- Tutorial at ICDM-2018

Time: TBD
Location: TBD

Shafiq Joty, Giuseppe Carenini, Raymond T Ng, Gabriel Murray

Tutorial Abstract

Discourse processing is a suite of Natural Language Processing (NLP) tasks that uncover linguistic structures in texts at several levels, which can support many text mining applications. This involves identifying the topic structure, the coherence structure, the coreference structure, and, for conversational discourse, the conversation structure. Taken together, these structures can inform text summarization, essay scoring, sentiment analysis, machine translation, information extraction, question answering, and thread recovery. The tutorial starts with an overview of basic concepts in discourse analysis – monologue vs. conversation, synchronous vs. asynchronous conversation, and key linguistic structures in discourse analysis. It then covers traditional machine learning methods along with the most recent work using deep learning, and compares their performance on benchmark datasets. For each discourse structure we describe, we show its applications in downstream text mining tasks. Methods and metrics for evaluation are discussed in detail. We conclude the tutorial with an interactive discussion of future challenges and opportunities.

Tutorial Outline


  • Discourse & its different forms
  • Linguistic structures in discourse & discourse analysis tasks
  • Applications of discourse analysis

Discourse Parsing & Its Applications

  • Discourse annotations
  • Discourse parsing with RST
  • Discourse parsing in PDTB
  • Applications of Discourse Parsing

Coffee Break

Coherence Models & Their Applications

  • Coherence models for Texts
  • Coherence models for Conversations
  • Applications (Evaluation tasks)

Conversational Structures

  • Discourse Structures in Conversations
  • Thread identification models for synchronous & asynchronous conversations
  • Speech act recognition models for synchronous & asynchronous conversations
  • Evaluation & Applications

Conclusions & Future Challenges

  • Learning from limited annotated data
  • Language & domain transfer
  • Emerging applications