Finding Topics in Emails: Is LDA enough?

Abstract

Our research addresses the task of finding topics at the sentence level in email conversations. As an asynchronous collaborative application, email has characteristics that distinguish it from written monologues (e.g., textbooks, news articles) and spoken dialogs (e.g., meetings). Hence, generative topic models such as Latent Dirichlet Allocation (LDA) and its variants, which are successful at finding topics in monologues and dialogs, may not be successful by themselves in asynchronous written conversations such as email. However, an effective combination of LDA with other important features can give the desired results. We first point out the specific characteristics of email that must be considered in order to find the topics discussed in an email conversation. We then demonstrate why generative topic models by themselves may not be adequate for this task. Finally, we propose a novel graph-theoretic framework to solve the problem. Crucial to our proposed approach is that it captures discriminative email features and integrates the strengths of the supervised approach with those of the unsupervised technique, while still treating LDA as one of the important factors.

Publication
NIPS-2009 Workshop on Applications for Topic Models: Text and Beyond. Whistler, Canada (poster paper)
Date