Monday Lecture Series:

Topic: Topic Modeling for Analyzing Document Collections

Speaker: Mitsunori Ogihara, Department of Computer Science, University of Miami

Date and time: May 16th, 11AM

Location: ITE Building, Room 325

Abstract: Topic modeling (in particular, Latent Dirichlet Allocation)
is a technique for analyzing large collections of documents. In topic
modeling, we view each document as a frequency vector over a
vocabulary and each topic as a static distribution over that
vocabulary. Given a desired number K of topics (document classes), a
topic modeling algorithm attempts to estimate simultaneously the K
topic distributions and, for each document, how much each of the K
topics contributes to it. Mathematically, this amounts to approximating
the matrix formed by stacking the documents' frequency vectors as the
product of two non-negative matrices, where the column dimension of the
first matrix and the row dimension of the second matrix are both equal
to K. Topic modeling has recently gained popularity for analyzing large
document collections. In this talk I will present several applications
of topic modeling: (1) sentiment analysis of a small collection of
short patient surveys, (2) exploratory content analysis of a large
collection of letters, (3) document classification based on topics and
other linguistic features, and (4) exploratory analysis of a large
collection of literary works. I will describe not only the topic
modeling steps themselves but also the preprocessing steps needed to
prepare the documents for topic modeling.
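The matrix formulation in the abstract can be illustrated with a short
sketch. This is a minimal example assuming the non-negative matrix
factorization (NMF) view described above, fit with Lee and Seung's
multiplicative updates; it is not LDA itself (a probabilistic model
usually fit by variational inference or Gibbs sampling) and is not
necessarily the speaker's pipeline. The function names, the crude
whitespace tokenizer, and the four-document toy corpus are all
hypothetical.

```python
import random


def count_vectors(docs):
    """Turn raw documents into frequency vectors over a shared vocabulary.
    Only lowercases and splits on whitespace; real preprocessing would also
    handle punctuation, stop words, stemming, and so on."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.lower().split():
            row[index[tok]] += 1
        rows.append(row)
    return vocab, rows


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x k) times (k x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Approximate the non-negative D x N matrix V (one row per document,
    one column per vocabulary word) as W H, with W (D x K) and H (K x N)
    non-negative. Lee-Seung multiplicative updates keep every entry
    non-negative and, up to the small eps guard against division by zero,
    never increase the squared error."""
    rng = random.Random(seed)
    d, n = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(d)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(d)]
    return w, h


def normalize_rows(m):
    """Rescale each row to sum to 1."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row])
    return out


# Hypothetical toy corpus: two "sports" and two "politics" documents.
docs = [
    "ball ball game",
    "ball ball ball ball game game",
    "vote law law law",
    "vote vote law law law law law law",
]
vocab, counts = count_vectors(docs)
w, h = nmf(counts, k=2)
topics = normalize_rows(h)    # K static distributions over the vocabulary
mixtures = normalize_rows(w)  # how much each topic contributes to each document
for t, dist in enumerate(topics):
    best = max(range(len(vocab)), key=lambda j: dist[j])
    print(f"topic {t}: top word = {vocab[best]}")
print("squared reconstruction error:", frobenius_error(counts, w, h))
```

Here the first factor W has K columns (one weight per topic for each
document) and the second factor H has K rows (one word profile per
topic), matching the dimensions stated in the abstract. Normalizing the
rows of H gives the K distributions over the vocabulary, and
normalizing the rows of W gives each document's topic proportions.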

Biography: Mitsunori Ogihara is a Professor of Computer Science at the
University of Miami, Coral Gables, Florida, where he directs the Data
Mining Group in the Center for Computational Science, a university-wide
organization that provides resources and consultation for large-scale
computation. He has published three books and approximately 190 papers
in conferences and journals. He serves on the editorial boards of
Theory of Computing Systems and the International Journal of
Foundations of Computer Science. Ogihara received a Ph.D. in
Information Sciences from the Tokyo Institute of Technology in 1993
and was a faculty member (tenure-track, then tenured) in the Department
of Computer Science at the University of Rochester from 1994 to 2007.