rubikzube

software engineer ¤ yogi ¤ turban cowboy

Monday, February 06, 2006

The Informatics power hour

For one hour each week, the Informatics department hosts a research presentation. Last Friday, Padhraic Smyth, a faculty member here at Irvine, blew me away with a talk on probabilistic topic models.

Probabilistic topic models are a machine learning technique in which a computer discovers the topics in a collection of documents by analyzing which words occur in each document and which words tend to occur together. Smyth applied this to real-world data in an amazing presentation: he analyzed large sets of historical news archives, then moved on to corporate email, sorting each document set into relevant topics and tracing how those topics rose and fell over time.
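
To make the idea a bit more concrete for myself, here's a minimal sketch of one well-known topic model, latent Dirichlet allocation (LDA), fit with collapsed Gibbs sampling. This is my own illustration, not Smyth's code, and the talk didn't go into implementation details; the function names, the `alpha`/`beta` smoothing values, and the toy corpus are all assumptions I picked for the example. The core loop repeatedly takes one word out of the counts and reassigns it to a topic, favoring topics that its document already uses a lot and that already use that word a lot.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit an LDA topic model by collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of word strings.
    Returns (vocab, topic_word, doc_topic), where topic_word[k][w] counts how
    often vocabulary word w is assigned to topic k, and doc_topic[d][k] counts
    how many words in document d are assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(num_topics)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in topics]
    topic_total = [0] * num_topics
    assignments = []

    # Start by assigning every word occurrence to a random topic.
    for d, doc in enumerate(docs):
        doc_z = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Take this word out of the counts before resampling its topic.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Probability of topic t is proportional to
                # (how much doc d uses t) * (how much t uses this word).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, topic, n=3):
    """Return the n words most often assigned to the given topic."""
    counts = topic_word[topic]
    ranked = sorted(range(len(vocab)), key=lambda w: counts[w], reverse=True)
    return [vocab[w] for w in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical toy corpus: two fruit documents and two soccer documents.
    corpus = [
        "apple banana fruit apple".split(),
        "banana fruit salad".split(),
        "goal team soccer goal".split(),
        "soccer team match".split(),
    ]
    vocab, topic_word, doc_topic = lda_gibbs(corpus, num_topics=2)
    for k in range(2):
        print(f"topic {k}:", top_words(vocab, topic_word, k))
    print("doc-topic counts:", doc_topic)
```

On a toy corpus like this one, the fruit words and the soccer words will usually end up in separate topics, though with so little data that isn't guaranteed. I went with Gibbs sampling only because it's short to write; variational inference is another common way to fit LDA.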

The applications of this technique are staggering when you consider the amount of unsorted data floating around, both in the ether and in libraries around the world. Automatic indexing and cross-referencing of document collections by topic is an unbelievably powerful search and visualization tool. Just think of all the research this could enable.
