in other words: Finding scientific topics

2004-11-15

Finding scientific topics

Other keywords: webmining, knowledge extraction

The link under the title of this post comes from the PNAS Mapping Knowledge Domains. The paper covers ways, probabilistic ones among them, of mining scientific topics from a body of literature. I think the idea applies to approaches in which subject identity is based on a set of properties, some of which are detected by data-mining techniques. Requisite quote:

A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying "hot topics" by examining temporal dynamics and tagging abstracts to illustrate semantic content.
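To make the quoted description a little more concrete, here is a minimal sketch of the generative process it describes, followed by one possible Markov chain Monte Carlo scheme for inference. Several things here are my own assumptions rather than anything stated in the abstract: the symmetric Dirichlet priors and their hyperparameters `alpha` and `beta`, the fixed document length, the choice of collapsed Gibbs sampling as the MCMC method, and all function names (`sample_dirichlet`, `generate_corpus`, `gibbs_lda`). It skips the Bayesian model selection step entirely and takes the number of topics as given.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size, alpha, beta, rng):
    """Generate documents (lists of word ids) from the generative model."""
    # Assumption: each topic is a Dirichlet-distributed word distribution.
    topics = [sample_dirichlet(beta, vocab_size, rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Each document chooses its own distribution over topics.
        theta = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic from the document's distribution, then a word from it.
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(range(vocab_size), weights=topics[z])[0]
            words.append(w)
        docs.append(words)
    return docs


def gibbs_lda(docs, num_topics, vocab_size, alpha, beta, iters, rng):
    """Collapsed Gibbs sampling over per-word topic assignments (assumed method)."""
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Start from random topic assignments and build the count tables.
    for d, words in enumerate(docs):
        zs = []
        for w in words:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                z = assign[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional for each topic given all other assignments.
                weights = [
                    (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    * (doc_topic[d][k] + alpha)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word
```

Calling `generate_corpus` with a seeded `random.Random` and passing the result to `gibbs_lda` returns two count tables: how many words in each document are assigned to each topic, and how many times each word is assigned to each topic. On a toy corpus these are only illustrative; real abstracts would first need tokenizing into word ids.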

1 comment:

Bernard Vatant, 22.11.04
Maybe to use over http://scholar.google.com/
