
Showing papers by Sarvnaz Karimi published in 2009


Proceedings Article
07 Dec 2009
TL;DR: The authors' PMI score, computed using word-pair co-occurrence statistics from external data sources, has relatively good agreement with human scoring and it is shown that the ability to identify less useful topics can improve the results of a topic-based document similarity metric.
Abstract: Topic models can learn topics that are highly interpretable, semantically-coherent and can be used similarly to subject headings. But sometimes learned topics are lists of words that do not convey much useful information. We propose models that score the usefulness of topics, including a model that computes a score based on pointwise mutual information (PMI) of pairs of words in a topic. Our PMI score, computed using word-pair co-occurrence statistics from external data sources, has relatively good agreement with human scoring. We also show that the ability to identify less useful topics can improve the results of a topic-based document similarity metric.

93 citations
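The PMI-based usefulness score described above can be illustrated with a minimal sketch. It assumes that each document (or text window) in an external corpus counts as one co-occurrence unit, that probabilities are estimated from document frequencies, and that a topic's score is the median PMI over all pairs of its top words. The function name `pmi_topic_score`, the smoothing constant `eps` and the choice of the median are illustrative assumptions, not details confirmed by the abstract.

```python
import math
from itertools import combinations
from statistics import median


def pmi_topic_score(topic_words, documents, eps=1e-12):
    """Hypothetical topic score: median PMI over all pairs of topic words.

    topic_words: top words of one learned topic.
    documents:   token lists from an external corpus; each list is
                 treated as one co-occurrence unit (an assumption).
    """
    doc_sets = [set(doc) for doc in documents]
    n = len(doc_sets)
    # Document frequency of each individual topic word.
    single = {w: sum(w in d for d in doc_sets) for w in topic_words}
    scores = []
    for wi, wj in combinations(topic_words, 2):
        # Number of units in which both words appear.
        joint = sum(wi in d and wj in d for d in doc_sets)
        p_ij = joint / n
        p_i = single[wi] / n
        p_j = single[wj] / n
        # PMI = log p(wi, wj) / (p(wi) p(wj)); eps avoids log(0)
        # when a pair never co-occurs or a word is absent.
        scores.append(math.log((p_ij + eps) / (p_i * p_j + eps)))
    return median(scores)


corpus = [["bank", "money", "loan"], ["bank", "money"],
          ["river", "water"], ["river", "fish", "water"]]
print(pmi_topic_score(["bank", "money"], corpus))  # about 0.693 (log 2)
print(pmi_topic_score(["bank", "fish"], corpus))   # large negative value
```

Words that tend to appear together in the external corpus receive a positive PMI. Pairs that never co-occur are pushed strongly negative, so a topic made of unrelated words scores low.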


Book Chapter
17 Nov 2009
TL;DR: This work shows how unsupervised topic models are useful for interpreting and understanding MeSH, the Medical Subject Headings applied to articles in MEDLINE and introduces the resampled author model, which captures some of the advantages of both the topic model and the author-topic model.
Abstract: We consider the task of interpreting and understanding a taxonomy of classification terms applied to documents in a collection. In particular, we show how unsupervised topic models are useful for interpreting and understanding MeSH, the Medical Subject Headings applied to articles in MEDLINE. We introduce the resampled author model, which captures some of the advantages of both the topic model and the author-topic model. We demonstrate how topic models complement and add to the information conveyed in a traditional listing and description of a subject heading hierarchy.

30 citations


Proceedings Article
06 Nov 2009
TL;DR: The findings show that although Boolean search has limitations, it is not obvious that ranking is superior, and illustrate that a single query cannot be used to resolve an information need.
Abstract: Clinical systematic reviews are based on expert, laborious search of well-annotated literature. Boolean search on bibliographic databases, such as MEDLINE, continues to be the preferred discovery method, but the size of these databases, now approaching 20 million records, makes it impossible to fully trust these search methods. We are investigating the trade-offs between Boolean and ranked retrieval. Our findings show that although Boolean search has limitations, it is not obvious that ranking is superior, and illustrate that a single query cannot be used to resolve an information need. Our experiments show that a combination of less complicated Boolean queries and ranked retrieval outperforms either of them individually, leading to possible time savings over the current process.

17 citations
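The abstract does not say how the Boolean queries and the ranking are combined. One plausible reading is shown here as a minimal sketch: a simple Boolean query (an AND of OR-groups) filters the collection, and the surviving documents are then ordered by a basic TF-IDF score. The helper names `boolean_filter` and `tfidf_rank`, the query form and the scoring function are all hypothetical illustrations, not the method used in the paper.

```python
import math
from collections import Counter


def boolean_filter(docs, or_groups):
    """Hypothetical simple Boolean query: a conjunction of OR-groups.
    Keeps the indices of documents that contain at least one term
    from every group."""
    return [i for i, d in enumerate(docs)
            if all(any(t in d for t in group) for group in or_groups)]


def tfidf_rank(docs, candidates, query_terms):
    """Order candidate document indices by a basic TF-IDF score."""
    n = len(docs)
    # Document frequency of each term across the whole collection.
    df = Counter(t for d in docs for t in set(d))

    def score(i):
        tf = Counter(docs[i])
        return sum(tf[t] * math.log(n / df[t]) for t in query_terms if df[t])

    return sorted(candidates, key=score, reverse=True)


docs = [["heart", "attack", "aspirin"], ["heart", "surgery"],
        ["aspirin", "headache"], ["heart", "aspirin", "aspirin", "trial"]]
hits = boolean_filter(docs, [["heart", "cardiac"], ["aspirin"]])
print(hits)                                           # [0, 3]
print(tfidf_rank(docs, hits, ["aspirin", "trial"]))   # [3, 0]
```

The Boolean step keeps the query short and reviewable, while the ranking step orders what remains. Screening can then start from the most promising records rather than an unordered result set.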


01 Dec 2009
TL;DR: It is demonstrated how the topic modeling approach can provide an alternative and complementary view of the relationship between MeSH headings that could be informative and helpful for people searching MEDLINE.
Abstract: We show how topic models are useful for interpreting and understanding MeSH, the Medical Subject Headings applied to articles in MEDLINE. We show how our resampled author model captures some of the advantages of both the topic model and the author-topic model. We demonstrate how the topic modeling approach can provide an alternative and complementary view of the relationship between MeSH headings that could be informative and helpful for people searching MEDLINE.

3 citations