Showing papers by "Sarvnaz Karimi" published in 2011


Journal ArticleDOI
TL;DR: This survey reviews the key methodologies introduced in the transliteration literature, categorizes them by the resources and algorithms they use, and compares their effectiveness.
Abstract: Machine transliteration is the process of automatically transforming the script of a word from a source language to a target language, while preserving pronunciation. The development of algorithms specifically for machine transliteration began over a decade ago based on the phonetics of source and target languages, followed by approaches using statistical and language-specific methods. In this survey, we review the key methodologies introduced in the transliteration literature. The approaches are categorized based on the resources and algorithms used, and the effectiveness is compared.

104 citations


Proceedings ArticleDOI
24 Oct 2011
TL;DR: The derivation presented here for expected 1-call@k provides a novel theoretical perspective on the emergence of diversity via a latent subtopic model of relevance --- an idea underlying both ambiguous and faceted subtopic retrieval that have been used to motivate diverse retrieval.
Abstract: It has been previously observed that optimization of the 1-call@k relevance objective (i.e., a set-based objective that is 1 if at least one document is relevant, otherwise 0) empirically correlates with diverse retrieval. In this paper, we proceed one step further and show theoretically that greedily optimizing expected 1-call@k w.r.t. a latent subtopic model of binary relevance leads to a diverse retrieval algorithm sharing many features of existing diversification approaches. This new result is complementary to a variety of diverse retrieval algorithms derived from alternate rank-based relevance criteria such as average precision and reciprocal rank. As such, the derivation presented here for expected 1-call@k provides a novel theoretical perspective on the emergence of diversity via a latent subtopic model of relevance --- an idea underlying both ambiguous and faceted subtopic retrieval that have been used to motivate diverse retrieval.
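The 1-call@k objective defined in the abstract above can be written down in a few lines. The following is a minimal sketch of the metric only, not of the paper's greedy derivation or its latent subtopic model; the function name `one_call_at_k` and the representation of a ranking as a list of 0/1 relevance judgments are assumptions made for illustration.

```python
from typing import Sequence


def one_call_at_k(relevance: Sequence[int], k: int) -> int:
    """Return 1 if at least one of the top-k results is relevant, else 0.

    `relevance` is a hypothetical ranked list of binary judgments
    (1 = relevant, 0 = not relevant), ordered from rank 1 downward.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Only the first k positions count; any single relevant hit suffices.
    return int(any(r > 0 for r in relevance[:k]))


# Example: the only relevant document sits at rank 3.
ranking = [0, 0, 1, 0]
print(one_call_at_k(ranking, 3))
print(one_call_at_k(ranking, 2))
```

Because the score is already 1 after a single relevant hit, a second document covering the same subtopic adds nothing to it, which gives some intuition for why optimizing this objective tends to favor diverse result sets.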

22 citations


Proceedings Article
01 Nov 2011
TL;DR: NICTA (National ICT Australia) participated in the Medical Records track of TREC 2011 with seven automatic runs; its best run ranked seventh among the 109 automatic runs submitted by the 29 participating groups.
Abstract: NICTA (National ICT Australia) participated in the Medical Records track of TREC 2011 with seven automatic runs. The main techniques used in our submissions involved using Boolean retrieval for filtering, query transformation, and query expansion. Evaluation of our best run ranks our submissions higher than the median of all systems for this track, and stands at rank seven among 109 automatic runs which were submitted by the 29 participating groups.

15 citations


Proceedings ArticleDOI
24 Jul 2011
TL;DR: This study investigates how the search behavior of domain experts changes based on their previous level of familiarity with a search topic, reporting on a user study of biomedical experts searching for a range of domain-specific material.
Abstract: Users of information retrieval systems employ a variety of strategies when searching for information. One factor that can directly influence how searchers go about their information finding task is the level of familiarity with a search topic. We investigate how the search behavior of domain experts changes based on their previous level of familiarity with a search topic, reporting on a user study of biomedical experts searching for a range of domain-specific material. The results of our study show that topic familiarity can influence the number of queries that are employed to complete a task, the types of queries that are entered, and the overall number of query terms. Our findings suggest that biomedical search systems should enable searching through a variety of querying modes, to support the different search strategies that users were found to employ depending on their familiarity with the information that they are searching for.

6 citations