Showing papers by "Sarvnaz Karimi" published in 2006


Book Chapter
11 Oct 2006
TL;DR: A new model of Persian is introduced that takes into account the habit of shortening, or even omitting, runs of English vowels, a trait that makes transliteration of Persian particularly difficult for phonetic-based methods.
Abstract: Persian is an Indo-European language written using Arabic script, and is an official language of Iran, Afghanistan, and Tajikistan. Transliteration of Persian to English (that is, the character-by-character mapping of a Persian word that is not readily available in a bilingual dictionary) is an unstudied problem. In this paper we make three novel contributions. First, we present performance comparisons of existing grapheme-based transliteration methods on English to Persian. Second, we discuss the difficulties in establishing a corpus for studying transliteration. Finally, we introduce a new model of Persian that takes into account the habit of shortening, or even omitting, runs of English vowels. This trait makes transliteration of Persian particularly difficult for phonetic-based methods. This new model outperforms the existing grapheme-based methods on Persian, exhibiting a 24% relative increase in transliteration accuracy measured using the top-5 criteria.

24 citations