Showing papers by "Siva Reddy" published in 2012


Proceedings Article
01 May 2012
TL;DR: This paper presents word sketches for Turkish, describing how a 42-million-word corpus was collected, parsed with an existing dependency parser, and used to generate the sketches.
Abstract: Word sketches are one-page, automatic, corpus-based summaries of a word's grammatical and collocational behaviour. In this paper we present word sketches for Turkish. Until now, word sketches have been generated using purpose-built finite-state grammars. Here, we use an existing dependency parser. We describe the process of collecting a 42-million-word corpus, parsing it, and generating word sketches from it. We evaluate the word sketches in comparison with word sketches from a language-independent sketch grammar on an external evaluation task called topic coherence, using Turkish WordNet to derive an evaluation set of coherent topics.

14 citations


Proceedings Article
07 Jun 2012
TL;DR: The systems are all based on a simple process of identifying the components that correspond between two sentences. The results are promising, with Pearson's coefficients on the individual datasets ranging from .3765 to .7761 for relatively simple heuristic-based systems that do not require training on different datasets.
Abstract: In this paper we present our systems for the STS task. Our systems are all based on a simple process of identifying the components that correspond between two sentences. Currently we use words (that is, word forms), lemmas, distributionally similar words, and grammatical relations identified with a dependency parser. We submitted three systems. All systems use only open-class words. Our first system (alignheuristic) tries to obtain a mapping between every open-class token using all the above sources of information. Our second system (wordsim) uses a different algorithm and, unlike alignheuristic, does not use the dependency information. The third system (average) simply takes the average of the scores for each item from the other two systems, to take advantage of the merits of both. For this reason we provide only a brief description of it. The results are promising, with Pearson's coefficients on each individual dataset ranging from .3765 to .7761 for our relatively simple heuristic-based systems that do not require training on different datasets. We provide some analysis of the results and also report results for our data using Spearman's coefficient, which, as a nonparametric measure, we argue is better able to reflect the merits of the different systems (average is ranked between the others).

3 citations
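
As an illustration of two ideas in the STS abstract above, the per-item averaging behind the average system and the difference between Pearson's and Spearman's coefficients, here is a minimal Python sketch. It is not the authors' implementation: the function names, the tie handling (average ranks), and the toy scores are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions.

    Average-rank tie handling is an assumption of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's coefficient: Pearson's coefficient computed on ranks."""
    return pearson(ranks(xs), ranks(ys))


def average_system(scores_a, scores_b):
    """Per-item mean of two systems' scores (hypothetical helper)."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Toy data, invented for illustration only (not from the paper).
gold = [1.0, 2.0, 3.0, 4.0, 5.0]
system = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotonic in gold, but not linear

print("Pearson: ", pearson(gold, system))
print("Spearman:", spearman(gold, system))
```

On the toy data the system orders the items exactly as the gold scores do, so the rank-based Spearman coefficient is maximal, while Pearson's coefficient is lower because the relationship is not linear. This is one way a nonparametric measure can rank systems differently from Pearson's.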