Top 2 papers published by Siva Reddy from University of Cambridge in 2012

Proceedings Article

[...]

Bharat Ram Ambati¹, Siva Reddy², Adam Kilgarriff • Institutions (2)

International Institute of Information Technology, Hyderabad¹, University of York²

01 May 2012

TL;DR: This paper describes collecting a 42 million word corpus, parsing it with an existing dependency parser, and generating word sketches for Turkish from the parsed data.

Abstract: Word sketches are one-page, automatic, corpus-based summaries of a word's grammatical and collocational behaviour. In this paper we present word sketches for Turkish. Until now, word sketches have been generated using purpose-built finite-state grammars. Here, we use an existing dependency parser. We describe the process of collecting a 42 million word corpus, parsing it, and generating word sketches from it. We evaluate the word sketches in comparison with word sketches from a language-independent sketch grammar on an external evaluation task called topic coherence, using Turkish WordNet to derive an evaluation set of coherent topics.
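To make the idea of deriving a word sketch from dependency parses concrete, here is a minimal sketch in Python. It assumes the parser output has already been reduced to (head lemma, relation, dependent lemma) triples; the function name `build_word_sketch`, the relation labels, and the toy triples are hypothetical, and collocates are ranked by raw frequency rather than by whatever salience score the paper's pipeline uses.

```python
from collections import Counter, defaultdict


def build_word_sketch(triples, headword):
    """Group the collocates of `headword` by grammatical relation.

    `triples` is an iterable of (head, relation, dependent) lemma tuples,
    assumed to come from a dependency parser. Ranking is by raw count,
    which is a simplification made for illustration only.
    """
    sketch = defaultdict(Counter)
    for head, relation, dependent in triples:
        if head == headword:
            sketch[relation][dependent] += 1
    # Sort each relation's collocates from most to least frequent.
    return {rel: counts.most_common() for rel, counts in sketch.items()}


# Toy, hand-written triples (not real corpus data).
example_triples = [
    ("oku", "OBJECT", "kitap"),
    ("oku", "OBJECT", "kitap"),
    ("oku", "OBJECT", "gazete"),
    ("oku", "SUBJECT", "adam"),
    ("yaz", "OBJECT", "mektup"),
]

example_sketch = build_word_sketch(example_triples, "oku")
for relation, collocates in example_sketch.items():
    print(relation, collocates)
```

Each relation becomes one column of the sketch, listing the words that most often stand in that relation to the headword.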

14 citations

Proceedings Article

[...]

Diana McCarthy¹, Spandana Gella², Siva Reddy • Institutions (2)

Saarland University¹, University of Malta²

07 Jun 2012

TL;DR: The systems are all based on a simple process of identifying the components that correspond between two sentences. The results are promising, with Pearson's coefficients on each individual dataset ranging from .3765 to .7761 for relatively simple heuristics-based systems that do not require training on different datasets.

Abstract: In this paper we present our systems for the STS task. Our systems are all based on a simple process of identifying the components that correspond between two sentences. Currently we use words (that is, word forms), lemmas, distributionally similar words and grammatical relations identified with a dependency parser. We submitted three systems. All systems use only open class words. Our first system (alignheuristic) tries to obtain a mapping between every open class token using all the above sources of information. Our second system (wordsim) uses a different algorithm and, unlike alignheuristic, does not use the dependency information. The third system (average) simply takes the average of the scores for each item from the other two systems to take advantage of the merits of both systems. For this reason we only provide a brief description of that. The results are promising, with Pearson's coefficients on each individual dataset ranging from .3765 to .7761 for our relatively simple heuristics-based systems that do not require training on different datasets. We provide some analysis of the results and also provide results for our data using Spearman's, which, as a nonparametric measure, we argue is better able to reflect the merits of the different systems (average is ranked between the others).
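The averaging step and the two correlation measures mentioned in the abstract can be illustrated with a short, self-contained Python sketch. The function names and the toy scores below are assumptions made for illustration and are not taken from the paper; the code only shows how a per-item average of two systems' scores is formed and why a rank-based (Spearman) coefficient can differ from Pearson's on the same data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def average_system(scores_a, scores_b):
    """Per-item mean of two systems' similarity scores."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Toy data (hypothetical): system scores are monotone in the gold scores
# but not linearly related to them.
gold = [1.0, 2.0, 3.0, 4.0, 5.0]
system = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(gold, system))
print(spearman(gold, system))
```

On the toy data the ranking of items is identical to the gold ranking, so the rank-based coefficient is perfect while Pearson's is not, because Pearson's also penalises departures from a linear relationship.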

3 citations

Showing papers by "Siva Reddy" published in 2012