Showing papers by "Gülşen Eryiğit" published in 2018


Joakim Nivre, Mitchell Abrams, Željko Agić, Lars Ahrenberg, +261 more; Institutions (28)
01 Jul 2018

61 citations


Proceedings ArticleDOI
01 Nov 2018
TL;DR: This paper presents the first study on detecting Turkish-English code-switching, along with a small test data set collected from social media to pave the way for further studies.
Abstract: Code-switching (the alternating use of different languages within a single conversation) is a rapidly growing phenomenon in social media and colloquial usage, and it poses distinct challenges for natural language processing. This paper presents the first study on detecting Turkish-English code-switching, along with a small test data set collected from social media to pave the way for further studies. The proposed system, which uses character-level n-grams and conditional random fields (CRFs), obtains a 95.6% micro-averaged F1-score on the introduced test data set.

23 citations


15 Nov 2018
TL;DR: Universal Dependencies is a project that aims to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective.
Abstract: Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).

12 citations


Proceedings ArticleDOI
03 Jul 2018
TL;DR: The proposed system uses support vector machines and solves the CR task with a mention-pair model that basically accepts mention couples and decides on whether they are coreferential with each other or not.
Abstract: This paper presents the state-of-the-art results in Turkish coreference resolution (CR), which is a task of determining sets of mentions which identify the same real-world entity (e.g., a person, a place, a thing, an event). The proposed system uses support vector machines and solves the CR task with a mention-pair model that basically accepts mention couples and decides on whether they are coreferential with each other or not. The results are evaluated on Marmara Turkish Coreference Corpus by using well-known evaluation metrics (viz., MUC, B³, BLANC and LEA). The introduced approach obtains F1 scores of 90.68% (MUC), 86.89% (B³), 85.13% (BLANC) and 78.34% (LEA), yielding an improvement of 9.12, 16.06, 13.08 and 12.57 percentage points respectively over a recent baseline system on Turkish CR. The paper introduces the system setup (SVM parameters and negative sampling strategy) as well as the selected features and analyzes the impact of these features on the Turkish CR task.

3 citations


Book ChapterDOI
09 Jul 2018
TL;DR: This chapter reviews the essential aspects of the first treebank for Turkish, which was built in the early 2000s, and its evolution and extensions since then.
Abstract: In the last three decades, treebanks have become a crucial resource for building and evaluating natural language processing tools and applications. In this chapter, we review the essential aspects of the first treebank for Turkish, which was built in the early 2000s, and its evolution and extensions since then.

2 citations