
Showing papers on "Shallow parsing published in 2010"


Journal ArticleDOI
TL;DR: The proposed system, technique for concept relation identification using shallow parsing (CRISP), utilizes a shallow parser to extract semantic knowledge from construction contract documents which can be used to improve electronic document management functions such as document categorization and retrieval.
Abstract: The objective of this research is to present an innovative technique for managing the knowledge contained in construction contract documents to facilitate quick access and efficient use of such knowledge for project management and contract administration tasks. Knowledge Management has become the focus of a lot of scientific research during the second half of the 20th century as researchers discovered the importance of the knowledge resource to business organizations. Despite early expectations of improved document management techniques, document management systems used in the construction industry have failed to deliver the anticipated performance. Recent research attempts to utilize analysis of the contents of documents to improve document categorization and retrieval functions. It is hypothesized that natural language processing can be effectively used to perform document text analysis. The proposed system, technique for concept relation identification using shallow parsing (CRISP), utilizes a shallow parser to extract semantic knowledge from construction contract documents which can be used to improve electronic document management functions such as document categorization and retrieval. When compared with human evaluators, CRISP achieved almost 80% of the average kappa score attained by the evaluators, and approximately 90% of their F-measure score.

70 citations


Journal ArticleDOI
TL;DR: The objective of this work is to develop an NLP infrastructure for Urdu that is customizable and capable of providing basic analysis on which more advanced information extraction tools can be built.
Abstract: There has been an increase in the amount of multilingual text on the Internet due to the proliferation of news sources and blogs. The Urdu language, in particular, has experienced explosive growth on the Web. Text mining for information discovery, which includes tasks such as identifying topics, relationships and events, and sentiment analysis, requires sophisticated natural language processing (NLP). NLP systems begin with modules such as word segmentation, part-of-speech tagging, and morphological analysis and progress to modules such as shallow parsing and named entity tagging. While there have been considerable advances in developing such comprehensive NLP systems for English, the work for Urdu is still in its infancy. The tasks of interest in Urdu NLP include analyzing data sources such as blogs and comments to news articles to provide insight into social and human behavior. All of this requires a robust NLP system. The objective of this work is to develop an NLP infrastructure for Urdu that is customizable and capable of providing basic analysis on which more advanced information extraction tools can be built. This system assimilates resources from various online sources to facilitate improved named entity tagging and Urdu-to-English transliteration. The annotated data required to train the learning models used here is acquired by standardizing the currently limited resources available for Urdu. Techniques such as bootstrap learning and resource sharing from a syntactically similar language, Hindi, are explored to augment the available annotated Urdu data. Each of the new Urdu text processing modules has been integrated into a general text-mining platform. The evaluations performed demonstrate that the accuracies have either met or exceeded the state of the art.

55 citations


Book ChapterDOI
08 Nov 2010
TL;DR: This paper uses a sentiment-annotated, lexicon-based approach for sentiment analysis in Urdu, and aims to highlight the linguistic as well as technical aspects of this multidimensional research problem.
Abstract: Like other languages, Urdu websites are becoming more popular, because people prefer to share opinions and express sentiments in their own language. Sentiment analyzers developed for other well-studied languages, like English, are not workable for Urdu, due to its script, morphological, and grammatical differences. As a result, this language should be studied as an independent problem domain. Our approach towards sentiment analysis is based on the identification and extraction of SentiUnits from the given text, using shallow parsing. SentiUnits are the expressions which carry the sentiment information in a sentence. We use a sentiment-annotated, lexicon-based approach. Unfortunately, no such lexicon exists for the Urdu language, so a major part of this research consists in developing one. Hence, this paper is presented as a baseline for this colossal and complex task. Our goal is to highlight the linguistic (grammar and morphology) as well as technical aspects of this multidimensional research problem. The performance of the system is evaluated on multiple texts and the achieved results are quite satisfactory.
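A minimal illustration of the SentiUnit idea described above might look like the sketch below; the lexicon entries, chunk format, and negation handling are invented for the example and are not taken from the paper.

```python
# Hypothetical sketch of a lexicon-based SentiUnit scorer: chunks produced by a
# shallow parser are looked up in a sentiment-annotated lexicon and their
# polarities are aggregated. Lexicon entries and chunk format are invented here.
from typing import List

# Toy sentiment lexicon: word -> polarity in [-1, 1]
SENTI_LEXICON = {"acha": 0.8, "kharab": -0.7, "behtareen": 1.0}

# Negation markers that flip the polarity of the unit they attach to
NEGATIONS = {"nahi", "na"}

def score_sentiunits(chunks: List[List[str]]) -> float:
    """Each chunk is a list of tokens from the shallow parser; the sentence
    score is the sum of the polarities of its SentiUnits."""
    total = 0.0
    for chunk in chunks:
        polarity = sum(SENTI_LEXICON.get(token, 0.0) for token in chunk)
        if any(tok in NEGATIONS for tok in chunk):
            polarity = -polarity          # negation flips the unit's orientation
        total += polarity
    return total

print(score_sentiunits([["film", "behtareen"], ["kahani", "acha", "nahi"]]))
```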

51 citations


BookDOI
15 Dec 2010
TL;DR: This book discusses second language processing and parsing in English and second language gap processing in Japanese scrambling under a Simpler Syntax account, and the processing of subject-object ambiguities by English and Dutch L2 learners of German.
Abstract: 1. Preface 2. Part I. Introduction 3. Second language processing and parsing: The issues (by VanPatten, Bill) 4. Part II. Relative clauses and wh-movement 5. Relative clause attachment preferences of Turkish L2 speakers of English: Shallow parsing in the L2? (by Dinctopal-Deniz, Nazik) 6. Evidence of syntactic constraints in the processing of wh-movement: A study of Najdi Arabic learners of English (by Aldwayan, Saad) 7. Constraints on L2 learners' processing of wh-dependencies: Evidence from eye movements (by Cunnings, Ian) 8. Part III. Gender and number 9. The effects of linear distance and working memory on the processing of gender agreement in Spanish (by Keating, Gregory D.) 10. Feature assembly in early stages of L2 acquisition: Processing evidence from L2 French (by Renaud, Claire) 11. Part IV. Subjects and objects 12. Second language processing in Japanese scrambled sentences (by Mitsugi, Sanako) 13. Second language gap processing of Japanese scrambling under a Simpler Syntax account (by Hara, Masahiro) 14. The processing of subject-object ambiguities by English and Dutch L2 learners of German (by Jackson, Carrie N.) 15. Connections between processing, production and placement: Acquiring object pronouns in spanish as a second language (by Malovrh, Paul A.) 16. Part V. Phonology and lexicon 17. The exploitation of fine phonetic detail in the processing of L2 French (by Shoemaker, Ellenor M.) 18. Translation ambiguity: Consequences for learning and processing (by Tokowicz, Natasha) 19. Part VI. Prosody and context 20. Reading aloud in two languages: The interplay of syntax and prosody (by Fernandez, Eva M.) 21. Near-nativelike processing of contrastive focus in L2 French (by Reichle, Robert) 22. Author index 23. Subject index

44 citations


Proceedings Article
02 Jun 2010
TL;DR: This work uses a classification method to aid human annotation of output parses and shows that knowledge about multiword expressions leads to an increase of between 7.5% and 9.5% in shallow parsing accuracy.
Abstract: There is significant evidence in the literature that integrating knowledge about multiword expressions can improve shallow parsing accuracy. We present an experimental study to quantify this improvement, focusing on compound nominals, proper names and adjective-noun constructions. The evaluation set of multiword expressions is derived from WordNet and the textual data are downloaded from the web. We use a classification method to aid human annotation of output parses. This method allows us to conduct experiments on a large dataset of unannotated data. Experiments show that knowledge about multiword expressions leads to an increase of between 7.5% and 9.5% in accuracy of shallow parsing in sentences containing these multiword expressions.
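As a rough sketch of how such knowledge could be integrated (the MWE list and the token-joining convention below are assumptions, not the authors' setup), known multiword expressions can be merged into single tokens before chunking so the chunker cannot split them:

```python
# Illustrative sketch: known multiword expressions are merged into single
# tokens before shallow parsing. The MWE inventory is a placeholder.
MWES = {("new", "york", "city"), ("machine", "learning")}
MAX_LEN = max(len(m) for m in MWES)

def merge_mwes(tokens):
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in MWES:
                out.append("_".join(tokens[i:i + n]))   # one token for the whole MWE
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(merge_mwes("He studies machine learning in New York City".split()))
```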

32 citations


Proceedings Article
01 May 2010
TL;DR: The paper concentrates on the delimitation of syntactic words (analytical forms, reflexive verbs, discontinuous conjunctions, etc.) and syntactic groups, as well as on problems encountered during the annotation process: syntactic group boundaries, multiword entities, abbreviations, discontinuous phrases and syntactic words.
Abstract: The paper presents the procedure of syntactic annotation of the National Corpus of Polish. The paper concentrates on the delimitation of syntactic words (analytical forms, reflexive verbs, discontinuous conjunctions, etc.) and syntactic groups, as well as on problems encountered during the annotation process: syntactic group boundaries, multiword entities, abbreviations, discontinuous phrases and syntactic words. It includes the complete tagset for syntactic words and the list of syntactic groups recognized in NKJP. The tagset defines grammatical classes and categories according to morphosyntactic and syntactic criteria only. Syntactic annotation in the National Corpus of Polish is limited to making constituents of combinations of words. Annotation depends on shallow parsing and manual post-editing of the results by annotators. Manual annotation is performed by two independent annotators, with a referee in cases of disagreement. The manually constructed grammar, both for syntactic words and for syntactic groups, is encoded in the shallow parsing system Spejd.

30 citations


Proceedings ArticleDOI
23 Aug 2010
TL;DR: A new information extraction system based on statistical shallow parsing in unconstrained handwritten documents is introduced; it relies on a strong and powerful global handwriting model in which an entire text line is modeled with Hidden Markov Models.
Abstract: In this paper, a new information extraction system based on statistical shallow parsing in unconstrained handwritten documents is introduced. Unlike classical approaches found in the literature, such as keyword spotting or full document recognition, our approach relies on a strong and powerful global handwriting model. An entire text line is considered as an indivisible entity and is modeled with Hidden Markov Models. In this way, text line shallow parsing allows fast extraction of the relevant information in any document while rejecting irrelevant information at the same time. First results are promising and show the interest of the approach.

25 citations


Dissertation
20 Sep 2010
TL;DR: The results show that it is possible to recognise multiword expressions and decide their compositionality in an unsupervised manner, based on cooccurrence statistics and distributional semantics, and that multiword expressions are beneficial for other fundamental applications of Natural Language Processing, either by direct integration or as an evaluation tool.
Abstract: Multiword expressions are expressions consisting of two or more words that correspond to some conventional way of saying things (Manning & Schutze 1999). Due to the idiomatic nature of many of them and their high frequency of occurrence in all sorts of text, they cause problems in many Natural Language Processing (NLP) applications and are frequently responsible for their shortcomings. Efficiently recognising multiword expressions and deciding the degree of their idiomaticity would be useful to all applications that require some degree of semantic processing, such as question-answering, summarisation, parsing, language modelling and language generation. In this thesis we investigate the issues of recognising multiword expressions, domain-specific or not, and of deciding whether they are idiomatic. Moreover, we inspect the extent to which multiword expressions can contribute to a basic NLP task such as shallow parsing, and ways that the basic property of multiword expressions, idiomaticity, can be employed to define a novel task for Compositional Distributional Semantics (CDS). The results show that it is possible to recognise multiword expressions and decide their compositionality in an unsupervised manner, based on cooccurrence statistics and distributional semantics. Further, multiword expressions are beneficial for other fundamental applications of Natural Language Processing, either by direct integration or as an evaluation tool. In particular, termhood-based methods, which are based on nestedness information, are shown to outperform unithood-based methods, which measure the strength of association among the constituents of a multiword candidate term. A simple heuristic was shown to perform better than more sophisticated methods. A new graph-based algorithm employing sense induction is proposed to address multiword expression compositionality and is shown to perform better than a standard vector space model. Its parameters were estimated by an unsupervised scheme based on graph connectivity. Multiword expressions are shown to contribute to shallow parsing. Moreover, they are used to define a new evaluation task for distributional semantic composition models.
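A toy sketch of the cooccurrence-statistics idea follows, scoring bigram candidates by pointwise mutual information; the miniature corpus and the choice of PMI are placeholders for the thesis's actual association measures.

```python
# Rough sketch: score bigram MWE candidates by pointwise mutual information
# over a (tiny, made-up) corpus. Higher PMI suggests the pair behaves like a
# fixed, possibly idiomatic unit.
import math
from collections import Counter

corpus = ("the stock market fell while the stock market rally faded "
          "a red herring confused the reader a red herring again").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)

def pmi(w1, w2):
    p_xy = bigrams[(w1, w2)] / (N - 1)
    p_x, p_y = unigrams[w1] / N, unigrams[w2] / N
    return math.log2(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

for pair in [("red", "herring"), ("the", "stock")]:
    print(pair, round(pmi(*pair), 2))
```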

18 citations


Proceedings Article
23 Aug 2010
TL;DR: The crux of the approach is to use a powerful morphological analyzer backed by a high-coverage lexicon to generate rich features for a CRF-based sequence classifier for shallow parsing of a morphologically rich language, Marathi.
Abstract: Verb suffixes and verb complexes of morphologically rich languages carry a lot of information. We show that this information, if harnessed for the task of shallow parsing, can lead to dramatic improvements in accuracy for a morphologically rich language, Marathi. The crux of the approach is to use a powerful morphological analyzer backed by a high-coverage lexicon to generate rich features for a CRF-based sequence classifier. Accuracy figures of 94% for Part of Speech Tagging and 97% for Chunking using a modestly sized corpus (20K words) vindicate our claim that for morphologically rich languages, linguistic insight can obviate the need for large amounts of annotated corpora.
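A hedged sketch of the feature-generation step is given below; the token forms, morphological tags, and feature names are invented for illustration, and a real system would pass such per-token dictionaries to a CRF toolkit such as CRFsuite.

```python
# Sketch of rich feature extraction for a CRF chunker: each token becomes a
# feature dict combining surface form, suffix slices and (hypothetical)
# morphological analyzer output.
def token_features(tokens, morph_tags, i):
    """tokens: word list; morph_tags: per-token output of a morphological
    analyzer (root, POS, case/aspect markers) - placeholder strings here."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "suffix2": word[-2:],          # verb/case suffixes carry much information
        "suffix3": word[-3:],
        "morph": morph_tags[i],        # e.g. "root=kar|pos=VM|aspect=perf"
        "is_first": i == 0,
        "prev_morph": morph_tags[i - 1] if i > 0 else "BOS",
    }

tokens = ["tyane", "kaam", "kele"]
morph = ["root=to|pos=PRP|case=erg", "root=kaam|pos=NN", "root=kar|pos=VM|aspect=perf"]
print(token_features(tokens, morph, 2))
```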

17 citations


Book ChapterDOI
01 Jan 2010
TL;DR: The emphasis is put on how understanding the syntactic and lexical characteristics of this specialised language has practical importance in the development of domain-specific Knowledge Management applications.
Abstract: This work is an investigation into the peculiarities of legal language with respect to ordinary language. Based on the idea that a shallow parsing approach can help to provide enough detailed linguistic information, this work presents the results obtained by shallow parsing (i.e. chunking) corpora of Italian and English legal texts and comparing them with corpora of ordinary language. In particular, this paper puts the emphasis on how understanding the syntactic and lexical characteristics of this specialised language has practical importance in the development of domain-specific Knowledge Management applications.

16 citations


Book ChapterDOI
08 Nov 2010
TL;DR: DiSeg is presented, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory and is based on lexical and syntactic rules, obtaining promising results.
Abstract: Nowadays discourse parsing is a very prominent research topic. However, there is not a discourse parser for Spanish texts. The first stage in order to develop this tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory and is based on lexical and syntactic rules. We describe the system and we evaluate its performance against a gold standard corpus, obtaining promising results.
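For illustration only, a marker-based segmenter in the spirit of DiSeg might propose boundaries before a small list of Spanish discourse markers; the marker list and the regex below are simplifications, not DiSeg's actual lexical and syntactic rules.

```python
# Toy rule-based discourse segmentation: split a sentence before a few
# discourse markers. Real segmenters also use syntactic cues (finite verbs,
# clause boundaries) that are omitted here.
import re

MARKERS = ["aunque", "porque", "cuando", "para que", "sin embargo"]
pattern = re.compile(r"\s*(?=\b(?:" + "|".join(map(re.escape, MARKERS)) + r")\b)",
                     flags=re.IGNORECASE)

def segment(sentence: str):
    return [p.strip() for p in pattern.split(sentence) if p.strip()]

print(segment("Iremos al cine aunque llueva porque tenemos entradas"))
```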

Proceedings ArticleDOI
16 Nov 2010
TL;DR: The shallow parsing of isolated text lines allows quick information extraction in any document while rejecting at the same time irrelevant information in unconstrained handwritten documents.
Abstract: In this paper, we introduce an alpha-numerical sequence extraction system (keywords, numerical fields or alpha-numerical sequences) for unconstrained handwritten documents. Contrary to most of the approaches presented in the literature, our system relies on a global handwriting line model describing two kinds of information: i) the relevant information and ii) the irrelevant information, represented by a shallow parsing model. The shallow parsing of isolated text lines allows quick information extraction in any document while rejecting irrelevant information at the same time. Results on a public French incoming-mail database show the efficiency of the approach.


Proceedings Article
Xian Qian, Qi Zhang, Yaqian Zhou, Xuanjing Huang, Lide Wu
09 Oct 2010
TL;DR: A novel method which integrates graph structures of two sub-tasks into one using virtual nodes, and performs joint training and decoding in the factorized state space is presented.
Abstract: Many sequence labeling tasks in NLP require solving a cascade of segmentation and tagging subtasks, such as Chinese POS tagging, named entity recognition, and so on. Traditional pipeline approaches usually suffer from error propagation. Joint training/decoding in the cross-product state space could cause too many parameters and high inference complexity. In this paper, we present a novel method which integrates graph structures of two sub-tasks into one using virtual nodes, and performs joint training and decoding in the factorized state space. Experimental evaluations on CoNLL 2000 shallow parsing data set and Fourth SIGHAN Bakeoff CTB POS tagging data set demonstrate the superiority of our method over cross-product, pipeline and candidate reranking approaches.

Proceedings Article
Weiwei Sun
11 Jul 2010
TL;DR: This work proposes semantics-driven shallow parsing, which takes into account both syntactic structures and predicate-argument structures, and introduces several new "path" features to improve shallow parsing based SRL method.
Abstract: One deficiency of current shallow parsing based Semantic Role Labeling (SRL) methods is that syntactic chunks are too small to effectively group words. To partially resolve this problem, we propose semantics-driven shallow parsing, which takes into account both syntactic structures and predicate-argument structures. We also introduce several new "path" features to improve shallow parsing based SRL method. Experiments indicate that our new method obtains a significant improvement over the best reported Chinese SRL result.

01 Jan 2010
TL;DR: This paper describes UAIC’s Question Answering systems participating in the ResPubliQA 2010 competition, designed to answer questions on a juridical corpus in Romanian, English and French monolingual tasks.
Abstract: This paper describes UAIC’s Question Answering systems participating in the ResPubliQA 2010 competition, designed to answer questions on a juridical corpus in Romanian, English and French monolingual tasks. Our systems adhere to the classical architecture of a Question Answering system, with an emphasis on simplicity and real-time answers: only shallow parsing was used for question processing, the indexes for the retrieval module were built at coarse-grained paragraph level, and the answer extraction component used simple pattern-based rules and lexical similarity metrics for candidate answer ranking.
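A minimal sketch of the lexical-similarity ranking step is shown below; the stopword list and the Jaccard scoring are illustrative assumptions, not UAIC's actual metrics.

```python
# Toy candidate-paragraph ranking by lexical overlap (Jaccard) with the question.
STOPWORDS = {"the", "of", "a", "is", "what", "for", "in"}

def tokens(text):
    return {w.strip("?,.;:").lower() for w in text.split()} - STOPWORDS

def jaccard(q, p):
    tq, tp = tokens(q), tokens(p)
    return len(tq & tp) / len(tq | tp) if tq | tp else 0.0

question = "What is the deadline for submitting the annual report?"
paragraphs = [
    "The annual report shall be submitted before the deadline of 31 March.",
    "Member states may adopt additional measures on imports.",
]
# Rank retrieved paragraphs by similarity to the question and keep the best one
best = max(paragraphs, key=lambda p: jaccard(question, p))
print(best)
```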

01 Jan 2010
TL;DR: A method for classifying Croatian sentences by structure and detecting independent and dependent clauses within these sentences is presented and evaluated, together with a discussion of the obtained results and future research directions.
Abstract: We present a method for classifying Croatian sentences by structure and detecting independent and dependent clauses within these sentences and provide its evaluation. A prototype system applying the method was implemented by using the NooJ linguistic development environment, both for purposes of this experiment and for further utilization in a prototype rule-based chunking and shallow parsing system for Croatian. With regards to pre-processing, we implemented and evaluated three different approaches to designing the system: (1) no pre-processing of input sentences, (2) automatic morphosyntactic tagging of sentences by using the CroTag stochastic tagger and (3) manual morphosyntactic annotation of input sentences. All three approaches were evaluated for sentence classification and clause detection accuracy in terms of precision and recall. The highest scoring system was the one using sentences with manually assigned morphosyntactic tags as input and it scored an overall F1-measure of 0.861 (P: 0.928, R: 0.813). In the paper, a more detailed discussion of system design and experiment setup is provided, followed by a discussion of the obtained results and future research directions.

DOI
08 Aug 2010
TL;DR: The paper describes Aelred, a web application that demonstrates the use of language technology in the Google App Engine cloud computing environment, serving English literary texts with a range of linguistic annotations including part-of-speech tagging, shallow parsing, and word sense definitions from WordNet.
Abstract: The paper describes Aelred, a web application that demonstrates the use of language technology in the Google App Engine cloud computing environment. Aelred serves up English literary texts with optional concordances for any word and a range of linguistic annotations including part-of-speech tagging, shallow parsing, and word sense definitions from WordNet. Two alternative approaches are described. In the first approach, annotations are created offline and uploaded to the cloud datastore. In the second approach, annotations are created online within the cloud computing framework. In both cases standard HTML is generated with a template engine so that the annotations can be viewed in ordinary web browsers.

Proceedings ArticleDOI
17 Jan 2010
TL;DR: Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.
Abstract: This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.
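A small pattern-based extractor of this kind could be sketched as follows; the regular expressions are illustrative, not the ones evaluated in the paper.

```python
# Sketch of pattern-based date-of-birth extraction from noisy OCR text.
import re

DOB_PATTERNS = [
    re.compile(r"born\s+(?:on\s+)?(\d{1,2}\s+\w+\s+\d{4})", re.IGNORECASE),
    re.compile(r"\(b\.\s*(\d{4})\)"),
    re.compile(r"date of birth[:\s]+(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE),
]

def extract_dob(text):
    hits = []
    for pat in DOB_PATTERNS:
        hits.extend(pat.findall(text))
    return hits

# OCR noise ("SM1TH") does not break the date patterns in this toy example
ocr_text = "JOHN SM1TH, born 12 March 1874 in Boston (b. 1874), date of birth: 12/03/1874"
print(extract_dob(ocr_text))
```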

01 Jan 2010
TL;DR: In this thesis, the authors propose a hybrid approach, which combines shallow parsing and pattern matching to extract relations between drugs from biomedical texts, and a second approximation based on a supervised machine learning approach, in particular kernel methods.
Abstract: A drug-drug interaction occurs when one drug influences the level or activity of another drug. The detection of drug interactions is an important research area in patient safety since these interactions can become very dangerous and increase health care costs. Although there are different databases supporting health care professionals in the detection of drug interactions, this kind of resource is rarely complete. Drug interactions are frequently reported in journals of clinical pharmacology, making medical literature the most effective source for the detection of drug interactions. However, the increasing volume of the literature overwhelms health care professionals trying to keep an up-to-date collection of all reported drug-drug interactions. The development of automatic methods for collecting, maintaining and interpreting this information is crucial to achieving a real improvement in their early detection. Information Extraction techniques can provide an interesting way to reduce the time spent by health care professionals on reviewing the literature. Nevertheless, only a few approaches have tackled the extraction of drug-drug interactions. In this thesis, we have conducted a detailed study of various information extraction techniques applied to the biomedical domain. Based on this study, we have proposed two different approximations for the extraction of drug-drug interactions from texts. The first approximation proposes a hybrid approach, which combines shallow parsing and pattern matching to extract relations between drugs from biomedical texts. The second approximation is based on a supervised machine learning approach, in particular, kernel methods. In addition, we have created DrugDDI, the first corpus annotated with drug-drug interactions, which allows us to evaluate and compare both approximations. We think the DrugDDI corpus is an important contribution because it could encourage other research groups to investigate this problem. To the best of our knowledge, the DrugDDI corpus is the only available corpus annotated for drug-drug interactions, and this thesis is the first work which addresses the problem of extracting drug-drug interactions from biomedical texts. We have also defined three auxiliary processes to provide crucial information, which will be used by the aforementioned approximations. These auxiliary tasks are as follows: (1) a process for text analysis based on the UMLS MetaMap Transfer tool (MMTx) to provide shallow syntactic and semantic information from texts, (2) a process for drug name recognition and classification, and (3) a process for drug anaphora resolution. Finally, we have developed a pipeline prototype which integrates the different auxiliary processes. The pipeline architecture allows us to easily integrate these modules with each of the approaches proposed in this thesis: pattern matching or kernels. Several experiments were performed on the DrugDDI corpus. They show that while the first approximation based on pattern matching achieves low performance, the approach based on kernel methods achieves a performance comparable to that obtained by approaches which carry out a similar task, such as the extraction of protein-protein interactions.
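The pattern-matching approximation could be sketched roughly as follows; the drug list, trigger phrases, and pattern are invented examples, not DrugDDI patterns.

```python
# Simplified sketch: lexical patterns over sentences containing two recognized
# drug names propose candidate drug-drug interactions.
import re

DRUGS = {"warfarin", "aspirin", "ibuprofen"}
TRIGGERS = r"(?:increases|decreases|inhibits|potentiates)\s+the\s+effect\s+of"

def find_interactions(sentence):
    pattern = re.compile(r"\b(\w+)\s+" + TRIGGERS + r"\s+(\w+)\b", re.IGNORECASE)
    pairs = []
    for d1, d2 in pattern.findall(sentence):
        # keep only pairs where both arguments are known drug names
        if d1.lower() in DRUGS and d2.lower() in DRUGS:
            pairs.append((d1, d2))
    return pairs

print(find_interactions("Aspirin potentiates the effect of warfarin in some patients."))
```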

Proceedings ArticleDOI
23 Oct 2010
TL;DR: To improve retrieval performance, a shallow parsing technique for text was introduced and a Chinese Web information retrieval model was designed that evaluates the matching degree between indexed documents and users’ interests based on semantic similarity calculation.
Abstract: To improve retrieval performance, a shallow parsing technique for text was introduced for Chinese Web information retrieval. Firstly, the predicate, the prepositive nominal component and the succedent nominal component close to the predicate were extracted from the Chinese sentence. Then, the semantic vector of the Chinese text was acquired by converting the predicate and nominal components to concepts. An algorithm was presented for calculating the similarity of semantic vectors, and a Chinese Web information retrieval model was designed. The model evaluates the matching degree between indexed documents and users’ interests based on semantic similarity calculation. Users’ interests were expressed by providing representative documents. Experimental results show that the precision is improved noticeably compared with a popular Web search engine.
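A minimal sketch of the matching step, comparing concept vectors by cosine similarity, is shown below; the concept identifiers and weights are placeholders for the predicate and nominal concepts described above.

```python
# Cosine similarity between a document's concept vector and a user-interest vector.
import math

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

user_interest = {"concept:purchase": 0.9, "concept:stock": 0.6}
document = {"concept:purchase": 0.4, "concept:stock": 0.8, "concept:weather": 0.2}
print(round(cosine(user_interest, document), 3))
```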

Journal ArticleDOI
TL;DR: The conditional random fields model is a valid probabilistic model for segmenting and labeling sequence data and can be used to realize chunk analysis and entity relation extraction in Chinese text.
Abstract: Currently, large amounts of information exist in Web sites and various digital media. Most of it is in natural language: easy to browse, but difficult for a computer to understand. Chunk parsing and entity relation extraction are important for understanding the semantics of information in natural language processing. Chunk analysis is a shallow parsing method, and entity relation extraction is used to establish relationships between entities. Because full syntactic parsing is complex for Chinese text understanding, many researchers are more interested in chunk analysis and relation extraction. The conditional random fields (CRFs) model is a valid probabilistic model for segmenting and labeling sequence data. This paper models the chunking and entity relation problems in Chinese text; by transforming them into a sequence labeling problem, we can use CRFs to realize chunk analysis and entity relation extraction.
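The "transform into a labeling problem" step can be illustrated by converting chunk spans into per-token BIO labels that a CRF can then learn; the tokens and chunk types below are illustrative only.

```python
# Convert chunk spans into per-token BIO labels for sequence labeling.
def spans_to_bio(n_tokens, spans):
    """spans: list of (start, end, type) with end exclusive."""
    labels = ["O"] * n_tokens
    for start, end, chunk_type in spans:
        labels[start] = f"B-{chunk_type}"
        for i in range(start + 1, end):
            labels[i] = f"I-{chunk_type}"
    return labels

tokens = ["研究", "人员", "提出", "新", "方法"]
spans = [(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")]
print(list(zip(tokens, spans_to_bio(len(tokens), spans))))
```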

Proceedings Article
01 Jan 2010
TL;DR: The current paper is mainly focused on testing the suitability of PNEPs for shallow parsing, which analyzes the main components of sentences rather than complete sentences.
Abstract: PNEPs (Parsing Networks of Evolutionary Processors) extend NEPs with context free (instead of substituting) rules, leftmost derivation, bad terminals check and indexes to rebuild the derivation tree. It is possible to build a PNEP from any context free grammar without additional constraints, able to generate all the different derivations for ambiguous grammars with a temporal performance bound by the depth of the derivation tree. One of the main difficulties encountered by parsing techniques when building complete parsing trees for natural languages is the spatial and temporal performance of the analysis. Shallow parsing tries to overcome these difficulties. The goal of shallow parsing is to analyze the main components of the sentences (for example, noun groups, verb groups, etc.) rather than complete sentences. The current paper is mainly focused on testing the suitability of PNEPs to shallow parsing.

Journal Article
TL;DR: Discusses the integration of statistical learning methods and artificial rule methods for PP recognition, based on several typical PP recognition models at the shallow parsing level, and proposes that the combination of statistical learning methods and artificial rule methods is the future direction of development.
Abstract: In the recognition of prepositional phrases, statistical learning methods and artificial rule methods are the two major methods used. This paper discusses the integration of statistical learning methods and artificial rule methods for PP recognition, based on several typical PP recognition models at the shallow parsing level, and then points out that feature extraction is an abstraction of pragmatic rules derived from the corpus. It proposes that the combination of statistical learning methods and artificial rule methods is the future direction of development.

Journal Article
TL;DR: KorLexClas 1.5 is described, which provides a very large list of Korean numeral classifiers, along with the co-occurring noun categories that select each numeral classifier, and is expected to be used in a variety of NLP applications, including MT.
Abstract: This paper aims to describe KorLexClas 1.5, which provides a very large list of Korean numeral classifiers, along with the co-occurring noun categories that select each numeral classifier. Unlike KorLex for other parts of speech, whose structure depends largely on its reference model (Princeton WordNet), KorLexClas 1.0 and its extended version 1.5 adopt a direct building method, which demands considerable time and expert knowledge to establish the hierarchies of numeral classifiers and the relationships between lexical items. For the efficiency of construction as well as the reliability of KorLexClas 1.5, we use the following processes: (1) using various language resources, cross-checked against each other, for the selection of classifier candidates; (2) extending the list of numeral classifiers by using shallow parsing techniques; (3) setting up the hierarchies of the numeral classifiers based on previous linguistic studies; and (4) determining the LUB (Least Upper Bound) of the numeral classifiers in KorLexNoun 1.5. The last process provides an open, extensible list of co-occurring nouns for KorLexClas 1.5. KorLexClas 1.5 is expected to be used in a variety of NLP applications, including MT.

01 Jan 2010
TL;DR: The approach is an effective way to parse a tennis game from a stream of events with minimal human intervention, and makes use of some extra contextual information, namely the time gap between two adjacent match events, which is in itself a reasonable indicator of segmentation.
Abstract: This paper proposes a method to infer the syntactical units of a sports game (tennis) from a stream of game events. We assume that we are given a sequence of events within the game (examples of events are “serve”, “rally”, “score announcement” etc.), with their durations, and our goal is to segment them into “units” that are meaningful for the game, such as a “point”. Such a segmentation is essential for understanding the way that the events relate to each other, and hence for inferring automatically the structure of the game. We use a multi-gram based technique to segment the event stream into variable-length sequences by estimating the optimal (maximum-likelihood) segmentation using the Viterbi algorithm. We then make use of some extra contextual information, namely the time gap between two adjacent match events, which is in itself a reasonable indicator of segmentation. By integrating this feature into the multigram segmentation, we considerably enhance segmentation performance. The results show that our approach is an effective way to parse a tennis game from a stream of events with minimal human intervention. Keywords: shallow parsing; variable-length unit; segmentation; game learning.
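A rough sketch of the maximum-likelihood multigram segmentation follows, using a Viterbi-style dynamic program over segment boundaries; the unit inventory and probabilities are made up for the example, whereas a real model estimates them iteratively and also exploits the time-gap feature.

```python
# Viterbi-style dynamic program: choose the most probable partition of the
# event stream into variable-length units, given (hypothetical) unit log-probabilities.
import math

UNIT_LOGP = {
    ("serve", "rally", "score"): math.log(0.20),
    ("serve", "fault"): math.log(0.15),
    ("serve",): math.log(0.05),
    ("rally",): math.log(0.05),
    ("score",): math.log(0.05),
    ("fault",): math.log(0.05),
}
MAX_UNIT = max(len(u) for u in UNIT_LOGP)

def segment(events):
    n = len(events)
    best = [float("-inf")] * (n + 1)   # best log-probability of a segmentation of events[:i]
    back = [0] * (n + 1)               # length of the last unit in that best segmentation
    best[0] = 0.0
    for i in range(1, n + 1):
        for k in range(1, min(MAX_UNIT, i) + 1):
            unit = tuple(events[i - k:i])
            if unit in UNIT_LOGP and best[i - k] + UNIT_LOGP[unit] > best[i]:
                best[i] = best[i - k] + UNIT_LOGP[unit]
                back[i] = k
    # recover the segmentation by walking back through the chosen unit lengths
    units, i = [], n
    while i > 0:
        units.append(tuple(events[i - back[i]:i]))
        i -= back[i]
    return list(reversed(units))

print(segment(["serve", "fault", "serve", "rally", "score"]))
```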
