Author

Huihua Liu

Bio: Huihua Liu is an academic researcher from China University of Geosciences (Wuhan). The author has contributed to research in topics: Hidden Markov model & Part of speech. The author has an h-index of 3 and has co-authored 3 publications receiving 12 citations.

Papers
Journal ArticleDOI
TL;DR: This article introduces several typical models for three kinds of tagging (rule-based tagging, statistical approaches and evolutionary algorithms) and discusses the advantages and pitfalls of each.
Abstract: In natural language processing, a crucial subsystem in a wide range of applications is a part-of-speech (POS) tagger, which labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabeled data. Presented here is a brief state-of-the-art account of POS tagging. POS tagging approaches use a labeled corpus to train computational models. Several typical models for three kinds of tagging are introduced in this article: rule-based tagging, statistical approaches and evolutionary algorithms. The advantages and pitfalls of each typical tagging approach are discussed and analyzed. Some rule-based and stochastic methods have achieved accuracies of 93–96%, while some evolutionary algorithms reach about 96–97%.

9 citations
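
The corpus-based idea in the abstract above (a tagger learns from pre-annotated data how to tag unlabeled words) can be illustrated with a most-frequent-tag baseline. This is only a minimal sketch and not one of the models surveyed in the article: the toy corpus, the tag names and the functions `train_baseline` and `tag_words` are all hypothetical, and unknown words simply receive the most common tag overall.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Learn, for every word, the tag it carries most often in the corpus."""
    tag_counts = defaultdict(Counter)   # word -> Counter of tags
    all_tags = Counter()                # global tag frequencies
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tag_counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return lexicon, default_tag

def tag_words(words, lexicon, default_tag):
    """Tag each word with its learned tag, or the fallback if unseen."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]

# Hypothetical toy corpus (tag frequencies: NOUN 4, VERB 3, DET 2).
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]

lexicon, default_tag = train_baseline(TOY_CORPUS)
result = tag_words(["The", "cat", "barks", "loudly"], lexicon, default_tag)
print(result)
```

Here "loudly" never occurs in the toy corpus, so it falls back to the default tag. Handling such unknown words well is one of the places where the rule-based, statistical and evolutionary approaches named in the abstract differ.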

Proceedings ArticleDOI
01 Nov 2010
TL;DR: This paper proposes a Genetic Expression Programming (GEP) model for POS tagging, in which GEP searches for appropriate structures in function space; the model achieves a higher accuracy rate than a Genetic Algorithm model and an HMM model.
Abstract: Text corpora that are tagged with part-of-speech (POS) information are useful in many areas of linguistic research. This paper proposes a model of Genetic Expression Programming (GEP) for POS tagging. GEP is used to search for appropriate structures in function space. After evolving sequences of tags, GEP selects the best individual as the solution. Before simulation, a set of appropriate algorithm parameters is fitted. Experiments on the Brown Corpus show that the proposed model can achieve a higher accuracy rate than the Genetic Algorithm model and the HMM model.

7 citations

Journal ArticleDOI
TL;DR: This paper proposes a uniform-design genetic expression programming (UGEP) model for POS tagging, which searches for appropriate structures in the function space of POS tagging problems and achieves a higher overall accuracy rate as well as a high accuracy rate on unknown words.
Abstract: In natural language processing (NLP), a crucial subsystem in a wide range of applications is a part-of-speech (POS) tagger, which labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb or adjective. This paper proposes a model of uniform-design genetic expression programming (UGEP) for POS tagging. UGEP is used to search for appropriate structures in the function space of POS tagging problems. After evolving sequences of tags, GEP selects the best individual as the solution. Experiments on the Brown Corpus show that (1) in closed lexicon tests, the UGEP model reaches an accuracy rate of 98.8%, which is much better than the genetic algorithm model, neural networks and the hidden Markov model (HMM); (2) in open lexicon tests, the proposed model also achieves a higher accuracy rate of 97.4% and a high accuracy rate on unknown words of 88.6%.

4 citations
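
The abstract above reports an overall accuracy rate and a separate accuracy rate on unknown words. A rough sketch of how two such figures can be computed is given below. The paper's exact evaluation protocol is not stated in this listing, so this assumes that an "unknown word" is a test word absent from the training lexicon; the function `tagging_accuracy` and the sample data are hypothetical.

```python
def tagging_accuracy(gold, predicted, known_words):
    """Return (overall accuracy, accuracy on unknown words).

    gold and predicted are equal-length lists of (word, tag) pairs;
    known_words is the set of lowercased words seen during training.
    The second value is None when the test data has no unknown words.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    correct = unknown_total = unknown_correct = 0
    for (word, gold_tag), (_, pred_tag) in zip(gold, predicted):
        hit = gold_tag == pred_tag
        correct += hit
        if word.lower() not in known_words:
            unknown_total += 1
            unknown_correct += hit
    overall = correct / len(gold) if gold else 0.0
    unknown = unknown_correct / unknown_total if unknown_total else None
    return overall, unknown

# Hypothetical example: "yelps" and "loudly" were not seen in training.
gold = [("the", "DET"), ("dog", "NOUN"), ("yelps", "VERB"), ("loudly", "ADV")]
predicted = [("the", "DET"), ("dog", "NOUN"), ("yelps", "VERB"), ("loudly", "NOUN")]
overall, unknown = tagging_accuracy(gold, predicted, known_words={"the", "dog"})
print(overall, unknown)
```

Under this reading, a closed lexicon test is the special case in which every test word is in `known_words`, so only the overall figure is defined.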


Cited by
Posted Content
TL;DR: This paper presents a novel and realistic method for speeding up the training of a transformation-based learner without sacrificing performance, and shows that the system achieves a significant improvement in training time while matching the performance of a standard transformation-based learner.
Abstract: Transformation-based learning has been successfully employed to solve many natural language processing problems. It achieves state-of-the-art performance on many natural language processing tasks and does not overtrain easily. However, it does have a serious drawback: the training time is often intolerably long, especially on the large corpora which are often used in NLP. In this paper, we present a novel and realistic method for speeding up the training time of a transformation-based learner without sacrificing performance. The paper compares and contrasts the training time needed and performance achieved by our modified learner with two other systems: a standard transformation-based learner, and the ICA system (Hepple, 2000). The results of these experiments show that our system is able to achieve a significant improvement in training time while still achieving the same performance as a standard transformation-based learner. This is a valuable contribution to systems and algorithms which utilize transformation-based learning at any part of the execution.

220 citations

Journal ArticleDOI
TL;DR: This paper provides a comprehensive review of recent POS tagging articles, discussing the weaknesses and strengths of the proposed approaches; it highlights research gaps and presents recommendations for future research on DL- and ML-based POS tagging.
Abstract: Natural language processing (NLP) tools have sparked a great deal of interest due to rapid improvements in information and communications technologies. As a result, many different NLP tools are being produced. However, there are many challenges in developing efficient and effective NLP tools that accurately process natural languages. One such tool is part-of-speech (POS) tagging, which tags a particular sentence or words in a paragraph by looking at the context of the sentence/words inside the paragraph. Despite enormous efforts by researchers, POS tagging still faces challenges in improving accuracy while reducing false-positive rates and in tagging unknown words. Furthermore, the presence of ambiguity when tagging terms with different contextual meanings inside a sentence cannot be overlooked. Recently, deep learning (DL) and machine learning (ML)-based POS taggers have been implemented as potential solutions to efficiently identify words in a given sentence across a paragraph. This article first clarifies the concept of POS tagging. It then provides a broad categorization based on the well-known ML and DL techniques employed in designing and implementing POS taggers. A comprehensive review of the latest POS tagging articles is provided by discussing the weaknesses and strengths of the proposed approaches. Then, recent trends and advancements in DL and ML-based POS taggers are presented in terms of the proposed approaches deployed and their performance evaluation metrics. Using the limitations of the proposed approaches, we emphasized various research gaps and presented future recommendations for research on advancing DL and ML-based POS tagging.

39 citations

01 Jan 1995
TL;DR: Constructing a query from machine-readable, bilingual dictionaries and assigning term weights by the evolutionary optimization of a population of potential weighting schemes presents a solution to the difficulties of generating translated queries.
Abstract: Multi-lingual information retrieval (IR) systems apply queries in one language to a document collection in several different languages with the goal of retrieving only those documents relevant to the query. At first glance, deep linguistic analysis and translation of the query appear necessary before retrievals can be performed. IR systems are unique in natural language processing, however, because a pattern of term occurrences in a document generally suffices to determine the subject matter; word order is largely irrelevant. Translated queries are therefore primarily derived by a mapping from a word set in the query language to a word set in the language of the derived query. Large parallel text collections with sentence-level alignments can provide a baseline for evaluating the correctness of a query translation, but the determination of members of the query translation remains problematic. Constructing a query from machine-readable, bilingual dictionaries and assigning term weights by the evolutionary optimization of a population of potential weighting schemes presents a solution to the difficulties of generating translated queries. In this approach, differences in the rank statistics on the comparative recall results for a query against its native language and its translation against its native language determine the fitness of a tentative query translation.

35 citations

Journal ArticleDOI
TL;DR: This paper provides a comprehensive review of recent POS tagging articles, discussing the weaknesses and strengths of the proposed approaches; it highlights research gaps and presents recommendations for future research on DL- and ML-based POS tagging.
Abstract: Natural language processing (NLP) tools have sparked a great deal of interest due to rapid improvements in information and communications technologies. As a result, many different NLP tools are being produced. However, there are many challenges in developing efficient and effective NLP tools that accurately process natural languages. One such tool is part-of-speech (POS) tagging, which tags a particular sentence or words in a paragraph by looking at the context of the sentence/words inside the paragraph. Despite enormous efforts by researchers, POS tagging still faces challenges in improving accuracy while reducing false-positive rates and in tagging unknown words. Furthermore, the presence of ambiguity when tagging terms with different contextual meanings inside a sentence cannot be overlooked. Recently, deep learning (DL) and machine learning (ML)-based POS taggers have been implemented as potential solutions to efficiently identify words in a given sentence across a paragraph. This article first clarifies the concept of POS tagging. It then provides a broad categorization based on the well-known ML and DL techniques employed in designing and implementing POS taggers. A comprehensive review of the latest POS tagging articles is provided by discussing the weaknesses and strengths of the proposed approaches. Then, recent trends and advancements in DL and ML-based POS taggers are presented in terms of the proposed approaches deployed and their performance evaluation metrics. Using the limitations of the proposed approaches, we emphasized various research gaps and presented future recommendations for research on advancing DL and ML-based POS tagging.

22 citations