Open Access

Integrated Machine Learning Techniques for Arabic Named Entity Recognition

TLDR
The proposed approach integrates two machine learning techniques, bootstrapping semi-supervised pattern recognition and a supervised Conditional Random Fields (CRF) classifier, and outperforms previous CRF-only work.
Abstract
Named Entity Recognition (NER) has become essential for improving the performance of many NLP tasks. The aim is to find a solution that improves the accuracy with which extracted named entities are identified. This paper presents a novel solution to the Arabic Named Entity Recognition (ANER) problem. The solution integrates two machine learning techniques: bootstrapping semi-supervised pattern recognition and a Conditional Random Fields (CRF) classifier as a supervised technique. The paper's contributions are the use of patterns and word semantic fields as CRF features, the application of the bootstrapping semi-supervised pattern recognition technique to the Arabic language, and the successful integration of the two techniques, which improves on the performance of each component. Moreover, to our knowledge, the proposed integration has not been used for the NER task in other natural languages. In 6-fold cross-validation experiments, the solution outperforms previous CRF-only work.
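
The integration described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the pattern form, the semantic field inventory, or the CRF feature set, so everything below is an assumption made for illustration: the pattern is simply "the word directly before a known entity", the tokens are transliterated toy data, and `bootstrap_entities`, `token_features`, and `SEMANTIC_FIELDS` are hypothetical names. The sketch only shows how the output of a bootstrapping step and a semantic field lookup could both be exposed as per-token features, in the dictionary form that CRF toolkits commonly accept.

```python
from typing import Dict, List, Set

# Hypothetical semantic field lexicon (not taken from the paper).
SEMANTIC_FIELDS: Dict[str, str] = {"alrais": "TITLE"}


def bootstrap_entities(sentences: List[List[str]], seeds: Set[str],
                       rounds: int = 2) -> Set[str]:
    """Grow an entity set from seeds, using left-context words as patterns."""
    entities = set(seeds)
    for _ in range(rounds):
        # Step 1: any word directly before a known entity becomes a pattern.
        patterns = {s[i - 1] for s in sentences
                    for i in range(1, len(s)) if s[i] in entities}
        # Step 2: any word directly after a pattern becomes a candidate entity.
        found = {s[i] for s in sentences
                 for i in range(1, len(s)) if s[i - 1] in patterns}
        if found <= entities:  # nothing new was learned, so stop early
            break
        entities |= found
    return entities


def token_features(tokens: List[str], i: int, pattern_entities: Set[str],
                   semantic_fields: Dict[str, str]) -> Dict[str, object]:
    """Build one token's feature dictionary for a CRF sequence labeller."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "<EOS>"
    return {
        "word": word,
        "prev_word": prev_word,
        "next_word": next_word,
        # Feature supplied by the semi-supervised bootstrapping step.
        "pattern_match": word in pattern_entities,
        # Features supplied by the semantic field lexicon.
        "semantic_field": semantic_fields.get(word, "NONE"),
        "prev_semantic_field": semantic_fields.get(prev_word, "NONE"),
    }


# Toy transliterated sentences; only "Ahmad" is given as a seed entity.
sentences = [
    ["qala", "alrais", "Ahmad"],
    ["istaqbala", "alrais", "Khalid", "alwafd"],
]
entities = bootstrap_entities(sentences, seeds={"Ahmad"})
features = token_features(sentences[1], 2, entities, SEMANTIC_FIELDS)
```

In this toy run the seed "Ahmad" yields the pattern word "alrais", which in turn lets the bootstrapping step pick up "Khalid"; the feature dictionary for "Khalid" then carries both the pattern match and the semantic field of the preceding word. A real system would use richer patterns and score them before accepting new entities.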


Citations
Journal Article

A survey of Arabic named entity recognition and classification

TL;DR: The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and the aspects of standardization in annotating named entities are illustrated.
Book Chapter

Subjectivity and Sentiment Analysis of Arabic: A Survey

TL;DR: This paper surveys different techniques for SSA for Arabic and describes the main existing techniques and test corpora for Arabic SSA that have been introduced in the literature.
Proceedings Article

CAMeL Tools: An open source Python toolkit for Arabic natural language processing

TL;DR: The design of CAMeL Tools and the functionalities it provides are described, including utilities for pre-processing, morphological modeling, dialect identification, named entity recognition and sentiment analysis.
Journal Article

A Machine Learning Approach for Opinion Holder Extraction in Arabic Language

TL;DR: This paper investigates constructing a comprehensive feature set to compensate for the lack of structural parsing output for the Arabic language and presents leading research on opinion holder extraction in Arabic news that is independent of any lexical parser.
Journal Article

A hybrid approach to Arabic named entity recognition

TL;DR: A hybrid named entity recognition (NER) approach that takes advantage of both rule-based and machine learning-based approaches in order to improve overall system performance and overcome the knowledge elicitation bottleneck and the lack of resources for underdeveloped languages that require deep language processing, such as Arabic.
References
Journal Article

An introduction to variable and feature selection

TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Book Chapter

Extracting Patterns and Relations from the World Wide Web

TL;DR: In this article, the authors present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample and test it to extract a relation of (author,title) pairs from the World Wide Web.
Journal Article

Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement

TL;DR: It is shown by experiment that all but one of these computation methods lead to biased measurements, especially under high class imbalance, which is of particular interest to those designing machine learning software libraries and to researchers focused on high class imbalance.