
Showing papers by "Aly A. Fahmy published in 2010"


01 Jan 2010
TL;DR: The proposed approach integrates two machine learning techniques, bootstrapping semi-supervised pattern recognition and a supervised Conditional Random Fields (CRF) classifier, and outperforms previous CRF-only work.
Abstract: The Named Entity Recognition (NER) task has become essential to improving the performance of many NLP tasks; its aim is to identify extracted named entities more accurately. This paper presents a novel solution to the Arabic Named Entity Recognition (ANER) problem: an integration of two machine learning techniques, namely bootstrapping semi-supervised pattern recognition and a Conditional Random Fields (CRF) classifier as the supervised component. The contributions of the paper are the exploitation of patterns and word semantic fields as CRF features, the first use of the bootstrapping semi-supervised pattern recognition technique for the Arabic language, and the successful integration of the two components to improve on the performance of each. Moreover, to our knowledge, this integration has not previously been applied to the NER task in other natural languages. In 6-fold cross-validation experiments, the solution is shown to outperform previous CRF-only work.
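As an illustration of the bootstrapping half of such a pipeline, the sketch below shows a minimal seed-based pattern induction loop in Python. The helper names, window size, and corpus representation are assumptions for illustration, not the paper's implementation; the harvested entities and patterns would then feed the CRF as features.

```python
# Minimal bootstrapping sketch (hypothetical data shapes): start from a small
# seed lexicon, induce surface patterns from contexts where seeds occur, then
# use the highest-frequency patterns to harvest new candidate entities.

def extract_context_pattern(tokens, i, window=2):
    """Surface pattern: the tokens around position i, with the slot masked."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return tuple(left) + ("<ENT>",) + tuple(right)

def bootstrap(corpus, seeds, iterations=5, top_k=10):
    """corpus: list of token lists; seeds: set of known entity tokens."""
    entities = set(seeds)
    pattern_counts = {}
    best = []
    for _ in range(iterations):
        # 1. Induce patterns from current entity occurrences.
        for tokens in corpus:
            for i, tok in enumerate(tokens):
                if tok in entities:
                    p = extract_context_pattern(tokens, i)
                    pattern_counts[p] = pattern_counts.get(p, 0) + 1
        best = sorted(pattern_counts, key=pattern_counts.get, reverse=True)[:top_k]
        # 2. Harvest new entities that occur in the best patterns
        #    (a real system would also score candidates before accepting them).
        for tokens in corpus:
            for i, tok in enumerate(tokens):
                if extract_context_pattern(tokens, i) in best:
                    entities.add(tok)
    return entities, best
```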

83 citations


Proceedings ArticleDOI
15 Dec 2010
TL;DR: The proposed approach relies on a DBN for clustering and classification of continuous input data without using back propagation in the DBN architecture, and is expected to perform better than a traditional neural network.
Abstract: A Deep Belief Network (DBN) is a deep architecture consisting of a stack of Restricted Boltzmann Machines (RBM). The benefit of the deep architecture is that each layer learns more complex features than the layers before it. DBNs and RBMs can be used as feature extraction methods or as neural networks with pre-learned initial weights. The proposed approach relies on the DBN for clustering and classification of continuous input data without using back propagation in the DBN architecture. A DBN should perform better than a traditional neural network because its connecting weights are initialized by learning rather than set randomly as in an NN. Each layer in the DBN (an RBM) relies on the Contrastive Divergence method for input reconstruction, which increases the performance of the network.
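For reference, a minimal numpy sketch of the Contrastive Divergence (CD-1) update that trains a single RBM layer is shown below. The binary-unit assumption, shapes, and learning rate are illustrative; the paper's continuous-input setting would use a suitable visible-unit variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.01):
    """One CD-1 update for a binary RBM.
    v0: batch of visible vectors, shape (n, n_vis); W: (n_vis, n_hid)."""
    # Positive phase: hidden probabilities and samples driven by the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step back to the visibles (reconstruction).
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Update from the difference between data and reconstruction correlations.
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return ((v0 - v1_prob) ** 2).mean()  # reconstruction error
```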

32 citations


Proceedings Article
28 Mar 2010
TL;DR: The proposed error detection mechanism is applied on top of Buckwalter's Arabic morphological analyzer to demonstrate the approach's capability of detecting possible spelling errors, while the correction mechanism adopts a rule-based edit distance algorithm.
Abstract: Spellcheckers are widely used in many software products for identifying errors in users' writings. However, they are not designed to address spelling errors made by non-native learners of a language. As a matter of fact, spelling errors made by non-native learners are more than just misspellings. Non-native learners' errors require special handling in terms of detection and correction, especially when it comes to morphologically rich languages such as Arabic, which have few related resources. In this paper, we address common error patterns made by non-native Arabic learners and suggest a two-layer spell-checking approach comprising spelling error detection and correction. The proposed error detection mechanism is applied on top of Buckwalter's Arabic morphological analyzer in order to demonstrate the capability of our approach in detecting possible spelling errors. The correction mechanism adopts a rule-based edit distance algorithm. Rules are designed in accordance with common spelling error patterns made by Arabic learners. Error correction uses a multiple filtering mechanism to propose final corrections. The approach utilizes semantic information given in exercise questions in order to achieve highly accurate detection and correction of spelling errors made by non-native Arabic learners. Finally, the proposed approach was evaluated on real test data and achieved promising results.
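A minimal sketch of the rule-weighted edit distance idea is given below. The dynamic program is the standard Levenshtein algorithm; the confusion pairs and their reduced cost are hypothetical illustrations, not the paper's actual rules.

```python
def edit_distance(source, target, sub_cost=None):
    """Dynamic-programming Levenshtein distance. sub_cost lets
    rule-specific substitutions (e.g. common letter confusions) cost less."""
    n, m = len(source), len(target)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if source[i - 1] == target[j - 1]:
                sub = 0
            elif sub_cost:
                sub = sub_cost(source[i - 1], target[j - 1])
            else:
                sub = 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[n][m]

# Hypothetical rule: letter pairs commonly confused by Arabic learners
# are cheaper to substitute, so rule-consistent corrections rank higher.
CONFUSABLE = {("ض", "ظ"), ("ت", "ة"), ("ا", "أ")}

def learner_sub_cost(a, b):
    return 0.5 if (a, b) in CONFUSABLE or (b, a) in CONFUSABLE else 1.0
```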

27 citations


Proceedings Article
16 Jul 2010
TL;DR: This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia.
Abstract: In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. The method classifies multilingual Wikipedia articles using a variety of structured and unstructured features, aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown to effectively classify multilingual pages in a language-independent way. Classification is performed with a Support Vector Machine (SVM) classifier first; the SVM threshold is then adjusted in order to improve the recall scores of the classification. Threshold adjustment uses the beta-gamma threshold adjustment algorithm, a post-learning step that shifts the hyperplane of the SVM. This approach boosted recall with minimal effect on precision.
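The sketch below illustrates the post-learning threshold shift on an SVM's decision function. It substitutes a simple recall-maximizing search for the paper's beta-gamma algorithm, whose exact update is not given here; the scikit-learn classifier and the 0/1 numpy label encoding are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def shift_threshold(clf, X_dev, y_dev, min_precision=0.8):
    """Pick the decision threshold that maximizes recall while keeping
    precision above a floor. A simplified stand-in for beta-gamma
    adjustment, which likewise shifts the SVM hyperplane after training.
    y_dev: numpy array of 0/1 labels."""
    scores = clf.decision_function(X_dev)
    best_t, best_recall = 0.0, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (y_dev == 1))
        fp = np.sum(pred & (y_dev == 0))
        fn = np.sum(~pred & (y_dev == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision >= min_precision and recall > best_recall:
            best_t, best_recall = t, recall
    return best_t

# Usage: clf = LinearSVC().fit(X_train, y_train)
#        t = shift_threshold(clf, X_dev, y_dev)
#        y_pred = clf.decision_function(X_test) >= t
```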

8 citations


Proceedings ArticleDOI
01 Nov 2010
TL;DR: Experimental results on synthetic and real-life datasets show that the proposed supervised, pattern-based classification model is more efficient and effective than existing techniques.
Abstract: This paper presents a model of a supervised machine learning approach for dataset classification. The model extracts a set of patterns common to a single class from the training dataset according to the rules of the pattern-based subspace clustering technique; these extracted patterns are then used to classify the objects of that class in the testing dataset. The proposed model resolves the user-defined threshold dependence problem of this clustering technique. It also solves the curse of dimensionality without requiring a separate dimensionality reduction method. Another distinguishing point of this model is its dependence on the variation of the values of relative features among different objects. Experimental results on synthetic and real-life datasets show that this approach is more efficient and effective than existing techniques.
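The underlying pattern measure is not spelled out in the abstract; a common formulation from pattern-based (pCluster-style) subspace clustering is the pScore of a 2x2 submatrix, sketched below. Note that the delta parameter here is exactly the kind of user-defined threshold the paper sets out to eliminate; it appears only to show the base technique.

```python
from itertools import combinations

def p_score(data, x, y, a, b):
    """pScore of the 2x2 submatrix (objects x, y; features a, b): the
    deviation from a perfect shifting pattern, as used in pattern-based
    (pCluster-style) subspace clustering. data: 2-D numpy array."""
    return abs((data[x, a] - data[y, a]) - (data[x, b] - data[y, b]))

def is_pattern(data, objects, features, delta=1.0):
    """Objects share a pattern on the feature subspace if every 2x2
    submatrix scores within delta (the user-defined threshold)."""
    return all(p_score(data, x, y, a, b) <= delta
               for x, y in combinations(objects, 2)
               for a, b in combinations(features, 2))
```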

7 citations


Proceedings Article
06 May 2010
TL;DR: This paper addresses issues related to the morphological analysis of ill-formed Arabic verbs in order to identify the source of errors and provide informative feedback to SLLs of Arabic.
Abstract: Arabic is a language of rich and complex morphology. The nature and peculiarity of Arabic make its morphological and phonological rules confusing for second language learners (SLLs). The conjugation of Arabic verbs is central to the formulation of an Arabic sentence because of its richness of form and meaning. In this paper, we address issues related to the morphological analysis of ill-formed Arabic verbs in order to identify the source of errors and provide informative feedback to SLLs of Arabic. Edit distance and constraint relaxation techniques are used to demonstrate the capability of the proposed approach in generating all possible analyses of erroneous Arabic verbs written by SLLs. Filtering mechanisms are applied to exclude irrelevant constructions and determine the target stem. A morphological analyzer has been developed and evaluated on real test data, achieving satisfactory results in terms of the recall rate.
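A minimal sketch of the constraint relaxation idea is shown below. The analyzer callable and the constraint set are hypothetical stand-ins; the point is that the constraint whose relaxation makes an analysis succeed identifies the likely source of the learner's error.

```python
def analyze_with_relaxation(ill_formed, analyzer, constraints):
    """Try a strict analysis first; if it fails, relax one constraint at a
    time and collect every analysis that then succeeds. `analyzer` takes a
    surface form and a constraint list and returns candidate analyses
    (a hypothetical interface, not a real library call)."""
    analyses = analyzer(ill_formed, constraints)
    if analyses:
        return [(a, None) for a in analyses]  # verb is well formed
    results = []
    for c in constraints:
        relaxed = [k for k in constraints if k != c]
        for a in analyzer(ill_formed, relaxed):
            # c is the violated rule: the basis for feedback to the learner.
            results.append((a, c))
    return results
```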

7 citations


Proceedings ArticleDOI
22 Nov 2010
TL;DR: The paper examines the effect of normalization methods applied before conventional PCA and shows that correlation PCA, which uses the correlation matrix, avoids the need for such preprocessing.
Abstract: Principal Component Analysis (PCA) has received a lot of attention over the past years and is used as a preprocessing step for many data mining models. PCA relies on the assumption that the input is normally distributed, which does not hold in many real-life cases. On the other hand, applying normalization to the input can change the structure of the data, affecting the outcome of the multivariate analysis and calibration used in data mining. This paper examines the effect of normalization methods applied before conventional PCA and shows that correlation PCA, which uses the correlation matrix in the PCA computation, avoids this requirement. Correlation PCA is shown to lead to better classification performance when the appropriate number of components is selected. The results also show that the resulting classification performance is independent of the normality of the input.
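The sketch below shows why correlation PCA sidesteps a separate normalization step: using the correlation matrix is equivalent to standardizing every feature (zero mean, unit variance) before eigendecomposition. Function and variable names are illustrative.

```python
import numpy as np

def correlation_pca(X, n_components):
    """PCA on the correlation matrix. The correlation matrix of X equals
    the covariance of the standardized data Z, so feature scaling is
    built in rather than applied as a separate preprocessing step."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(X, rowvar=False)       # (n_features, n_features)
    eigvals, eigvecs = np.linalg.eigh(corr)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return Z @ eigvecs[:, order]              # projected component scores
```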

6 citations


Journal Article
TL;DR: This paper presents a semi-automatic Model-Based transformational methodology for multi-platform user interface (MPUI) design that puts dialog modeling in the center of the design process.
Abstract: The wide variety of interactive devices and modalities an interactive system must support has created a big challenge in designing a multi-platform user interface and poses a number of issues for the design cycle of interactive systems. Model-Based User Interface Design (MBUID) approaches can provide useful support in addressing this problem. In MBUID the user interface is described using various models, each describing a different facet of the user interface. Our methodology is based on task models that are attributed to derive a dialog model, from which different concrete models with different appearances can be generated. This paper presents a semi-automatic Model-Based transformational methodology for multi-platform user interface (MPUI) design. The proposed methodology puts dialog modeling at the center of the design process. A core model, our Dialog-States Model (DSM), is integrated into the design process; it represents our initial step toward adapting to multiple target platforms by assigning multiple Dialog-States Models to the same task model. A multi-step reification process then moves from abstract models to more concrete models until a final user interface, customized for the target platform, is reached.
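As a rough illustration of assigning multiple Dialog-States Models to one task model, the sketch below encodes a DSM as a small state machine and splits the same tasks differently for two hypothetical platforms. The data structures are assumptions, far simpler than the paper's DSM.

```python
from dataclasses import dataclass, field

@dataclass
class DialogState:
    """One dialog state: the set of task-model tasks enabled together.
    Fields are illustrative placeholders, not the paper's notation."""
    name: str
    enabled_tasks: set
    transitions: dict = field(default_factory=dict)  # event -> next state

@dataclass
class DialogStatesModel:
    states: dict
    initial: str

# The same three tasks grouped into one state for a large desktop screen...
desktop = DialogStatesModel(
    states={"form": DialogState("form", {"enter_name", "enter_date", "submit"})},
    initial="form")

# ...and split across two states for a small phone screen.
phone = DialogStatesModel(
    states={
        "name": DialogState("name", {"enter_name"}, {"next": "date"}),
        "date": DialogState("date", {"enter_date", "submit"}),
    },
    initial="name")
```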

4 citations


Proceedings ArticleDOI
01 Dec 2010
TL;DR: A pattern-based classification model is presented that extracts the patterns shared by all objects in a specific class and removes the dependence on a user-defined threshold that appears in pattern-based subspace clustering.
Abstract: The use of patterns in predictive models has received a lot of attention in recent years. This paper presents a pattern-based classification model which extracts the patterns that are shared among all objects in a specific class. The introduced model handles the problem of dependence on a user-defined threshold that appears in pattern-based subspace clustering. The experimental results show that the overall pattern-based classification accuracy is high compared with other machine learning techniques, including support vector machines, Bayesian networks, multi-layer perceptrons, and decision trees.
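A minimal sketch of the classification step is given below: an object is assigned to the class whose extracted patterns it satisfies most often. The predicate representation of patterns is an assumption for illustration.

```python
def classify(obj, class_patterns):
    """Assign an object to the class whose patterns it matches most.
    class_patterns maps a class label to a list of predicates over the
    object (a hypothetical representation of the extracted patterns)."""
    def support(patterns):
        return sum(1 for matches in patterns if matches(obj))
    return max(class_patterns, key=lambda c: support(class_patterns[c]))
```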

4 citations


23 Mar 2010
TL;DR: This paper presents a semi-automatic Model-Driven transformational approach to MPUI design that addresses a number of issues for the design cycle of interactive systems.
Abstract: The wide variety of interactive devices and modalities an interactive system must support has created a big challenge in designing a multi-platform user interface (MPUI) and poses a number of issues for the design cycle of interactive systems. This paper presents a semi-automatic Model-Driven transformational approach to MPUI design.