
Showing papers by "Aly A. Fahmy published in 2013"


Journal ArticleDOI
TL;DR: This survey discusses the existing works on text similarity by partitioning them into three approaches: String-based, Corpus-based and Knowledge-based similarities; samples of combinations between these similarities are also presented.
Abstract: Measuring the similarity between words, sentences, paragraphs and documents is an important component in various tasks such as information retrieval, document clustering, word-sense disambiguation, automatic essay scoring, short answer grading, machine translation and text summarization. This survey discusses the existing works on text similarity by partitioning them into three approaches: String-based, Corpus-based and Knowledge-based similarities. Furthermore, samples of combinations between these similarities are presented. General Terms: Text Mining, Natural Language Processing. Keywords: Text Similarity, Semantic Similarity, String-Based Similarity, Corpus-Based Similarity, Knowledge-Based Similarity. 1. INTRODUCTION Text similarity measures play an increasingly important role in text-related research and applications in tasks such as information retrieval, text classification, document clustering, topic detection, topic tracking, question generation, question answering, essay scoring, short answer scoring, machine translation, text summarization and others. Finding similarity between words is a fundamental part of text similarity, which is then used as a primary stage for sentence, paragraph and document similarity. Words can be similar in two ways: lexically and semantically. Words are similar lexically if they have a similar character sequence. Words are similar semantically if they mean the same thing, are opposites of each other, are used in the same way, are used in the same context, or one is a type of the other. Lexical similarity is introduced in this survey through different String-Based algorithms; semantic similarity is introduced through Corpus-Based and Knowledge-Based algorithms. String-Based measures operate on string sequences and character composition. A string metric is a metric that measures the similarity or dissimilarity (distance) between two text strings for approximate string matching or comparison.
Corpus-Based similarity is a semantic similarity measure that determines the similarity between words according to information gained from large corpora. Knowledge-Based similarity is a semantic similarity measure that determines the degree of similarity between words using information derived from semantic networks. The most popular algorithms of each type are presented briefly. This paper is organized as follows: Section two presents String-Based algorithms, partitioning them into character-based and term-based measures. Sections three and four introduce Corpus-Based and Knowledge-Based algorithms respectively. Samples of combinations between similarity algorithms are introduced in section five, and finally section six presents the conclusion of the survey.
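To make the character-based versus term-based distinction concrete, here is a minimal Python sketch of one measure from each family the survey covers: Levenshtein edit distance (character-based) and cosine similarity over term-frequency vectors (term-based). The function names and the whitespace tokenizer are illustrative choices, not taken from the paper.

```python
from collections import Counter
from math import sqrt

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert, delete,
    substitute) needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cosine_sim(s1: str, s2: str) -> float:
    """Cosine similarity between term-frequency vectors of two texts,
    using naive whitespace tokenization."""
    v1, v2 = Counter(s1.split()), Counter(s2.split())
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = sqrt(sum(c * c for c in v1.values())) * \
           sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0
```

For example, `levenshtein("kitten", "sitting")` is 3, while cosine similarity ignores character order entirely and compares only the bags of terms.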

718 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: The fuzzy Euclidean distance clustering algorithm has been well studied and widely used in the information retrieval community for clustering documents; combined with cluster-dependent keyword weighting, it helps partition and categorize theses documents into more meaningful categories.
Abstract: The fuzzy Euclidean distance clustering algorithm has been well studied and widely used in the information retrieval community for clustering documents. However, fuzzy logic algorithms pose problems when dealing with large amounts of data. In this paper we present results for clustering theses documents based on Euclidean distances and cluster-dependent keyword weighting. The proposed approach is based on the fuzzy Euclidean distance clustering algorithm. The cluster-dependent keyword weighting helps in partitioning and categorizing the theses documents into more meaningful categories.
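The abstract does not give the paper's exact update rules, so as an illustrative sketch only, here is the standard fuzzy c-means membership update based on Euclidean distances; the paper's cluster-dependent keyword weighting is omitted, and the fuzzifier `m` and function name are assumptions.

```python
import numpy as np

def fcm_memberships(X: np.ndarray, centers: np.ndarray, m: float = 2.0) -> np.ndarray:
    """Fuzzy membership of each point in each cluster, via the standard
    fuzzy c-means update: u[i, k] = 1 / sum_j (d[i, k] / d[i, j])^(2/(m-1)),
    where d is the Euclidean distance from point i to center k."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # shape (n, c)
    d = np.fmax(d, 1e-12)  # avoid division by zero at a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)
```

Each row of the returned matrix sums to 1, so a document can belong partially to several categories; a hard assignment, if needed, is just the argmax per row.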

8 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work adopts the idea and introduces a statistical model of the interactions between a social network's actors, using a Bayesian network (probabilistic graphical model) to show the relations between model variables.
Abstract: Community detection in complex networks has attracted a lot of attention in recent years. Communities play special roles in the structure-function relationship; therefore, detecting communities can be a way to identify substructures that could correspond to important functions. Social networks can be formalized by a statistical model in which interactions between actors are generated based on some assumptions. We adopt this idea and introduce a statistical model of the interactions between a social network's actors, and we use a Bayesian network (probabilistic graphical model) to show the relations between model variables. Through the use of the Expectation Maximization (EM) algorithm, we derive estimates for the model parameters and propose a community detection algorithm based on the EM estimates. The proposed algorithm works well with directed and undirected networks, and with weighted and unweighted networks. The algorithm yields very promising results when applied to the community detection problem.
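The abstract does not specify the paper's exact statistical model, so the following is a generic sketch of EM-based community detection in the spirit of a Newman-Leicht-style mixture model, not the authors' method: `pi[r]` is the prior of community r, `theta[r, j]` the probability that a member of r interacts with node j, and `q[i, r]` the posterior membership of node i. All names and the iteration count are assumptions.

```python
import numpy as np

def em_communities(A: np.ndarray, c: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """EM for a simple mixture model over a (possibly weighted, possibly
    directed) adjacency matrix A; returns a hard community label per node."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    q = rng.random((n, c))
    q /= q.sum(axis=1, keepdims=True)          # random soft assignments
    for _ in range(iters):
        # M-step: re-estimate priors and per-community link profiles
        pi = q.mean(axis=0)
        theta = q.T @ A
        theta /= np.fmax(theta.sum(axis=1, keepdims=True), 1e-12)
        # E-step: posterior membership from log-likelihood of each row of A
        logp = np.log(np.fmax(pi, 1e-12)) + A @ np.log(np.fmax(theta, 1e-12)).T
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exp
        q = np.exp(logp)
        q /= q.sum(axis=1, keepdims=True)
    return q.argmax(axis=1)
```

Because the E-step only reads rows of A and the M-step only aggregates them, the same loop runs unchanged on directed, undirected, weighted, or unweighted networks, matching the generality claimed in the abstract.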

6 citations