Author

Mamdouh Farouk

Bio: Mamdouh Farouk is an academic researcher from Assiut University. The author has contributed to research topics including similarity (network science) and word embedding. The author has an h-index of 5 and has co-authored 8 publications receiving 49 citations.

Papers
Journal ArticleDOI
Mamdouh Farouk
TL;DR: Word-to-word-based, structure-based, and vector-based approaches are the most widely used for finding sentence similarity, but structure-based similarity, which measures similarity between sentence structures, needs more investigation.
Abstract: This study reviews the approaches used for measuring sentence similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification, information retrieval, question answering, and plagiarism detection. This survey classifies approaches to calculating sentence similarity into three categories based on the adopted methodology. Word-to-word-based, structure-based, and vector-based approaches are the most widely used for finding sentence similarity. Each approach measures relatedness between short texts from a specific perspective. In addition, the datasets most often used as benchmarks for evaluating techniques in this field are introduced to provide a complete view of the issue. Approaches that combine more than one perspective give better results. Moreover, structure-based similarity, which measures similarity between sentence structures, needs further investigation.
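The vector-based category the survey describes can be sketched as follows: represent each sentence as the average of its word embeddings and compare sentences with cosine similarity. This is a minimal illustrative sketch, not code from the paper; the tiny 3-dimensional embedding table is an invented stand-in for pre-trained vectors such as word2vec or GloVe.

```python
import numpy as np

# Toy word embeddings (illustrative only; real systems use pre-trained
# vectors with hundreds of dimensions).
EMBEDDINGS = {
    "dog":    np.array([0.9, 0.1, 0.0]),
    "puppy":  np.array([0.8, 0.2, 0.1]),
    "barks":  np.array([0.1, 0.9, 0.2]),
    "loudly": np.array([0.0, 0.3, 0.9]),
}

def sentence_vector(tokens):
    """Vector-based representation: average the known word vectors."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def vector_based_similarity(s1, s2):
    return cosine(sentence_vector(s1.split()), sentence_vector(s2.split()))

sim = vector_based_similarity("dog barks loudly", "puppy barks")
```

Word-to-word-based approaches would instead aggregate pairwise word similarities, and structure-based approaches compare parse or semantic structures; the averaging above is only the simplest vector-based variant.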

26 citations

Journal ArticleDOI
TL;DR: In this paper, the authors classify approaches to calculating sentence similarity into three categories based on the adopted methodology: word-to-word-based, structure-based, and vector-based.
Abstract: Objective/Methods: This study reviews the approaches used for measuring sentence similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification, information retrieval, question answering, and plagiarism detection. This survey classifies approaches to calculating sentence similarity into three categories based on the adopted methodology. Word-to-word-based, structure-based, and vector-based approaches are the most widely used for finding sentence similarity. Findings/Application: Each approach measures relatedness between short texts from a specific perspective. In addition, the datasets most often used as benchmarks for evaluating techniques in this field are introduced to provide a complete view of the issue. Approaches that combine more than one perspective give better results. Moreover, structure-based similarity, which measures similarity between sentences' structures, needs further investigation.
Keywords: Sentence Representation, Sentences Similarity, Structural Similarity, Word Embedding, Words Similarity

25 citations

Journal ArticleDOI
Mamdouh Farouk
TL;DR: The proposed approach combines different similarity measures when calculating sentence similarity and exploits the sentence's semantic structure to improve the accuracy of the similarity calculation.

23 citations

Proceedings ArticleDOI
Mamdouh Farouk
01 Dec 2018
TL;DR: This paper combines the use of pre-trained word vectors and WordNet to measure semantic similarity between two sentences, and achieves better results compared with other approaches previously proposed for measuring sentence similarity.
Abstract: Semantic similarity between sentences is a crucial task for many applications. The emergence of word embeddings encourages calculating similarity between words and between sentences based on the new semantic word representation. On the other hand, WordNet is widely used to find the semantic distance between sentences. This paper combines the use of pre-trained word vectors and WordNet to measure semantic similarity between two sentences. In addition, word order similarity is applied to make the final similarity more accurate. The proposed approach has been implemented and tested using standard datasets. Experiments show that the presented method achieves better results compared with other approaches previously proposed for measuring sentence similarity.
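The word order similarity component mentioned above can be sketched as comparing word position vectors built over the joint word set of the two sentences, then blending that score with a semantic score. This is an illustrative sketch, not the paper's implementation; the weighting factor `alpha` and the helper names are assumptions.

```python
import numpy as np

def word_order_vector(tokens, joint_words):
    """Position (1-based) of each joint word in the sentence, 0 if absent."""
    return np.array([tokens.index(w) + 1 if w in tokens else 0
                     for w in joint_words], dtype=float)

def word_order_similarity(s1, s2):
    """1 when the shared words appear in the same positions, lower otherwise."""
    t1, t2 = s1.split(), s2.split()
    joint = sorted(set(t1) | set(t2))
    r1 = word_order_vector(t1, joint)
    r2 = word_order_vector(t2, joint)
    return 1.0 - np.linalg.norm(r1 - r2) / np.linalg.norm(r1 + r2)

def combined_similarity(semantic_sim, order_sim, alpha=0.85):
    # alpha weights semantic similarity against word order similarity;
    # 0.85 is an assumed value, not taken from the paper.
    return alpha * semantic_sim + (1 - alpha) * order_sim
```

Here `semantic_sim` would come from the embedding/WordNet side of the method; the word order term mainly separates sentences that share words but use them in different roles.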

17 citations

Book ChapterDOI
23 Oct 2018
TL;DR: This work proposes a search engine for searching Web data represented in UNL (Universal Networking Language) based on semantic graph matching, which includes semantic expansion for graph nodes and relation matching based on relation meaning.
Abstract: Explosive growth of the Web has made searching Web data a challenging task for information retrieval systems. Semantic search systems that go beyond shallow keyword matching and map words to their conceptual meaning representations offer better results to users. On the other hand, many representation formats have been specified for expressing Web data in a semantic form. We propose a search engine for searching Web data represented in UNL (Universal Networking Language). UNL has numerous attractive features to support semantic search. One of the main features is that UNL does not depend on a domain ontology. Our proposed search engine is based on semantic graph matching. It includes semantic expansion for graph nodes and relation matching based on relation meaning. The search results are ranked by the semantic similarity between the user query and the retrieved documents. We developed a prototype implementing the proposed semantic search engine, and our evaluations demonstrate its effectiveness across a wide range of semantic search tasks.
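The semantic graph matching described above can be sketched on UNL-style relation triples (e.g. agt for agent, obj for object), with a small synonym table standing in for the knowledge-base lookups used in semantic expansion. This is a hypothetical sketch; the function names, the synonym table, and the scoring rule are assumptions, not the paper's implementation.

```python
# Stand-in synonym table; the real system would expand nodes via a
# lexical resource rather than a hard-coded dictionary.
SYNONYMS = {
    "car": {"car", "automobile"},
    "buy": {"buy", "purchase"},
}

def expand(concept):
    """Semantic expansion: a node matches itself and its synonyms."""
    return SYNONYMS.get(concept, {concept})

def triples_match(query_triple, doc_triple):
    """Nodes match up to expansion; the UNL relation must match exactly."""
    (qs, qr, qt), (ds, dr, dt) = query_triple, doc_triple
    return (ds in expand(qs)) and (dr == qr) and (dt in expand(qt))

def graph_similarity(query_graph, doc_graph):
    """Fraction of query triples matched somewhere in the document graph."""
    matched = sum(
        any(triples_match(q, d) for d in doc_graph) for q in query_graph
    )
    return matched / len(query_graph) if query_graph else 0.0

# "John buys a car" vs. a document graph phrased with synonyms.
query = [("John", "agt", "buy"), ("buy", "obj", "car")]
doc = [("John", "agt", "purchase"), ("purchase", "obj", "automobile")]
```

Ranking the retrieved documents by this score is then straightforward: higher graph similarity places a document earlier in the result list.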

10 citations


Cited by
Journal ArticleDOI
TL;DR: GatorTron, as presented in this paper, is a clinical language model trained on >90 billion words of text and systematically evaluated on five clinical NLP tasks: clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference, and medical question answering (MQA).
Abstract: There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, and the largest trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model, GatorTron, using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og .

69 citations

Journal ArticleDOI
Xi Yang, Xing He, Hansi Zhang, Yinghan Ma, Jian-Guo Bian, Yonghui Wu
TL;DR: This study demonstrated the effectiveness of transformer-based models for measuring semantic similarity of clinical text, using Bidirectional Encoder Representations from Transformers (BERT), XLNet, and the Robustly optimized BERT approach (RoBERTa).
Abstract: Background: Semantic textual similarity (STS) is one of the fundamental tasks in natural language processing (NLP). Many shared tasks and corpora for STS have been organized and curated in the general English domain; however, such resources are limited in the biomedical domain. In 2019, the National NLP Clinical Challenges (n2c2) challenge developed a comprehensive clinical STS dataset and organized a community effort to solicit state-of-the-art solutions for clinical STS. Objective: This study presents our transformer-based clinical STS models developed during this challenge as well as new models we explored after the challenge. This project is part of the 2019 n2c2/Open Health NLP shared task on clinical STS. Methods: In this study, we explored 3 transformer-based models for clinical STS: Bidirectional Encoder Representations from Transformers (BERT), XLNet, and Robustly optimized BERT approach (RoBERTa). We examined transformer models pretrained using both general English text and clinical text. We also explored using a general English STS dataset as a supplementary corpus in addition to the clinical training set developed in this challenge. Furthermore, we investigated various ensemble methods to combine different transformer models. Results: Our best submission based on the XLNet model achieved the third-best performance (Pearson correlation of 0.8864) in this challenge. After the challenge, we further explored other transformer models and improved the performance to 0.9065 using a RoBERTa model, which outperformed the best-performing system developed in this challenge (Pearson correlation of 0.9010). Conclusions: This study demonstrated the efficiency of utilizing transformer-based models to measure semantic similarity for clinical text. Our models can be applied to clinical applications such as clinical text deduplication and summarization.
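The ensembling and evaluation steps described above can be sketched as averaging the continuous similarity scores predicted by several transformer models and scoring the result with Pearson correlation, the metric reported in the challenge. The per-model scores below are made-up illustrative values, and the simple mean is only one of the ensemble methods the paper explores.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between predicted and gold similarity scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def mean_ensemble(model_scores):
    """Average the per-pair similarity scores predicted by each model."""
    return np.mean(np.asarray(model_scores, float), axis=0)

# Made-up predictions from three models on four sentence pairs (0-5 scale).
bert_scores    = [4.1, 1.0, 3.2, 0.5]
xlnet_scores   = [4.4, 0.8, 3.0, 0.9]
roberta_scores = [4.2, 1.1, 3.5, 0.6]
gold           = [4.5, 1.0, 3.0, 0.0]

ensemble = mean_ensemble([bert_scores, xlnet_scores, roberta_scores])
r = pearson(ensemble, gold)
```

Averaging tends to help when the individual models make uncorrelated errors; weighted averages or stacking are common refinements.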

30 citations

Posted ContentDOI
02 Feb 2022-medRxiv
TL;DR: GatorTron, as presented in this preprint, is the largest transformer model in the clinical domain, scaled up from the previous 110 million parameters to 8.9 billion, and achieves state-of-the-art performance on 5 clinical NLP tasks targeting various healthcare information documented in EHRs.
Abstract: There is an increasing interest in developing massive-size deep learning models in natural language processing (NLP) - the key technology to extract patient information from unstructured electronic health records (EHRs). However, there are limited studies exploring large language models in the clinical domain; the current largest clinical NLP model was trained with 110 million parameters (compared with 175 billion parameters in the general domain). It is not clear how large-size NLP models can help machines understand patients' clinical information from unstructured EHRs. In this study, we developed a large clinical transformer model - GatorTron - using >90 billion words of text and evaluated it on 5 clinical NLP tasks including clinical concept extraction, relation extraction, semantic textual similarity, natural language inference, and medical question answering. GatorTron is now the largest transformer model in the clinical domain that scaled up from the previous 110 million to 8.9 billion parameters and achieved state-of-the-art performance on the 5 clinical NLP tasks targeting various healthcare information documented in EHRs. GatorTron models perform better in understanding and utilizing patient information from clinical narratives in ways that can be applied to improvements in healthcare delivery and patient outcomes.

19 citations