Proceedings Article

Machine Learning Models for Paraphrase Identification and its Applications on Plagiarism Detection

TLDR
Among the compared models, the Recurrent Neural Network is, as expected, best suited to the paraphrase identification task, and plagiarism detection is proposed as one area where Paraphrase Identification can be applied effectively.
Abstract
Paraphrase Identification, or Natural Language Sentence Matching (NLSM), is an important and challenging task in Natural Language Processing: given a pair of sentences, decide whether one sentence is a paraphrase of the other. A paraphrase of a sentence conveys the same meaning, but its structure and word order differ. The task is challenging because a short sentence provides little context to infer from, and defining similarity metrics over the inferred context of a pair of sentences is not straightforward either. Its applications, however, are numerous. This work explores various machine learning algorithms for modeling the task and applies different input encoding schemes. Specifically, we built models using Logistic Regression, Support Vector Machines, and different Neural Network architectures. Among the compared models, the Recurrent Neural Network (RNN) is, as expected, best suited to our paraphrase identification task. We also propose plagiarism detection as one area where Paraphrase Identification can be applied effectively.
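
To make the task framing concrete, here is a minimal sketch of paraphrase identification as binary classification over sentence pairs, using a hand-written logistic regression. It is not the authors' implementation: the two lexical features (word-level Jaccard overlap and length ratio), the toy sentence pairs, and every function name are assumptions made for illustration, and the abstract does not specify the input encodings that were actually compared.

```python
import math

def features(a, b):
    """Two illustrative pair features: word-level Jaccard overlap and length ratio."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    jaccard = len(wa & wb) / len(union) if union else 0.0
    la, lb = len(a.split()), len(b.split())
    ratio = min(la, lb) / max(la, lb) if max(la, lb) else 0.0
    return [jaccard, ratio]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    xs = [features(a, b) for a, b in pairs]
    w = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            p = sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias

def predict_proba(model, a, b):
    """Probability that sentence b is a paraphrase of sentence a."""
    w, bias = model
    x = features(a, b)
    return sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x)))

# Toy data invented for this sketch (1 = paraphrase, 0 = not a paraphrase).
PAIRS = [
    ("the cat sat on the mat", "on the mat the cat sat"),
    ("he bought a new car", "a new car was bought by him"),
    ("the cat sat on the mat", "stocks fell sharply on monday"),
    ("he bought a new car", "she reads books at night"),
]
LABELS = [1, 1, 0, 0]

model = train(PAIRS, LABELS)

if __name__ == "__main__":
    for (a, b), y in zip(PAIRS, LABELS):
        print(f"label={y} p={predict_proba(model, a, b):.3f} | {a} / {b}")
```

Lexical overlap alone fails on paraphrases that share few words, which is one reason sequence models such as the RNN highlighted in the abstract are attractive for this task.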

Citations
Journal Article

Deep Physical Informed Neural Networks for Metamaterial Design

TL;DR: A physics-informed neural network approach for designing electromagnetic metamaterials is proposed, together with a method for solving the high-frequency Helmholtz equation, which is widely used in physics and engineering.
Journal Article

BLSTM-API: Bi-LSTM Recurrent Neural Network-Based Approach for Arabic Paraphrase Identification

TL;DR: In this paper, an Arabic extrinsic paraphrase identification method is proposed based on a Siamese recurrent neural network architecture, which is useful for identifying semantic similarity between the obtained source and suspect vectors.
Journal Article

An Evolutionary Approach to Compact DAG Neural Network Optimization

TL;DR: This work proposes the use of compact directed acyclic graph neural networks (DAG-NNs) and an evolutionary approach for automating the optimization of their structure and parameters and demonstrates that this approach consistently outperforms conventional neural networks, even while employing fewer nodes.
Proceedings Article

A Study of Ensemble Methods for Cyber Security

TL;DR: This study looks at the advantages of ensemble methods in the cybersecurity domain, using the widely used NSL-KDD intrusion detection dataset. The algorithms tested are the Voting classifier, boosting, the Random Forest classifier, and the AdaBoost classifier.
Proceedings Article

Segregating Hazardous Waste Using Deep Neural Networks in Real-Time Video

TL;DR: Through the use of machine learning, the model is able to identify hazardous objects and recyclable items within a pile of trash to help protect all individuals.
References
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Posted Content

SQuAD: 100,000+ Questions for Machine Comprehension of Text

TL;DR: The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.
Journal Article

Privacy-preserving data mining

TL;DR: This work considers the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed and proposes a novel reconstruction procedure to accurately estimate the distribution of original data values.