Topic

Word embedding

About: Word embedding is a research topic. Over the lifetime, 4683 publications have been published within this topic receiving 153378 citations. The topic is also known as: word embeddings.

...read moreread less

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Augment to Prevent: Short-Text Data Augmentation in Deep Learning for Hate-Speech Classification

[...]

Georgios Rizos¹, Konstantin Hemker¹, Björn Schuller¹•Institutions (1)

Imperial College London¹

03 Nov 2019

TL;DR: The proposed framework yields a significant increase in multi-class hate speech detection, outperforming the baseline in the largest online hate speech database by an absolute 5.7% increase in Macro-F1 score and 30% in hate speech class recall.

...read moreread less

Abstract: In this paper, we address the issue of augmenting text data in supervised Natural Language Processing problems, exemplified by deep online hate speech classification. A great challenge in this domain is that although the presence of hate speech can be deleterious to the quality of service provided by social platforms, it still comprises only a tiny fraction of the content that can be found online, which can lead to performance deterioration due to majority class overfitting. To this end, we perform a thorough study on the application of deep learning to the hate speech detection problem: a) we propose three text-based data augmentation techniques aimed at reducing the degree of class imbalance and to maximise the amount of information we can extract from our limited resources and b) we apply them on a selection of top-performing deep architectures and hate speech databases in order to showcase their generalisation properties. The data augmentation techniques are based on a) synonym replacement based on word embedding vector closeness, b) warping of the word tokens along the padded sequence or c) class-conditional, recurrent neural language generation. Our proposed framework yields a significant increase in multi-class hate speech detection, outperforming the baseline in the largest online hate speech database by an absolute 5.7% increase in Macro-F1 score and 30% in hate speech class recall.

...read moreread less

81 citations

Journal Article•DOI•

Identifying antimicrobial peptides using word embedding with deep recurrent neural networks.

[...]

Md-Nafiz Hamid¹, Iddo Friedberg¹•Institutions (1)

Iowa State University¹

01 Jun 2019-Bioinformatics

TL;DR: This work uses word embeddings of protein sequences to represent bacteriocins, and applies a word embedding method that accounts for amino acid order in protein sequences, to predict novel bacteriOCins from protein sequences without using sequence similarity.

...read moreread less

Abstract: Motivation Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening the available choices of antimicrobials. However, the discovery of new bacteriocins by genomic mining is hampered by their sequences' low complexity and high variance, which frustrates sequence similarity-based searches. Results Here we use word embeddings of protein sequences to represent bacteriocins, and apply a word embedding method that accounts for amino acid order in protein sequences, to predict novel bacteriocins from protein sequences without using sequence similarity. Our method predicts, with a high probability, six yet unknown putative bacteriocins in Lactobacillus. Generalized, the representation of sequences with word embeddings preserving sequence order information can be applied to peptide and protein classification problems for which sequence similarity cannot be used. Availability and implementation Data and source code for this project are freely available at: https://github.com/nafizh/NeuBI. Supplementary information Supplementary data are available at Bioinformatics online.

...read moreread less

81 citations

Journal Article•DOI•

An LSTM-Based Deep Learning Approach for Classifying Malicious Traffic at the Packet Level

[...]

Ren-Hung Hwang, Min-Chun Peng, Van-Linh Nguyen, Yu-Lun Chang

19 Aug 2019-Applied Sciences

TL;DR: This research proposes a novel word embedding mechanism to extract packet semantic meanings and adopt LSTM to learn the temporal relation among fields in the packet header and for further classifying whether an incoming packet is normal or a part of malicious traffic.

...read moreread less

Abstract: Recently, deep learning has been successfully applied to network security assessments and intrusion detection systems (IDSs) with various breakthroughs such as using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) to classify malicious traffic. However, these state-of-the-art systems also face tremendous challenges to satisfy real-time analysis requirements due to the major delay of the flow-based data preprocessing, i.e., requiring time for accumulating the packets into particular flows and then extracting features. If detecting malicious traffic can be done at the packet level, detecting time will be significantly reduced, which makes the online real-time malicious traffic detection based on deep learning technologies become very promising. With the goal of accelerating the whole detection process by considering a packet level classification, which has not been studied in the literature, in this research, we propose a novel approach in building the malicious classification system with the primary support of word embedding and the LSTM model. Specifically, we propose a novel word embedding mechanism to extract packet semantic meanings and adopt LSTM to learn the temporal relation among fields in the packet header and for further classifying whether an incoming packet is normal or a part of malicious traffic. The evaluation results on ISCX2012, USTC-TFC2016, IoT dataset from Robert Gordon University and IoT dataset collected on our Mirai Botnet show that our approach is competitive to the prior literature which detects malicious traffic at the flow level. While the network traffic is booming year by year, our first attempt can inspire the research community to exploit the advantages of deep learning to build effective IDSs without suffering significant detection delay.

...read moreread less

80 citations

Journal Article•DOI•

ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding

[...]

Deokgun Park¹, Seungyeon Kim², Jurim Lee³, Jaegul Choo³, Nicholas Diakopoulos⁴, Niklas Elmqvist¹ - Show less +2 more•Institutions (4)

University of Maryland, College Park¹, Google², Korea University³, Northwestern University⁴

01 Jan 2018-IEEE Transactions on Visualization and Computer Graphics

TL;DR: A bipolar concept model and support for specifying irrelevant words are introduced andQuantitative evaluation shows that the bipolar lexicon generated with the ConceptVector methods is comparable to human-generated ones.

...read moreread less

Abstract: Central to many text analysis methods is the notion of a concept : a set of semantically related keywords characterizing a specific object, phenomenon, or theme. Advances in word embedding allow building a concept from a small set of seed terms. However, naive application of such techniques may result in false positive errors because of the polysemy of natural language. To mitigate this problem, we present a visual analytics system called ConceptVector that guides a user in building such concepts and then using them to analyze documents. Document-analysis case studies with real-world datasets demonstrate the fine-grained analysis provided by ConceptVector. To support the elaborate modeling of concepts, we introduce a bipolar concept model and support for specifying irrelevant words. We validate the interactive lexicon building interface by a user study and expert reviews. Quantitative evaluation shows that the bipolar lexicon generated with our methods is comparable to human-generated ones.

...read moreread less

80 citations

Journal Article•DOI•

Deep learning and network analysis: Classifying and visualizing accident narratives in construction

[...]

Botao Zhong¹, Xing Pan¹, Peter E.D. Love², Lieyun Ding¹, Weili Fang¹ - Show less +1 more•Institutions (2)

Huazhong University of Science and Technology¹, Curtin University²

01 May 2020-Automation in Construction

TL;DR: The proposed automated classification model and LDA-based network analysis method provide a useful approach to enable machine-assisted interpretation of texts-based accident narratives and can provide managers with much-needed information and knowledge to improve safety on-site.

...read moreread less

80 citations

Collapse

Network Information

Performance

Metrics

5,718

Papers

201,647

Citations

No. of papers in the topic in previous years
Year	Papers
2023	317
2022	716
2021	736
2020	1,025
2019	1,078
2018	788

Word embedding

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics