Topic

Word embedding

About: Word embedding is a research topic. Over the lifetime, 4683 publications have been published within this topic receiving 153378 citations. The topic is also known as: word embeddings.

...read moreread less

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Patent•

Automatically segmenting images based on natural language phrases

[...]

Lin Zhe¹, Lu Xin¹, Shen Xiaohui¹, Jimei Yang¹, Chenxi Liu¹ - Show less +1 more•Institutions (1)

Adobe Systems¹

19 Jan 2018

TL;DR: In this paper, a fully convolutional neural network identifies and encodes the image features and a word embedding model generates the token vectors, and a recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the word vectors.

...read moreread less

Abstract: The invention is directed towards segmenting images based on natural language phrases. An image and an n-gram, including a sequence of tokens, are received. An encoding of image features and a sequence of token vectors are generated. A fully convolutional neural network identifies and encodes the image features. A word embedding model generates the token vectors. A recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the token vectors. The segmentation map identifies which pixels are included in an image region referenced by the n-gram. A segmented image is generated based on the segmentation map. The RNN may be a convolutional multimodal RNN. A separate RNN, such as a long short-term memory network, may iteratively update an encoding of semantic features based on the order of tokens. The first RNN may update the segmentation map based on the semantic feature encoding.

...read moreread less

21 citations

Posted Content•

Solving Verbal Comprehension Questions in IQ Test by Knowledge-Powered Word Embedding

[...]

Huazheng Wang¹, Bin Gao², Jiang Bian², Fei Tian¹, Tie-Yan Liu² - Show less +1 more•Institutions (2)

University of Science and Technology of China¹, Microsoft²

29 May 2015-arXiv: Computation and Language

TL;DR: Experimental results have shown that the proposed framework can not only outperform existing methods for solving verbal comprehension questions but also exceed the average performance of the Amazon Mechanical Turk workers involved in the study.

...read moreread less

Abstract: Intelligence Quotient (IQ) Test is a set of standardized questions designed to evaluate human intelligence. Verbal comprehension questions appear very frequently in IQ tests, which measure human's verbal ability including the understanding of the words with multiple senses, the synonyms and antonyms, and the analogies among words. In this work, we explore whether such tests can be solved automatically by artificial intelligence technologies, especially the deep learning technologies that are recently developed and successfully applied in a number of fields. However, we found that the task was quite challenging, and simply applying existing technologies (e.g., word embedding) could not achieve a good performance, mainly due to the multiple senses of words and the complex relations among words. To tackle these challenges, we propose a novel framework consisting of three components. First, we build a classifier to recognize the specific type of a verbal question (e.g., analogy, classification, synonym, or antonym). Second, we obtain distributed representations of words and relations by leveraging a novel word embedding method that considers the multi-sense nature of words and the relational knowledge among words (or their senses) contained in dictionaries. Third, for each type of questions, we propose a specific solver based on the obtained distributed word representations and relation representations. Experimental results have shown that the proposed framework can not only outperform existing methods for solving verbal comprehension questions but also exceed the average performance of the Amazon Mechanical Turk workers involved in the study. The results indicate that with appropriate uses of the deep learning technologies we might be a further step closer to the human intelligence.

...read moreread less

21 citations

Journal Article•DOI•

Bag of meta-words: A novel method to represent document for the sentiment classification

[...]

Mingsheng Fu¹, Hong Qu¹, Li Huang¹, Li Lu¹•Institutions (1)

University of Electronic Science and Technology of China¹

15 Dec 2018-Expert Systems With Applications

TL;DR: The results show that the performance of the proposed bag of meta-words (BoMW) method can exceed the traditional VSM methods and methods using pre-trained word embedding.

...read moreread less

Abstract: It is crucial to represent the semantic information of a document in sentiment classification. Various semantic information representation models have been proposed, however existing approaches have their setbacks. Notable weaknesses among these are: (1) tradition VSM methods, completely ignore the semantic information; (2) averaging word embedding methods, cannot depict the synthetical semantic meaning of the given document; (3) neural network methods, require complex structure and are notoriously difficult to be trained. To overcome these limitations, we introduce a simple but novel method which we call bag of meta-words (BoMW). In our method, the semantic information of the document is indicated by a meta-words vector in which every single meta-word element denotes particular semantic information. Especially, these meta-words are extracted from pre-trained word embeddings through two different but complemental models, naive interval meta-words (NIM) and feature combination meta-words (FCM). In general, our new model BoMW is as simple as traditional VSM model but it can capture the synthetical semantic meanings of the document. Numerous experiments on two benchmarks (IMDB dataset and Pang’s dataset) are carried out to verify the effectiveness of the proposed method, and the results show that the performance of our method can exceed the traditional VSM methods and methods using pre-trained word embedding.

...read moreread less

21 citations

Proceedings Article•DOI•

Relating Word Embedding Gender Biases to Gender Gaps: A Cross-Cultural Analysis

[...]

Scott E. Friedman, Sonja Schmer-Galunder, Anthony Chen, Jeffrey M. Rye

01 Aug 2019

TL;DR: This paper quantified gender bias in word embeddings and then used them to characterize statistical gender gaps in education, politics, economics, and health, and validated these metrics on 2018 Twitter data spanning 51 U.S. regions and 99 countries.

...read moreread less

Abstract: Modern models for common NLP tasks often employ machine learning techniques and train on journalistic, social media, or other culturally-derived text. These have recently been scrutinized for racial and gender biases, rooting from inherent bias in their training text. These biases are often sub-optimal and recent work poses methods to rectify them; however, these biases may shed light on actual racial or gender gaps in the culture(s) that produced the training text, thereby helping us understand cultural context through big data. This paper presents an approach for quantifying gender bias in word embeddings, and then using them to characterize statistical gender gaps in education, politics, economics, and health. We validate these metrics on 2018 Twitter data spanning 51 U.S. regions and 99 countries. We correlate state and country word embedding biases with 18 international and 5 U.S.-based statistical gender gaps, characterizing regularities and predictive strength.

...read moreread less

21 citations

Proceedings Article•DOI•

Modeling Discourse Structure for Document-level Neural Machine Translation

[...]

Junxuan Chen, Li Xiang, Jiarui Zhang, Chulun Zhou, Jianwei Cui, Bin Wang, Jinsong Su - Show less +3 more

08 Jun 2020

TL;DR: The authors propose to improve document-level NMT with the aid of discourse structure information by using a Transformer-based path encoder to embed the discourse structure of each word in the input document.

...read moreread less

Abstract: Recently, document-level neural machine translation (NMT) has become a hot topic in the community of machine translation. Despite its success, most of existing studies ignored the discourse structure information of the input document to be translated, which has shown effective in other tasks. In this paper, we propose to improve document-level NMT with the aid of discourse structure information. Our encoder is based on a hierarchical attention network (HAN) (Miculicich et al., 2018). Specifically, we first parse the input document to obtain its discourse structure. Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word. Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder. Experimental results on the English-to-German dataset show that our model can significantly outperform both Transformer and Transformer+HAN.

...read moreread less

20 citations

Collapse

Network Information

Performance

Metrics

5,718

Papers

201,647

Citations

No. of papers in the topic in previous years
Year	Papers
2023	317
2022	716
2021	736
2020	1,025
2019	1,078
2018	788

Word embedding

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics