Topic

Word embedding

About: Word embedding is a research topic. Over the lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
TL;DR: Results of a case study aimed at classifying fall-related information (including fall history, fall prevention interventions, and fall risk) in homecare visit notes indicate that clinical text mining can be implemented without the large labeled datasets necessary for other types of machine learning.

47 citations
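The summary above gives no implementation details, but a minimal sketch of lexicon/rule-based clinical text mining in this spirit might look like the following. All keyword patterns and the classify_note helper are hypothetical, not the rules used in the case study:

```python
import re

# Hypothetical keyword rules for the three fall-related categories named
# in the case study; the actual rules are not given in this summary.
FALL_RULES = {
    "fall_history": re.compile(r"\b(fell|fall(s|en)?|slipped|tripped)\b", re.I),
    "fall_prevention": re.compile(r"\b(grab bar|walker|non[- ]slip|handrail)\b", re.I),
    "fall_risk": re.compile(r"\b(unsteady|dizzy|balance|weakness)\b", re.I),
}

def classify_note(note: str) -> list[str]:
    """Return the fall-related categories whose rules match a visit note."""
    return [label for label, pattern in FALL_RULES.items() if pattern.search(note)]

print(classify_note("Pt reports she slipped in the bathroom; walker recommended."))
# ['fall_history', 'fall_prevention']
```

The appeal of this style of approach is exactly what the TL;DR notes: rules can be written and audited directly by clinicians, with no labeled training corpus required.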

Proceedings ArticleDOI
17 Oct 2018
TL;DR: Regularized Multi-Embedding (RME) as discussed by the authors is a recommendation model that simultaneously encapsulates four ideas via decomposition: (1) which items a user likes, (2) which two users co-like the same items, (3) which two items users often co-liked, and (4) which two items users often co-disliked.
Abstract: Following recent successes in exploiting both latent factor and word embedding models in recommendation, we propose a novel Regularized Multi-Embedding (RME) based recommendation model that simultaneously encapsulates the following ideas via decomposition: (1) which items a user likes, (2) which two users co-like the same items, (3) which two items users often co-liked, and (4) which two items users often co-disliked. In experimental validation, the RME outperforms competing state-of-the-art models in both explicit and implicit feedback datasets, significantly improving Recall@5 by 5.9~7.0%, NDCG@20 by 4.3~5.6%, and MAP@10 by 7.9~8.9%. In addition, under the cold-start scenario for users with the lowest number of interactions, against the competing models, the RME outperforms NDCG@5 by 20.2% and 29.4% in MovieLens-10M and MovieLens-20M datasets, respectively. Our datasets and source code are available at: https://github.com/thanhdtran/RME.git.

47 citations
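As a rough illustration of the decomposition idea, the sketch below jointly factorizes a toy user-item matrix together with item-item "co-liked" and "co-disliked" co-occurrence matrices through one shared set of item factors. The matrices, loss weighting, and hyperparameters are illustrative stand-ins, not the paper's formulation; the authors' actual code is at the GitHub link above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 30, 8

R = (rng.random((n_users, n_items)) > 0.8).astype(float)  # toy implicit feedback
C_like = (R.T @ R) / n_users                              # normalized item co-like counts
C_dis = ((1 - R).T @ (1 - R)) / n_users                   # toy proxy for co-dislike counts

U = rng.normal(scale=0.1, size=(n_users, k))  # user factors
V = rng.normal(scale=0.1, size=(n_items, k))  # shared item factors
A = rng.normal(scale=0.1, size=(n_items, k))  # context factors for co-like
B = rng.normal(scale=0.1, size=(n_items, k))  # context factors for co-dislike

lr, reg = 1e-3, 0.1
for _ in range(200):
    # Residuals of the squared-error reconstruction of each matrix.
    E_r, E_l, E_d = U @ V.T - R, V @ A.T - C_like, V @ B.T - C_dis
    U -= lr * (E_r @ V + reg * U)
    V -= lr * (E_r.T @ U + E_l @ A + E_d @ B + reg * V)  # V ties all three losses
    A -= lr * (E_l.T @ V + reg * A)
    B -= lr * (E_d.T @ V + reg * B)

scores = U @ V.T      # predicted preferences, ranked for top-N recommendation
print(scores.shape)   # (50, 30)
```

The key design point mirrored here is that the item factors V appear in all three reconstruction terms, so liking, co-liking, and co-disliking signals all regularize the same item embedding, which is what the paper credits for its cold-start gains.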

Proceedings ArticleDOI
25 Jun 2020
TL;DR: This paper presents GLUECoS, an evaluation benchmark for code-switched languages spanning several NLP tasks in English-Hindi and English-Spanish: Language Identification from text, POS tagging, Named Entity Recognition, Sentiment Analysis, Question Answering, and a new code-switching task, Natural Language Inference.
Abstract: Code-switching is the use of more than one language in the same conversation or utterance. Recently, multilingual contextual embedding models, trained on multiple monolingual corpora, have shown promising results on cross-lingual and multilingual tasks. We present an evaluation benchmark, GLUECoS, for code-switched languages, that spans several NLP tasks in English-Hindi and English-Spanish. Specifically, our evaluation benchmark includes Language Identification from text, POS tagging, Named Entity Recognition, Sentiment Analysis, Question Answering and a new task for code-switching, Natural Language Inference. We present results on all these tasks using cross-lingual word embedding models and multilingual models. In addition, we fine-tune multilingual models on artificially generated code-switched data. Although multilingual models perform significantly better than cross-lingual models, our results show that in most tasks, across both language pairs, multilingual models fine-tuned on code-switched data perform best, showing that multilingual models can be further optimized for code-switching tasks.

47 citations
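A minimal sketch of the fine-tuning step described above, assuming the Hugging Face transformers library and PyTorch: the mBERT checkpoint name is real, but the two code-switched examples and their sentiment labels are invented placeholders for GLUECoS task data.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Take a multilingual model (here mBERT) and fine-tune it directly on
# code-switched examples, as the paper does for its best-performing setup.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3
)

texts = [
    "yeh movie bahut amazing thi",    # Hindi-English code-switching
    "la pelicula was really boring",  # Spanish-English code-switching
]
labels = torch.tensor([2, 0])         # e.g. 0=negative, 1=neutral, 2=positive

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few toy epochs over the tiny batch
    out = model(**batch, labels=labels)  # passing labels yields a loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(float(out.loss))
```

In the paper's setting, the fine-tuning corpus is artificially generated code-switched text rather than two hand-written sentences, but the training loop takes this same shape.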

Journal ArticleDOI
TL;DR: DepecheMood++ as discussed by the authors comprises an extension of an existing and widely used emotion lexicon for English and a novel version of the lexicon targeting Italian, both of which can be used to boost performance on datasets and tasks of varying degrees of domain-specificity.
Abstract: Several lexica for sentiment analysis have been developed; while most of these come with word polarity annotations (e.g., positive/negative), attempts at building lexica for finer-grained emotion analysis (e.g., happiness, sadness) have recently attracted significant attention. They are often exploited as a building block for developing emotion recognition learning models, and/or used as baselines to which the performance of the models can be compared. In this work, we contribute two new resources, that we call DepecheMood++ (DM++): a) an extension of an existing and widely used emotion lexicon for English; and b) a novel version of the lexicon, targeting Italian. Furthermore, we show how simple techniques can be used, both in supervised and unsupervised experimental settings, to boost performance on datasets and tasks of varying degree of domain-specificity. Also, we report an extensive comparative analysis against other available emotion lexica and state-of-the-art supervised approaches, showing that DepecheMood++ emerges as the best-performing non-domain-specific lexicon in unsupervised settings. We also observe that simple learning models on top of DM++ can provide more challenging baselines. We finally introduce embedding-based methodologies to perform a) vocabulary expansion to address data scarcity and b) vocabulary porting to new languages in case training data is not available.

46 citations
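The embedding-based vocabulary expansion mentioned at the end of the abstract can be illustrated with a toy nearest-neighbour scheme: an out-of-lexicon word inherits the averaged emotion scores of its closest lexicon words in embedding space. The vectors, scores, and the expand helper below are all hypothetical stand-ins, not the paper's method in detail:

```python
import numpy as np

# Toy 2-d word vectors; "cheerful" has a vector but no lexicon entry yet.
emb = {w: np.array(v, dtype=float) for w, v in {
    "happy": [0.9, 0.1], "joyful": [0.8, 0.2],
    "sad": [0.1, 0.9], "gloomy": [0.2, 0.8],
    "cheerful": [0.85, 0.15],
}.items()}

lexicon = {  # per-word emotion scores: (happiness, sadness)
    "happy": [0.9, 0.1], "joyful": [0.85, 0.15],
    "sad": [0.05, 0.95], "gloomy": [0.1, 0.9],
}

def expand(word, k=2):
    """Average the emotion scores of the k nearest lexicon words by cosine."""
    v = emb[word] / np.linalg.norm(emb[word])
    nearest = sorted(
        ((float(v @ (emb[w] / np.linalg.norm(emb[w]))), w) for w in lexicon),
        reverse=True,
    )[:k]
    return np.mean([lexicon[w] for _, w in nearest], axis=0)

print(expand("cheerful"))  # ~[0.875, 0.125]: inherits "happy"/"joyful" scores
```

The same propagation trick underlies both uses named in the abstract: expanding coverage within a language, and porting scores into a new language through a cross-lingual embedding space.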

Patent
02 Feb 2012
TL;DR: In this article, a set of word embedding transforms is applied to transform the text words of an input document into K-dimensional word vectors in order to generate a set or sequence of word vectors representing the input document.
Abstract: A set of word embedding transforms are applied to transform text words of a set of documents into K-dimensional word vectors in order to generate sets or sequences of word vectors representing the documents of the set of documents. A probabilistic topic model is learned using the sets or sequences of word vectors representing the documents of the set of documents. The set of word embedding transforms are applied to transform text words of an input document into K-dimensional word vectors in order to generate a set or sequence of word vectors representing the input document. The learned probabilistic topic model is applied to assign probabilities for topics of the probabilistic topic model to the set or sequence of word vectors representing the input document. A document processing operation such as annotation, classification, or similar document retrieval may be performed using the assigned topic probabilities.

46 citations
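A simplified sketch of that pipeline, substituting a Gaussian mixture over word vectors for the patent's unspecified probabilistic topic model, so that component posteriors play the role of topic probabilities. The vocabulary and random embeddings are toy stand-ins:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
K = 16                                           # embedding dimensionality
vocab = ["loan", "bank", "rate", "goal", "team", "match"]
embed = {w: rng.normal(size=K) for w in vocab}   # toy word embedding transform

# Learn the "topic model" from the word vectors of a training document set.
train_docs = [["loan", "bank", "rate"], ["goal", "team", "match"]]
train_vecs = np.array([embed[w] for doc in train_docs for w in doc])
topics = GaussianMixture(
    n_components=2, covariance_type="diag", random_state=0
).fit(train_vecs)

def doc_topic_probs(doc):
    """Average per-word topic posteriors into document-level topic probabilities."""
    vecs = np.array([embed[w] for w in doc])
    return topics.predict_proba(vecs).mean(axis=0)

print(doc_topic_probs(["bank", "rate", "team"]))  # mixture over the 2 topics
```

The resulting per-document topic probabilities are what the patent's final step consumes for annotation, classification, or similar-document retrieval.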


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 87% related
Unsupervised learning: 22.7K papers, 1M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Reinforcement learning: 46K papers, 1M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788