Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Posted Content
TL;DR: In this paper, a hybrid matrix factorization model representing users and items as linear combinations of their content features' latent factors is proposed; it outperforms both collaborative and content-based models in cold-start or sparse-interaction scenarios.
Abstract: I present a hybrid matrix factorisation model representing users and items as linear combinations of their content features' latent factors. The model outperforms both collaborative and content-based models in cold-start or sparse interaction data scenarios (using both user and item metadata), and performs at least as well as a pure collaborative matrix factorisation model where interaction data is abundant. Additionally, feature embeddings produced by the model encode semantic information in a way reminiscent of word embedding approaches, making them useful for a range of related tasks such as tag recommendations.
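The core idea, representing a user or item as a linear combination of its content features' latent factors, can be sketched in a few lines of NumPy. The feature counts and latent dimension below are arbitrary illustration values, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: number of user/item content features, latent dimension.
n_user_feats, n_item_feats, k = 5, 7, 4

# Each content feature gets its own latent vector; an entity is the
# sum (a linear combination) of its active features' factors.
user_feat_emb = rng.normal(size=(n_user_feats, k))
item_feat_emb = rng.normal(size=(n_item_feats, k))

def represent(feature_indicator, feat_emb):
    """Embed an entity as the weighted sum of its features' latent factors."""
    return feature_indicator @ feat_emb

# A user with features {0, 2}; an item with features {1, 3, 6}.
u = represent(np.array([1.0, 0, 1, 0, 0]), user_feat_emb)
i = represent(np.array([0.0, 1, 0, 1, 0, 0, 1]), item_feat_emb)

# Predicted affinity is the inner product of the two representations.
score = float(u @ i)
```

Because new users and items are embedded through their metadata features rather than through an identity per entity, the model can score cold-start entities that have no interaction history.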

45 citations

Journal ArticleDOI
TL;DR: A POS tagging system based on a deep neural network is proposed, built on a static, task-independent pre-trained model that represents word semantics enriched with morphological information by approximating the word embedding representation learned from an unlabelled corpus with the fastText model.
Abstract: The Natural Language Processing (NLP) field is taking great advantage of adopting models and methodologies from Artificial Intelligence. In particular, Part-Of-Speech (POS) tagging is a building block for many NLP applications. In this paper, a POS tagging system based on a deep neural network is proposed. It is built on a static, task-independent pre-trained model that represents word semantics enriched with morphological information, obtained by approximating the word embedding representation learned from an unlabelled corpus by the fastText model, so as to handle common and known words as consistently as rare and out-of-vocabulary words. A character-level representation of words is dynamically learned according to the POS tagging task and concatenated to the previous one. This joint representation is fed to the main network, comprising a Bi-LSTM layer, trained to associate a sequence of tags with a sequence of words. The effectiveness of the proposed system with respect to the state of the art is proven by an extensive experimental campaign, which provides evidence that POS tagging accuracy improves by using word embeddings enriched with morphological information, by estimating embeddings for both known and unknown words, and by concatenating word embeddings with character-level information of the same size. Similar trends are obtained for two languages with different characteristics, English and Italian: in both cases, overall accuracy on the POS tagging test set increased with respect to the most advanced existing systems, with particular improvements on the accuracy of out-of-vocabulary words. Finally, the method has a general basis and could be profitably applied to all languages, particularly those showing wide morphological richness.
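The fastText-style handling of rare and out-of-vocabulary words that the system builds on can be illustrated with a minimal sketch: a word vector is approximated from its character n-gram vectors, so any string receives an embedding. The n-gram range, bucket count, and hashing scheme below are simplified assumptions, not the actual fastText implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8        # illustrative embedding dimension
buckets = 1000 # illustrative hash-bucket count for n-gram vectors

# Hypothetical n-gram embedding table (fastText hashes n-grams into buckets).
ngram_emb = rng.normal(size=(buckets, dim))

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word, with fastText-style boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def embed(word):
    """Approximate a word vector as the mean of its n-gram vectors, so rare
    and out-of-vocabulary words still get a meaningful representation."""
    idx = [hash(g) % buckets for g in char_ngrams(word)]
    return ngram_emb[idx].mean(axis=0)

v_known = embed("tagging")
v_oov = embed("taggings")  # unseen word is still embeddable
```

Morphologically related words share most of their n-grams, which is why such subword embeddings carry the morphological information the abstract refers to.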

45 citations

Journal ArticleDOI
TL;DR: In this paper, the authors proposed PharmKG, a multi-relational, attributed biomedical KG, composed of more than 500,000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities.
Abstract: Biomedical knowledge graphs (KGs), which can help with the understanding of complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, challenges remain in their embedding and use due to their complex nature and the specific demands of their construction. Existing studies often suffer from problems such as sparse and noisy datasets, insufficient modeling methods and non-uniform evaluation metrics. In this work, we established a comprehensive KG system for the biomedical field in an attempt to bridge the gap. Here, we introduced PharmKG, a multi-relational, attributed biomedical KG composed of more than 500,000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is attached with heterogeneous, domain-specific information obtained from multi-omics data, i.e. gene expression, chemical structure and disease word embeddings, while preserving the semantic and biomedical features. For baselines, we offered nine state-of-the-art KG embedding (KGE) approaches and a new biologically intuitive, graph neural network-based KGE method that uses a combination of both global network structure and heterogeneous domain features. Based on the proposed benchmark, we conducted extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discussed our observations across various downstream biological tasks and provided insights and guidelines for how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding and application.
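To give a flavour of what the KGE baselines in such a benchmark do, here is a minimal TransE-style scorer (one standard KGE approach, not necessarily the paper's own method): a triple (h, r, t) is plausible when the head embedding, translated by the relation vector, lands near the tail. Entity names and dimensions are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 16  # illustrative embedding dimension

# Toy vocabulary standing in for genes, drugs and diseases.
entities = ["geneA", "drugB", "diseaseC"]
relations = ["treats", "associated_with"]

E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

def transe_score(h, r, t):
    """TransE plausibility: negative distance ||h + r - t||,
    so a higher (less negative) score means a more plausible triple."""
    return -float(np.linalg.norm(E[h] + R[r] - E[t]))

s = transe_score("drugB", "treats", "diseaseC")
```

In a real pipeline the embeddings would be trained so that observed triples score higher than corrupted ones; the paper additionally enriches entities with multi-omics features.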

45 citations

Journal ArticleDOI
TL;DR: A novel, end-to-end framework to predict video Quality of Experience (QoE) is proposed; it has the flexibility to fit different datasets, to learn QoE representations, and to perform both classification and regression tasks.
Abstract: Recently, many models have been developed to predict video Quality of Experience (QoE), yet the applicability of these models still faces significant challenges. Firstly, many models rely on features that are unique to a specific dataset and thus lack the capability to generalize. Due to the intricate interactions among these features, a unified representation that is independent of datasets with different modalities is needed. Secondly, existing models often lack the configurability to perform both classification and regression tasks. Thirdly, the sample size of the available datasets to develop these models is often very small, and the impact of limited data on the performance of QoE models has not been adequately addressed. To address these issues, in this work we develop a novel, end-to-end framework termed DeepQoE. The proposed framework first uses a combination of deep learning techniques, such as word embedding and a 3D convolutional neural network (C3D), to extract generalized features. Next, these features are combined and fed into a neural network for representation learning. The learned representation then serves as input for classification or regression tasks. We evaluate the performance of DeepQoE with three datasets. The results show that for small datasets (e.g., WHU-MVQoE2016 and the Live-Netflix Video Database), the performance of state-of-the-art machine learning algorithms is greatly improved by using the QoE representation from DeepQoE (e.g., 35.71% to 44.82%), while for the large dataset (e.g., VideoSet), our DeepQoE framework achieves a significant performance improvement over the best baseline method (90.94% vs. 82.84%). In addition to the much improved performance, DeepQoE has the flexibility to fit different datasets, to learn QoE representations, and to perform both classification and regression tasks. We also develop a DeepQoE-based adaptive bitrate streaming (ABR) system to verify that our framework can be easily applied to multimedia communication services. The software package of the DeepQoE framework has been released to facilitate current research on QoE.
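The overall pipeline described above (extract modality-specific features, fuse them into a shared representation, then attach either a classification or a regression head) can be sketched as an untrained forward pass. All dimensions, the class count, and the stubbed feature extractors are assumptions for illustration, not DeepQoE's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stubbed modality features: text metadata via word embeddings,
# video content via a 3D-CNN (C3D); here just fixed-size random vectors.
text_feat = rng.normal(size=50)    # e.g. averaged word embeddings
video_feat = rng.normal(size=100)  # e.g. pooled C3D activations

x = np.concatenate([text_feat, video_feat])  # unified multimodal input

# Shared representation layer, then task-specific heads.
W_shared = rng.normal(size=(x.size, 32)) * 0.1
h = np.tanh(x @ W_shared)          # learned QoE representation

W_cls = rng.normal(size=(32, 5)) * 0.1  # 5 QoE classes (assumed)
W_reg = rng.normal(size=32) * 0.1       # scalar MOS-style score

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class_probs = softmax(h @ W_cls)   # classification head
mos_pred = float(h @ W_reg)        # regression head
```

The key design point is that both heads consume the same learned representation `h`, which is what lets one framework switch between classification and regression.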

44 citations

Proceedings ArticleDOI
04 Aug 2017
TL;DR: This paper empirically shows that by incorporating both global and local context, this collaborative model can not only significantly improve the performance of topic discovery over the baseline topic models, but also learn better word embeddings than the baseline word embedding models.
Abstract: A text corpus typically contains two types of context information -- global context and local context. Global context carries topical information which can be utilized by topic models to discover topic structures from the text corpus, while local context can train word embeddings to capture semantic regularities reflected in the text corpus. This encourages us to exploit the useful information in both the global and the local context information. In this paper, we propose a unified language model based on matrix factorization techniques which 1) takes the complementary global and local context information into consideration simultaneously, and 2) models topics and learns word embeddings collaboratively. We empirically show that by incorporating both global and local context, this collaborative model can not only significantly improve the performance of topic discovery over the baseline topic models, but also learn better word embeddings than the baseline word embedding models. We also provide qualitative analysis that explains how the cooperation of global and local context information can result in better topic structures and word embeddings.
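A minimal sketch of such a joint factorization: a document-word matrix (global context) and a word co-occurrence matrix (local context) are factorized with a shared word-factor matrix, optimized here by plain gradient descent. The loss, matrix sizes, and learning rate are illustrative assumptions, not the paper's actual objective:

```python
import numpy as np

rng = np.random.default_rng(4)
n_docs, n_words, k = 20, 30, 5  # illustrative sizes

# Global context: document-word counts (topical signal).
D = rng.poisson(1.0, size=(n_docs, n_words)).astype(float)
# Local context: symmetric word-word co-occurrence counts.
C = rng.poisson(1.0, size=(n_words, n_words)).astype(float)
C = (C + C.T) / 2

Theta = rng.normal(scale=0.1, size=(n_docs, k))  # doc-topic factors
W = rng.normal(scale=0.1, size=(n_words, k))     # shared word embeddings

def loss():
    """Joint objective: ||D - Theta W^T||^2 + ||C - W W^T||^2."""
    return (np.linalg.norm(D - Theta @ W.T) ** 2
            + np.linalg.norm(C - W @ W.T) ** 2)

lr = 1e-4
losses = [loss()]
for _ in range(500):
    R1 = D - Theta @ W.T
    R2 = C - W @ W.T
    g_Theta = 2 * R1 @ W                          # descent direction for Theta
    g_W = 2 * R1.T @ Theta + 2 * (R2 + R2.T) @ W  # W is shared by both terms
    Theta += lr * g_Theta
    W += lr * g_W
    losses.append(loss())
```

Because `W` appears in both residuals, the word factors are shaped by global topical structure and local co-occurrence simultaneously, which is the collaboration the abstract describes.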

44 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (87% related)
- Unsupervised learning: 22.7K papers, 1M citations (86% related)
- Deep learning: 79.8K papers, 2.1M citations (85% related)
- Reinforcement learning: 46K papers, 1M citations (84% related)
- Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance Metrics
Number of papers in the topic in previous years:

Year   Papers
2023   317
2022   716
2021   736
2020   1,025
2019   1,078
2018   788