Institution
Alibaba Group
Company • Hangzhou, China
About: Alibaba Group is a company based in Hangzhou, China. It is known for research contributions in the topics Computer science and Terminal (electronics). The organization has 6810 authors who have published 7389 publications receiving 55653 citations. It is also known as Alibaba Group Holding Limited and Alibaba Group (Cayman Islands).
Topics: Computer science, Terminal (electronics), Graph (abstract data type), Node (networking), Deep learning
Papers
TL;DR: A new neural model, Knowledge-Enriched Answer Generator (KEAG), which is able to compose a natural answer by exploiting and aggregating evidence from all four information sources available: question, passage, vocabulary and knowledge.
Abstract: Commonsense and background knowledge are required for a QA model to answer many nontrivial questions. Unlike existing work on knowledge-aware QA, we focus on the more challenging task of leveraging external knowledge to generate answers in natural language for a given question with context.
In this paper, we propose a new neural model, Knowledge-Enriched Answer Generator (KEAG), which composes a natural answer by exploiting and aggregating evidence from all four available information sources: question, passage, vocabulary and knowledge. During answer generation, KEAG adaptively determines when to utilize symbolic knowledge and which fact from the knowledge is useful. This allows the model to exploit external knowledge that is not explicitly stated in the given text but is relevant for generating an answer. An empirical study on a public answer-generation benchmark demonstrates that KEAG improves answer quality over models without knowledge and over existing knowledge-aware models, confirming its effectiveness in leveraging knowledge.
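One way to picture the aggregation over the four sources is a gated mixture of per-source word distributions. The following is only a minimal sketch under that assumption; the function names, the softmax gate, and the toy distributions are hypothetical and are not taken from the KEAG paper.

```python
import math

def source_gate(logits):
    """Softmax over per-source scores (hypothetical stand-in for a learned selector)."""
    peak = max(logits.values())
    exps = {name: math.exp(score - peak) for name, score in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}

def mix_sources(gate, source_dists):
    """Combine per-source word distributions into one, weighted by the gate."""
    mixed = {}
    for source, weight in gate.items():
        for word, prob in source_dists[source].items():
            mixed[word] = mixed.get(word, 0.0) + weight * prob
    return mixed

# Toy example: each source proposes candidate next words (made-up numbers).
dists = {
    "question": {"who": 1.0},
    "passage": {"paris": 0.5, "france": 0.5},
    "vocabulary": {"the": 0.5, "paris": 0.5},
    "knowledge": {"capital": 1.0},
}
gate = {"question": 0.1, "passage": 0.4, "vocabulary": 0.3, "knowledge": 0.2}
next_word_dist = mix_sources(gate, dists)
```

In this sketch, a word proposed by several sources (here "paris") accumulates probability from each of them, and raising the gate weight on "knowledge" is what lets a fact that never appears in the passage reach the output.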
26 citations
TL;DR: Wang et al. proposed an attribute-specific embedding network (ASEN) that jointly learns multiple attribute-specific embeddings and measures fine-grained similarity in the corresponding embedding space.
Abstract: This paper strives to predict fine-grained fashion similarity. In this similarity paradigm, one should pay more attention to the similarity in terms of a specific design/attribute between fashion items, for example, whether the collar designs of two garments are similar. It has potential value in many fashion-related applications, such as fashion copyright protection. To this end, we propose an Attribute-Specific Embedding Network (ASEN) to jointly learn multiple attribute-specific embeddings and thus measure the fine-grained similarity in the corresponding space. The proposed ASEN comprises a global branch and a local branch. The global branch takes the whole image as input to extract features from a global perspective, while the local branch takes as input the zoomed-in region of interest (RoI) w.r.t. the specified attribute and is thus able to extract more fine-grained features. As the global branch and the local branch extract features from different perspectives, they are complementary to each other. Additionally, in each branch, two attention modules, i.e., Attribute-aware Spatial Attention and Attribute-aware Channel Attention, are integrated so that ASEN can locate the related regions and capture the essential patterns under the guidance of the specified attribute, making the learned attribute-specific embeddings better reflect the fine-grained similarity. Extensive experiments on three fashion-related datasets, i.e., FashionAI, DARN, and DeepFashion, show the effectiveness of ASEN for fine-grained fashion similarity prediction and its potential for fashion reranking. Code and data are available at this https URL.
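The idea of attention guided by a specified attribute can be sketched in a few lines. This is an illustrative toy, not the ASEN architecture: the dot-product scoring, the parameter-free sigmoid gate, and all names below are assumptions made for the example, whereas the paper's modules use learned layers.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention(region_feats, attr_vec):
    """Pool region features, weighting each region by how well it matches the attribute."""
    scores = [sum(f * a for f, a in zip(feat, attr_vec)) for feat in region_feats]
    weights = softmax(scores)
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, region_feats))
        for d in range(len(attr_vec))
    ]
    return pooled, weights

def channel_attention(pooled, attr_vec):
    """Rescale each channel with a sigmoid gate driven by the attribute (hypothetical gate)."""
    gates = [1.0 / (1.0 + math.exp(-p * a)) for p, a in zip(pooled, attr_vec)]
    return [g * p for g, p in zip(gates, pooled)]

def attribute_embedding(region_feats, attr_vec):
    """Attribute-specific embedding: spatial attention followed by channel attention."""
    pooled, _ = spatial_attention(region_feats, attr_vec)
    return channel_attention(pooled, attr_vec)

# Three image regions with 2-d features and a made-up "collar" attribute vector.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
collar = [2.0, 0.0]
embedding = attribute_embedding(regions, collar)
```

Two items would then be compared by a distance or cosine similarity between the embeddings computed for the same attribute, so the same pair of images can be similar for one attribute and dissimilar for another.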
26 citations
11 Mar 2010 • 26 citations
TL;DR: This article studies the performance of exact diffusion in the stochastic and adaptive setting and provides conditions under which exact diffusion achieves better steady-state mean-square deviation (MSD) performance than traditional algorithms without bias correction.
Abstract: Various bias-correction methods such as EXTRA, gradient tracking methods, and exact diffusion have been proposed recently to solve distributed deterministic optimization problems. These methods employ constant step-sizes and converge linearly to the exact solution under proper conditions. However, their performance under stochastic and adaptive settings is less explored. It is still unknown whether, when and why these bias-correction methods can outperform their traditional counterparts (such as consensus and diffusion) with noisy gradient and constant step-sizes.
This work studies the performance of exact diffusion under the stochastic and adaptive setting, and provides conditions under which exact diffusion achieves better steady-state mean-square deviation (MSD) performance than traditional algorithms without bias-correction. In particular, it is proven that this superiority is more evident over sparsely-connected network topologies such as lines, cycles, or grids. Conditions are also provided under which exact diffusion matches or may even degrade the performance of traditional methods. Simulations are provided to validate the theoretical findings.
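The bias that the correction step removes can be seen in a small deterministic toy problem. The sketch below assumes the commonly stated adapt, correct, combine form of exact diffusion with the averaged combination matrix (I + A)/2; it uses noiseless gradients, scalar quadratic costs, and a 4-node cycle, all chosen for illustration, so it does not reproduce the stochastic analysis of the paper.

```python
def mix(weights, values):
    """Combine step: each node takes a weighted average of its neighbours' values."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def diffusion(A, b, mu, iters):
    """Plain diffusion (adapt then combine) on local costs 0.5 * (w - b_i)^2."""
    w = [0.0] * len(b)
    for _ in range(iters):
        psi = [wi - mu * (wi - bi) for wi, bi in zip(w, b)]  # local gradient step
        w = mix(A, psi)
    return w

def exact_diffusion(A, b, mu, iters):
    """Exact diffusion: adapt, correct with the previous iterate, then combine."""
    n = len(b)
    # Averaged combination matrix (I + A) / 2, as assumed in the lead-in.
    A_bar = [[(A[i][j] + (1.0 if i == j else 0.0)) / 2 for j in range(n)]
             for i in range(n)]
    w = [0.0] * n
    psi_prev = list(w)
    for _ in range(iters):
        psi = [wi - mu * (wi - bi) for wi, bi in zip(w, b)]          # adapt
        phi = [p + wi - pp for p, wi, pp in zip(psi, w, psi_prev)]   # correct
        w = mix(A_bar, phi)                                          # combine
        psi_prev = psi
    return w

# 4-node cycle with a symmetric, doubly stochastic combination matrix.
A = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
b = [0.0, 1.0, 2.0, 5.0]           # local minimizers; the global minimizer is their mean
target = sum(b) / len(b)
w_diff = diffusion(A, b, mu=0.1, iters=500)
w_exact = exact_diffusion(A, b, mu=0.1, iters=500)
```

With a constant step-size, the plain diffusion iterates settle at node-dependent points near the mean of `b`, while the corrected iterates agree on the mean itself. Adding gradient noise to both loops is the setting the abstract actually analyzes.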
26 citations
TL;DR: A novel compression method that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks and incorporates a task-oriented knowledge distillation loss to provide search hints and an efficiency-aware loss as search constraints, which enables a good trade-off between efficiency and effectiveness for task-adaptive BERT compression.
Abstract: Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, their huge parameter size makes them difficult to deploy in real-time applications that require quick inference with limited resources. Existing methods compress BERT into small models, but such compression is task-independent, i.e., the same compressed BERT is used for all downstream tasks. Motivated by the necessity and benefits of task-oriented BERT compression, we propose a novel compression method, AdaBERT, that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks. We incorporate a task-oriented knowledge distillation loss to provide search hints and an efficiency-aware loss as search constraints, which enables a good trade-off between efficiency and effectiveness for task-adaptive BERT compression. We evaluate AdaBERT on several NLP tasks, and the results demonstrate that those task-adaptive compressed models are 12.7x to 29.3x faster than BERT in inference time and 11.5x to 17.0x smaller in terms of parameter size, while comparable performance is maintained.
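The trade-off described above can be sketched as a single search objective that adds a distillation term and an efficiency penalty to the task loss. The weighting scheme, the parameter names `gamma` and `beta`, and the per-operation costs below are assumptions for illustration; the abstract does not give the exact formula.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_cost(arch_logits, op_costs):
    """Differentiable-search style cost: candidate op costs weighted by softmax(arch_logits)."""
    probs = softmax(arch_logits)
    return sum(p * c for p, c in zip(probs, op_costs))

def search_objective(task_loss, distill_loss, arch_logits, op_costs,
                     gamma=0.5, beta=0.1):
    """Hypothetical combined loss: task term, distillation hint, efficiency penalty."""
    efficiency_loss = expected_cost(arch_logits, op_costs)
    return (1.0 - gamma) * task_loss + gamma * distill_loss + beta * efficiency_loss

# Two candidate operations with made-up relative costs (cheap vs. expensive).
costs = [1.0, 3.0]
balanced = search_objective(1.0, 2.0, [0.0, 0.0], costs)
prefers_cheap = search_objective(1.0, 2.0, [2.0, 0.0], costs)
```

Holding the task and distillation losses fixed, shifting the architecture logits toward the cheaper operation lowers the objective, which is how an efficiency-aware term can act as a soft constraint during search.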
26 citations
Authors
Showing all 6829 results
Name | H-index | Papers | Citations |
---|---|---|---|
Philip S. Yu | 148 | 1914 | 107374 |
Lei Zhang | 130 | 2312 | 86950 |
Jian Xu | 94 | 1366 | 52057 |
Wei Chu | 80 | 670 | 28771 |
Le Song | 76 | 345 | 21382 |
Yuan Xie | 76 | 739 | 24155 |
Narendra Ahuja | 76 | 474 | 29517 |
Rong Jin | 75 | 449 | 19456 |
Beng Chin Ooi | 73 | 408 | 19174 |
Wotao Yin | 72 | 303 | 27233 |
Deng Cai | 70 | 326 | 24524 |
Xiaofei He | 70 | 260 | 28215 |
Irwin King | 67 | 476 | 19056 |
Gang Wang | 65 | 373 | 21579 |
Xiaodan Liang | 61 | 318 | 14121 |