
Alibaba Group

Company · Hangzhou, China
About: Alibaba Group is a company based in Hangzhou, China. It is known for research contributions in the topics of Computer science and Terminal (electronics). The organization has 6,810 authors who have published 7,389 publications receiving 55,653 citations. The organization is also known as Alibaba Group Holding Limited and Alibaba Group (Cayman Islands).


Papers
Posted Content
Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Wen Xie, Hao Li, Rong Jin
TL;DR: A sparse attention scheme, dubbed k-NN attention, naturally inherits the local bias of CNNs without introducing convolutional operations, and allows for the exploration of long-range correlation while filtering out irrelevant tokens by choosing the most similar tokens from the entire image.
Abstract: Convolutional Neural Networks (CNNs) have dominated computer vision for years, due to their ability to capture locality and translation invariance. Recently, many vision transformer architectures have been proposed, and they show promising performance. A key component in vision transformers is the fully-connected self-attention, which is more powerful than CNNs in modelling long-range dependencies. However, since the current dense self-attention uses all image patches (tokens) to compute the attention matrix, it may neglect the locality of image patches and involve noisy tokens (e.g., cluttered background and occlusion), leading to a slow training process and potential degradation of performance. To address these problems, we propose a sparse attention scheme, dubbed k-NN attention, for boosting vision transformers. Specifically, instead of involving all the tokens in the attention matrix calculation, we select only the top-k most similar tokens from the keys for each query to compute the attention map. The proposed k-NN attention naturally inherits the local bias of CNNs without introducing convolutional operations, as nearby tokens tend to be more similar than distant ones. In addition, k-NN attention allows for the exploration of long-range correlation while filtering out irrelevant tokens by choosing the most similar tokens from the entire image. Despite its simplicity, we verify, both theoretically and empirically, that k-NN attention is powerful in distilling noise from input tokens and in speeding up training. Extensive experiments with ten different vision transformer architectures verify that the proposed k-NN attention can work with any existing transformer architecture to improve its prediction performance.
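The core operation is easy to sketch: compute the usual scaled dot-product scores, but keep only the top-k scores per query before the softmax. Below is a minimal PyTorch sketch of this top-k masking; the function name knn_attention and the tensor shapes are illustrative assumptions, not the authors' implementation.

    # A minimal sketch of k-NN (top-k) attention, assuming PyTorch.
    import torch

    def knn_attention(q, k, v, top_k):
        # q, k, v: (batch, heads, tokens, dim)
        scale = q.shape[-1] ** -0.5
        scores = (q @ k.transpose(-2, -1)) * scale   # (B, H, N, N)
        # Smallest retained score per query: the top_k-th largest.
        topk_vals, _ = scores.topk(top_k, dim=-1)
        threshold = topk_vals[..., -1, None]
        # Mask everything below the threshold before the softmax.
        masked = scores.masked_fill(scores < threshold, float('-inf'))
        attn = masked.softmax(dim=-1)
        return attn @ v                              # (B, H, N, dim)

    # Usage: self-attention over 196 patch tokens, keeping 32 keys each.
    x = torch.randn(2, 4, 196, 64)
    out = knn_attention(x, x, x, top_k=32)

Note that ties at the threshold may retain slightly more than top_k keys; a gather-based variant would keep exactly k.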

31 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This work proposes a simple and robust solution to incorporate both types of features with the Synergized-LSTM (Syn-LSTM), which clearly captures how the two types of features interact.
Abstract: It has been shown that named entity recognition (NER) can benefit from incorporating the long-distance structured information captured by dependency trees. We believe this is because the two types of features (the contextual information captured by linear sequences and the structured information captured by dependency trees) may complement each other. However, existing approaches have largely focused on stacking LSTMs and graph neural networks such as graph convolutional networks (GCNs) to build improved NER models, where the exact interaction mechanism between the two types of features is not very clear and the performance gain does not appear to be significant. In this work, we propose a simple and robust solution that incorporates both types of features with our Synergized-LSTM (Syn-LSTM), which clearly captures how the two types of features interact. We conduct extensive experiments on several standard datasets across four languages. The results demonstrate that the proposed model achieves better performance than previous approaches while requiring fewer parameters. Our further analysis demonstrates that our model can capture longer dependencies than strong baselines.
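The abstract does not spell out the Syn-LSTM cell itself, but the general recipe (encode the dependency tree with a graph network, then feed the graph features into a sequence encoder alongside the word representations) can be sketched as follows. This is a minimal PyTorch illustration of that recipe; the class name, the single-layer graph aggregation, and the concatenation-based fusion are simplifications, not the paper's actual gating mechanism.

    # A sketch of combining dependency-tree features (from a graph
    # encoder) with a sequential encoder for NER. The paper's Syn-LSTM
    # gates the two feature streams inside the recurrence; here they
    # are simply concatenated for illustration.
    import torch
    import torch.nn as nn

    class GraphAugmentedTagger(nn.Module):
        def __init__(self, vocab_size, emb_dim, hidden_dim, num_labels):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            # Stand-in for a GCN over the dependency tree: one round of
            # neighbour aggregation via the adjacency matrix.
            self.graph_proj = nn.Linear(emb_dim, emb_dim)
            self.lstm = nn.LSTM(emb_dim * 2, hidden_dim,
                                batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(hidden_dim * 2, num_labels)

        def forward(self, tokens, adj):
            # tokens: (batch, seq_len); adj: (batch, seq_len, seq_len)
            x = self.embed(tokens)
            graph_feats = torch.relu(self.graph_proj(adj @ x))
            # Concatenate contextual and structured features per token.
            h, _ = self.lstm(torch.cat([x, graph_feats], dim=-1))
            return self.classifier(h)   # per-token label logits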

31 citations

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This paper proposes two novel KD methods based on structure-level information: one approximately minimizes the distance between the student's and the teachers' structure-level probability distributions, and the other aggregates the structure-level knowledge into local distributions and minimizes the distance between the two local probability distributions.
Abstract: Multilingual sequence labeling is the task of predicting label sequences for multiple languages using a single unified model. Compared with relying on multiple monolingual models, using a multilingual model has the benefits of a smaller model size, easier online serving, and generalizability to low-resource languages. However, current multilingual models still significantly underperform individual monolingual models due to model capacity limitations. In this paper, we propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) into the unified multilingual model (student). We propose two novel KD methods based on structure-level information: (1) approximately minimizing the distance between the student's and the teachers' structure-level probability distributions, and (2) aggregating the structure-level knowledge into local distributions and minimizing the distance between the two local probability distributions. Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and the teacher models.
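The second method, matching local distributions, is the easier one to sketch: reduce the teacher's structured output to a per-position label distribution and pull the student's distribution toward it. Below is a minimal PyTorch sketch of such a token-level distillation term; the function name, the temperature T, and the padding mask are illustrative assumptions, not the paper's exact loss.

    # A sketch of the "local distribution" flavour of structural KD:
    # match a teacher's per-position label distribution with the
    # student's via a token-level KL term.
    import torch
    import torch.nn.functional as F

    def local_kd_loss(student_logits, teacher_logits, mask, T=2.0):
        # student_logits, teacher_logits: (batch, seq_len, num_labels)
        # mask: (batch, seq_len), 1 for real tokens, 0 for padding
        log_p_s = F.log_softmax(student_logits / T, dim=-1)
        p_t = F.softmax(teacher_logits / T, dim=-1)
        # KL(teacher || student) per token, zeroed on padding.
        kl = (p_t * (p_t.clamp_min(1e-12).log() - log_p_s)).sum(-1)
        return (kl * mask).sum() / mask.sum() * (T * T)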

31 citations

Patent
梁捷, 马妙魁
27 Apr 2013
TL;DR: This patent presents a method for executing extended JavaScript (JS) through an extended JS interface. It comprises the following steps: when a webpage is loaded, the browser asks an extension program, which was loaded at browser start-up, whether the extended JS should be executed at a predetermined occasion; when the browser determines that the extended JS needs to be executed, it assembles the extended JS interface according to an open application programming interface (API); the extended JS is then executed through that interface.
Abstract: The disclosure provides a method for executing extended JavaScript (JS) through an extended JS interface. The method comprises the following steps: when a webpage is loaded, querying an extension program, which was loaded when the browser started up, about whether to execute the extended JS at a predetermined occasion; when the browser determines that the extended JS needs to be executed, assembling the extended JS interface according to an open application programming interface (API); and executing the extended JS through the extended JS interface. According to the disclosure, the browsing and layout modes of a browser can be changed dynamically, via extension programs, according to the requirements of the webpage content, thereby improving the user's browsing experience.
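The described control flow is simple to model. The sketch below mocks it in Python rather than in a real browser; every class, method, and field name here is invented for illustration and does not come from the patent.

    # An illustrative mock of the described flow: at page load the
    # browser asks each extension whether extended JS should run,
    # assembles an interface from its open API if so, and executes
    # the extended JS through that interface.
    class ExtendedJSInterface:
        def __init__(self, open_api):
            self.open_api = open_api      # browser-exposed API calls

        def execute(self, script):
            print(f"running extended JS {script!r} via {self.open_api}")

    class Browser:
        def __init__(self, extensions):
            self.extensions = extensions  # loaded at browser start-up

        def on_page_load(self, url):
            for ext in self.extensions:
                # Predetermined occasion: ask at page load.
                if ext["should_run"](url):
                    iface = ExtendedJSInterface(ext["open_api"])
                    iface.execute(ext["script"])

    # Usage: a reader-mode extension that rewrites the page layout.
    reader_mode = {
        "should_run": lambda url: url.endswith(".html"),
        "open_api": ["dom.rewrite", "css.inject"],
        "script": "enable_reader_mode()",
    }
    Browser([reader_mode]).on_page_load("https://example.com/a.html")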

31 citations

Journal ArticleDOI
TL;DR: In this paper, the authors propose that the adoption of clean technology products (e.g., electric vehicles and solar photovoltaic panels) is key to the sustainable development of sectors such as transportation and energy.
Abstract: Clean technology products (e.g., electric vehicles and solar photovoltaic panels) are key to sustainable development of sectors such as transportation and energy. Often, adoption of such products r...

31 citations


Authors

Top authors by h-index (15 of 6,829 shown):

Name              H-index   Papers   Citations
Philip S. Yu      148       1,914    107,374
Lei Zhang         130       2,312    86,950
Jian Xu           94        1,366    52,057
Wei Chu           80        670      28,771
Le Song           76        345      21,382
Yuan Xie          76        739      24,155
Narendra Ahuja    76        474      29,517
Rong Jin          75        449      19,456
Beng Chin Ooi     73        408      19,174
Wotao Yin         72        303      27,233
Deng Cai          70        326      24,524
Xiaofei He        70        260      28,215
Irwin King        67        476      19,056
Gang Wang         65        373      21,579
Xiaodan Liang     61        318      14,121
Network Information
Related Institutions
Microsoft: 86.9K papers, 4.1M citations (94% related)
Google: 39.8K papers, 2.1M citations (94% related)
Facebook: 10.9K papers, 570.1K citations (93% related)
AT&T Labs: 5.5K papers, 483.1K citations (90% related)

Performance Metrics
Number of papers from the institution in previous years:
Year    Papers
2023    5
2022    30
2021    1,352
2020    1,671
2019    1,459
2018    863