
Alibaba Group

Company · Hangzhou, China
About: Alibaba Group is a company based in Hangzhou, China. It is known for research contributions in the topics of Computer science and Terminal (electronics). The organization has 6,810 authors who have published 7,389 publications receiving 55,653 citations. The organization is also known as Alibaba Group Holding Limited and Alibaba Group (Cayman Islands).


Papers
Posted Content
Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Wen Xie, Hao Li, Rong Jin
TL;DR: A sparse attention scheme, dubbed k-NN attention, naturally inherits the local bias of CNNs without introducing convolutional operations, and allows for the exploration of long-range correlation while filtering out irrelevant tokens by choosing the most similar tokens from the entire image.
Abstract: Convolutional Neural Networks (CNNs) have dominated computer vision for years, due to their ability to capture locality and translation invariance. Recently, many vision transformer architectures have been proposed, and they show promising performance. A key component in vision transformers is the fully-connected self-attention, which is more powerful than CNNs in modelling long-range dependencies. However, since the current dense self-attention uses all image patches (tokens) to compute the attention matrix, it may neglect the locality of image patches and involve noisy tokens (e.g., cluttered background and occlusion), leading to a slow training process and potential degradation of performance. To address these problems, we propose a sparse attention scheme, dubbed k-NN attention, for boosting vision transformers. Specifically, instead of involving all the tokens in the attention matrix calculation, we select only the top-k most similar tokens from the keys for each query to compute the attention map. The proposed k-NN attention naturally inherits the local bias of CNNs without introducing convolutional operations, as nearby tokens tend to be more similar than distant ones. In addition, k-NN attention allows for the exploration of long-range correlation while filtering out irrelevant tokens by choosing the most similar tokens from the entire image. Despite its simplicity, we verify, both theoretically and empirically, that k-NN attention is powerful in distilling noise from input tokens and in speeding up training. Extensive experiments with ten different vision transformer architectures verify that the proposed k-NN attention can work with any existing transformer architecture to improve its prediction performance.
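The core operation is easy to sketch: compute the usual scaled dot-product scores, but keep only the top-k scores per query before the softmax. Below is a minimal PyTorch sketch of this top-k masking; the function name knn_attention and the tensor shapes are illustrative assumptions, not the authors' implementation.

    # A minimal sketch of k-NN (top-k) attention, assuming PyTorch.
    import torch

    def knn_attention(q, k, v, top_k):
        # q, k, v: (batch, heads, tokens, dim)
        scale = q.shape[-1] ** -0.5
        scores = (q @ k.transpose(-2, -1)) * scale   # (B, H, N, N)
        # Smallest retained score per query: the top_k-th largest.
        topk_vals, _ = scores.topk(top_k, dim=-1)
        threshold = topk_vals[..., -1, None]
        # Mask everything below the threshold before the softmax.
        masked = scores.masked_fill(scores < threshold, float('-inf'))
        attn = masked.softmax(dim=-1)
        return attn @ v                              # (B, H, N, dim)

    # Usage: self-attention over 196 patch tokens, keeping 32 keys each.
    x = torch.randn(2, 4, 196, 64)
    out = knn_attention(x, x, x, top_k=32)

Note that ties at the threshold may retain slightly more than top_k keys; a gather-based variant would keep exactly k.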

31 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This work proposes a simple and robust solution to incorporate both types of features with the Synergized-LSTM (Syn-LSTM), which clearly captures how the two types of features interact.
Abstract: It has been shown that named entity recognition (NER) can benefit from incorporating the long-distance structured information captured by dependency trees. We believe this is because the two types of features (the contextual information captured by linear sequences and the structured information captured by dependency trees) may complement each other. However, existing approaches have largely focused on stacking LSTMs and graph neural networks such as graph convolutional networks (GCNs) to build improved NER models, where the exact interaction mechanism between the two types of features is not very clear and the performance gain does not appear to be significant. In this work, we propose a simple and robust solution that incorporates both types of features with our Synergized-LSTM (Syn-LSTM), which clearly captures how the two types of features interact. We conduct extensive experiments on several standard datasets across four languages. The results demonstrate that the proposed model achieves better performance than previous approaches while requiring fewer parameters. Our further analysis demonstrates that our model can capture longer dependencies than strong baselines.
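The abstract does not spell out the Syn-LSTM cell itself, but the general recipe (encode the dependency tree with a graph network, then feed the graph features into a sequence encoder alongside the word representations) can be sketched as follows. This is a minimal PyTorch illustration of that recipe; the class name, the single-layer graph aggregation, and the concatenation-based fusion are simplifications, not the paper's actual gating mechanism.

    # A sketch of combining dependency-tree features (from a graph
    # encoder) with a sequential encoder for NER. The paper's Syn-LSTM
    # gates the two feature streams inside the recurrence; here they
    # are simply concatenated for illustration.
    import torch
    import torch.nn as nn

    class GraphAugmentedTagger(nn.Module):
        def __init__(self, vocab_size, emb_dim, hidden_dim, num_labels):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            # Stand-in for a GCN over the dependency tree: one round of
            # neighbour aggregation via the adjacency matrix.
            self.graph_proj = nn.Linear(emb_dim, emb_dim)
            self.lstm = nn.LSTM(emb_dim * 2, hidden_dim,
                                batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(hidden_dim * 2, num_labels)

        def forward(self, tokens, adj):
            # tokens: (batch, seq_len); adj: (batch, seq_len, seq_len)
            x = self.embed(tokens)
            graph_feats = torch.relu(self.graph_proj(adj @ x))
            # Concatenate contextual and structured features per token.
            h, _ = self.lstm(torch.cat([x, graph_feats], dim=-1))
            return self.classifier(h)   # per-token label logits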

31 citations

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This paper proposes two novel KD methods based on structure-level information: one approximately minimizes the distance between the student's and the teachers' structure-level probability distributions, and the other aggregates the structure-level knowledge into local distributions and minimizes the distance between the two local probability distributions.
Abstract: Multilingual sequence labeling is the task of predicting label sequences for multiple languages using a single unified model. Compared with relying on multiple monolingual models, using a multilingual model has the benefits of a smaller model size, easier online serving, and generalizability to low-resource languages. However, current multilingual models still significantly underperform individual monolingual models due to model capacity limitations. In this paper, we propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) into the unified multilingual model (student). We propose two novel KD methods based on structure-level information: (1) approximately minimizing the distance between the student's and the teachers' structure-level probability distributions, and (2) aggregating the structure-level knowledge into local distributions and minimizing the distance between the two local probability distributions. Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and the teacher models.
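The second method, matching local distributions, is the easier one to sketch: reduce the teacher's structured output to a per-position label distribution and pull the student's distribution toward it. Below is a minimal PyTorch sketch of such a token-level distillation term; the function name, the temperature T, and the padding mask are illustrative assumptions, not the paper's exact loss.

    # A sketch of the "local distribution" flavour of structural KD:
    # match a teacher's per-position label distribution with the
    # student's via a token-level KL term.
    import torch
    import torch.nn.functional as F

    def local_kd_loss(student_logits, teacher_logits, mask, T=2.0):
        # student_logits, teacher_logits: (batch, seq_len, num_labels)
        # mask: (batch, seq_len), 1 for real tokens, 0 for padding
        log_p_s = F.log_softmax(student_logits / T, dim=-1)
        p_t = F.softmax(teacher_logits / T, dim=-1)
        # KL(teacher || student) per token, zeroed on padding.
        kl = (p_t * (p_t.clamp_min(1e-12).log() - log_p_s)).sum(-1)
        return (kl * mask).sum() / mask.sum() * (T * T)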

31 citations

Patent
梁捷, 马妙魁
27 Apr 2013
TL;DR: This patent presents a method for executing extended JavaScript (JS) through an extended JS interface. It comprises the following steps: when a webpage is loaded, the browser asks an extension program, which was loaded at browser start-up, whether the extended JS should be executed at a predetermined occasion; when the browser determines that the extended JS needs to be executed, it assembles the extended JS interface according to an open application programming interface (API); the extended JS is then executed through that interface.
Abstract: The disclosure provides a method for executing extended JavaScript (JS) through an extended JS interface. The method comprises the following steps: when a webpage is loaded, querying an extension program, which was loaded when the browser started up, about whether to execute the extended JS at a predetermined occasion; when the browser determines that the extended JS needs to be executed, assembling the extended JS interface according to an open application programming interface (API); and executing the extended JS through the extended JS interface. According to the disclosure, the browsing and layout modes of a browser can be changed dynamically, via extension programs, according to the requirements of the webpage content, thereby improving the user's browsing experience.
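The described control flow is simple to model. The sketch below mocks it in Python rather than in a real browser; every class, method, and field name here is invented for illustration and does not come from the patent.

    # An illustrative mock of the described flow: at page load the
    # browser asks each extension whether extended JS should run,
    # assembles an interface from its open API if so, and executes
    # the extended JS through that interface.
    class ExtendedJSInterface:
        def __init__(self, open_api):
            self.open_api = open_api      # browser-exposed API calls

        def execute(self, script):
            print(f"running extended JS {script!r} via {self.open_api}")

    class Browser:
        def __init__(self, extensions):
            self.extensions = extensions  # loaded at browser start-up

        def on_page_load(self, url):
            for ext in self.extensions:
                # Predetermined occasion: ask at page load.
                if ext["should_run"](url):
                    iface = ExtendedJSInterface(ext["open_api"])
                    iface.execute(ext["script"])

    # Usage: a reader-mode extension that rewrites the page layout.
    reader_mode = {
        "should_run": lambda url: url.endswith(".html"),
        "open_api": ["dom.rewrite", "css.inject"],
        "script": "enable_reader_mode()",
    }
    Browser([reader_mode]).on_page_load("https://example.com/a.html")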

31 citations

Journal ArticleDOI
TL;DR: In this paper, the authors propose that the adoption of clean technology products (e.g., electric vehicles and solar photovoltaic panels) is key to the sustainable development of sectors such as transportation and energy.
Abstract: Clean technology products (e.g., electric vehicles and solar photovoltaic panels) are key to sustainable development of sectors such as transportation and energy. Often, adoption of such products r...

31 citations


Authors

Top authors by h-index (15 of 6,829 shown):

Name              H-index   Papers   Citations
Philip S. Yu      148       1,914    107,374
Lei Zhang         130       2,312    86,950
Jian Xu           94        1,366    52,057
Wei Chu           80        670      28,771
Le Song           76        345      21,382
Yuan Xie          76        739      24,155
Narendra Ahuja    76        474      29,517
Rong Jin          75        449      19,456
Beng Chin Ooi     73        408      19,174
Wotao Yin         72        303      27,233
Deng Cai          70        326      24,524
Xiaofei He        70        260      28,215
Irwin King        67        476      19,056
Gang Wang         65        373      21,579
Xiaodan Liang     61        318      14,121
Network Information
Related Institutions
Microsoft: 86.9K papers, 4.1M citations (94% related)
Google: 39.8K papers, 2.1M citations (94% related)
Facebook: 10.9K papers, 570.1K citations (93% related)
AT&T Labs: 5.5K papers, 483.1K citations (90% related)

Performance Metrics
Number of papers from the institution in previous years:
Year    Papers
2023    5
2022    30
2021    1,352
2020    1,671
2019    1,459
2018    863