Showing papers by "Alibaba Group" published in 2017


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This work proposes a new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), for 3D action recognition, which is able to selectively focus on the informative joints in the action sequence with the assistance of global contextual information.
Abstract: Long Short-Term Memory (LSTM) networks have shown superior performance in 3D human action recognition due to their power in modeling the dynamics and dependencies in sequential data. Since not all joints are informative for action analysis and the irrelevant joints often bring a lot of noise, we need to pay more attention to the informative ones. However, original LSTM does not have strong attention capability. Hence we propose a new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), for 3D action recognition, which is able to selectively focus on the informative joints in the action sequence with the assistance of global contextual information. In order to achieve a reliable attention representation for the action sequence, we further propose a recurrent attention mechanism for our GCA-LSTM network, in which the attention performance is improved iteratively. Experiments show that our end-to-end network can reliably focus on the most informative joints in each frame of the skeleton sequence. Moreover, our network yields state-of-the-art performance on three challenging datasets for 3D action recognition.

573 citations
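
To make the attention mechanism concrete, here is a minimal NumPy sketch of the recurrent global-context attention idea: each joint is scored against a global context vector, the scores select the informative joints per frame, and the context is refined over a few iterations. The shapes, the random score parameters, and the mean-pooling below are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gca_attention(h, n_iters=2, rng=np.random.default_rng(0)):
    """Iteratively refine a global context vector by attending over joints.

    h: (T, J, D) per-frame, per-joint features (e.g. from a first LSTM layer).
    Returns the refined global context and the last attention weights.
    """
    T, J, D = h.shape
    W = 0.01 * rng.standard_normal((D, D))       # illustrative score parameters
    g = h.mean(axis=(0, 1))                      # initialize context as mean feature
    for _ in range(n_iters):
        scores = np.einsum('tjd,de,e->tj', h, W, g)    # joint relevance to context
        alpha = softmax(scores, axis=1)                # normalize over joints per frame
        attended = (alpha[..., None] * h).sum(axis=1)  # (T, D) informative summary
        g = attended.mean(axis=0)                      # refined global context
    return g, alpha

g, alpha = gca_attention(np.random.default_rng(1).standard_normal((20, 25, 64)))
print(g.shape, alpha.shape)  # (64,) (20, 25)
```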


Proceedings ArticleDOI
07 Aug 2017
TL;DR: In this paper, a game theoretical minimax game is proposed to iteratively optimise both generative and discriminative models for document ranking, and the generative model is trained to fit the relevance distribution over documents via the signals from the discriminator.
Abstract: This paper provides a unified account of two schools of thinking in information retrieval modelling: the generative retrieval focusing on predicting relevant documents given a query, and the discriminative retrieval focusing on predicting relevancy given a query-document pair. We propose a game theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results have demonstrated significant performance gains as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering.

413 citations
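
The minimax loop can be illustrated with a toy, single-query version: a softmax generator samples documents, a logistic discriminator learns to separate truly relevant documents from the generator's samples, and the generator follows a policy gradient that uses the discriminator's score as reward. Everything here (the features, learning rates, and the absence of a reward baseline) is a simplifying assumption, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, dim = 50, 8
X = rng.standard_normal((n_docs, dim))           # document features for one query
relevant = X @ rng.standard_normal(dim) > 1.0    # hidden ground-truth relevance

theta_g = np.zeros(dim)                          # generator: softmax policy over docs
w_d = np.zeros(dim)                              # discriminator: logistic scorer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(300):
    p_g = np.exp(X @ theta_g)
    p_g /= p_g.sum()
    fake = rng.choice(n_docs, size=8, p=p_g)                # generator's picks
    true = rng.choice(np.flatnonzero(relevant), size=8)     # sampled relevant docs
    # Discriminator step: push scores of true docs up, generated docs down.
    w_d += 0.1 * ((X[true] * (1 - sigmoid(X[true] @ w_d))[:, None]).mean(0)
                  - (X[fake] * sigmoid(X[fake] @ w_d)[:, None]).mean(0))
    # Generator step: REINFORCE, with the discriminator's belief as reward.
    reward = sigmoid(X[fake] @ w_d)
    grad_logp = X[fake] - (p_g[:, None] * X).sum(0)         # grad of log-softmax
    theta_g += 0.1 * (reward[:, None] * grad_logp).mean(0)

p_g = np.exp(X @ theta_g); p_g /= p_g.sum()
print("generator probability mass on relevant docs:", round(p_g[relevant].sum(), 3))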


Proceedings Article
27 Mar 2017
TL;DR: CherryPick is a system that leverages Bayesian Optimization to build performance models for various applications, and the models are just accurate enough to distinguish the best or close-to-the-best configuration from the rest with only a few test runs.
Abstract: Picking the right cloud configuration for recurring big data analytics jobs running in clouds is hard, because there can be tens of possible VM instance types and even more cluster sizes to pick from. Choosing poorly can significantly degrade performance and increase the cost to run a job by 2-3x on average, and as much as 12x in the worst case. However, it is challenging to automatically identify the best configuration for a broad spectrum of applications and cloud configurations with low search cost. CherryPick is a system that leverages Bayesian Optimization to build performance models for various applications, and the models are just accurate enough to distinguish the best or close-to-the-best configuration from the rest with only a few test runs. Our experiments on five analytic applications in AWS EC2 show that CherryPick has a 45-90% chance of finding the optimal configuration, and a near-optimal one otherwise, saving up to 75% of the search cost compared to existing solutions.

369 citations
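
A minimal sketch of the search loop, assuming a Gaussian-process surrogate with an RBF kernel and expected improvement as the acquisition function (the paper's general recipe); the synthetic `run_job` cost function and all constants below are stand-ins for real benchmark runs.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Candidate cloud configurations: (num_vms, cores_per_vm, mem_gb_per_vm).
configs = np.array([[n, c, m] for n in (4, 8, 16, 32)
                    for c in (2, 4, 8) for m in (8, 16, 32)], dtype=float)
X_all = (configs - configs.mean(0)) / configs.std(0)

def run_job(i):
    """Stand-in for benchmarking a configuration: a noisy synthetic cost."""
    x = X_all[i]
    return float((x - 0.3) @ (x - 0.3) + 0.05 * rng.standard_normal())

def gp_posterior(X, y, Xs, ls=1.0, noise=1e-3):
    """GP regression with an RBF kernel: posterior mean/std at Xs."""
    k = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None]) ** 2).sum(-1) / ls ** 2)
    L = np.linalg.cholesky(k(X, X) + noise * np.eye(len(X)))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    sd = np.sqrt(np.clip(1.0 - (v ** 2).sum(0), 1e-9, None))
    return mu, sd

tried = list(rng.choice(len(X_all), size=3, replace=False))   # bootstrap runs
costs = [run_job(i) for i in tried]
for _ in range(7):                                            # small search budget
    mu, sd = gp_posterior(X_all[tried], np.array(costs), X_all)
    imp = min(costs) - mu                                     # improvement over best
    ei = imp * norm.cdf(imp / sd) + sd * norm.pdf(imp / sd)   # expected improvement
    ei[tried] = -np.inf                                       # never repeat a config
    nxt = int(ei.argmax())
    tried.append(nxt)
    costs.append(run_job(nxt))

print("chosen config:", configs[tried[int(np.argmin(costs))]])
```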


Proceedings ArticleDOI
23 Oct 2017
TL;DR: A novel model called Spatio-Temporal AutoEncoder (ST AutoEncoder or STAE) is proposed, which utilizes deep neural networks to learn video representations automatically and extracts features from both spatial and temporal dimensions by performing 3-dimensional convolutions, enhancing motion feature learning in videos.
Abstract: Anomalous events detection in real-world video scenes is a challenging problem due to the complexity of "anomaly" as well as the cluttered backgrounds, objects and motions in the scenes. Most existing methods use hand-crafted features in local spatial regions to identify anomalies. In this paper, we propose a novel model called Spatio-Temporal AutoEncoder (ST AutoEncoder or STAE), which utilizes deep neural networks to learn video representation automatically and extracts features from both spatial and temporal dimensions by performing 3-dimensional convolutions. In addition to the reconstruction loss used in existing typical autoencoders, we introduce a weight-decreasing prediction loss for generating future frames, which enhances the motion feature learning in videos. Since most anomaly detection datasets are restricted to appearance anomalies or unnatural motion anomalies, we collected a new challenging dataset comprising a set of real-world traffic surveillance videos. Several experiments are performed on both the public benchmarks and our traffic dataset, which show that our proposed method remarkably outperforms the state-of-the-art approaches.

363 citations
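
The training objective can be sketched directly: a reconstruction term plus a prediction term whose per-frame weights decrease for frames further in the future. The specific 1/k weights and lambda below are illustrative assumptions; the paper's point is only that later, harder-to-predict frames contribute less.

```python
import numpy as np

def stae_loss(clip, recon, future, pred, lam=1.0):
    """Reconstruction loss plus a weight-decreasing future-prediction loss.

    clip/recon:  (T, H, W) input frames and their reconstruction.
    future/pred: (K, H, W) the next K ground-truth frames and their
                 predictions; later frames receive smaller weights.
    """
    rec = ((recon - clip) ** 2).mean()
    w = 1.0 / np.arange(1, len(future) + 1)   # illustrative decreasing weights
    w /= w.sum()
    prd = sum(wk * ((pk - fk) ** 2).mean() for wk, pk, fk in zip(w, pred, future))
    return rec + lam * prd

rng = np.random.default_rng(0)
clip, future = rng.random((8, 32, 32)), rng.random((4, 32, 32))
print(stae_loss(clip, 0.9 * clip, future, future + 0.1))
```

At test time, a clip scoring high under this loss deviates from learned appearance and motion regularities, which is what flags it as anomalous.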


Proceedings ArticleDOI
01 Apr 2017
TL;DR: FP-DNN (Field Programmable DNN), an end-to-end framework that takes TensorFlow-described DNNs as input, and automatically generates the hardware implementations on FPGA boards with RTL-HLS hybrid templates, is proposed.
Abstract: DNNs (Deep Neural Networks) have demonstrated great success in numerous applications such as image classification, speech recognition, video analysis, etc. However, DNNs are much more computation-intensive and memory-intensive than previous shallow models. Thus, it is challenging to deploy DNNs in both large-scale data centers and real-time embedded systems. Considering performance, flexibility, and energy efficiency, an FPGA-based accelerator for DNNs is a promising solution. Unfortunately, conventional accelerator design flows make it difficult for FPGA developers to keep up with the fast pace of innovations in DNNs. To overcome this problem, we propose FP-DNN (Field Programmable DNN), an end-to-end framework that takes TensorFlow-described DNNs as input, and automatically generates the hardware implementations on FPGA boards with RTL-HLS hybrid templates. FP-DNN performs model inference of DNNs with our high-performance computation engine and carefully-designed communication optimization strategies. We implement CNNs, LSTM-RNNs, and Residual Nets with FP-DNN, and experimental results show the great performance and flexibility provided by our proposed FP-DNN framework.

277 citations


Proceedings Article
01 Jan 2017
TL;DR: The architecture of Peloton is presented, the first self-driving DBMS, which enables new optimizations that are important for modern high-performance DBMSs, but which are not possible today because the complexity of managing these systems has surpassed the abilities of human experts.
Abstract: In the last two decades, both researchers and vendors have built advisory tools to assist database administrators (DBAs) in various aspects of system tuning and physical design. Most of this previous work, however, is incomplete because it still requires humans to make the final decisions about any changes to the database, and it consists of reactionary measures that fix problems after they occur. What is needed for a truly "self-driving" database management system (DBMS) is a new architecture that is designed for autonomous operation. This is different from earlier attempts because all aspects of the system are controlled by an integrated planning component that not only optimizes the system for the current workload, but also predicts future workload trends so that the system can prepare itself accordingly. With this, the DBMS can support all of the previous tuning techniques without requiring a human to determine the right way and proper time to deploy them. It also enables new optimizations that are important for modern high-performance DBMSs, but which are not possible today because the complexity of managing these systems has surpassed the abilities of human experts. This paper presents the architecture of Peloton, the first self-driving DBMS. Peloton's autonomic capabilities are now possible due to algorithmic advancements in deep learning, as well as improvements in hardware and adaptive database architectures.

220 citations


Proceedings Article
24 Jul 2017
TL;DR: This paper focuses on compressing and accelerating deep models with network weights represented by very small numbers of bits, referred to as extremely low bit neural networks, and proposes to solve this problem using extragradient and iterative quantization algorithms that lead to considerably faster convergence compared to conventional optimization methods.
Abstract: Although deep learning models are highly effective for various learning tasks, their high computational costs prohibit the deployment to scenarios where either memory or computational resources are limited. In this paper, we focus on compressing and accelerating deep models with network weights represented by very small numbers of bits, referred to as extremely low bit neural networks. We model this problem as a discretely constrained optimization problem. Borrowing the idea from the Alternating Direction Method of Multipliers (ADMM), we decouple the continuous parameters from the discrete constraints of the network, and cast the original hard problem into several subproblems. We propose to solve these subproblems using extragradient and iterative quantization algorithms that lead to considerably faster convergence compared to conventional optimization methods. Extensive experiments on image recognition and object detection verify that the proposed algorithm is more effective than state-of-the-art approaches when it comes to extremely low bit neural networks.

217 citations
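
The discrete half of the ADMM decoupling is a projection of real-valued weights onto a low-bit set. Below is a sketch for the ternary case {-a, 0, +a}, alternating the optimal codes for a fixed scale with the optimal scale for fixed codes; this loop and its thresholds are the standard form of that projection, not the paper's exact quantizer.

```python
import numpy as np

def project_ternary(w, n_iters=5):
    """Project real weights onto {-a, 0, +a} (minimal squared error).

    Alternates: best codes q for the current scale a (threshold at a/2),
    then the best scale for the current codes (mean |w| over nonzeros).
    This is the kind of discrete step used inside an ADMM-style loop.
    """
    a = np.abs(w).mean() + 1e-12
    for _ in range(n_iters):
        q = np.sign(w) * (np.abs(w) > a / 2)   # codes in {-1, 0, +1}
        mask = q != 0
        if mask.any():
            a = np.abs(w[mask]).mean()         # optimal scale given codes
    return a * q, a

rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal(1000)
w_q, scale = project_ternary(w)
print("scale:", round(scale, 4), "codes used:", np.unique(w_q / scale))
```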


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a deep Level Set network to produce compact and uniform saliency maps and drives the network to learn a Level Set function for salient objects so it can output more accurate boundaries and compact saliency.
Abstract: Deep learning has been applied to saliency detection in recent years. The superior performance has proved that deep networks can model the semantic properties of salient objects. Yet it is difficult for a deep network to discriminate pixels belonging to similar receptive fields around the object boundaries, thus deep networks may output maps with blurred saliency and inaccurate boundaries. To tackle such an issue, in this work, we propose a deep Level Set network to produce compact and uniform saliency maps. Our method drives the network to learn a Level Set function for salient objects so it can output more accurate boundaries and compact saliency. Besides, to propagate saliency information among pixels and recover the full-resolution saliency map, we extend a superpixel-based guided filter to be a layer in the network. The proposed network has a simple structure and is trained end-to-end. During testing, the network can produce saliency maps by efficiently feedforwarding testing images at a speed over 12 FPS on GPUs. Evaluations on benchmark datasets show that the proposed method achieves state-of-the-art performance.

203 citations


Proceedings Article
Chang Zhou, Yuqiong Liu, Xiaofei Liu, Zhongyi Liu, Jun Gao
13 Feb 2017
TL;DR: This paper proposes an asymmetric proximity preserving (APP) graph embedding method via random walk with restart, which captures both asymmetric and high-order similarities between node pairs, and gives theoretical analysis that this method implicitly preserves the Rooted PageRank score for any two vertices.
Abstract: Graph Embedding methods are aimed at mapping each vertex into a low dimensional vector space, which preserves certain structural relationships among the vertices in the original graph. Recently, several works have been proposed to learn embeddings based on sampled paths from the graph, e.g., DeepWalk, Line, Node2Vec. However, their methods only preserve symmetric proximities, which could be insufficient in many applications, even when the underlying graph is undirected. Besides, they lack theoretical analysis of what relationships exactly they preserve in their embedding space. In this paper, we propose an asymmetric proximity preserving (APP) graph embedding method via random walk with restart, which captures both asymmetric and high-order similarities between node pairs. We give theoretical analysis that our method implicitly preserves the Rooted PageRank score for any two vertices. We conduct extensive experiments on tasks of link prediction and node recommendation on open source datasets, as well as online recommendation services in Alibaba Group, in which the training graph has over 290 million vertices and 18 billion edges, showing our method to be highly scalable and effective.

188 citations
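
The core of APP can be sketched in a few lines: sample context vertices via a random walk with restart (whose stopping distribution approximates the Rooted PageRank), and train separate source and target embeddings with negative sampling, which is where the asymmetry comes from. The toy digraph, learning rate, and sample counts below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, restart, lr = 100, 16, 0.15, 0.05
adj = [rng.choice(n, size=5, replace=False).tolist() for _ in range(n)]  # toy digraph

S = 0.1 * rng.standard_normal((n, dim))   # source-role embeddings
T = 0.1 * rng.standard_normal((n, dim))   # target-role embeddings (the asymmetry)

def rwr_sample(u):
    """Walk from u, stopping with probability `restart` after each hop; the
    endpoint is distributed approximately like Rooted PageRank from u."""
    v = u
    while True:
        v = int(rng.choice(adj[v]))
        if rng.random() < restart:
            return v

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    u = int(rng.integers(n))
    v, neg = rwr_sample(u), int(rng.integers(n))     # one positive, one negative
    for tgt, label in ((v, 1.0), (neg, 0.0)):
        g = label - sigmoid(S[u] @ T[tgt])           # logistic gradient signal
        su = S[u].copy()
        S[u] += lr * g * T[tgt]
        T[tgt] += lr * g * su

print("score to an RWR target:", float(S[0] @ T[rwr_sample(0)]))
print("score to a random vertex:", float(S[0] @ T[rng.integers(n)]))
```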


Proceedings ArticleDOI
01 Jul 2017
TL;DR: An open-domain chatbot engine that integrates the joint results of Information Retrieval and Sequence to Sequence based generation models and outperforms both IR and generation based models is proposed.
Abstract: We propose AliMe Chat, an open-domain chatbot engine that integrates the joint results of Information Retrieval (IR) and Sequence to Sequence (Seq2Seq) based generation models. AliMe Chat uses an attentive Seq2Seq based rerank model to optimize the joint results. Extensive experiments show our engine outperforms both IR and generation based models. We launch AliMe Chat for a real-world industrial application and observe better results than another public chatbot.
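
The hybrid IR-plus-generation flow can be sketched as a small decision function; `ir_engine`, `seq2seq`, `rerank_score`, and the threshold are hypothetical stand-ins for the production components.

```python
def alime_style_answer(question, ir_engine, seq2seq, rerank_score, threshold=0.7):
    """Hybrid answering in the spirit of AliMe Chat (illustrative names only).

    1) Retrieve candidate answers with IR.
    2) Score the candidates plus a generated reply with a rerank model.
    3) Return the best candidate if confident enough, else the generated reply.
    """
    candidates = ir_engine(question)        # list of retrieved answers
    generated = seq2seq(question)           # generative candidate / fallback
    scored = [(rerank_score(question, a), a) for a in candidates + [generated]]
    best_score, best = max(scored)
    return best if best_score >= threshold else generated

# Tiny demo with stub components standing in for the real IR/Seq2Seq models.
print(alime_style_answer(
    "When will my package arrive?",
    ir_engine=lambda q: ["Usually within 3 days.", "Please check the tracking page."],
    seq2seq=lambda q: "Let me check that for you.",
    rerank_score=lambda q, a: 0.9 if "days" in a else 0.4))
```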

Posted Content
TL;DR: This paper proposes an attention based user behavior modeling framework called ATRank, mainly used for recommendation tasks, and explores using one unified ATRank model to predict different types of user behaviors at the same time, showing a comparable performance with the highly optimized individual models.
Abstract: A user can be represented by what he/she does along the history. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data itself due to limited human instinct. Recent works usually use RNN-based methods to give an overall embedding of a behavior sequence, which can then be exploited by downstream applications. However, this can only preserve very limited information, or aggregated memories of a person. When a downstream application needs to use the modeled user features, it may lose the integrity of the specific highly correlated behavior of the user, and introduce noise derived from unrelated behaviors. This paper proposes an attention based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Heterogeneous user behaviors are considered in our model: we project all types of behaviors into multiple latent semantic spaces, where influence can be made among the behaviors via self-attention. Downstream applications can then use the user behavior vectors via vanilla attention. Experiments show that ATRank can achieve better performance and a faster training process. We further explore ATRank to use one unified model to predict different types of user behaviors at the same time, showing a comparable performance with the highly optimized individual models.
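
The self-attention step at the heart of ATRank is ordinary scaled dot-product attention applied to a user's behavior embeddings. A minimal NumPy sketch follows, with random projection matrices standing in for learned ones.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(B, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a user's behavior sequence.

    B: (n, d) embeddings of n heterogeneous behaviors already projected
    into a shared latent space; returns contextualized behavior vectors.
    """
    Q, K, V = B @ Wq, B @ Wk, B @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # behavior-to-behavior influence
    return A @ V

rng = np.random.default_rng(0)
d = 32
B = rng.standard_normal((10, d))                 # e.g. clicks, purchases, queries
out = self_attention(B, *(0.1 * rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)  # (10, 32)
```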

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a novel Cascaded Residual Autoencoder (CRA) to impute missing modalities by stacking residual autoencoders, which grows iteratively to model the residual between the current prediction and original data.
Abstract: Affordable sensors lead to an increasing interest in acquiring and modeling data with multiple modalities. Learning from multiple modalities has shown to significantly improve performance in object recognition. However, in practice it is common that the sensing equipment experiences unforeseeable malfunction or configuration issues, leading to corrupted data with missing modalities. Most existing multi-modal learning algorithms could not handle missing modalities, and would discard either all modalities with missing values or all corrupted data. To leverage the valuable information in the corrupted data, we propose to impute the missing data by leveraging the relatedness among different modalities. Specifically, we propose a novel Cascaded Residual Autoencoder (CRA) to impute missing modalities. By stacking residual autoencoders, CRA grows iteratively to model the residual between the current prediction and original data. Extensive experiments demonstrate the superior performance of CRA on both the data imputation and the object recognition task on imputed data.
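
A simplified linear sketch of the cascading idea: each stage is a small autoencoder fit to the residual that the cascade so far has failed to explain, so stacking stages refines the imputation. The one-hidden-layer autoencoder and the zeroed columns standing in for a missing modality are assumptions for illustration, not the paper's deep architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyAE:
    """One small autoencoder stage (stand-in for a deep autoencoder)."""
    def __init__(self, d, h):
        self.We = 0.1 * rng.standard_normal((d, h))
        self.Wd = 0.1 * rng.standard_normal((h, d))
    def fit(self, X, Y, lr=0.01, epochs=300):
        for _ in range(epochs):
            H = np.tanh(X @ self.We)
            G = (H @ self.Wd - Y) / len(X)                     # d(loss)/d(output)
            gWd = H.T @ G
            gWe = X.T @ ((G @ self.Wd.T) * (1.0 - H ** 2))     # backprop through tanh
            self.Wd -= lr * gWd
            self.We -= lr * gWe
    def predict(self, X):
        return np.tanh(X @ self.We) @ self.Wd

def fit_cra(X_corrupt, X_clean, n_stages=3, d_hidden=16):
    """Stack autoencoders; each stage models the residual left by the cascade."""
    stages, pred = [], np.zeros_like(X_clean)
    for _ in range(n_stages):
        ae = TinyAE(X_corrupt.shape[1], d_hidden)
        ae.fit(X_corrupt, X_clean - pred)      # fit what is still unexplained
        pred = pred + ae.predict(X_corrupt)
        stages.append(ae)
    return stages, pred

X_clean = rng.standard_normal((200, 8))
X_corrupt = X_clean.copy()
X_corrupt[:, 4:] = 0.0                         # a "missing modality"
_, imputed = fit_cra(X_corrupt, X_clean)
print("imputation error:", float(((imputed - X_clean) ** 2).mean()))
```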

Journal ArticleDOI
TL;DR: A new quadruplet deep network is proposed to examine the potential connections among the training instances, aiming to achieve a more powerful representation and is evaluated by model-free tracking-by-detection of objects from a single initial exemplar in several visual object tracking benchmarks.
Abstract: In the same vein of discriminative one-shot learning, Siamese networks allow recognizing an object from a single exemplar with the same class label. However, they do not take advantage of the underlying structure of the data and the relationship among the multitude of samples as they only rely on pairs of instances for training. In this paper, we propose a new quadruplet deep network to examine the potential connections among the training instances, aiming to achieve a more powerful representation. We design four shared networks that receive multi-tuple of instances as inputs and are connected by a novel loss function consisting of pair-loss and triplet-loss. According to the similarity metric, we select the most similar and the most dissimilar instances as the positive and negative inputs of triplet loss from each multi-tuple. We show that this scheme improves the training performance. Furthermore, we introduce a new weight layer to automatically select suitable combination weights, which will avoid the conflict between triplet and pair loss leading to worse performance. We evaluate our quadruplet framework by model-free tracking-by-detection of objects from a single initial exemplar in several Visual Object Tracking benchmarks. Our extensive experimental analysis demonstrates that our tracker achieves superior performance with a real-time processing speed of 78 frames-per-second (fps).
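
The combined objective can be sketched for one multi-tuple: following the abstract, the most similar positive and the most dissimilar negative are selected as the triplet's inputs, while a pair term pulls the matching pair together. A fixed `w_pair` below replaces the paper's learned weight layer.

```python
import numpy as np

def quadruplet_style_loss(anchor, cands, labels, margin=0.5, w_pair=0.5):
    """Combined pair- and triplet-loss over one multi-tuple (illustrative).

    anchor: (d,) exemplar embedding; cands: (n, d) candidate embeddings;
    labels: (n,) 1 if the candidate shows the same object as the anchor.
    """
    d = np.linalg.norm(cands - anchor, axis=1)
    pos = cands[labels == 1][np.argmin(d[labels == 1])]   # most similar positive
    neg = cands[labels == 0][np.argmax(d[labels == 0])]   # most dissimilar negative
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    triplet = max(0.0, margin + d_pos - d_neg)            # rank pos above neg
    pair = d_pos ** 2                                     # pull the pair together
    return w_pair * pair + (1 - w_pair) * triplet

rng = np.random.default_rng(0)
cands = rng.standard_normal((6, 8))
labels = np.array([1, 1, 0, 0, 0, 1])
print(quadruplet_style_loss(rng.standard_normal(8), cands, labels))
```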

Journal ArticleDOI
TL;DR: To better exploit the nonlinearity of face samples from different image sets, a deep SFDL (D-SFDL) method is proposed by jointly learning hierarchical non-linear transformations and class-specific dictionaries to further improve the recognition performance.
Abstract: In this paper, we propose a simultaneous feature and dictionary learning (SFDL) method for image set-based face recognition, where each training and testing example contains a set of face images, which were captured from different variations of pose, illumination, expression, resolution, and motion. While a variety of feature learning and dictionary learning methods have been proposed in recent years and some of them have been successfully applied to image set-based face recognition, most of them learn features and dictionaries for facial image sets individually, which may not be powerful enough because some discriminative information for dictionary learning may be compromised in the feature learning stage if they are applied sequentially, and vice versa. To address this, we propose a SFDL method to learn discriminative features and dictionaries simultaneously from raw face pixels so that discriminative information from facial image sets can be jointly exploited by a one-stage learning procedure. To better exploit the nonlinearity of face samples from different image sets, we propose a deep SFDL (D-SFDL) method by jointly learning hierarchical non-linear transformations and class-specific dictionaries to further improve the recognition performance. Extensive experimental results on five widely used face data sets clearly show that our SFDL and D-SFDL achieve very competitive or even better performance compared with the state of the art.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper introduces a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning, outperforming vanilla recurrent language models and remaining competitive with the state-of-the-art methods.
Abstract: Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models which predict next word based on one previous word and hidden state, our language CNN is fed with all the previous words and can model the long-range dependencies in history words, which are critical for image captioning. The effectiveness of our approach is validated on two datasets: Flickr30K and MS COCO. Our extensive experimental results show that our method outperforms the vanilla recurrent neural network based language models and is competitive with the state-of-the-art methods.

Journal ArticleDOI
TL;DR: A new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), is proposed for skeleton-based action recognition, which is able to selectively focus on the informative joints in each frame of each skeleton sequence by using a global context memory cell.
Abstract: Human action recognition in 3D skeleton sequences has attracted a lot of research attention. Recently, Long Short-Term Memory (LSTM) networks have shown promising performance in this task due to their strengths in modeling the dependencies and dynamics in sequential data. As not all skeletal joints are informative for action recognition, and the irrelevant joints often bring noise which can degrade the performance, we need to pay more attention to the informative ones. However, the original LSTM network does not have explicit attention ability. In this paper, we propose a new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton based action recognition. This network is capable of selectively focusing on the informative joints in each frame of each skeleton sequence by using a global context memory cell. To further improve the attention capability of our network, we also introduce a recurrent attention mechanism, with which the attention performance of the network can be enhanced progressively. Moreover, we propose a stepwise training scheme in order to train our network effectively. Our approach achieves state-of-the-art performance on five challenging benchmark datasets for skeleton based action recognition.

Proceedings ArticleDOI
06 Nov 2017
TL;DR: In this paper, the AliMe Assist system is demonstrated, the underlying techniques are presented, and the experience in dealing with real-world QA in the E-commerce field is shared.
Abstract: We present AliMe Assist, an intelligent assistant designed for creating an innovative online shopping experience in E-commerce. Based on question answering (QA), AliMe Assist offers assistance service, customer service, and chatting service. It is able to take voice and text input, incorporate context to QA, and support multi-round interaction. Currently, it serves millions of customer questions per day and is able to address 85% of them. In this paper, we demonstrate the system, present the underlying techniques, and share our experience in dealing with real-world QA in the E-commerce field.

Journal Article
TL;DR: Both foreground and background cues are considered in this work: the similarity of image elements to foreground or background cues is ranked via graph-based manifold ranking, and the saliency of image elements is defined based on their relevance to the given seeds or queries.
Abstract: Most existing bottom-up algorithms measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of only considering the contrast between salient objects and their surrounding regions, we consider both foreground and background cues in this work. We rank the similarity of image elements with foreground or background cues via graph-based manifold ranking. The saliency of image elements is defined based on their relevances to the given seeds or queries. We represent an image as a multi-scale graph with fine superpixels and coarse regions as nodes. These nodes are ranked based on the similarity to background and foreground queries using affinity matrices. Saliency detection is carried out in a cascade scheme to extract background regions and foreground salient objects efficiently. Experimental results demonstrate the proposed method performs well against the state-of-the-art methods in terms of accuracy and speed. We also propose a new benchmark dataset containing 5,168 images for large-scale performance evaluation of saliency detection methods.
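
The ranking step has a well-known closed form, f* = (I - alpha * S)^(-1) y, with S the symmetrically normalized affinity matrix and y an indicator vector of the seed (query) nodes. Below is a small NumPy sketch on a toy graph; the Gaussian affinity and alpha = 0.99 are conventional choices for this family of methods, not values taken from this paper's experiments.

```python
import numpy as np

def manifold_rank(W, seed_idx, alpha=0.99):
    """Rank all graph nodes by relevance to the seed (query) nodes.

    W: (n, n) symmetric affinity matrix between superpixels/regions.
    Returns f* = (I - alpha * S)^(-1) y, the closed-form solution of
    graph-based manifold ranking, with S = D^(-1/2) W D^(-1/2).
    """
    n = len(W)
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d) + 1e-12)   # symmetric normalization
    y = np.zeros(n)
    y[seed_idx] = 1.0                         # indicator of query nodes
    return np.linalg.solve(np.eye(n) - alpha * S, y)

rng = np.random.default_rng(0)
P = rng.random((30, 2))                       # toy node positions
W = np.exp(-((P[:, None] - P[None]) ** 2).sum(-1) / 0.05)
np.fill_diagonal(W, 0.0)
scores = manifold_rank(W, seed_idx=[0, 1])    # e.g. boundary nodes as background seeds
print(scores[:5].round(3))
```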

Proceedings ArticleDOI
Han Zhu, Junqi Jin, Chang Tan, Fei Pan, Yifan Zeng, Han Li, Kun Gai
13 Aug 2017
TL;DR: A bid optimizing strategy called optimized cost per click (OCPC) is proposed which automatically adjusts the bid to achieve finer matching of bid and traffic quality at page view (PV) request granularity and yields substantially better results than the previous fixed-bid manner.
Abstract: Taobao, as the largest online retail platform in the world, provides billions of online display advertising impressions for millions of advertisers every day. For commercial purposes, the advertisers bid for specific spots and target crowds to compete for business traffic. The platform chooses the most suitable ads to display in tens of milliseconds. Common pricing methods include cost per mille (CPM) and cost per click (CPC). Traditional advertising systems target certain traits of users and ad placements with fixed bids, essentially regarded as coarse-grained matching of bid and traffic quality. However, the fixed bids set by the advertisers competing for different quality requests cannot fully optimize the advertisers' key requirements. Moreover, the platform has to be responsible for the business revenue and user experience. Thus, we propose a bid optimizing strategy called optimized cost per click (OCPC) which automatically adjusts the bid to achieve finer matching of bid and traffic quality at page view (PV) request granularity. Our approach optimizes advertisers' demands, platform business revenue and user experience and as a whole improves traffic allocation efficiency. We have validated our approach in Taobao display advertising system in production. The online A/B test shows our algorithm yields substantially better results than the previous fixed-bid manner.
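
The bid-adjustment idea can be sketched as a bounded scaling of the advertiser's bid by the ratio of a request's predicted value to the advertiser's average; the function name, the ratio form, and the bounds below are illustrative assumptions rather than the production formula.

```python
def ocpc_bid(base_bid, pvalue, pvalue_avg, lower=0.8, upper=1.2):
    """Illustrative OCPC-style bid adjustment (names/bounds are assumptions).

    Scale the advertiser's bid up for page-view requests whose predicted
    value (e.g. conversion rate) beats the advertiser's average, and down
    otherwise, within bounds that protect the advertiser's intent.
    """
    ratio = pvalue / max(pvalue_avg, 1e-9)
    return base_bid * min(max(ratio, lower), upper)

# A high-quality request gets a raised bid; a poor one gets a lowered bid.
print(ocpc_bid(base_bid=1.0, pvalue=0.05, pvalue_avg=0.04))  # 1.2 (capped)
print(ocpc_bid(base_bid=1.0, pvalue=0.02, pvalue_avg=0.04))  # 0.8 (floored)
```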

Proceedings ArticleDOI
01 Aug 2017
TL;DR: A new domain adaptation technique for neural machine translation called cost weighting is proposed, which is appropriate for adaptation scenarios in which a small in-domain data set and a large general-domain data set are available.
Abstract: In this paper, we propose a new domain adaptation technique for neural machine translation called cost weighting, which is appropriate for adaptation scenarios in which a small in-domain data set and a large general-domain data set are available. Cost weighting incorporates a domain classifier into the neural machine translation training algorithm, using features derived from the encoder representation in order to distinguish in-domain from out-of-domain data. Classifier probabilities are used to weight sentences according to their domain similarity when updating the parameters of the neural translation model. We compare cost weighting to two traditional domain adaptation techniques developed for statistical machine translation: data selection and sub-corpus weighting. Experiments on two large-data tasks show that both the traditional techniques and our novel proposal lead to significant gains, with cost weighting outperforming the traditional methods.
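
The training change reduces to a weighted likelihood: each sentence pair's loss is scaled by the domain classifier's in-domain probability. A minimal sketch with toy numbers follows; the normalization by the weight sum is an assumption for illustration.

```python
import numpy as np

def cost_weighted_nll(logprobs, domain_probs):
    """Weight each sentence's NLL by its in-domain probability (illustrative).

    logprobs:     (n,) total log-likelihood of each target sentence.
    domain_probs: (n,) classifier probability that the sentence pair is
                  in-domain; the paper derives this from encoder features.
    """
    return -(domain_probs * logprobs).sum() / domain_probs.sum()

lp = np.array([-10.0, -12.0, -8.0])       # toy sentence log-likelihoods
w = np.array([0.9, 0.1, 0.7])             # in-domain-ness of each pair
print(cost_weighted_nll(lp, w))           # the out-of-domain pair barely counts
```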

Journal ArticleDOI
TL;DR: The simulations and test bed experiments show that Amoeba, by harnessing DNA’s malleability, accommodates 15% more user requests with deadlines, while achieving 60% higher WAN utilization than prior solutions.
Abstract: Inter-data center wide area networks (inter-DC WANs) carry a significant amount of data transfers that require to be completed within certain time periods, or deadlines. However, very little work has been done to guarantee such deadlines. The crux is that the current inter-DC WAN lacks an interface for users to specify their transfer deadlines and a mechanism for provider to ensure the completion while maintaining high WAN utilization. In this paper, we address the problem by introducing a deadline-based network abstraction (DNA) for inter-DC WANs. DNA allows users to explicitly specify the amount of data to be delivered and the deadline by which it has to be completed. The malleability of DNA provides flexibility in resource allocation. Based on this, we develop a system called Amoeba that implements DNA. Our simulations and test bed experiments show that Amoeba, by harnessing DNA's malleability, accommodates 15% more user requests with deadlines, while achieving 60% higher WAN utilization than prior solutions.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: This is the first attempt to use multi-task learning and deep learning techniques to predict short-term rainfall amount based on multi-site features and significantly outperforms a broad set of baseline models including the European Centre for Medium-range Weather Forecasts system.
Abstract: Precipitation prediction, such as short-term rainfall prediction, is a very important problem in the field of meteorological service. In practice, most recent studies focus on leveraging radar data or satellite images to make predictions. However, there is another scenario where a set of weather features are collected by various sensors at multiple observation sites. The observations of a site are sometimes incomplete but provide important clues for weather prediction at nearby sites, which are not fully exploited in existing work yet. To solve this problem, we propose a multi-task convolutional neural network model to automatically extract features from the time series measured at observation sites and leverage the correlation between the multiple sites for weather prediction via multi-tasking. To the best of our knowledge, this is the first attempt to use multi-task learning and deep learning techniques to predict short-term rainfall amount based on multi-site features. Specifically, we formulate the learning task as an end-to-end multi-site neural network model which allows the learned knowledge from one site to be leveraged at other correlated sites, and models the correlations between different sites. Extensive experiments show that the learned site correlations are insightful and the proposed model significantly outperforms a broad set of baseline models including the European Centre for Medium-range Weather Forecasts system (ECMWF).

Journal ArticleDOI
TL;DR: A novel image dataset construction framework that generalizes well to unseen target domains is presented; image selection is formulated as a multi-instance learning problem solved by the cutting-plane and concave-convex procedure algorithm, and an image dataset with 20 categories is built for evaluation.
Abstract: Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time-consuming and labor intensive. To reduce the cost of manual labeling, there has been increased research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to have a weak domain adaptation ability, which is known as the "dataset bias problem." To address this issue, we present a novel image dataset construction framework that can be generalized well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpus to obtain a rich semantic description, from which the visually nonsalient and less relevant expansions are filtered out. By treating each selected expansion as a "bag" and the retrieved images as "instances," image selection can be formulated as a multi-instance learning problem with constrained positive bags. We propose to solve the resulting problems using the cutting-plane and concave-convex procedure algorithms. By using this approach, images from different distributions can be kept while noisy images are filtered out. To verify the effectiveness of our proposed approach, we build an image dataset with 20 categories. Extensive experiments on image classification, cross-dataset generalization, diversity comparison, and object detection demonstrate the domain robustness of our dataset.

Posted Content
TL;DR: The authors proposed a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which operates on the output of the previous stage, producing increasingly refined image descriptions.
Abstract: The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which operates on the output of the previous stage, producing increasingly refined image descriptions. Our proposed learning approach addresses the difficulty of vanishing gradients during training by providing a learning objective function that enforces intermediate supervisions. Particularly, we optimize our model with a reinforcement learning approach which utilizes the output of each intermediate decoder's test-time inference algorithm as well as the output of its preceding decoder to normalize the rewards, which simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem. We extensively evaluate the proposed approach on MSCOCO and show that our approach can achieve the state-of-the-art performance.

Proceedings ArticleDOI
13 Aug 2017
TL;DR: A generic hybrid inference framework, the Convolutional Recurrent Neural Network (conv-RNN), for semantic modeling of text is introduced, seamlessly integrating the merits of extracting different aspects of linguistic information from both convolutional and recurrent neural network structures and thus strengthening the semantic understanding power of the new framework.
Abstract: In this paper, we introduce a generic hybrid inference framework, the Convolutional Recurrent Neural Network (conv-RNN), for semantic modeling of text, seamlessly integrating the merits of extracting different aspects of linguistic information from both convolutional and recurrent neural network structures and thus strengthening the semantic understanding power of the new framework. Besides, based on conv-RNN, we also propose a novel sentence classification model and an attention based answer selection model, strengthening sentence matching and classification respectively. We validate the proposed models on a very wide variety of data sets, including two challenging tasks of answer selection (AS) and five benchmark datasets for sentence classification (SC). To the best of our knowledge, this is by far the most complete set of comparison results in both AS and SC. We empirically show the superior performance of conv-RNN on these challenging tasks and benchmark datasets and also summarize insights on the performance of other state-of-the-art methodologies.

Proceedings ArticleDOI
13 Aug 2017
TL;DR: In this article, a cascade ranking model is designed and deployed in an operational e-commerce search application to address multiple factors of effectiveness, efficiency and user experience in the real-world application.
Abstract: In the 'Big Data' era, many real-world applications like search involve the ranking problem for a large number of items. It is important to obtain effective ranking results and at the same time obtain the results efficiently in a timely manner for providing good user experience and saving computational costs. Valuable prior research has been conducted for learning to efficiently rank, like the cascade ranking (learning) model, which uses a sequence of ranking functions to progressively filter some items and rank the remaining items. However, most existing research of learning to efficiently rank in search is studied in relatively small computing environments with simulated user queries. This paper presents novel research and a thorough study of designing and deploying a Cascade model in a Large-scale Operational E-commerce Search application (CLOES), which deals with hundreds of millions of user queries per day with hundreds of servers. The challenge of the real-world application provides new insights for research: 1). Real-world search applications often involve multiple factors of preferences or constraints with respect to user experience and computational costs such as search accuracy, search latency, size of search results and total CPU cost, while most existing search solutions only address one or two factors; 2). Effectiveness of e-commerce search involves multiple types of user behaviors such as click and purchase, while most existing cascade ranking in search only models the click behavior. Based on these observations, a novel cascade ranking model is designed and deployed in an operational e-commerce search application. An extensive set of experiments demonstrate the advantage of the proposed work to address multiple factors of effectiveness, efficiency and user experience in the real-world application.
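
The cascade itself is simple to sketch: a sequence of (scorer, cutoff) stages in which cheap scorers prune most items before expensive scorers run. The toy scorers below are stand-ins for the learned ranking functions.

```python
def cascade_rank(items, stages):
    """Run a cascade: each (scorer, keep_k) stage rescores survivors and prunes.

    Cheap scorers with large keep_k go first; expensive ones see few items,
    which is how the cascade trades effectiveness against total cost.
    """
    for scorer, keep_k in stages:
        items = sorted(items, key=scorer, reverse=True)[:keep_k]
    return items

items = list(range(1000))                # item ids standing in for products
stages = [
    (lambda i: -abs(i - 500), 100),      # cheap feature, rough filter
    (lambda i: -abs(i - 510), 10),       # mid-cost model on 100 survivors
    (lambda i: -abs(i - 512), 3),        # expensive model on 10 survivors
]
print(cascade_rank(items, stages))       # the surviving items closest to 512
```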
