
Showing papers by "Aixin Sun published in 2021"


Journal ArticleDOI
TL;DR: A video span localizing network (VSLNet) is proposed to solve the NLVL problem from a span-based question answering (QA) perspective by treating the input video as a text passage.
Abstract: Natural Language Video Localization (NLVL) aims to locate a target moment from an untrimmed video that semantically corresponds to a text query. Existing approaches mainly solve the NLVL problem from the perspective of computer vision by formulating it as ranking, anchor, or regression tasks. These methods suffer from large performance degradation when localizing on long videos. In this work, we address NLVL from a new perspective, i.e., span-based question answering (QA), by treating the input video as a text passage. We propose a video span localizing network (VSLNet), on top of the standard span-based QA framework (named VSLBase), to address NLVL. VSLNet tackles the differences between NLVL and span-based QA through a simple yet effective query-guided highlighting (QGH) strategy. QGH guides VSLNet to search for the matching video span within a highlighted region. To address the performance degradation on long videos, we further extend VSLNet to VSLNet-L by applying a multi-scale split-and-concatenation strategy to locate the target moment accurately. Extensive experiments show that the proposed methods outperform the state-of-the-art methods; VSLNet-L addresses the issue of performance degradation on long videos. Our study suggests that the span-based QA framework is an effective strategy to solve the NLVL problem.
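The span-based formulation can be sketched as follows: treat each video clip as a token of a "passage" and decode the answer span exactly as a QA model would, picking the start/end pair with the highest joint score. This is a minimal illustration of the decoding rule, not the authors' implementation; the logits are toy values.

```python
import numpy as np

def locate_span(start_logits, end_logits):
    """Return the (start, end) pair with the highest joint score,
    subject to start <= end -- standard span-QA decoding."""
    best, best_score = (0, 0), float("-inf")
    for s in range(len(start_logits)):
        for e in range(s, len(end_logits)):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy logits over six video "tokens" (clips): the decoded span is (1, 3).
start = np.array([0.1, 2.0, 0.3, 0.1, 0.0, 0.2])
end = np.array([0.0, 0.1, 0.4, 1.8, 0.2, 0.1])
print(locate_span(start, end))  # (1, 3)
```

In VSLNet the QGH strategy would additionally restrict this search to a highlighted region rather than the whole clip sequence.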

42 citations


Proceedings ArticleDOI
11 Jul 2021
TL;DR: In this article, the authors propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for video corpus moment retrieval, which is based on two contrastive learning objectives to refine video and text representations separately but with better alignment for VCMR.
Abstract: Given a collection of untrimmed and unsegmented videos, video corpus moment retrieval (VCMR) is to retrieve a temporal moment (i.e., a fraction of a video) that semantically corresponds to a given text query. As video and text come from two distinct feature spaces, there are two general approaches to address VCMR: (i) to separately encode each modality's representations, then align the two modality representations for query processing, and (ii) to adopt fine-grained cross-modal interaction to learn multi-modal representations for query processing. While the second approach often leads to better retrieval accuracy, the first is far more efficient. In this paper, we propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for VCMR. We adopt the first approach and introduce two contrastive learning objectives that refine the video encoder and text encoder to learn video and text representations separately but with better alignment for VCMR. The video contrastive learning (VideoCL) objective maximizes mutual information between the query and the candidate video at the video level. The frame contrastive learning (FrameCL) objective highlights, at the frame level, the moment region within a video that corresponds to the query. Experimental results show that, although ReLoCLNet encodes text and video separately for efficiency, its retrieval accuracy is comparable with baselines adopting cross-modal interaction learning.
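A contrastive objective of the VideoCL kind can be sketched with a standard InfoNCE-style loss: the query embedding should score its own video above the other candidates in the batch. This is a generic sketch under assumed cosine-similarity scoring, not the paper's exact formulation; the temperature `tau` is a conventional default.

```python
import numpy as np

def video_cl_loss(query, videos, tau=0.07):
    """InfoNCE-style loss: the query should score its matching video
    (assumed to sit at index 0) above the other candidates."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = np.array([cos(query, v) for v in videos]) / tau
    sims -= sims.max()                       # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return float(-np.log(probs[0]))          # cross-entropy on the positive
```

The loss falls as the query and its matching video embed closer together, which is what drives the separately trained encoders toward a shared, aligned space.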

35 citations


Proceedings ArticleDOI
17 Oct 2021
TL;DR: In this article, the authors propose a pre-training strategy to learn item representations by considering both item side information and their relationships, e.g., co-purchase, and construct a homogeneous item graph, which provides a unified view of item relations and their associated side information in multimodality.
Abstract: Side information of items, e.g., images and text description, has shown to be effective in contributing to accurate recommendations. Inspired by the recent success of pre-training models on natural language and images, we propose a pre-training strategy to learn item representations by considering both item side information and their relationships. We relate items by common user activities, e.g., co-purchase, and construct a homogeneous item graph. This graph provides a unified view of item relations and their associated side information in multimodality. We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item. The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction. Experimental results on real datasets demonstrate that the proposed PMGT model effectively exploits the multimodal side information to achieve better accuracies in downstream tasks including item recommendation and click-through rate prediction. In addition, we also report a case study of testing PMGT in an online setting with 600 thousand users.
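The masked-node-feature-reconstruction objective can be sketched in a few lines: hide the features of a random subset of nodes and score a reconstructor by its error on exactly those nodes. This is a simplified stand-in for PMGT's transformer-based objective; the function names and the mean-squared-error choice are illustrative assumptions.

```python
import numpy as np

def masked_node_loss(features, reconstruct, mask_ratio=0.3, seed=0):
    """Zero out a random subset of node feature rows and score how well
    `reconstruct` recovers them (MSE on the masked nodes only)."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    masked = rng.choice(n, size=max(1, int(n * mask_ratio)), replace=False)
    corrupted = features.copy()
    corrupted[masked] = 0.0                  # hide the selected nodes
    recon = reconstruct(corrupted)
    return float(((recon[masked] - features[masked]) ** 2).mean())

feats = np.arange(12, dtype=float).reshape(4, 3)  # 4 nodes, 3-dim features
oracle = lambda x: feats        # a perfect reconstructor -> zero loss
print(masked_node_loss(feats, oracle))  # 0.0
```

A model that simply echoes the corrupted input incurs a positive loss, so minimizing this objective forces it to infer the hidden features from graph context.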

27 citations


Journal ArticleDOI
TL;DR: This paper proposes AUC-MF to address the POI recommendation problem by maximizing Area Under the ROC curve (AUC), and defines a new lambda for AUC to utilize the LambdaMF model, which combines the lambda-based method and matrix factorization model in collaborative filtering.
Abstract: The task of point of interest (POI) recommendation aims to recommend unvisited places to users based on their check-in history. A major challenge in POI recommendation is data sparsity, because a user typically visits only a very small number of POIs among all available POIs. In this paper, we propose AUC-MF to address the POI recommendation problem by maximizing Area Under the ROC curve (AUC). AUC has been widely used for measuring classification performance with imbalanced data distributions. To optimize AUC, we transform the recommendation task into a classification problem, where the visited locations are positive examples and the unvisited ones are negative examples. We define a new lambda for AUC to utilize the LambdaMF model, which combines the lambda-based method and the matrix factorization model in collaborative filtering. Many studies have shown that geographic information plays an important role in POI recommendation. In this study, we focus on two levels of geographic information: local similarity and global similarity. We further show that AUC-MF can be easily extended to incorporate geographical contextual information for POI recommendation.
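The quantity being maximized has a simple pairwise reading: AUC is the fraction of (visited, unvisited) pairs that the model ranks correctly. A minimal sketch of that definition, with toy scores (not the AUC-MF optimization itself, which works on a smoothed lambda surrogate):

```python
import numpy as np

def auc(scores, labels):
    """Exact AUC: the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half-correct."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

# Visited POIs (label 1) should outscore unvisited ones (label 0).
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
```

Because the indicator `p > n` is not differentiable, lambda-style methods such as LambdaMF replace it with a smooth surrogate and push gradients through the matrix factorization scores.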

22 citations


Journal ArticleDOI
TL;DR: BdryBot, a recurrent neural network encoder-decoder framework with a pointer network to detect entity boundaries from a given sentence, achieves state-of-the-art performance against five baselines and can be further enhanced when incorporating contextualized language embeddings into token representations.
Abstract: In this paper, we focus on named entity boundary detection , which is to detect the start and end boundaries of an entity mention in text, without predicting its type. The detected entities are input to entity linking or fine-grained typing systems for semantic enrichment. We propose BdryBot , a recurrent neural network encoder-decoder framework with a pointer network to detect entity boundaries from a given sentence. The encoder considers both character-level representations and word-level embeddings to represent the input words. In this way, BdryBot does not require any hand-crafted features. Because of the pointer network, BdryBot overcomes the problem of variable size output vocabulary and the issue of sparse boundary tags. We conduct two sets of experiments, in-domain detection and cross-domain detection, on six datasets. Our results show that BdryBot achieves state-of-the-art performance against five baselines. In addition, our proposed approach can be further enhanced when incorporating contextualized language embeddings into token representations.
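The pointer mechanism that lets BdryBot sidestep a variable-size output vocabulary can be sketched in one step: score every input position against the current decoder state and "point" at the best one, so the output space is just the input positions. This is a generic pointer-network sketch with toy vectors, not the paper's architecture.

```python
import numpy as np

def point(decoder_state, encoder_states):
    """One pointer-network step: score each input position against the
    decoder state and return the position with the highest score, so the
    output vocabulary is just the input positions (no fixed tag set)."""
    scores = encoder_states @ decoder_state   # dot-product attention scores
    return int(np.argmax(scores))

# A decoder state most similar to the encoding of token 2 points at 2.
enc = np.eye(4)   # stand-in encoder states, one row per input token
print(point(np.array([0.0, 0.2, 1.0, 0.1]), enc))  # 2
```

Decoding a boundary pair then amounts to two such pointer steps, one for the entity start and one for its end.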

19 citations


Proceedings ArticleDOI
26 Oct 2021
TL;DR: This article proposed a generative inverse reinforcement learning approach that avoids the need to define an elaborate reward function by first generating policies based on observed users' preferences and then evaluating the learned policy by a measurement based on a discriminative actor-critic network.
Abstract: Deep reinforcement learning enables an agent to capture users' interests through dynamic interactions with the environment. It uses a reward function to learn user interests and to control the learning process, attracting great interest in recommendation research. However, most reward functions are manually designed; they are either too unrealistic or too imprecise to reflect the variety, dimensionality, and non-linearity of the recommendation problem. This impedes the agent from learning an optimal policy in highly dynamic online recommendation scenarios. To address the above issue, we propose a generative inverse reinforcement learning approach that avoids the need to define an elaborate reward function. In particular, we model the recommendation problem as an automatic policy learning problem. We first generate policies based on observed users' preferences and then evaluate the learned policy by a measurement based on a discriminative actor-critic network. We conduct experiments on an online platform, VirtualTB, and demonstrate the feasibility and effectiveness of our proposed approach via comparisons with several state-of-the-art methods.
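The core trick of replacing a hand-designed reward with a learned discriminator can be sketched in GAIL style: the policy is rewarded in proportion to how convincingly its (state, action) pairs pass for real user behaviour. This is a generic adversarial-imitation sketch, not the paper's discriminative actor-critic formulation.

```python
import math

def imitation_reward(d_prob, eps=1e-8):
    """GAIL-style reward derived from a discriminator's probability that
    a (state, action) pair came from real user behaviour; the policy is
    rewarded for producing recommendations the discriminator cannot
    distinguish from observed preferences."""
    return -math.log(1.0 - d_prob + eps)

# The more 'expert-like' the action looks, the larger the reward.
print(imitation_reward(0.9) > imitation_reward(0.1))  # True
```

Since the discriminator is retrained as the policy improves, the reward signal adapts to the environment rather than being fixed up front, which is exactly what the manual designs above fail to do.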

5 citations



Journal ArticleDOI
TL;DR: A detailed analysis of the stability of concept embeddings in the medical domain, particularly its relation with concept frequency, reveals the surprisingly high stability of low-frequency concepts: low-frequency (<100) concepts have the same high stability as high-frequency (>1,000) concepts.
Abstract: Frequency is one of the major factors for training quality word embeddings. Several recent works have discussed the stability of word embeddings in the general domain and suggested factors influencing the stability. In this work, we conduct a detailed analysis of the stability of concept embeddings in the medical domain, particularly its relation with concept frequency. The analysis reveals the surprisingly high stability of low-frequency concepts: low-frequency (<100) concepts are as stable as high-frequency (>1,000) concepts. To develop a deeper understanding of this finding, we propose a new factor, the noisiness of context words, which influences the stability of medical concept embeddings regardless of frequency. We evaluate the proposed factor by showing its linear correlation with the stability of medical concept embeddings. The correlations are clear and consistent across various groups of medical concepts. Based on the linear relations, we make suggestions on ways to adjust the noisiness of context words to improve stability. Finally, we demonstrate that the proposed factor extends to word embedding stability in the general domain.
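A common way to operationalize "stability" in this literature is the overlap of a concept's nearest-neighbour lists across independent training runs; a sketch of that measure follows. The Jaccard-overlap choice and the toy medical terms are illustrative assumptions, not necessarily the exact metric used in this study.

```python
def stability(neighbors_run1, neighbors_run2):
    """Embedding stability of one concept across two training runs,
    measured as the Jaccard overlap of its top-k nearest-neighbour
    lists; 1.0 means identical neighbourhoods, 0.0 means disjoint."""
    a, b = set(neighbors_run1), set(neighbors_run2)
    return len(a & b) / len(a | b)

# Two of four distinct neighbours agree across runs -> stability 0.5.
print(stability(["fever", "cough", "rash"], ["fever", "cough", "nausea"]))
```

Correlating such per-concept scores against frequency bands, or against a context-word noisiness score, is what yields the linear relations described above.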

2 citations


Journal ArticleDOI
TL;DR: The comparative results showed that papers would obtain a sharp rise in citation counts shortly after they were cited by citation promoters, and papers that received citation promoters at an early age outperformed other papers in long-term citation counts.
Abstract: Researchers have investigated numerous factors influencing the citation counts of cited papers. One factor investigated has been the number of gained citations, as this could increase the visibility of cited papers and subsequently induce further citations. In this paper, aiming to identify a particular kind of citation that could trigger rapid growth in the citation counts of cited papers, we proposed the concept of a "citation promoter". We defined citation promoters based on the annual citation rates of the cited papers and the co-citation counts received by the pair of cited and citing papers. The comparative results showed that papers would obtain a sharp rise in citation counts shortly after they were cited by citation promoters. Papers that received citation promoters at an early age outperformed other papers in long-term citation counts. In addition, we developed a classification model for predicting whether a citing paper would be a citation promoter for its cited paper. Since this was a class-imbalanced problem (4 percent positive instances), and our dataset lacked content and author features, our preliminary models achieved moderate performance with an F1 score slightly higher than 0.5, while the F1 score obtained by random guessing was 0.07.
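The 0.07 chance baseline follows directly from the class balance, and a two-line computation reproduces it. This assumes the usual reading of "random guessing" as a fair coin flip per instance, so expected precision equals the 4% base rate and expected recall is 0.5.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Random guessing on a 4%-positive dataset: expected precision equals the
# base rate (0.04) and expected recall is 0.5 (a fair coin keeps half the
# positives), reproducing the ~0.07 chance-level F1 quoted above.
print(round(f1(0.04, 0.5), 2))  # 0.07
```

Against that baseline, the reported F1 slightly above 0.5 represents a substantial, if still moderate, improvement.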

1 citation


Posted Content
TL;DR: In this article, a parallel attention network with sequence matching (SeqPAN) is proposed to address the challenges of multi-modal representation learning, and target moment boundary prediction in video grounding.
Abstract: Given a video, video grounding aims to retrieve a temporal moment that semantically corresponds to a language query. In this work, we propose a Parallel Attention Network with Sequence matching (SeqPAN) to address the challenges in this task: multi-modal representation learning, and target moment boundary prediction. We design a self-guided parallel attention module to effectively capture self-modal contexts and cross-modal attentive information between video and text. Inspired by sequence labeling tasks in natural language processing, we split the ground truth moment into begin, inside, and end regions. We then propose a sequence matching strategy to guide start/end boundary predictions using region labels. Experimental results on three datasets show that SeqPAN is superior to state-of-the-art methods. Furthermore, the effectiveness of the self-guided parallel attention module and the sequence matching module is verified.
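The begin/inside/end split borrowed from sequence labelling can be sketched directly: each video clip in the ground-truth moment gets a region tag, and clips outside it get O. The tag names and the per-clip granularity here are illustrative assumptions in the style of NLP BIO/BIOE schemes.

```python
def region_labels(num_clips, start, end):
    """Tag each video clip as Begin, Inside, or End of the ground-truth
    moment (O = outside), mirroring sequence labelling in NLP."""
    labels = ["O"] * num_clips
    labels[start] = "B"
    labels[end] = "E"
    for i in range(start + 1, end):   # clips strictly between the bounds
        labels[i] = "I"
    return labels

print(region_labels(6, 1, 3))  # ['O', 'B', 'I', 'E', 'O', 'O']
```

The sequence matching strategy then supervises the start/end boundary predictions against these region labels rather than against the raw timestamps alone.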


Posted Content
TL;DR: DocIE, a novel document-level context-aware OpenIE model, is proposed, along with DocOIE, a manually annotated evaluation dataset; OpenIE extracts structured relational tuples (subject, relation, object) from sentences and plays a critical role in many downstream NLP applications.
Abstract: Open Information Extraction (OpenIE) aims to extract structured relational tuples (subject, relation, object) from sentences and plays a critical role in many downstream NLP applications. Existing solutions perform extraction at the sentence level, without referring to any additional contextual information. In reality, however, a sentence typically exists as part of a document rather than standalone; we often need to access relevant contextual information around the sentence before we can accurately interpret it. As there is no document-level context-aware OpenIE dataset available, we manually annotate 800 sentences from 80 documents in two domains (Healthcare and Transportation) to form the DocOIE dataset for evaluation. In addition, we propose DocIE, a novel document-level context-aware OpenIE model. Our experimental results based on DocIE demonstrate that incorporating document-level context is helpful in improving OpenIE performance. Both the DocOIE dataset and the DocIE model are released for public use.