Showing papers by "Wang-Chien Lee" published in 2017

Proceedings Article

HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning

Tao-Yang Fu¹, Wang-Chien Lee¹, Zhen Lei¹

06 Nov 2017

TL;DR: Empirical results show that HIN2Vec soundly outperforms the state-of-the-art representation learning models for network data, including DeepWalk, LINE, node2vec, PTE, HINE and ESim, by 6.6% to 23.8% of $micro$-$f_1$ in multi-label node classification and 5% to 70.8% of $MAP$ in link prediction.

Abstract: In this paper, we propose a novel representation learning framework, namely HIN2Vec, for heterogeneous information networks (HINs). The core of the proposed framework is a neural network model, also called HIN2Vec, designed to capture the rich semantics embedded in HINs by exploiting different types of relationships among nodes. Given a set of relationships specified in forms of meta-paths in an HIN, HIN2Vec carries out multiple prediction training tasks jointly based on a target set of relationships to learn latent vectors of nodes and meta-paths in the HIN. In addition to model design, several issues unique to HIN2Vec, including regularization of meta-path vectors, node type selection in negative sampling, and cycles in random walks, are examined. To validate our ideas, we learn latent vectors of nodes using four large-scale real HIN datasets, including Blogcatalog, Yelp, DBLP and U.S. Patents, and use them as features for multi-label node classification and link prediction applications on those networks. Empirical results show that HIN2Vec soundly outperforms the state-of-the-art representation learning models for network data, including DeepWalk, LINE, node2vec, PTE, HINE and ESim, by 6.6% to 23.8% of $micro$-$f_1$ in multi-label node classification and 5% to 70.8% of $MAP$ in link prediction.
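To make the idea of relationships "specified in forms of meta-paths" concrete, here is a minimal sketch of how training triples could be collected from a single random walk. It is not the authors' implementation: the function name `metapath_triples`, the parameter `max_hops`, the toy node names, and the choice to represent a meta-path as the sequence of node types along the walk are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, Tuple[str, ...]]


def metapath_triples(walk: List[str], node_type: Dict[str, str], max_hops: int) -> List[Triple]:
    """Collect (source, target, meta-path) triples for node pairs at most max_hops apart on a walk.

    Assumption: a meta-path is represented here as the tuple of node types
    visited between source and target (inclusive).
    """
    triples: List[Triple] = []
    for i, src in enumerate(walk):
        last = min(i + max_hops, len(walk) - 1)
        for j in range(i + 1, last + 1):
            path_types = tuple(node_type[n] for n in walk[i:j + 1])
            triples.append((src, walk[j], path_types))
    return triples


# Hypothetical toy walk over a bibliographic network.
walk = ["a1", "p1", "v1", "p2"]
types = {"a1": "Author", "p1": "Paper", "v1": "Venue", "p2": "Paper"}
triples = metapath_triples(walk, types, max_hops=2)
```

Each triple pairs two nodes with the relationship that connects them, which is the kind of input a joint prediction task over nodes and meta-paths would consume.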

532 citations

Journal Article

Geo-social group queries with minimum acquaintance constraints

Qijun Zhu¹, Haibo Hu², Cheng Xu¹, Jianliang Xu¹, Wang-Chien Lee³

Hong Kong Baptist University¹, Hong Kong Polytechnic University², Pennsylvania State University³

01 Oct 2017

TL;DR: This paper proposes a new family of GSGQs with minimum acquaintance constraints, which are more appealing to users as they guarantee a worst-case acquaintance level in the result group. The proposed methods substantially outperform the baseline algorithms under various system settings.

Abstract: The prosperity of location-based social networking has paved the way for new applications of group-based activity planning and marketing. While such applications heavily rely on geo-social group queries (GSGQs), existing studies fail to produce a cohesive group in terms of user acquaintance. In this paper, we propose a new family of GSGQs with minimum acquaintance constraints. They are more appealing to users as they guarantee a worst-case acquaintance level in the result group. For efficient processing of GSGQs on large location-based social networks, we devise two social-aware spatial index structures, namely SaR-tree and SaR*-tree. The latter improves on the former by considering both spatial and social distances when clustering objects. Based on SaR-tree and SaR*-tree, novel algorithms are developed to process various GSGQs. Extensive experiments on real datasets Gowalla and Twitter show that our proposed methods substantially outperform the baseline algorithms under various system settings.
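As an illustration of what a worst-case acquaintance guarantee could look like, the sketch below checks whether every member of a candidate group knows at least `c` other members. This reading of the constraint (a minimum degree inside the group), the function name `satisfies_min_acquaintance`, and the toy friendship data are assumptions; the abstract does not spell out the exact definition, and the SaR-tree index is not modeled here.

```python
from typing import Dict, Set


def satisfies_min_acquaintance(group: Set[str], friends: Dict[str, Set[str]], c: int) -> bool:
    """Return True if each user in the group is acquainted with at least c other group members."""
    return all(len(friends.get(u, set()) & (group - {u})) >= c for u in group)


# Hypothetical friendship lists (symmetric).
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat"},
}

ok_group = satisfies_min_acquaintance({"ann", "bob", "cat"}, friends, c=2)
weak_group = satisfies_min_acquaintance({"ann", "bob", "cat", "dan"}, friends, c=2)
```

In the second call, "dan" knows only one other member, so the group fails the check for `c=2`.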

71 citations

Proceedings Article

On Finding Socially Tenuous Groups for Online Social Networks

Chih-Ya Shen¹, Liang-Hao Huang², De-Nian Yang², Hong-Han Shuai³, Wang-Chien Lee⁴, Ming-Syan Chen⁵

National Tsing Hua University¹, Academia Sinica², National Chiao Tung University³, Pennsylvania State University⁴, National Taiwan University⁵

04 Aug 2017

TL;DR: This paper introduces the notion of k-triangles to measure the tenuity of a group and formulates a new research problem, Minimum k-Triangle Disconnected Group (MkTG), to find a socially tenuous group from online social networks.

Abstract: Existing research on finding social groups mostly focuses on dense subgraphs in social networks. However, finding socially tenuous groups also has many important applications. In this paper, we introduce the notion of k-triangles to measure the tenuity of a group. We then formulate a new research problem, Minimum k-Triangle Disconnected Group (MkTG), to find a socially tenuous group from online social networks. We prove that MkTG is NP-Hard and inapproximable within any ratio in arbitrary graphs but polynomial-time tractable in threshold graphs. Two algorithms, namely TERA and TERA-ADV, are designed to exploit graph-theoretical approaches for solving MkTG on general graphs effectively and efficiently. Experimental results on seven real datasets manifest that the proposed algorithms outperform existing approaches in both efficiency and solution quality.
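The following sketch shows one way a tenuity measure based on k-triangles could be computed. It assumes that a k-triangle is a set of three vertices whose pairwise shortest-path distances in the whole graph are all at most k hops; the abstract does not give the definition, so treat this as an assumption. The helper names `hops_within` and `count_k_triangles` are hypothetical, and this brute-force count is not the TERA algorithm.

```python
from collections import deque
from itertools import combinations


def hops_within(adj, source, k):
    """Breadth-first search: vertices reachable from source in at most k hops (source excluded)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist) - {source}


def count_k_triangles(adj, group, k):
    """Count triples in the group whose members are pairwise within k hops (undirected graph)."""
    near = {u: hops_within(adj, u, k) for u in group}
    return sum(
        1
        for a, b, c in combinations(sorted(group), 3)
        if b in near[a] and c in near[a] and c in near[b]
    )


# Hypothetical path graph a - b - c - d - e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
```

Under this assumption, a group with fewer k-triangles is more tenuous: on the path graph above, the group {a, c, e} contains no 2-triangle because a and e are four hops apart.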

22 citations

Proceedings Article

Optimizing Word2Vec Performance on Multicore Systems

Vasudevan Rengasamy¹, Tao-Yang Fu¹, Wang-Chien Lee¹, Kamesh Madduri¹

Pennsylvania State University¹

12 Nov 2017

TL;DR: A new optimization called context combining is introduced to further boost SGNS performance on multicore systems, and it is shown that this approach is 3.53x faster than the original multithreaded Word2Vec implementation and 1.28x faster than a recent parallel Word2Vec implementation.

Abstract: The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach to map words in a text corpus to low dimensional real vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. There are several high-performance implementations of the Word2Vec SGNS method. In this paper, we introduce a new optimization called context combining to further boost SGNS performance on multicore systems. For processing the One Billion Word benchmark dataset on a 16-core platform, we show that our approach is 3.53x faster than the original multithreaded Word2Vec implementation and 1.28x faster than a recent parallel Word2Vec implementation. We also show that our accuracy on benchmark queries is comparable to state-of-the-art implementations.
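For readers unfamiliar with SGNS, the sketch below performs one stochastic gradient step of the standard skip-gram objective with negative sampling for a single (center, context) pair. It is a plain single-threaded baseline written for clarity and does not implement the context combining optimization described in the paper. The function name `sgns_step`, the vector sizes, the learning rate, and the toy word indices are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(w_in, w_out, center, context, negatives, lr):
    """Apply one SGD step of skip-gram with negative sampling; return the loss before the update."""
    v = w_in[center]
    grad_v = [0.0] * len(v)
    loss = 0.0
    # The true context word has label 1, each sampled negative word has label 0.
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[word]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        loss -= math.log(score if label == 1.0 else 1.0 - score)
        g = score - label  # gradient of the loss with respect to the dot product
        for d in range(len(v)):
            grad_v[d] += g * u[d]   # accumulate using the output vector before its update
            u[d] -= lr * g * v[d]   # update the output vector in place
    for d in range(len(v)):
        v[d] -= lr * grad_v[d]      # update the input (center) vector once per step
    return loss


# Toy setup: 5 words, 4 dimensions, output vectors initialised to zero.
dim = 4
w_in = [[0.1 * (i + 1)] * dim for i in range(5)]
w_out = [[0.0] * dim for _ in range(5)]
losses = [sgns_step(w_in, w_out, center=0, context=1, negatives=[2, 3], lr=0.5) for _ in range(3)]
```

Repeating the step on the same pair lowers the loss each time, which is a quick sanity check that the gradient signs are right.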

11 citations

Proceedings Article

BTCI: A new framework for identifying congestion cascades using bus trajectory data

Meng-Fen Chiang¹, Ee-Peng Lim¹, Wang-Chien Lee², Agus Trisnajaya Kwee¹

Singapore Management University¹, Pennsylvania State University²

01 Jul 2017

TL;DR: Extensive evaluations on 11.8 million bus trajectory data show that BTCI can effectively identify congestion cascades, the proposed congestion score is effective in extracting congested segments, and the proposed unified approach significantly outperforms alternative approaches in terms of extended precision.

Abstract: The knowledge of traffic health status is essential to the general public and urban traffic management. To identify congestion cascades, an important phenomenon of traffic health, we propose a Bus Trajectory based Congestion Identification (BTCI) framework that explores the anomalous traffic health status and structure properties of congestion cascades using bus trajectory data. BTCI consists of two main steps, congested segment extraction and congestion cascades identification. The former constructs path speed models from historical vehicle transitions and designs a non-parametric Kernel Density Estimation (KDE) function to derive a measure of congestion score. The latter aggregates congested segments (i.e., those with high congestion scores) into traffic congestion cascades by unifying both attribute coherence and spatio-temporal closeness of congested segments within a cascade. Extensive evaluations on 11.8 million bus trajectory data show that (1) BTCI can effectively identify congestion cascades, (2) the proposed congestion score is effective in extracting congested segments, (3) the proposed unified approach significantly outperforms alternative approaches in terms of extended precision, and (4) the identified congestion cascades are realistic, matching well with the traffic news and highly correlated with vehicle speed bands.

10 citations

Journal Article

Crowdsourcing emergency data in non-operational cellular networks

Georgios Chatzimilioudis¹, Constantinos Costa¹, Demetrios Zeinalipour-Yazti¹, Wang-Chien Lee²

University of Cyprus¹, Pennsylvania State University²

01 Mar 2017-Information Systems

TL;DR: In this article, the authors develop techniques that generate the k-Nearest Neighbor (kNN) overlay graph of an arbitrary crowd that interconnects over some short-range communication technology.

5 citations

Proceedings Article

Modeling Menu Bundle Designs of Crowdfunding Projects

Yusan Lin¹, Peifeng Yin², Wang-Chien Lee¹

Pennsylvania State University¹, IBM²

06 Nov 2017

TL;DR: This paper proposes a probabilistic generative model, Menu-Offering-Bundle (MOB) model, to capture the offering and bundling decisions of project creators based on collected data of 14K crowdfunding projects and their 149K reward bundles across a half-year period, and shows that the learned offering and bundle topics carry distinguishable meanings and provide insights of key factors on project success.

Abstract: Offering products in the forms of menu bundles is a common practice in marketing to attract customers and maximize revenues. In crowdfunding platforms such as Kickstarter, rewards also play an important part in influencing project success. Designing rewards consisting of the appropriate items is a challenging yet crucial task for the project creators. However, prior research has not considered the strategies project creators take to offer and bundle the rewards, making it hard to study the impact of reward designs on project success. In this paper, we raise a novel research question: understanding project creators' decisions of reward designs to level their chance to succeed. We approach this by modeling the design behavior of project creators, and identifying the behaviors that lead to project success. We propose a probabilistic generative model, Menu-Offering-Bundle (MOB) model, to capture the offering and bundling decisions of project creators based on collected data of 14K crowdfunding projects and their 149K reward bundles across a half-year period. Our proposed model is shown to capture the offering and bundling topics and to outperform the baselines in predicting reward designs. We also find that the learned offering and bundling topics carry distinguishable meanings and provide insights into key factors of project success.

2 citations

Posted Content

Mining Online Social Data for Detecting Social Network Mental Disorders

Hong-Han Shuai¹, Chih-Ya Shen¹, De-Nian Yang¹, Yi-Feng Lan², Wang-Chien Lee³, Philip S. Yu⁴, Ming-Syan Chen⁵

Academia Sinica¹, Tamkang University², Pennsylvania State University³, University of Illinois at Chicago⁴, National Taiwan University⁵

13 Feb 2017-arXiv: Social and Information Networks

TL;DR: This paper proposes a machine learning framework, namely, Social Network Mental Disorder Detection (SNMDD), that exploits features extracted from social network data to accurately identify potential cases of social network mental disorders.

Abstract: An increasing number of social network mental disorders (SNMDs), such as Cyber-Relationship Addiction, Information Overload, and Net Compulsion, have been recently noted. Symptoms of these mental disorders are usually observed passively today, resulting in delayed clinical intervention. In this paper, we argue that mining online social behavior provides an opportunity to actively identify SNMDs at an early stage. It is challenging to detect SNMDs because the mental factors considered in standard diagnostic criteria (questionnaire) cannot be observed from online social activity logs. Our approach, new and innovative to the practice of SNMD detection, does not rely on self-revealing of those mental factors via questionnaires. Instead, we propose a machine learning framework, namely, Social Network Mental Disorder Detection (SNMDD), that exploits features extracted from social network data to accurately identify potential cases of SNMDs. We also exploit multi-source learning in SNMDD and propose a new SNMD-based Tensor Model (STM) to improve the performance. Our framework is evaluated via a user study with 3126 online social network users. We conduct a feature analysis, and also apply SNMDD on large-scale datasets and analyze the characteristics of the three SNMD types. The results show that SNMDD is promising for identifying online social network users with potential SNMDs.

2 citations

Posted Content

DTNC: A New Server-side Data Cleansing Framework for Cellular Trajectory Services.

Jian Dai, Fei He, Wang-Chien Lee, Gang Chen, Beng Chin Ooi

01 Mar 2017-arXiv: Networking and Internet Architecture

TL;DR: This paper maintains a dynamic transportation network (DTN), which associates a network edge with a probabilistic distribution of travel times updated continuously, and devises an object motion model, namely, the travel-time-aware hidden semi-Markov model (TT-HsMM), which is used to infer the most probable traveled edge sequences on DTN.

Abstract: It is essential for the cellular network operators to provide cellular location services to meet the needs of their users and mobile applications. However, cellular locations, estimated by network-based methods at the server-side, suffer from high spatial errors and arbitrary missing locations. Moreover, auxiliary sensor data at the client-side are not available to the operators. In this paper, we study the cellular trajectory cleansing problem and propose an innovative data cleansing framework, namely Dynamic Transportation Network based Cleansing (DTNC), to improve the quality of cellular locations delivered in online cellular trajectory services. We maintain a dynamic transportation network (DTN), which associates a network edge with a probabilistic distribution of travel times updated continuously. In addition, we devise an object motion model, namely, travel-time-aware hidden semi-Markov model (TT-HsMM), which is used to infer the most probable traveled edge sequences on DTN. To validate our ideas, we conduct a comprehensive evaluation using real-world cellular data provided by a major cellular network operator and a GPS dataset collected by smartphones as the ground truth. In the experiments, DTNC displays significant advantages over six state-of-the-art techniques.

2 citations

Newsfeed Screening for Behavioral Therapy to Social Network Mental Disorders.

Hong-Han Shuai, Yen-Chieh Lien, De-Nian Yang, Yi-Feng Lan, Wang-Chien Lee, Philip S. Yu

01 Jan 2017

TL;DR: This paper proposes a learning framework, namely, SNMD-Aware Personalized nEwsfeed Ranking (SAPER), that exploits features extracted from social network data to measure the addictive degree of a newsfeed, and a randomized algorithm called Computing Budget Optimization for MEMIC with Newsfeed Differentiation (CBOM-ND). The results show that the approach is promising for alleviating the symptoms of online social network users with potential SNMDs.

Abstract: While the popularity of social network applications continues to grow, increasing cases of social network mental disorders (SNMDs) are also noted. For behavioral therapy of SNMDs, an idea, similar to providing electronic cigarettes to addictive smokers, is to substitute highly-addictive newsfeeds with safer, less-addictive ones to those users. Nevertheless, this idea faces two major challenges: 1) how to measure the addictive degree of a newsfeed to an SNMD user, and 2) how to exploit the theories in Psychology to determine appropriate substitution of newsfeeds for the therapy. To address these issues, in this paper, we propose a learning framework, namely, SNMD-Aware Personalized nEwsfeed Ranking (SAPER), that exploits features extracted from social network data to measure the addictive degree of a newsfeed. With the quantified addictive degrees of newsfeeds, we formulate a new optimization problem, namely, Multi-Efficacy Maximization with Interest Constraint (MEMIC), to maximize the efficacy of the behavioral therapy, without sacrificing the interests of users. Accordingly, we propose a randomized algorithm called Computing Budget Optimization for MEMIC with Newsfeed Differentiation (CBOM-ND). To validate our idea, we conduct a user study on 517 online social network users to evaluate the proposed SAPER framework. Moreover, we conduct experiments on large-scale datasets to evaluate the proposed CBOM-ND. The results show that our approach is promising for alleviating the symptoms of online social network users with potential SNMDs.