Showing papers by "Wang-Chien Lee" published in 2017


Proceedings ArticleDOI
06 Nov 2017
TL;DR: Empirical results show that HIN2Vec soundly outperforms the state-of-the-art representation learning models for network data, including DeepWalk, LINE, node2vec, PTE, HINE and ESim, by 6.6% to 23.8% of micro-F1 in multi-label node classification and 5% to 70.8% of MAP in link prediction.
Abstract: In this paper, we propose a novel representation learning framework, namely HIN2Vec, for heterogeneous information networks (HINs). The core of the proposed framework is a neural network model, also called HIN2Vec, designed to capture the rich semantics embedded in HINs by exploiting different types of relationships among nodes. Given a set of relationships specified in the form of meta-paths in an HIN, HIN2Vec carries out multiple prediction training tasks jointly based on a target set of relationships to learn latent vectors of nodes and meta-paths in the HIN. In addition to model design, several issues unique to HIN2Vec, including regularization of meta-path vectors, node type selection in negative sampling, and cycles in random walks, are examined. To validate our ideas, we learn latent vectors of nodes using four large-scale real HIN datasets, including Blogcatalog, Yelp, DBLP and U.S. Patents, and use them as features for multi-label node classification and link prediction applications on those networks. Empirical results show that HIN2Vec soundly outperforms the state-of-the-art representation learning models for network data, including DeepWalk, LINE, node2vec, PTE, HINE and ESim, by 6.6% to 23.8% of micro-F1 in multi-label node classification and 5% to 70.8% of MAP in link prediction.

532 citations


Journal ArticleDOI
01 Oct 2017
TL;DR: This paper proposes a new family of GSGQs with minimum acquaintance constraints, which are more appealing to users as they guarantee a worst-case acquaintance level in the result group and substantially outperform the baseline algorithms under various system settings.
Abstract: The prosperity of location-based social networking has paved the way for new applications of group-based activity planning and marketing. While such applications heavily rely on geo-social group queries (GSGQs), existing studies fail to produce a cohesive group in terms of user acquaintance. In this paper, we propose a new family of GSGQs with minimum acquaintance constraints. They are more appealing to users as they guarantee a worst-case acquaintance level in the result group. For efficient processing of GSGQs on large location-based social networks, we devise two social-aware spatial index structures, namely SaR-tree and SaR*-tree. The latter improves on the former by considering both spatial and social distances when clustering objects. Based on SaR-tree and SaR*-tree, novel algorithms are developed to process various GSGQs. Extensive experiments on real datasets Gowalla and Twitter show that our proposed methods substantially outperform the baseline algorithms under various system settings.
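The abstract does not spell out the minimum acquaintance constraint, so the following is only a minimal sketch under an assumed reading: every member of the result group must be acquainted with at least `c` other members. The function names, the `friends` adjacency dictionary, and the parameter `c` are hypothetical, and the sketch ignores the spatial side of the query and the SaR-tree index entirely.

```python
def satisfies_min_acquaintance(group, friends, c):
    """Check the assumed constraint: each member knows >= c other members.

    `friends` maps a user to the set of users they are acquainted with.
    """
    members = set(group)
    return all(len(friends.get(u, set()) & members) >= c for u in members)


def peel_to_min_acquaintance(candidates, friends, c):
    """Repeatedly drop users with fewer than c acquaintances among the rest.

    What remains is the largest subset of `candidates` that satisfies the
    assumed constraint (possibly empty).
    """
    members = set(candidates)
    while True:
        weak = {u for u in members
                if len(friends.get(u, set()) & members) < c}
        if not weak:
            return members
        members -= weak


# Hypothetical toy network: a, b, c form a triangle; d only knows c.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(peel_to_min_acquaintance(["a", "b", "c", "d"], friends, 2))
```

In a geo-social setting the candidate set would typically be the users near the query point; here it is passed in directly to keep the sketch short.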

71 citations


Proceedings ArticleDOI
04 Aug 2017
TL;DR: This paper introduces the notion of k-triangles to measure the tenuity of a group and formulates a new research problem, Minimum k-Triangle Disconnected Group (MkTG), to find a socially tenuous group from online social networks.
Abstract: Existing research on finding social groups mostly focuses on dense subgraphs in social networks. However, finding socially tenuous groups also has many important applications. In this paper, we introduce the notion of k-triangles to measure the tenuity of a group. We then formulate a new research problem, Minimum k-Triangle Disconnected Group (MkTG), to find a socially tenuous group from online social networks. We prove that MkTG is NP-Hard and inapproximable within any ratio in arbitrary graphs but polynomial-time tractable in threshold graphs. Two algorithms, namely TERA and TERA-ADV, are designed to exploit graph-theoretical approaches for solving MkTG on general graphs effectively and efficiently. Experimental results on seven real datasets manifest that the proposed algorithms outperform existing approaches in both efficiency and solution quality.

22 citations


Proceedings ArticleDOI
12 Nov 2017
TL;DR: A new optimization called context combining is introduced to further boost SGNS performance on multicore systems, and it is shown that this approach is 3.53x faster than the original multithreaded Word2Vec implementation and 1.28x faster than a recent parallel Word2Vec implementation.
Abstract: The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach to map words in a text corpus to low dimensional real vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. There are several high-performance implementations of the Word2Vec SGNS method. In this paper, we introduce a new optimization called context combining to further boost SGNS performance on multicore systems. For processing the One Billion Word benchmark dataset on a 16-core platform, we show that our approach is 3.53x faster than the original multithreaded Word2Vec implementation and 1.28x faster than a recent parallel Word2Vec implementation. We also show that our accuracy on benchmark queries is comparable to state-of-the-art implementations.
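For readers unfamiliar with SGNS, the sketch below shows one plain stochastic gradient step on the standard SGNS loss for a single (center, context) pair with a few negative samples. It is a minimal single-threaded illustration, not the paper's implementation, and it does not attempt the context combining optimization, which the abstract does not describe. The helper names `init_vectors` and `sgns_step` and the learning rate are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_vectors(vocab_size, dim, seed):
    """Small random vectors; a hypothetical initializer for this sketch."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for _ in range(vocab_size)]


def sgns_step(w_in, w_out, center, context, negatives, lr=0.025):
    """One SGD step on the SGNS loss; returns the loss before the update.

    loss = -log sigmoid(u_context . v_center)
           - sum over negatives n of log sigmoid(-u_n . v_center)
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)
    loss = 0.0
    pairs = [(context, 1.0)] + [(n, 0.0) for n in negatives]
    for word, label in pairs:
        u = w_out[word]
        score = sigmoid(sum(a * b for a, b in zip(u, v)))
        loss -= math.log(score if label == 1.0 else 1.0 - score)
        g = score - label  # derivative of the loss w.r.t. the dot product
        for i in range(len(v)):
            grad_v[i] += g * u[i]   # accumulate using the old output vector
            u[i] -= lr * g * v[i]   # update the output (context) vector
    for i in range(len(v)):
        v[i] -= lr * grad_v[i]      # update the input (center) vector last
    return loss


w_in = init_vectors(6, 4, seed=0)
w_out = init_vectors(6, 4, seed=1)
first = sgns_step(w_in, w_out, center=0, context=1, negatives=[2, 3], lr=0.1)
for _ in range(100):
    last = sgns_step(w_in, w_out, center=0, context=1, negatives=[2, 3], lr=0.1)
print(first, last)
```

Repeating the step on the same pair drives the loss down, which is the behavior the final `print` is meant to show.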

11 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: Extensive evaluations on 11.8 million bus trajectory data show that BTCI can effectively identify congestion cascades, the proposed congestion score is effective in extracting congested segments, and the proposed unified approach significantly outperforms alternative approaches in terms of extended precision.
Abstract: The knowledge of traffic health status is essential to the general public and urban traffic management. To identify congestion cascades, an important phenomenon of traffic health, we propose a Bus Trajectory based Congestion Identification (BTCI) framework that explores the anomalous traffic health status and structure properties of congestion cascades using bus trajectory data. BTCI consists of two main steps: congested segment extraction and congestion cascades identification. The former constructs path speed models from historical vehicle transitions and designs a non-parametric Kernel Density Estimation (KDE) function to derive a measure of congestion score. The latter aggregates congested segments (i.e., those with high congestion scores) into traffic congestion cascades by unifying both attribute coherence and spatio-temporal closeness of congested segments within a cascade. Extensive evaluations on 11.8 million bus trajectory data show that (1) BTCI can effectively identify congestion cascades, (2) the proposed congestion score is effective in extracting congested segments, (3) the proposed unified approach significantly outperforms alternative approaches in terms of extended precision, and (4) the identified congestion cascades are realistic, matching well with the traffic news and highly correlated with vehicle speed bands.
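The abstract does not give the form of the KDE-based congestion score, so the sketch below is one plausible reading rather than the BTCI definition: fit a Gaussian KDE to the historical speeds of a segment and score an observed speed by the estimated probability that a typical historical speed is higher. The function name, the bandwidth rule of thumb, and the sample speeds are all assumptions made for illustration.

```python
from statistics import NormalDist, stdev


def congestion_score(historical_speeds, observed_speed, bandwidth=None):
    """Assumed score: P(historical speed > observed speed) under a Gaussian KDE.

    Values near 1 mean the observed speed is unusually slow for this segment;
    values near 0 mean it is unusually fast.
    """
    n = len(historical_speeds)
    if n < 2:
        raise ValueError("need at least two historical speeds")
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, not from the paper.
        bandwidth = 1.06 * stdev(historical_speeds) * n ** (-0.2)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    std_normal = NormalDist()
    # The CDF of a Gaussian KDE is the average of the per-sample normal CDFs.
    cdf = sum(std_normal.cdf((observed_speed - s) / bandwidth)
              for s in historical_speeds) / n
    return 1.0 - cdf


# Hypothetical historical speeds (km/h) for one road segment.
history = [38, 40, 42, 41, 39, 43, 40, 37]
for speed in (15, 40, 45):
    print(speed, round(congestion_score(history, speed), 3))
```

Segments whose score exceeds a chosen threshold would then be the candidates for the cascade identification step; the threshold itself is left open here.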

10 citations


Journal ArticleDOI
TL;DR: In this article, the authors develop techniques that generate the k-Nearest Neighbor (kNN) overlay graph of an arbitrary crowd that interconnects over some short-range communication technology.

5 citations


Proceedings ArticleDOI
06 Nov 2017
TL;DR: This paper proposes a probabilistic generative model, the Menu-Offering-Bundle (MOB) model, to capture the offering and bundling decisions of project creators based on collected data of 14K crowdfunding projects and their 149K reward bundles across a half-year period, and shows that the learned offering and bundling topics carry distinguishable meanings and provide insights into key factors of project success.
Abstract: Offering products in the form of menu bundles is a common practice in marketing to attract customers and maximize revenues. In crowdfunding platforms such as Kickstarter, rewards also play an important part in influencing project success. Designing rewards consisting of the appropriate items is a challenging yet crucial task for the project creators. However, prior research has not considered the strategies project creators take to offer and bundle the rewards, making it hard to study the impact of reward designs on project success. In this paper, we raise a novel research question: understanding project creators' reward design decisions made to improve their chance of success. We approach this by modeling the design behavior of project creators and identifying the behaviors that lead to project success. We propose a probabilistic generative model, the Menu-Offering-Bundle (MOB) model, to capture the offering and bundling decisions of project creators based on collected data of 14K crowdfunding projects and their 149K reward bundles across a half-year period. Our proposed model is shown to capture the offering and bundling topics and to outperform the baselines in predicting reward designs. We also find that the learned offering and bundling topics carry distinguishable meanings and provide insights into key factors of project success.

2 citations


Posted Content
TL;DR: This paper proposes a machine learning framework, namely, Social Network Mental Disorder Detection (SNMDD), that exploits features extracted from social network data to accurately identify potential cases of social network mental disorders.
Abstract: An increasing number of social network mental disorders (SNMDs), such as Cyber-Relationship Addiction, Information Overload, and Net Compulsion, have been recently noted. Symptoms of these mental disorders are usually observed passively today, resulting in delayed clinical intervention. In this paper, we argue that mining online social behavior provides an opportunity to actively identify SNMDs at an early stage. It is challenging to detect SNMDs because the mental factors considered in standard diagnostic criteria (questionnaire) cannot be observed from online social activity logs. Our approach, new and innovative to the practice of SNMD detection, does not rely on self-revealing of those mental factors via questionnaires. Instead, we propose a machine learning framework, namely, Social Network Mental Disorder Detection (SNMDD), that exploits features extracted from social network data to accurately identify potential cases of SNMDs. We also exploit multi-source learning in SNMDD and propose a new SNMD-based Tensor Model (STM) to improve the performance. Our framework is evaluated via a user study with 3126 online social network users. We conduct a feature analysis, and also apply SNMDD on large-scale datasets and analyze the characteristics of the three SNMD types. The results show that SNMDD is promising for identifying online social network users with potential SNMDs.

2 citations


Posted Content
TL;DR: This paper maintains a dynamic transportation network (DTN), which associates a network edge with a probabilistic distribution of travel times updated continuously, and devises an object motion model, namely, the travel-time-aware hidden semi-Markov model (TT-HsMM), which is used to infer the most probable traveled edge sequences on DTN.
Abstract: It is essential for cellular network operators to provide cellular location services to meet the needs of their users and mobile applications. However, cellular locations, estimated by network-based methods at the server side, suffer from high spatial errors and arbitrary missing locations. Moreover, auxiliary sensor data at the client side are not available to the operators. In this paper, we study the cellular trajectory cleansing problem and propose an innovative data cleansing framework, namely Dynamic Transportation Network based Cleansing (DTNC), to improve the quality of cellular locations delivered in online cellular trajectory services. We maintain a dynamic transportation network (DTN), which associates a network edge with a probabilistic distribution of travel times updated continuously. In addition, we devise an object motion model, namely, the travel-time-aware hidden semi-Markov model (TT-HsMM), which is used to infer the most probable traveled edge sequences on DTN. To validate our ideas, we conduct a comprehensive evaluation using real-world cellular data provided by a major cellular network operator and a GPS dataset collected by smartphones as the ground truth. In the experiments, DTNC displays significant advantages over six state-of-the-art techniques.

2 citations


01 Jan 2017
TL;DR: A learning framework, namely, SNMD-Aware Personalized nEwsfeed Ranking (SAPER), that exploits features extracted from social network data to measure the addictive degree of a newsfeed and a randomized algorithm called Computing Budget Optimization for MEMIC with Newsfeed Differentiation (CBOM-ND), which is promising for alleviating the symptoms of online social network users with potential SNMDs.
Abstract: While the popularity of social network applications continues to grow, increasing cases of social network mental disorders (SNMDs) are also noted. For behavioral therapy of SNMDs, an idea, similar to providing electronic cigarettes to addictive smokers, is to substitute highly-addictive newsfeeds with safer, less-addictive ones to those users. Nevertheless, this idea faces two major challenges: 1) how to measure the addictive degree of a newsfeed to an SNMD user, and 2) how to exploit the theories in Psychology to determine appropriate substitution of newsfeeds for the therapy. To address these issues, in this paper, we propose a learning framework, namely, SNMD-Aware Personalized nEwsfeed Ranking (SAPER), that exploits features extracted from social network data to measure the addictive degree of a newsfeed. With the quantified addictive degrees of newsfeeds, we formulate a new optimization problem, namely, Multi-Efficacy Maximization with Interest Constraint (MEMIC), to maximize the efficacy of the behavioral therapy, without sacrificing the interests of users. Accordingly, we propose a randomized algorithm called Computing Budget Optimization for MEMIC with Newsfeed Differentiation (CBOM-ND). To validate our idea, we conduct a user study on 517 online social network users to evaluate the proposed SAPER framework. Moreover, we conduct experiments on large-scale datasets to evaluate the proposed CBOM-ND. The results show that our approach is promising for alleviating the symptoms of online social network users with potential SNMDs.