Showing papers by "Wang-Chien Lee published in 2018"
TL;DR: This paper proposes a machine learning framework, namely Social Network Mental Disorder Detection (SNMDD), that exploits features extracted from social network data to accurately identify potential cases of SNMDs, and proposes a new SNMD-based Tensor Model (STM) to improve accuracy.
Abstract: The explosive growth in the popularity of social networking has led to problematic usage. An increasing number of social network mental disorders (SNMDs), such as Cyber-Relationship Addiction, Information Overload, and Net Compulsion, have recently been noted. Symptoms of these mental disorders are usually observed passively today, resulting in delayed clinical intervention. In this paper, we argue that mining online social behavior provides an opportunity to actively identify SNMDs at an early stage. It is challenging to detect SNMDs because mental status cannot be directly observed from online social activity logs. Our approach, new and innovative to the practice of SNMD detection, does not rely on self-reporting of mental factors via psychological questionnaires. Instead, we propose a machine learning framework, namely Social Network Mental Disorder Detection (SNMDD), that exploits features extracted from social network data to accurately identify potential cases of SNMDs. We also exploit multi-source learning in SNMDD and propose a new SNMD-based Tensor Model (STM) to improve accuracy. To increase the scalability of STM, we further improve its efficiency with a performance guarantee. Our framework is evaluated via a user study with 3,126 online social network users. We conduct a feature analysis, apply SNMDD to large-scale datasets, and analyze the characteristics of the three SNMD types. The results show that SNMDD is promising for identifying online social network users with potential SNMDs.
47 citations
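The SNMDD framework above is feature-based at its core. As a purely illustrative sketch (the features, synthetic data, and classifier choice below are hypothetical, not the paper's actual pipeline), a detector of this kind reduces to training a classifier on usage-derived features:

```python
import math

# Illustrative sketch only: the features, synthetic labels, and classifier
# below are hypothetical, not the paper's actual SNMDD pipeline.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain SGD logistic regression over usage-derived features."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy features per user: [daily_sessions / 10, night_usage_ratio]
X = [[0.1, 0.1], [0.2, 0.2], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]  # 1 = potential SNMD case (synthetic labels)
w, b = train_logreg(X, y)
```

The interesting part of the paper is precisely what this sketch omits: designing features that proxy unobservable mental states, and fusing multiple data sources via the tensor model.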
01 Jan 2018
TL;DR: A new Social-aware Diverse and Preferred Live Streaming Channel Query (SDSQ) that jointly selects a set of diverse and preferred live streaming channels and a group of socially tight viewers is formulated; SDSQ is proved NP-hard and inapproximable within any factor, and SDSSel, a 2-approximation algorithm with a guaranteed error bound, is designed.
Abstract: The popularity of live streaming has led to the explosive growth in new video contents and social communities on emerging platforms such as Facebook Live and Twitch. Viewers on these platforms are able to follow multiple streams of live events simultaneously, while engaging in discussions with friends. However, existing approaches for selecting live streaming channels still focus on satisfying individual preferences of users, without considering the need to accommodate real-time social interactions among viewers and to diversify the content of streams. In this paper, therefore, we formulate a new Social-aware Diverse and Preferred Live Streaming Channel Query (SDSQ) that jointly selects a set of diverse and preferred live streaming channels and a group of socially tight viewers. We prove that SDSQ is NP-hard and inapproximable within any factor, and design SDSSel, a 2-approximation algorithm with a guaranteed error bound. We perform a user study on Twitch with 432 participants to validate the need for SDSQ and the usefulness of SDSSel. We also conduct large-scale experiments on real datasets to demonstrate the superiority of the proposed algorithm over several baselines in terms of solution quality and efficiency.
22 citations
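To make the preference-versus-diversity trade-off in SDSQ concrete, here is a naive greedy heuristic over channel scores. This is not the paper's SDSSel algorithm (which carries a proven bound); the scoring scheme and penalty weight are assumptions for illustration:

```python
# Hypothetical greedy heuristic, NOT the paper's SDSSel: pick channels one
# by one, rewarding viewer preference and penalizing repeated topics.

def greedy_select(channels, pref, topic, k, lam=1.0):
    """Pick k channels maximizing preference minus a diversity penalty.

    pref[c]  -- viewer-group preference score for channel c (assumed given)
    topic[c] -- topic label of channel c (repeats are penalized)
    lam      -- weight of the diversity penalty
    """
    chosen = []
    while len(chosen) < k:
        best, best_gain = None, float("-inf")
        for c in channels:
            if c in chosen:
                continue
            repeats = sum(1 for s in chosen if topic[s] == topic[c])
            gain = pref[c] - lam * repeats
            if gain > best_gain:
                best, best_gain = c, gain
        chosen.append(best)
    return chosen
```

With a large enough `lam`, a slightly less preferred channel of a fresh topic beats a near-duplicate of an already chosen one; the query additionally couples this with selecting a socially tight viewer group, which the sketch ignores.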
16 Apr 2018
TL;DR: This paper proposes a general adaptive regularization method based on Gaussian Mixture to learn the best regularization function according to the observed parameters, and develops an effective update algorithm which integrates Expectation Maximization with Stochastic Gradient Descent.
Abstract: Deep Learning and Machine Learning models have recently been shown to be effective in many real world applications. While these models achieve increasingly better predictive performance, their structures have also become much more complex. A common and difficult problem for complex models is overfitting. Regularization is used to penalize the complexity of the model in order to avoid overfitting. However, in most learning frameworks, the regularization function is controlled by hyperparameters, and the best setting is therefore difficult to find. In this paper, we propose an adaptive regularization method, as part of a large end-to-end healthcare data analytics software stack, which effectively addresses the above difficulty. First, we propose a general adaptive regularization method based on Gaussian Mixture (GM) to learn the best regularization function according to the observed parameters. Second, we develop an effective update algorithm which integrates Expectation Maximization (EM) with Stochastic Gradient Descent (SGD). Third, we design a lazy update algorithm to reduce the computational cost by 4x. The overall regularization framework is fast, adaptive, and easy to use. We validate the effectiveness of our regularization method through an extensive experimental study over 13 standard benchmark datasets and three kinds of deep learning/machine learning models. The results illustrate that our proposed adaptive regularization method achieves significant improvement over state-of-the-art regularization methods.
12 citations
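A stripped-down sketch of learning the regularizer from the observed parameters: the paper fits a full Gaussian Mixture with EM, whereas the toy version below assumes a single zero-mean Gaussian prior, so the L2 strength `lam = 1/var` is re-estimated from the weights each epoch instead of being fixed by hand. The variance floor is an extra assumption of this sketch:

```python
# Toy version: a single zero-mean Gaussian prior stands in for the paper's
# Gaussian Mixture, and the variance floor below is an extra assumption
# that keeps this simplified scheme from collapsing the weights to zero.

def sgd_adaptive_l2(X, y, epochs=200, lr=0.01):
    w = [0.0] * len(X[0])
    var = 1.0                          # prior variance, re-estimated below
    for _ in range(epochs):
        lam = 1.0 / var                # adaptive L2 strength
        for xi, yi in zip(X, y):       # squared-loss SGD step with penalty
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            w = [wj - lr * (err * xj + lam * wj) for wj, xj in zip(w, xi)]
        # "M-step": maximum-likelihood variance of the current weights,
        # floored so lam stays bounded in this simplified setting
        var = max(sum(wj * wj for wj in w) / len(w), 0.5)
    return w
```

The interleaving of an EM-style variance update with SGD steps is the point being illustrated; the mixture prior and the lazy update that give the paper its efficiency are omitted.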
10 Apr 2018
TL;DR: A probabilistic generative model, the Dynamic Market Competition (DMC) model, is proposed; it captures the competitiveness of projects in crowdfunding well and significantly outperforms several baseline approaches in predicting the daily collected funds of crowdfunding projects.
Abstract: The often fierce competition on crowdfunding markets can significantly affect project success. While various factors have been considered in predicting the success of crowdfunding projects, to the best of the authors' knowledge, the phenomenon of competition has not been investigated. In this paper, we study the competition on crowdfunding markets through data analysis, and propose a probabilistic generative model, the Dynamic Market Competition (DMC) model, to capture the competitiveness of projects in crowdfunding. Through an empirical evaluation using the pledging history of past crowdfunding projects, our approach has been shown to capture the competitiveness of projects very well, and it significantly outperforms several baseline approaches in predicting the daily collected funds of crowdfunding projects, reducing errors by 31.73% to 45.14%. In addition, our analyses of the correlations between project competitiveness, project design factors, and project success indicate that highly competitive projects, while being winners under various settings of project design factors, are particularly impressive with high pledging goals and high price rewards, compared to medium and low competitive projects. Finally, the competitiveness of projects learned by DMC is shown to be very useful in predicting final success and the days taken to hit the pledging goal, reaching 85% accuracy and an error of less than 7 days, respectively, with limited information at the early pledging stage.
11 citations
TL;DR: An efficient algorithm is proposed, which exploits the OB-tree and a binary traversal order of data objects to accelerate RONN query processing; the experimental results show that the RONN-OBA algorithm significantly outperforms the two R-tree based algorithms and RONN-OA.
Abstract: In this paper, we study a novel variant of obstructed nearest neighbor queries, namely, range-based obstructed nearest neighbor (RONN) search. As a natural generalization of continuous obstructed nearest-neighbor (CONN), an RONN query retrieves a set of obstructed nearest neighbors corresponding to every point in a specified range. We propose a new index, namely binary obstructed tree (called OB-tree), for indexing complex objects in the obstructed space. The novelty of the OB-tree lies in the idea of dividing the obstructed space into non-obstructed subspaces, aiming to efficiently retrieve highly qualified candidates for RONN processing. We develop an algorithm for construction of the OB-tree and propose a space division scheme, called the optimal obstacle balance (OOB2) scheme, to address the tree balance problem. Accordingly, we propose an efficient algorithm, called RONN by OB-tree Acceleration (RONN-OBA), which exploits the OB-tree and a binary traversal order of data objects to accelerate RONN query processing. In addition, we extend our work in several aspects regarding the shape of obstacles and range-based kNN queries in obstructed space. Finally, we conduct a comprehensive performance evaluation using both real and synthetic datasets to validate our ideas and the proposed algorithms. The experimental results show that the RONN-OBA algorithm significantly outperforms the two R-tree based algorithms and RONN-OA.
7 citations
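The notion of obstructed distance is what makes RONN queries hard. The toy BFS below, on a coarse grid with blocked cells, shows how the obstructed nearest neighbor can differ from the Euclidean one; the grid model and brute-force search are illustrative stand-ins, not the paper's OB-tree machinery:

```python
from collections import deque

# Toy illustration only: obstacles live on a coarse grid and obstructed
# distance is counted in 4-neighbour BFS hops; the paper's OB-tree avoids
# this kind of brute-force search over the obstructed space.

def obstructed_nn(grid, start, objects):
    """Return the object with the smallest obstructed (BFS) distance."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    q = deque([start])
    while q:
        r, c = q.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                q.append((nr, nc))
    reachable = [o for o in objects if o in dist]
    return min(reachable, key=lambda o: dist[o])

grid = ["....",
        ".##.",
        ".#..",
        "...."]
# From (1, 0), object (2, 2) is closer in Euclidean terms, but the wall
# forces a detour, so the obstructed nearest neighbour is (0, 3).
```

An RONN query asks for such an answer for every point of a range at once, which is why candidate pruning via the OB-tree's non-obstructed subspaces matters.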
17 Oct 2018
TL;DR: By mining OSN data in support of online intervention treatment, data scientists may assist mental healthcare professionals in alleviating the symptoms of users with SNA at early stages; a novel framework, called Newsfeed Substituting and Supporting System (N3S), is proposed for newsfeed filtering and dissemination in support of SNA interventions.
Abstract: While the popularity of online social network (OSN) apps continues to grow, little attention has been drawn to the increasing cases of Social Network Addictions (SNAs). In this paper, we argue that by mining OSN data in support of online intervention treatment, data scientists may assist mental healthcare professionals to alleviate the symptoms of users with SNA in early stages. Our idea, based on behavioral therapy, is to incrementally substitute highly addictive newsfeeds with safer, less addictive, and more supportive newsfeeds. To realize this idea, we propose a novel framework, called Newsfeed Substituting and Supporting System (N3S), for newsfeed filtering and dissemination in support of SNA interventions. New research challenges arise in 1) measuring the addictive degree of a newsfeed to an SNA patient, and 2) properly substituting addictive newsfeeds with safe ones based on psychological theories. To address these issues, we first propose the Addictive Degree Model (ADM) to measure the addictive degrees of newsfeeds to different users. We then formulate a new optimization problem aiming to maximize the efficacy of behavioral therapy without sacrificing user preferences. Accordingly, we design a randomized algorithm with a theoretical bound. A user study with 716 Facebook users and 11 mental healthcare professionals around the world shows that the addictive scores can be reduced by more than 30%. Moreover, experiments show that the correlation between the SNA scores and the addictive degrees quantified by the proposed model is much greater than that of state-of-the-art preference-based models.
4 citations
01 Jul 2018
TL;DR: Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH can achieve the same level of performance using only 15%-25% of the parameters compared with conventional feature hashing.
Abstract: Feature hashing is widely used to process large scale sparse features for learning of predictive models. Collisions inherently happen in the hashing process and hurt model performance. In this paper, we develop a feature hashing scheme called Cuckoo Feature Hashing (CCFH), based on the principle behind Cuckoo hashing, a hashing scheme designed to resolve collisions. By providing multiple possible hash locations for each feature, CCFH prevents collisions between predictive features by dynamically hashing them into alternative locations during model training. Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH can achieve the same level of performance using only 15%-25% of the parameters compared with conventional feature hashing.
4 citations
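The core idea borrowed from Cuckoo hashing is giving each feature two candidate buckets. The sketch below illustrates only this placement principle; the hash functions and the no-kick-out fallback are simplifications, not the paper's training-time relocation scheme:

```python
import hashlib

# Placement-principle sketch only: two candidate buckets per feature,
# cuckoo-style; real CCFH relocates features dynamically during training
# rather than overwriting on a double collision.

def two_hashes(feature, n_buckets):
    h = hashlib.md5(feature.encode()).digest()
    return (int.from_bytes(h[:8], "big") % n_buckets,
            int.from_bytes(h[8:], "big") % n_buckets)

def place_features(features, n_buckets):
    table = {}                         # bucket -> feature
    for f in features:
        h1, h2 = two_hashes(f, n_buckets)
        if h1 not in table:
            table[h1] = f
        elif h2 not in table:
            table[h2] = f              # alternative slot avoids the collision
        else:
            table[h1] = f              # double collision: overwrite (no kick-out)
    return table
```

With a single hash location, any two features sharing a bucket must collide; with two candidates, a collision occurs only when both slots are taken, which is what lets CCFH keep predictive features separated in a much smaller parameter table.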
TL;DR: The findings of leveraging human behavior patterns and spatial correlations among Wi-Fi access points to infer the location type of an SSID are reported, and experimental results demonstrate the effectiveness of the proposed schemes.
4 citations
TL;DR: New machine learning approaches are proposed for detecting outdated POI information via web-derived features, through classification and ranking, using a real-world dataset crawled from Yellow Pages websites.
4 citations
01 Jan 2018
TL;DR: This research proposes a semi-supervised career track labelling framework to automatically assign career tracks to a large set of jobs; it not only reduces the human annotation effort in maintaining career track knowledge databases over time and across different geographical regions, but also facilitates data science studies on career movements.
Abstract: A career track represents a vertical career pathway, where one can gradually move up to higher job appointments as relevant skills are acquired. Understanding the propensity of career movements in an evolving job market can enable timely career guidance to job seekers and working professionals. To this end, we harvest career trajectories from an online professional network (OPN). Our focus lies on obtaining a macro view of career movements at the track granularity. Specifically, we propose a semi-supervised career track labelling framework to automatically assign career tracks to a large set of jobs. To contextually label jobs, we collect example jobs with career track labels identified by human resource specialists and domain experts in Singapore. An intuitive idea is to learn the labelling knowledge from the example jobs and then apply it to jobs in the OPN. Unfortunately, such a small amount of labelled jobs presents a great challenge in our attempt to accurately recover career tracks for plentiful unlabelled jobs. We thus address the issue by resorting to semi-supervised learning methods. This research not only reduces the human annotation effort in maintaining career track knowledge databases over time and across different geographical regions, but also facilitates data science studies on career movements. Extensive experiments are conducted to demonstrate the labelling accuracy as well as to gain insights into the obtained career track labels.
3 citations
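The semi-supervised step can be pictured as label propagation over a job-similarity graph: a few expert-labelled jobs spread their tracks to unlabelled neighbours. The jobs, edges, and track names below are hypothetical, and the paper's framework is considerably more elaborate than this plain loop:

```python
# Hypothetical jobs, edges, and track labels; the paper's framework is
# considerably more elaborate than this plain label-propagation loop.

def propagate_labels(edges, seed_labels, iterations=10):
    """edges: {job: [similar jobs]}; seed_labels: {job: career track}."""
    labels = dict(seed_labels)
    for _ in range(iterations):
        updated = dict(labels)
        for job, neighbours in edges.items():
            if job in seed_labels:
                continue               # expert labels are never overwritten
            votes = {}
            for n in neighbours:
                if n in labels:
                    votes[labels[n]] = votes.get(labels[n], 0) + 1
            if votes:
                updated[job] = max(sorted(votes), key=votes.get)
        labels = updated
    return labels

edges = {
    "junior developer": ["software engineer"],
    "software engineer": ["junior developer", "senior software engineer"],
    "senior software engineer": ["software engineer"],
    "hr executive": ["hr manager"],
    "hr manager": ["hr executive"],
}
seeds = {"junior developer": "tech", "hr executive": "hr"}
tracks = propagate_labels(edges, seeds)
```

A handful of seeds is enough here because labels flow transitively along similarity edges, which mirrors why the framework can cope with the small set of expert-annotated jobs.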
17 Oct 2018
TL;DR: A novel framework is proposed to infer the occupancy of car trips by exploring characteristics of observed occupied trips; comprehensive experiments on real vehicle trajectories from self-employed drivers show that the proposed stop point classifier predicts stop point labels with high accuracy, and the proposed segmentation algorithm GS delivers the best accuracy with efficient running time.
Abstract: The knowledge of all occupied and unoccupied trips made by self-employed drivers is essential for optimized vehicle dispatch by ride-hailing services (e.g., Didi Dache, Uber, Lyft, Grab, etc.). However, the occupancy status of vehicles is not always known to the service operators due to the adoption of multiple ride-hailing apps. In this paper, we propose a novel framework, Learning to INfer Trips (LINT), to infer the occupancy of car trips by exploring characteristics of observed occupied trips. Two main research steps, stop point classification and structural segmentation, are included in LINT. In the stop point classification step, we represent a vehicle trajectory as a sequence of stop points, and assign stop points pick-up, drop-off, and intermediate labels. The classification of vehicle trajectory stop points produces a stop point label sequence. For structural segmentation, we further propose several segmentation algorithms, including greedy segmentation (GS), efficient greedy segmentation (EGS), and dynamic programming-based segmentation (DP), to infer occupied trips from stop point label sequences. Our comprehensive experiments on real vehicle trajectories from self-employed drivers show that (1) the proposed stop point classifier predicts stop point labels with high accuracy, and (2) the proposed segmentation algorithm GS delivers the best accuracy with efficient running time.
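A greedy pairing pass in the spirit of the GS step can be sketched as follows (details simplified; the label set and the pairing rule are assumptions for illustration, not LINT's exact algorithm): given a classified stop-point sequence, pair each pick-up with the next drop-off to recover occupied trip segments:

```python
# Simplified pairing rule: 'P' = pick-up, 'D' = drop-off, 'I' = intermediate
# stop; each pick-up is closed by the next drop-off.  The label set and the
# rule itself are assumptions for illustration, not LINT's exact GS.

def greedy_segments(labels):
    """Return (pickup_index, dropoff_index) pairs for occupied trips."""
    segments, start = [], None
    for i, lab in enumerate(labels):
        if lab == "P" and start is None:
            start = i                       # open an occupied trip
        elif lab == "D" and start is not None:
            segments.append((start, i))     # close it at the drop-off
            start = None
    return segments
```

Everything between a paired pick-up and drop-off is an inferred occupied trip; stops outside any pair fall into unoccupied cruising, which is exactly the structure the segmentation step recovers from noisy classifier output.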