Author

Jing-Kai Lou

Bio: Jing-Kai Lou is an academic researcher from National Taiwan University. The author has contributed to research in topics: Computer science & Ranking. The author has an h-index of 7 and has co-authored 19 publications receiving 309 citations. Previous affiliations of Jing-Kai Lou include Academia Sinica & National Chiao Tung University.

Papers
Proceedings Article
01 Jan 2010
TL;DR: This team is the first prize winner of both tracks (all teams and student teams) of KDD Cup 2010 and combined results of student sub-teams by regularized linear regression.
Abstract: KDD Cup 2010 is an educational data mining competition. Participants are asked to learn a model from students' past behavior and then predict their future performance. At National Taiwan University, we organized a course for this competition. Most student sub-teams expanded features by various binarization and discretization techniques. The resulting sparse feature sets were trained by logistic regression (using LIBLINEAR). One sub-team considered condensed features using simple statistical techniques and applied Random Forest (through Weka) for training. Initial development was conducted on an internal split of training data for training and validation. We identified some useful feature combinations to improve performance. For the final submission, we combined results of student sub-teams by regularized linear regression. Our team is the first prize winner of both tracks (all teams and student teams) of KDD Cup 2010.

168 citations

Proceedings ArticleDOI
04 Aug 2014
TL;DR: This paper proposes a fairness-aware recommendation system based on one-class collaborative-filtering techniques for charity and micro-loan platforms such as Kiva.org that can largely improve the loan distribution fairness while retaining the accuracy of recommendations.
Abstract: To date, more than 15 billion US dollars have been invested in microfinance, benefiting more than 160 million people in developing countries. The Kiva organization is one of the successful examples that use a decentralized matching process to match lenders and borrowers. Interested lenders from around the world can look among thousands of applicants for cases they find promising to lend money to. But how can loan borrowers and lenders be successfully matched up in a microfinance platform like Kiva? We argue that a sophisticated recommender is highly demanded: one that not only pairs up loan lenders and borrowers in accordance with their preferences, but also helps to diversify the distribution of donations to reduce the inequality of loans, as altruism, like any resource, can be congestible. In this paper, we propose a fairness-aware recommendation system based on one-class collaborative-filtering techniques for charity and micro-loan platforms such as Kiva.org. Our experiments on a real dataset indicate that the proposed method can largely improve the loan distribution fairness while retaining the accuracy of recommendations.

44 citations

Proceedings ArticleDOI
13 May 2013
TL;DR: This paper reports the behavioral patterns observed in players of Fairyland Online during social interactions when playing as in-game avatars of their own real gender or gender-swapped, and discusses the effect of gender role and self-image in virtual social situations.
Abstract: Modern Massively Multiplayer Online Role-Playing Games (MMORPGs) provide lifelike virtual environments in which players can conduct a variety of activities including combat, trade, and chat with other players. While the game world and the available actions therein are inspired by their offline counterparts, the games' popularity and dedicated fan base are testaments to the allure of novel social interactions granted to people by allowing them an alternative life as a new character and persona. In this paper we investigate the phenomenon of "gender swapping," which refers to players choosing avatars of genders opposite to their natural ones. We report the behavioral patterns observed in players of Fairyland Online, a globally serviced MMORPG, during social interactions when playing as in-game avatars of their own real gender or gender-swapped. We also discuss the effect of gender role and self-image in virtual social situations and the potential of our study for improving MMORPG quality and detecting online identity frauds.

37 citations

Journal ArticleDOI
TL;DR: This paper analyzes revisitations in online games focusing on two types of revisitations: game revisitations and area revisitations, and discovers four main groups of area revisitation patterns.

15 citations

Proceedings ArticleDOI
11 Jan 2009
TL;DR: A collusion-resistant automation scheme for social moderation systems that detects whether an accusation from a user moderator is fair or malicious based on the structure of mutual accusations of all users in the system.
Abstract: For current Web 2.0 services, manual examination of user uploaded content is normally required to ensure its legitimacy and appropriateness, which is a substantial burden to service providers. To reduce labor costs and the delays caused by content censoring, social moderation has been proposed as a front-line mechanism, whereby user moderators are encouraged to examine content before system moderation is required. Given the immense amount of new content added to the Web each day, there is a need for automation schemes to facilitate rear system moderation. This kind of mechanism is expected to automatically summarize reports from user moderators and ban misbehaving users or remove inappropriate content whenever possible. However, the accuracy of such schemes may be reduced by collusion attacks, where some user moderators work together to mislead the automatic summarization in order to obtain shared benefits. In this paper, we propose a collusion-resistant automation scheme for social moderation systems. Because some user moderators may collude and dishonestly claim that a user misbehaves, our scheme detects whether an accusation from a user moderator is fair or malicious based on the structure of mutual accusations of all users in the system. Through simulations we show that collusion attacks are likely to succeed if an intuitive count-based automation scheme is used. The proposed scheme, which is based on the community structure of the user accusation graph, achieves a decent performance in most scenarios.

12 citations


Cited by
Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Proceedings Article
04 Dec 2017
TL;DR: It is proved that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain with a much smaller data size; the new GBDT implementation with GOSS and EFB is called LightGBM.
Abstract: Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time-consuming. To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve a quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.

4,977 citations

01 Jan 2012

3,692 citations

Journal ArticleDOI
TL;DR: This review pursues a twofold goal, to preserve and enhance the chronicles of recent advances in educational data mining (EDM), and provides an analysis of the EDM strengths, weaknesses, opportunities, and threats, whose factors represent, in a sense, future work to be fulfilled.
Abstract: This review pursues a twofold goal: the first is to preserve and enhance the chronicles of recent advances in educational data mining (EDM); the second is to organize, analyze, and discuss the content of the review based on the outcomes produced by a data mining (DM) approach. Thus, as a result of the selection and analysis of 240 EDM works, an EDM work profile was compiled to describe 222 EDM approaches and 18 tools. A profile of the EDM works was organized as a raw database, which was transformed into an ad-hoc database suitable to be mined. As a result of the execution of statistical and clustering processes, a set of educational functionalities was found, a realistic pattern of EDM approaches was discovered, and two patterns of value-instances to depict EDM approaches based on descriptive and predictive models were identified. One key finding is that most of the EDM approaches are grounded on a basic set composed of three kinds each of educational systems, disciplines, tasks, methods, and algorithms. The review concludes with a snapshot of the surveyed EDM works, and provides an analysis of the EDM strengths, weaknesses, opportunities, and threats, whose factors represent, in a sense, future work to be fulfilled.

414 citations

Proceedings Article
16 Mar 2016
TL;DR: This paper presents Ernest, a performance prediction framework for large-scale analytics; evaluation on Amazon EC2 using several workloads shows that the prediction error is low while the training overhead is less than 5% for long-running jobs.
Abstract: Recent workload trends indicate rapid growth in the deployment of machine learning, genomics and scientific workloads on cloud computing infrastructure. However, efficiently running these applications on shared infrastructure is challenging and we find that choosing the right hardware configuration can significantly improve performance and cost. The key to address the above challenge is having the ability to predict performance of applications under various resource configurations so that we can automatically choose the optimal configuration. Our insight is that a number of jobs have predictable structure in terms of computation and communication. Thus we can build performance models based on the behavior of the job on small samples of data and then predict its performance on larger datasets and cluster sizes. To minimize the time and resources spent in building a model, we use optimal experiment design, a statistical technique that allows us to collect as few training points as required. We have built Ernest, a performance prediction framework for large scale analytics and our evaluation on Amazon EC2 using several workloads shows that our prediction error is low while having a training overhead of less than 5% for long-running jobs.

401 citations