Author

Jing-Kai Lou

Other affiliations: Academia Sinica, National Chiao Tung University

Bio: Jing-Kai Lou is an academic researcher from National Taiwan University. The author has contributed to research on the topics of computer science and ranking. The author has an h-index of 7 and has co-authored 19 publications receiving 309 citations. Previous affiliations of Jing-Kai Lou include Academia Sinica and National Chiao Tung University.

Papers

Proceedings Article

Feature Engineering and Classifier Ensemble for KDD Cup 2010

Hsiang-Fu Yu¹, Hung-Yi Lo², Hsun-Ping Hsieh¹, Jing-Kai Lou², Todd G. McKenzie, Jung-Wei Chou¹, Po-Han Chung, Chia-Hua Ho¹, Chun-Fu Chang¹, Jui-Yu Weng¹, En-Syu Yan, Che-Wei Chang, Tsung-Ting Kuo¹, Chien-Yuan Wang¹, Yi-Hung Huang, Yu-Xun Ruan¹, Yu-Shi Lin, Shou-De Lin¹, Hsuan-Tien Lin¹, Chih-Jen Lin¹

National Taiwan University¹, Academia Sinica²

01 Jan 2010

TL;DR: The team won first prize in both tracks (all teams and student teams) of KDD Cup 2010, combining the results of student sub-teams by regularized linear regression.

Abstract: KDD Cup 2010 is an educational data mining competition. Participants are asked to learn a model from students' past behavior and then predict their future performance. At National Taiwan University, we organized a course for this competition. Most student sub-teams expanded features by various binarization and discretization techniques. The resulting sparse feature sets were trained by logistic regression (using LIBLINEAR). One sub-team considered condensed features using simple statistical techniques and applied Random Forest (through Weka) for training. Initial development was conducted on an internal split of training data for training and validation. We identified some useful feature combinations to improve performance. For the final submission, we combined results of student sub-teams by regularized linear regression. Our team is the first prize winner of both tracks (all teams and student teams) of KDD Cup 2010.

168 citations

Proceedings Article

Fairness-Aware Loan Recommendation for Microfinance Services

Eric L. Lee¹, Jing-Kai Lou², Chen Wei-Ming¹, Yen-Chi Chen¹, Shou-De Lin¹, Yen-Sheng Chiang³, Kuan-Ta Chen²

National Taiwan University¹, Academia Sinica², The Chinese University of Hong Kong³

04 Aug 2014

TL;DR: This paper proposes a fairness-aware recommendation system based on one-class collaborative-filtering techniques for charity and micro-loan platforms such as Kiva.org; it can largely improve loan distribution fairness while retaining the accuracy of recommendations.

Abstract: To date, more than 15 billion US dollars have been invested in microfinance, benefiting more than 160 million people in developing countries. The Kiva organization is one of the successful examples that use a decentralized matching process to match lenders and borrowers. Interested lenders from around the world can search among thousands of applicants for cases they find promising to lend money to. But how can loan borrowers and lenders be successfully matched up on a microfinance platform like Kiva? We argue that there is strong demand for a sophisticated recommender that not only pairs up lenders and borrowers in accordance with their preferences but also helps diversify the distribution of donations to reduce the inequality of loans, since altruism, like any resource, can be congestible. In this paper, we propose a fairness-aware recommendation system based on one-class collaborative-filtering techniques for charity and micro-loan platforms such as Kiva.org. Our experiments on a real dataset indicate that the proposed method can largely improve loan distribution fairness while retaining the accuracy of recommendations.

44 citations

Proceedings Article

Gender swapping and user behaviors in online social games

Jing-Kai Lou¹, Kunwoo Park², Meeyoung Cha², Juyong Park², Chin-Laung Lei¹, Kuan-Ta Chen³

National Taiwan University¹, KAIST², Academia Sinica³

13 May 2013

TL;DR: This paper reports the behavioral patterns observed in players of Fairyland Online during social interactions when playing as in-game avatars of their own real gender or gender-swapped, and discusses the effect of gender role and self-image in virtual social situations.

Abstract: Modern Massively Multiplayer Online Role-Playing Games (MMORPGs) provide lifelike virtual environments in which players can conduct a variety of activities including combat, trade, and chat with other players. While the game world and the available actions therein are inspired by their offline counterparts, the games' popularity and dedicated fan base are testaments to the allure of novel social interactions granted to people by allowing them an alternative life as a new character and persona. In this paper we investigate the phenomenon of "gender swapping," which refers to players choosing avatars of genders opposite to their natural ones. We report the behavioral patterns observed in players of Fairyland Online, a globally serviced MMORPG, during social interactions when playing as in-game avatars of their own real gender or gender-swapped. We also discuss the effect of gender role and self-image in virtual social situations and the potential of our study for improving MMORPG quality and detecting online identity fraud.

37 citations

Journal Article

Analysis of revisitations in online games

Ruck Thawonmas¹, Keisuke Yoshida¹, Jing-Kai Lou², Kuan-Ta Chen²

Ritsumeikan University¹, Academia Sinica²

01 Jan 2011 - Entertainment Computing

TL;DR: This paper analyzes revisitations in online games focusing on two types of revisitations: game revisitations and area revisitations, and discovers four main groups of area revisitation patterns.

15 citations

Proceedings Article

A Collusion-Resistant Automation Scheme for Social Moderation Systems

Jing-Kai Lou¹, Kuan-Ta Chen², Chin-Laung Lei¹

National Taiwan University¹, Academia Sinica²

11 Jan 2009

TL;DR: This paper proposes a collusion-resistant automation scheme for social moderation systems that detects whether an accusation from a user moderator is fair or malicious based on the structure of mutual accusations of all users in the system.

Abstract: For current Web 2.0 services, manual examination of user-uploaded content is normally required to ensure its legitimacy and appropriateness, which is a substantial burden to service providers. To reduce labor costs and the delays caused by content censoring, social moderation has been proposed as a front-line mechanism, whereby user moderators are encouraged to examine content before system moderation is required. Given the immense amount of new content added to the Web each day, there is a need for automation schemes to facilitate rear system moderation. This kind of mechanism is expected to automatically summarize reports from user moderators and ban misbehaving users or remove inappropriate content whenever possible. However, the accuracy of such schemes may be reduced by collusion attacks, where some users work together to mislead the automatic summarization in order to obtain shared benefits. In this paper, we propose a collusion-resistant automation scheme for social moderation systems. Because some user moderators may collude and dishonestly claim that a user misbehaves, our scheme detects whether an accusation from a user moderator is fair or malicious based on the structure of mutual accusations of all users in the system. Through simulations we show that collusion attacks are likely to succeed if an intuitive count-based automation scheme is used. The proposed scheme, which is based on the community structure of the user accusation graph, achieves decent performance in most scenarios.

12 citations

Cited by

Proceedings Article

Random graphs

Alan Frieze¹

Carnegie Mellon University¹

22 Jan 2006

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Proceedings Article

LightGBM: a highly efficient gradient boosting decision tree

Guolin Ke¹, Qi Meng², Thomas Finley¹, Taifeng Wang¹, Wei Chen¹, Weidong Ma¹, Qiwei Ye¹, Tie-Yan Liu¹

Microsoft¹, Peking University²

04 Dec 2017

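TL;DR: This paper proposes two techniques, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), and proves that, since data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain with a much smaller data size. The resulting GBDT implementation is called LightGBM.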

Abstract: Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time consuming. To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve a quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.

4,977 citations

Social Network Analysis

Tom A. B. Snijders

01 Jan 2012

3,692 citations

Journal Article

Review: Educational data mining: A survey and a data mining-based analysis of recent works

Alejandro Peña-Ayala¹

Instituto Politécnico Nacional¹

01 Mar 2014 - Expert Systems With Applications

TL;DR: This review pursues a twofold goal, to preserve and enhance the chronicles of recent educational data mining (EDM) advances development, and provides an analysis of the EDM strengths, weakness, opportunities, and threats, whose factors represent, in a sense, future work to be fulfilled.

Abstract: This review pursues a twofold goal: the first is to preserve and enhance the chronicles of recent educational data mining (EDM) advances development; the second is to organize, analyze, and discuss the content of the review based on the outcomes produced by a data mining (DM) approach. Thus, as a result of the selection and analysis of 240 EDM works, an EDM work profile was compiled to describe 222 EDM approaches and 18 tools. A profile of the EDM works was organized as a raw data base, which was transformed into an ad-hoc data base suitable to be mined. As a result of the execution of statistical and clustering processes, a set of educational functionalities was found, a realistic pattern of EDM approaches was discovered, and two patterns of value-instances to depict EDM approaches based on descriptive and predictive models were identified. One key finding is that most of the EDM approaches are grounded on a basic set composed of three kinds of educational systems, disciplines, tasks, methods, and algorithms each. The review concludes with a snapshot of the surveyed EDM works, and provides an analysis of the EDM strengths, weaknesses, opportunities, and threats, whose factors represent, in a sense, future work to be fulfilled.

414 citations

Proceedings Article

Ernest: efficient performance prediction for large-scale advanced analytics

Shivaram Venkataraman¹, Zongheng Yang¹, Michael J. Franklin¹, Benjamin Recht¹, Ion Stoica¹

University of California, Berkeley¹

16 Mar 2016

TL;DR: This paper presents Ernest, a performance prediction framework for large-scale analytics; evaluation on Amazon EC2 using several workloads shows that the prediction error is low, with a training overhead of less than 5% for long-running jobs.

Abstract: Recent workload trends indicate rapid growth in the deployment of machine learning, genomics and scientific workloads on cloud computing infrastructure. However, efficiently running these applications on shared infrastructure is challenging, and we find that choosing the right hardware configuration can significantly improve performance and cost. The key to addressing this challenge is the ability to predict the performance of applications under various resource configurations so that we can automatically choose the optimal configuration. Our insight is that a number of jobs have predictable structure in terms of computation and communication. Thus we can build performance models based on the behavior of the job on small samples of data and then predict its performance on larger datasets and cluster sizes. To minimize the time and resources spent in building a model, we use optimal experiment design, a statistical technique that allows us to collect as few training points as required. We have built Ernest, a performance prediction framework for large-scale analytics, and our evaluation on Amazon EC2 using several workloads shows that our prediction error is low while having a training overhead of less than 5% for long-running jobs.

401 citations
