Author

Ke Ji

Other affiliations: Beijing Jiaotong University
Bio: Ke Ji is an academic researcher from University of Jinan. The author has contributed to research in topics: Collaborative filtering & Recommender system. The author has an h-index of 8 and has co-authored 29 publications receiving 197 citations. Previous affiliations of Ke Ji include Beijing Jiaotong University.

Papers
Journal Article
TL;DR: A novel method for alleviating the cold start problem for new users and new items by incorporating content-based information about users and items, i.e., tags and keywords; it not only outperforms other state-of-the-art CF algorithms on historical data but also scales well to new data.
Abstract: The cold start problem for new users and new items is a major challenge facing most collaborative filtering systems. Existing collaborative filtering (CF) methods emphasize scaling to large, sparse datasets but lack a scalable approach to dealing with new data. In this paper, we consider a novel method for alleviating the problem by incorporating content-based information about users and items, i.e., tags and keywords. The user-item ratings imply the relevance of users' tags to items' keywords, so we convert the direct prediction on the user-item rating matrix into an indirect prediction on the tag-keyword relation matrix, which adapts to the emergence of new data. We first propose a novel neighborhood approach for building the tag-keyword relation matrix based on the statistics of tag-keyword pairs in the ratings. Then, with the relation matrix, we propose a 3-factor matrix factorization model over the rating matrix, for learning every user's interest vector for selected tags and every item's correlation vector for extracted keywords. Finally, we integrate the relation matrix with the two kinds of vectors to make recommendations. Experiments on a real dataset demonstrate that our method not only outperforms other state-of-the-art CF algorithms on historical data, but also scales well to new data.
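The final step of the abstract above, combining a user's tag-interest vector, the tag-keyword relation matrix, and an item's keyword vector, can be illustrated with a minimal sketch. It assumes the three factors are combined as a bilinear form (user vector times relation matrix times item vector); the abstract does not give the exact formula, so this form, the function name `predict_rating`, and the toy numbers are all hypothetical.

```python
def predict_rating(user_tag_vec, relation, item_kw_vec):
    """Score one user-item pair as p_u^T * R * q_i.

    Assumed form, not taken from the paper:
    user_tag_vec  -- user's interest weight for each tag
    relation      -- tag-keyword relation matrix (rows: tags, cols: keywords)
    item_kw_vec   -- item's correlation weight for each keyword
    """
    score = 0.0
    for t, p in enumerate(user_tag_vec):
        for k, q in enumerate(item_kw_vec):
            score += p * relation[t][k] * q
    return score


# Hypothetical toy data: 2 tags, 3 keywords.
relation = [
    [0.9, 0.1, 0.0],
    [0.2, 0.0, 0.8],
]
user_tag_vec = [1.0, 0.5]       # interest in tag 0 and tag 1
item_kw_vec = [1.0, 0.0, 1.0]   # item carries keyword 0 and keyword 2

score = predict_rating(user_tag_vec, relation, item_kw_vec)
```

Under this assumed form, the score depends on a user only through tags and on an item only through keywords, which is one way to read the abstract's claim that prediction on the relation matrix adapts to new data.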

45 citations

Journal Article
TL;DR: A malware detection method that uses the URLs visited by apps to identify malware; it can not only effectively detect malware discovered in different months of a certain year, but also detect potentially malicious apps in the third-party app market.

38 citations

Journal Article
TL;DR: A reconstructive method that compresses a low-rank approximation into a cluster-level rating pattern, referred to as a codebook, and then constructs an improved approximation by expanding the codebook. The method improves the prediction accuracy of state-of-the-art matrix factorization and social recommendation models.

37 citations

Proceedings Article
04 Jun 2018
TL;DR: A method that uses the URLs visited by applications to identify malicious apps and can not only effectively detect malware discovered in different months of a certain year, but also detect potentially malicious apps in the third-party app market.
Abstract: In recent years, the scale and diversity of malicious software on mobile networks are constantly increasing, thereby causing considerable danger to users' property and personal privacy. In this study, we devise a method that uses the URLs visited by applications to identify malicious apps. A multi-view neural network is used to create a malware detection model that emphasizes depth and width. This neural network can create multiple views of the input automatically and distribute soft attention weights to focus on different features of input. Multiple views preserve rich semantic information from input for classification without requiring complicated feature engineering. In addition, we conduct comprehensive experiments to compare the proposed method with others and verify the validity of the detection model. The experimental results show that our method has a certain timeliness. It can not only effectively detect malware discovered in different months of a certain year, but also detect potentially malicious apps in the third-party app market. We also compare the detection results of the proposed method on wild apps with 10 popular anti-virus scanners, and the final result shows that our approach ranks second in terms of detection performance.

23 citations

Journal Article
TL;DR: A topic-based probabilistic model named GIST is proposed to infer group activities and make group recommendations; experiments show that GIST significantly improves recommendation accuracy compared with the state-of-the-art methods.
Abstract: In this paper, a topic-based probabilistic model named GIST is proposed to infer group activities and make group recommendations. Compared with existing individual-based aggregation methods, it considers not only individual members’ interests but also some subgroups’ interests. Intuitively, when a group of users wants to take part in an activity, not every group member is decisive; instead, it is more likely that subgroups of members with close relationships lead to the final activity decision. That motivates our study on jointly considering individual members’ choices and subgroups’ choices for group recommendations. Based on this, our model uses two kinds of unshared topics to model individual members’ interests and subgroups’ interests separately, and then makes final recommendations according to the choices from the two aspects with a weight-based scheme. Moreover, the link information in the graph topology of the groups can be used to optimize the weights of our model. The experimental results on real-life data show that GIST significantly improves recommendation accuracy compared with the state-of-the-art methods.

23 citations


Cited by
Journal Article
TL;DR: The authors found that people are much more likely to believe stories that favor their preferred candidate, especially if they have ideologically segregated social media networks, and that the average American adult saw on the order of one or perhaps several fake news stories in the months around the 2016 U.S. presidential election, with just over half of those who recalled seeing them believing them.
Abstract: Following the 2016 U.S. presidential election, many have expressed concern about the effects of false stories (“fake news”), circulated largely through social media. We discuss the economics of fake news and present new data on its consumption prior to the election. Drawing on web browsing data, archives of fact-checking websites, and results from a new online survey, we find: (i) social media was an important but not dominant source of election news, with 14 percent of Americans calling social media their “most important” source; (ii) of the known false news stories that appeared in the three months before the election, those favoring Trump were shared a total of 30 million times on Facebook, while those favoring Clinton were shared 8 million times; (iii) the average American adult saw on the order of one or perhaps several fake news stories in the months around the election, with just over half of those who recalled seeing them believing them; and (iv) people are much more likely to believe stories that favor their preferred candidate, especially if they have ideologically segregated social media networks.

3,959 citations

Journal Article
TL;DR: This survey takes an interdisciplinary approach to cover studies related to CatBoost in a single work, and provides researchers an in-depth understanding to help clarify proper application of CatBoost in solving problems.
Abstract: Gradient Boosted Decision Trees (GBDT’s) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDT’s in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and learn best practices from studies that cast CatBoost in a positive light, as well as studies where CatBoost does not outshine other techniques, since we can learn lessons from both types of scenarios. Furthermore, as a Decision Tree based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to cover studies related to CatBoost in a single work. This provides researchers an in-depth understanding to help clarify proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.

247 citations

Journal Article
TL;DR: The recent hybrid CF-based recommendation techniques fusing social networks to solve data sparsity and high dimensionality are introduced and provide a novel point of view to improve the performance of RS, thereby presenting a useful resource in the state-of-the-art research result for future researchers.
Abstract: In the era of big data, recommender system (RS) has become an effective information filtering tool that alleviates information overload for Web users. Collaborative filtering (CF), as one of the most successful recommendation techniques, has been widely studied by various research institutions and industries and has been applied in practice. CF makes recommendations for the current active user using lots of users’ historical rating information without analyzing the content of the information resource. However, in recent years, data sparsity and high dimensionality brought by big data have negatively affected the efficiency of the traditional CF-based recommendation approaches. In CF, the context information, such as time information and trust relationships among the friends, is introduced into RS to construct a training model to further improve the recommendation accuracy and user’s satisfaction, and therefore, a variety of hybrid CF-based recommendation algorithms have emerged. In this paper, we mainly review and summarize the traditional CF-based approaches and techniques used in RS and study some recent hybrid CF-based recommendation approaches and techniques, including the latest hybrid memory-based and model-based CF recommendation algorithms. Finally, we discuss the potential impact that may improve the RS and future direction. In this paper, we aim at introducing the recent hybrid CF-based recommendation techniques fusing social networks to solve data sparsity and high dimensionality and provide a novel point of view to improve the performance of RS, thereby presenting a useful resource in the state-of-the-art research result for future researchers.

177 citations

Journal Article
TL;DR: A framework that characterizes context-aware recommendation processes in terms of the recommendation techniques used at every stage of the process and the techniques used to incorporate context, providing a clear understanding of the integration of context into recommender systems.
Abstract: Context-aware recommender systems leverage the value of recommendations by exploiting context information that affects user preferences and situations, with the goal of recommending items that are really relevant to changing user needs. Despite the importance of context-awareness in the recommender systems realm, researchers and practitioners lack guides that help them understand the state of the art and how to exploit context information to smarten up recommender systems. This paper presents the results of a comprehensive systematic literature review we conducted to survey context-aware recommenders and their mechanisms to exploit context information. The main contribution of this paper is a framework that characterizes context-aware recommendation processes in terms of: i) the recommendation techniques used at every stage of the process, ii) the techniques used to incorporate context, and iii) the stages of the process where context is integrated into the system. This systematic literature review provides a clear understanding about the integration of context into recommender systems, including context types more frequently used in the different application domains and validation mechanisms explained in terms of the used datasets, properties, metrics, and evaluation protocols. The paper concludes with a set of research opportunities in this field.

152 citations

Journal Article
TL;DR: This survey aims to address the challenges in DL-based Android malware detection and classification by systematically reviewing the latest progress, including FCN, CNN, RNN, DBN, AE, and hybrid models, and organize the literature according to the DL architecture.
Abstract: Deep Learning (DL) is a disruptive technology that has changed the landscape of cyber security research. Deep learning models have many advantages over traditional Machine Learning (ML) models, particularly when there is a large amount of data available. Android malware detection or classification qualifies as a big data problem because of the fast booming number of Android malware, the obfuscation of Android malware, and the potential protection of huge values of data assets stored on the Android devices. It seems a natural choice to apply DL on Android malware detection. However, there exist challenges for researchers and practitioners, such as choice of DL architecture, feature extraction and processing, performance evaluation, and even gathering adequate data of high quality. In this survey, we aim to address the challenges by systematically reviewing the latest progress in DL-based Android malware detection and classification. We organize the literature according to the DL architecture, including FCN, CNN, RNN, DBN, AE, and hybrid models. The goal is to reveal the research frontier, with the focus on representing code semantics for Android malware detection. We also discuss the challenges in this emerging field and provide our view of future research opportunities and directions.

151 citations