Journal Article

Efficient Tag Recommendation for Real-Life Data

TL;DR: A hybrid tag recommendation system together with a scalable, highly efficient system architecture that is able to utilize user feedback to tune its parameters to specific characteristics of the underlying tagging system and adapt the recommendation models to newly added content.
Abstract: Despite all of the advantages of tags as an easy and flexible information management approach, tagging is a cumbersome task. A set of descriptive tags has to be manually entered by users whenever they post a resource. This process can be simplified by the use of tag recommendation systems. Their objective is to suggest potentially useful tags to the user. We present a hybrid tag recommendation system together with a scalable, highly efficient system architecture. The system is able to utilize user feedback to tune its parameters to specific characteristics of the underlying tagging system and adapt the recommendation models to newly added content. The evaluation of the system on six real-life datasets demonstrated the system’s ability to combine tags from various sources (e.g., resource content or tags previously used by the user) to achieve the best quality of recommended tags. It also confirmed the importance of parameter tuning and content adaptation. A series of additional experiments allowed us to better understand the characteristics of the system and tagging datasets and to determine the potential areas for further system development.
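The abstract describes combining candidate tags from several sources (for example, resource content and the user's previously used tags) and tuning parameters from user feedback. Below is a minimal sketch of one way to merge per-source candidate scores with tunable weights. The linear weighting, the `combine_tag_sources` name, and the example scores are assumptions for illustration. This is not the paper's merging scheme.

```python
from collections import defaultdict


def combine_tag_sources(sources, weights, top_n=5):
    """Merge scored tag candidates from several sources into one ranked list.

    sources: {source_name: {tag: score in [0, 1]}}, e.g. title terms, user profile
    weights: {source_name: weight}; hypothetical values that a feedback loop would tune
    """
    combined = defaultdict(float)
    for name, candidates in sources.items():
        w = weights.get(name, 0.0)
        if w == 0:
            continue  # an unweighted source contributes no candidates
        for tag, score in candidates.items():
            combined[tag] += w * score
    # Highest combined score first; ties broken alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


sources = {
    "title": {"python": 0.8, "tutorial": 0.4},
    "user_profile": {"python": 0.6, "programming": 0.7},
}
print(combine_tag_sources(sources, {"title": 0.5, "user_profile": 0.5}))
# ['python', 'programming', 'tutorial']
```

Tuning from user feedback could then adjust `weights`, for instance by a search over weight values scored against the tags users actually accepted. That loop is not shown.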
Citations
Journal Article
TL;DR: The libFM tool is a software implementation for factorization machines that features stochastic gradient descent (SGD) and alternating least-squares (ALS) optimization, as well as Bayesian inference using Markov Chain Monte Carlo (MCMC).
Abstract: Factorization approaches provide high accuracy in several important prediction problems, for example, recommender systems. However, applying factorization approaches to a new prediction problem is a nontrivial task and requires a lot of expert knowledge. Typically, a new model is developed, a learning algorithm is derived, and the approach has to be implemented. Factorization machines (FM) are a generic approach since they can mimic most factorization models just by feature engineering. This way, factorization machines combine the generality of feature engineering with the superiority of factorization models in estimating interactions between categorical variables of large domain. libFM is a software implementation for factorization machines that features stochastic gradient descent (SGD) and alternating least-squares (ALS) optimization, as well as Bayesian inference using Markov Chain Monte Carlo (MCMC). This article summarizes the recent research on factorization machines both in terms of modeling and learning, provides extensions for the ALS and MCMC algorithms, and describes the software tool libFM.

1,271 citations
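The libFM entry says factorization machines estimate interactions between categorical variables. As a minimal sketch, the standard degree-2 FM model predicts ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. Its pairwise term can be rewritten to cost O(k·n) instead of O(k·n²). The function names below are hypothetical. Nothing here uses libFM itself, and no learning (SGD, ALS, or MCMC) is shown. The naive version exists only to check the rewriting.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine prediction in O(k * n).

    x: feature vector (length n); w0: global bias; w: linear weights (length n)
    V: n x k latent factor matrix, one row v_i per feature
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, evaluating every feature pair explicitly (O(k * n^2))."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


x = [1.0, 0.0, 1.0]  # e.g. one-hot user indicator plus one item indicator
w0, w = 0.1, [0.2, 0.3, 0.4]
V = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Both functions return the same value (about -0.8 for these inputs, up to floating-point rounding). With sparse inputs, the fast version only needs to visit nonzero features.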

Journal Article
01 Apr 2017
TL;DR: This article proposes a taxonomy for tag recommendation methods, classifying them according to the target of the recommendations, their objectives, exploited data sources, and underlying techniques, and provides a critical overview of these methods.
Abstract: Tags (keywords freely assigned by users to describe web content) have become highly popular in Web 2.0 applications because they give users a strong incentive and an easy way to create and describe their own content. This increase in tag popularity has led to a vast literature on tag recommendation methods. These methods aim to assist users in the tagging process, possibly increasing the quality of the generated tags and, consequently, improving the quality of the information retrieval (IR) services that rely on tags as data sources. Despite the numerous and diverse previous studies on tag recommendation, to our knowledge, no previous work has summarized and organized them into a single survey article. In this article, we propose a taxonomy for tag recommendation methods, classifying them according to the target of the recommendations, their objectives, exploited data sources, and underlying techniques. Moreover, we provide a critical overview of these methods, pointing out their advantages and disadvantages. Finally, we describe the main open challenges related to the field, such as tag ambiguity, cold start, and evaluation issues.

62 citations


Cites background from "Efficient Tag Recommendation for Re..."

  • ...Lipczak, Hu, Kollet, and Milios (2009) and Lipczak and Milios (2011) proposed a hybrid method that extracts terms from the title and description of the target object (a content-based technique) and then expands the set of candidate tags by exploiting tag co-occurrences....


  • ...¹ Some references (e.g., Lipczak & Milios, 2011; Rendle & Schmidt-Thieme, 2010) use the term resource instead of object....


  • ...…recommendation (Belém et al., 2014, 2016; Cao et al., 2009; Gemmell, Schimoler, Mobasher, & Burke, 2010; Heymann et al., 2008; Lin et al., 2012; Lipczak & Milios, 2011; Lipczak et al., 2009; Lu et al., 2009; Martins et al., 2013, 2015; Menezes et al., 2010; Rendle, Balby Marinho, Nanopoulos, &…...
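The first snippet summarizes a hybrid that takes terms from the object's title as content-based candidates and then adds tags that co-occur with them. The sketch below illustrates that two-step idea under two assumptions: co-occurrence is scored as the conditional frequency P(t | s) over previous posts, and only title words already seen as tags are kept as seeds. The scoring and the function names are assumptions, not the cited authors' exact formulas.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(posts):
    """Count tag frequencies and how often each ordered tag pair shares a post."""
    co = defaultdict(Counter)
    freq = Counter()
    for tags in posts:
        unique = set(tags)
        freq.update(unique)
        for a, b in permutations(unique, 2):
            co[a][b] += 1
    return co, freq


def expand_candidates(title, co, freq, top_n=3):
    # Content-based step: title words that already exist as tags become seeds.
    seeds = {w.lower() for w in title.split() if w.lower() in freq}
    scores = Counter()
    for s in seeds:
        scores[s] += 1.0
        # Expansion step: add co-occurring tags weighted by P(t | s).
        for t, c in co[s].items():
            scores[t] += c / freq[s]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_n]]


posts = [["python", "programming"], ["python", "tutorial"], ["python", "programming", "code"]]
co, freq = build_cooccurrence(posts)
print(expand_candidates("Learning Python fast", co, freq))
```

For this toy data, "python" comes from the title, and "programming" ranks next because it co-occurs with "python" in two of its three posts.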


Journal Article
TL;DR: A review of tag recommendation systems and the constraints that affect existing tag recommendation systems is presented, and the use of a spreading activation algorithm to study the role of a constructed topic ontology in efficient tag recommendation is proposed.
Abstract: The advent of high-speed Internet connections has revolutionized the way research is carried out to obtain relevant information. However, retrieving pertinent information from the copious resources available is not only difficult but also time-consuming. In recent years, tagging activity has been perceived as a potential source of knowledge about personal preferences, interests, targets, goals, and other attributes. Tags allow users to annotate resources effectively with keywords, personalize their recommendations, and organize resources for easy retrieval. However, user preferences vary widely, which can make tagging counterproductive. These shortcomings limit the use of tagging systems for information filtering and retrieval. A tag recommendation system becomes useful by suggesting a set of relevant keywords to annotate resources. This paper presents a review of tag recommendation systems and the constraints that affect existing ones. Furthermore, we propose the use of a spreading activation algorithm to study the role of a constructed topic ontology in efficient tag recommendation. This approach assumes that the tags recommended to the user can be predicted from keywords extracted from existing blogs and from the topics in the constructed topic ontology. We have also proposed a tag classification system that combines Correlation-based Feature Selection with a Hybrid Genetic Algorithm and a hybrid genetic algorithm-support vector machine (HGA-SVM) classifier, and have compared its results with those produced by other existing feature selection methods. The experimental results are presented. WIREs Data Mining Knowl Discov 2015, 5:87-112. doi: 10.1002/widm.1149

53 citations
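The entry above proposes spreading activation over a constructed topic ontology. The generic spreading-activation pass below works under simple assumptions:

- The ontology is an adjacency list.
- Seed keywords extracted from a post carry initial activation.
- Each node passes a decayed share of its energy evenly to its neighbors.
- Propagation stops when shares fall below a threshold or after a fixed number of steps.

The decay, threshold, graph, and function name are hypothetical and are not taken from the paper.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Propagate activation from seed nodes through a concept graph.

    graph: {node: [neighbor, ...]} (hypothetical topic ontology)
    seeds: {node: initial activation}, e.g. keywords extracted from a blog post
    Returns (node, activation) pairs, most activated first.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for node, energy in frontier.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            share = energy * decay / len(neighbors)  # decayed energy split evenly
            if share < threshold:
                continue  # too weak to keep spreading
            for nb in neighbors:
                next_frontier[nb] = next_frontier.get(nb, 0.0) + share
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))


graph = {"python": ["programming", "snake"], "programming": ["software"]}
print(spread_activation(graph, {"python": 1.0}))
```

The highly activated concepts (here "programming" and "snake" after "python") would then be candidate tags. Disambiguating them, for example by also seeding other keywords from the same post, is left out.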

Journal Article
TL;DR: This work proposes new heuristic methods that extend state-of-the-art strategies by including new metrics that estimate how accurately a candidate tag describes the target object and finds that the best personalized method outperforms the best object-centered strategy, with average gains in precision of 10%.
Abstract: Several Web 2.0 applications allow users to assign keywords (or tags) to provide better organization and description of the shared content. Tag recommendation methods may assist users in this task, improving the quality of the available information and, thus, the effectiveness of various tag-based information retrieval services, such as searching, content recommendation and classification. This work addresses the tag recommendation problem from two perspectives. The first perspective, centered at the object, aims at suggesting relevant tags to a target object, jointly exploiting the following three dimensions: (i) tag co-occurrences, (ii) terms extracted from multiple textual features (e.g., title, description), and (iii) various metrics to estimate tag relevance. The second perspective, centered at both object and user, aims at performing personalized tag recommendation to a target object-user pair, exploiting, in addition to the three aforementioned dimensions, a metric that captures user interests. In particular, we propose new heuristic methods that extend state-of-the-art strategies by including new metrics that estimate how accurately a candidate tag describes the target object. We also exploit three learning-to-rank (L2R) based techniques, namely, RankSVM, Genetic Programming (GP) and Random Forest (RF), for generating ranking functions that exploit multiple metrics as attributes to estimate the relevance of a tag to a given object or object-user pair. We evaluate the proposed methods using data from four popular Web 2.0 applications, namely, Bibsonomy, LastFM, YouTube and YahooVideo. Our new heuristics for object-centered tag recommendation provide improvements in precision over the best state-of-the-art alternative of 12% on average (up to 20% in any single dataset), while our new heuristics for personalized tag recommendation produce average gains in precision of 121% over the baseline. 
Similar performance gains are also achieved in terms of other metrics, notably recall, Normalized Discounted Cumulative Gain (NDCG) and Mean-Reciprocal Rank (MRR). Further improvements, for both object-centered (up to 23% in precision) and personalized tag recommendation (up to 13% in precision), can also be achieved with our new L2R-based strategies, which are flexible and can be easily extended to exploit other aspects of the tag recommendation problem. Finally, we also quantify the benefits of personalized tag recommendation to provide better descriptions of the target object when compared to object-centered recommendation by focusing only on the relevance of the suggested tags to the object. We find that our best personalized method outperforms the best object-centered strategy, with average gains in precision of 10%.

43 citations
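The entry above reports precision, recall, NDCG, and MRR. The sketch below computes these metrics for ranked tag lists. It assumes binary relevance (a tag is either in the user's actual tag set or not) and a cutoff k. The function names are hypothetical, and the paper's exact evaluation protocol is not reproduced.

```python
import math


def precision_at_k(recommended, relevant, k):
    return sum(1 for t in recommended[:k] if t in relevant) / k


def recall_at_k(recommended, relevant, k):
    return sum(1 for t in recommended[:k] if t in relevant) / len(relevant)


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant tag, or 0 if none is recommended."""
    for rank, t in enumerate(recommended, start=1):
        if t in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (recommended_list, relevant_set) pairs, one per test post."""
    return sum(reciprocal_rank(rec, rel) for rec, rel in runs) / len(runs)


def ndcg_at_k(recommended, relevant, k):
    # Binary gain, discounted by log2(rank + 1); normalized by the best possible ordering.
    dcg = sum(1.0 / math.log2(r + 1)
              for r, t in enumerate(recommended[:k], start=1) if t in relevant)
    idcg = sum(1.0 / math.log2(r + 1) for r in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0


rec, rel = ["a", "b", "c", "d"], {"b", "d", "e"}
print(precision_at_k(rec, rel, 4), recall_at_k(rec, rel, 4), ndcg_at_k(rec, rel, 4))
```

Precision and recall ignore order within the top k, while reciprocal rank and NDCG reward placing relevant tags early. This is why the evaluation reports both families.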


Cites background from "Efficient Tag Recommendation for Re..."

  • ...For both LATRE and LATRE + DP (as well as for the L2R-based strategies), we set ℓ = 3, as in Menezes et al. (2010). Parameters rmin and hmin directly impact the number of association rules generated, thus affecting the processing time of the recommender....
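The snippet refers to LATRE, whose association rules are controlled by a size parameter (set to 3 there) and two thresholds. The sketch below mines rules lazily for one target object. It assumes the size parameter bounds the number of tags in a rule's antecedent and that the two thresholds are a minimum support and a minimum confidence. The parameter names, the absolute-count support, and the ranking of tags by summed rule confidence are illustrative choices, not LATRE's exact definitions.

```python
from itertools import combinations


def recommend_by_rules(initial_tags, posts, max_antecedent=3, min_support=2,
                       min_confidence=0.3, top_n=3):
    """Lazily mine rules X -> y with X drawn from the target object's initial tags.

    posts: tag collections of previously tagged objects (the training data)
    max_antecedent: largest |X| considered (hypothetical name for the size parameter)
    min_support: minimum number of posts containing X together with y
    min_confidence: minimum fraction of posts containing X that also contain y
    """
    posts = [set(p) for p in posts]
    seeds = sorted(set(initial_tags))
    seed_set = set(seeds)
    scores = {}
    for size in range(1, min(max_antecedent, len(seeds)) + 1):
        for antecedent in combinations(seeds, size):
            x = set(antecedent)
            covering = [p for p in posts if x <= p]
            if len(covering) < min_support:
                continue  # no rule with this antecedent can reach min_support
            counts = {}
            for p in covering:
                for y in p - seed_set:  # only suggest tags the object does not have yet
                    counts[y] = counts.get(y, 0) + 1
            for y, support in counts.items():
                confidence = support / len(covering)
                if support >= min_support and confidence >= min_confidence:
                    scores[y] = scores.get(y, 0.0) + confidence
    return sorted(scores, key=lambda y: (-scores[y], y))[:top_n]


posts = [{"python", "code", "tutorial"}, {"python", "code"}, {"python", "tutorial"}, {"java", "code"}]
print(recommend_by_rules(["python"], posts))
```

Raising either threshold prunes rules and shortens processing. This matches the snippet's remark that these parameters drive the number of generated rules.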


Book
05 Dec 2012
TL;DR: This comprehensive text/reference examines in depth the synergy between multimedia content analysis, personalization, and next-generation networking and demonstrates how this integration can result in robust, personalized services that provide users with an improved multimedia-centric quality of experience.
Abstract: This comprehensive text/reference examines in depth the synergy between multimedia content analysis, personalization, and next-generation networking. The book demonstrates how this integration can result in robust, personalized services that provide users with an improved multimedia-centric quality of experience. Each chapter offers a practical step-by-step walkthrough for a variety of concepts, components and technologies relating to the development of applications and services. Topics and features: introduces the fundamentals of social media retrieval, presenting the most important areas of research in this domain; examines the important topic of multimedia tagging in social environments, including geo-tagging; discusses issues of personalization and privacy in social media; reviews advances in encoding, compression and network architectures for the exchange of social media information; describes a range of applications related to social media.

33 citations

References
Journal Article
TL;DR: A dynamic model of collaborative tagging is presented that predicts regularities in user activity, tag frequencies, kinds of tags used, bursts of popularity in bookmarking and a remarkable stability in the relative proportions of tags within a given URL.
Abstract: Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content. Recently, collaborative tagging has grown in popularity on the web, on sites that allow users to tag bookmarks, photographs and other content. In this paper we analyze the structure of collaborative tagging systems as well as their dynamic aspects. Specifically, we discovered regularities in user activity, tag frequencies, kinds of tags used, bursts of popularity in bookmarking and a remarkable stability in the relative proportions of tags within a given URL. We also present a dynamic model of collaborative tagging that predicts these stable patterns and relates them to imitation and shared knowledge.

1,965 citations

Proceedings Article
04 Feb 2010
TL;DR: The factorization model PITF (Pairwise Interaction Tensor Factorization) is presented, which is a special case of the TD model with linear runtime for both learning and prediction; experiments show that this model largely outperforms TD in runtime and can even achieve better prediction quality.
Abstract: Tagging plays an important role in many recent websites. Recommender systems can help by suggesting to a user the tags they might want to use for tagging a specific item. Factorization models based on the Tucker Decomposition (TD) model have been shown to provide high-quality tag recommendations, outperforming other approaches like PageRank, FolkRank, collaborative filtering, etc. The problem with TD models is the cubic core tensor, which results in a runtime that is cubic in the factorization dimension for both prediction and learning. In this paper, we present the factorization model PITF (Pairwise Interaction Tensor Factorization), which is a special case of the TD model with linear runtime for both learning and prediction. PITF explicitly models the pairwise interactions between users, items and tags. The model is learned with an adaptation of the Bayesian personalized ranking (BPR) criterion, which was originally introduced for item recommendation. Empirically, we show on real-world datasets that this model largely outperforms TD in runtime and can even achieve better prediction quality. Besides our lab experiments, PITF also won the ECML/PKDD Discovery Challenge 2009 for graph-based tag recommendation.

705 citations
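The PITF entry describes a model built from user-tag and item-tag interactions and learned with a BPR-style pairwise ranking criterion. The toy sketch below follows that description. Each positive tag of a post is contrasted with one uniformly sampled negative tag, and a gradient-ascent step is taken on ln σ(ŷ_pos − ŷ_neg) with L2 regularization. The class name and all hyperparameters (factor size, learning rate, regularization, initialization) are illustrative assumptions, not the authors' settings.

```python
import math
import random


class PITF:
    """Toy PITF: user-tag plus item-tag pairwise interactions, trained with BPR-style SGD."""

    def __init__(self, n_users, n_items, n_tags, k=8, lr=0.05, reg=5e-5, init_std=0.1, seed=0):
        self.rng = random.Random(seed)

        def init(n):
            return [[self.rng.gauss(0.0, init_std) for _ in range(k)] for _ in range(n)]

        self.U = init(n_users)   # user factors
        self.I = init(n_items)   # item factors
        self.TU = init(n_tags)   # tag factors that interact with users
        self.TI = init(n_tags)   # tag factors that interact with items
        self.n_tags, self.lr, self.reg = n_tags, lr, reg

    @staticmethod
    def _dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def score(self, u, i, t):
        # A user-item term would be identical for every tag, so it cannot change the ranking.
        return self._dot(self.U[u], self.TU[t]) + self._dot(self.I[i], self.TI[t])

    def sgd_step(self, u, i, pos, neg):
        """One ascent step on ln sigmoid(score(pos) - score(neg)) minus an L2 penalty."""
        diff = self.score(u, i, pos) - self.score(u, i, neg)
        delta = 1.0 / (1.0 + math.exp(diff))  # equals 1 - sigmoid(diff)
        lr, reg = self.lr, self.reg
        for f in range(len(self.U[u])):
            uf, itf = self.U[u][f], self.I[i][f]
            tup, tun = self.TU[pos][f], self.TU[neg][f]
            tip, tin = self.TI[pos][f], self.TI[neg][f]
            self.U[u][f] += lr * (delta * (tup - tun) - reg * uf)
            self.I[i][f] += lr * (delta * (tip - tin) - reg * itf)
            self.TU[pos][f] += lr * (delta * uf - reg * tup)
            self.TU[neg][f] += lr * (-delta * uf - reg * tun)
            self.TI[pos][f] += lr * (delta * itf - reg * tip)
            self.TI[neg][f] += lr * (-delta * itf - reg * tin)

    def fit(self, posts, epochs=100):
        """posts: list of (user, item, set_of_tags); one sampled negative per positive tag."""
        for _ in range(epochs):
            for u, i, tags in posts:
                for pos in tags:
                    neg = self.rng.randrange(self.n_tags)
                    if neg not in tags:
                        self.sgd_step(u, i, pos, neg)

    def recommend(self, u, i, top_n=3):
        return sorted(range(self.n_tags), key=lambda t: -self.score(u, i, t))[:top_n]
```

Each prediction and each update touches only O(k) parameters. This is the linear runtime the abstract contrasts with the cubic cost of the TD core tensor.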

Book Chapter
17 Sep 2007
TL;DR: In this paper, the authors evaluate and compare two recommendation algorithms on large-scale real-life datasets: an adaptation of user-based collaborative filtering and a graph-based recommender built on top of FolkRank.
Abstract: Collaborative tagging systems allow users to assign keywords (so-called "tags") to resources. Tags are used for navigation, finding resources and serendipitous browsing, and thus provide an immediate benefit for users. These systems usually include tag recommendation mechanisms that ease the process of finding good tags for a resource and also consolidate the tag vocabulary across users. In practice, however, only very basic recommendation strategies are applied. In this paper we evaluate and compare two recommendation algorithms on large-scale real-life datasets: an adaptation of user-based collaborative filtering and a graph-based recommender built on top of FolkRank. We show that both provide better results than non-personalized baseline methods. In particular, the graph-based recommender outperforms existing methods considerably.

564 citations
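The entry above evaluates an adaptation of user-based collaborative filtering for tags. The sketch below assumes a common formulation. It builds each user's tag-frequency profile and keeps the most similar users (by cosine similarity) among those who have already tagged the target resource. Each tag those neighbors gave that resource is then scored by the neighbor's similarity. The data layout, names, and parameters are illustrative, and FolkRank is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_cf(assignments, user, resource, k_neighbors=2, top_n=3):
    """assignments: (user, tag, resource) triples, i.e. the folksonomy's tag assignments."""
    profiles = {}
    for u, t, _ in assignments:
        profiles.setdefault(u, Counter())[t] += 1  # user-tag frequency profile
    target = profiles.get(user, Counter())
    # Only users who already tagged the target resource can contribute tags for it.
    taggers = {u for u, _, r in assignments if r == resource and u != user}
    neighbors = sorted(taggers, key=lambda v: (-cosine(target, profiles[v]), v))[:k_neighbors]
    scores = Counter()
    for v in neighbors:
        sim = cosine(target, profiles[v])
        if sim <= 0:
            continue  # dissimilar users add no evidence
        for u, t, r in assignments:
            if u == v and r == resource:
                scores[t] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_n]]


assignments = [
    ("alice", "python", "r1"), ("alice", "code", "r1"),
    ("bob", "python", "r2"), ("bob", "code", "r2"), ("bob", "tutorial", "r3"),
    ("carol", "cooking", "r3"), ("carol", "recipes", "r3"),
]
print(recommend_cf(assignments, "alice", "r3"))
```

Here only Bob shares Alice's interests, so his tag on r3 is recommended while Carol's unrelated tags are ignored. A non-personalized baseline would instead rank the resource's most frequent tags.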

Proceedings Article
23 Oct 2009
TL;DR: This paper introduces an approach based on Latent Dirichlet Allocation (LDA) for recommending tags of resources in order to improve search and shows that the approach achieves significantly better precision and recall than the use of association rules.
Abstract: Tagging systems have become major infrastructures on the Web. They allow users to create tags that annotate and categorize content and to share them with other users, which is particularly helpful for searching multimedia content. However, as tagging is not constrained by a controlled vocabulary and annotation guidelines, tags tend to be noisy and sparse. New resources annotated by only a few users, in particular, often have rather idiosyncratic tags that do not reflect a common perspective useful for search. In this paper we introduce an approach based on Latent Dirichlet Allocation (LDA) for recommending tags of resources in order to improve search. Resources annotated by many users, and thus equipped with a fairly stable and complete tag set, are used to elicit latent topics to which new resources with only a few tags are mapped. Based on this, other tags belonging to a topic can be recommended for the new resource. Our evaluation shows that the approach achieves significantly better precision and recall than the use of association rules, suggested in previous work, and also recommends more specific tags. Moreover, extending resources with these recommended tags significantly improves search for new resources.

500 citations
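The LDA entry maps new, sparsely tagged resources onto latent topics learned from well-tagged resources and then recommends other tags from those topics. The sketch below assumes the topic-tag distributions p(tag | topic) have already been learned. It estimates a new resource's topic mixture by EM-style folding-in with the topics held fixed, then scores each unseen tag by Σ_z p(tag | z) p(z | resource). EM folding-in is a simplification chosen here; LDA implementations usually use Gibbs sampling or variational inference. The topic values are toy numbers.

```python
def fold_in(tags, phi, n_iter=20):
    """Estimate p(z | resource) for a new, non-empty tag list with topics held fixed.

    phi: list of dicts, phi[z][tag] = p(tag | topic z), assumed already learned
    """
    K = len(phi)
    theta = [1.0 / K] * K  # start from a uniform topic mixture
    for _ in range(n_iter):
        counts = [0.0] * K
        for t in tags:
            # E-step: responsibility of each topic for this tag (tiny floor avoids 0/0).
            weights = [theta[z] * phi[z].get(t, 1e-12) for z in range(K)]
            total = sum(weights)
            for z in range(K):
                counts[z] += weights[z] / total
        theta = [c / len(tags) for c in counts]  # M-step: renormalize
    return theta


def recommend_lda(tags, phi, top_n=2):
    theta = fold_in(tags, phi)
    vocab = {t for topic in phi for t in topic}
    scores = {t: sum(theta[z] * phi[z].get(t, 0.0) for z in range(len(phi)))
              for t in vocab if t not in tags}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


phi = [
    {"jazz": 0.5, "music": 0.3, "saxophone": 0.2},
    {"python": 0.5, "code": 0.3, "tutorial": 0.2},
]
print(recommend_lda(["jazz"], phi))
```

A resource tagged only "jazz" is assigned almost entirely to the first topic, so that topic's remaining tags are suggested. This reflects the abstract's point that topic-based suggestions generalize beyond a resource's few original tags.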

Proceedings Article
28 Jun 2009
TL;DR: This paper proposes a method for tag recommendation based on tensor factorization (TF) and provides a gradient descent algorithm to solve the optimization problem and demonstrates that this method outperforms other state-of-the-art tag recommendation methods like FolkRank, PageRank and HOSVD both in quality and prediction runtime.
Abstract: Tag recommendation is the task of predicting a personalized list of tags for a user given an item. This is important for many websites with tagging capabilities like last.fm or delicious. In this paper, we propose a method for tag recommendation based on tensor factorization (TF). In contrast to other TF methods like higher order singular value decomposition (HOSVD), our method RTF ('ranking with tensor factorization') directly optimizes the factorization model for the best personalized ranking. RTF handles missing values and learns from pairwise ranking constraints. Our optimization criterion for TF is motivated by a detailed analysis of the problem and of interpretation schemes for the observed data in tagging systems. In all, RTF directly optimizes for the actual problem using a correct interpretation of the data. We provide a gradient descent algorithm to solve our optimization problem. We also provide an improved learning and prediction method with runtime complexity analysis for RTF. The prediction runtime of RTF is independent of the number of observations and only depends on the factorization dimensions. Besides the theoretical analysis, we empirically show that our method outperforms other state-of-the-art tag recommendation methods like FolkRank, PageRank and HOSVD both in quality and prediction runtime.

399 citations
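RTF (above) and PITF both build on the Tucker decomposition (TD) of the user × item × tag tensor. For reference, the TD prediction and the PITF prediction can be written as follows. The notation (factor matrices with rows \hat{u}_u, \hat{i}_i, \hat{t}_t, core tensor entries \hat{c}, dimensions k_U, k_I, k_T) is chosen here for illustration.

```latex
\begin{align*}
\text{TD:}\quad \hat{y}_{u,i,t} &= \sum_{\tilde{u}=1}^{k_U}\sum_{\tilde{i}=1}^{k_I}\sum_{\tilde{t}=1}^{k_T}
  \hat{c}_{\tilde{u},\tilde{i},\tilde{t}}\;\hat{u}_{u,\tilde{u}}\;\hat{i}_{i,\tilde{i}}\;\hat{t}_{t,\tilde{t}}
  && O(k_U k_I k_T) = O(k^3) \text{ if } k_U = k_I = k_T = k \\
\text{PITF:}\quad \hat{y}_{u,i,t} &= \sum_{f=1}^{k}\hat{u}_{u,f}\,\hat{t}^{U}_{t,f} + \sum_{f=1}^{k}\hat{i}_{i,f}\,\hat{t}^{I}_{t,f}
  && O(k)
\end{align*}
```

PITF drops the dense core tensor and keeps only the user-tag and item-tag interactions. This explains the linear runtime claimed in the PITF abstract, compared with the cubic cost of evaluating the TD core naively.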