Characterization of public datasets for Recommender Systems
Citations
Hybrid recommender systems: A systematic literature review
Social Media Recommender Systems: Review and Open Research Issues
A survey of recommender systems for energy efficiency in buildings: Principles, challenges and prospects
MoodyLyrics: A Sentiment Annotated Lyrics Dataset
References
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales
Improving recommendation lists through topic diversification
Eigentaste: A Constant Time Collaborative Filtering Algorithm
The million song dataset
Frequently Asked Questions
Q2. What are the main purposes of the million song dataset?
The main purposes of the dataset are: encouraging research on algorithms that scale to commercial sizes; providing a reference dataset for evaluating algorithms by using the audio features; being a shortcut alternative to creating a large dataset with the Echo Nest's API; and helping new researchers get started in the MIR field, develop music recommendations, and study music similarity.
Q3. How many users have used tags in the Today Module?
The dataset includes annotations from users who have fewer than 1k tags and have used at least 10 different tags in 5 different websites.
Q4. What is the purpose of the dataset?
This dataset originates from the APOSDLE EU project, which is an adaptive work-integrated learning system aiming to improve knowledge worker productivity by supporting learning situations within everyday work tasks.
Q5. What is the purpose of the dataTEL challenge?
The Theme Team of the STELLAR Network of Excellence launched the first dataTEL Challenge [15], which is a call to research groups to submit datasets from Technology Enhanced Learning applications.
Q6. How many resources have been accessed by registered users?
Together they hold about 47k tags, 12k classification terms, and many other actions performed by the users, such as viewing and downloading.
Q7. What is the version of the dataset that contains the jokes?
There are three versions of it: Dataset 1 contains more than 4.1M continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users, collected between April 1999 and May 2003.
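Continuous ratings on a -10.00 to +10.00 scale are often rescaled before they are compared with star-style ratings from other datasets. The following is a minimal sketch of one way to do that, assuming a simple linear mapping; the function name, the 1-to-5 target range, and the sample values are illustrative assumptions and are not part of the dataset or its tooling.

```python
def rescale_rating(rating, src_min=-10.0, src_max=10.0, dst_min=1.0, dst_max=5.0):
    """Linearly map a rating from [src_min, src_max] to [dst_min, dst_max]."""
    if not src_min <= rating <= src_max:
        raise ValueError(f"rating {rating} is outside [{src_min}, {src_max}]")
    # Position of the rating inside the source range, as a value in [0, 1].
    fraction = (rating - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# Hypothetical ratings, not taken from the dataset.
sample = [-10.0, -2.5, 0.0, 10.0]
rescaled = [rescale_rating(r) for r in sample]
```

A linear mapping preserves the ordering and relative spacing of ratings; whether that is appropriate depends on the evaluation being run.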
Q8. What was the purpose of the dataTEL challenge?
It was used in [18] to provide data about library readership, library start and article tags, and to experiment with user-based and item-based collaborative filtering algorithms for TEL.
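The difference between user-based and item-based collaborative filtering can be sketched on a toy example. This is only an illustration under stated assumptions: the ratings are invented, cosine similarity and a similarity-weighted average are common choices rather than the method confirmed for [18], and the helper names (`cosine`, `transpose`, `predict`) are hypothetical.

```python
from math import sqrt

# Toy user -> {item: rating} data; all names and values are hypothetical.
ratings = {
    "u1": {"a": 5.0, "b": 3.0},
    "u2": {"a": 4.0, "b": 2.0, "c": 5.0},
    "u3": {"b": 4.0, "c": 1.0},
}


def cosine(x, y):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(x) & set(y)
    dot = sum(x[k] * y[k] for k in shared)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


def transpose(user_ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, items in user_ratings.items():
        for item, value in items.items():
            by_item.setdefault(item, {})[user] = value
    return by_item


def predict(target, key, vectors):
    """Similarity-weighted average of the neighbours' ratings for `key`.

    With user vectors this is user-based filtering (target = user, key = item);
    with item vectors it is item-based filtering (target = item, key = user).
    """
    num = den = 0.0
    for other, vec in vectors.items():
        if other == target or key not in vec:
            continue
        sim = cosine(vectors[target], vec)
        num += sim * vec[key]
        den += abs(sim)
    return num / den if den else None


user_based = predict("u1", "c", ratings)             # neighbours are users
item_based = predict("c", "u1", transpose(ratings))  # neighbours are items
```

Both calls estimate the rating of user `u1` for item `c`. The first weights other users' ratings of `c` by their similarity to `u1`; the second weights `u1`'s own ratings of other items by their similarity to `c`.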
Q9. What search engines are used to find the datasets?
The authors searched in SpringerLink, ScienceDirect, IEEE Xplore and ACM using keywords like "Evaluating Recommender Systems", "Public Datasets for Recommender Systems" etc.
Q10. What are the main reasons why the million song dataset is not available?
For a long time, Music Information Retrieval (MIR) research has suffered from a lack of publicly available, large-scale open data for personalized music recommendations, mainly because of privacy and intellectual property concerns.