# Dataset from Judging Similarity: A User-Centric Study of Related Item Recommendations

This dataset contains survey responses about the similarity of movies in the MovieLens recommender system. These data are described in the research paper "Judging Similarity: A User-Centric Study of Related Item Recommendations", published in the proceedings of the ACM Conference on Recommender Systems (RecSys 2018). The data were collected on [movielens.org](http://movielens.org) between March 21 and April 16, 2018.

This material is based on work supported by grants from Amazon and Google.

This readme was written by Max Harper on August 1, 2018.


## Citation

Yuan Yao and F. Maxwell Harper. 2018. Judging Similarity: A User-Centric Study of Related Item Recommendations. In Proceedings of RecSys ’18, Vancouver, Canada, October 2-7, 2018, 9 pages. <https://doi.org/10.1145/3240323.3240351>


## Contact Information

If you have questions, contact Max Harper <max@umn.edu> or <grouplens-info@umn.edu>.


## License

This work is licensed under a [Creative Commons Attribution 3.0 United States License](https://creativecommons.org/licenses/by/3.0/us/).


# Description of CSV Files

General notes:

* Movie identifiers are consistent with those used in the MovieLens datasets: <https://grouplens.org/datasets/movielens/>
* User identifiers have been obfuscated to protect users' privacy.
* Movie and user identifiers are consistent across all files in this dataset, so the files can be joined on their movieId, neighborId, and userId columns.
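
Because movie identifiers match the MovieLens datasets, metadata such as genres can be joined in from a separately downloaded MovieLens `movies.csv` (columns `movieId,title,genres`, with genres separated by `|`). The sketch below assumes such a file; which MovieLens release covers every seed and neighbor ID is not stated here, so IDs missing from the chosen release simply yield no genres. The helper names `load_genres` and `shared_genres` are hypothetical.

```python
import csv
from typing import Dict, FrozenSet, Iterable

NO_GENRES = "(no genres listed)"  # placeholder used in MovieLens movies.csv


def load_genres(lines: Iterable[str]) -> Dict[int, FrozenSet[str]]:
    """Map movieId to its set of genres, read from a MovieLens movies.csv."""
    genres: Dict[int, FrozenSet[str]] = {}
    for row in csv.DictReader(lines):
        labels = row["genres"].split("|")  # genres are pipe-separated
        genres[int(row["movieId"])] = frozenset(g for g in labels if g != NO_GENRES)
    return genres


def shared_genres(seed_id: int, neighbor_id: int,
                  genres: Dict[int, FrozenSet[str]]) -> FrozenSet[str]:
    """Genres common to a seed and a neighbor; empty if either ID is unknown."""
    empty: FrozenSet[str] = frozenset()
    return genres.get(seed_id, empty) & genres.get(neighbor_id, empty)
```

For example, `with open("movies.csv", newline="", encoding="utf-8") as f: genres = load_genres(f)` builds the map (UTF-8 is an assumption), after which `shared_genres(seed, neighbor, genres)` returns the genres a pair has in common. Reading from any iterable of lines rather than a path keeps the helpers easy to test.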


## test-set.csv

This file describes the test set of 100 "seed" movie IDs.

* movieId -- MovieLens movie identifier of the seed movie
* title -- Title and release year of the seed movie
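
A minimal sketch for reading test-set.csv with Python's standard `csv` module. It assumes the title column follows the MovieLens convention of a trailing `(YYYY)` release year and splits that year off; titles without that suffix are returned unchanged with no year. `split_title` and `load_seeds` are hypothetical names, not part of the dataset.

```python
import csv
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumes the MovieLens convention of a trailing "(YYYY)" on each title.
_TITLE_YEAR = re.compile(r"^(.*\S)\s*\((\d{4})\)\s*$")


def split_title(title: str) -> Tuple[str, Optional[int]]:
    """Split 'Heat (1995)' into ('Heat', 1995); the year is None without a '(YYYY)' suffix."""
    match = _TITLE_YEAR.match(title)
    if match is None:
        return title.strip(), None
    return match.group(1), int(match.group(2))


def load_seeds(lines: Iterable[str]) -> Dict[int, Tuple[str, Optional[int]]]:
    """Map each seed movieId in test-set.csv to its (title, year)."""
    return {int(row["movieId"]): split_title(row["title"])
            for row in csv.DictReader(lines)}
```

Pass an open file (`newline=""`, as the `csv` module recommends) or any list of lines; the result should contain 100 entries for the full test set.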


## neighbors.csv

This file describes the "neighbors" generated by each of the six experimental algorithms: the 10 most similar movies for each of the 100 seed movies, or 1000 neighbors per algorithm.

* algorithm -- Algorithm identifier
* movieId -- MovieLens movie identifier of the seed movie
* neighborId -- MovieLens movie identifier of the neighbor movie
* rank -- Similarity rank (1--10) where 1 is the most similar, 2 is the 2nd most similar, etc.
* title -- Title and release year of the neighbor movie
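
The sketch below, assuming the column names listed above, regroups neighbors.csv into one ranked list per algorithm and seed movie. The `jaccard_overlap` helper is an illustrative way to compare two algorithms' lists for the same seed; it is not a measure taken from the paper. All function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

NeighborLists = Dict[Tuple[str, int], List[int]]


def load_neighbor_lists(lines: Iterable[str]) -> NeighborLists:
    """Map (algorithm, seed movieId) to neighbor movieIds ordered by rank, rank 1 first."""
    ranked = defaultdict(list)
    for row in csv.DictReader(lines):
        key = (row["algorithm"], int(row["movieId"]))
        ranked[key].append((int(row["rank"]), int(row["neighborId"])))
    # Sorting the (rank, neighborId) tuples orders each list by rank.
    return {key: [nid for _, nid in sorted(pairs)] for key, pairs in ranked.items()}


def jaccard_overlap(lists: NeighborLists, alg_a: str, alg_b: str, seed: int) -> float:
    """Fraction of distinct neighbors that two algorithms share for one seed."""
    a = set(lists.get((alg_a, seed), []))
    b = set(lists.get((alg_b, seed), []))
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Keying on (algorithm, seed) keeps the six algorithms separate; the algorithm identifiers are whatever strings appear in the file.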


## pair-responses.csv

This file describes users' similarity and recommendation quality judgments for pairs of seed and neighbor movies.

* userId -- MovieLens user identifier
* movieId -- MovieLens movie identifier of the seed movie
* neighborId -- MovieLens movie identifier of the neighbor movie
* sim -- "In your opinion, how similar are these two movies?", 0=="not at all similar", 1=="slightly similar", 2=="somewhat similar", 3=="moderately similar", 4=="extremely similar"
* goodRec -- "How likely would you be to recommend (neighbor movie) to someone who likes (seed movie)?", 0=="extremely unlikely", 2=="neutral", 4=="extremely likely"
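
Because pair-responses.csv has no algorithm column, per-algorithm averages require joining it with neighbors.csv on (movieId, neighborId). This is a minimal sketch of that join, not necessarily the analysis used in the paper; it assumes blank or non-numeric cells mean a missing answer, and it credits a pair shown by several algorithms to each of them. Function names are hypothetical.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Set, Tuple


def parse_judgment(value: Optional[str]) -> Optional[int]:
    """Parse a 0-4 judgment; return None for blank or non-numeric cells."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def producers_by_pair(neighbor_lines: Iterable[str]) -> Dict[Tuple[int, int], Set[str]]:
    """Map each (seed, neighbor) pair to the algorithms that generated it."""
    producers = defaultdict(set)
    for row in csv.DictReader(neighbor_lines):
        producers[(int(row["movieId"]), int(row["neighborId"]))].add(row["algorithm"])
    return producers


def mean_judgment_by_algorithm(neighbor_lines: Iterable[str],
                               response_lines: Iterable[str],
                               column: str = "sim") -> Dict[str, float]:
    """Average one judgment column (sim or goodRec) over each algorithm's pairs."""
    producers = producers_by_pair(neighbor_lines)
    scores = defaultdict(list)
    for row in csv.DictReader(response_lines):
        value = parse_judgment(row[column])
        if value is None:
            continue
        pair = (int(row["movieId"]), int(row["neighborId"]))
        # A pair produced by several algorithms counts toward each of them.
        for algorithm in producers.get(pair, ()):
            scores[algorithm].append(value)
    return {algorithm: mean(values) for algorithm, values in scores.items()}
```

Passing `column="goodRec"` averages the recommendation judgments instead. A file object can be read only once, so reopen neighbors.csv and pair-responses.csv, or read them into lists, between calls.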


## survey-responses.csv

This file describes users' responses to the survey about movie similarity and related item recommendations.

The following questions correspond to multiple columns in the CSV:

1. FACTORS_IMPORTANCE -- "Rate the importance of the following factors in helping you find a similar movie to watch next.", 0=="not important", 4=="very important"

2. RANK_REC_FEATURES_IMPORTANCE -- "How important are the following MovieLens recommendation features to you?", 1=="most important", 2=="2nd most important (if any)", 3=="3rd most important (if any)"

3. RANK_LABELS -- "Which of the following labels best describes what you're looking for in the similar movies section?", 1=="best", 2=="2nd best (if any)", 3=="3rd best (if any)"

Columns:

* userId -- MovieLens user identifier
* howImportantSimRecs -- "After watching a movie that you enjoyed, how often do you seek out a similar movie to watch next?", 0=="never", 1=="rarely", 2=="sometimes", 3=="often", 4=="always"
* simFactorGenre -- FACTORS_IMPORTANCE "genre"
* simFactorPlot -- FACTORS_IMPORTANCE "plot"
* simFactorSetting -- FACTORS_IMPORTANCE "setting"
* simFactorTheme -- FACTORS_IMPORTANCE "theme"
* simFactorMood -- FACTORS_IMPORTANCE "mood"
* simFactorDialogue -- FACTORS_IMPORTANCE "dialogue"
* simFactorMpaa -- FACTORS_IMPORTANCE "content rating (e.g., MPAA rating)"
* simFactorLanguage -- FACTORS_IMPORTANCE "spoken language"
* simFactorCast -- FACTORS_IMPORTANCE "cast"
* simFactorCrew -- FACTORS_IMPORTANCE "crew"
* simFactorPopularity -- FACTORS_IMPORTANCE "popularity/obscurity"
* simFactorCriticalReception -- FACTORS_IMPORTANCE "critical reception"
* simFactorAwards -- FACTORS_IMPORTANCE "awards"
* simFactorAvgRating -- FACTORS_IMPORTANCE "average user rating"
* simFactorReleaseDate -- FACTORS_IMPORTANCE "release date"
* whichFactorsOther -- "other factors that you consider", free text
* rankFeatureOverall -- RANK_REC_FEATURES_IMPORTANCE "top picks"
* rankFeaturePerGenre -- RANK_REC_FEATURES_IMPORTANCE "recommendations by genre"
* rankFeatureRecent -- RANK_REC_FEATURES_IMPORTANCE "recent releases"
* rankFeatureSimilar -- RANK_REC_FEATURES_IMPORTANCE "recommendations by movie (similar movies)"
* similarMoviesRecPrefs -- "When showing similar movies, MovieLens should ...", most-similar=="display the most similar movies", prioritize-similar=="prioritize similar movies, as long as they somewhat fit my tastes", prioritize-recs-weak=="prioritize movies that fit my tastes, as long as they are somewhat similar", prioritize-recs-strong=="prioritize movies that fit my tastes, even if they are not very similar"
* showAlreadyRated -- "When showing similar movies, MovieLens should display movies that I have already rated.", 0=="no", 1=="yes"
* showDiverse -- "MovieLens should ensure that the similar movies section contains a variety of types of movies.", -2=="strongly disagree", 0=="neutral", 2=="strongly agree"
* rankLabelsSimilar -- RANK_LABELS "similar movies"
* rankLabelsRelated -- RANK_LABELS "related movies"
* rankLabelsAlsoRec -- RANK_LABELS "also recommended"
* rankLabelsPeopleAlsoLiked -- RANK_LABELS "people also liked"
* rankLabelsRecForPeople -- RANK_LABELS "recommended for people who like ___"
* rankLabelsPeopleWhoLikeX -- RANK_LABELS "people who like ___ also like"
* otherFeedback -- "If you have any other feedback about similar movie recommendations in MovieLens, please leave it here:", free text
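
The multi-column questions can be summarized by column-name prefix: `simFactor` for FACTORS_IMPORTANCE, `rankFeature` for RANK_REC_FEATURES_IMPORTANCE, and `rankLabels` for RANK_LABELS. This sketch assumes rows loaded with `csv.DictReader` and treats blank or non-numeric cells as unanswered, since how unranked "(if any)" options are encoded is not specified above. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, Optional


def as_int(value: Optional[str]) -> Optional[int]:
    """Parse a numeric answer; return None for blank or non-numeric cells."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def factor_importance(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Mean 0-4 rating of each simFactor* column (FACTORS_IMPORTANCE), skipping blanks."""
    ratings = defaultdict(list)
    for row in rows:
        for column, value in row.items():
            if column and column.startswith("simFactor"):
                score = as_int(value)
                if score is not None:
                    ratings[column].append(score)
    return {column: mean(scores) for column, scores in ratings.items()}


def first_choice_counts(rows: List[Dict[str, str]], prefix: str) -> Counter:
    """Count how often each column starting with prefix was ranked 1 ("most important" or "best")."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if column and column.startswith(prefix) and as_int(value) == 1:
                counts[column] += 1
    return counts
```

For instance, `rows = list(csv.DictReader(f))` on the open survey-responses.csv file, followed by `first_choice_counts(rows, "rankLabels").most_common(3)`, lists the labels most often ranked best. The free-text whichFactorsOther column does not start with `simFactor`, so the prefix match excludes it.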