Another Visualization of the Netflix Prize Dataset
April 3, 2007
Here’s a recent visualization I did of the dataset used in the Netflix Prize Competition. The dataset is 17,700 movies and 31 gigs of user ratings. This viz shows similar movies close to one another, with the similarities determined by a formula based on ratings.
What I found most interesting is a cluster of movies (in blue) that I’d say are generally acclaimed. The cluster contains movies across all genres, such as Schindler’s List, Braveheart, and Super Size Me. Beyond that, there are a bunch of clusters that are mostly defined by a genre such as music, sports, documentary, IMAX, children’s films, or bonus material. The big blob in the center is mostly what I’d call junk movies.
I’ve labeled some movies just to give some sense of what the clusters contain. There’s an interactive version of the viz as well, so you can explore the movies for yourself…
Comments
You said that “this viz shows similar movies close to one another, with the similarities determined by a formula based on ratings.”
Can you share the details of how you determined the (x,y) position of each movie in the plot? (perhaps also on the Visualization section of the Netflix prize forums?)
The similarities were computed using the measure found in Sarwar et al.:
http://www.ra.ethz.ch/CDstore/www10/papers/pdf/p519.pdf
The ordination was done using the VxOrd algorithm (best-in-show for cluster visualizations)…
http://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/clusterstab.pdf
Cheers,
Todd
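Todd's reply doesn't say which of the similarity measures in the Sarwar et al. paper was used. As a rough sketch only, assuming it was the adjusted cosine similarity described there (each rating is centered on that user's own average before the cosine is taken), a toy version might look like this. The function name `adjusted_cosine` and the nested-dict layout for ratings are made up for illustration and are not taken from the original viz.

```python
from math import sqrt


def adjusted_cosine(ratings, item_a, item_b):
    """Adjusted cosine similarity between two items.

    `ratings` is a hypothetical layout: {user: {item: rating}}.
    Only users who rated BOTH items contribute, and each rating is
    offset by that user's mean rating over everything they rated.
    """
    num = 0.0
    den_a = 0.0
    den_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            dev_a = user_ratings[item_a] - mean
            dev_b = user_ratings[item_b] - mean
            num += dev_a * dev_b
            den_a += dev_a * dev_a
            den_b += dev_b * dev_b
    # No co-raters, or no variation around the user means: treat as unrelated.
    if den_a == 0.0 or den_b == 0.0:
        return 0.0
    return num / (sqrt(den_a) * sqrt(den_b))


if __name__ == "__main__":
    # Made-up ratings, purely for illustration.
    toy = {
        "u1": {"A": 5, "B": 4, "C": 1},
        "u2": {"A": 4, "B": 5, "C": 2},
        "u3": {"A": 1, "B": 2, "C": 5},
    }
    print(adjusted_cosine(toy, "A", "B"))
    print(adjusted_cosine(toy, "A", "C"))
```

Movies that the same users rate above (or below) their personal average come out with a positive score, which is the kind of pairwise similarity a layout algorithm could then use to pull movies together.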
Hi Todd.
These visualizations are wonderful. I tried looking for Napoleon Dynamite in both the static and interactive visuals, but I couldn’t find it. This movie is apparently particularly polarizing. Any idea where in this data cloud it might reside?
Thanks.
Satindra.
Hey Todd,
If you don’t mind sharing, I’d like to know more about the function you used to project the graph into 2D. What’s your D-function (the density thing)? I am asking because I was experimenting along the same lines, but my movies never seemed to cluster in any way (just one big bunch).
Pat
Where can I get the 17,700 movie titles from IMDb? Please give me the links. Thank you.