Here’s a recent visualization I did of the dataset used in the Netflix Prize Competition. The dataset is 17,700 movies and 31 gigs of user ratings. This viz shows similar movies close to one another, with the similarities determined by a formula based on ratings.

What I found most interesting is a cluster of movies (in blue) that I’d say are generally acclaimed. The cluster contains movies across all genres, such as Schindler’s List, Braveheart, and Super Size Me. Beyond that, there are a bunch of clusters that are mostly defined by a genre such as music, sports, documentary, IMAX, children’s films, or bonus material. The big blob in the center is mostly what I’d call junk movies.

I’ve labeled some movies just to give some sense of what the clusters contain. There’s an interactive version of the viz as well, so you can explore the movies for yourself…
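
For anyone curious what a ratings-based similarity formula can look like, here is a minimal sketch of adjusted cosine similarity, one of the item-to-item measures discussed in Sarwar et al.'s paper on item-based collaborative filtering. Which variant was actually used for this viz isn't stated here, so treat the choice as an assumption. The function name, the data layout (a dict of user to movie ratings), and the toy ratings are all made up for illustration.

```python
from math import sqrt


def adjusted_cosine(ratings, item_a, item_b):
    """Adjusted cosine similarity between two items.

    ratings: dict mapping user -> dict mapping item -> numeric rating
    (a hypothetical layout chosen for this sketch).
    Only users who rated both items contribute. Each rating is centered
    on that user's own mean rating before the cosine is taken.
    """
    num = den_a = den_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            da = user_ratings[item_a] - mean
            db = user_ratings[item_b] - mean
            num += da * db
            den_a += da * da
            den_b += db * db
    # No co-raters, or no variation around the user means: call it 0.
    if den_a == 0.0 or den_b == 0.0:
        return 0.0
    return num / (sqrt(den_a) * sqrt(den_b))


# Made-up ratings: three users, three placeholder movies.
toy_ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
}

print(adjusted_cosine(toy_ratings, "A", "B"))  # positive: rated alike
print(adjusted_cosine(toy_ratings, "A", "C"))  # negative: rated oppositely
```

Pairwise scores like these can then be handed to a layout algorithm as edge weights, so that movies with high similarity end up near each other.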


Comments

7 Comments so far

  1. chef-ele on April 26, 2007 8:24 pm

    You said that “this viz shows similar movies close to one another, with the similarities determined by a formula based on ratings.”

    Can you share the details of how you determined the (x,y) position of each movie in the plot? (perhaps also on the Visualization section of the Netflix prize forums?)

  2. Geek on April 26, 2007 9:30 pm

    The similarities were computed using the measure found in Sarwar, et al:

    http://www.ra.ethz.ch/CDstore/www10/papers/pdf/p519.pdf

    The ordination was done using the VxOrd algorithm (best-in-show for cluster visualizations)…

    http://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/clusterstab.pdf

    Cheers,
    Todd

  3. Satindra Chakravorty on May 1, 2007 9:20 am

    Hi Todd.

    These visualizations are wonderful. I tried looking for Napoleon Dynamite in both the static and interactive visuals, but I couldn’t find it. This movie is apparently particularly polarizing. Any idea where in this data cloud it might reside?

    Thanks.
    Satindra.

  4. Pat on May 8, 2007 6:02 pm

    Hey Todd,

    If you don’t mind sharing, I’d like to know more about the function you used to transport the graphs to 2D. What’s your D-function (the density thing)? I am asking because I was experimenting along the same lines, but my movies never seemed to cluster in any way (just a big bunch).

    Pat

  5. Visualizing the ‘Power Struggle’ in Wikipedia | A Beautiful WWW on May 20, 2007 5:31 pm

    […] Another Visualization of the Netflix Prize Dataset […]

  6. » Scalability, Similarity, Geo-Semantics and Visualizations [ Data Sciences Analytics ] on August 20, 2007 3:11 am

    […] ABeautifulWWW.com - have a look at this page “Another Visualization of the Netflix Prize Dataset” and look around the site. You will be […]

  7. yunia_r on November 20, 2007 9:42 pm

    Where can I get the 17,700 movie titles from IMDb? Please give me the links. Thank you.
