Netflix and the Amazing Experiment in the Crowd Sourcing of Analytics

By Dr. Regi Mathew

The Internet has grown into a platform for bringing experts together, and there have been many attempts to pool collective knowledge and wisdom towards a common objective, with or without rewards. One such successful initiative is Kaggle.com, whose stated objective is to make ‘data science a sport’. The site publishes business problems and data provided by businesses, along with evaluation criteria and a reward. The platform currently runs about 10 competitions, the highest prize money being $3 million offered by the Heritage Provider Network. There were attempts to use competitions to generate interest in analytics even before the Internet era, the most well-known being the KDD Cup run by the ACM SIGKDD community. However, we should thank Netflix for using a competition to address a live business problem using real data.

Background: Netflix is an American corporation engaged in the movie rental business. Its mode of delivery is by post, and hence very different from that of competitors who relied on shops. Subscribers visit the website and book DVDs, which arrive by post. After watching, the subscriber mails the DVD back in the enclosed return cover, and the next DVD on the waiting list is mailed out as soon as Netflix receives the returned one.

The Netflix website gives subscribers the option of rating the movies they have watched; in fact, you can rate a movie even if you did not rent it from Netflix. These ratings are used to recommend movies to you, through an algorithm named Cinematch. Netflix has a metric to assess how good its recommendation engine is (predicted versus actual rating). The competition opened to the public was to improve this recommendation engine by 10%. The team that achieved this would walk away with a prize of $1 million (yes, one million dollars).

For three years, computer science teams from all over the world competed for this prize. Within 7 days of the start of the competition, some teams had already surpassed Cinematch’s performance. From then on, it was a fight to reach the magical number of 10% improvement. At many points in this journey, some experts commented that a 10% improvement might not be possible; they felt that whatever information could be squeezed out of the data had already been extracted, and that there was little point in trying further.

Winners: On June 26, 2009 the team “Bellkor’s Pragmatic Chaos” achieved a 10.05% improvement over Cinematch (an RMSE of 0.8558). With this, the competition entered the “last call” period for the Prize. In accordance with the Rules, all teams had 30 days, until July 26, 2009, to make submissions that would be considered for the Prize. On July 25, 2009 the team “The Ensemble”, a merger of the teams “Grand Prize Team” and “Opera Solutions and Vandelay United”, achieved a 10.09% improvement over Cinematch (an RMSE of 0.8554). On July 26, 2009, Netflix stopped accepting new submissions. The final standing of the leaderboard showed that two teams met the minimum requirements for the Grand Prize.

These were “The Ensemble”, with a 10.10% improvement (an RMSE of 0.8553), and “Bellkor’s Pragmatic Chaos”, with a 10.09% improvement (an RMSE of 0.8554). The Grand Prize winner would be the one with the better performance on the test set. According to the Netflix Prize forum, team “Bellkor’s Pragmatic Chaos” was the leading contender for the prize with a better test set RMSE.

Data: What did the teams work on? Netflix provided a training data set of over 100 million ratings that over 480,000 users gave to nearly 18,000 movies. Each training rating is a quadruplet (user, movie, date of grade, grade). The user and movie fields are integer IDs, while grades range from 1 to 5 stars.
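
To make the format concrete, each training record can be pictured as a small tuple. The sketch below is illustrative only: it assumes a hypothetical one-record-per-line CSV layout and a made-up function name load_ratings, not the exact files Netflix shipped.

    import csv
    from collections import namedtuple
    from datetime import date

    # Hypothetical layout: user_id,movie_id,grade_date,grade per line
    # (the actual Netflix Prize files used a different, movie-per-file layout)
    Rating = namedtuple("Rating", ["user", "movie", "grade_date", "grade"])

    def load_ratings(path):
        ratings = []
        with open(path, newline="") as f:
            for user, movie, grade_date, grade in csv.reader(f):
                ratings.append(Rating(int(user), int(movie),
                                      date.fromisoformat(grade_date),
                                      int(grade)))   # grades are 1 to 5 stars
        return ratings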

The qualifying data set contains over 2.8 million triplets (user, movie, date of grade), with grades known only to the jury. A participating team’s algorithm must predict grades for the entire qualifying set, but teams are informed of the score only for half of the data, the quiz set. The other half is the test set, and performance on this is used by the jury to determine potential prize winners. This arrangement is intended to make it difficult to hill-climb on the test set. Submitted predictions are scored against the true grades in terms of Root Mean Squared Error (RMSE), and the goal is to reduce this error as much as possible.
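
RMSE itself is simple to compute: it is the square root of the mean squared difference between predicted and actual grades. A minimal sketch in plain Python (the function name rmse is just illustrative):

    from math import sqrt

    def rmse(predicted, actual):
        """Root Mean Squared Error between equal-length sequences of grades."""
        assert len(predicted) == len(actual)
        return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

    # Example: predictions 3.8 and 2.1 against true grades 4 and 2
    print(rmse([3.8, 2.1], [4, 2]))   # about 0.158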

Prizes are based on improvement over the Cinematch algorithm. Cinematch uses “straightforward statistical linear models with a lot of data conditioning”. Using only the training data, Cinematch scores an RMSE of 0.9514 on the quiz data. In order to win the grand prize of $1 million, a participating team had to improve on this by 10%, achieving an RMSE of 0.8572 on the test set.
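
The improvement percentages quoted throughout the contest are simply the relative reduction in RMSE against the Cinematch baseline; a quick check against the quiz-set figures above (the helper name improvement is just for illustration):

    def improvement(baseline_rmse, submission_rmse):
        """Relative RMSE reduction over the baseline, in percent."""
        return 100.0 * (baseline_rmse - submission_rmse) / baseline_rmse

    print(improvement(0.9514, 0.8558))   # about 10.05, matching the quoted figure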

Learnings: Now that the competition is over, there is introspection on what it has achieved, given the huge number of man-hours spent by teams around the world.

First and foremost, this is a wonderful example of how people across the globe can collaborate and work together. It was not the success of a single idea or algorithm that won the prize. Instead, the formula for success was to bring together people with complementary skills and combine different methods of problem-solving.

Secondly, this is an example of the power of crowd sourcing. Crowd sourcing is a distributed problem-solving model: problems are broadcast to an unknown group of solvers in the form of an open call for solutions. The solvers, also known as the crowd, typically form online communities and submit solutions. Sometimes the crowd also sorts through the solutions, finding the best ones. These best solutions are then owned by the entity that broadcast the problem in the first place. The winning individuals in the crowd are sometimes rewarded, as in the Netflix case; in other cases, the only rewards may be kudos or intellectual satisfaction. Crowd sourcing may produce solutions from amateurs or volunteers working in their spare time, or from experts or small businesses that were unknown to the initiating organization.

Thirdly, the contest also had benefits for academics and business. Several research papers were published based on it, and the enhanced understanding of prediction using the collaborative filtering approach will find application in many web-based applications.
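
The flavour of collaborative filtering the contest popularized can be illustrated with a deliberately small matrix factorization model (this is only a sketch, emphatically not the winning teams’ algorithm): each user and each movie gets a short vector of latent factors, learned by stochastic gradient descent on the observed ratings, and a rating is predicted as the dot product of the two vectors. The function names train_mf and predict are invented for the example.

    import random

    def train_mf(ratings, n_users, n_movies, k=10, lr=0.01, reg=0.05, epochs=20):
        """Tiny matrix-factorization collaborative filter trained with SGD.
        `ratings` is a list of (user, movie, grade) triples with 0-based ids."""
        P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
        Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_movies)]
        for _ in range(epochs):
            for u, m, r in ratings:
                pred = sum(pu * qm for pu, qm in zip(P[u], Q[m]))
                err = r - pred
                for f in range(k):   # gradient step on both factor vectors
                    pu, qm = P[u][f], Q[m][f]
                    P[u][f] += lr * (err * qm - reg * pu)
                    Q[m][f] += lr * (err * pu - reg * qm)
        return P, Q

    def predict(P, Q, u, m):
        """Predicted grade for user u and movie m."""
        return sum(pu * qm for pu, qm in zip(P[u], Q[m]))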

 