The data is both plentiful and noisy. We have billions of ratings from users about their musical interests. On the one hand, the large amount of data means we can build robust models. On the other hand, the data comes from people, with all their idiosyncratic behavior and opinions. This wealth of personal data (we have to assume it is all correct) sometimes means what we think it means, and other times represents personal behaviors unrelated to anybody else's opinion. Separating the signal from the noise is the new frontier for web sciences.
I'll illustrate my talk with several kinds of technologies we find interesting, drawing on successes we have had with all types of multimedia. These approaches impact recommendations, tagging, and search. Our approaches draw heavily on machine learning, often taking novel directions because of the size of our datasets. The frontiers of web science are wonderful.
Emmanouil Benetos