Paris, Île-de-France, France
Intern
Just Hack it!
With 53 million tracks and a presence in 180 countries,
Deezer is the most personal music streaming service in the world.
Behind the code and the pixels is our team of 500 music lovers, and we’re building something incredible together. Want in? If you’re looking for an adventure, not just a job, and you fancy seeing ideas come to life in a heartbeat, you’re in the right place.
We dare to challenge the status quo and believe innovation is part of our DNA.
Deezer is a leading company in the music streaming industry, with one of the largest catalogs on the market (over 56 million titles available) and over 14 million active users across more than 180 countries. In this industry, music recommendation is a key component in attracting and retaining users. Suggesting relevant, personalized artists, albums or playlists helps users actively explore the vast and mostly unknown audio landscape. It is also central to any enjoyable passive experience that relies on generated and personalized content.
In this direction, clustering artists, i.e. identifying homogeneous groups of artists, appears to be an effective way to select new content to recommend. If a user tends to listen to artists from a specific cluster, they might be interested in discovering tracks by other artists from the same cluster. A promising approach to learning such a clustering consists in leveraging graph-based representations of artists, constructed from streaming usage on Deezer. In such graphs, artist nodes are connected if they are listened to by the same regular users (“collaborative filtering”). Clusters of artists can then be computed with scalable community detection algorithms [1, 4, 6].
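As a minimal sketch of the graph construction described above, the following Python snippet builds a co-listening graph from toy listening data and groups artists by connected components. The toy data, the function names and the `min_shared_users` threshold are assumptions made for illustration. Connected components are only a simple stand-in for the community detection algorithms cited in [1, 4, 6]; this is not Deezer's actual pipeline.

```python
from collections import defaultdict
from itertools import combinations


def build_co_listening_graph(user_artists, min_shared_users=2):
    """Link two artists when at least `min_shared_users` users listen to both."""
    pair_counts = defaultdict(int)
    for artists in user_artists.values():
        for a, b in combinations(sorted(set(artists)), 2):
            pair_counts[(a, b)] += 1

    graph = defaultdict(set)
    for artists in user_artists.values():
        for a in artists:
            graph[a]  # touch the key so isolated artists still appear as nodes
    for (a, b), count in pair_counts.items():
        if count >= min_shared_users:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Group nodes into components (a crude proxy for community detection)."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical listening data: user id -> artists listened to regularly.
toy_listening = {
    "u1": ["electro_1", "electro_2", "ambient_1"],
    "u2": ["electro_1", "electro_2"],
    "u3": ["jazz_1", "jazz_2"],
    "u4": ["jazz_1", "jazz_2", "ambient_1"],
}
graph = build_co_listening_graph(toy_listening)
clusters = connected_components(graph)
print(clusters)
```

In this toy example, `ambient_1` ends up alone because it shares only one listener with each other artist, which is below the assumed threshold.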
However, as the number of artists in the catalog constantly increases, and as listening usage evolves over time, this clustering needs to be updated on a regular (e.g. weekly) basis. The simplest way to proceed would be to re-learn artist clusters from scratch at every computation on updated listening data, i.e. on every new graph. Unfortunately, such a simple approach leads to instabilities and inconsistencies over time. The goal of this internship will be to study and propose more advanced strategies, aiming at improving the quality and robustness of artist clustering at Deezer.
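One possible way to quantify the instability mentioned above is to compare two successive clusterings on the artists they have in common. The sketch below uses the Rand index, i.e. the fraction of artist pairs on which both clusterings agree; the choice of this metric and the toy weekly clusterings are assumptions for illustration, not a measure prescribed by the internship.

```python
from itertools import combinations


def rand_index(old_clusters, new_clusters):
    """Fraction of artist pairs treated the same way by both clusterings."""
    old_label = {a: i for i, c in enumerate(old_clusters) for a in c}
    new_label = {a: i for i, c in enumerate(new_clusters) for a in c}
    # Only artists present in both weeks can be compared.
    common = sorted(set(old_label) & set(new_label))
    pairs = list(combinations(common, 2))
    if not pairs:
        return 1.0
    agree = sum(
        (old_label[a] == old_label[b]) == (new_label[a] == new_label[b])
        for a, b in pairs
    )
    return agree / len(pairs)


# Hypothetical clusterings from two consecutive weeks ("f" is a new artist).
week_1 = [{"a", "b", "c"}, {"d", "e"}]
week_2 = [{"a", "b"}, {"c", "d", "e"}, {"f"}]
stability = rand_index(week_1, week_2)
print(stability)
```

A value close to 1 indicates that the clustering barely changed between the two weeks, while lower values indicate that many artist pairs were regrouped.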
The intern will notably consider the integration of user-level feedback from past clusters when computing new ones, aiming at stabilizing “good” clusters over time while dynamically updating “bad” clusters. For instance, the quality of an artist cluster could be assessed from the number of liked, disliked or skipped recommended tracks, or by considering qualitative feedback from music experts such as Deezer editors. Currently at Deezer, such user-level feedback is only used in post-processing steps (e.g. to avoid recommending an artist “banned” by a user) but not in the modeling itself.
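To make the idea of feedback-based cluster quality concrete, here is a hedged sketch that scores each cluster from like, dislike and skip counts and separates clusters to keep from clusters to recompute. The scoring formula, the `skip_weight` and `keep_threshold` parameters, and the counts are all hypothetical; defining a sound quality measure is precisely part of the internship.

```python
def cluster_feedback_score(likes, dislikes, skips, skip_weight=0.5):
    """Score in [-1, 1]: likes count positively, dislikes and skips negatively."""
    total = likes + dislikes + skips
    if total == 0:
        return 0.0  # no feedback yet: neutral score
    return (likes - dislikes - skip_weight * skips) / total


def split_clusters(feedback, keep_threshold=0.2):
    """Return (stable, to_update) cluster ids based on their feedback score."""
    stable, to_update = [], []
    for cluster_id, counts in sorted(feedback.items()):
        score = cluster_feedback_score(**counts)
        (stable if score >= keep_threshold else to_update).append(cluster_id)
    return stable, to_update


# Hypothetical feedback counts on tracks recommended from each cluster.
toy_feedback = {
    "cluster_0": {"likes": 80, "dislikes": 5, "skips": 15},
    "cluster_1": {"likes": 10, "dislikes": 30, "skips": 60},
}
stable, to_update = split_clusters(toy_feedback)
print(stable, to_update)
```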
A complementary approach, which the intern will also study, would be to leverage artist metadata, such as genre or country, to ensure the homogeneity of artists within the same cluster. Metadata is also crucial when dealing with cold-start situations, i.e. new artists with little listening data. Artist genre/country/mood homogeneity can be captured either by modifying the graph structure, or by designing a new clustering algorithm involving a penalty term on intra-cluster heterogeneity.
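The penalty term on intra-cluster heterogeneity could take many forms. As one illustrative assumption, the sketch below measures heterogeneity as the size-weighted genre entropy of each cluster and subtracts it from a base clustering quality score. The entropy choice, the `penalty_weight` parameter, the `base_quality` input and the toy genres are hypothetical, not a method stated in this offer.

```python
import math
from collections import Counter


def genre_entropy(genres):
    """Shannon entropy (in nats) of the genre distribution; 0 if homogeneous."""
    counts = Counter(genres)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def heterogeneity_penalty(clusters, artist_genre):
    """Average genre entropy over clusters, weighted by cluster size."""
    n_artists = sum(len(c) for c in clusters)
    return sum(
        len(c) / n_artists * genre_entropy([artist_genre[a] for a in c])
        for c in clusters
    )


def penalized_score(base_quality, clusters, artist_genre, penalty_weight=1.0):
    """Base clustering quality (e.g. modularity) minus the weighted penalty."""
    return base_quality - penalty_weight * heterogeneity_penalty(clusters, artist_genre)


# Hypothetical genre metadata and two candidate clusterings.
artist_genre = {"a": "rock", "b": "rock", "c": "jazz", "d": "jazz"}
homogeneous = [{"a", "b"}, {"c", "d"}]
mixed = [{"a", "c"}, {"b", "d"}]
print(heterogeneity_penalty(homogeneous, artist_genre))
print(heterogeneity_penalty(mixed, artist_genre))
```

With equal base quality, the genre-homogeneous clustering receives no penalty while the mixed one is penalized, so the penalized score favors the former.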
After exploring the literature on the topic, including recent advances in recommender systems [2, 3, 7, 8] and in graph representation learning [5, 8, 9], and after getting familiar with Deezer's data and stack, the intern will develop and test machine learning models to tackle the aforementioned challenges. The intern will be supervised by a Research Scientist from the R&D team and by a Data Scientist from the Recommendation team. They will provide material and theoretical help, as well as access to cutting-edge technology and appropriate computing power. The intern will be encouraged to propose solutions and to work autonomously. The internship could lead to a scientific publication and/or to deployments in production.
[1] Blondel, V. D., Guillaume, J. L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
[2] Bobadilla, J., Ortega, F., Hernando, A., and Gutiérrez, A. (2013). Recommender systems survey. Knowledge-Based Systems, 46, 109-132.
[3] Dacrema, M. F., Cremonesi, P., and Jannach, D. (2019). Are we really making much progress? A worrying analysis of recent neural recommendation approaches. In Proceedings of the 13th ACM Conference on Recommender Systems (pp. 101-109).
[4] Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.
[5] Hamilton, W., Ying, R., and Leskovec, J. (2017). Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin.
[6] Khan, B. S., and Niazi, M. A. (2017). Network community detection: A review and visual survey. arXiv preprint arXiv:1708.00977.
[7] Lu, J., Wu, D., Mao, M., Wang, W., and Zhang, G. (2015). Recommender system application developments: a survey. Decision Support Systems, 74, 12-32.
[8] Wang, X., He, X., and Chua, T. (2019). Learning and Reasoning on Graph for Recommendation. Tutorial at the ACM International Conference on Information & Knowledge Management (CIKM). https://next-nus.github.io/
[9] Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. (2019). A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596.
Master's student with a background in Computer Science, Machine Learning, Applied Mathematics or Statistics
Good knowledge of theoretical and applied Machine Learning
Good programming skills for data processing and experimentation
Knowledge of Recommender Systems and/or Machine Learning on Graphs is a plus
Curiosity, autonomy and motivation