Paris, Île-de-France, France
Intern
Just Hack it!
With 53 million tracks and a presence in 180 countries,
Deezer is the most personal music streaming service in the world.
Behind the code and the pixels is our team of 500 music lovers, and we’re building something incredible together. Want in? If you’re looking for an adventure, not just a job, and you fancy seeing ideas come to life in a heartbeat, you’re in the right place.
We dare to challenge the status quo and believe innovation is part of our DNA.
Speech style transfer is the task of converting a snippet of speech to another speaker’s voice while preserving its textual content [6,7]. For singing voice, style transfer consists of replacing the singer’s voice with that of another singer: the textual content and the main melody should remain untouched, while all singer-specific vocal characteristics (timbre, singing style) should be transformed from the original singer to the target singer. While speech style transfer has been widely studied, the literature on singing voice style transfer remains scarce: the task has been addressed with a cappella recordings [1,2] and with polyphonic mixtures [3,4], with mixed results and rather poor naturalness so far.
Leveraging Deezer’s large music catalog and state-of-the-art source separation tools [5], the intern will implement singing voice style transfer models and try to improve the naturalness of the sound using disentangled representations learned from audio waveforms. If applicable, the submission of a scientific article to a conference will be encouraged.
The intern will be supervised by research scientists and research engineers from the Deezer R&D team, who will provide practical and scientific help with the task. The intern will nonetheless be encouraged to propose solutions and work autonomously. For data experiments, Deezer will provide cutting-edge technology and appropriate computing power.
[1] Juheon Lee, Hyeong-Seok Choi, Junghyun Koo, Kyogu Lee, "Disentangling Timbre and Singing Style with Multi-singer Singing Synthesis System" (https://juheo.github.io/DTS/)
[2] Sangeon Yong, Juhan Nam, "Singing Expression Transfer from One Voice to Another for a Given Song", ICASSP 2018
[3] Cheng-Wei Wu, Jen-Yu Liu, Yi-Hsuan Yang, Jyh-Shing R. Jang, "Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks" (https://arxiv.org/abs/1807.02254, http://mirlab.org/users/haley.wu/cybegan/)
[4] Rema Daher, Mohammad Kassem Zein, Julia El Zini, Mariette Awad, Daniel Asmar, "Change your singer: a transfer learning generative adversarial framework for song to song conversion" (https://arxiv.org/abs/1911.02933)
[5] Romain Hennequin, Anis Khlif, Felix Voituret, Manuel Moussallam, "Spleeter: A Fast And State-of-the Art Music Source Separation Tool With Pre-trained Models", Late-Breaking/Demo, ISMIR 2019
[6] Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu, "Neural Discrete Representation Learning", NIPS 2017
[7] Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, "CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion", ICASSP 2019
Master or PhD student with a background in computer science / mathematics / statistics.
Strong knowledge of audio signal processing and applied machine learning.
Good programming skills for data processing and experimentation.
Prior experience with deep learning frameworks such as TensorFlow or PyTorch.
Creativity and autonomy.
Life @ Deezer Paris