Research - Singing voice style transfer (Internship)

Paris, IDF, France - Intern

Company Description

We are music and tech fans hailing from all over the globe, working to make Deezer the most personal music streaming service. From data scientists to tech experts, artists & labels specialists to marketers, and even in-house music editors, our team is spreading the love for music to over 180 countries. Supporting local and international artists and bringing them closer to their fans is our mission - we believe music is about diversity, multiculturalism and togetherness. Ready to join the team? We're all ears.

Job Description

How about you:

Speech style transfer is the task of transforming a snippet of speech into another speaker’s voice while preserving its textual content [6,7]. For singing voice, style transfer consists of replacing the voice of one singer with that of another: the textual content and the main melody should remain untouched, while every vocal characteristic specific to the singer (timbre, singing style) should be transformed from the original singer to the target singer. While speech style transfer has been widely studied, the literature on singing voice style transfer remains scarce: the task has been addressed with a cappella recordings [1,2] and with polyphonic mixtures [3,4], with mixed results and rather poor naturalness so far.

Leveraging Deezer’s large music catalog and state-of-the-art source separation tools [5], the intern will implement singing voice style transfer models and try to improve the naturalness of the sound using disentangled representations from audio waveforms. If applicable, submitting a scientific article to a conference will be encouraged.

The intern will be supervised by research scientists and research engineers from the Deezer Research team, who will provide practical and scientific support throughout the project. The intern will nonetheless be encouraged to propose solutions and work autonomously. For experiments, Deezer will provide cutting-edge technology and adequate computing power.

References

[1] Juheon Lee, Hyeong-Seok Choi, Junghyun Koo, Kyogu Lee, "Disentangling Timbre and Singing Style with Multi-singer Singing Synthesis System" (https://juheo.github.io/DTS/)

[2] Sangeon Yong, Juhan Nam, "Singing Expression Transfer from One Voice to Another for a Given Song", ICASSP 2018

[3] Cheng-Wei Wu, Jen-Yu Liu, Yi-Hsuan Yang, Jyh-Shing R. Jang, "Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks" (https://arxiv.org/abs/1807.02254, http://mirlab.org/users/haley.wu/cybegan/)

[4] Rema Daher, Mohammad Kassem Zein, Julia El Zini, Mariette Awad, Daniel Asmar, "Change your singer: a transfer learning generative adversarial framework for song to song conversion" (https://arxiv.org/abs/1911.02933)

[5] Romain Hennequin, Anis Khlif, Felix Voituret, Manuel Moussallam, "Spleeter: A Fast And State-of-the Art Music Source Separation Tool With Pre-trained Models", Late-Breaking/Demo, ISMIR 2019

[6] Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu, "Neural Discrete Representation Learning", NIPS 2017

[7] Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, "CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion", ICASSP 2019

Qualifications

What we are looking for:

  • Master's or PhD student with a background in computer science, mathematics or statistics
  • Strong knowledge of audio signal processing and applied machine learning
  • Good programming skills for data processing and experimentation
  • Prior experience with deep learning frameworks such as TensorFlow or PyTorch
  • Creativity and autonomy

Additional Information

Life @ Deezer HQ:

> Start-up environment with an at-home vibe and outdoor space
> Kitchen stocked with free drinks and snacks daily
> Friday drinks & seasonal parties
> Gym access, plus yoga, pilates and boxing classes
> English and French language courses
> Hackathons & meetups

We are an equal opportunity employer