Music Transcription with Transformers

Automatic Music Transcription (AMT) is the task of extracting symbolic representations of music from raw audio. AMT is valuable because it not only aids music understanding but also enables new forms of creation: it supports training powerful language models (such as Music Transformer) and building interactive applications (such as Piano Genie and Magenta Studio) that rely on symbolic representations of music.

Notes are one such representation, both powerful and intuitive, and this has motivated our effort over the past several years to dramatically improve AMT. We focused initially on piano transcription, using the Onsets and Frames model by Hawthorne et al. and the MAESTRO piano dataset. In 2020, we expanded the set of instruments we can transcribe by adapting Onsets and Frames to drum transcription. However, adding new instruments one at a time is tedious. Furthermore, these architectures are designed specifically for percussive instruments with well-defined note onsets and are less suited to other instruments.

Recently, we’ve been exploring how to build general-purpose music transcription systems: systems that don’t need to be redesigned by hand for each new instrument or task. In this blog post, we highlight some of our recent advances toward that goal.

Below, we discuss the main things we’ve discovered recently.

Most work in music transcription over the years has focused on transcribing piano recordings. As we mentioned above, many researchers have hand-designed neural network architectures based on the specifics of how piano notes sound. A great example of this is the Onsets and Frames architecture. It has dedicated output “heads” for the piano note onset, for the velocity of the note (how hard it is struck), and for the continued presence of the note (i.e. “frames”). Some research goes even further, modeling piano pedal events or the attack-decay-sustain-release (ADSR) envelopes of piano notes.
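For contrast with the sequence-to-sequence approach, the sketch below shows the kind of frame-level targets that dedicated output heads are trained against. It builds one piano-roll matrix each for onsets, frame activity, and onset velocity. It rests on several assumptions made for illustration:

- Notes are given as (onset_sec, offset_sec, midi_pitch, velocity) tuples.
- The keyboard covers 88 keys starting at MIDI pitch 21.
- The default rate is 31.25 frames per second, i.e. 16 kHz audio with a hop of 512 samples.

The function name `piano_roll_targets` is hypothetical, and this is not the Onsets and Frames training code.

```python
import numpy as np


def piano_roll_targets(notes, frames_per_second=31.25, min_pitch=21, num_pitches=88):
    """Build onset, frame, and velocity targets, each of shape (num_frames, num_pitches).

    `notes` is a list of (onset_sec, offset_sec, midi_pitch, velocity) tuples
    (a hypothetical input layout chosen for this sketch).
    """
    last_offset = max(offset for _, offset, _, _ in notes)
    num_frames = int(np.ceil(last_offset * frames_per_second)) + 1
    onsets = np.zeros((num_frames, num_pitches), dtype=np.float32)
    frames = np.zeros((num_frames, num_pitches), dtype=np.float32)
    velocities = np.zeros((num_frames, num_pitches), dtype=np.float32)
    for onset, offset, pitch, velocity in notes:
        p = pitch - min_pitch
        if not 0 <= p < num_pitches:
            continue  # ignore pitches outside the assumed 88-key range
        start = int(round(onset * frames_per_second))
        # A note is active from its onset frame up to (not including) its
        # offset frame, and always covers at least one frame.
        end = max(start + 1, int(round(offset * frames_per_second)))
        onsets[start, p] = 1.0
        frames[start:end, p] = 1.0
        velocities[start, p] = velocity / 127.0  # scale MIDI velocity to [0, 1]
    return onsets, frames, velocities


# Usage: middle C for half a second, plus an E entering a quarter second later.
notes = [(0.0, 0.5, 60, 100), (0.25, 1.0, 64, 80)]
onsets, frames, velocities = piano_roll_targets(notes)
print(onsets.shape)  # (33, 88)
```

Each array would supervise a separate head. Because velocity is only meaningful where a note begins, a velocity head in this setup only needs to be supervised where `onsets` is 1.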

However, as piano transcription architectures became more task-specific, researchers in other areas of machine learning were using generalized architectures to solve multiple tasks; in particular, the Transformer architecture has shown remarkable performance on a diverse set of tasks within both language and vision. We wondered: is it possible to use Transformers for piano transcription?

To answer this question, we modeled piano transcription as a sequence-to-sequence task, using the Transformer architecture from T5 (“T5-small” to be specific). We used spectrogram frames as the input sequence, and the model outputs a sequence of tokens from a MIDI-like vocabulary that represents note onsets, velocities, and offsets.
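The following minimal Python sketch makes the output side concrete. It turns a list of notes into a MIDI-like token sequence and decodes it back. The vocabulary is an illustrative assumption in the spirit of the description above, not the exact vocabulary used by the model:

- Absolute time tokens on a 10 ms grid.
- A velocity token that stays in effect until the next one.
- Velocity 0 marks a note offset.
- A final end-of-sequence token closes the sequence.

The function names and the note tuple layout are hypothetical.

```python
from typing import List, Tuple

Note = Tuple[float, float, int, int]  # (onset_sec, offset_sec, midi_pitch, velocity), hypothetical layout
Token = Tuple[str, int]               # (token_type, value), hypothetical vocabulary


def notes_to_tokens(notes: List[Note], steps_per_second: int = 100) -> List[Token]:
    """Encode notes as time/velocity/note tokens; offsets use velocity 0."""
    events = []
    for onset, offset, pitch, velocity in notes:
        events.append((round(onset * steps_per_second), velocity, pitch))
        events.append((round(offset * steps_per_second), 0, pitch))
    # Sort by time, then velocity (so offsets come first), then pitch.
    events.sort()
    tokens: List[Token] = []
    current_time, current_velocity = None, None
    for time, velocity, pitch in events:
        if time != current_time:  # emit a time token only when time advances
            tokens.append(("time", time))
            current_time = time
        if velocity != current_velocity:  # velocity persists until it changes
            tokens.append(("velocity", velocity))
            current_velocity = velocity
        tokens.append(("note", pitch))
    tokens.append(("eos", 0))
    return tokens


def tokens_to_notes(tokens: List[Token], steps_per_second: int = 100) -> List[Note]:
    """Decode the token sequence back into (onset, offset, pitch, velocity) notes."""
    notes: List[Note] = []
    active = {}  # pitch -> (onset_step, velocity) for notes still sounding
    time, velocity = 0, 0
    for kind, value in tokens:
        if kind == "time":
            time = value
        elif kind == "velocity":
            velocity = value
        elif kind == "note":
            if velocity > 0:
                active[value] = (time, velocity)
            elif value in active:
                start, vel = active.pop(value)
                notes.append((start / steps_per_second, time / steps_per_second, value, vel))
        elif kind == "eos":
            break
    return sorted(notes)


notes = [(0.0, 0.5, 60, 100), (0.25, 1.0, 64, 80)]
tokens = notes_to_tokens(notes)
print(tokens)
print(tokens_to_notes(tokens) == notes)  # True
```

Encoding offsets as velocity-0 events lets the same `note` tokens serve for both onsets and offsets. Because a velocity token persists, consecutive notes at the same loudness cost only one extra token each.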
