Deep learning turns mono recordings into immersive sound

Listen to a bird singing in a nearby tree, and you can quickly work out roughly where it is without looking. Listen to the roar of a car engine as you cross the road, and you can usually tell immediately whether it is behind you.

The human ability to locate a sound in three-dimensional space is extraordinary. The phenomenon is well understood—it is the result of the asymmetric shape of our ears and the distance between them.

But while researchers have learned how to create 3D images that easily fool our visual systems, nobody has found a satisfactory way to create synthetic 3D sounds that convincingly fool our aural systems.

Today, that looks set to change, at least in part, thanks to the work of Ruohan Gao at the University of Texas at Austin and Kristen Grauman at Facebook AI Research. They have used a trick that humans also exploit to teach an AI system to convert ordinary mono sounds into pretty good 3D sound. The researchers call it 2.5D sound.

First, some background. The brain uses a variety of clues to work out where a sound is coming from in 3D space. One important clue is the difference between a sound's arrival times at each ear—the interaural time difference.

A sound produced on your left will obviously arrive at your left ear before the right. And although you are not conscious of this difference, the brain uses it to determine where the sound has come from.
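The interaural time difference can be estimated with a simple spherical-head model. The sketch below uses Woodworth's classic approximation; the head radius and speed of sound are typical textbook values, not figures from the article.

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (in seconds) for a sound
    source at the given azimuth, using Woodworth's spherical-head model:
    ITD = (r / c) * (sin(theta) + theta)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (math.sin(theta) + theta)

# A source directly ahead produces no delay; a source 90 degrees to one
# side produces a delay of roughly two-thirds of a millisecond.
print(itd_woodworth(0))   # 0.0
print(itd_woodworth(90))  # ~0.00066 s
```

Even though the maximum delay is well under a millisecond, it is enough for the brain to localize the source.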

Another clue is the difference in volume. This same sound will be louder in the left ear than in the right, and the brain uses this information as well to make its reckoning. This is called the interaural level difference.

These differences depend on the distance between the ears. Ordinary stereo recordings do not reproduce this effect, because the separation between stereo microphones does not match the separation between human ears.

The way sound interacts with ear flaps is also important. The flaps distort the sound in ways that depend on the direction it arrives from. For example, a sound from the front reaches the ear canal before hitting the ear flap. By contrast, the same sound coming from behind the head is distorted by the ear flap before it reaches the ear canal.

The brain can sense these differences too. In fact, the asymmetric shape of the ear is the reason we can tell when a sound is coming from above, for example, or from many other directions.

The trick to reproducing 3D sound artificially is to reproduce the effect that all this geometry has on sound. And that’s a tough problem.

One way to measure the distortion is with binaural recording—a recording made by placing a microphone inside each ear, so that it picks up these tiny direction-dependent variations.

By analyzing the variations, researchers can then reproduce them using a mathematical algorithm known as a head-related transfer function. That turns any ordinary pair of headphones into extraordinary 3D sound machines.
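In the time domain, applying a head-related transfer function amounts to convolving the mono signal with a pair of measured impulse responses, one per ear. The sketch below uses toy hand-made impulse responses purely for illustration; real ones come from measured datasets such as the CIPIC or KEMAR collections.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono signal to two-channel audio by convolving it with a
    head-related impulse response (HRIR) pair, one filter per ear."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Toy HRIRs mimicking a source on the listener's left: the right ear
# hears the sound two samples later and attenuated to 60% amplitude.
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.6, 0.0])

mono = np.random.randn(1000)
stereo = render_binaural(mono, hrir_l, hrir_r)  # shape (2, 1003)
```

Played over headphones, the delay and level difference alone create a crude sense of direction; measured HRIRs add the ear-flap filtering that makes the effect convincing.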

But because everybody’s ears are different, everybody hears sound in a different way.

