Google released Omnitone, what has changed for VR audio production?

Lei Fengwang press: Zhao Longxi, author of this article, is an advanced senior investment manager. He graduated from Peking University with a bachelor's degree in chemistry. Focus on VR/AR, culture and entertainment, business services and other directions.

Recently, Google announced a new open source project called Omnitone. The project aims to achieve 3D panoramic sound effects for ordinary headsets in a VR environment, allowing users to achieve better VR immersion.

VR technology is gradually affecting our lives


CJ is over. Let's talk about VR again.

VR can actually be understood as a process in which mankind pursues more natural human-computer interaction. From the beginning you can only go through the programming language and computer Say Hi's 286 era (imagining the robot friend of that era), to the first generation of Win95 visual operating windows, and now we are wearing helmets to enter a 360-degree virtual space . The interaction between people and machines has become more and more natural, and Virtual World is more and more like our Reality. This is Virtual Reality.

In terms of images, we take a helmet, turn up and down, we can see different images. This is the natural human-computer interaction. The sound is the same. If we turn around, the footsteps behind us will suddenly increase or decrease. Listening to the ear, the buzzing sound next door can be heard more clearly. Then the real virtual world may not be far away from us.

So today, "Let's talk about the voice in VR."

Voice in VR

Simply introduce to everyone, the first 3D sound field has several solutions. You may think that it's actually very simple. Make a virtual space, any sound source in the space, calibrate with 3D coordinates, then import the coordinate data in the VR helmet, and then output the corresponding sound, and you get a perfect " Natural sound field.

This idea is right, the premise is: Your audio source number is enough, the operation speed is fast enough. Known State of the Art Dolby Cinema program can support up to 128 sound sources ... but the environment on mobile phones and PCs can't be compared to movie theaters. (And it is impossible to approach the real world sound field with this scheme. You can understand that the reason is that the number of sound fields will rise sharply when approaching the real world, just because the number of atoms in the crystal is 10^23 level. In order to characterize the crystal structure, it is best to use the same two wave-functions as Quad Binaural and Ambisonic.

Quad Binaural

Quad Binaural is to represent a sound field in four directions of 0, 90, 180, and 270 degrees.

For example, if recording a sound field at a point, I record the sounds in the four directions (actually front, back, left, and right) in each direction, two channels in each direction, and finally eight channels. Then when I head toward any direction that is not exactly right, backwards and forwards, say 45 degrees, we use a four-direction sound to make a weighting and get a new direction sound.

The problem with this method is that the sound does not change when you look up or down. But the benefit is very simple, decoding is very easy, for example, we naturally think of the sound at 45 degrees is half the 0 degree + half of the 90 (although the actual situation will be more complicated), and compared to the now commonly used first-order Ambisonic (FOA, First Order Ambisonic), which is more sensitive to the horizontal direction.


Ambisonic starts from spherical harmonics and (n+1)^2 channels to characterize the sound field.

For example, the Omnitone released by Google is the first-order Ambisonic, so there are (1+1)^2=4 channels, as shown in the figure below, w, x, y, z. w can be understood as a background sound, and x, y, z are sounds from three directions of the Cartesian coordinate system, respectively.

The more complex the higher order ambisonic function decoding

At present, the top-level foreign countries do the 5th-order Ambisonic function, 36 components (because high-order spherical harmonics are difficult to use in the physical direction, so use Components instead of the direction), collect them with 32 microphones, and then approximate them through combination and matrix transformation. These 36 Components. The benefit of Ambisonic is that with z (vertical) this direction, heads or heads will be different in the VR world, and as the computing power you provide improves, you can get better with higher-order Ambisonic functions. Effect. The downside is that decoding will be more complicated.

To give a simple example, if your up, down, left and right directions correspond to the six singers ABCDEF respectively, the four buzzers of Quad Binaural will be filled with CDEF sounds, each containing part of the A, B After the sound field has been recorded, if you turn your head straight from the right, slightly twisting 30 degrees, then you will dominate from the initial C to hear more D sounds, but whether you look up or bow, A, B's voice will not change.

The first-order Ambisonic mixes the sounds of C and E in X, Y mixes the sounds of D and F, Z mixes the sounds of A and B, and W is the background sound. As your head rotates, it will change the weight of XYZ. . But wisely, you must also find that first-order Ambisonic (FOA) mixes the sounds (C, D) from the opposite direction at the same time into X, so when you turn your head, the listener's direction is at some angles. The sensitivity is not as good as that of Quad Binaural.

Considering: (1) Singers sing in your crotch is relatively rare, so in the case of A only, vertical FOA is far better than Quad Binaural, (2) If you are concerned about concert videos that have been produced abroad, It has been found that in order to improve the sense of orientation of the sound, the producer often places the sound source in the front or right direction, which is a problem that both the FOA and the QB cannot solve. However, with the increase in computing power, high-level Ambisonic (HOA) can overcome this problem, which is why Google chose FOA.

Google Omnitone

Going back to Google, it published on the official blog the technical details of its Omnitone project on its web VR audio system. The Omnitone project is a cross-browser supported open source spatial audio renderer that mainly supports the FOA (First Order Ambisonic) format that is currently used in the industry. This is the main panoramic sound format recommended by the YouTube App.

Omnitone audio processing diagram

As can be seen from the above figure, the Ambisonic decoder of Google's Omnitone system adopts the mainstream algorithm flow in the industry. According to the orientation information given by the sensor, a rotation operator is used to realize the sound field rotation, and then the two-channel output is used.

Given the dominance of the Google Chrome browser in the market, we can be optimistic that the FOA-based panoramic sound will have a relatively high rate of penetration at the web end, which is a culprit for VR content creators and even for the entire VR industry. Big favors.

Of course, we can still see many problems while we are excited.

Problems that exist

One of the questions is where the FOA sound files come from. VR content makers use a lot of Tetra Mic (CoreSound production), which is used by Jaunt, a well-known VR company, and it costs a lot more money than a microphone. The microphone itself costs $1,000. It also needs extra lines and dedicated recording equipment. Other brands of sound field microphones (such as the SoundField series) are more expensive. These recording devices are very inconvenient to carry and require professional training to use, which restricts the widespread adoption of FOA audio.

Tetra Mic

At present there are companies trying to solve this situation. Times Top Spirit introduced a civilian-grade portable sound field recorder that natively supports the FOA (4-way WXYZ) audio format. The output audio can be played directly by FOA-enabled software or devices. It is believed that with the reduction of the production threshold, FOA-based content will usher in an explosive growth.

Times Top Spirit Portable Sound Field Recorder

In addition, because both Quad Binaural and Ambisonic are just a train of thought. How to connect real-world audio files with the 3D audio finally played out? How to use this idea to create 3D sound fields? We still need a VR audio production engine. And playback system. (Just like Google released Hadoop papers, big data companies use this theory to guide them to develop their own algorithms.) This point, Toping also walk in the forefront, its sound field engine, not only supports cross-platform support Multiple Video and game engines support conversion between Quad Binaural and Ambisonic from the ground up, which is a shortcut to connect VR content creators and 3D sound fields.

This article was authorised by the author to be published by Lei Fengnet (search for “Lei Feng Net” public number) , and it is forbidden to reprint without permission!

DC Linear Actuator

Dc Linear Actuator,Linear Actuator,Linear Actuator 12V,Electric Linear Actuator

Changzhou Sherry International Trading Co., Ltd. ,