Audio is the most important aspect of your Zoom Room!
Without audio, the meeting cannot happen. In this article, we will discuss many of the concepts and features that are most important when thinking about audio performance in a Zoom Room design.
Before we dive into the technical details of the audio process from end to end, please review our article on Acoustics & Audio Concepts. Once the room sounds good, we can move on to how we capture that experience.
Think about everything that happens from the word being spoken to that word being heard by a participant on the far end. The following are all factors along the way:
Audio is vibration that travels through the air and is perceived when it reaches an ear. A video conference adds a few steps to that path. A microphone stands in for our ears and picks up the audio. The Zoom Room takes that audio, processes it if needed, and transmits it over the internet. At the far end, speakers turn the signal back into sound waves so that participants can hear it. Every step along the way affects how the audio is perceived when it finally reaches the listener's ear.
Digital Signal Processors (DSPs) are audio processors, implemented in software and sometimes paired with dedicated hardware, that optimize audio for different applications. There are two approaches to audio processing in a Zoom Room: external DSP, where the audio hardware handles the processing, and Zoom DSP, where the Zoom Rooms software handles it.
If the input and output device are the same, such as a Logitech Rally System, Logitech Meetup, Aver VB342, Polycom Trio, or a rack-mounted DSP, that device handles all of the audio processing needed for an optimal audio experience. Because Zoom is not handling the DSP in this case, Zoom software audio processing (SAP) should be disabled.
Note that certain devices are designed to disable Zoom SAP automatically when they are selected. However, if any adjustment is made after the initial setup, Zoom SAP may be re-enabled automatically, which is not desired in this situation.
For external DSP designs, please reference:
Here is an overview of Zoom Rooms echo cancellation:
If you need to use a mixer or another microphone source that is not integrated with a speaker output, you do not need an external device to do this processing. Zoom optimizes the audio using adaptive processing that learns the room. In certain applications, Zoom can accept multiple independent audio channels and process each channel individually for an optimized experience. To enable Zoom echo cancellation, on the Zoom Room controller, tap Settings, then Microphone, then tap the Echo Cancellation toggle.
This setting should be enabled whenever the input and output devices are different. In other words, if the microphone cannot reference the speaker signal within the device itself, enable this setting for echo cancellation and audio optimization.
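The rule above can be expressed as a small decision helper. This is a minimal sketch: the function name `zoom_echo_cancellation_needed` is hypothetical (it is not part of any Zoom API), and it assumes you can identify the selected microphone and speaker by device name. It only encodes the rule that one device handling both input and output does its own processing, while separate devices rely on Zoom.

```python
def zoom_echo_cancellation_needed(mic_device: str, speaker_device: str) -> bool:
    """Hypothetical helper (not a Zoom API): should Zoom's echo cancellation be on?

    Same device for input and output (for example an all-in-one bar, a
    Polycom Trio, or a rack-mounted DSP driving both mics and speakers):
    the device can reference its own speaker signal and does the processing
    itself, so return False.
    Different devices: the mic has no internal speaker reference, so Zoom
    must do the processing, and return True.
    """
    # Compare names case-insensitively, ignoring stray whitespace.
    same_device = mic_device.strip().casefold() == speaker_device.strip().casefold()
    return not same_device
```

For example, a Logitech Rally selected as both microphone and speaker returns `False`, while a USB mixer paired with display speakers returns `True`.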
Another Zoom audio setting suppresses some of the room noise and reverberation. Keep in mind that highly reverberant and noisy rooms will still sound reverberant; this setting applies processing that may make them more tolerable, but it cannot fix the room itself.
On the Zoom Room controller, tap Settings, then Microphone, then tap Noise Suppression, and select Auto, High, or Off.
Note: If you select a different speaker, such as the computer's internal speaker, and then switch back to the speaker that matches the microphone, this setting may turn on when it is not wanted.
For Zoom DSP designs, please reference:
Here we will discuss some utilities for testing your Zoom Rooms environment. We always recommend a test call with at least two peers to hear the space, check each microphone, and validate performance.
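Once you have confirmed that the inputs and outputs are working, verify that the software audio processing toggle is set correctly: on when Zoom handles the audio processing, and off when the audio device or an external DSP handles it.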
Also see: Zoom Rooms Daily Audio Testing
Once these checks pass, you are ready to set up a test call with peers to validate the room's performance. Based on their feedback, you may need to check firmware, adjust DSP site files, adjust microphone placement, increase the number of microphones, and so on.
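Part of the per-microphone check can be automated before the peer call. This is a minimal sketch, assuming you can capture a short recording from each microphone as float samples in the range -1.0 to 1.0; the capture step itself is out of scope, and the function names and both thresholds are hypothetical values to tune for your room and hardware. It flags channels that look dead or that are clipping.

```python
import math
from typing import Dict, List

# Hypothetical thresholds; tune them for your room and hardware.
SILENT_DBFS = -60.0  # RMS below this suggests a dead or muted channel
CLIP_PEAK = 0.99     # peaks at or above this suggest clipping (too much gain)

def rms_dbfs(samples: List[float]) -> float:
    """RMS level of float samples (range -1.0..1.0) in dB relative to 1.0."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def check_microphones(channels: Dict[str, List[float]]) -> Dict[str, str]:
    """Label each microphone channel as 'ok', 'silent', or 'clipping'."""
    report = {}
    for name, samples in channels.items():
        peak = max((abs(s) for s in samples), default=0.0)
        if peak >= CLIP_PEAK:
            report[name] = "clipping"
        elif rms_dbfs(samples) < SILENT_DBFS:
            report[name] = "silent"
        else:
            report[name] = "ok"
    return report
```

A channel reported as `silent` or `clipping` is a candidate for the firmware, gain, or placement adjustments mentioned above; an `ok` result still needs to be confirmed by ear on the test call.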
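For the Zoom Rooms application, there are four key components of audio processing, which we will elaborate on: noise reduction, acoustic echo cancellation (AEC), automatic gain control (AGC), and equalization (EQ).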
Noise reduction attenuates steady noise, such as HVAC or electrical hum. The DSP identifies steady noises and reduces them at the frequencies it determines are recurring and interfering with the signal. With the steady noise attenuated, speech passes through the system without being reduced and becomes more intelligible.
Note: Noise reduction will not reduce traffic noise, paper rustling or typing, or, most importantly, reverberation. A reverberant room will always sound reverberant, both to the participants in the room and to the microphones.
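To make the idea of "steady noise" concrete, here is a simplified spectral-subtraction sketch. It is not Zoom's algorithm; it assumes each audio frame has already been converted to a list of per-frequency magnitudes (for example, by a short-time Fourier transform) and treats energy that appears in every frame as steady noise. The function names and the `floor` parameter are illustrative assumptions.

```python
from typing import List

Spectrum = List[float]  # magnitude per frequency bin for one short frame

def estimate_steady_noise(frames: List[Spectrum]) -> Spectrum:
    """Per-bin minimum across frames.

    Energy present in every frame (HVAC, electrical hum) survives the
    minimum and is treated as steady noise; intermittent speech does not.
    """
    return [min(bin_values) for bin_values in zip(*frames)]

def spectral_subtract(frame: Spectrum, noise: Spectrum, floor: float = 0.05) -> Spectrum:
    """Subtract the noise estimate from one frame.

    Each bin is kept at no less than `floor` times its original magnitude,
    which avoids harsh artifacts from driving bins all the way to zero.
    """
    return [max(m - n, floor * m) for m, n in zip(frame, noise)]
```

In a frame where bin 0 is a constant hum, bin 1 carries intermittent speech, and bin 2 is constant HVAC noise, the hum and HVAC bins are pulled down to the floor while the speech bin keeps most of its energy. Because reverberation follows the speech rather than sitting constantly in every frame, a per-bin minimum like this does not capture it, which is consistent with the note above.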
Acoustic echo cancellation (AEC) removes the sound of your voice that the far-end microphone picks up from the far-end speaker. Here is a diagram that explains the concept for two endpoints:
If AEC is working properly, you will not hear your own voice back in the call. If it is not working, you will hear an echo of your voice, picked up by the far endpoint's microphone and sent back to you.
Note: The endpoint that does not hear the echo is where the issue exists.
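AEC is usually built around an adaptive filter that learns the path from the loudspeaker to the microphone and subtracts the predicted echo. The sketch below uses the normalized LMS (NLMS) algorithm, a common textbook choice; it is not necessarily what Zoom or any particular DSP uses. The signal names, tap count, and step size `mu` are assumptions for illustration, and production AECs typically add double-talk detection and nonlinear processing that this sketch omits.

```python
import random
from typing import List

def nlms_echo_canceller(far_end: List[float], mic: List[float],
                        taps: int = 32, mu: float = 0.5,
                        eps: float = 1e-6) -> List[float]:
    """Normalized LMS (NLMS) acoustic echo canceller.

    far_end: signal played by the local loudspeaker (the echo reference)
    mic:     local microphone signal (local talkers + loudspeaker echo)
    Returns the mic signal with the estimated echo removed, i.e. what
    would be sent back to the far end.
    """
    weights = [0.0] * taps   # running estimate of the speaker-to-mic echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate                 # residual after echo removal
        # Normalizing by input power keeps the step size stable at any level.
        step = mu * error / (eps + sum(h * h for h in history))
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned

# Usage: white-noise far-end audio and an echo path made of two reflections.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
mic = [0.6 * (far[n - 3] if n >= 3 else 0.0) + 0.3 * (far[n - 7] if n >= 7 else 0.0)
       for n in range(4000)]
residual = nlms_echo_canceller(far, mic)
```

The filter runs at the endpoint whose microphone hears the loudspeaker, and it only helps the people on the other end of the call. That is why the endpoint that does not hear the echo is the one to troubleshoot.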
Automatic Gain Control (AGC) delivers the optimum volume to the system depending on the circumstances. The biggest variable is people: some have loud voices and others have soft voices. Whichever is the primary audio source, AGC adjusts the level up or down accordingly. AGC is automatic within Zoom's DSP; on an external DSP that offers it, it may need to be enabled and configured.
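Here is a minimal AGC sketch, assuming audio arrives in blocks of float samples in the range -1.0 to 1.0. The parameters `target_rms`, `max_gain`, and `smoothing` are hypothetical tuning values, not Zoom or vendor defaults. Each block's level is measured, and the gain moves gradually toward the value that would bring that level to the target, so a soft talker is raised and a loud talker is lowered.

```python
import math
from typing import List

def auto_gain(frames: List[List[float]], target_rms: float = 0.1,
              max_gain: float = 10.0, smoothing: float = 0.2) -> List[List[float]]:
    """Simple block-based automatic gain control (illustrative values)."""
    gain = 1.0
    out = []
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
        if rms > 1e-6:  # leave gain unchanged for digital silence
            desired = min(target_rms / rms, max_gain)
            # Move part of the way toward the desired gain on each block.
            gain += smoothing * (desired - gain)
        # Apply the gain and hard-limit to the valid sample range.
        out.append([max(-1.0, min(1.0, s * gain)) for s in frame])
    return out
```

Moving the gain gradually instead of jumping straight to the desired value avoids audible pumping, and capping the gain limits how much background noise is raised during quiet passages.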
Equalization (EQ) is a way to cut unwanted frequencies and boost wanted ones. Human speech falls mainly between about 250 Hz and about 6,000 Hz, within the range of human hearing, which spans about 20 Hz to 20,000 Hz. Sound between 20 and 250 Hz or between 6,000 and 20,000 Hz can still be heard if it is not filtered out, but it contributes little to the speech we want to hear.
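The band-limiting described above can be sketched with two second-order (biquad) filters: a high-pass at 250 Hz and a low-pass at 6,000 Hz. The coefficient formulas come from Robert Bristow-Johnson's widely used "Audio EQ Cookbook"; the 48 kHz sample rate and the Butterworth Q of about 0.707 are assumptions. The sketch only computes each filter's frequency response so you can see what is removed; it does not process audio.

```python
import cmath
import math
from typing import Tuple

# (b0, b1, b2, a0, a1, a2) for one second-order (biquad) filter section
Biquad = Tuple[float, float, float, float, float, float]
BUTTERWORTH_Q = 1 / math.sqrt(2)

def highpass(f0: float, fs: float, q: float = BUTTERWORTH_Q) -> Biquad:
    """Cookbook high-pass: attenuates content below f0 (low-frequency rumble)."""
    w0 = 2 * math.pi * f0 / fs
    alpha, c = math.sin(w0) / (2 * q), math.cos(w0)
    return ((1 + c) / 2, -(1 + c), (1 + c) / 2, 1 + alpha, -2 * c, 1 - alpha)

def lowpass(f0: float, fs: float, q: float = BUTTERWORTH_Q) -> Biquad:
    """Cookbook low-pass: attenuates content above f0."""
    w0 = 2 * math.pi * f0 / fs
    alpha, c = math.sin(w0) / (2 * q), math.cos(w0)
    return ((1 - c) / 2, 1 - c, (1 - c) / 2, 1 + alpha, -2 * c, 1 - alpha)

def response_db(section: Biquad, f: float, fs: float) -> float:
    """Magnitude response of one biquad at frequency f, in dB."""
    b0, b1, b2, a0, a1, a2 = section
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 ** 2) / (a0 + a1 * z1 + a2 * z1 ** 2)
    return 20 * math.log10(abs(h))

FS = 48_000  # assumed sample rate
SPEECH_EQ = [highpass(250, FS), lowpass(6_000, FS)]

def speech_band_db(f: float) -> float:
    """Response of the cascaded sections: their dB responses add."""
    return sum(response_db(s, f, FS) for s in SPEECH_EQ)
```

Each filter is about 3 dB down at its corner frequency and rolls off at roughly 12 dB per octave beyond it, so content just outside the speech band is reduced rather than eliminated; DSPs often offer steeper slopes for this reason.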
It is best practice to apply a boost around 2,000 to 4,000 Hz, because the human ear is most sensitive in this range and giving it some extra emphasis improves intelligibility.
Scooping is another technique: cutting a narrow band around a specific frequency that is being emitted in the space or that rings with unwanted reverberation at a particular pitch. Scooping that frequency may improve performance in the room; for example, scooping low-mid frequencies may alleviate some of the room's resonance.
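Both the intelligibility boost and a scoop can be built from the same peaking filter, again using the "Audio EQ Cookbook" formulas: a positive gain boosts a band and a negative gain scoops it. The center frequencies, gains, and Q values below are illustrative assumptions (3 kHz for the 2,000 to 4,000 Hz region, and a hypothetical room resonance at 250 Hz), not recommended settings.

```python
import cmath
import math
from typing import Tuple

# (b0, b1, b2, a0, a1, a2) for one second-order (biquad) filter section
Biquad = Tuple[float, float, float, float, float, float]

def peaking(f0: float, fs: float, gain_db: float, q: float) -> Biquad:
    """Cookbook peaking EQ: positive gain_db boosts, negative gain_db scoops."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha, c = math.sin(w0) / (2 * q), math.cos(w0)
    return (1 + alpha * a, -2 * c, 1 - alpha * a, 1 + alpha / a, -2 * c, 1 - alpha / a)

def response_db(section: Biquad, f: float, fs: float) -> float:
    """Magnitude response of one biquad at frequency f, in dB."""
    b0, b1, b2, a0, a1, a2 = section
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 ** 2) / (a0 + a1 * z1 + a2 * z1 ** 2)
    return 20 * math.log10(abs(h))

FS = 48_000  # assumed sample rate
# Broad presence boost centered in the 2-4 kHz region (illustrative: +4 dB, Q 1.0)
PRESENCE_BOOST = peaking(3_000, FS, gain_db=4.0, q=1.0)
# Narrow scoop at a hypothetical 250 Hz room resonance (illustrative: -6 dB, Q 4.0)
RESONANCE_SCOOP = peaking(250, FS, gain_db=-6.0, q=4.0)
```

The filter reaches exactly its set gain at the center frequency and returns toward 0 dB away from it. A low Q gives a broad, gentle boost suited to the presence region, while a high Q keeps a scoop narrow so it removes the offending resonance without thinning out nearby speech frequencies.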