For most multiplayer online games, such as online board games and MMORPGs, in-game voice chat is vital for communication and teamwork among players. With in-game voice chat, players can chat casually or team up to win, taking the fun of the game to the next level.

Whether you plan to build an in-game chat feature on your own or with a third-party real-time voice SDK (e.g., Zego.im or Agora), the solution must be tailored to the purpose of voice chat during gameplay. Building voice chat into a game involves many considerations, including audio quality, latency, and system resource consumption. Choosing a proper audio codec is a critical decision because it affects many of these aspects.

To support more innovative "voice chat + scenario" use cases, some companies, such as ZEGO and Agora, have released standardized SDK packages for voice chat rooms. With these SDKs, the core functions of a voice chat room can be implemented with only a small amount of code.

So, let’s talk about how to choose an audio codec that is suitable for implementing in-game voice chat.

Voice or Music

First, let’s discuss the audio quality.

For in-game voice chat, the audio traffic is mostly human voice. In some cases, music may also need to be included. To reason about audio quality, start with how humans perceive sound. The human ear can nominally hear sounds in the range of 20 Hz to 20,000 Hz. Within this range, four audio frequency bands are commonly defined: narrowband, wideband, super-wideband, and fullband.
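The relationship between these bands and sampling rates can be sketched as follows. This is a minimal illustration, not a definitive definition of the bands: the 8 kHz and 16 kHz rates match the narrowband and wideband figures used later in this article, while the 32 kHz and 48 kHz rates for super-wideband and fullband are assumed common conventions. The function name `max_audio_frequency_hz` is hypothetical.

```python
# Conventional sampling rates for each band.
# 8 kHz and 16 kHz match this article; 32 kHz and 48 kHz are assumptions
# based on common convention, not values taken from the article.
BAND_SAMPLE_RATE_HZ = {
    "narrowband": 8_000,
    "wideband": 16_000,
    "super-wideband": 32_000,
    "fullband": 48_000,
}


def max_audio_frequency_hz(sample_rate_hz: int) -> float:
    """Nyquist limit: the highest frequency a sampling rate can represent."""
    return sample_rate_hz / 2


for band, rate in BAND_SAMPLE_RATE_HZ.items():
    limit = max_audio_frequency_hz(rate)
    print(f"{band}: {rate} Hz sampling, audio content up to {limit:.0f} Hz")
```

Under these assumptions, only the fullband rate covers the entire 20 Hz to 20,000 Hz range of human hearing; the lower bands trade audio bandwidth for a lower data rate.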

Narrowband audio covers the core of the speech range, so narrowband quality is enough for basic real-time voice communication in games. However, combining real-time voice with live game streaming has produced new forms of play, such as playing alongside a host or live game broadcasting, and these have relatively high audio quality requirements. Wideband quality can meet the needs of both games and live streaming. In practice, the audio bandwidth chosen for game voice depends largely on the game operator's budget, because the audio bandwidth drives the bit rate, and the bit rate ultimately determines the cost.

Choosing the Right Audio Codec

The audio codec has a major influence on a game's real-time voice solution. The type, attributes, and quality of the audio encoder determine the bit rate, algorithmic delay, audio bandwidth, and sound quality of the encoded audio stream; the algorithmic complexity of the encoder determines CPU, memory, and power consumption.

Therefore, an audio codec suitable for a game's real-time voice solution has the following characteristics:

1) The bit rate is relatively low to keep costs under control, generally not exceeding 16 kbps. As a rough rule, if each sample is encoded with about 1 bit, an 8 kHz sampling rate (narrowband) corresponds to a bit rate of 8 kbps, and a 16 kHz sampling rate (wideband) corresponds to a bit rate of 16 kbps. In essence, bit rate is cost.

2) The delay should be low enough for interactive conversation, generally no more than 300 milliseconds.

3) The algorithmic complexity should be relatively low, so that CPU, memory, and power consumption stay low and the impact on the game itself is as small as possible.
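The bit rate arithmetic in item 1 can be checked with a short sketch. The function names `bit_rate_kbps` and `megabytes_per_hour` are hypothetical, and the 16-bit PCM comparison and the per-hour traffic estimate are illustrative additions, not figures from the article. The estimate counts codec payload only and ignores packet headers and silence suppression.

```python
def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: float) -> float:
    """Bit rate = samples per second x bits per sample, expressed in kbps."""
    return sample_rate_hz * bits_per_sample / 1000


def megabytes_per_hour(kbps: float) -> float:
    """Payload traffic for one continuous stream, in megabytes (10^6 bytes) per hour."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 3600 / 1_000_000


# About 1 bit per sample after encoding, as assumed in item 1.
print(bit_rate_kbps(8_000, 1))    # narrowband: 8.0 kbps
print(bit_rate_kbps(16_000, 1))   # wideband: 16.0 kbps

# For comparison (assumption): uncompressed 16-bit PCM at 8 kHz.
print(bit_rate_kbps(8_000, 16))   # 128.0 kbps

# Rough traffic for one wideband stream at the 16 kbps ceiling.
print(megabytes_per_hour(16))     # about 7.2 MB per hour
```

Because traffic scales linearly with bit rate, moving from narrowband to wideband under this assumption doubles the bandwidth bill for the same number of concurrent speakers.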

The following figure lists a set of mainstream audio codecs and shows how sound quality changes with bit rate. It is drawn from codec listening test results and is a useful reference when selecting an audio codec. Based on the analysis above and the figure below, the low-bit-rate speech codecs that operate below 16 kbps include Opus (SILK), Speex, AMR-NB, AMR-WB, and iLBC.

The following figure covers another set of mainstream audio codecs and shows how algorithmic delay changes with bit rate. Based on the analysis above and the figure below, the speech codecs with an algorithmic delay below 60 milliseconds and a bit rate below 16 kbps include Opus (SILK), Speex (NB, WB), G.729, and G.729.1.
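One way to combine the two lists above is to take their intersection, which leaves the codecs named under both the sound quality comparison and the delay comparison. The sketch below uses only the codec names given in this article; treating "Speex (NB, WB)" and "Speex" as the same entry is an assumption made for illustration, and the variable names are hypothetical.

```python
# Codecs the article lists as operating below 16 kbps.
low_bit_rate = {"Opus (SILK)", "Speex", "AMR-NB", "AMR-WB", "iLBC"}

# Codecs the article lists with algorithmic delay below 60 ms and bit rate
# below 16 kbps. "Speex (NB, WB)" is normalized to "Speex" here (assumption).
low_delay = {"Opus (SILK)", "Speex", "G.729", "G.729.1"}

# Candidates that appear in both lists.
shortlist = sorted(low_bit_rate & low_delay)
print(shortlist)  # ['Opus (SILK)', 'Speex']
```

This is only a first filter. Sound quality at the target bit rate and algorithmic complexity still need to be weighed before settling on a codec.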

In short, building a real-time voice solution for games means matching the technical approach to the game's application scenario. Only with a thorough understanding of the scenario's requirements can you decide how to choose a voice codec, how to deploy media server resources, how to configure the CDN, and so on, and then refine a real-time voice solution that meets those requirements.

Voice chat rooms can be used in many game voice scenarios, such as casual board and card games. The next section analyzes how to select an appropriate voice codec for such game scenarios.
