Sample rates for Nexmo's call a websocket feature

Nexmo's call a websocket feature currently supports 16-bit linear PCM (LPCM) audio at a 16 kHz sample rate.

Nearly all of our source audio comes from the PSTN at 8 kHz, so Nexmo upsamples it to 16 kHz on arrival. However, some speech recognition models don’t work optimally with the upsampled audio.

For example, IBM has a narrowband model which still takes 16 kHz audio but works well with our source. Google's narrowband model requires an 8 kHz source.

In testing, Google's narrowband model worked significantly better with Nexmo audio for some languages. English was fine, but Dutch and German both worked better at 8 kHz.

It is possible to downconvert the Nexmo audio on a websocket back to 8 kHz and then feed that to Google. This is an example of how to do it in node.js.
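As a rough illustration of the downconversion step, here is a minimal sketch in node.js. It assumes each binary websocket message is a Buffer of 16-bit little-endian mono LPCM at 16 kHz, and it halves the rate by averaging each pair of adjacent samples, which is a crude low-pass filter rather than a proper resampler. The function name downsample16kTo8k is hypothetical and is not part of any Nexmo or Google library.

```javascript
// Sketch only: halve the sample rate of 16-bit little-endian mono LPCM
// (16 kHz in, 8 kHz out). Assumes the input Buffer holds whole samples.
function downsample16kTo8k(input) {
  const sampleCount = Math.floor(input.length / 2); // 2 bytes per sample
  const outCount = Math.floor(sampleCount / 2);     // keep one sample per pair
  const output = Buffer.alloc(outCount * 2);

  for (let i = 0; i < outCount; i++) {
    // Read two adjacent input samples and average them.
    const a = input.readInt16LE(i * 4);
    const b = input.readInt16LE(i * 4 + 2);
    output.writeInt16LE(Math.round((a + b) / 2), i * 2);
  }
  return output;
}

// Usage: a 20 ms frame at 16 kHz is 320 samples (640 bytes);
// the downsampled frame is 160 samples (320 bytes).
const frame = Buffer.alloc(640);
const narrowband = downsample16kTo8k(frame);
console.log(narrowband.length);
```

Each downsampled buffer could then be passed to the speech recognition stream configured for an 8 kHz sample rate. For production use, a resampling library with a proper anti-aliasing filter may give better results than simple pair averaging.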
