Nearly all of our source audio comes from the PSTN at 8 kHz, so Nexmo upsamples it on arrival. However, some speech recognition models don't work optimally with the upsampled audio.
In testing, Google's narrowband model worked significantly better with Nexmo audio for some languages: English was fine, but Dutch and German both worked better at 8 kHz.
It is possible to downconvert the Nexmo audio on a websocket back to 8 kHz and then feed that to Google. This is an example of how to do it in Node.js.
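
As a rough illustration of the downconversion step, the sketch below halves the sample rate of a single audio frame. It assumes the websocket delivers 16-bit signed little-endian mono PCM at 16 kHz (the source only says the audio is upsampled, so the exact rate and format are assumptions), and `downsampleBy2` is a hypothetical helper name, not part of any Nexmo or Google library. Averaging each pair of samples is a crude low-pass filter; a production resampler would likely use a proper filter instead.

```javascript
// Minimal sketch: convert 16 kHz PCM to 8 kHz by averaging sample pairs.
// Assumption: `input` is a Buffer of 16-bit signed little-endian mono samples.
function downsampleBy2(input) {
  const inSamples = Math.floor(input.length / 2);  // 2 bytes per sample
  const outSamples = Math.floor(inSamples / 2);    // half as many samples out
  const output = Buffer.alloc(outSamples * 2);

  for (let i = 0; i < outSamples; i++) {
    const a = input.readInt16LE(i * 4);      // first sample of the pair
    const b = input.readInt16LE(i * 4 + 2);  // second sample of the pair
    // Integer average of the pair; the result always fits in 16 bits.
    output.writeInt16LE((a + b) >> 1, i * 2);
  }
  return output;
}
```

Each binary message received on the websocket could be passed through `downsampleBy2` before being written to the recognition stream, with the recognizer configured for an 8000 Hz sample rate.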