Microphone use
This page provides a quick overview of how to access the microphones on Navel.
Devices
The hardware devices for all 7 microphones are not directly accessible, as raw microphone access would not be useful for most applications. Instead, the microphones are routed through ODAS to enable sound-source tracking.
The tracked sound sources are exposed through a 4-channel int16 48 kHz audio device named odas, and each separate channel is additionally routed to a 1-channel odas_n device for convenience. From here on, we will use the device names to refer to the individual channels. The first one, odas_1, is static, meaning it is always available and always beamforms in the same direction (directly in front of Navel's face). The other three are dynamic, meaning they track any new sounds that are perceived.
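Conceptually, the odas_n devices are equivalent to picking every fourth sample out of the interleaved 4-channel odas stream. The following sketch illustrates that deinterleaving with plain Python (for illustration only; in practice just open the odas_n device you need):

```python
import array

CHANNELS = 4  # the odas device carries 4 interleaved int16 channels


def split_channels(raw: bytes) -> list[array.array]:
    """Split an interleaved 4-channel int16 buffer into one array per channel."""
    samples = array.array("h", raw)  # "h" = signed 16-bit samples
    return [samples[ch::CHANNELS] for ch in range(CHANNELS)]


# Two interleaved frames: (ch1, ch2, ch3, ch4), (ch1, ch2, ch3, ch4)
frames = array.array("h", [10, 20, 30, 40, 11, 21, 31, 41])
chans = split_channels(frames.tobytes())
print(list(chans[0]))  # the samples that would be routed to odas_1: [10, 11]
```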
Speech recognition
Although you could use any device/channel for speech recognition, odas_1 is recommended for general use because it is static (i.e. always available). For this reason, it is also set as the default input device on the system. For an example of using this default device for speech recognition, have a look at the code in the included chat.py example.
Sound source tracking
As mentioned before, odas_2-4 give access to dynamic sound sources. The metadata associated with each sound source (location, activity level, etc.) can be accessed through the next_frame() method of the Robot class. This method returns a PerceptionData frame which includes information about all tracked sound sources as a list of SstMeta objects, with the index of each sound source in the list matching the device it is associated with (e.g. the first one is always odas_1).
Only the direction of a sound can be calculated, not its exact position in 3D space, so for convenience the directions are mapped to the surface of a 2 m-radius sphere before being added to the perception data. It is therefore important to note that the loc attribute of each SstMeta object represents the direction the sound was perceived from, not the exact point where the sound source is.
The following example prints out all perceived sound locations, so you can get an idea of what this means in practice:
import asyncio

import navel


async def main():
    print("Listening forever, press Ctrl+C to stop...")
    async with navel.Robot() as robot:
        while True:
            perc = await robot.next_frame()
            for channel, metadata in enumerate(perc.sst_tracks_latest):
                # Only report sources that are actually active right now
                if metadata.activity > 0.2:
                    print(
                        f"Heard a sound on channel {channel + 1} at {metadata.loc}"
                    )


if __name__ == "__main__":
    asyncio.run(main())
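Since each loc lies on the 2 m sphere, only its direction carries information. A helper like the following converts such a point into azimuth/elevation angles, which are often easier to reason about. Note that the coordinate frame used here (x forward, y left, z up) is an assumption, not something specified by the API; check the SstMeta documentation for the actual convention before relying on it:

```python
import math

SPHERE_RADIUS = 2.0  # sound directions are mapped onto a 2 m sphere


def direction_angles(loc):
    """Convert a point on the 2 m sphere into (azimuth, elevation) in degrees.

    Assumes x points forward, y left, and z up -- a guess about Navel's
    coordinate frame, so verify against the SstMeta documentation.
    """
    x, y, z = loc
    azimuth = math.degrees(math.atan2(y, x))        # 0° = straight ahead
    elevation = math.degrees(math.asin(z / SPHERE_RADIUS))
    return azimuth, elevation


# A sound straight ahead at head height:
print(direction_angles((2.0, 0.0, 0.0)))  # (0.0, 0.0)
```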
Recording
Eventually, you may want to record sound directly from a specific channel, e.g. to save it to a file or perform speech recognition locally/with a different framework. In that case, we recommend using the PyAudio library which exposes a simple API for audio streams:
import asyncio
import wave

import pyaudio

SAMPLE_RATE = 48000
BYTES_PER_SAMPLE = 2


async def main():
    device = "odas_1"
    rec_len = 5
    path = "output.wav"

    p = pyaudio.PyAudio()
    buffer = b""

    print(f"Recording {rec_len} seconds of audio from {device}")
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        input_device_index=get_audio_device_index(p, device),
        rate=SAMPLE_RATE,
        input=True,
    )

    # Read in 100 ms chunks until we have rec_len seconds of audio
    while len(buffer) / SAMPLE_RATE / BYTES_PER_SAMPLE < rec_len:
        buffer += stream.read(SAMPLE_RATE // 10)

    stream.close()
    p.terminate()

    print(f"Writing recording to {path}")
    with wave.open(path, "wb") as fp:
        fp.setnchannels(1)
        fp.setsampwidth(BYTES_PER_SAMPLE)
        fp.setframerate(SAMPLE_RATE)
        fp.writeframes(buffer)


def get_audio_device_index(p: pyaudio.PyAudio, device: str):
    info = p.get_host_api_info_by_type(pyaudio.paALSA)

    for i in range(info["deviceCount"]):
        dev = p.get_device_info_by_host_api_device_index(info["index"], i)

        if device == dev["name"]:
            # Return the global device index, which is what p.open() expects
            return dev["index"]

    raise ValueError(f"Device '{device}' does not exist")


if __name__ == "__main__":
    asyncio.run(main())
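The stop condition of the read loop above is just arithmetic on the buffer size: with mono int16 audio, one second occupies SAMPLE_RATE × BYTES_PER_SAMPLE bytes. As a standalone sketch of that calculation:

```python
SAMPLE_RATE = 48000   # samples per second, as in the recording example
BYTES_PER_SAMPLE = 2  # int16 mono


def buffer_seconds(n_bytes: int) -> float:
    """How many seconds of mono int16 audio a byte count represents."""
    return n_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE)


# One second of audio is 48000 * 2 = 96000 bytes:
print(buffer_seconds(96000))  # 1.0
```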
Note
Although this example uses PyAudio's blocking API for simplicity, you may find its callback-based API more useful for bigger projects. Consult the PyAudio documentation for more information on the differences between the two and how to use each.