Microphone use
This page provides a quick overview of how to access the microphones on Navel.
Devices
The hardware devices for all 7 microphones are not directly accessible, as raw microphone access would not be useful for most applications. Instead, the microphones are routed through ODAS to enable sound-source tracking.
The tracked sound sources are exposed through a 4-channel int16 48 kHz audio device named odas, and each separate channel is additionally routed to a 1-channel odas_n device for convenience. From here on, we will use the device names to refer to the individual channels. The first one, odas_1, is static, meaning it is always available and always beamforms in the same direction (directly in front of Navel's face). The other three are dynamic, meaning they track any new sounds that are perceived.
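Conceptually, the odas_n devices are equivalent to picking every fourth sample out of the interleaved 4-channel odas stream. The following sketch illustrates that deinterleaving with plain Python (for illustration only; in practice just open the odas_n device you need):

```python
import array

CHANNELS = 4  # the odas device carries 4 interleaved int16 channels


def split_channels(raw: bytes) -> list[array.array]:
    """Split an interleaved 4-channel int16 buffer into one array per channel."""
    samples = array.array("h", raw)  # "h" = signed 16-bit samples
    return [samples[ch::CHANNELS] for ch in range(CHANNELS)]


# Two interleaved frames: (ch1, ch2, ch3, ch4), (ch1, ch2, ch3, ch4)
frames = array.array("h", [10, 20, 30, 40, 11, 21, 31, 41])
chans = split_channels(frames.tobytes())
print(list(chans[0]))  # the samples that would be routed to odas_1: [10, 11]
```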
Speech recognition
Although you could use any device/channel for speech recognition, odas_1 is recommended for general use because it is static (i.e. always available). For this reason, it is also set as the default input device on the system. For an example of using this default device for speech recognition, have a look at the code in the included chat.py example.
Sound source tracking
As mentioned before, odas_2-4 give access to dynamic sound sources. The metadata associated with each sound source (location, activity level, etc.) can be accessed through the next_frame() method of the Robot class. This method returns a PerceptionData frame which includes information about all tracked sound sources as a list of SstMeta objects, with the index of each sound source in the list matching the device it is associated with (e.g. the first one is always odas_1).
Only the direction of a sound can be calculated, not its exact position in 3D space, so for convenience the directions are mapped to the surface of a 2 m-radius sphere before being added to the perception data. It is therefore important to note that the loc attribute of each SstMeta object represents the direction the sound was perceived from, not the exact point where the sound source is.
The following example prints out all perceived sound locations, so you can get an idea of what this means in practice:
import asyncio

import navel


async def main():
    print("Listening forever, press Ctrl+C to stop...")
    async with navel.Robot() as robot:
        while True:
            perc = await robot.next_frame()
            for channel, metadata in enumerate(perc.sst_tracks_latest):
                # Only report sources that are actually active right now
                if metadata.activity > 0.2:
                    print(
                        f"Heard a sound on channel {channel + 1} at {metadata.loc}"
                    )


if __name__ == "__main__":
    asyncio.run(main())
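Since each loc lies on the 2 m sphere, only its direction carries information. A helper like the following converts such a point into azimuth/elevation angles, which are often easier to reason about. Note that the coordinate frame used here (x forward, y left, z up) is an assumption, not something specified by the API; check the SstMeta documentation for the actual convention before relying on it:

```python
import math

SPHERE_RADIUS = 2.0  # sound directions are mapped onto a 2 m sphere


def direction_angles(loc):
    """Convert a point on the 2 m sphere into (azimuth, elevation) in degrees.

    Assumes x points forward, y left, and z up -- a guess about Navel's
    coordinate frame, so verify against the SstMeta documentation.
    """
    x, y, z = loc
    azimuth = math.degrees(math.atan2(y, x))        # 0° = straight ahead
    elevation = math.degrees(math.asin(z / SPHERE_RADIUS))
    return azimuth, elevation


# A sound straight ahead at head height:
print(direction_angles((2.0, 0.0, 0.0)))  # (0.0, 0.0)
```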
Recording
Eventually, you may want to record sound directly from a specific channel, e.g. to save it to a file or perform speech recognition locally/with a different framework. In that case, we recommend using the PyAudio library which exposes a simple API for audio streams:
import asyncio
import wave

import pyaudio

SAMPLE_RATE = 48000
BYTES_PER_SAMPLE = 2


async def main():
    device = "odas_1"
    rec_len = 5
    path = "output.wav"

    p = pyaudio.PyAudio()
    buffer = b""

    print(f"Recording {rec_len} seconds of audio from {device}")
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        input_device_index=get_audio_device_index(p, device),
        rate=SAMPLE_RATE,
        input=True,
    )

    # Read in 100 ms chunks until we have rec_len seconds of audio
    while len(buffer) / SAMPLE_RATE / BYTES_PER_SAMPLE < rec_len:
        buffer += stream.read(SAMPLE_RATE // 10)

    stream.close()
    p.terminate()

    print(f"Writing recording to {path}")
    with wave.open(path, "wb") as fp:
        fp.setnchannels(1)
        fp.setsampwidth(BYTES_PER_SAMPLE)
        fp.setframerate(SAMPLE_RATE)
        fp.writeframes(buffer)


def get_audio_device_index(p: pyaudio.PyAudio, device: str):
    info = p.get_host_api_info_by_type(pyaudio.paALSA)

    for i in range(info["deviceCount"]):
        dev = p.get_device_info_by_host_api_device_index(info["index"], i)

        if device == dev["name"]:
            # Return the global device index, which is what p.open() expects
            return dev["index"]

    raise ValueError(f"Device '{device}' does not exist")


if __name__ == "__main__":
    asyncio.run(main())
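The stop condition of the read loop above is just arithmetic on the buffer size: with mono int16 audio, one second occupies SAMPLE_RATE × BYTES_PER_SAMPLE bytes. As a standalone sketch of that calculation:

```python
SAMPLE_RATE = 48000   # samples per second, as in the recording example
BYTES_PER_SAMPLE = 2  # int16 mono


def buffer_seconds(n_bytes: int) -> float:
    """How many seconds of mono int16 audio a byte count represents."""
    return n_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE)


# One second of audio is 48000 * 2 = 96000 bytes:
print(buffer_seconds(96000))  # 1.0
```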
Note
Although this example uses PyAudio's blocking API for simplicity, you may find its callback-based API more useful for bigger projects. Consult the PyAudio documentation for more information on the differences between the two and how to use each.