Microphone use
==============

This page provides a quick overview of how to access the microphones on Navel.

-------
Devices
-------

The hardware devices for all 7 microphones are not directly accessible, as they wouldn't be useful for most applications. Instead, they are routed through `ODAS`_ to enable sound-source tracking. The tracked sound sources are exposed through a 4-channel int16 48 kHz audio device named ``odas``, and each separate channel is additionally routed to a 1-channel ``odas_n`` device for convenience. From now on, we will use the device names to refer to each channel/device.

The first one, ``odas_1``, is static, meaning it is always available and beamforming in the same direction (directly in front of Navel's face). The other three are dynamic, meaning they track any new sounds that are perceived.

------------------
Speech recognition
------------------

Although you could use any device/channel for speech recognition, ``odas_1`` is recommended for general use because it is static (i.e. always available). For the same reason, it is also set as the default input device on the system. For how to use this default device for speech recognition, have a look at the code in the included ``chat.py`` example.

---------------------
Sound source tracking
---------------------

As mentioned before, ``odas_2`` through ``odas_4`` give access to dynamic sound sources. The metadata associated with each sound source (location, activity level, etc.) can be accessed through the :py:meth:`~.Robot.next_frame` method of the :py:class:`~.Robot` class. This method returns a :py:class:`~.PerceptionData` frame which includes information about all tracked sound sources as a list of :py:class:`~.SstMeta` objects, with the index of each sound source in the list matching the device it is associated with (e.g. the first one is always ``odas_1``).
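As a small illustration of that indexing (``track_to_device`` is a hypothetical helper written for this page, not part of the SDK), mapping a position in the tracked-source list back to its device name is just an offset of one:

```python
# Hypothetical helper: the n-th entry in the tracked-source list
# corresponds to the device named "odas_{n + 1}".
def track_to_device(index: int) -> str:
    """Map a position in the sound-source list to its audio device name."""
    return f"odas_{index + 1}"


print(track_to_device(0))  # -> "odas_1" (the static, forward-facing channel)
print(track_to_device(3))  # -> "odas_4"
```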
Only the direction of sounds can be calculated, not their exact position in 3D space, so for convenience the directions are mapped to the surface of a 2 m-radius sphere before being added to the perception data. Note that the ``loc`` attribute of each :py:class:`~.SstMeta` object therefore represents the direction the sound was perceived from, not the exact point where the sound source is. The following example prints out all perceived sound locations, so you can get an idea of what this means:

.. code-block:: python
    :linenos:

    import asyncio

    import navel


    async def main():
        print("Listening forever, press Ctrl+C to stop...")
        async with navel.Robot() as robot:
            while True:
                perc = await robot.next_frame()
                for channel, metadata in enumerate(perc.sst_tracks_latest):
                    if metadata.activity > 0.2:
                        print(
                            f"Heard a sound on channel {channel + 1} at {metadata.loc}"
                        )


    if __name__ == "__main__":
        asyncio.run(main())

---------
Recording
---------

Eventually, you may want to record sound directly from a specific channel, e.g. to save it to a file or perform speech recognition locally or with a different framework. In that case, we recommend the `PyAudio`_ library, which exposes a simple API for audio streams:

.. code-block:: python
    :linenos:

    import asyncio
    import wave

    import pyaudio

    SAMPLE_RATE = 48000
    BYTES_PER_SAMPLE = 2


    async def main():
        device = "odas_1"
        rec_len = 5
        path = "output.wav"

        p = pyaudio.PyAudio()
        buffer = b""

        print(f"Recording {rec_len} seconds of audio from {device}")
        stream = p.open(
            format=pyaudio.paInt16,
            channels=1,
            input_device_index=get_audio_device_index(p, device),
            rate=SAMPLE_RATE,
            input=True,
        )
        while len(buffer) / SAMPLE_RATE / BYTES_PER_SAMPLE < rec_len:
            # Read in chunks of 100 ms until we have enough data
            buffer += stream.read(SAMPLE_RATE // 10)
        stream.close()
        p.terminate()

        print(f"Writing recording to {path}")
        with wave.open(path, "wb") as fp:
            fp.setnchannels(1)
            fp.setsampwidth(BYTES_PER_SAMPLE)
            fp.setframerate(SAMPLE_RATE)
            fp.writeframes(buffer)


    def get_audio_device_index(p: pyaudio.PyAudio, device: str):
        info = p.get_host_api_info_by_type(pyaudio.paALSA)
        for i in range(info["deviceCount"]):
            dev = p.get_device_info_by_host_api_device_index(info["index"], i)
            if device == dev["name"]:
                return i
        raise ValueError(f"Device '{device}' does not exist")


    if __name__ == "__main__":
        asyncio.run(main())

.. note::
    Although this example uses the blocking API from PyAudio for simplicity, you may find their callback-based API more useful for bigger projects. Make sure to consult their documentation for more information on the differences between the two and how to use one or the other.
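If you process the raw buffer yourself rather than writing it to a WAV file, keep in mind that it contains signed 16-bit samples. As a rough sketch (``rms_level`` is a hypothetical helper for this page, not part of the SDK or PyAudio), you can compute a normalized loudness level from such a buffer using only the standard library:

```python
import array
import math


def rms_level(buffer: bytes) -> float:
    """Return the RMS level of an int16 sample buffer, scaled to 0.0-1.0.

    Assumes the buffer uses the machine's native byte order, which on
    Navel (and most platforms) is little-endian.
    """
    samples = array.array("h", buffer)  # "h" = signed 16-bit integers
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0  # normalize by int16 full scale


# A full-scale square wave has an RMS level of (almost exactly) 1.0,
# while an empty or silent buffer is 0.0.
loud = array.array("h", [32767, -32768] * 100).tobytes()
print(round(rms_level(loud), 3))  # -> 1.0
print(rms_level(b""))             # -> 0.0
```

A check like ``rms_level(chunk) > threshold`` could serve as a crude voice-activity gate on the recording loop, similar in spirit to the ``activity`` threshold used in the sound source tracking example above.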