[draft] avatar plugin based on v1.0 #1391
base: dev-1.0
Conversation
longcw commented Jan 20, 2025 (edited)
- AudioSink based on DataStream (Add data stream support python-sdks#347)
- Avatar worker example with video generation and av sync
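The AudioSink-over-DataStream idea above comes down to serializing PCM frames into a byte stream and rebuilding them on the other side. A minimal sketch of one possible framing (hypothetical helper names, not the actual plugin API): a small struct header carrying the sample rate and channel count, followed by the raw int16 samples.

```python
import struct

# hypothetical framing: 8-byte header (sample_rate, num_channels, reserved)
# followed by raw int16 PCM bytes
_HEADER = struct.Struct("<IHH")  # sample_rate: u32, num_channels: u16, reserved: u16

def pack_frame(sample_rate: int, num_channels: int, pcm: bytes) -> bytes:
    """Prefix raw PCM bytes with a header so the receiver can rebuild the frame."""
    return _HEADER.pack(sample_rate, num_channels, 0) + pcm

def unpack_frame(data: bytes) -> tuple[int, int, bytes]:
    """Inverse of pack_frame: returns (sample_rate, num_channels, pcm)."""
    sample_rate, num_channels, _ = _HEADER.unpack_from(data)
    return sample_rate, num_channels, data[_HEADER.size:]
```

In the PR itself the metadata travels out-of-band (in the stream name or extensions, discussed below), so this in-band header is just one alternative shape for the same information.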
this looks great! just a few comments.
we'll also need some error handling in various parts: how do both sides handle the case where the other side disconnects? if the avatar participant is gone for longer than a reasonable timeout, the agent would likely need to report that error and shut itself down.
similarly, if the controller is gone, the service on the other side might want to avoid consuming resources and also exit.
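The timeout-then-shutdown behavior described above can be sketched as a small watchdog. This is an illustrative, self-contained asyncio example (the class name and callbacks are hypothetical, not part of the plugin): if the peer stays disconnected past a timeout, a shutdown callback fires; a reconnect within the window cancels it.

```python
import asyncio

class DisconnectWatchdog:
    """If the peer stays disconnected longer than `timeout` seconds,
    invoke `on_timeout` so the agent/worker can report the error and exit."""

    def __init__(self, timeout: float, on_timeout) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._connected = asyncio.Event()
        self._connected.set()  # start in the connected state

    def peer_disconnected(self) -> None:
        # start a grace-period timer when the peer drops
        self._connected.clear()
        asyncio.ensure_future(self._watch())

    def peer_reconnected(self) -> None:
        # cancel the pending shutdown if the peer comes back in time
        self._connected.set()

    async def _watch(self) -> None:
        try:
            await asyncio.wait_for(self._connected.wait(), self._timeout)
        except asyncio.TimeoutError:
            self._on_timeout()  # e.g. report the error and shut down
```

Both sides (controller and avatar worker) could each run one of these keyed on the other's participant identity.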
```python
AUDIO_SENDER_ATTR = "__livekit_avatar_audio_sender"
AUDIO_RECEIVER_ATTR = "__livekit_avatar_audio_receiver"
RPC_INTERRUPT_PLAYBACK = "__livekit_avatar_interrupt_playback"
```
nit: for consistency, use the `lk.` namespace to identify LiveKit-specific actions:

```diff
- RPC_INTERRUPT_PLAYBACK = "__livekit_avatar_interrupt_playback"
+ RPC_INTERRUPT_PLAYBACK = "lk.interrupt_playback"
```
```python
async def start(self) -> None:
    """Wait for worker participant to join and start streaming"""
    # mark self as sender
    await self._room.local_participant.set_attributes({AUDIO_SENDER_ATTR: "true"})
```
was thinking we can simplify this step. instead, the receiver could just wait for an audio stream with a particular name?
```python
    """Wait for worker participant to join and start streaming"""
    # mark self as sender
    await self._room.local_participant.set_attributes({AUDIO_SENDER_ATTR: "true"})
    self._remote_participant = await wait_for_participant(
```
what if, instead of waiting for an attribute, we could:
- take `avatar_identity` as a param in the sink (with a sane default)
- create a token for that identity and send it to the other side as part of the initial handshake
- here we can just wait for that agreed-upon identity
sounds good! on the avatar side, it waits for the audio stream with a particular name from the participant with `kind == 'agent'`.
updated
```python
# start new stream
# TODO: any better option to send the metadata?
name = f"audio_{frame.sample_rate}_{frame.num_channels}"
self._stream_writer = await self._room.local_participant.stream_file(
```
this is a good use of stream extensions:

```python
writer = await room.local_participant.stream_file(
    "audio", extensions={"sample_rate": "48000", "channels": "1"}
)
```

or

```python
writer = await room.local_participant.stream_file(
    "audio", extensions={"audio_settings": json.dumps({"sample_rate": 48000, "channels": 1})}
)
```
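Whichever encoding wins, the receiver has to recover the same two values. A small sketch of a parser covering both shapes discussed in this thread, the extensions map and the legacy `audio_<rate>_<channels>` stream name (the function name is hypothetical):

```python
import json

def parse_audio_metadata(name: str, extensions: dict[str, str]) -> tuple[int, int]:
    """Recover (sample_rate, num_channels), preferring the extensions map
    over metadata packed into the stream name as 'audio_<rate>_<channels>'."""
    if "audio_settings" in extensions:
        settings = json.loads(extensions["audio_settings"])
        return int(settings["sample_rate"]), int(settings["channels"])
    if "sample_rate" in extensions:
        return int(extensions["sample_rate"]), int(extensions["channels"])
    # fall back to the name-encoded form
    _, rate, channels = name.split("_")
    return int(rate), int(channels)
```

Keeping a fallback like this would let the receiver interoperate with senders written against the earlier name-based convention.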
Oh I see, so `extensions` is some kind of metadata? then what is the reason it is named `extensions`?
yeah.. I think `attributes` is probably a better name
```python
# mark self as receiver
await self._room.local_participant.set_attributes({AUDIO_RECEIVER_ATTR: "true"})

self._remote_participant = await wait_for_participant(
```
it seems here we can just wait for `participant.kind == agent`?
if we wanted to handle multiple avatars in the room, then the integration should take in the controller's identity.
```python
    reader: rtc.FileStreamReader, remote_participant_id: str
) -> None:
    if remote_participant_id != self._remote_participant.identity:
        logger.warning(
```
would we really want to warn on any other incoming file stream? that seems like a rather narrow use case for this plugin
oh I see, I'll filter for the audio stream first, so other data streams can still be processed by other handlers.
Btw, what is the use case of the `file_name` in data stream? can I pass a tag and metadata like `sample_rate` and `num_channels` using the file name, or is there a better option for this?
see @davidzhao's comment above, the best option is the extensions map on the stream.
```python
reader = self._stream_readers.pop(0)
async for data in reader.stream_reader:
    yield rtc.AudioFrame(
        data=data,
```
this pattern would suggest that we're sure a single audio frame never exceeds `STREAM_CHUNK_SIZE` (~15kb)
For audio bytes, splitting large chunks into smaller ones before sending is fine, and each should be less than 15kb. but for other use cases, receiving a different number of chunks than was sent may not be good behavior. Maybe add a size limit at the send side?
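The send-side splitting proposed here is easy to sketch. A hypothetical helper (not in the PR) that keeps each chunk under the limit while staying aligned to whole samples, so the receiver never has to stitch a partial sample back together; the ~15 KB figure comes from the `STREAM_CHUNK_SIZE` mentioned above:

```python
def split_audio(data: bytes, bytes_per_sample: int, max_chunk: int = 15 * 1024) -> list[bytes]:
    """Split PCM bytes into chunks no larger than max_chunk, each aligned
    to a whole sample (bytes_per_sample = sample_width * num_channels)."""
    # round the chunk size down to a multiple of the sample size
    step = (max_chunk // bytes_per_sample) * bytes_per_sample
    if step <= 0:
        raise ValueError("max_chunk is smaller than one sample")
    return [data[i:i + step] for i in range(0, len(data), step)]
```

Since audio reassembly only cares about the byte stream, not chunk boundaries, this kind of sender-side split is invisible to the receiver; the chunk-count concern in the comment applies to non-audio payloads.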
I originally had a size limit in there; @theomonnom's wish was that we wouldn't enforce such a limit. but I agree, it might make things trickier if we don't have a sender-side size limit