v1.0 #1364
base: main
Conversation
```python
    data.generated_text += chunk
    text_ch.send_nowait(chunk)

elif isinstance(chunk, ChatChunk):
```
Maybe an else with a log or exception?
Sure, good idea in case users give us random stuff
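A minimal sketch of that fallback, reusing the names from the diff above; the `str` branch and the logger setup are assumptions about the surrounding code:

```python
import logging

logger = logging.getLogger("voice-agent")  # illustrative logger name

if isinstance(chunk, str):
    data.generated_text += chunk
    text_ch.send_nowait(chunk)
elif isinstance(chunk, ChatChunk):
    ...  # existing ChatChunk handling
else:
    # surface "random stuff" instead of silently dropping it
    logger.warning("unexpected chunk type: %s", type(chunk).__name__)
```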
```python
agent.start()

# start a chat inside the CLI
chat_cli = ChatCLI(agent)
```
As discussed, let's save ChatCLI for later, until echo issues are resolved locally.
Right, this isn't done; there is no Room implementation yet.
I would rename this package too, from multimodal to something like realtime.
`realtime.RealtimeModel`?
Sure. I think the main differences between the models are:

- HTTP-based vs. WebSocket
- stateless vs. stateful

So realtime seems like as good a descriptor as any?
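Concretely, after the rename, usage would read something like this (the exact import path is an assumption):

```python
from livekit.agents import realtime  # previously: multimodal

# WebSocket-based, stateful session model
model = realtime.RealtimeModel()
```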
```python
self._run_eou_detection(self._agent.chat_ctx)

def _run_eou_detection(self, chat_ctx: llm.ChatContext) -> None:
    if not self._audio_transcript:
```
If we've triggered an interruption during VAD start of speech, we won't be able to skip this; otherwise it will recreate the bug of the agent becoming "stuck".
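One way to guard against that, as a sketch; the `_interrupted` flag is hypothetical and stands in for whatever state tracks the VAD-triggered interruption:

```python
def _run_eou_detection(self, chat_ctx: llm.ChatContext) -> None:
    # hypothetical guard: if an interruption already fired on VAD
    # start-of-speech, skip EOU detection so the agent can't get
    # "stuck" waiting on a cancelled turn
    if self._interrupted or not self._audio_transcript:
        return
    ...  # existing detection logic continues here
```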
```python
def __init__(self, video_changed: Callable, audio_changed: Callable) -> None:
    self._video_stream: VideoStream | None = None
    self._audio_stream: AudioStream | None = None
    self._video_changed = video_changed
```
nit: should we keep our _cb suffix to indicate it's a callback?
It's internal, but yes, we can.
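For illustration, the rename would look roughly like:

```python
from typing import Callable

def __init__(self, video_changed_cb: Callable, audio_changed_cb: Callable) -> None:
    self._video_changed_cb = video_changed_cb
    self._audio_changed_cb = audio_changed_cb
```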
```python
    return self._sample_rate

@abstractmethod
async def capture_frame(self, frame: rtc.AudioFrame) -> None:
```
Do we want to keep our `capture_frame` terminology, or start switching to `push_frame` like we use elsewhere?
```python
class TextSink(ABC):
    @abstractmethod
    async def capture_text(self, text: str) -> None:
```
Is there a way to update existing text? Thinking of the case where we're pushing interim transcripts out.
No, we don't; this isn't for the transcript, this is for the LLM output. Transcripts aren't going to be a stream; they're going to get exposed as events.
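So the split could look something like this sketch; the event name and handler are assumptions, not the final API:

```python
# LLM output: an append-only stream pushed into the sink
await text_sink.capture_text(delta)

# transcripts: emitted as events, so interim results can be
# replaced in place rather than only appended
agent.on("transcript_updated", lambda ev: print(ev.text, ev.is_final))
```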
"agent_stopped_speaking", | ||
"user_message_committed", | ||
"agent_message_committed", | ||
"agent_message_interrupted", |
We still need to add metrics? And what about these?

```python
"function_calls_collected",
"function_calls_finished",
```
Yes, this isn't done yet.
```python
    self.__capturing = False

@abstractmethod
def clear_buffer(self) -> None:
```
Should `flush` and `clear_buffer` also be async? For the avatar use case, it needs to call `response = await room.local_participant.perform_rpc` to notify the remote participant.
In the current implementation, we assume it happens instantly (so we're not waiting). If we change those functions to async, we may have to add some more synchronization primitives where the agent uses them. We don't have to be consistent, but as a note, TTS and STT don't have async methods either.
So is it OK if, for the `await room.local_participant.perform_rpc("interrupt_playback")` case in `clear_buffer`, we use a `create_task`?
IMO it is OK, but we can discuss further in case we think the devex for people developing custom Sinks is bad.
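A sketch of that pattern, keeping `clear_buffer` synchronous and firing the RPC without awaiting it; the sink class is hypothetical and the RPC call mirrors the one quoted above:

```python
import asyncio

class AvatarAudioSink(AudioSink):  # hypothetical custom sink
    def clear_buffer(self) -> None:
        # fire-and-forget: notify the remote participant without
        # making clear_buffer itself async
        asyncio.create_task(
            self._room.local_participant.perform_rpc("interrupt_playback")
        )
```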
```python
self._nested_speech_done_fut.set_result(None)

async def wait_until_interrupted(self, aw: list[Awaitable]) -> None:
    await asyncio.wait(
        [*aw, self._interrupt_fut], return_when=asyncio.FIRST_COMPLETED
```
When I tested on py3.11, it raises `TypeError: Passing coroutines is forbidden, use tasks explicitly.`
(https://docs.python.org/3/library/asyncio-task.html#waiting-primitives)

Perhaps the following is needed:

```python
async def wait_until_interrupted(self, aw: list[Awaitable]) -> None:
    # asyncio.wait() no longer accepts bare coroutines; wrap them in tasks
    aw = [asyncio.create_task(task) for task in aw]
    await asyncio.wait(
        [*aw, self._interrupt_fut], return_when=asyncio.FIRST_COMPLETED
    )
    for task in aw:
        if not task.done():
            task.cancel()
```
Or, a bit more involved, if some items in `aw` are already tasks:

```python
async def wait_until_interrupted(self, aw: list[Awaitable]) -> None:
    temp_tasks = []  # only the tasks we created here, safe to cancel
    tasks = []
    for task in aw:
        if not isinstance(task, asyncio.Task):
            task = asyncio.create_task(task)
            temp_tasks.append(task)
        tasks.append(task)
    await asyncio.wait(
        [*tasks, self._interrupt_fut], return_when=asyncio.FIRST_COMPLETED
    )
    for task in temp_tasks:
        if not task.done():
            task.cancel()
```