Update README.md #1520
Hi team,
I'm currently reviewing this pull request and will post my detailed review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:
This pull request updates the README.md file for the websocket-demo-app within the gemini/multimodal-live-api directory. The main changes are focused on clarifying the tutorial's instructions and modernizing the language. Specifically, the README now explicitly mentions Gemini 2.0 Flash Experimental and streamlines the setup instructions, separating the preparation steps from the local execution steps. The Cloud Shell setup instructions have also been simplified. The intent is to make the tutorial easier to follow and more up-to-date with the latest Gemini capabilities.
The changes are primarily in the gemini/multimodal-live-api/websocket-demo-app/README.md file. The updated README provides clearer instructions for setting up and running the demo application, both locally and in Cloud Shell. The descriptions of the architecture and prerequisites have also been improved for better clarity.
If anything is unclear or requires further clarification, please let me know.
And now, a little haiku to brighten your day:
Code flows like a stream,
Readme guides the path so clear,
Joy in every line.
This pull request updates the README file with some helpful clarifications and improvements to the setup instructions. The changes are well-written and should make it easier for users to get started with the demo app. Here are a few suggestions for improvement.
@@ -1,76 +1,79 @@
# Multimodal Live API Demo

-In this tutorial, you will be building a web application that enables you to use your voice and camera to talk to Gemini 2.0 through the [Multimodal Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live).
+This tutorial guides you through building a web application that allows you to interact with [Gemini 2.0 Flash Experimental](https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/#ceo-message) using your voice and camera. This is achieved through the [Multimodal Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live), a low-latency bidirectional streaming API that supports audio and video input and can output audio.
It might be helpful to briefly explain what "Flash Experimental" means in this context for new users. Is it a specific version or feature set of Gemini 2.0? This would help users understand the capabilities and limitations of the demo.
-This tutorial guides you through building a web application that allows you to interact with [Gemini 2.0 Flash Experimental](https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/#ceo-message) using your voice and camera. This is achieved through the [Multimodal Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live), a low-latency bidirectional streaming API that supports audio and video input and can output audio.
+This tutorial guides you through building a web application that allows you to interact with [Gemini 2.0 Flash Experimental](https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/#ceo-message) (a research prototype exploring new multimodal capabilities) using your voice and camera. This is achieved through the [Multimodal Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live), a low-latency bidirectional streaming API that supports audio and video input and can output audio.
* A Google Cloud project
* Foundational knowledge of Web development
Consider adding links to instructions for creating a Google Cloud project and enabling billing. This would be helpful for users who are new to Google Cloud.
-* A Google Cloud project
-* Foundational knowledge of Web development
+* A Google Cloud project (see [instructions](https://cloud.google.com/resource-manager/docs/creating-managing-projects))
+* Foundational knowledge of Web development

-### Setup locally
+### Preparation
git clone https://github.com/GoogleCloudPlatform/generative-ai.git
cd generative-ai/gemini/multimodal-live-api/websocket-demo-app
The cd command should also include the repository name after cloning, since the user may clone into a directory other than the repository name.
-git clone https://github.com/GoogleCloudPlatform/generative-ai.git
-cd generative-ai/gemini/multimodal-live-api/websocket-demo-app
+git clone https://github.com/GoogleCloudPlatform/generative-ai.git
+cd generative-ai/gemini/multimodal-live-api/websocket-demo-app
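One way the clone-directory concern raised in this comment could be handled is to name the target directory explicitly and reuse that name in the `cd` step. The following is only a sketch under stated assumptions: the function name `enter_demo_dir` and its `clone_dir` argument are hypothetical and do not appear in the README; the repository URL and the demo path are taken from the diff above.

```shell
# Hypothetical helper (not part of the README): clone into a chosen directory
# and change into the demo app inside it.
enter_demo_dir() {
  # Assumption: default to the repository name when no directory is given.
  clone_dir="${1:-generative-ai}"

  # Clone only when the target directory does not exist yet.
  if [ ! -d "$clone_dir" ]; then
    git clone https://github.com/GoogleCloudPlatform/generative-ai.git "$clone_dir" || return 1
  fi

  # The path below the clone directory is the one used in the README diff.
  cd "$clone_dir/gemini/multimodal-live-api/websocket-demo-app"
}
```

Calling `enter_demo_dir my-copy` would clone into `my-copy` if it is missing and then enter the demo directory; calling it with no argument falls back to `generative-ai`, which matches the commands currently in the README.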
Description
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
* Follow the CONTRIBUTING Guide.
* Your account is listed in CODEOWNERS for the file(s).
* Ensure the tests and linter pass (run nox -s format from the repository root to format).

Fixes #<issue_number_goes_here> 🦕