feat: Create Spatial Understanding Notebook for Gemini 2.0 #1651
base: main
holtskinner commented on Jan 23, 2025
- Based on https://github.com/google-gemini/cookbook/blob/main/gemini-2/spatial_understanding.ipynb
Hi @holtskinner and reviewers,
I'm currently reviewing this pull request and will post my feedback in a few minutes. In the meantime, I'm providing this summary of the pull request's content to help you and other reviewers quickly understand the changes and intent.
This pull request, titled "feat: Create Spatial Understanding Notebook for Gemini 2.0", aims to create a Jupyter Notebook demonstrating spatial understanding capabilities with the Gemini 2.0 API in Vertex AI. The notebook leverages the Google Gen AI SDK and the `gemini-2.0-flash-exp` model.
Here's a summary of the changes:
- A new Jupyter Notebook (`gemini/use-cases/spatial-understanding/spatial_understanding.ipynb`) was added. This notebook is approximately 675 lines long and contains comprehensive examples of spatial understanding tasks. The notebook includes sections on:
  - Installation and authentication
  - Setting up the Google Cloud project
  - Importing necessary libraries
  - Loading the Gemini model
  - Defining helper functions for plotting bounding boxes
  - Several examples demonstrating object detection and spatial reasoning with various images and prompts
  - Sharing options for the notebook (Colab, Colab Enterprise, Vertex AI Workbench, GitHub)
- The notebook uses a `BoundingBox` Pydantic model to structure the output and includes safety settings to mitigate potential risks. The code demonstrates how to use the Gemini API for tasks such as:
  - Simply overlaying information on images
  - Searching within images for specific objects
  - Leveraging Gemini's reasoning capabilities for tasks like finding the shadow of an object
- The notebook also includes a YouTube video link demonstrating spatial understanding with Gemini 2.0.
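
For reviewers unfamiliar with the structured-output pattern summarized above, here is a minimal sketch of how a bounding-box record might be parsed and mapped onto an image. It is not the notebook's code: a plain dataclass stands in for the Pydantic model so the example has no third-party dependencies, and the field names (`box_2d`, `label`), the `[y_min, x_min, y_max, x_max]` ordering, the 0-1000 coordinate scale, and the helpers `parse_boxes` and `to_pixels` are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Hypothetical stand-in for the notebook's Pydantic model."""

    # Assumed order: [y_min, x_min, y_max, x_max], each scaled to 0-1000.
    box_2d: list[int]
    label: str


def parse_boxes(response_text: str) -> list[BoundingBox]:
    """Parse a JSON array of box records into BoundingBox objects."""
    return [BoundingBox(**item) for item in json.loads(response_text)]


def to_pixels(box: BoundingBox, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) pixel coordinates."""
    y_min, x_min, y_max, x_max = box.box_2d
    # Integer arithmetic avoids floating-point rounding surprises.
    return (
        x_min * width // 1000,
        y_min * height // 1000,
        x_max * width // 1000,
        y_max * height // 1000,
    )


# Hand-written sample standing in for a model response (not real model output).
sample = '[{"box_2d": [100, 250, 500, 750], "label": "cupcake"}]'
boxes = parse_boxes(sample)
print(boxes[0].label, to_pixels(boxes[0], width=800, height=600))
# prints: cupcake (200, 60, 600, 300)
```

The resulting pixel tuple is in the (left, top, right, bottom) order that common drawing APIs expect for rectangles, which is where a plotting helper would typically take over.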
I'll provide a more detailed review shortly, but this summary should give you a good overview of the changes.
Here's a little haiku to lighten the mood:
Boxes on the screen,
Gemini's sight, sharp and keen,
Code makes pictures speak.
This pull request introduces a new Jupyter Notebook demonstrating spatial understanding with Gemini 2.0. The notebook is well-structured and provides clear explanations and examples. The code is generally clean and easy to follow. However, there are a few minor suggestions for improvement, primarily related to documentation and code style.
I've also noticed a few minor issues with the prompt text and some inconsistencies in the code. I've added comments and suggestions to address these issues and improve the overall quality of the notebook.
Overall, this is a valuable addition to the repository. Addressing the minor issues I've raised would further enhance its clarity and usefulness.