
Issue with streaming with Gemini #547

Open
eltoob opened this issue Nov 21, 2024 · 2 comments
Comments

eltoob commented Nov 21, 2024

Describe the bug
Gemini just announced support for the OpenAI library.
See here: https://ai.google.dev/gemini-api/docs/openai
For some reason, the Ruby library doesn't stream (or, to be more precise, it delivers the entire response at once).
I tried the exact same request with the Python library and it streams properly.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://ai.google.dev/
  2. Generate a key
  3. Run the code below
  4. There is no streaming
require 'openai'

client = OpenAI::Client.new(
  access_token: "API_KEY",
  uri_base: "https://generativelanguage.googleapis.com/v1beta/openai/"
)
start_time = Time.now
puts start_time
response = client.chat(
  parameters: {
    model: "gemini-1.5-flash",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello! write a poem about the moon make it 2000 words" }
    ],
    stream: proc do |chunk|
      current_time = Time.now
      elapsed = current_time - start_time
      puts "#{current_time}: chunk (#{elapsed.round(2)}s elapsed)"
    end
  }
)

You can run the equivalent code in Python and see that the stream works properly:

from openai import OpenAI
import time

start_time = time.time()

client = OpenAI(
    api_key="API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
  model="gemini-1.5-flash",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello! write a poem about the moon"}
  ],
  stream=True
)

for chunk in response:
    current_time = time.time()
    elapsed = current_time - start_time
    print(f"{current_time}: chunk ({elapsed:.2f}s elapsed)")

Expected behavior
The Ruby client should yield chunks to the stream proc incrementally as they arrive, matching the Python client's behavior.

Screenshots
Here I logged the timestamps. With Ruby, all the chunks arrive at once:
[screenshot: Ruby output, every chunk logged at the same elapsed time]

With Python, the response actually streams:
[screenshot: Python output, chunks logged incrementally]


eltoob commented Nov 26, 2024

Quick update:
I tried to replicate the exact same headers the Python library sends.
When I pass "Accept-Encoding" => "gzip, deflate" as a header, it's partially working (i.e. I do see the proc being called), but there are issues with the event stream parser.


eltoob commented Nov 27, 2024

Ok I finally fixed the issue.

OpenAI.configure do |config|
  config.extra_headers = {
    "Accept-Encoding" => ""
  }
end

Not sure why this works, though.
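For what it's worth, my guess is that with gzip enabled, the HTTP middleware buffers the whole compressed body before decompressing it, so the server-sent-event chunks only surface once the response is complete; sending an empty Accept-Encoding disables compression, so each text/event-stream chunk can be parsed as it arrives. If you'd rather not change the global configuration, recent ruby-openai versions appear to accept the same header per client via extra_headers (treat this as a sketch, not a confirmed API; "API_KEY" is a placeholder):

```ruby
require "openai"

# Sketch: scope the workaround to a single client instead of
# OpenAI.configure. The extra_headers option is an assumption based on
# recent ruby-openai versions.
client = OpenAI::Client.new(
  access_token: "API_KEY",
  uri_base: "https://generativelanguage.googleapis.com/v1beta/openai/",
  extra_headers: { "Accept-Encoding" => "" } # disable gzip so chunks stream
)
```

This keeps other clients in the same process using the default headers.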
