
We need more examples of utilizing the official API. #684

Closed
jwh12333 opened this issue Jan 29, 2025 · 2 comments

Comments

@jwh12333

We need an example of using the official API to recognize bounding box (bbox) and point positions in an image, along with an example of scaling the model's output coordinates to the pixel dimensions of the actual input image.

@jwh12333 changed the title from "We require more examples of utilizing the official API." to "We need more examples of utilizing the official API." Jan 29, 2025
@ShuaiBai623
Collaborator

Thank you for your suggestion. We have added a method for running inference through the API to the cookbook; you can refer to it there.

from openai import OpenAI
import os
import base64

# Encode a local image file as a base64 string.
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# @title inference function with API
def inference_with_api(image_path, prompt, sys_prompt="You are a helpful assistant.", model_id="qwen2.5-vl-72b-instruct", min_pixels=512*28*28, max_pixels=2048*28*28):
    base64_image = encode_image(image_path)
    client = OpenAI(
        # If the DASHSCOPE_API_KEY environment variable is not configured, replace the following line with your DashScope API key: api_key="sk-xxx".
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )


    messages = [
        {
            "role": "system",
            "content": [{"type": "text", "text": sys_prompt}],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "min_pixels": min_pixels,
                    "max_pixels": max_pixels,
                    # Pass in base64 image data. The MIME type (image/{format}) must match the actual image format and be one of the supported content types:
                    # PNG image:  f"data:image/png;base64,{base64_image}"
                    # JPEG image: f"data:image/jpeg;base64,{base64_image}"
                    # WEBP image: f"data:image/webp;base64,{base64_image}"
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {"type": "text", "text": prompt},
            ],
        },
    ]
    completion = client.chat.completions.create(
        model=model_id,
        messages=messages,
    )
    return completion.choices[0].message.content

# Use an API-based approach to inference. Apply for an API key here: https://bailian.console.alibabacloud.com/?apiKey=1
import os
from PIL import Image
from qwen_vl_utils import smart_resize

os.environ["DASHSCOPE_API_KEY"] = "your_api_key_here"
min_pixels = 512*28*28
max_pixels = 2048*28*28
# image_path, prompt, and plot_bounding_boxes are defined earlier in the cookbook.
image = Image.open(image_path)
width, height = image.size
input_height, input_width = smart_resize(height, width, min_pixels=min_pixels, max_pixels=max_pixels)
response = inference_with_api(image_path, prompt, min_pixels=min_pixels, max_pixels=max_pixels)
plot_bounding_boxes(image, response, input_width, input_height)
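Since the model's coordinates are expressed in the resized input space returned by smart_resize (input_width × input_height), mapping them back to the original image is a simple per-axis rescale. A minimal sketch of that conversion (the helper name scale_bbox_to_original is illustrative, not part of the cookbook):

```python
def scale_bbox_to_original(bbox, input_width, input_height, orig_width, orig_height):
    """Map a bbox [x1, y1, x2, y2] from the smart_resize input space
    back to the original image's pixel space."""
    x_scale = orig_width / input_width
    y_scale = orig_height / input_height
    x1, y1, x2, y2 = bbox
    return [x1 * x_scale, y1 * y_scale, x2 * x_scale, y2 * y_scale]


# Example: a bbox predicted on a 200x100 model input, original image 400x300.
print(scale_bbox_to_original([0, 0, 100, 50], 200, 100, 400, 300))  # [0, 0, 200, 150]
```

The same scale factors apply to point outputs; multiply each x by orig_width / input_width and each y by orig_height / input_height.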

@jwh12333
Author

Thank you very much, wishing you and the entire team a happy New Year!
