
We need more examples of utilizing the official API. #684

Closed
jwh12333 opened this issue Jan 29, 2025 · 2 comments

Comments

@jwh12333

We need an example of using the official API to recognize bounding box (bbox) and point positions in an image, along with an example of scaling the model's output coordinates to the pixel dimensions of the actual input image.

@jwh12333 changed the title from "We require more examples of utilizing the official API." to "We need more examples of utilizing the official API." Jan 29, 2025
@ShuaiBai623
Collaborator

Thank you for your suggestion. We have added a method for running inference through the API to the cookbook; you can refer to it there.

from openai import OpenAI
import os
import base64

# Encode a local image file as a base64 string.
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# @title inference function with API
def inference_with_api(image_path, prompt, sys_prompt="You are a helpful assistant.", model_id="qwen2.5-vl-72b-instruct", min_pixels=512*28*28, max_pixels=2048*28*28):
    base64_image = encode_image(image_path)
    client = OpenAI(
        # If the DASHSCOPE_API_KEY environment variable is not configured, replace the following line with your DashScope API key: api_key="sk-xxx".
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )


    messages = [
        {
            "role": "system",
            "content": [{"type": "text", "text": sys_prompt}],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "min_pixels": min_pixels,
                    "max_pixels": max_pixels,
                    # Pass in base64 image data. The MIME type (image/{format}) must match the actual image format and be one of the supported content types:
                    # PNG image:  f"data:image/png;base64,{base64_image}"
                    # JPEG image: f"data:image/jpeg;base64,{base64_image}"
                    # WEBP image: f"data:image/webp;base64,{base64_image}"
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {"type": "text", "text": prompt},
            ],
        },
    ]
    completion = client.chat.completions.create(
        model=model_id,
        messages=messages,
    )
    return completion.choices[0].message.content

# Use an API-based approach to inference. Apply for an API key here: https://bailian.console.alibabacloud.com/?apiKey=1
import os
from PIL import Image
from qwen_vl_utils import smart_resize

os.environ["DASHSCOPE_API_KEY"] = "your_api_key_here"
min_pixels = 512*28*28
max_pixels = 2048*28*28
# image_path, prompt, and plot_bounding_boxes are defined earlier in the cookbook.
image = Image.open(image_path)
width, height = image.size
input_height, input_width = smart_resize(height, width, min_pixels=min_pixels, max_pixels=max_pixels)
response = inference_with_api(image_path, prompt, min_pixels=min_pixels, max_pixels=max_pixels)
plot_bounding_boxes(image, response, input_width, input_height)
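Since the model's coordinates are expressed in the resized input space returned by smart_resize (input_width × input_height), mapping them back to the original image is a simple per-axis rescale. A minimal sketch of that conversion (the helper name scale_bbox_to_original is illustrative, not part of the cookbook):

```python
def scale_bbox_to_original(bbox, input_width, input_height, orig_width, orig_height):
    """Map a bbox [x1, y1, x2, y2] from the smart_resize input space
    back to the original image's pixel space."""
    x_scale = orig_width / input_width
    y_scale = orig_height / input_height
    x1, y1, x2, y2 = bbox
    return [x1 * x_scale, y1 * y_scale, x2 * x_scale, y2 * y_scale]


# Example: a bbox predicted on a 200x100 model input, original image 400x300.
print(scale_bbox_to_original([0, 0, 100, 50], 200, 100, 400, 300))  # [0, 0, 200, 150]
```

The same scale factors apply to point outputs; multiply each x by orig_width / input_width and each y by orig_height / input_height.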

@jwh12333
Author

Thank you very much, wishing you and the entire team a happy New Year!
