
Add pagination support to list_models() API for systematic model discovery #2741

Open
darwich6 opened this issue Jan 8, 2025 · 0 comments

Is your feature request related to a problem? Please describe.
The current list_models() API only supports a limit parameter without true pagination support. This makes it impossible to systematically discover models beyond the initial limit. For example, when fetching most downloaded models:

from huggingface_hub import HfApi

hf = HfApi()  # assumes hf is an HfApi client
models = hf.list_models(
    filter="text-generation",
    sort="downloads",
    direction=-1,
    limit=100,
)

This always returns the same top 100 models unless the download ranking itself changes. There's no way to get models 101-200 other than raising the limit to 200 and fetching records 1-200 again. This is problematic for services that need to:

  • Discover new models systematically
  • Process models in smaller batches
  • Index or monitor the full model ecosystem
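As a partial workaround for the batching case only, a lazy iterable can be consumed in fixed-size chunks on the client side. The sketch below is a minimal illustration, assuming the result of list_models() is a lazy iterable (as the paginate() snippet under "Additional context" suggests). `fake_list_models` and `in_batches` are hypothetical names used for illustration and are not part of huggingface_hub.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fake_list_models(total: int) -> Iterator[str]:
    """Hypothetical stand-in for the lazy iterable returned by list_models()."""
    for rank in range(1, total + 1):
        yield f"model-{rank}"


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Consume 250 simulated results in batches of 100.
batches = list(in_batches(fake_list_models(250), 100))
```

This only helps within a single run. It does not let a later process resume at model 101 without walking the first 100 again, which is the gap this request is about.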

Describe the solution you'd like

Add proper pagination support to the API by either:

  1. Adding an offset parameter:
     models = hf.list_models(
         filter="text-generation",
         sort="downloads",
         limit=100,
         offset=100,  # Get next 100 models
     )
  2. Or exposing the internal cursor-based pagination that's already used by paginate():
     response = hf.list_models(
         filter="text-generation",
         limit=100,
         cursor="next_page_token",  # From previous response
     )
     next_cursor = response.next_cursor
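To make the cursor option more concrete, here is a rough sketch of how a caller might save a cursor and resume from it later. Everything in it is hypothetical: `PAGES` simulates server state, and `fetch_page` and `iter_from` are illustrative names, not existing huggingface_hub functions or the proposed final API.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical server state: each cursor maps to (items, next_cursor).
PAGES: Dict[Optional[str], Tuple[List[str], Optional[str]]] = {
    None: (["model-1", "model-2"], "page-2"),
    "page-2": (["model-3", "model-4"], "page-3"),
    "page-3": (["model-5"], None),
}


def fetch_page(cursor: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Hypothetical single-page call; returns the items and the next cursor."""
    return PAGES[cursor]


def iter_from(cursor: Optional[str] = None) -> Iterator[str]:
    """Walk pages starting at cursor until no next cursor is reported."""
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return


# First call: take one page and remember where it ended.
first_items, saved_cursor = fetch_page()

# Later call (possibly another process): resume from the saved cursor.
resumed = list(iter_from(saved_cursor))
```

The point of the cursor variant is that `saved_cursor` can be persisted between runs, so each call picks up where the previous one stopped instead of re-reading earlier pages.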

Describe alternatives you've considered

Current workarounds we've tried:

  1. Fetching very large batches (1000+ models) and filtering locally
  2. Using different sort criteria to try to get different models
  3. Using the search parameter with different queries

None of these provide a reliable way to systematically discover all models and ensure we are getting different models with each call.

Additional context
Looking at the source code, the API already uses internal pagination via paginate():

items = paginate(path, params=params, headers=headers)
if limit is not None:
    items = islice(items, limit)

Exposing this functionality would align with common API practices and enable better tooling around the Hub's model ecosystem.
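The paginate() plus islice() pattern above also shows why an offset cannot be emulated cheaply on the client. In the sketch below, `fake_paginate` is a hypothetical stand-in that records each simulated page request, and the page size of 100 is an assumption chosen for illustration, not the Hub's actual page size.

```python
from itertools import islice
from typing import Iterator, List

PAGE_SIZE = 100  # assumed page size, for illustration only
fetched_pages: List[int] = []


def fake_paginate(total: int) -> Iterator[int]:
    """Hypothetical stand-in for paginate(): fetches one page at a time, lazily."""
    for start in range(0, total, PAGE_SIZE):
        fetched_pages.append(start // PAGE_SIZE)  # record the simulated request
        yield from range(start + 1, min(start + PAGE_SIZE, total) + 1)


# Emulate offset=100, limit=100 by skipping the first 100 results client-side.
window = list(islice(fake_paginate(1000), 100, 200))
```

In this simulation the window holds results 101 to 200, but page 0 was still requested just to be discarded. With a server-side offset or cursor, only the second page would need to be fetched.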
