Add Goodfire API Provider Support #1161

Open: wants to merge 60 commits into main

Conversation

@menhguin commented Jan 20, 2025


Overview

This PR introduces support for the Goodfire API, enabling the use of Meta's Llama models through Goodfire's inference service. The implementation provides basic chat completion functionality while maintaining compatibility with the existing evaluation framework.

Over the next few weeks, I expect to add more complex mechanistic interpretability techniques (feature search, inspect, feature steering) as shown in the Goodfire AI documentation. For now, this PR covers basic chat completion and standardisation in line with other model providers (not least because I have to keep merging new commits every day).

Critical Implementation Details

  1. Core Provider Implementation (inspect_ai/src/inspect_ai/model/_providers/goodfire.py); see the sketch after this list:

    • Implements GoodfireAPI class with synchronous API handling
    • Key methods: generate(), _to_goodfire_message(), connection management
    • Constants for defaults: DEFAULT_MAX_TOKENS=4096, DEFAULT_TEMPERATURE=0.7
    • Model mapping for supported variants in MODEL_MAP
  2. Known Limitations:

    • MMLU Few-shot Evaluation Issue:
      • Zero-shot works correctly (~0.57 accuracy)
      • Few-shot fails (~0.1 accuracy) due to strict format following; we note this stems from using Llama 3 instruct models rather than from anything Goodfire-specific
      • Model outputs bare letters instead of "Answer: A" format
      • Affects inspect_evals/src/inspect_evals/mmlu/mmlu_5_shot.py
    • Synchronous API in async framework:
      • Blocks event loop during generation
      • Affects progress bar updates
      • Located in generate() method
  3. API Differences (vs OpenAI/Anthropic):

    • Parameter naming:
      • Uses max_completion_tokens vs max_tokens
      • Different default values
    • Response handling:
      • Dictionary-based vs object-based responses
      • Manual extraction required for content/usage
      • No finish_reason field
    • Message handling:
      • Tool messages converted to user messages
      • Limited role support
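
To illustrate the shape of the provider, here is a minimal sketch. The method and constant names follow the description above; the bodies are simplified illustrations under stated assumptions, not the PR's exact code.

from typing import Any

DEFAULT_MAX_TOKENS = 4096
DEFAULT_TEMPERATURE = 0.7

def _to_goodfire_message(message: Any) -> dict[str, str]:
    """Convert an Inspect chat message to the dict format Goodfire expects."""
    role = message.role
    # Limited role support: tool messages are downgraded to user messages.
    if role == "tool":
        role = "user"
    return {"role": role, "content": message.text}

def generate_sync(client: Any, messages: list[Any], model: str) -> str:
    """Synchronous completion call; blocks the event loop when invoked from
    Inspect's async generate() (the progress bar limitation noted above)."""
    response = client.chat.completions.create(
        messages=[_to_goodfire_message(m) for m in messages],
        model=model,
        max_completion_tokens=DEFAULT_MAX_TOKENS,  # Goodfire's name for max_tokens
        temperature=DEFAULT_TEMPERATURE,
    )
    # Responses are dictionary-based (no finish_reason field), so content
    # and usage must be extracted manually; the exact shape is assumed here.
    return response["choices"][0]["message"]["content"]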

Required Configuration

  1. Environment Setup (see the usage sketch after this list):

    GOODFIRE_API_KEY=<key>
    GOODFIRE_BASE_URL=<optional>
    
    pip install goodfire
    
  2. Model Support:

    • Currently supports:
      • meta-llama/Meta-Llama-3.1-8B-Instruct
      • meta-llama/Llama-3.3-70B-Instruct
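
With the environment configured, the provider is selected like any other model provider. A usage sketch follows; the task name is illustrative (GSM8K is one of the evals tried later in this PR):

from inspect_ai import eval

eval(
    "inspect_evals/gsm8k",  # illustrative task; any Inspect task works
    model="goodfire/meta-llama/Meta-Llama-3.1-8B-Instruct",
)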

Pending Improvements (Prioritized)

  1. Critical:

    • Fix few-shot evaluation format handling
    • Implement proper async operation
    • Add progress tracking solution
  2. Important:

    • Add streaming support when available
    • Implement tool calls support
    • Enhance error handling
  3. Nice to Have:

    • Add feature analysis support
    • Expand model support
    • Add caching strategy

Testing Status

  1. Verified:

    • Basic chat completion
    • Zero-shot evaluations
    • Usage statistics collection
    • Parameter validation
  2. Known Issues:

    • Few-shot format compliance
    • Progress tracking during long runs
    • Type hints causing linter errors

Breaking Changes

None. This should not affect the use of other model providers, and effort has been taken to ensure standardisation. Code changes have been isolated to:

  • src/inspect_ai/model/_providers/goodfire.py for the core implementation
  • src/inspect_ai/model/_providers/providers.py to register Goodfire as a model provider
  • src/inspect_ai/model/_generate_config.py for certain Goodfire-specific generation functions (though do feel free to test this to make sure it doesn't affect any other providers)

So far, the model seems to generate and score similarly to vLLM-hosted Llama 8B-Instruct on GPQA, GSM8K and MMLU.

Conclusion

Once again, I will be improving and building on this initial chat generation implementation in the coming weeks with more advanced mech interp functions. If you come across issues in other evals, do let me know.

- Introduced `GoodfireConfig` dataclass for Goodfire-specific settings in `_generate_config.py`.
- Implemented `GoodfireAPI` class in a new file `_providers/goodfire.py` to handle interactions with the Goodfire API.
- Registered the Goodfire API provider in `_providers/providers.py`, including error handling for dependency imports.
- Updated `GenerateConfig` to include Goodfire configuration options.
…th version verification and improved error handling

- Added support for minimum version requirement for the Goodfire API.
- Introduced supported model literals and updated model name handling.
- Improved API key retrieval logic with environment variable checks.
- Enhanced client initialization to include base URL handling.
- Updated maximum completion tokens to 4096 for better performance.
- Refined message conversion to handle tool messages appropriately.
- Removed unsupported feature analysis configuration.

This commit improves the robustness and usability of the Goodfire API integration.
…odfireAPI generate method for improved error handling and parameter management

- Enhanced the generate method to use a try-except block for better error logging.
- Consolidated API request parameters into a dictionary for cleaner code.
- Added handling for usage statistics in the output if available.
- Improved message conversion process for better clarity and maintainability.

This update increases the robustness of the Goodfire API integration and enhances error reporting.
@menhguin (Author) commented Jan 20, 2025

OH YES almost forgot: You need to pip install goodfire as well, but I'm unsure where to add this.

@jjallaire (Collaborator) left a comment:
Fantastic! So happy to see this and excited to see it built out further. Left some feedback in the review. Some additional comments:

  1. Saw there was a note on streaming support -- we don't currently use streaming in our model interfaces, so I don't think this will be required (but perhaps there is a scenario I'm not thinking of?)

  2. Saw your note on caching -- would the built-in caching work? (We cache ModelOutput instances based on a key that hopefully reflects the full range of possible inputs.)

In terms of adding mech interp stuff, we've had initial discussions with a few others in the field on how to do this. At some point I think we'd like to define some common data structures that can go in ModelOutput but we aren't there yet. In the meantime, you should add any mech interp data to the metadata field of ModelOutput (using whatever schema you want). Later we can try to bring some of this back into something that is shared by multiple mech interp back ends.
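
For example (a hypothetical sketch; the metadata schema and values are invented for illustration):

from inspect_ai.model import ModelOutput

output = ModelOutput.from_content(
    model="goodfire/meta-llama/Meta-Llama-3.1-8B-Instruct",
    content="completion text here",
)
# Provider-specific mech interp data goes under metadata, with whatever
# schema you want, until shared data structures are defined.
output.metadata = {
    "goodfire": {
        "features": [{"label": "refusal", "activation": 0.83}],  # invented values
    }
}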

(Inline review comments, since resolved, on src/inspect_ai/model/_generate_config.py, src/inspect_ai/model/_providers/goodfire.py, and src/inspect_ai/model/_providers/providers.py.)
@jjallaire (Collaborator) commented:

> OH YES almost forgot: You need to pip install goodfire as well, but I'm unsure where to add this.

You can add this to the dev config of [project.optional-dependencies] in pyproject.toml
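
For example (a sketch; the other entries stand in for the existing contents of pyproject.toml):

[project.optional-dependencies]
dev = [
    "goodfire",
    # ...existing dev dependencies...
]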

@menhguin (Author) commented:
The proposed changes here seem reasonable. I will attempt to implement all of them by Friday morning-ish UK time.

Some of the more ... awkward design choices were me trying to patch dozens of little mismatches between Inspect and the Goodfire API (different function names, output formats, Goodfire's special functions). I tried to clean it up and standardise with the rest of Inspect, but evidently I missed a few things.

I'll try the metadata approach afterwards. Figuring out which mech interp function to allow and how is gonna be ... tricky. Do you have any reference examples where a model provider supports more than just text generation via Inspect? Even logits/logprob view might be a helpful reference.

@jjallaire (Collaborator) commented:
> The proposed changes here seem reasonable. I will attempt to implement all of them by Friday morning-ish UK time.

Great, thanks! (and feel free to ping me w/ any questions in the meantime)

> I'll try the metadata approach afterwards. Figuring out which mech interp function to allow and how is gonna be ... tricky. Do you have any reference examples where a model provider supports more than just text generation via Inspect? Even logits/logprob view might be a helpful reference.

Yes, several model providers (OpenAI, Grok, TogetherAI, Huggingface, llama-cpp-python, and vLLM) support Logprobs:

from pydantic import BaseModel, Field

class TopLogprob(BaseModel):
    """List of the most likely tokens and their log probability, at this token position."""

    token: str
    """The top-kth token represented as a string."""

    logprob: float
    """The log probability value of the model for the top-kth token."""

    bytes: list[int] | None = Field(default=None)
    """The top-kth token represented as a byte array (a list of integers)."""


class Logprob(BaseModel):
    """Log probability for a token."""

    token: str
    """The predicted token represented as a string."""

    logprob: float
    """The log probability value of the model for the predicted token."""

    bytes: list[int] | None = Field(default=None)
    """The predicted token represented as a byte array (a list of integers)."""

    top_logprobs: list[TopLogprob] | None = Field(default=None)
    """If the `top_logprobs` argument is greater than 0, this will contain an ordered list of the top K most likely tokens and their log probabilities."""


class Logprobs(BaseModel):
    """Log probability information for a completion choice."""

    content: list[Logprob]
    """a (num_generated_tokens,) length list containing the individual log probabilities for each generated token."""

Eventually I'd like to have some standard fields like this for mech interp payloads (so that readers of logs can benefit from some uniformity). Absent working out these schemas, I would put your own data structures in ModelOutput.metadata; then we can ideally learn from them and work towards standardization over time.

…e package dependency and remove GoodfireConfig class from GenerateConfig. Enhance goodfire provider with version verification.
…se runtime-safe string. Add note and TODO for potential issue in Goodfire's repo.
- Remove hardcoded MODEL_MAP and variant validation
- Directly use model name for Variant initialization
- Standardize max_tokens to use default value
- Enhance generate method to include additional model arguments
- Streamline model configuration and parameter handling
- Improve error handling by using specific Goodfire exception types
- Refactor rate limit and context length error detection
- Enhance parameter configuration with more flexible temperature and top_p handling
- Add type casting and improve type hints
- Simplify client initialization and method calls
- Simplify model argument collection and storage
- Update generate method to incorporate model arguments more flexibly
- Remove separate tracking of temperature and top_p
- Ensure all model-specific arguments are passed to generation parameters
- Update type hints for model arguments and parameters
- Improve parameter configuration in generate method
- Simplify base model selection and parameter passing
- Enhance code readability and type consistency
- Add default values for temperature and top_p when not specified in model arguments
- Prioritize model_args over config parameters
- Ensure consistent parameter configuration when generating completions
@menhguin (Author) commented Jan 25, 2025

Newest updates as of this weekend, basically improving robustness, error handling and minimising hardcoded values:

  • Replaced the synchronous API with the async one (I found it deep in the docs). This also enables progress tracking.
  • Refined the logic for passing model args, so that any model args added in the future are passed through as well.
  • Streamlined error handling to surface max tokens, invalid request, rate limit and connection/key errors specifically, since I'm told that lets Inspect do its own handling; all other errors are passed through as-is. For example, invalid model args are propagated with the specific error as expected: I tested an invalid top_p value of 1.1 and it gave the relevant error output.
│ RuntimeError:                                                                             │
│ RequestFailedException('{"detail":[{"type":"less_than_equal","loc":["body","top_p"],"msg… │
│ should be less than or equal to 1","input":1.1,"ctx":{"le":1.0}}]}') 
  • Model names are no longer hardcoded; they are simply passed to the Goodfire API. Hardcoding was needed early on, since it took a while to get model name passing right due to issues with "variant", Literal, prefixing and the actual model names being weird. This is robust to model names being edited or added, is more standardised, and eliminates the need to update model-specific params.
  • General standardisation and logic improvements to iron out hardcoded values, model args precedence, etc., so that all the new functions play nice with one another, especially when something unexpected happens. Though I may have missed something.

So text gen should be settled now? I am moving on to feature implementations this week, finally.
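
A rough sketch of the async flow and error pass-through described above (the AsyncClient usage and the exception import path are assumptions based on this thread, not verified against the goodfire package):

import goodfire

from goodfire.api.exceptions import RateLimitException  # import path assumed

async def generate_once(api_key: str, prompt: str) -> str:
    # The async client avoids blocking Inspect's event loop, which also
    # unblocks progress tracking.
    client = goodfire.AsyncClient(api_key=api_key)
    try:
        response = await client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            # Model names pass straight through; no hardcoded MODEL_MAP.
            model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        )
    except RateLimitException:
        # Surfaced specifically so Inspect can apply its own handling; other
        # errors (e.g. the top_p=1.1 RequestFailedException above) propagate as-is.
        raise
    return response.choices[0].message["content"]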

@jjallaire (Collaborator) commented:
Thanks again for your diligent work here! Noted the changes, and the PR is looking good. I did a scan of the code as-is and did find a couple more things we should tweak (will post those in a new review shortly).

@jjallaire (Collaborator) left a comment:
Some additional comments (all of them quite small).

Could you also add a simple test (and related skip_if function) along these lines: https://github.com/UKGovernmentBEIS/inspect_ai/blob/main/tests/model/providers/test_groq.py

I noticed there were also some ruff errors when running the checks. You can clean this up by running make check locally.
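
A sketch of such a test, modeled on the linked test_groq.py (the skip_if_no_goodfire helper is assumed, following the pattern of skip_if_no_groq):

import pytest

from test_helpers.utils import skip_if_no_goodfire  # assumed helper
from inspect_ai.model import get_model

@pytest.mark.asyncio
@skip_if_no_goodfire
async def test_goodfire_api() -> None:
    model = get_model("goodfire/meta-llama/Meta-Llama-3.1-8B-Instruct")
    response = await model.generate("Say hello.")
    assert len(response.completion) > 0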

(Inline review comments, since resolved, on src/inspect_ai/model/_providers/goodfire.py.)

# Defer importing model api classes until they are actually used
# (this allows the package to load without the optional deps)
# Note that some api providers (e.g. Cloudflare, AzureAI) don't
# strictly require this treatment but we do it anyway for uniformity,

logger = logging.getLogger(__name__)
@jjallaire (Collaborator): We don't use the logger anymore, so we can remove it.

@menhguin (Author): Removed.


from inspect_ai._util.error import pip_dependency_error
from inspect_ai._util.version import verify_required_version

from .._model import ModelAPI
from .._registry import modelapi
from .._registry import modelapi, modelapi_register
from .goodfire import GoodfireAPI
@jjallaire (Collaborator): The import needs to be moved down into the goodfire() function (otherwise all Inspect users will need this package installed). See below for the recommended implementation of this function.

@menhguin (Author): Shifted. The first few lines are now just this (unchanged):

import os

from inspect_ai._util.error import pip_dependency_error
from inspect_ai._util.version import verify_required_version

from .._model import ModelAPI
from .._registry import modelapi

@@ -239,6 +243,21 @@ def mockllm() -> type[ModelAPI]:
return MockLLM


@modelapi(name="goodfire")
@jjallaire (Collaborator): Recommended implementation based on other providers:

@modelapi(name="anthropic")
def anthropic() -> type[ModelAPI]:
    FEATURE = "Goodfire API"
    PACKAGE = "goodfire"
    MIN_VERSION = "0.2.5"

    # verify we have the package
    try:
        import goodfire  # noqa: F401
    except ImportError:
        raise pip_dependency_error(FEATURE, [PACKAGE])

    # verify version
    verify_required_version(FEATURE, PACKAGE, MIN_VERSION)

    # in the clear
    from .goodfire import GoodfireAPI

    return GoodfireAPI

@menhguin (Author): Implemented in the latest version!

@modelapi("goodfire")
def goodfire() -> type[ModelAPI]:
    """Get the Goodfire API provider."""
    FEATURE = "Goodfire API"
    PACKAGE = "goodfire"
    MIN_VERSION = "0.3.4"  # Support for newer Llama models and OpenAI compatibility

    # verify we have the package
    try:
        import goodfire  # noqa: F401
    except ImportError:
        raise pip_dependency_error(FEATURE, [PACKAGE])

    # verify version
    verify_required_version(FEATURE, PACKAGE, MIN_VERSION)

    # in the clear
    from .goodfire import GoodfireAPI
    return GoodfireAPI

- Remove redundant rate limit exception handling in handle_error method
- Simplify import of Goodfire API exceptions
- Maintain existing InvalidRequestException handling
- Remove unnecessary type imports
- Simplify type casting in generate method
- Update type hints for parameters
- Remove redundant type casting for return values
- Remove unused logging import
- Refactor Goodfire provider initialization to improve error handling
- Streamline package import and version verification
- Remove unnecessary logger definition
- Create new test file for Goodfire model provider
- Add skip decorator for Goodfire API key requirement
- Implement basic test for model generation with sample configuration
- Verify response generation for Goodfire model
- Remove unnecessary imports and unused variables
- Simplify error handling and type conversion logic
- Streamline code by removing commented-out and redundant code
- Update providers.py to remove unused import
… test case

- Modify GoodfireAPI to filter out non-API parameters like api_key and base_url
- Update test case to use a specific Llama 3.1 model from SUPPORTED_MODELS
- Simplify test configuration and add tool_choice parameter
- Add RateLimitException to imports for potential future error handling
- Update test_goodfire.py to use GoodfireAPI directly
- Remove skip decorator and unnecessary imports
- Simplify test configuration
- Update providers.py to use deferred import for GoodfireAPI
- Improve code organization and import management
@menhguin (Author) commented:
Hello! Each of the requested changes should be applied now.


@jjallaire (Collaborator) commented:
> Hello! Each of the requested changes should be applied now.

Thanks! I noticed that there are 3 more small issues to resolve (all related to the providers.py file). We should also name that function goodfire() rather than get_goodfire().

- Add version check for goodfire package (minimum 0.3.4)
- Modify provider function to include package verification
- Remove version constraint in pyproject.toml
- Rename get_goodfire() to goodfire() for consistency
@menhguin (Author) commented:
> Hello! Each of the requested changes should be applied now.
>
> Thanks! I noticed that there are 3 more small issues to resolve (all related to the providers.py file). We should also name that function goodfire() rather than get_goodfire().

I actually did implement those, I just forgot to reply mentioning that. Anyway, get_goodfire() should be goodfire() now!
