-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TransformersChat code to figure out correct role start and end tokens #791
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #791 +/- ##
==========================================
+ Coverage 55.37% 62.13% +6.75%
==========================================
Files 55 55
Lines 4074 4088 +14
==========================================
+ Hits 2256 2540 +284
+ Misses 1818 1548 -270 ☔ View full report in Codecov by Sentry. |
@@ -1,5 +1,7 @@ | |||
import os | |||
import re | |||
import uuid | |||
import jinja2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need jinja2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do -- the chat templates are written in jinja2 as a standard
Thanks @ilmarinen, I think this looks good. I'd like to use this as a basis for eventually minimizing
|
Sounds good to me @Harsha-Nori . Do you want to merge this before making more changes? If so we should add a basic unit test first |
I think we can skip merging this and get an initial release out first, and then I'll pull this in as part of my broader proposal of changes for the subsequent release. We'll get some tests added in that phase. |
…emplate API, fix Anthropic API (#820) This PR should significantly reduce the number of user-facing classes we have in Guidance, and reduce subtle bugs introduced by using a mis-specified Chat Template (models currently silently default to the ChatML syntax, which many of the latest models don't adhere to). It should also make it easier for users to add new models to guidance, either via PR or in their own codebases. Before: ```python from guidance.models.transformers import Transformers, TransformersChat class Llama(Transformers): pass # Users have to do this for most chat models in guidance class LlamaChat(TransformersChat, Llama): def get_role_start(self, role_name, **kwargs): if role_name == "system": return self._system_prefex + "<<SYS>>\n" elif role_name == "user": if str(self).endswith("\n<</SYS>>\n\n"): return "" else: return "[INST] " else: return " " def get_role_end(self, role_name=None): if role_name == "system": return "\n<</SYS>>\n\n" elif role_name == "user": return " [/INST]" else: return " " lm = LlamaChat(path_to_llama) ``` After: ```python from guidance.models import Transformers lm = Transformers(path_to_llama) # automagically works ``` If you're using a rare model and the auto import doesn't automatically work... After pt2: ```python # users can copy paste from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/tokenizer_config.json#L12 llama2_template = "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}" lm = Transformers(path_to_llama, chat_template=llama2_template) ``` or, in the worst case for maximal robustness and customizability: ```python from guidance._chat import ChatTemplate, UnsupportedRoleException class Llama2ChatTemplate(ChatTemplate): template_str = llama2_template def get_role_start(self, role_name): if role_name == "system": return "[INST] <<SYS>>\n" elif role_name == "user": return "<s>[INST]" elif role_name == "assistant": return " " else: raise UnsupportedRoleException(role_name, self) def get_role_end(self, role_name=None): if role_name == "system": return "\n<</SYS>" elif role_name == "user": return " [/INST]" elif role_name == "assistant": return "</s>" else: raise UnsupportedRoleException(role_name, self) lm = Transformers(path_to_llama, chat_template=Llama2ChatTemplate) ``` The first big change is the removal of the `Chat` and `Instruct` mixins, and an introduction of a new `guidance._chat.ChatTemplate` class, which handles the same responsibilities as those mixins used to. Users can construct a subclass of `ChatTemplate` and pass it to models with a new `chat_template` argument (defaulted to None). The way this works for local models is to leverage the `chat_template` property in `huggingface transformers` and `llamacpp`'s GGUF files. When a user tries to load a model, guidance now follows the following order of operations: 1. See if the user passed in a `ChatTemplate` -- if so, use that directly. 2. If the user passed string "chat_template", set that as a "template_str". If the user did not pass in anything, set the "template_str" based on metadata from the huggingface.AutoTokenizer or gguf metadata fields. 3. Check the `template_str` against a local cache in guidance which maintains template converters for the most popular models on huggingface/in llama.cpp. We index this cache based on the actual chat_template string, so any model that uses one of these chat templates -- even if it isn't explicitly listed in the documentation -- will automatically load the right guidance class. 4. If we don't have anything in the cache, try to automatically convert the jinja2 template into the new guidance._chat.ChatTemplate syntax. Warn the user if we attempt this. [NOTE: This is not yet implemented and may come in a future PR.] 5. Default to the `ChatML` syntax, with a warning to the user. Currently this PR updates the following user facing `guidance.Models` classes: - Transformers (removing TransformersChat and the Llama/Mistral subclasses) - LlamaCpp (removing LlamaCppChat and Mistral subclasses) - Anthropic For now, `Anthropic` should be representative of how grammarless classes will work. I wanted to start with OpenAI, but many other guidance.models classes inherit from OpenAI, so I'm saving that for later. Also while I was at it I upgraded the `Anthropic` class to use their latest SDK, so `guidance.models.Anthropic` should now work with the latest Claude3 models. # TODO A decent amount left to do here. In no particular order... 1. Unrelated to this change, but guidance cannot properly handle `llama3` or `phi-3`'s true tokenizers/chat templates. We need to fix that independently. 2. Add better warnings to users when we fall back to using the ChatML template. 3. Extend support through the rest of the guidance.models classes (mostly just remote ones left, starting with OpenAI). 4. Write out more templates for popular models in the ChatTemplateCache, and also add an alias system so that we can look up models in the cache by common names (e.g. "llama3"). 5. Add a deprecation warning to people trying to use very old models on `Anthropic`. 6. Much more testing and documentation. We should, for example, add documentation on how to import/initialize a new ChatTemplate and use it for your own models. 7. Write the auto-converter from huggingface `jinja2` to guidance ChatTemplate. A battery of unit tests here that compare against the original `transformers.apply_chat_template` method would make this more robust. Can be in a future PR as this is complex logic. A start to this was attempted in #791 by @ilmarinen, and we could eventually pull this in and expand its coverage. 8. Probably get rid of the folders in `guidance.models.llama_cpp` and `guidance.models.transformers` because we don't need to maintain a bunch of subclasses for them anymore. Would appreciate any and all feedback, particularly on the logical flow and new user facing (simpler) API. @marcotcr @paulbkoch @slundberg @riedgar-ms @hudson-ai --------- Co-authored-by: Richard Edgar (Microsoft) <[email protected]> Co-authored-by: Paul Koch <[email protected]>
No description provided.