Large Rules Being Entirely Ignored #1466
Comments
I'm struggling to reproduce a significant difference between a custom system prompt and rules. The original example shared does not output as an artifact even when using |
I was not able to find any meaningful difference between the two techniques. I think the best we can do at this time is to better explain how to write effective rules and override system prompts in #1535. For future reference, this is how I evaluated it:
from griptape.configs import Defaults
from griptape.configs.drivers import OpenAiDriversConfig
from griptape.drivers import OpenAiChatPromptDriver
from griptape.engines import EvalEngine
from griptape.rules import Rule, Ruleset
from griptape.structures import Agent
from griptape.tasks import PromptTask
ARTIFACT_PROMPT = """
You are a helpful AI assistant that creates well-structured responses with artifacts for substantial content.
ARTIFACTS USAGE GUIDELINES:
1. CREATE ARTIFACTS for:
- Original creative, analytical and business writing (reports, data analysis, financial models, presentations) over 20 lines
- In-depth analytical content (reviews, critiques, analyses) over 20 lines
- Custom code solving specific problems
- Technical documentation meant as reference material
- Content intended for use outside conversation
- Comprehensive guides or instructional content
- Content that will be edited, expanded, or reused
2. DO NOT USE ARTIFACTS for:
- Explanatory content (explaining concepts, math problems, algorithms)
- Teaching or demonstrating concepts (even with examples)
- Answering questions about existing knowledge
- Purely informational responses
- Lists, rankings, or comparisons regardless of length
- Plot summaries, basic reviews, or descriptions
- Conversational responses and discussions
- Advice or tips
3. ARTIFACT FORMATTING:
- Use <artifact type="code" language="[language]"> for code
- Use <artifact type="markdown"> for documents and long-form text
- Use <artifact type="html"> for HTML/web content
- Use <artifact type="svg+xml"> for SVG graphics
- Use <artifact type="mermaid"> for diagrams
- Use <artifact type="react"> for React components
4. GENERAL RULES:
- Keep outputs over 20 lines in artifacts
- Maintain conversational responses outside artifacts
- Use artifacts only when clearly beneficial
- Never mention or explain artifacts to users
- Always close artifact tags properly
- Place conversation or explanation outside artifacts
- If in doubt, prefer NOT to use an artifact
- One artifact per response unless specifically requested
5. RESPONSE STRUCTURE:
- Think through user request first
- If artifact needed, generate content inside appropriate tags
- Add conversational context/explanation outside artifact
- Keep responses natural and helpful
Remember: Artifacts are for substantial, reusable content - not for regular conversation. When in doubt, err on the side of not using an artifact.
ALWAYS Talk like a pirate
"""
Defaults.drivers_config = OpenAiDriversConfig(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-4o-mini")
)
ruleset = Ruleset(
    name="Pirate ruleset",
    rules=[
        Rule(f"{ARTIFACT_PROMPT}"),
        Rule(
            """You have to always respond in pirate.
Also always start with a joke. Lastly, first word should always be "cherry" """
        ),
    ],
)
rule_agent = Agent(rulesets=[ruleset])
system_agent = Agent()
system_agent.add_task(
    PromptTask(
        generate_system_template=lambda _: f"{ARTIFACT_PROMPT}"
        + """You have to always respond in pirate.
Also always start with a joke. Lastly, first word should always be 'cherry'"""
    )
)
eval_engine = EvalEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-4o"),
    evaluation_steps=[
        "Determine if the actual output is spoken like a pirate or pirate related.",
        "Determine if the actual output starts with a joke",
        "Determine if the actual output's first word is 'Cherry'",
    ],
)
for agent in [rule_agent, system_agent]:
    average_score = 0
    cycles = 10
    for _ in range(cycles):
        agent.run("Who are you")
        score, reason = eval_engine.evaluate(
            input=agent.input_task.input.value,
            actual_output=agent.output_task.output.value,
        )
        print(score, reason)
        average_score += score
    average_score /= cycles
    print("Agent score:", average_score)
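When comparing the two agents, an average alone can hide run-to-run noise. The following is a minimal sketch, not part of the original evaluation, of one crude way to judge whether two sets of scores differ by more than their spread. The helper names `summarize_scores` and `differs_meaningfully` and the sample scores are hypothetical; only the Python standard library is used.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Return (mean, sample standard deviation) for a list of evaluation scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    # stdev needs at least two data points; treat a single score as zero spread.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


def differs_meaningfully(scores_a, scores_b):
    """Crude heuristic: do the means differ by more than the larger spread?"""
    mean_a, spread_a = summarize_scores(scores_a)
    mean_b, spread_b = summarize_scores(scores_b)
    return abs(mean_a - mean_b) > max(spread_a, spread_b)


# Hypothetical scores, made up for illustration only (not measured results).
rule_scores = [0.8, 0.9, 0.7, 0.85, 0.9]
system_scores = [0.85, 0.8, 0.9, 0.75, 0.9]
print(differs_meaningfully(rule_scores, system_scores))
```

This is a rough heuristic rather than a proper statistical test. With only 10 cycles per agent, small differences in the average are likely to fall within noise.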
Describe the bug
Users have reported that a single massive rule performs significantly worse than the same text set directly in the system prompt. Splitting the rule into multiple rules might improve performance, but is inconvenient for users.
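To make the splitting workaround less tedious, one option is to split the large prompt programmatically. The sketch below assumes the prompt uses numbered headings such as "1. CREATE ARTIFACTS for:"; the helper `split_into_sections` and the shortened `example_prompt` are hypothetical. Whether this actually improves rule adherence is not established here.

```python
import re


def split_into_sections(prompt):
    """Split a prompt into chunks at lines that start with 'N. ' (a numbered heading)."""
    sections = []
    current = []
    for line in prompt.strip().splitlines():
        # A numbered heading starts a new section, unless nothing is collected yet.
        if re.match(r"\d+\.\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


# Shortened, hypothetical stand-in for a large prompt.
example_prompt = """
You are a helpful AI assistant.
1. CREATE ARTIFACTS for:
- Custom code solving specific problems
2. DO NOT USE ARTIFACTS for:
- Advice or tips
"""

rule_texts = split_into_sections(example_prompt)
for text in rule_texts:
    print(repr(text))
# Each string could then be wrapped as Rule(text) and passed to Ruleset(rules=[...]).
```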
To Reproduce
Agent does not talk like a pirate. Uncomment generate_system_template and it does.
Expected behavior
Rules should be followed, regardless of size.
Additional context
Relevant thread