Large Rules Being Entirely Ignored #1466

Closed
1 task done
collindutter opened this issue Dec 19, 2024 · 2 comments · Fixed by #1535
Labels: status: can't reproduce (The issue could not be replicated), type:bug (Something isn't working)

@collindutter
Member

collindutter commented Dec 19, 2024

Describe the bug
Users have reported that a single very large rule is followed significantly less reliably than the same text set directly in the system prompt. Splitting the rule into multiple smaller rules might improve adherence, but is inconvenient for users.
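
To make the "split it up" workaround less tedious, a large rule could be split mechanically. The following is only a sketch: `split_rule_sections` is a hypothetical helper, not part of Griptape, and it assumes the rule text uses numbered "N. HEADING:" sections like the one in this issue. Whether the resulting smaller rules are followed more reliably is not verified here.

```python
import re


def split_rule_sections(rule_text: str) -> list[str]:
    """Split a large rule into chunks, one per numbered 'N. HEADING:' section.

    Any text before the first numbered heading (the preamble) becomes its own chunk.
    """
    text = rule_text.strip()
    # Split right before lines that start with e.g. "1. " (optionally indented).
    parts = re.split(r"(?m)^(?=[ \t]*\d+\.[ \t])", text)
    # Drop empty chunks and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


# Shortened stand-in for the rule text from this issue.
example = """
You are a helpful assistant.

1. CREATE ARTIFACTS for:
   - Custom code solving specific problems

2. RESPONSE STRUCTURE:
   - Talk like a pirate
"""

sections = split_rule_sections(example)
for section in sections:
    print(repr(section))
```

Each chunk could then be wrapped as its own rule, for example `Ruleset(name="Artifacts", rules=[Rule(s) for s in sections])`, using the same `Rule` and `Ruleset` classes that appear elsewhere in this thread.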

To Reproduce

from griptape.rules import Rule
from griptape.structures import Agent
from griptape.tasks import PromptTask

artifact_rule = Rule(
    """
You are a helpful AI assistant that creates well-structured responses with artifacts for substantial content. 
        
        ARTIFACTS USAGE GUIDELINES:
        
        1. CREATE ARTIFACTS for:
           - Original creative, analytical and business writing (reports, data analysis, financial models, presentations) over 20 lines
           - In-depth analytical content (reviews, critiques, analyses) over 20 lines
           - Custom code solving specific problems
           - Technical documentation meant as reference material
           - Content intended for use outside conversation
           - Comprehensive guides or instructional content
           - Content that will be edited, expanded, or reused
           
        2. DO NOT USE ARTIFACTS for:
           - Explanatory content (explaining concepts, math problems, algorithms)
           - Teaching or demonstrating concepts (even with examples)
           - Answering questions about existing knowledge
           - Purely informational responses
           - Lists, rankings, or comparisons regardless of length
           - Plot summaries, basic reviews, or descriptions
           - Conversational responses and discussions
           - Advice or tips

        3. ARTIFACT FORMATTING:
           - Use <artifact type="code" language="[language]"> for code
           - Use <artifact type="markdown"> for documents and long-form text
           - Use <artifact type="html"> for HTML/web content
           - Use <artifact type="svg+xml"> for SVG graphics
           - Use <artifact type="mermaid"> for diagrams
           - Use <artifact type="react"> for React components

        4. GENERAL RULES:
           - Keep outputs over 20 lines in artifacts
           - Maintain conversational responses outside artifacts
           - Use artifacts only when clearly beneficial
           - Never mention or explain artifacts to users
           - Always close artifact tags properly
           - Place conversation or explanation outside artifacts
           - If in doubt, prefer NOT to use an artifact
           - One artifact per response unless specifically requested

        5. RESPONSE STRUCTURE:
           - Think through user request first
           - If artifact needed, generate content inside appropriate tags
           - Add conversational context/explanation outside artifact
           - Keep responses natural and helpful
           - Talk like a pirate
        
        Remember: Artifacts are for substantial, reusable content - not for regular conversation. When in doubt, err on the side of not using an artifact.
"""
)


agent = Agent()

agent.add_task(
    PromptTask(
        "Let's create a short story of american psycho for modern times",
        # generate_system_template=lambda _: artifact_rule.value,
        rules=[artifact_rule],
    )
)

agent.run()

The agent does not talk like a pirate. Uncomment generate_system_template and it does.

Expected behavior
Rules should be followed, regardless of size.

Additional context
Relevant thread

collindutter added the type:bug label on Dec 19, 2024
collindutter added this to the 2.0 milestone on Dec 19, 2024
collindutter self-assigned this on Dec 23, 2024
collindutter added the status: can't reproduce label on Dec 23, 2024
@collindutter
Member Author

I'm struggling to reproduce a significant difference between a custom system prompt and rules. The originally shared example does not produce an artifact even when using generate_system_template. Furthermore, if I add "Talk like a pirate" to the custom system prompt, the model does not follow it unless I simplify that prompt.

@collindutter
Member Author

I was not able to find any meaningful difference between the two techniques. I think the best we can do at this time is better explain how to write effective rules/override system prompts in #1535.

For future reference, this is how I evaluated it:

from griptape.configs import Defaults
from griptape.configs.drivers import OpenAiDriversConfig
from griptape.drivers import OpenAiChatPromptDriver
from griptape.engines import EvalEngine
from griptape.rules import Rule, Ruleset
from griptape.structures import Agent
from griptape.tasks import PromptTask

ARTIFACT_PROMPT = """
         You are a helpful AI assistant that creates well-structured responses with artifacts for substantial content.

        ARTIFACTS USAGE GUIDELINES:

        1. CREATE ARTIFACTS for:
           - Original creative, analytical and business writing (reports, data analysis, financial models, presentations) over 20 lines
           - In-depth analytical content (reviews, critiques, analyses) over 20 lines
           - Custom code solving specific problems
           - Technical documentation meant as reference material
           - Content intended for use outside conversation
           - Comprehensive guides or instructional content
           - Content that will be edited, expanded, or reused

        2. DO NOT USE ARTIFACTS for:
           - Explanatory content (explaining concepts, math problems, algorithms)
           - Teaching or demonstrating concepts (even with examples)
           - Answering questions about existing knowledge
           - Purely informational responses
           - Lists, rankings, or comparisons regardless of length
           - Plot summaries, basic reviews, or descriptions
           - Conversational responses and discussions
           - Advice or tips

        3. ARTIFACT FORMATTING:
           - Use <artifact type="code" language="[language]"> for code
           - Use <artifact type="markdown"> for documents and long-form text
           - Use <artifact type="html"> for HTML/web content
           - Use <artifact type="svg+xml"> for SVG graphics
           - Use <artifact type="mermaid"> for diagrams
           - Use <artifact type="react"> for React components

        4. GENERAL RULES:
           - Keep outputs over 20 lines in artifacts
           - Maintain conversational responses outside artifacts
           - Use artifacts only when clearly beneficial
           - Never mention or explain artifacts to users
           - Always close artifact tags properly
           - Place conversation or explanation outside artifacts
           - If in doubt, prefer NOT to use an artifact
           - One artifact per response unless specifically requested

        5. RESPONSE STRUCTURE:
           - Think through user request first
           - If artifact needed, generate content inside appropriate tags
           - Add conversational context/explanation outside artifact
           - Keep responses natural and helpful
        Remember: Artifacts are for substantial, reusable content - not for regular conversation. When in doubt, err on the side of not using an artifact.
        ALWAYS Talk like a pirate
"""

Defaults.drivers_config = OpenAiDriversConfig(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-4o-mini")
)

ruleset = Ruleset(
    name="Pirate ruleset",
    rules=[
        Rule(ARTIFACT_PROMPT),
        Rule(
            """You have to always respond in pirate.
            Also always start with a joke. Lastly, first word should always be "cherry" """
        ),
    ],
)

rule_agent = Agent(rulesets=[ruleset])

system_agent = Agent()
system_agent.add_task(
    PromptTask(
        generate_system_template=lambda _: ARTIFACT_PROMPT
        + """You have to always respond in pirate.
            Also always start with a joke. Lastly, first word should always be 'cherry'"""
    )
)


eval_engine = EvalEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-4o"),
    evaluation_steps=[
        "Determine if the actual output is spoken like a pirate or pirate related.",
        "Determine if the actual output starts with a joke",
        "Determine if the actual output's first word is 'Cherry'",
    ],
)

# Run each agent several times and average the EvalEngine scores,
# since a single run is too noisy to compare the two techniques.
cycles = 10
for agent in [rule_agent, system_agent]:
    total_score = 0.0
    for _ in range(cycles):
        agent.run("Who are you")
        score, reason = eval_engine.evaluate(
            input=agent.input_task.input.value,
            actual_output=agent.output_task.output.value,
        )
        print(score, reason)
        total_score += score
    print("Agent score:", total_score / cycles)
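
One of the evaluation steps above (first word is "cherry") can also be checked deterministically, without an LLM judge. Below is a minimal sketch of that idea. `first_word_is` and `compare` are hypothetical helpers, not Griptape APIs, and the generators are stand-ins that return fixed strings. In a real run they would call agent.run(...) and read agent.output_task.output.value as in the script above.

```python
from statistics import mean, stdev
from typing import Callable


def first_word_is(output: str, expected: str) -> bool:
    """Deterministic check: is the first word of `output` equal to `expected` (case-insensitive)?"""
    words = output.strip().split()
    if not words:
        return False
    # Ignore surrounding punctuation or quotes, e.g. "Cherry!" or '"Cherry,'.
    return words[0].strip("\"'.,!?:;").lower() == expected.lower()


def compare(
    generators: dict[str, Callable[[], str]],
    scorer: Callable[[str], float],
    cycles: int = 10,
) -> dict[str, tuple[float, float]]:
    """Run each generator `cycles` times; return (mean, sample stdev) of the scores per name."""
    results = {}
    for name, generate in generators.items():
        scores = [scorer(generate()) for _ in range(cycles)]
        spread = stdev(scores) if len(scores) > 1 else 0.0
        results[name] = (mean(scores), spread)
    return results


# Stand-in generators (hypothetical fixed outputs, not real model responses).
fake_outputs = {
    "rules": lambda: "Cherry! Why did the pirate cross the sea?",
    "system_prompt": lambda: "Arr, I be yer assistant.",
}

report = compare(fake_outputs, lambda out: float(first_word_is(out, "cherry")), cycles=5)
for name, (avg, spread) in report.items():
    print(f"{name}: mean={avg:.2f} stdev={spread:.2f}")
```

Reporting the spread next to the mean makes it easier to tell whether a gap between the two techniques is larger than the run-to-run noise.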

collindutter modified the milestones: 2.0, 1.2 on Jan 13, 2025