Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output detection for chat completions #288

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

resoluteCoder
Copy link
Collaborator

@resoluteCoder resoluteCoder commented Jan 30, 2025

Added output detection for chat completions
Design choices

  • added output warning so that the user can have the same experience with input. As well as having a way to scan the warnings if there were some detections from the output.
  • created DetectionResult struct to allow for re-use of similar logic such as message_detection.
  • removed the message_index var in filter_chat_messages to allow for the output detection choice_index to match the choices array in the response

Example 1

input request

 "messages": [
    {
      "content": "[email protected]",
      "role": "system",
      "name": "string"
    },
    {
      "content": "[email protected]",
      "role": "system",
      "name": "string"
    }
  ]

input response

{
  "id": "1a4a422085da4849b98b40876103a309",
  "object": "",
  "created": 1738705014,
  "model": "Qwen/Qwen2.5-1.5B-Instruct",
  "choices": [],
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0,
    "completion_tokens": 0
  },
  "detections": {
    "input": [
      {
        "message_index": 1,
        "results": [
          {
            "start": 0,
            "end": 20,
            "text": "[email protected]",
            "detection": "EmailAddress",
            "detection_type": "pii",
            "detector_id": "regex",
            "score": 1.0
          }
        ]
      }
    ]
  },
  "warnings": [
    {
      "type": "UNSUITABLE_INPUT",
      "message": "Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed."
    }
  ]
}

Example 2

output request

"detectors": {
    "input": {
      "regex": {
        "regex": ["email"]
      }
    },
    "output": {
      "regex": {
        "regex": ["ssn"]
      }
    }
  },
  "n": 3,
  "text_gen_parameters": {
    "max_new_tokens": 25
  },
  "messages": [
    {
      "content": "can you give me an example of a social security number?",
      "role": "system",
      "name": "string"
    }
  ]

output response

{
  "id": "a2e06458b58a4a58add30f3b8e6e2beb",
  "object": "",
  "created": 1738705212,
  "model": "Qwen/Qwen2.5-1.5B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure! Here is an example of a Social Security Number in numerical form:\n\n123-45-6789\n\nNote that Social Security Numbers are typically 9 digits long."
      },
      "logprobs": null,
      "finish_reason": "stop"
    },
    {
      "index": 1,
      "message": {
        "role": "assistant",
        "content": "Certainly! A Social Security Number (SSN) typically consists of nine digits. An example would be:\n\n123-45-6789\n\nThis is a typical layout, and the actual digits could vary. Social Security Numbers are used to identify individuals and are unique to each person."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "total_tokens": 122,
    "completion_tokens": 102
  },
  "detections": {
    "output": [
      {
        "choice_index": 0,
        "results": [
          {
            "start": 73,
            "end": 84,
            "text": "123-45-6789",
            "detection": "SocialSecurity",
            "detection_type": "pii",
            "detector_id": "regex",
            "score": 1.0
          }
        ]
      },
      {
        "choice_index": 1,
        "results": [
          {
            "start": 99,
            "end": 110,
            "text": "123-45-6789",
            "detection": "SocialSecurity",
            "detection_type": "pii",
            "detector_id": "regex",
            "score": 1.0
          }
        ]
      }
    ]
  },
  "warnings": [
    {
      "type": "UNSUITABLE_OUTPUT",
      "message": "Unsuitable output detected."
    }
  ]
}

@resoluteCoder resoluteCoder marked this pull request as ready for review February 4, 2025 22:41
@evaline-ju evaline-ju linked an issue Feb 5, 2025 that may be closed by this pull request
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement content detectors on unary chat output
1 participant