LLM Guard is a security toolkit for LLMs.
It provides input and output scanners that offer sanitization, harmful language detection, data leakage prevention, protection against prompt injection, and more, to make LLM interactions safe and secure.
In this tutorial, you'll learn how to use LLM Guard with Vertex AI.
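Before running the examples, install LLM Guard and the Vertex AI SDK, then initialize the SDK with your own project and region (the values below are placeholders):

pip install llm-guard google-cloud-aiplatform

import vertexai

# Replace with your Google Cloud project ID and region
vertexai.init(project="your-project-id", location="us-central1")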
Input scanners allow you to scan and sanitize user prompts.
For example, you can detect code in user prompts:
def ban_code_input():
    prompt = "System.out.println('Hello World')"
    scanner = input_scanners.BanCode()
    prompt, is_valid, risk_score = scanner.scan(prompt)
    print_results(is_valid, prompt, risk_score)
python main.py ban_code_input
Detected code in the text score=1.0 text=System.out.println('Hello World') threshold=0.97
Input/output: System.out.println('Hello World')
Valid? False
Risk score: 1.0
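The snippets in this tutorial call a print_results helper defined in main.py. Its implementation isn't shown here, but a minimal sketch matching the output above could be:

def print_results(is_valid, prompt, risk_score):
    # Minimal sketch; the actual helper in main.py may differ
    print(f"Input/output: {prompt}")
    print(f"Valid? {is_valid}")
    print(f"Risk score: {risk_score}")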
Or you can ban certain topics in user prompts:
def ban_topics_input():
    prompt = "Keir Starmer is the prime minister of the UK"
    scanner = input_scanners.BanTopics(topics=["politics"], threshold=0.5)
    prompt, is_valid, risk_score = scanner.scan(prompt)
    print_results(is_valid, prompt, risk_score)
python main.py ban_topics_input
Topics detected for the prompt scores={'politics': 0.9804285764694214}
Input/output: Keir Starmer is the prime minister of the UK
Valid? False
Risk score: 1.0
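Conversely, a prompt that stays away from the banned topics should pass the scanner. For example (exact scores will vary with the underlying zero-shot classifier):

scanner = input_scanners.BanTopics(topics=["politics"], threshold=0.5)
prompt, is_valid, risk_score = scanner.scan("What's a good recipe for pancakes?")
# Expect is_valid to be True and a low risk score for this non-political prompt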
You can see more examples in main.py.
Output scanners allow you to scan and sanitize model outputs.
For example, you can detect gibberish in model outputs as follows:
def gibberish_output():
    from llm_guard.input_scanners.gibberish import MatchType

    model_output = "abcasd asdkhasd asdasd"
    scanner = output_scanners.Gibberish(match_type=MatchType.FULL)
    model_output, is_valid, risk_score = scanner.scan("", model_output)
    print_results(is_valid, model_output, risk_score)
python main.py gibberish_output
Detected gibberish text score=1.0 threshold=0.7
Input/output: abcasd asdkhasd asdasd
Valid? False
Risk score: 1.0
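MatchType.FULL scores the whole output at once; the enum also has a SENTENCE mode that scans sentence by sentence, which can catch gibberish buried inside otherwise sensible text. A sketch, using the same setup as above:

scanner = output_scanners.Gibberish(match_type=MatchType.SENTENCE)
model_output, is_valid, risk_score = scanner.scan("", "The capital of France is Paris. abcasd asdkhasd.")
# The gibberish sentence should be flagged even though the first sentence is fine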
Or detect if the output is in a certain language:
def language_output():
    from llm_guard.input_scanners.gibberish import MatchType

    model_output = "This is some text in English"
    scanner = output_scanners.Language(valid_languages=["fr"], match_type=MatchType.FULL)
    model_output, is_valid, risk_score = scanner.scan("", model_output)
    print_results(is_valid, model_output, risk_score)
python main.py language_output
Languages are found with high confidence languages=['en']
Input/output: This is some text in English
Valid? False
Risk score: 1.0
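If the output's language is in valid_languages, the scanner accepts it. For example, allowing English instead (same imports as above):

scanner = output_scanners.Language(valid_languages=["en"], match_type=MatchType.FULL)
model_output, is_valid, risk_score = scanner.scan("", "This is some text in English")
# is_valid should now be True, since English is an allowed language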
You can see more examples in main.py.
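Each example is run with python main.py <function_name>. main.py itself isn't shown in full here, but a simple dispatcher along these lines supports that pattern (a sketch, not the actual file):

import sys

from llm_guard import input_scanners, output_scanners

def main():
    # Call the example function named on the command line,
    # e.g. `python main.py ban_code_input` runs ban_code_input()
    globals()[sys.argv[1]]()

if __name__ == "__main__":
    main()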
Let's take another example. Sometimes it's a good idea to anonymize personal data before sending it to the LLM.
Take this prompt, which contains several pieces of personal data:
prompt = ("Make an SQL insert statement to add a new user to our database. Name is John Doe. Email is "
"[email protected] but also possible to contact him with [email protected] email. Phone number is 555-123-4567 "
"and the IP address is 192.168.1.100. And credit card number is 4567-8901-2345-6789. " +
"He works in Test LLC. " +
"Only return the SQL statement and nothing else")
You can anonymize it like this:
from llm_guard import scan_prompt
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

vault = Vault()
input_scanners = [Anonymize(vault)]
sanitized_prompt, results_valid, results_score = scan_prompt(input_scanners, prompt)
You get back this sanitized prompt:
Sanitized prompt: Make an SQL insert statement to add a new user to our database. Name is [REDACTED_PERSON_1] Doe.
Email is [REDACTED_EMAIL_ADDRESS_1] but also possible to contact him with [REDACTED_EMAIL_ADDRESS_2] email.
Phone number is [REDACTED_PHONE_NUMBER_1] and the IP address is [REDACTED_IP_ADDRESS_1].
And credit card number is [REDACTED_CREDIT_CARD_RE_1]. He works in Test LLC. Only return the SQL statement and nothing else
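Besides the sanitized prompt, scan_prompt returns per-scanner validity and risk-score dictionaries, so you can stop before calling the model if any scanner failed:

if any(not valid for valid in results_valid.values()):
    raise ValueError(f"Prompt is not valid, scores: {results_score}")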
Then, you pass the sanitized prompt to Vertex AI:
from vertexai.generative_models import GenerativeModel

model = GenerativeModel('gemini-1.5-flash-001')
response = model.generate_content(sanitized_prompt)
And the response you get back is as follows:
Response text: ```sql
INSERT INTO users (name, email, alternative_email, phone, ip_address, credit_card, company) VALUES ('[REDACTED_PERSON_1] Doe',
'[REDACTED_EMAIL_ADDRESS_1]', '[REDACTED_EMAIL_ADDRESS_2]', '[REDACTED_PHONE_NUMBER_1]', '[REDACTED_IP_ADDRESS_1]',
'[REDACTED_CREDIT_CARD_RE_1]', 'Test LLC');
```
You then feed the response to the Deanonymize output scanner, which uses the same Vault to restore the original values:
from llm_guard import scan_output
from llm_guard.output_scanners import Deanonymize

output_scanners = [Deanonymize(vault)]
sanitized_response_text, results_valid, results_score = scan_output(
    output_scanners, sanitized_prompt, response.text
)
And you get back the deanonymized output:
Sanitized output: ```sql
INSERT INTO users (name, email, alternative_email, phone, ip_address, credit_card, company)
VALUES ('John Doe', '[email protected]', '[email protected]', '555-123-4567', '192.168.1.100', '4567-8901-2345-6789', 'Test LLC');
```
You can see the full example in anonymize_vertexai.py.
You can also chain multiple input and output scanners as follows:
from llm_guard.input_scanners import Anonymize, PromptInjection, TokenLimit, Toxicity
from llm_guard.output_scanners import Deanonymize, NoRefusal, Relevance, Sensitive

input_scanners = [Anonymize(vault), Toxicity(), TokenLimit(), PromptInjection()]
output_scanners = [Deanonymize(vault), NoRefusal(), Relevance(), Sensitive()]
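Putting it together, the full round trip with the chained scanners reuses the same pattern as the anonymization example (a sketch with the names from the earlier snippets):

sanitized_prompt, results_valid, results_score = scan_prompt(input_scanners, prompt)

model = GenerativeModel('gemini-1.5-flash-001')
response = model.generate_content(sanitized_prompt)

sanitized_response_text, results_valid, results_score = scan_output(
    output_scanners, sanitized_prompt, response.text
)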
You can see the full example in multiple_vertexai.py.