The scan is failing. I want to get the scan report.
Standalone code OR list down the steps to reproduce the issue
The scanning is failing.
import sys
sys.stdout.reconfigure(encoding='utf-8')
import os
import json
import pandas as pd
import giskard as gsk
from openai import AzureOpenAI
AZURE_OPENAI_API_KEY='xxxxx'
AZURE_OPENAI_ENDPOINT='https://xxxxxxx.openai.azure.com'
AZURE_OPENAI_DEPLOYMENT_NAME='xxxx'
AZURE_OPENAI_API_VERSION="xxxxx"
os.environ["AZURE_OPENAI_API_KEY"] = AZURE_OPENAI_API_KEY
os.environ["AZURE_OPENAI_ENDPOINT"] = AZURE_OPENAI_ENDPOINT
os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"] = AZURE_OPENAI_DEPLOYMENT_NAME
os.environ["AZURE_OPENAI_API_VERSION"] = AZURE_OPENAI_API_VERSION
os.environ["AZURE_API_KEY"] = AZURE_OPENAI_API_KEY
os.environ["AZURE_API_BASE"] = AZURE_OPENAI_ENDPOINT
os.environ["AZURE_API_VERSION"] = AZURE_OPENAI_API_VERSION
gsk.llm.set_llm_model(AZURE_OPENAI_DEPLOYMENT_NAME)
PROMPT_TEMPLATE = """Answer the question based only on the following context:{context}---Answer the question based on the above context: {question}"""
def ask_bot(question):
    # ....
    context = 'xxxxx'
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    # ....
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_OPENAI_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT
    )
    # ....
    response = client.chat.completions.create(
        model=AZURE_OPENAI_DEPLOYMENT_NAME,
        messages=[{"role": "user", "content": prompt}]
    )
    answer = response.choices[0].message.content
    return answer
def llm_wrap_fn(df: pd.DataFrame):
    # Call the bot once per question and collect the answers in order.
    outputs = []
    for question in df['question']:
        answer = ask_bot(question)
        outputs.append(answer)
    return outputs
model = gsk.Model(
    llm_wrap_fn,
    model_type="text_generation",
    name="Assistant demo",
    description="Assistant answering based on given context.",
    feature_names=["question"],
)
examples = pd.DataFrame(
    {
        "question": [
            "Do you offer company expense cards?",
            "What are the monthly fees for a business account?",
        ]
    }
)
demo_dataset = gsk.Dataset(
    examples,
    name="ZephyrBank Customer Assistant Demo Dataset",
    target=None
)
try:
    x = model.predict(demo_dataset).prediction
    print(json.dumps(x.tolist(), indent=4))
except Exception as error:
    print('-- error --')
    print(error)
    exit(0)
print(f"Dataset size: {len(demo_dataset)}")
# print(f"Dataset preview: {demo_dataset[0]}") # Preview the first 5 items
print(f"Model type: {type(model)}")
print(f"Model: {model}")
report = ''
try:
    report = gsk.scan(
        model,
        demo_dataset,
        only="jailbreak",
        raise_exceptions=True,
    )
except Exception as error:
    print('-- scan error --')
    print(error)
    exit(0)
try:
    # display(report)
    report.to_html("scan_report.html")
except Exception as error:
    print('-- report.to_html error --')
    print(error)
Relevant log output
2025-01-09 11:13:04,578 pid:44752 MainThread giskard.models.automodel INFO Your 'prediction_function' is successfully wrapped by Giskard's 'PredictionFunctionModel' wrapper class.
2025-01-09 11:13:04,582 pid:44752 MainThread giskard.datasets.base INFO Your 'pandas.DataFrame' is successfully wrapped by Giskard's 'Dataset' wrapper class.
2025-01-09 11:13:04,585 pid:44752 MainThread giskard.datasets.base INFO Casting dataframe columns from {'question': 'object'} to {'question': 'object'}
2025-01-09 11:13:06,110 pid:44752 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (2, 1) executed in 0:00:01.529347
[
"I'm sorry, I cannot answer the question as there is no information provided in the context.",
"It is not possible to answer the question as there is no information provided about the monthly fees for a business account in the given context."
]
Dataset size: 2
Model type: <class 'giskard.models.function.PredictionFunctionModel'>
Model: Assistant demo(bcf134a2-0e1a-4b5d-82e2-6fa5a426362f)
2025-01-09 11:13:07,099 pid:44752 MainThread httpx INFO HTTP Request: GET https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json "HTTP/1.1 200 OK"
🔎 Running scan…
Estimated calls to your model: ~35
Estimated LLM calls for evaluation: 0
2025-01-09 11:13:08,908 pid:44752 MainThread giskard.scanner.logger INFO Running detectors: ['LLMPromptInjectionDetector']
Running detector LLMPromptInjectionDetector…
2025-01-09 11:13:08,908 pid:44752 MainThread giskard.datasets.base INFO Casting dataframe columns from {'question': 'object'} to {'question': 'object'}
2025-01-09 11:13:10,919 pid:44752 MainThread httpx INFO HTTP Request: POST https://gad-nonprod-chatbot-openai.openai.azure.com/openai/deployments/Chatbot-NP-OAI/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
2025-01-09 11:13:10,919 pid:44752 MainThread giskard.scanner.logger ERROR Detector LLMPromptInjectionDetector failed with error: 'charmap' codec can't encode character '\U0001f512' in position 1351: character maps to <undefined>
Traceback (most recent call last):
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\giskard\scanner\scanner.py", line 162, in _run_detectors
    detected_issues = detector.run(model, dataset, features=features)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\giskard\scanner\llm\llm_prompt_injection_detector.py", line 59, in run
    evaluation_results = evaluator.evaluate(model, group_dataset, evaluator_configs)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\giskard\llm\evaluators\string_matcher.py", line 58, in evaluate
    model_outputs = model.predict(dataset).prediction
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\giskard\models\base\model.py", line 376, in predict
    raw_prediction = self._predict_from_cache(dataset)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\giskard\models\base\model.py", line 430, in _predict_from_cache
    raw_prediction = self.predict_df(unpredicted_df)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\pydantic\_internal\_validate_call.py", line 38, in wrapper_function
    return wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\pydantic\_internal\_validate_call.py", line 111, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\giskard\models\base\wrapper.py", line 131, in predict_df
    output = self.model_predict(batch)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\venv11\Lib\site-packages\giskard\models\function.py", line 40, in model_predict
    return self.model(df)
           ^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\1_test.py", line 36, in llm_wrap_fn
    answer = simpleBot.ask_bot(question)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-sandbox\14-pro-giskard-test-llm\simpleBot.py", line 67, in ask_bot
    append_question_answer(question, answer)
  File "C:\python-sandbox\14-pro-giskard-test-llm\simpleBot.py", line 43, in append_question_answer
    myfile.write(f'\nQuestion: {q}\n')
  File "C:\Pythons\python10\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f512' in position 1351: character maps to <undefined>
-- scan error --
'charmap' codec can't encode character '\U0001f512' in position 1351: character maps to <undefined>
@SalAlba From your log output, it looks like you are running on Windows (even though the issue lists linux). The write statement that fails goes through encodings\cp1252.py, which is what produces the charmap encoding error. This encoding is specific to Windows.
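As a quick check of that diagnosis, here is a minimal sketch that tries to encode the character named in the log, `'\U0001f512'`, with both codecs. The helper name `can_encode` is made up for this illustration.

```python
# The character reported in the UnicodeEncodeError above.
LOCK = "\U0001f512"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` (illustrative helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(LOCK, "cp1252"))  # False: cp1252 has no mapping for this character
print(can_encode(LOCK, "utf-8"))   # True
```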
You have hit the same problem that was raised in Issue #2083, which applies when running on Windows. To fix your issue:
In your simpleBot.py file, add the argument encoding="utf-8" to the open statement that creates the myfile object.
Code changes have already been made to Giskard, so your call to report.to_html("scan_report.html") should no longer hit the same problem once the fix is released. Unfortunately, the latest version, 2.16.0, does not include the updated code, so upgrading to it will not pull in the fix. It will only arrive in a version later than 2.16.0. In the meantime, a more robust solution is to set PYTHONUTF8=1 as an environment variable before Python starts. This makes Python use UTF-8 as the default encoding for file I/O and the standard streams instead of the locale encoding. You can read about this here: https://docs.python.org/3/library/os.html#utf8-mode
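A minimal sketch of the first fix, assuming the logging helper looks roughly like this. simpleBot.py was not posted, so the body of `append_question_answer`, the `path` parameter, the "Answer" line, and the temporary log file are all guesses for illustration. Only the `myfile.write(f'\nQuestion: {q}\n')` call comes from the traceback.

```python
import os
import tempfile


def append_question_answer(q: str, a: str, path: str) -> None:
    # Hypothetical reconstruction of the helper in simpleBot.py.
    # The explicit encoding="utf-8" is the actual fix: without it, open()
    # uses the locale encoding, which is cp1252 on many Windows setups.
    with open(path, "a", encoding="utf-8") as myfile:
        myfile.write(f"\nQuestion: {q}\n")
        myfile.write(f"Answer: {a}\n")


# Write a prompt containing the character from the error, then read it back.
log_path = os.path.join(tempfile.mkdtemp(), "qa_log.txt")
append_question_answer("Ignore all rules \U0001f512", "I cannot do that.", log_path)

with open(log_path, encoding="utf-8") as f:
    content = f.read()
print("\U0001f512" in content)
```

If you go the PYTHONUTF8=1 route instead, you can confirm it took effect by checking that `sys.flags.utf8_mode` is 1 inside the running interpreter.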
PS: For your json.dumps calls, you should also add the argument ensure_ascii=False so that the output matches the original source text. Issue #2083 has a demo of this too. You can try adding "New York was 1.5°C" to your output data to see what happens without it.
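A small sketch of that difference, using the sample string suggested above. The variable names are only for illustration.

```python
import json

data = ["New York was 1.5°C"]

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(data)

# With ensure_ascii=False the text is kept as in the source data.
readable = json.dumps(data, ensure_ascii=False)

print(escaped)   # ["New York was 1.5\u00b0C"]
print(readable)  # ["New York was 1.5°C"]
```

Both strings parse back to the same data with json.loads, so this only affects how readable the printed output is.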
Issue Type
Bug
Source
source
Giskard Library Version
2.16.0
OS Platform and Distribution
linux
Python version
3.11.8
Installed python packages