Testing and Validation

Assessment Measures and Evaluation Technique

The following testing procedure verifies that the Agent correctly identifies and understands user intents: accessing customer data (e.g., account information), fulfilling business workflows through pre-defined intents (e.g., completing a loan application), and answering general queries (see Sample Prompts under README). Response accuracy is determined by evaluating the relevance, coherence, and human-like quality of the answers generated by the Bedrock-hosted Anthropic Claude LLM. The source links provided with each response, whether from Kendra data sources (e.g., the Web Crawler configured for 'octankfinancial.com') or the Bedrock LLM's training dataset, should also be confirmed as credible.
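As a first pass at confirming source credibility, the links returned with a response can be checked against an allowlist of trusted domains. The sketch below is a minimal illustration, not part of the Agent: `TRUSTED_DOMAINS`, `is_trusted_source`, and `untrusted_sources` are hypothetical names, and the allowlist contains only the one domain this guide mentions. It covers Kendra-style links only; answers drawn from the LLM's training data still need manual review.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only domain named in this guide is the Kendra
# Web Crawler target. Extend it with the sources you consider authoritative.
TRUSTED_DOMAINS = {"octankfinancial.com"}


def is_trusted_source(url, trusted_domains=TRUSTED_DOMAINS):
    """Return True if the URL is http(s) and its host is a trusted domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted_domains)


def untrusted_sources(source_urls):
    """Return the source URLs that fail the allowlist check, in their original order."""
    return [u for u in source_urls if not is_trusted_source(u)]
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts that merely contain the trusted domain name.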

Provide Personalized Responses: Verify the Agent successfully accesses and utilizes relevant customer information in DynamoDB to tailor user-specific responses.

❗ The use of PIN authentication within the Agent is for demonstration purposes only and should not be used in any production implementation.

Curate Opinionated Answers: Validate that the Agent answers opinionated questions with opinionated responses sourced from authoritative customer documents and webpages indexed by Kendra.

Deliver Contextual Generation: Determine the Agent's ability to provide contextually relevant responses based on previous prompt history.

Access General Knowledge: Confirm the Agent's access to general knowledge for non-customer-specific, non-opinionated queries that require accurate and coherent responses based on Bedrock LLM training data.

Execute Pre-Defined Intents: Ensure the Agent correctly interprets and conversationally fulfills user prompts that are intended to be routed to pre-defined intents, such as completing a loan application as part of a business workflow.

[Figure: Resultant Loan Application Document completed through conversational flow]

Multi-channel support functionality can be tested in conjunction with the above assessment measures across Web, SMS, and Voice channels.
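The assessment measures above can be organized into a small, repeatable smoke test. The sketch below assumes a hypothetical `send_prompt(session_id, prompt)` callable that forwards a prompt to the Agent over whichever channel is under test and returns the reply text; the `Check` class, the prompts, and the expected keywords are illustrative placeholders, not the README's Sample Prompts. Keyword matching is only a crude first filter, so relevance, coherence, and tone still need human review.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    measure: str            # assessment measure being exercised
    prompts: List[str]      # prompts sent in order within one session
    expect_any: List[str]   # final reply must contain at least one of these (case-insensitive)


def run_checks(send_prompt: Callable[[str, str], str], checks: List[Check]) -> dict:
    """Send each check's prompts in its own session and record pass/fail per measure."""
    results = {}
    for i, check in enumerate(checks):
        session_id = f"test-session-{i}"
        reply = ""
        for prompt in check.prompts:
            # Only the reply to the last prompt is evaluated; earlier prompts build context.
            reply = send_prompt(session_id, prompt)
        results[check.measure] = any(s.lower() in reply.lower() for s in check.expect_any)
    return results


# Hypothetical prompts and keywords; replace them with the README's Sample Prompts.
EXAMPLE_CHECKS = [
    Check("Provide Personalized Responses", ["What is my account balance?"], ["balance"]),
    Check(
        "Deliver Contextual Generation",
        ["What loan types do you offer?", "Which of those has the lowest rate?"],
        ["rate"],
    ),
    Check("Execute Pre-Defined Intents", ["I want to apply for a loan"], ["application", "loan"]),
]
```

Running the same `EXAMPLE_CHECKS` with a different `send_prompt` implementation per channel is one way to compare Web, SMS, and Voice behavior side by side.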

Security and Privacy

Ensure data security and user privacy throughout the implementation process. Implement appropriate access controls and encryption mechanisms to protect sensitive information. Solutions like the GenerativeAI Financial Services Agent benefit most from data that is not yet available to the underlying LLM, which usually means using your own private data for the biggest jump in capability.

Keep it secret. Keep it safe. You will want this data to stay completely protected, secure, and private during the generative process, and you will want control over how this data is shared and used.
Set some rules of the road. Understand how data is used by a service before making it available to your teams. Create and distribute the rules for what data can be used with what service. Make these clear to your teams so they can move quickly and prototype safely.
Involve Legal, sooner rather than later. Have your Legal teams review the T&Cs and service cards of the services you plan to use before you start running any sensitive data through them. Your Legal partners have never been more important than they are today.
As an example of how we are thinking about this at AWS with Amazon Bedrock: all data is encrypted and does not leave your VPC. Bedrock makes a separate copy of the base foundation model that is accessible only to the customer, and fine-tunes or trains this private copy of the model.

User Acceptance Testing (UAT)

Conduct UAT with real users to evaluate the performance, usability, and satisfaction of the GenerativeAI Financial Services Agent. Gather feedback and make necessary improvements based on user input.

Deployment and Monitoring

Deploy the fully tested Agent on AWS, and implement monitoring and logging to track its performance, identify issues, and optimize the system as needed. AWS Lambda monitoring and troubleshooting features are enabled by default for the Agent's Lambda handler.
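Beyond the default Lambda metrics, one structured log line per invocation makes latency and error trends easier to query in CloudWatch Logs, which receives a function's logging output. The sketch below is an assumption about how this could be added, not the Agent's actual handler code: `log_invocation` and `example_handler` are hypothetical names.

```python
import json
import logging
import time
from functools import wraps

logger = logging.getLogger("agent")
logger.setLevel(logging.INFO)


def log_invocation(handler):
    """Wrap a Lambda-style handler(event, context) and log one JSON line per call."""
    @wraps(handler)
    def wrapper(event, context):
        start = time.perf_counter()
        outcome = "error"  # overwritten only if the handler returns normally
        try:
            response = handler(event, context)
            outcome = "ok"
            return response
        finally:
            # Runs on success and on exception, so failures are logged too.
            logger.info(json.dumps({
                "handler": handler.__name__,
                "outcome": outcome,
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            }))
    return wrapper


@log_invocation
def example_handler(event, context):
    # Hypothetical stand-in for the Agent's Lambda handler.
    return {"statusCode": 200, "body": event.get("prompt", "")}
```

The decorator deliberately logs only the handler name, outcome, and duration; prompt text and customer data are left out so that sensitive information does not end up in the logs.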

Maintenance and Updates

Regularly update the Agent with the latest LLM versions and data to enhance its accuracy and effectiveness. Monitor customer-specific data in DynamoDB and synchronize Kendra's data source indexing as needed.
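Re-synchronizing a Kendra data source can be scripted so that it runs on a schedule or after content changes. The sketch below assumes a boto3 Kendra client is passed in and uses its `list_data_source_sync_jobs` and `start_data_source_sync_job` calls; the function name `sync_data_source_if_idle` and the ID placeholders are hypothetical, and only the first page of sync history is inspected.

```python
def sync_data_source_if_idle(kendra_client, index_id, data_source_id):
    """Start a Kendra data source sync unless one is already running.

    Returns the new job's ExecutionId, or None if a sync was already in progress.
    """
    # Assumption: checking the first page of history is enough to spot a running job.
    history = kendra_client.list_data_source_sync_jobs(
        Id=data_source_id, IndexId=index_id
    ).get("History", [])
    if any(job.get("Status") in ("SYNCING", "SYNCING_INDEXING") for job in history):
        return None
    response = kendra_client.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )
    return response["ExecutionId"]


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   sync_data_source_if_idle(boto3.client("kendra"), "<index-id>", "<data-source-id>")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real index.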

By following this guide, you can successfully implement, test, and validate a reliable GenerativeAI Financial Services Agent, providing users with accurate and personalized financial assistance through natural language conversations.

Resources

Generative AI on AWS
AWS Amplify
Amazon Bedrock
Amazon DynamoDB
Amazon Kendra
Amazon Lex
LangChain Conversational Agent

Clean Up

See Clean Up.
