Update genai notebooks - Cameron J. #94

Open: wants to merge 7 commits into base: main

Conversation

@cjackson202 (Collaborator) commented Dec 5, 2024

Pull Request Template

Description

This pull request includes several updates and improvements within the GenAI directory:

  • Updated all Python packages and existing embedding scripts to ensure error-free execution.
  • Updated the Streamlit demo for the Azure OpenAI RAG application, ensuring error-free execution and implementing a vectorization function for documents in Blob Storage.
  • Created a Jupyter notebook for Azure OpenAI RAG, providing detailed documentation on building a RAG application using Azure APIs for each resource.
  • Developed an ARM template for the automated deployment of Azure resources required for Azure AI.

Assignee

*Assignees: @zbyosufzai*

PR checklist

Please ensure the following:

  • This comment contains a description of changes (with reason).
  • All changes were tested.
  • If you've fixed a bug, mention the issue number/name.
  • Apply appropriate tags (e.g. documentation, bug)

@zbyosufzai (Collaborator) left a comment

please address the comments I've left and let me know if you have any questions!

@@ -0,0 +1,234 @@
# Setting Up Azure Environment for Azure GenAI Cloud Lab

Welcome! This guide will help you set up your Azure environment to complete the activities in the [Azure GenAI](../) directory of the NIH Cloud Lab. We will walk you through the steps required to configure PowerShell, deploy necessary resources using an ARM template, upload local files to Azure Storage Account, and acquire keys and secrets for `.env` variables.

Collaborator

I believe that if the user is using Azure Machine Learning and creates a notebook there then the CLI is already installed there. It might be the same for the VMs created in Azure. If so, then that would be a helpful note to add here.

Collaborator Author

Added a note to Prerequisites advising the user of this circumstance. Users in such an environment are encouraged to skip step 1 and move directly to step 2.
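
A minimal sketch of how a user could check whether the Azure CLI or PowerShell is already present (as it may be on Azure ML compute or an Azure VM) before deciding to skip step 1. The helper names `tool_available` and `describe_environment` are illustrative and not part of this PR.

```python
import shutil


def tool_available(executable: str) -> bool:
    """Return True if `executable` is found on the current PATH."""
    return shutil.which(executable) is not None


def describe_environment() -> dict:
    """Report which command-line tools are already installed."""
    # "az" is the Azure CLI, "pwsh" is PowerShell 7+, "powershell" is Windows PowerShell.
    return {name: tool_available(name) for name in ("az", "pwsh", "powershell")}


print(describe_environment())
```

If `az` is reported as available, the installation step can likely be skipped.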

Collaborator Author

Proposing that we also incorporate steps for deploying Azure resources manually from the Azure portal in this tutorial. That way, we can have a unified landing zone where users configure their working environments for all tutorials in the GenAI notebooks, using either the ARM file or manual deployment. In the prerequisites of each tutorial, we can then link users to this landing zone if they have not yet configured the environment and resources.

## Prerequisites

- An active Azure subscription
- PowerShell installed on your machine (option 1)

Collaborator

is there a particular reason why the user should use powershell?

Collaborator Author

Choosing between Azure CLI and PowerShell comes down to personal preference and the working environment. A brief overview has been added to the Prerequisites section to help users choose.

@@ -0,0 +1,70 @@
import streamlit as st

Collaborator

Potential Blocker: I couldn't seem to get the Streamlit UI to launch in my Azure account. It could be that the NIH environment blocks this. I'll be testing this out some more and will let you know if the issue persists.

Collaborator Author

Streamlit expects to serve the demo application locally on port 8501, which explains why the Streamlit site does not launch when the demo is run from Azure ML. A workaround is to use Ngrok, which provides a secure tunnel to the Streamlit demo app and exposes the application to the internet. This is a secure and efficient way to access the Streamlit app without needing to run it locally on port 8501.

Collaborator Author

An Ngrok script has been created and successfully tested with the Streamlit demo in my Azure ML environment. The demo now works as expected. Documentation on what this script does and how to execute it has been added to the readme, in the section "## Executing the Azure OpenAI Demo w/ Streamlit Frontend".
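
The script itself is not shown in this conversation. As a rough sketch of the workaround described above, and not the PR's actual script, the following assumes the third-party `pyngrok` package (version 5 or later) and an ngrok authtoken already configured on the machine. The function names are hypothetical.

```python
import subprocess
import sys

STREAMLIT_PORT = 8501  # the port Streamlit serves on, as noted in the thread


def streamlit_command(script: str, port: int = STREAMLIT_PORT) -> list:
    """Build the command that serves `script` headlessly on `port`."""
    return [
        sys.executable, "-m", "streamlit", "run", script,
        "--server.port", str(port),
        "--server.headless", "true",
    ]


def launch_with_tunnel(script: str, port: int = STREAMLIT_PORT) -> str:
    """Start Streamlit, open an ngrok tunnel to it, and return the public URL."""
    # Imported lazily so the helper above works without pyngrok installed.
    from pyngrok import ngrok  # assumption: `pip install pyngrok`

    subprocess.Popen(streamlit_command(script, port))
    tunnel = ngrok.connect(str(port))  # HTTP tunnel to localhost:<port>
    return tunnel.public_url
```

Calling `launch_with_tunnel("Demo_Suite.py")` from the `embedding_demos` directory would then return a URL that can be opened in a browser outside the Azure ML environment.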

# load in .env variables
load_dotenv()

# configure azure openai keys

Collaborator

if this isn't required I would advise deleting it so as to not confuse new users.

Collaborator Author

This function is crucial for ensuring that all environment secrets and keys are imported into the Python script, keeping sensitive information secure by separating it from the codebase. The code has been updated with more detailed documentation. However, please note that this particular page of the demo may not execute as expected when launched from an Azure ML notebook or VM with the identified Ngrok workaround. This is because the Streamlit app will be exposed to the internet, with limited access to the local CSV file needed to generate embeddings and chat over. I recommend adding a note that this particular page of the demo should be executed from a local machine. Alternatively, we should consider whether it's best to archive this page from the demo site.
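
To illustrate why the `load_dotenv()` call matters, here is a small sketch of a check a script could run after loading the `.env` file. Only `AZURE_EMBEDDINGS_DEPLOYMENT` appears in this PR; the other variable names and the helper `missing_env_vars` are hypothetical placeholders.

```python
import os

# AZURE_EMBEDDINGS_DEPLOYMENT is taken from this PR; the rest are hypothetical.
REQUIRED_VARS = (
    "AZURE_EMBEDDINGS_DEPLOYMENT",
    "AZURE_OPENAI_ENDPOINT",  # hypothetical name
    "AZURE_OPENAI_KEY",       # hypothetical name
)


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# In the demo, load_dotenv() would run before this check so that values
# from the .env file are already present in os.environ.
missing = missing_env_vars()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Failing early with a clear message is usually easier for new users to act on than an authentication error raised later by the client library.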

@@ -0,0 +1,234 @@
# Setting Up Azure Environment for Azure GenAI Cloud Lab

Collaborator

Could you add an estimate of how much running through this README in one sitting could cost a user?

Collaborator Author

Executing the ARM template in this README does not incur any cost. However, the resources deployed by the ARM template will incur costs over time, based on Azure's pricing for each resource. These are the same resources that were originally used in the Cloud Lab around AOAI, such as AI Search, Blob Storage, and Azure OpenAI service. The ARM template serves as an automated way to deploy these resources to a resource group, rather than having to deploy each resource manually from the Azure portal. A "## Resources and Cost Breakdown" section has been added to the readme with estimated costs from the Azure Pricing Calculator, based on the SKUs found in the ARM template for each resource.

@@ -0,0 +1,234 @@
# Setting Up Azure Environment for Azure GenAI Cloud Lab

Collaborator

Please structure the tutorial to follow the outline laid out in the tutorial checklist

Collaborator Author

Tutorial has been structured to follow the outline laid out in checklist. Skill level identification has not yet been added. Where should skill level identification be placed in the tutorial structure?


- Navigate to the /GenAI directory:
```sh
cd .\notebooks\GenAI
```

Collaborator

please change backslashes to forward slashes

Collaborator Author

Backslashes have been changed to forward slashes.


4. Navigate to the /embeddings directory (location of the Streamlit demo)
```sh
cd .\notebooks\GenAI\embedding_demos
```

Collaborator

we are already in the GenAI directory please change to cd ./embedding_demos and fix slashes

Collaborator Author

Corrections have been made to directory path and slashes.


5. Execute the Streamlit demo
```sh
streamlit run Demo_Suite.py
```

Collaborator

streamlit wasn't able to launch in Jupyter Lab.

Collaborator Author

An Ngrok script has been created and successfully tested with the Streamlit demo in my Azure ML environment. The demo now works as expected. Documentation on what this script does and how to execute it has been added to the readme, in the section "## Executing the Azure OpenAI Demo w/ Streamlit Frontend".

notebooks/GenAI/embedding_demos/readme.md (resolved)

@zbyosufzai (Collaborator) left a comment

the import error in the Jupyter notebook tutorial stopped me from running the rest of the notebook

" \n",
"# For handling Azure credentials \n",
"from azure.core.credentials import AzureKeyCredential \n",
"from azure.identity import DefaultAzureCredential \n",

Collaborator

ran into an error with importing this python package: ImportError: cannot import name 'AccessTokenInfo' from 'azure.core.credentials' (/anaconda/envs/jupyter_env/lib/python3.8/site-packages/azure/core/credentials.py)

Collaborator Author

Were you able to successfully execute the pip command in cell one to install the packages in the kernel? If so, there may be a discrepancy between the Python version you are using and the Python version the code was built on. The notebook uses Python 3.11.9. Please confirm whether you are using a different version.

"from dotenv import load_dotenv \n",
" \n",
"# For utilizing OpenAI functionalities within Azure \n",
"from openai import AzureOpenAI \n",

Collaborator

another import error: ImportError: cannot import name 'Sequence' from 'typing_extensions' (/anaconda/envs/jupyter_env/lib/python3.8/site-packages/typing_extensions.py)

Collaborator Author

Were you able to successfully execute the pip command in cell one to install the packages in the kernel? If so, there may be a discrepancy between the Python version you are using and the Python version the code was built on. The notebook uses Python 3.11.9. Please confirm whether you are using a different version.
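
One way to answer the version question for both import errors is to print the interpreter version alongside the installed versions of the packages involved. This is a sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper names and the default package list are illustrative assumptions.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("azure-core", "azure-identity", "typing_extensions", "openai")):
    """Collect the Python version and the version of each named package."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        report[name] = installed_version(name)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this in the same kernel as the notebook shows whether the environment matches the Python 3.11.9 setup mentioned above, and which package versions the pip cell actually installed.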

@zbyosufzai added the bug, documentation, and dependencies labels Dec 17, 2024

-df['embedding'] = df['text'].apply(lambda x: get_embedding(x))
-df.to_csv('microsoft-earnings_embeddings.csv', index=False)
+df['embedding'] = df['text'].apply(lambda x:get_embedding(x, engine=os.getenv("AZURE_EMBEDDINGS_DEPLOYMENT")))
+df.to_csv('.\\microsoft-earnings_embeddings.csv', index=False)

Collaborator

typo please change .\\ to ../

Collaborator Author

os.path.join has been incorporated to ensure the output file can be saved to /example_scripts across different operating systems, including the Azure ML environment. A loading pattern has also been added so the user is aware that the script is generating embeddings rather than stuck in a continuous loop.
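
The updated script is not shown here. As a sketch of the two ideas in this reply, the following builds the output path with `os.path.join` and prints a simple counter while work is in progress. The helper names `output_path` and `with_progress` are hypothetical.

```python
import os

OUTPUT_NAME = "microsoft-earnings_embeddings.csv"


def output_path(base_dir: str, filename: str = OUTPUT_NAME) -> str:
    """Join directory and filename using the separator of the current OS."""
    return os.path.join(base_dir, filename)


def with_progress(items, label="Generating embeddings"):
    """Yield items while printing a counter so long runs show activity."""
    total = len(items)
    for index, item in enumerate(items, start=1):
        print(f"\r{label}: {index}/{total}", end="", flush=True)
        yield item
    print()


# Hypothetical usage inside the script: resolve the path relative to the
# script's own folder rather than the current working directory.
# script_dir = os.path.dirname(os.path.abspath(__file__))
# df.to_csv(output_path(script_dir), index=False)
```

Building the path this way avoids hard-coding either `.\\` or `../`, so the same line works on Windows and on Linux-based Azure ML compute.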


 #create cosine function
 def cosine_similarity(a, b):
     return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

 # read in the embeddings .csv
 # convert elements in 'embedding' column back to numpy array
-df = pd.read_csv('microsoft-earnings_embeddings.csv')
+df = pd.read_csv('.\\microsoft-earnings_embeddings.csv')

Collaborator

typo please change .\\ to ../

Collaborator

these two example scripts may require a README to explain they are connected

Collaborator Author

os.path.join has been incorporated to ensure the input file "microsoft-earnings_embeddings.csv" can be read into the df across different operating systems, including the Azure ML environment. I agree a README will make it clearer that the two scripts are connected. Proposing that we create a README for the /example_scripts directory, covering all scripts in this directory and identifying what each script does and any connections they may have.
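
To show how the two scripts connect, here is a dependency-free sketch of the round trip: the first script writes each embedding to the CSV as text, and the second must parse that text back into numbers before computing cosine similarity. It assumes embeddings are stored as Python lists; the script under review uses NumPy and pandas, and `parse_embedding` is a hypothetical helper.

```python
import ast
import math


def parse_embedding(cell: str):
    """Convert a CSV cell such as "[0.1, 0.2]" back into a list of floats."""
    return [float(x) for x in ast.literal_eval(cell)]


def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|), the same formula as the NumPy version above."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


stored = "[1.0, 0.0, 0.0]"  # how a list appears after being written to CSV
query = [1.0, 0.0, 0.0]
print(cosine_similarity(parse_embedding(stored), query))
```

A README for /example_scripts could state this dependency explicitly: the search script only works after the embedding script has produced the CSV.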


### 5. Deploying the ARM Template

Deploy the [ARM template](/notebooks/GenAI/azure_infra_setup/arm_resources.json) to create the Azure Storage Account, Azure AI Search, and Azure OpenAI resources.

Collaborator

link didn't work please change to arm_resources.json

Collaborator Author

Adjustment made to arm_resources.json.
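
Before deploying, a user may want to confirm which resources a template will create. The sketch below reads the top-level `resources` array that ARM templates use and lists each resource `type`. The sample template is an illustrative stand-in, not the contents of `arm_resources.json`, and `resource_types` is a hypothetical helper.

```python
import json


def resource_types(template: dict) -> list:
    """Return the `type` of each top-level resource in an ARM template."""
    return [res.get("type", "<unknown>") for res in template.get("resources", [])]


# Illustrative stand-in only; the real file in the PR will contain more detail.
SAMPLE_TEMPLATE = json.loads("""
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {"type": "Microsoft.Storage/storageAccounts"},
    {"type": "Microsoft.Search/searchServices"},
    {"type": "Microsoft.CognitiveServices/accounts"}
  ]
}
""")

print(resource_types(SAMPLE_TEMPLATE))
```

To inspect the actual file, load it with `json.load(open("arm_resources.json"))` and pass the result to `resource_types`.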

Labels: bug, dependencies, documentation
2 participants