diff --git a/examples/credit-risk-end-to-end/01_Credit_Risk_Data_Prep.ipynb b/examples/credit-risk-end-to-end/01_Credit_Risk_Data_Prep.ipynb new file mode 100644 index 00000000000..a345ec8ca46 --- /dev/null +++ b/examples/credit-risk-end-to-end/01_Credit_Risk_Data_Prep.ipynb @@ -0,0 +1,757 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a52c80c4-1ea2-4d1e-b582-fac51081e76d", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "id": "576a8e30-fe4c-4eda-bc56-9edd7fde3385", + "metadata": {}, + "source": [ + "# Credit Risk Data Preparation" + ] + }, + { + "cell_type": "markdown", + "id": "1f3fbd5a-1587-4b4e-9263-a57490657337", + "metadata": {}, + "source": [ + "Predicting credit risk is an important task for financial institutions. If a bank can accurately determine the probability that a borrower will pay back a future loan, then they can make better decisions on loan terms and approvals. Getting credit risk right is critical to offering good financial services, and getting credit risk wrong could mean going out of business.\n", + "\n", + "AI models have played a central role in modern credit risk assessment systems. In this example, we develop a credit risk model to predict whether a future loan will be good or bad, given some context data (presumably supplied from the loan application). We use the modeling process to demonstrate how Feast can be used to facilitate the serving of data for training and inference use-cases.\n", + "\n", + "In this notebook, we prepare the data." + ] + }, + { + "cell_type": "markdown", + "id": "4d05715f-ddb8-42de-8f0c-212dcbad9e0e", + "metadata": {}, + "source": [ + "### Setup" + ] + }, + { + "cell_type": "markdown", + "id": "6fba29f9-db1f-4ceb-b066-5b2df2c95d33", + "metadata": {}, + "source": [ + "*The following code assumes that you have read the example README.md file, and that you have setup an environment where the code can be run. 
Please make sure you have met the prerequisites.*" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "8a897b19-6f82-4631-ae51-8a23182ff267", + "metadata": {}, + "outputs": [], + "source": [ + "# Import Python libraries\n", + "import os\n", + "import warnings\n", + "import datetime as dt\n", + "import pandas as pd\n", + "import numpy as np\n", + "from sklearn.datasets import fetch_openml" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b944ed48-54b3-43fa-8373-ce788d7e71af", + "metadata": {}, + "outputs": [], + "source": [ + "# Suppress warning messages for the example flow (don't run this if you want to see warnings)\n", + "warnings.filterwarnings('ignore')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "70788c73-144f-4ecf-b370-c5669c538d93", + "metadata": {}, + "outputs": [], + "source": [ + "# Seed for reproducibility\n", + "SEED = 142" + ] + }, + { + "cell_type": "markdown", + "id": "cfb4dfd0-f583-4aa0-bd39-3ff9fbb80db0", + "metadata": {}, + "source": [ + "### Pull the Data" + ] + }, + { + "cell_type": "markdown", + "id": "3c206dfc-d551-4002-ae63-ccbb981768fa", + "metadata": {}, + "source": [ + "The data we will use to train the model is from the [OpenML](https://www.openml.org/) dataset [credit-g](https://www.openml.org/search?type=data&sort=runs&status=active&id=31), obtained from a 1994 German study. More details on the data can be found in the `DESCR` attribute and `details` map (see below)." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "31a9e964-bdb3-4ae4-b2b4-64bbe0ab93a3", + "metadata": {}, + "outputs": [], + "source": [ + "data = fetch_openml(name=\"credit-g\", version=1, parser='auto')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "58dbf7c2-f40b-4965-baac-6903a27ef622", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "**Author**: Dr. 
Hans Hofmann \n", + "**Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)) - 1994 \n", + "**Please cite**: [UCI](https://archive.ics.uci.edu/ml/citation_policy.html)\n", + "\n", + "**German Credit dataset** \n", + "This dataset classifies people described by a set of attributes as good or bad credit risks.\n", + "\n", + "This dataset comes with a cost matrix: \n", + "``` \n", + "Good Bad (predicted) \n", + "Good 0 1 (actual) \n", + "Bad 5 0 \n", + "```\n", + "\n", + "It is worse to class a customer as good when they are bad (5), than it is to class a customer as bad when they are good (1). \n", + "\n", + "### Attribute description \n", + "\n", + "1. Status of existing checking account, in Deutsche Mark. \n", + "2. Duration in months \n", + "3. Credit history (credits taken, paid back duly, delays, critical accounts) \n", + "4. Purpose of the credit (car, television,...) \n", + "5. Credit amount \n", + "6. Status of savings account/bonds, in Deutsche Mark. \n", + "7. Present employment, in number of years. \n", + "8. Installment rate in percentage of disposable income \n", + "9. Personal status (married, single,...) and sex \n", + "10. Other debtors / guarantors \n", + "11. Present residence since X years \n", + "12. Property (e.g. real estate) \n", + "13. Age in years \n", + "14. Other installment plans (banks, stores) \n", + "15. Housing (rent, own,...) \n", + "16. Number of existing credits at this bank \n", + "17. Job \n", + "18. Number of people being liable to provide maintenance for \n", + "19. Telephone (yes,no) \n", + "20. 
Foreign worker (yes,no)\n", + "\n", + "Downloaded from openml.org.\n" + ] + } + ], + "source": [ + "print(data.DESCR)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "53de57ec-0fb6-4b51-9c27-696b059a1847", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Original data url: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)\n", + "Paper url: https://dl.acm.org/doi/abs/10.1145/967900.968104\n" + ] + } + ], + "source": [ + "print(\"Original data url: \".ljust(20), data.details[\"original_data_url\"])\n", + "print(\"Paper url: \".ljust(20), data.details[\"paper_url\"])" + ] + }, + { + "cell_type": "markdown", + "id": "6b2c2514-484e-46cb-aedc-89a301266f44", + "metadata": {}, + "source": [ + "### High-Level Data Inspection" + ] + }, + { + "cell_type": "markdown", + "id": "a76af306-caba-403d-a9cb-b5de12573075", + "metadata": {}, + "source": [ + "Let's inspect the data to see high level details like data types and size. We also want to make sure there are no glaring issues (like a large number of null values)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "20fb82c4-ed8d-42f8-b386-c7ebdc9bf786", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 1000 entries, 0 to 999\n", + "Data columns (total 21 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 checking_status 1000 non-null category\n", + " 1 duration 1000 non-null int64 \n", + " 2 credit_history 1000 non-null category\n", + " 3 purpose 1000 non-null category\n", + " 4 credit_amount 1000 non-null int64 \n", + " 5 savings_status 1000 non-null category\n", + " 6 employment 1000 non-null category\n", + " 7 installment_commitment 1000 non-null int64 \n", + " 8 personal_status 1000 non-null category\n", + " 9 other_parties 1000 non-null category\n", + " 10 residence_since 1000 non-null int64 \n", + " 11 property_magnitude 1000 non-null category\n", + " 12 age 1000 non-null int64 \n", + " 13 other_payment_plans 1000 non-null category\n", + " 14 housing 1000 non-null category\n", + " 15 existing_credits 1000 non-null int64 \n", + " 16 job 1000 non-null category\n", + " 17 num_dependents 1000 non-null int64 \n", + " 18 own_telephone 1000 non-null category\n", + " 19 foreign_worker 1000 non-null category\n", + " 20 class 1000 non-null category\n", + "dtypes: category(14), int64(7)\n", + "memory usage: 71.0 KB\n" + ] + } + ], + "source": [ + "df = data.frame\n", + "df.info()" + ] + }, + { + "cell_type": "markdown", + "id": "a384932a-40df-45f6-bfbc-a9cf6c708f1b", + "metadata": {}, + "source": [ + "We see that there are 21 columns, each with 1000 non-null values. The first 20 columns are contextual fields with `Dtype` of `category` or `int64`, while the last field is actually the target variable, `class`, which we wish to predict. \n", + "\n", + "From the description (above), the `class` tells us whether a loan to a customer was \"good\" or \"bad\". 
We anticipate that patterns in the contextual data, as well as their relationship to the class outcomes, can give insight into loan classification. In the following notebooks, we will build a loan classification model that seeks to encode these patterns and relationships in its weights, such that given a new loan application (context data), the model can predict whether the loan (if approved) will be good or bad in the future." + ] + }, + { + "cell_type": "markdown", + "id": "a451c9a3-0390-4d5a-b687-c59f52445eb1", + "metadata": {}, + "source": [ + "### Data Preparation For Demonstrating Feast" + ] + }, + { + "cell_type": "markdown", + "id": "dc4e7653-b118-44c3-ade3-f1b217b112fc", + "metadata": {}, + "source": [ + "At this point, it's important to note that Feast was developed primarily to work with production data. Feast requires datasets to have entities (in our case, IDs) and timestamps, which it uses in joins. Feast can join data on multiple entities (like primary keys in SQL), and it supports both \"created\" timestamps and \"event\" timestamps. However, in this example, we'll keep things simpler.\n", + "\n", + "In a real loan application scenario, the application fields (in a database) would be associated with a timestamp, while the actual loan outcome (label) would be determined much later and recorded separately with a different timestamp.\n", + "\n", + "To demonstrate Feast capabilities, such as point-in-time joins, we will mock IDs and timestamps for this data. For IDs, we will use the original dataframe index values. For the timestamps, we will generate random values between \"Tue Sep 24 12:00:00 2023\" and \"Wed Oct 9 12:00:00 2023\"." 
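The timestamp-mocking idea described above can be sketched in isolation. This is an illustrative sketch, not the notebook's exact code: `n_rows` stands in for `len(df)`, and drawing second offsets from the window start is an assumed equivalent of drawing raw epoch seconds.

```python
import numpy as np
import pandas as pd

# 15-day window ending Oct 9, 2023 12:00 (per the description above)
window_end = pd.Timestamp("2023-10-09 12:00:00")
window_start = window_end - pd.Timedelta(days=15)

np.random.seed(142)  # the notebook's SEED, for reproducibility
n_rows = 5           # stand-in for len(df)

# Draw random second offsets within the window and add them to the start
offsets = np.random.randint(0, 15 * 24 * 3600, n_rows)
timestamps = window_start + pd.to_timedelta(offsets, unit="s")

# Every mocked timestamp falls inside the 15-day window
assert ((timestamps >= window_start) & (timestamps < window_end)).all()
```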
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9d6ec4f6-9410-4858-a440-45dccaa0896b", + "metadata": {}, + "outputs": [], + "source": [ + "# Make index into \"ID\" column\n", + "df = df.reset_index(names=[\"ID\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "055f2cb7-3abf-4d01-be60-e4c7b8ad1988", + "metadata": {}, + "outputs": [], + "source": [ + "# Add mock timestamps\n", + "time_format = \"%a %b %d %H:%M:%S %Y\"\n", + "date = dt.datetime.strptime(\"Wed Oct 9 12:00:00 2023\", time_format)\n", + "end = int(date.timestamp())\n", + "start = int((date - dt.timedelta(days=15)).timestamp()) # 'Tue Sep 24 12:00:00 2023'\n", + "\n", + "def make_tstamp(date):\n", + " dtime = dt.datetime.fromtimestamp(date).ctime()\n", + " return dtime\n", + " \n", + "# (seed set for reproducibility)\n", + "np.random.seed(SEED)\n", + "df[\"application_timestamp\"] = pd.to_datetime([\n", + " make_tstamp(d) for d in np.random.randint(start, end, len(df))\n", + "])" + ] + }, + { + "cell_type": "markdown", + "id": "f7800ea9-de9a-4aab-9d77-c4276e7db5f9", + "metadata": {}, + "source": [ + "Verify that the newly created \"ID\" and \"application_timestamp\" fields were added to the data as expected." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "9516fc5c-7c25-4e60-acba-7400ab6bab42", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
012
ID012
checking_status<00<=X<200no checking
duration64812
credit_historycritical/other existing creditexisting paidcritical/other existing credit
purposeradio/tvradio/tveducation
credit_amount116959512096
savings_statusno known savings<100<100
employment>=71<=X<44<=X<7
installment_commitment422
personal_statusmale singlefemale div/dep/marmale single
other_partiesnonenonenone
residence_since423
property_magnitudereal estatereal estatereal estate
age672249
other_payment_plansnonenonenone
housingownownown
existing_credits211
jobskilledskilledunskilled resident
num_dependents112
own_telephoneyesnonenone
foreign_workeryesyesyes
classgoodbadgood
application_timestamp2023-10-04 17:50:132023-09-28 18:10:132023-10-03 23:06:03
\n", + "
" + ], + "text/plain": [ + " 0 1 \\\n", + "ID 0 1 \n", + "checking_status <0 0<=X<200 \n", + "duration 6 48 \n", + "credit_history critical/other existing credit existing paid \n", + "purpose radio/tv radio/tv \n", + "credit_amount 1169 5951 \n", + "savings_status no known savings <100 \n", + "employment >=7 1<=X<4 \n", + "installment_commitment 4 2 \n", + "personal_status male single female div/dep/mar \n", + "other_parties none none \n", + "residence_since 4 2 \n", + "property_magnitude real estate real estate \n", + "age 67 22 \n", + "other_payment_plans none none \n", + "housing own own \n", + "existing_credits 2 1 \n", + "job skilled skilled \n", + "num_dependents 1 1 \n", + "own_telephone yes none \n", + "foreign_worker yes yes \n", + "class good bad \n", + "application_timestamp 2023-10-04 17:50:13 2023-09-28 18:10:13 \n", + "\n", + " 2 \n", + "ID 2 \n", + "checking_status no checking \n", + "duration 12 \n", + "credit_history critical/other existing credit \n", + "purpose education \n", + "credit_amount 2096 \n", + "savings_status <100 \n", + "employment 4<=X<7 \n", + "installment_commitment 2 \n", + "personal_status male single \n", + "other_parties none \n", + "residence_since 3 \n", + "property_magnitude real estate \n", + "age 49 \n", + "other_payment_plans none \n", + "housing own \n", + "existing_credits 1 \n", + "job unskilled resident \n", + "num_dependents 2 \n", + "own_telephone none \n", + "foreign_worker yes \n", + "class good \n", + "application_timestamp 2023-10-03 23:06:03 " + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check data (first few records, transposed for readability)\n", + "df.head(3).T" + ] + }, + { + "cell_type": "markdown", + "id": "72b2105a-b459-4715-aa53-6fe69fc4a210", + "metadata": {}, + "source": [ + "We'll also generate counterpart IDs and timestamps on the label data. 
In a real-life scenario, the label data would arrive separately, and later, relative to the loan application data. To mimic this, let's create a labels dataset with an \"outcome_timestamp\" column that lags the application timestamp by a variable 30 to 90 days." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "e214478b-ed9b-4354-ba6f-4117813c56c3", + "metadata": {}, + "outputs": [], + "source": [ + "# Add (lagged) label timestamps (30 to 90 days)\n", + "def lag_delta(data, seed):\n", + "    np.random.seed(seed)\n", + "    delta_days = np.random.randint(30, 90, len(data))\n", + "    delta_hours = np.random.randint(0, 24, len(data))\n", + "    delta = np.array([dt.timedelta(days=int(delta_days[i]), hours=int(delta_hours[i])) for i in range(len(data))])\n", + "    return delta\n", + "\n", + "# (.copy() avoids the pandas SettingWithCopyWarning when adding the new column)\n", + "labels = df[[\"ID\", \"class\"]].copy()\n", + "labels[\"outcome_timestamp\"] = pd.to_datetime(df.application_timestamp + lag_delta(df, SEED))" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "356a7225-db20-4c15-87a3-4a0eb3127475", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
IDclassoutcome_timestamp
00good2023-11-24 22:50:13
11bad2023-11-03 12:10:13
22good2023-11-30 22:06:03
\n", + "
" + ], + "text/plain": [ + " ID class outcome_timestamp\n", + "0 0 good 2023-11-24 22:50:13\n", + "1 1 bad 2023-11-03 12:10:13\n", + "2 2 good 2023-11-30 22:06:03" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check labels\n", + "labels.head(3)" + ] + }, + { + "cell_type": "markdown", + "id": "4a29f754-f758-402b-ac42-2dcfcee3b7fc", + "metadata": {}, + "source": [ + "You can verify that the `outcome timestamp` has a difference of 30 to 90 days from the \"application_timestamp\" (above)." + ] + }, + { + "cell_type": "markdown", + "id": "e720ce24-e092-4fcd-be3e-68bb18f4d2a7", + "metadata": {}, + "source": [ + "### Save Data" + ] + }, + { + "cell_type": "markdown", + "id": "5cae0578-8431-46c7-8d64-e52146f47d46", + "metadata": {}, + "source": [ + "Now that we have our data prepared, let's save it to local parquet files in the `data` directory (parquet is one of the file formats supported by Feast).\n", + "\n", + "One more step we will add is splitting the context data column-wise and saving it in two files. This step is contrived--we don't usually split data when we don't need to--but it will allow us to demonstrate later how Feast can easily join datasets (a common need in Data Science projects)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "cebef56c-1f54-4d31-a545-75d708d38579", + "metadata": {}, + "outputs": [], + "source": [ + "# Create the data directory if it doesn't exist\n", + "os.makedirs(\"Feature_Store/data\", exist_ok=True)\n", + "\n", + "# Split columns and save context data\n", + "a_cols = [\n", + "    'ID', 'checking_status', 'duration', 'credit_history', 'purpose',\n", + "    'credit_amount', 'savings_status', 'employment', 'application_timestamp',\n", + "    'installment_commitment', 'personal_status', 'other_parties',\n", + "]\n", + "b_cols = [\n", + "    'ID', 'residence_since', 'property_magnitude', 'age', 'other_payment_plans',\n", + "    'housing', 'existing_credits', 'job', 'num_dependents', 'own_telephone',\n", + "    'foreign_worker', 'application_timestamp'\n", + "]\n", + "\n", + "df[a_cols].to_parquet(\"Feature_Store/data/data_a.parquet\", engine=\"pyarrow\")\n", + "df[b_cols].to_parquet(\"Feature_Store/data/data_b.parquet\", engine=\"pyarrow\")\n", + "\n", + "# Save label data\n", + "labels.to_parquet(\"Feature_Store/data/labels.parquet\", engine=\"pyarrow\")" + ] + }, + { + "cell_type": "markdown", + "id": "d8d5de9f-bd27-4e95-802c-b121743dd1b0", + "metadata": {}, + "source": [ + "We have saved the following files to the `Feature_Store/data` directory: \n", + "- `data_a.parquet` (context data, `a_cols` columns)\n", + "- `data_b.parquet` (context data, `b_cols` columns)\n", + "- `labels.parquet` (label outcomes)" + ] + }, + { + "cell_type": "markdown", + "id": "af6355dc-ff5b-4b3f-b0bd-3c4020ef67e8", + "metadata": {}, + "source": [ + "With the feature data prepared, we are ready to set up and deploy the feature store. \n", + "\n", + "Continue with the [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb) notebook." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/credit-risk-end-to-end/02_Deploying_the_Feature_Store.ipynb b/examples/credit-risk-end-to-end/02_Deploying_the_Feature_Store.ipynb new file mode 100644 index 00000000000..f736cdaed93 --- /dev/null +++ b/examples/credit-risk-end-to-end/02_Deploying_the_Feature_Store.ipynb @@ -0,0 +1,801 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "08d9e060-d455-43e2-b1ec-51e2a53e3169", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "id": "93095241-3886-44a2-83b1-2a9537c21bc8", + "metadata": {}, + "source": [ + "# Deploying the Feature Store" + ] + }, + { + "cell_type": "markdown", + "id": "465783da-18eb-4945-98e7-bb1058a7af1b", + "metadata": {}, + "source": [ + "### Introduction" + ] + }, + { + "cell_type": "markdown", + "id": "11961d1b-72db-48dc-a07d-dcea9ba223b4", + "metadata": {}, + "source": [ + "Feast enables AI/ML teams to serve (and consume) features via feature stores. In this notebook, we will configure the feature stores and feature definitions, and deploy a Feast feature store server. We will also materialize (move) data from the offline store to the online store.\n", + "\n", + "In Feast, offline stores support pulling large amounts of data for model training using tools like Redshift, Snowflake, Bigquery, and Spark. In contrast, the focus of Feast online stores is feature serving in support of model inference, using tools like Redis, Snowflake, PostgreSQL, and SQLite.\n", + "\n", + "In this notebook, we will setup a file-based (Dask) offline store and SQLite online store. The online store will be made available through the Feast server." + ] + }, + { + "cell_type": "markdown", + "id": "dfed8ccf-0d7d-46a1-82f0-5765f8796088", + "metadata": {}, + "source": [ + "This notebook assumes that you have prepared the data by running the notebook [01_Credit_Risk_Data_Prep.ipynb](01_Credit_Risk_Data_Prep.ipynb). " + ] + }, + { + "cell_type": "markdown", + "id": "e66b7a08-5d15-4804-a82a-8bc571777496", + "metadata": {}, + "source": [ + "### Setup" + ] + }, + { + "cell_type": "markdown", + "id": "1c1e87a4-900b-48f3-a400-ce6608046ce3", + "metadata": {}, + "source": [ + "*The following code assumes that you have read the example README.md file, and that you have setup an environment where the code can be run. 
Please make sure you have met the prerequisites.*" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "8bd21689-4a8e-4b0c-937d-0911df9db1d3", + "metadata": {}, + "outputs": [], + "source": [ + "# Imports\n", + "import re\n", + "import sys\n", + "import time\n", + "import signal\n", + "import sqlite3\n", + "import subprocess\n", + "import datetime as dt\n", + "from feast import FeatureStore" + ] + }, + { + "cell_type": "markdown", + "id": "471db4b0-ea93-47a1-9d55-a80e4d2bdc1e", + "metadata": {}, + "source": [ + "### Feast Feature Store Configuration" + ] + }, + { + "cell_type": "markdown", + "id": "0a307490-4121-4bf3-a5c4-77a8885a4f6a", + "metadata": {}, + "source": [ + "For model training, we usually don't need (or want) a constantly running feature server. All we need is the ability to efficiently query and pull all of the training data at training time. In contrast, during model serving we need servers that are always ready to supply feature records in response to application requests. \n", + "\n", + "This training-serving dichotomy is reflected in Feast's \"offline\" and \"online\" stores. Offline stores are configured to work with database technologies typically used for training, while online stores are configured to use storage and streaming technologies that are popular for feature serving.\n", + "\n", + "We need to create a `feature_store.yaml` config file to tell Feast the structure we want in our offline and online feature stores. Below, we write the configuration for a local \"Dask\" offline store and a local SQLite online store. We give the feature store a project name of `loan_applications`, and provider `local`. The registry is where the feature store will keep track of feature definitions and online store updates; we choose a file location in this case.\n", + "\n", + "See the [feature_store.yaml](https://docs.feast.dev/reference/feature-repository/feature-store-yaml) documentation for further details. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b3757221-2037-49eb-867f-b9529fec06e2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Writing Feature_Store/feature_store.yaml\n" + ] + } + ], + "source": [ + "%%writefile Feature_Store/feature_store.yaml\n", + "\n", + "project: loan_applications\n", + "registry: data/registry.db\n", + "provider: local\n", + "offline_store:\n", + " type: dask\n", + "online_store:\n", + " type: sqlite\n", + " path: data/online_store.db\n", + "entity_key_serialization_version: 2" + ] + }, + { + "cell_type": "markdown", + "id": "180038f3-e5ce-4cce-bdf0-118eee7a822d", + "metadata": {}, + "source": [ + "### Feature Definitions" + ] + }, + { + "cell_type": "markdown", + "id": "dd44b206-1f5c-4f55-bbab-41ba2d3f5202", + "metadata": {}, + "source": [ + "We also need to create feature definitions and other feature constructs in a python file, which we name `feature_definitions.py`. For our purposes, we define the following:\n", + "\n", + "- Data Source: connections to data storage or data-producing endpoints\n", + "- Entity: primary key fields which can be used for joining data\n", + "- FeatureView: collections of features from a data source\n", + "\n", + "For more information on these, see the [Concepts](https://docs.feast.dev/getting-started/concepts) section of the Feast documentation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d3e8fd80-0bee-463c-b3fb-bd0d1ee83a9c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Writing Feature_Store/feature_definitions.py\n" + ] + } + ], + "source": [ + "%%writefile Feature_Store/feature_definitions.py\n", + "\n", + "# Imports\n", + "import os\n", + "from pathlib import Path\n", + "from feast import (\n", + " FileSource,\n", + " Entity,\n", + " FeatureView,\n", + " Field,\n", + " FeatureService\n", + ")\n", + "from feast.types import Float32, String\n", + "from feast.data_format import ParquetFormat\n", + "\n", + "CURRENT_DIR = os.path.abspath(os.curdir)\n", + "\n", + "# Data Sources\n", + "# A data source tells Feast where the data lives\n", + "data_a = FileSource(\n", + " file_format=ParquetFormat(),\n", + " path=Path(CURRENT_DIR,\"data/data_a.parquet\").as_uri()\n", + ")\n", + "data_b = FileSource(\n", + " file_format=ParquetFormat(),\n", + " path=Path(CURRENT_DIR,\"data/data_b.parquet\").as_uri()\n", + ")\n", + "\n", + "# Entity\n", + "# An entity tells Feast the column it can use to join tables\n", + "loan_id = Entity(\n", + " name = \"loan_id\",\n", + " join_keys = [\"ID\"]\n", + ")\n", + "\n", + "# Feature views\n", + "# A feature view is how Feast groups features\n", + "features_a = FeatureView(\n", + " name=\"data_a\",\n", + " entities=[loan_id],\n", + " schema=[\n", + " Field(name=\"checking_status\", dtype=String),\n", + " Field(name=\"duration\", dtype=Float32),\n", + " Field(name=\"credit_history\", dtype=String),\n", + " Field(name=\"purpose\", dtype=String),\n", + " Field(name=\"credit_amount\", dtype=Float32),\n", + " Field(name=\"savings_status\", dtype=String),\n", + " Field(name=\"employment\", dtype=String),\n", + " Field(name=\"installment_commitment\", dtype=Float32),\n", + " Field(name=\"personal_status\", dtype=String),\n", + " Field(name=\"other_parties\", dtype=String),\n", + " ],\n", + " source=data_a\n", + 
")\n", + "features_b = FeatureView(\n", + " name=\"data_b\",\n", + " entities=[loan_id],\n", + " schema=[\n", + " Field(name=\"residence_since\", dtype=Float32),\n", + " Field(name=\"property_magnitude\", dtype=String),\n", + " Field(name=\"age\", dtype=Float32),\n", + " Field(name=\"other_payment_plans\", dtype=String),\n", + " Field(name=\"housing\", dtype=String),\n", + " Field(name=\"existing_credits\", dtype=Float32),\n", + " Field(name=\"job\", dtype=String),\n", + " Field(name=\"num_dependents\", dtype=Float32),\n", + " Field(name=\"own_telephone\", dtype=String),\n", + " Field(name=\"foreign_worker\", dtype=String),\n", + " ],\n", + " source=data_b\n", + ")\n", + "\n", + "# Feature Service\n", + "# a feature service in Feast represents a logical group of features\n", + "loan_fs = FeatureService(\n", + " name=\"loan_fs\",\n", + " features=[features_a, features_b]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b47c1b5-849e-43f3-8043-60466aaed69f", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "be9723eb-8fa0-4338-b50c-f9f1ff6bb13a", + "metadata": {}, + "source": [ + "### Applying the Configuration and Definitions" + ] + }, + { + "cell_type": "markdown", + "id": "c796d45f-28c0-4875-bbb1-71e5a15dcb96", + "metadata": {}, + "source": [ + "Now that we have our feature store configuration (`feature_store.yaml`) and feature definitions (`feature_definitions.py`), we are ready to \"apply\" them. The `feast apply` command creates a registry file (`Feature_Store/data/registry.db`) and sets up data connections; in this case, it creates a SQLite database (`Feature_Store/data/online_store.db`)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "394467f3-4ced-492a-9379-105aea9d4a6d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10/27/2024 02:19:03 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "10/27/2024 02:19:03 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "10/27/2024 02:19:03 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "10/27/2024 02:19:03 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "Created entity \u001b[1m\u001b[32mloan_id\u001b[0m\n", + "Created feature view \u001b[1m\u001b[32mdata_a\u001b[0m\n", + "Created feature view \u001b[1m\u001b[32mdata_b\u001b[0m\n", + "Created feature service \u001b[1m\u001b[32mloan_fs\u001b[0m\n", + "\n", + "10/27/2024 02:19:03 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "10/27/2024 02:19:03 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "Created sqlite table \u001b[1m\u001b[32mloan_applications_data_a\u001b[0m\n", + "Created sqlite table \u001b[1m\u001b[32mloan_applications_data_b\u001b[0m\n", + "\n" + ] + } + ], + "source": [ + "# Run 'feast apply' in the Feature_Store directory\n", + "!feast --chdir ./Feature_Store apply" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e32f40eb-a31a-4877-8f40-2d8515302f39", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "total 232\n", + "-rw-r--r-- 1 501 20 33K Oct 27 14:17 data_a.parquet\n", + "-rw-r--r-- 1 501 20 27K Oct 27 14:17 data_b.parquet\n", + "-rw-r--r-- 1 501 20 17K Oct 27 14:17 labels.parquet\n", + "-rw-r--r-- 1 501 20 28K Oct 27 14:19 online_store.db\n", + "-rw-r--r-- 1 501 20 2.8K Oct 27 14:19 registry.db\n" + ] + } + ], + "source": [ + "# List the Feature_Store/data/ directory to see newly created files\n", + "!ls -nlh Feature_Store/data/" + ] + }, + { + "cell_type": "markdown", + "id": "31014885-ce6a-4007-8bdb-d74d3b44781b", 
+ "metadata": {}, + "source": [ + "Note that while `feast apply` set up the `sqlite` online database, `online_store.db`, no data has been added to the online database as of yet. We can verify this by connecting with the `sqlite3` library." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "107ca856-af06-40c4-8339-70daf59cdf37", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Online Store Tables: [('loan_applications_data_a',), ('loan_applications_data_b',)]\n", + "loan_applications_data_a data: []\n", + "loan_applications_data_b data: []\n" + ] + } + ], + "source": [ + "# Connect to sqlite database\n", + "conn = sqlite3.connect(\"Feature_Store/data/online_store.db\")\n", + "cursor = conn.cursor()\n", + "# Query table data (3 tables)\n", + "print(\n", + " \"Online Store Tables: \",\n", + " cursor.execute(\"SELECT name FROM sqlite_master WHERE type='table';\").fetchall()\n", + ")\n", + "print(\n", + " \"loan_applications_data_a data: \",\n", + " cursor.execute(\"SELECT * FROM loan_applications_data_a\").fetchall()\n", + ")\n", + "print(\n", + " \"loan_applications_data_b data: \",\n", + " cursor.execute(\"SELECT * FROM loan_applications_data_b\").fetchall()\n", + ")\n", + "conn.close()" + ] + }, + { + "cell_type": "markdown", + "id": "03b927ee-7913-4a8a-b17b-9bee361d8d94", + "metadata": {}, + "source": [ + "Since we have used `feast apply` to create the registry, we can now use the Feast Python SDK to interact with our new feature store. To see other possible commands see the [Feast Python SDK documentation](https://rtd.feast.dev/en/master/)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "c764a60a-b911-41a8-ba8f-7ef0a0bc7257", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "RepoConfig(project='loan_applications', provider='local', registry_config='data/registry.db', online_config={'type': 'sqlite', 'path': 'data/online_store.db'}, offline_config={'type': 'dask'}, batch_engine_config='local', feature_server=None, flags=None, repo_path=PosixPath('Feature_Store'), entity_key_serialization_version=2, coerce_tz_aware=True)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Get feature store config\n", + "store = FeatureStore(repo_path=\"./Feature_Store\")\n", + "store.config" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fc572976-6ce9-44f6-8b67-28ee6157e29c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Feature view: data_a | Features: [checking_status-String, duration-Float32, credit_history-String, purpose-String, credit_amount-Float32, savings_status-String, employment-String, installment_commitment-Float32, personal_status-String, other_parties-String]\n", + "Feature view: data_b | Features: [residence_since-Float32, property_magnitude-String, age-Float32, other_payment_plans-String, housing-String, existing_credits-Float32, job-String, num_dependents-Float32, own_telephone-String, foreign_worker-String]\n" + ] + } + ], + "source": [ + "# List feature views\n", + "feature_views = store.list_batch_feature_views()\n", + "for fv in feature_views:\n", + " print(f\"Feature view: {fv.name} | Features: {fv.features}\")" + ] + }, + { + "cell_type": "markdown", + "id": "027edcfe-58d7-4dcb-92e2-5a5514c0f1f0", + "metadata": {}, + "source": [ + "### Deploying the Feature Store Servers" + ] + }, + { + "cell_type": "markdown", + "id": "c9aab68d-395f-421e-ba11-ad8c4acc9d6f", + "metadata": {}, + "source": [ + "If you wish to share a 
feature store with your team, Feast provides feature servers. To spin up an offline feature server process, we can use the `feast serve_offline` command, while to spin up a Feast online feature server, we use the `feast serve` command.\n", + "\n", + "Let's spin up an offline and an online server that we can use in the subsequent notebooks to get features during model training and model serving. We will run both servers as background processes that we can communicate with in the other notebooks.\n", + "\n", + "First, we write a helper function to extract the first few printed log lines (so we can print them in the notebook cell output)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "568f81b8-df34-4b06-8a3f-1a6bdc2e6cff", + "metadata": {}, + "outputs": [], + "source": [ + "# TimeoutError class\n", + "class TimeoutError(Exception):\n", + " pass\n", + "\n", + "# TimeoutError raise function\n", + "def timeout():\n", + " raise TimeoutError(\"timeout\")\n", + "\n", + "# Get first few log lines function\n", + "def print_first_proc_lines(proc, wait):\n", + " '''Given a process, `proc`, read and print output lines until they stop \n", + " coming (waiting up to `wait` seconds for new lines to appear)'''\n", + " lines = \"\"\n", + " while True:\n", + " signal.signal(signal.SIGALRM, timeout)\n", + " signal.alarm(wait)\n", + " try:\n", + " lines += proc.stderr.readline()\n", + " except:\n", + " break\n", + " if lines:\n", + " print(lines, file=sys.stderr)" + ] + }, + { + "cell_type": "markdown", + "id": "88d25a87-241a-46c6-9ca7-d035959c5f74", + "metadata": {}, + "source": [ + "Launch the offline server with the command `feast --chdir ./Feature_Store serve_offline`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "ce965dd4-652b-4c36-a064-fd0fd97d3ef7", + "metadata": {}, + "outputs": [], + "source": [ + "# Feast offline server process\n", + "offline_server_proc = subprocess.Popen(\n", + " \"feast --chdir ./Feature_Store serve_offline 2>&2 & echo $! > server_proc.txt\",\n", + " shell=True,\n", + " text=True,\n", + " stdout=subprocess.PIPE,\n", + " stderr=subprocess.PIPE,\n", + " bufsize=0\n", + ")\n", + "print_first_proc_lines(offline_server_proc, 2)" + ] + }, + { + "cell_type": "markdown", + "id": "59958d64-8e68-45ff-9549-556cbf46908c", + "metadata": {}, + "source": [ + "The tail end of the command above, `2>&2 & echo $! > server_proc.txt`, captures log messages (in the offline case there are none), and writes the process PID to the file `server_proc.txt` (we will use this in the cleanup notebook, [05_Credit_Risk_Cleanup.ipynb](05_Credit_Risk_Cleanup.ipynb))." + ] + }, + { + "cell_type": "markdown", + "id": "cfed4334-9e62-4f3f-be96-3f7db2f06ada", + "metadata": {}, + "source": [ + "Next, launch the online server with the command `feast --chdir ./Feature_Store serve`." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "a581fbe2-13ba-433e-8e76-dc82cc22af74", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/ddowler/Code/Feast/feast/examples/credit-risk-end-to-end/venv-py3.11/lib/python3.11/site-packages/uvicorn/workers.py:16: DeprecationWarning: The `uvicorn.workers` module is deprecated. 
Please use `uvicorn-worker` package instead.\n", + "For more details, see https://github.com/Kludex/uvicorn-worker.\n", + " warnings.warn(\n", + "[2024-10-27 14:19:07 -0600] [44621] [INFO] Starting gunicorn 23.0.0\n", + "[2024-10-27 14:19:07 -0600] [44621] [INFO] Listening at: http://127.0.0.1:6566 (44621)\n", + "[2024-10-27 14:19:07 -0600] [44621] [INFO] Using worker: uvicorn.workers.UvicornWorker\n", + "[2024-10-27 14:19:07 -0600] [44623] [INFO] Booting worker with pid: 44623\n", + "[2024-10-27 14:19:07 -0600] [44623] [INFO] Started server process [44623]\n", + "[2024-10-27 14:19:07 -0600] [44623] [INFO] Waiting for application startup.\n", + "[2024-10-27 14:19:07 -0600] [44623] [INFO] Application startup complete.\n", + "\n" + ] + } + ], + "source": [ + "# Feast online server (master and worker) processes\n", + "online_server_proc = subprocess.Popen(\n", + " \"feast --chdir ./Feature_Store serve 2>&2 & echo $! >> server_proc.txt\",\n", + " shell=True,\n", + " text=True,\n", + " stdout=subprocess.PIPE,\n", + " stderr=subprocess.PIPE,\n", + " bufsize=0\n", + ")\n", + "print_first_proc_lines(online_server_proc, 3)" + ] + }, + { + "cell_type": "markdown", + "id": "0e778173-f58a-4074-b63f-107e1f39577b", + "metadata": {}, + "source": [ + "Note that the output helpfully lets us know that the online server is \"Listening at: http://127.0.0.1:6566\" (the default host:port).\n", + "\n", + "List the running processes to verify they are up." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "9b1a224d-884d-45c5-9711-2e2eb4351710", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 501 44594 1 0 2:19PM ?? 0:03.66 **/python **/feast --chdir ./Feature_Store serve_offline\n", + " 501 44621 1 0 2:19PM ?? 0:03.58 **/python **/feast --chdir ./Feature_Store serve\n", + " 501 44623 44621 0 2:19PM ?? 0:00.03 **/python **/feast --chdir ./Feature_Store serve\n", + " 501 44662 44542 0 2:19PM ??
0:00.01 /bin/zsh -c ps -ef | grep **/feast | grep serve\n" + ] + } + ], + "source": [ + "# List running Feast processes (paths redacted)\n", + "running_procs = !ps -ef | grep feast | grep serve\n", + "\n", + "for line in running_procs:\n", + " redacted = re.sub(r'/*[^\s]*(?P<cmd>(python )|(feast ))', r'**/\g<cmd>', line)\n", + " print(redacted)" + ] + }, + { + "cell_type": "markdown", + "id": "fd52eeb4-948c-472b-9111-8549fda955a1", + "metadata": {}, + "source": [ + "Note that there are two processes for the online server (master and worker)." + ] + }, + { + "cell_type": "markdown", + "id": "8258e7a8-5f6e-4737-93ee-63591518b169", + "metadata": {}, + "source": [ + "### Materialize Features to the Online Store" + ] + }, + { + "cell_type": "markdown", + "id": "21b354ab-ec22-476d-8fd9-6ffe0f3fbacb", + "metadata": {}, + "source": [ + "At this point, there is no data in the online store yet. Let's use the SDK feature store object (that we created above) to \"materialize\" data; this is Feast lingo for moving/updating data from the offline store to the online store." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "ff6146df-03a7-4ac2-a665-ee5f440c3605", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:root:_list_feature_views will make breaking changes. Please use _list_batch_feature_views instead.
_list_feature_views will behave like _list_all_feature_views in the future.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Materializing \u001b[1m\u001b[32m2\u001b[0m feature views from \u001b[1m\u001b[32m2023-09-24 12:00:00-06:00\u001b[0m to \u001b[1m\u001b[32m2024-01-07 12:00:00-07:00\u001b[0m into the \u001b[1m\u001b[32msqlite\u001b[0m online store.\n", + "\n", + "\u001b[1m\u001b[32mdata_a\u001b[0m:\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 0%| | 0/1000 [00:00=7\",\"4<=X<7\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"0<=X<200\",\"no checking\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"existing paid\",\"critical/other existing credit\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"female div/dep/mar\",\"male mar/wid\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[12579.0,2463.0],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"none\",\"none\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"used car\",\"new 
car\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[24.0,24.0],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[4.0,4.0],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"yes\",\"yes\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[2.0,3.0],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[1.0,1.0],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[44.0,27.0],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"none\",\"none\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"for free\",\"own\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[1.0,2.0],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"high qualif/self emp/mgmt\",\"skilled\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"no known property\",\"life insurance\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]},{\"values\":[\"yes\",\"yes\"],\"statuses\":[\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"2023-09-25T01:03:47Z\",\"2023-09-29T03:17:24Z\"]}]}']" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "response" + ] + }, + { + 
"cell_type": "markdown", + "id": "01d20196-1d42-486d-a0bd-97193c953785", + "metadata": {}, + "source": [ + "The `curl` command gave us a quick validation. In the [04_Credit_Risk_Model_Serving.ipynb](04_Credit_Risk_Model_Serving.ipynb) notebook, we'll use the Python `requests` library to handle the query better." + ] + }, + { + "cell_type": "markdown", + "id": "d74a5117-dd34-4dde-93a8-ea6e8c4c545a", + "metadata": {}, + "source": [ + "Now that the feature stores and their respective servers have been configured and deployed, we can proceed to train an AI model in [03_Credit_Risk_Model_Training.ipynb](03_Credit_Risk_Model_Training.ipynb)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/credit-risk-end-to-end/03_Credit_Risk_Model_Training.ipynb b/examples/credit-risk-end-to-end/03_Credit_Risk_Model_Training.ipynb new file mode 100644 index 00000000000..ca0d0e29d95 --- /dev/null +++ b/examples/credit-risk-end-to-end/03_Credit_Risk_Model_Training.ipynb @@ -0,0 +1,1541 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "54f2ab19-68e1-4725-b6e7-efd8eedebe1a", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "id": "69a40de4-65cf-4b45-b321-2b7ce571f8cb", + "metadata": {}, + "source": [ + "# Credit Risk Model Training" + ] + }, + { + "cell_type": "markdown", + "id": "fe641d83-1e28-4f7f-895c-8ca038f6cc53", + "metadata": {}, + "source": [ + "### Introduction" + ] + }, + { + "cell_type": "markdown", + "id": "8f04f635-401b-47b6-b807-df61d42ec752", + "metadata": {}, + "source": [ + "AI models have played a central role in modern credit risk assessment systems. In this example, we develop a credit risk model to predict whether a future loan will be good or bad, given some context data (presumably supplied from the loan application process). We use the modeling process to demonstrate how Feast can be used to facilitate the serving of data for training and inference use-cases.\n", + "\n", + "In this notebook, we train our AI model. We will use the popular scikit-learn library (sklearn) to train a RandomForestClassifier, as this is a relatively easy choice for a baseline model." + ] + }, + { + "cell_type": "markdown", + "id": "a96bf1aa-c450-4201-83a4-e25b08bdd12d", + "metadata": {}, + "source": [ + "### Setup" + ] + }, + { + "cell_type": "markdown", + "id": "a47b33bc-bc06-4de0-8f3a-beea8179035c", + "metadata": {}, + "source": [ + "*The following code assumes that you have read the example README.md file, and that you have setup an environment where the code can be run. 
Please make sure you have addressed the prerequisite needs.*" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "c66a3dab-fdbf-40be-8227-6180dc314a84", + "metadata": {}, + "outputs": [], + "source": [ + "# Imports\n", + "import warnings\n", + "import datetime\n", + "import feast\n", + "import joblib\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "\n", + "from feast import FeatureStore, RepoConfig\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import OrdinalEncoder\n", + "from sklearn.compose import ColumnTransformer\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.metrics import classification_report" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "2a841445-fa47-4826-a874-28ac0e4ea57f", + "metadata": {}, + "outputs": [], + "source": [ + "# Ignore warnings\n", + "warnings.filterwarnings(action=\"ignore\")" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "23579727-7797-4101-a70d-b0d4c24b0fdf", + "metadata": {}, + "outputs": [], + "source": [ + "# Random seed\n", + "SEED = 142" + ] + }, + { + "cell_type": "markdown", + "id": "fc5be519-7733-449b-8dc3-411e86371315", + "metadata": {}, + "source": [ + "This notebook assumes that you have already done the following:\n", + "\n", + "1. Run the [01_Credit_Risk_Data_Prep.ipynb](01_Credit_Risk_Data_Prep.ipynb) notebook to prepare the data.\n", + "2. Run the [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb) notebook to configure the feature stores and launch the feature store servers.\n", + "\n", + "If you have not completed the above steps, please go back and do so before continuing. This notebook relies on the data prepared in notebook 1, and it uses the Feast offline server launched in notebook 2."
+ ] + }, + { + "cell_type": "markdown", + "id": "1ca99047-e508-4b1f-9f4c-f11e38587d70", + "metadata": {}, + "source": [ + "### Load Label (Outcome) Data" + ] + }, + { + "cell_type": "markdown", + "id": "89b49268-b7a5-4abc-8d82-1cdbf9bb4473", + "metadata": {}, + "source": [ + "From our previous data exploration, remember that the label data represents whether the loan was classed as \"good\" (1) or \"bad\" (0). Let's pull the labels for training, as we will use them as our \"entity dataframe\" when pulling features.\n", + "\n", + "This is also a good time to remember that the label timestamps are lagged by 30-90 days from the context data records." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "6a227a12-7b3e-462a-8f6e-38a7690df1c4", + "metadata": {}, + "outputs": [], + "source": [ + "labels = pd.read_parquet(\"Feature_Store/data/labels.parquet\")" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "31a39cad-0a85-4d98-ad95-008c81bb6fe0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
IDclassoutcome_timestamp
00good2023-11-24 22:50:13
11bad2023-11-03 12:10:13
22good2023-11-30 22:06:03
33good2023-11-17 07:37:19
44bad2023-12-01 05:01:48
\n", + "
" + ], + "text/plain": [ + " ID class outcome_timestamp\n", + "0 0 good 2023-11-24 22:50:13\n", + "1 1 bad 2023-11-03 12:10:13\n", + "2 2 good 2023-11-30 22:06:03\n", + "3 3 good 2023-11-17 07:37:19\n", + "4 4 bad 2023-12-01 05:01:48" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "labels.head()" + ] + }, + { + "cell_type": "markdown", + "id": "857f29fd-46d3-444b-b24f-eaccd82ab7d3", + "metadata": {}, + "source": [ + "### Pull Feature Data from Feast Offline Store" + ] + }, + { + "cell_type": "markdown", + "id": "07c13b69-3d26-484c-97cd-97734cc812bd", + "metadata": {}, + "source": [ + "In order to pull feature data from the offline store, we create a FeatureStore object that connects to the offline server (continuously running in the previous notebook)." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "9e9828f8-f210-4586-ac36-3f7e17f4f1e8", + "metadata": {}, + "outputs": [], + "source": [ + "# Create FeatureStore object\n", + "# (connects to the offline server deployed in 02_Deploying_the_Feature_Store.ipynb) \n", + "store = FeatureStore(config=RepoConfig(\n", + " project=\"loan_applications\",\n", + " provider=\"local\",\n", + " registry=\"Feature_Store/data/registry.db\",\n", + " offline_store={\n", + " \"type\": \"remote\",\n", + " \"host\": \"localhost\",\n", + " \"port\": 8815\n", + " },\n", + " entity_key_serialization_version=2\n", + "))" + ] + }, + { + "cell_type": "markdown", + "id": "c007e7ca-40c1-4850-abed-73b6171ad08d", + "metadata": {}, + "source": [ + "Now, we can retrieve feature data by supplying our entity dataframe and feature specifications to the `get_historical_features` function. 
Note that this function performs a fuzzy lookback (\"point-in-time\") join, matching the lagged outcome timestamp to the closest application timestamp (per ID) in the context data; it also joins the \"a\" and \"b\" features that we had previously split into two tables.\n", + "\n", + "To keep this example simple, we will limit our feature set to the numerical features plus two categorical features." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "dd2e3cb5-c865-48f4-80b6-8a14a1ff09ab", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:root:_list_feature_views will make breaking changes. Please use _list_batch_feature_views instead. _list_feature_views will behave like _list_all_feature_views in the future.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Using outcome_timestamp as the event timestamp. To specify a column explicitly, please name it event_timestamp.\n" + ] + } + ], + "source": [ + "# Get feature data\n", + "# (Joins a and b data, and selects records with the right timestamps)\n", + "df = store.get_historical_features(\n", + " entity_df=labels,\n", + " features=[\n", + " \"data_a:duration\",\n", + " \"data_a:credit_amount\",\n", + " \"data_a:installment_commitment\",\n", + " \"data_a:checking_status\",\n", + " \"data_b:residence_since\",\n", + " \"data_b:age\",\n", + " \"data_b:existing_credits\",\n", + " \"data_b:num_dependents\",\n", + " \"data_b:housing\"\n", + " ]\n", + ").to_df()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "c72f6cb1-bbbf-4512-98cd-0abe5ff0c24b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 1000 entries, 0 to 999\n", + "Data columns (total 12 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 ID 1000 non-null int64 \n", + " 1 class 1000 non-null category \n", + " 2 outcome_timestamp 
1000 non-null datetime64[ns, UTC]\n", + " 3 duration 1000 non-null int64 \n", + " 4 credit_amount 1000 non-null int64 \n", + " 5 installment_commitment 1000 non-null int64 \n", + " 6 checking_status 1000 non-null category \n", + " 7 residence_since 1000 non-null int64 \n", + " 8 age 1000 non-null int64 \n", + " 9 existing_credits 1000 non-null int64 \n", + " 10 num_dependents 1000 non-null int64 \n", + " 11 housing 1000 non-null category \n", + "dtypes: category(3), datetime64[ns, UTC](1), int64(8)\n", + "memory usage: 73.8 KB\n" + ] + } + ], + "source": [ + "# Check the data info\n", + "df.info()" + ] + }, + { + "cell_type": "markdown", + "id": "110ea48c-0a5a-4642-aaba-a9eeb4a7da48", + "metadata": {}, + "source": [ + "### Split the Data" + ] + }, + { + "cell_type": "markdown", + "id": "f6669dce-a8b0-4d80-9a15-70b7dfd2d718", + "metadata": {}, + "source": [ + "Next, we split the data into a `train` and `validate` set, which we will use to train and then validate a model. The validation set will allow us to more accurately assess the model's performance on data that it has not seen during the training phase." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "036b0a54-48e4-4414-bb8c-0c30b6ab7469", + "metadata": {}, + "outputs": [], + "source": [ + "# Split data into train and validate datasets\n", + "train, validate = train_test_split(df, test_size=0.2, random_state=SEED)" + ] + }, + { + "cell_type": "markdown", + "id": "4b65cbf7-5981-4f51-97aa-a3ff7027f2f3", + "metadata": {}, + "source": [ + "### Exploratory Data Analysis" + ] + }, + { + "cell_type": "markdown", + "id": "e516ded8-10ad-4274-a736-f288290b5883", + "metadata": {}, + "source": [ + "Before building a model, a data scientist needs to gain understanding of the data to make sure it meets important statistical assumptions, and to identify potential opportunities and issues. 
As the purpose of this particular example is to show working with Feast, we will take the view of a data scientist looking to build a quick baseline model to establish some low-end metrics.\n", + "\n", + "Note that this data set is very \"clean\", as it has already been prepared. In real-life, production credit risk data can be much more complex, and have many issues that need to be understood and addressed before modeling." + ] + }, + { + "cell_type": "markdown", + "id": "553986a0-c804-4ab4-a4b9-48b16c72fd4f", + "metadata": {}, + "source": [ + "Let's look at counts for the target variable `class`, which tells us whether a (historical) loan was good or bad. We can see that there were many more good loans than bad, making the dataset imbalanced." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "607bd29b-eaf4-41a6-aaca-a8eaaf37e2d2", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAysElEQVR4nO3deVxV5d7///dmRmWDoIKWqJWKOHZw2k1qkWRkeWvllKlHG8FKy2PcOWLedqwcQ6tTqWWm2WBq5kRZHcVSTFNT1MqwFCgVtnoUBNbvj37sb/ugpQhsvHw9H4/1yHVd11rrc+3d1jdr2Ngsy7IEAABgKC9PFwAAAFCRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIO8AlyGazafz48Z4u45LD6wZcngg7QDmYN2+ebDab21KnTh116dJFn3zyiafLq3T5+fmaNWuWbrjhBtWsWVN+fn6qV6+e7rzzTr3zzjsqKirydInndODAAdlsNr3wwgueLuWCOZ1OTZgwQa1bt1aNGjUUGBioFi1aaNSoUTp06JCny5MkrVy5ksCJSufj6QIAkyQnJ6tRo0ayLEvZ2dmaN2+ebr/9di1fvlx33HGHp8urFL/++qu6deum9PR0xcXFafTo0QoNDVVWVpbWrVunfv36af/+/RozZoynSzXKDz/8oNjYWGVmZuqee+7Rgw8+KD8/P3377bd6/fXX9eGHH2rv3r2eLlMrV65USkoKgQeVirADlKNu3bqpbdu2rvUhQ4YoPDxc77zzzmUTdgYMGKBvvvlG77//vnr27OnWl5SUpC1btigjI8ND1ZmpsLBQPXv2VHZ2ttavX68bbrjBrX/SpEn65z//6aHqAM/jMhZQgUJCQhQYGCgfH/efK1544QVdd911CgsLU2BgoGJiYvTee++V2j4/P1/Dhw9X7dq1FRQUpDvvvFM///zzXx43OztbPj4+mjBhQqm+jIwM2Ww2vfTSS5KkM2fOaMKECWrcuLECAgIUFhamG264QWvXrr3g+aalpWn16tV68MEHSwWdEm
3btlX//v3d2nJyclzBMCAgQK1bt9b8+fNLbXvy5Ek9+eSTql+/vvz9/dW0aVO98MILsizLbVxZX7cLcb41n+97bbPZlJiYqKVLl6pFixby9/dX8+bNtWrVqr+s5f3339f27dv1zDPPlAo6kmS32zVp0iS3tiVLligmJkaBgYGqVauW7rvvPv3yyy9uYzp37qzOnTuX2t+gQYPUsGFD1/ofL/29+uqruvrqq+Xv76927dpp8+bNbtulpKS45luyABWNMztAOcrLy9Nvv/0my7KUk5OjWbNm6cSJE7rvvvvcxs2YMUN33nmn+vfvr4KCAi1atEj33HOPVqxYofj4eNe4oUOHasGCBerXr5+uu+46ffrpp2795xIeHq5OnTrp3Xff1bhx49z6Fi9eLG9vb91zzz2SpPHjx2vy5MkaOnSo2rdvL6fTqS1btmjr1q269dZbL2j+y5cvl6RS8/0zp06dUufOnbV//34lJiaqUaNGWrJkiQYNGqTc3Fw9/vjjkiTLsnTnnXfqs88+05AhQ9SmTRutXr1aI0eO1C+//KJp06a59lnW1628a5bO/72WpH//+9/64IMP9OijjyooKEgzZ85Ur169lJmZqbCwsHPWs2zZMkm/n1U7H/PmzdPgwYPVrl07TZ48WdnZ2ZoxY4Y2bNigb775RiEhIRf+okhauHChjh8/roceekg2m01TpkxRz5499cMPP8jX11cPPfSQDh06pLVr1+qtt94q0zGAMrEAXLS5c+dakkot/v7+1rx580qN/89//uO2XlBQYLVo0cK6+eabXW3btm2zJFmPPvqo29h+/fpZkqxx48b9aU2vvPKKJcnasWOHW3t0dLTbcVq3bm3Fx8ef71T/1P/8z/9Ykqzc3Fy39lOnTlm//vqrazl27Jirb/r06ZYka8GCBa62goICy+FwWDVq1LCcTqdlWZa1dOlSS5L17LPPuu377rvvtmw2m7V//37Lsi7+dfvxxx8tSdbzzz9/zjHnW7Nlnd97bVmWJcny8/NzzcOyLGv79u2WJGvWrFl/WvO1115rBQcH/+mYPx6/Tp06VosWLaxTp0652lesWGFJssaOHetq69Spk9WpU6dS+xg4cKDVoEED13rJaxYWFmYdPXrU1f7RRx9Zkqzly5e72hISEiz+6UFl4zIWUI5SUlK0du1arV27VgsWLFCXLl00dOhQffDBB27jAgMDXX8+duyY8vLydOONN2rr1q2u9pUrV0qSHnvsMbdtn3jiifOqpWfPnvLx8dHixYtdbTt37tR3332n3r17u9pCQkK0a9cu7du377zneS5Op1OSVKNGDbf2l19+WbVr13Ytf7zUsnLlSkVERKhv376uNl9fXz322GM6ceKEPv/8c9c4b2/vUq/Hk08+KcuyXE+9Xezrdj7Ot2bp/N7rErGxsbr66qtd661atZLdbtcPP/zwp/U4nU4FBQWdV+1btmxRTk6OHn30UQUEBLja4+PjFRUVpY8//vi89nM2vXv3Vs2aNV3rN954oyT9Zf1ARSPsAOWoffv2io2NVWxsrPr376+PP/5Y0dHRSkxMVEFBgWvcihUr1LFjRwUEBCg0NFS1a9fWnDlzlJeX5xrz008/ycvLy+0fP0lq2rTpedVSq1Yt3XLLLXr33XddbYsXL5aPj4/b/TTJycnKzc1VkyZN1LJlS40cOVLffvttmeZf8g/uiRMn3Np79erlCoGtWrVy6/vpp5/UuHFjeXm5/3XUrFkzV3/Jf+vVq1fqH/WzjbuY1+18nG/N0vm91yUiIyNLtdWsWVPHjh3703rsdruOHz9+3rVLZ389oqKi3Gq/UP9df0nw+av6gYpG2AEqkJeXl7p06aLDhw+7zpx8+eWXuvPOOxUQEKDZs2dr5cqVWrt2rfr161fqRtuL1adPH+3du1fbtm2TJL377ru65ZZbVKtWLdeYm266Sd9//73eeOMNtWjRQq+99pr+9re/6bXXXrvg40VFRUn6/QzSH9WvX9
8VAv/4k7/pLvS99vb2Put+/ur/i6ioKOXl5engwYPlUneJc908fK7vSSpr/UBFI+wAFaywsFDS/zvb8f777ysgIECrV6/W3//+d3Xr1k2xsbGltmvQoIGKi4v1/fffu7VfyGPbPXr0kJ+fnxYvXqxt27Zp79696tOnT6lxoaGhGjx4sN555x0dPHhQrVq1KtP3oJQ8Xv/222+f9zYNGjTQvn37VFxc7Na+Z88eV3/Jfw8dOlTqDMbZxl3s61ZeNZ/ve32xunfvLklasGDBX44tqe1sr0dGRoarX/r9zExubm6pcRdz9oenr+AJhB2gAp05c0Zr1qyRn5+f6xKHt7e3bDab20/HBw4c0NKlS9227datmyRp5syZbu3Tp08/7+OHhIQoLi5O7777rhYtWiQ/Pz/16NHDbcyRI0fc1mvUqKFrrrlG+fn5rra8vDzt2bPnrJde/uj666/XrbfeqldffVUfffTRWcf890/5t99+u7KystzuLSosLNSsWbNUo0YNderUyTWuqKjI9ch8iWnTpslms7ler/J43f7K+dZ8vu/1xbr77rvVsmVLTZo0SWlpaaX6jx8/rmeeeUbS74/+16lTRy+//LLbe/zJJ59o9+7dbk+IXX311dqzZ49+/fVXV9v27du1YcOGMtdavXp1STpriAIqCo+eA+Xok08+cf10n5OTo4ULF2rfvn16+umnZbfbJf1+I+jUqVN12223qV+/fsrJyVFKSoquueYat3tl2rRpo759+2r27NnKy8vTddddp9TUVO3fv/+Caurdu7fuu+8+zZ49W3FxcaUeK46Ojlbnzp0VExOj0NBQbdmyRe+9954SExNdYz788EMNHjxYc+fO1aBBg/70eAsWLNBtt92mHj16uM5k1KxZ0/UNyl988YUrkEjSgw8+qFdeeUWDBg1Senq6GjZsqPfee08bNmzQ9OnTXffodO/eXV26dNEzzzyjAwcOqHXr1lqzZo0++ugjPfHEE657dMrrdUtNTdXp06dLtffo0eO8az7f9/pi+fr66oMPPlBsbKxuuukm3Xvvvbr++uvl6+urXbt2aeHChapZs6YmTZokX19f/fOf/9TgwYPVqVMn9e3b1/XoecOGDTV8+HDXfv/+979r6tSpiouL05AhQ5STk6OXX35ZzZs3d92MfqFiYmIk/X4DeVxcnLy9vc96thEoV558FAwwxdkePQ8ICLDatGljzZkzxyouLnYb//rrr1uNGze2/P39raioKGvu3LnWuHHjSj2Se+rUKeuxxx6zwsLCrOrVq1vdu3e3Dh48eF6PUJdwOp1WYGBgqUelSzz77LNW+/btrZCQECswMNCKioqyJk2aZBUUFJSa39y5c8/rmKdOnbKmT59uORwOy263Wz4+PlZERIR1xx13WG+//bZVWFjoNj47O9saPHiwVatWLcvPz89q2bLlWY91/Phxa/jw4Va9evUsX19fq3Hjxtbzzz9f6vW9mNet5DHqcy1vvfXWBdV8vu+1JCshIaHU9g0aNLAGDhz4pzWXOHbsmDV27FirZcuWVrVq1ayAgACrRYsWVlJSknX48GG3sYsXL7auvfZay9/f3woNDbX69+9v/fzzz6X2uWDBAuuqq66y/Pz8rDZt2lirV68+56PnZ3tc/79f88LCQmvYsGFW7dq1LZvNxmPoqBQ2y+LOMQAAYC7u2QEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphR79/o6vT6eT3twAAYCDCjn7/KvXg4ODz/q3BAADg0kHYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNF8PF0AAFSW02eKlHn0P54uA7gsRIZWU4Cvt6fLkETYAXAZyTz6H439aKenywAuC8l3tVCT8C
BPlyGJy1gAAMBwhB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDSPhp3x48fLZrO5LVFRUa7+06dPKyEhQWFhYapRo4Z69eql7Oxst31kZmYqPj5e1apVU506dTRy5EgVFhZW9lQAAEAV5ePpApo3b65169a51n18/l9Jw4cP18cff6wlS5YoODhYiYmJ6tmzpzZs2CBJKioqUnx8vCIiIrRx40YdPnxY999/v3x9ffV///d/lT4XAABQ9Xg87Pj4+CgiIqJUe15enl5//XUtXLhQN998syRp7ty5atasmTZt2qSOHTtqzZo1+u6777Ru3TqFh4erTZs2mjhxokaNGqXx48fLz8+vsqcDAACqGI/fs7Nv3z7Vq1dPV111lfr376/MzExJUnp6us6cOaPY2FjX2KioKEVGRiotLU2SlJaWppYtWyo8PNw1Ji4uTk6nU7t27TrnMfPz8+V0Ot0WAABgJo+GnQ4dOmjevHlatWqV5syZox9//FE33nijjh8/rqysLPn5+SkkJMRtm/DwcGVlZUmSsrKy3IJOSX9J37lMnjxZwcHBrqV+/frlOzEAAFBlePQyVrdu3Vx/btWqlTp06KAGDRro3XffVWBgYIUdNykpSSNGjHCtO51OAg8AAIby+GWsPwoJCVGTJk20f/9+RUREqKCgQLm5uW5jsrOzXff4RERElHo6q2T9bPcBlfD395fdbndbAACAmapU2Dlx4oS+//571a1bVzExMfL19VVqaqqrPyMjQ5mZmXI4HJIkh8OhHTt2KCcnxzVm7dq1stvtio6OrvT6AQBA1ePRy1hPPfWUunfvrgYNGujQoUMaN26cvL291bdvXwUHB2vIkCEaMWKEQkNDZbfbNWzYMDkcDnXs2FGS1LVrV0VHR2vAgAGaMmWKsrKyNHr0aCUkJMjf39+TUwMAAFWER8POzz//rL59++rIkSOqXbu2brjhBm3atEm1a9eWJE2bNk1eXl7q1auX8vPzFRcXp9mzZ7u29/b21ooVK/TII4/I4XCoevXqGjhwoJKTkz01JQAAUMXYLMuyPF2EpzmdTgUHBysvL4/7dwCD7c0+rrEf7fR0GcBlIfmuFmoSHuTpMiRVsXt2AAAAyhthBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjFZlws5zzz0nm82mJ554wtV2+vRpJSQkKCwsTDVq1FCvXr2UnZ3ttl1mZqbi4+NVrVo11alTRyNHjlRhYWElVw8AAKqqKhF2Nm/erFdeeUWtWrVyax8+fLiWL1+uJUuW6PPPP9ehQ4fUs2dPV39RUZHi4+NVUFCgjRs3av78+Zo3b57Gjh1b2VMAAABVlMfDzokTJ9S/f3/961//Us2aNV3teXl5ev311zV16lTdfPPNiomJ0dy5c7Vx40Zt2rRJkrRmzRp999
13WrBggdq0aaNu3bpp4sSJSklJUUFBgaemBAAAqhCPh52EhATFx8crNjbWrT09PV1nzpxxa4+KilJkZKTS0tIkSWlpaWrZsqXCw8NdY+Li4uR0OrVr165zHjM/P19Op9NtAQAAZvLx5MEXLVqkrVu3avPmzaX6srKy5Ofnp5CQELf28PBwZWVlucb8MeiU9Jf0ncvkyZM1YcKEi6weAABcCjx2ZufgwYN6/PHH9fbbbysgIKBSj52UlKS8vDzXcvDgwUo9PgAAqDweCzvp6enKycnR3/72N/n4+MjHx0eff/65Zs6cKR8fH4WHh6ugoEC5ublu22VnZysiIkKSFBERUerprJL1kjFn4+/vL7vd7rYAAAAzeSzs3HLLLdqxY4e2bdvmWtq2bav+/fu7/uzr66vU1FTXNhkZGcrMzJTD4ZAkORwO7dixQzk5Oa4xa9euld1uV3R0dKXPCQAAVD0eu2cnKChILVq0cGurXr26wsLCXO1DhgzRiBEjFBoaKrvdrmHDhsnhcKhjx46SpK5duyo6OloDBgzQlClTlJWVpdGjRyshIUH+/v6VPicAAFD1ePQG5b8ybdo0eXl5qVevXsrPz1dcXJxmz57t6vf29taKFSv0yCOPyOFwqHr16ho4cKCSk5M9WDUAAKhKbJZlWZ4uwtOcTqeCg4OVl5fH/TuAwfZmH9fYj3Z6ugzgspB8Vws1CQ/ydBmSqsD37AAAAFQkwg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAo5Up7Fx11VU6cuRIqfbc3FxdddVVF10UAABAeSlT2Dlw4ICKiopKtefn5+uXX3656KIAAADKi8+FDF62bJnrz6tXr1ZwcLBrvaioSKmpqWrYsGG5FQcAAHCxLijs9OjRQ5Jks9k0cOBAtz5fX181bNhQL774YrkVBwAAcLEuKOwUFxdLkho1aqTNmzerVq1aFVIUAABAebmgsFPixx9/LO86AAAAKkSZwo4kpaamKjU1VTk5Oa4zPiXeeOONiy4MAACgPJQp7EyYMEHJyclq27at6tatK5vNVt51AQAAlIsyhZ2XX35Z8+bN04ABA8q7HgAAgHJVpu/ZKSgo0HXXXVfetQAAAJS7MoWdoUOHauHCheVdCwAAQLkr02Ws06dP69VXX9W6devUqlUr+fr6uvVPnTq1XIoDAAC4WGUKO99++63atGkjSdq5c6dbHzcrAwCAqqRMl7E+++yzcy6ffvrpee9nzpw5atWqlex2u+x2uxwOhz755BNX/+nTp5WQkKCwsDDVqFFDvXr1UnZ2tts+MjMzFR8fr2rVqqlOnToaOXKkCgsLyzItAABgoDKFnfJy5ZVX6rnnnlN6erq2bNmim2++WXfddZd27dolSRo+fLiWL1+uJUuW6PPPP9ehQ4fUs2dP1/ZFRUWKj49XQUGBNm7cqPnz52vevHkaO3asp6YEAACqGJtlWdaFbtSlS5c/vVx1IWd3/ltoaKief/553X333apdu7YWLlyou+++W5K0Z88eNWvWTGlpaerYsaM++eQT3XHHHTp06JDCw8Ml/f5Y/KhRo/Trr7/Kz8/vvI7pdDoVHBysvLw82e32MtcOoGrbm31cYz/a+dcDAVy05LtaqEl4kKfLkFTGMztt2rRR69atXUt0dLQKCgq0detWtWzZskyFFBUVadGiRTp58qQcDofS09N15swZxcbGusZERUUpMjJSaWlpkqS0tDS1bNnSFXQkKS4uTk6n03
V26Gzy8/PldDrdFgAAYKYy3aA8bdq0s7aPHz9eJ06cuKB97dixQw6HQ6dPn1aNGjX04YcfKjo6Wtu2bZOfn59CQkLcxoeHhysrK0uSlJWV5RZ0SvpL+s5l8uTJmjBhwgXVCQAALk3les/Offfdd8G/F6tp06batm2bvvrqKz3yyCMaOHCgvvvuu/Isq5SkpCTl5eW5loMHD1bo8QAAgOeU+ReBnk1aWpoCAgIuaBs/Pz9dc801kqSYmBht3rxZM2bMUO/evVVQUKDc3Fy3szvZ2dmKiIiQJEVEROjrr79221/J01olY87G399f/v7+F1QnAAC4NJUp7PzxiShJsixLhw8f1pYtWzRmzJiLKqi4uFj5+fmKiYmRr6+vUlNT1atXL0lSRkaGMjMz5XA4JEkOh0OTJk1STk6O6tSpI0lau3at7Ha7oqOjL6oOAABghjKFneDgYLd1Ly8vNW3aVMnJyeratet57ycpKUndunVTZGSkjh8/roULF2r9+vVavXq1goODNWTIEI0YMUKhoaGy2+0aNmyYHA6HOnbsKEnq2rWroqOjNWDAAE2ZMkVZWVkaPXq0EhISOHMDAAAklTHszJ07t1wOnpOTo/vvv1+HDx9WcHCwWrVqpdWrV+vWW2+V9PuN0F5eXurVq5fy8/MVFxen2bNnu7b39vbWihUr9Mgjj8jhcKh69eoaOHCgkpOTy6U+AABw6SvT9+yUSE9P1+7duyVJzZs317XXXltuhVUmvmcHuDzwPTtA5alK37NTpjM7OTk56tOnj9avX++6eTg3N1ddunTRokWLVLt27fKsEQAAoMzK9Oj5sGHDdPz4ce3atUtHjx7V0aNHtXPnTjmdTj322GPlXSMAAECZlenMzqpVq7Ru3To1a9bM1RYdHa2UlJQLukEZAACgopXpzE5xcbF8fX1Ltfv6+qq4uPiiiwIAACgvZQo7N998sx5//HEdOnTI1fbLL79o+PDhuuWWW8qtOAAAgItVprDz0ksvyel0qmHDhrr66qt19dVXq1GjRnI6nZo1a1Z51wgAAFBmZbpnp379+tq6davWrVunPXv2SJKaNWvm9hvKAQAAqoILOrPz6aefKjo6Wk6nUzabTbfeequGDRumYcOGqV27dmrevLm+/PLLiqoVAADggl1Q2Jk+fboeeOCBs37xXnBwsB566CFNnTq13IoDAAC4WBcUdrZv367bbrvtnP1du3ZVenr6RRcFAABQXi4o7GRnZ5/1kfMSPj4++vXXXy+6KAAAgPJyQWHniiuu0M6d5/69Mt9++63q1q170UUBAACUlwsKO7fffrvGjBmj06dPl+o7deqUxo0bpzvuuKPcigMAALhYF/To+ejRo/XBBx+oSZMmSkxMVNOmTSVJe/bsUUpKioqKivTMM89USKEAAABlcUFhJzw8XBs3btQjjzyipKQkWZYlSbLZbIqLi1NKSorCw8MrpFAAAICyuOAvFWzQoIFWrlypY8eOaf/+/bIsS40bN1bNmjUroj4AAICLUqZvUJakmjVrql27duVZCwAAQLkr0+/GAgAAuFQQdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0H08XcLk4faZImUf/4+kygMvGNbVryMvL5ukyAFQBhJ1Kknn0Pxr70U5PlwFcNuYOaq9AP29PlwGgCuAyFgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACM5tGwM3nyZLVr105BQUGqU6eOevTooYyMDLcxp0+fVkJCgsLCwlSjRg316tVL2dnZbmMyMzMVHx+vatWqqU6dOho5cqQKCwsrcyoAAKCK8mjY+fzzz5WQkKBNmzZp7dq1On
PmjLp27aqTJ0+6xgwfPlzLly/XkiVL9Pnnn+vQoUPq2bOnq7+oqEjx8fEqKCjQxo0bNX/+fM2bN09jx471xJQAAEAV49Hfer5q1Sq39Xnz5qlOnTpKT0/XTTfdpLy8PL3++utauHChbr75ZknS3Llz1axZM23atEkdO3bUmjVr9N1332ndunUKDw9XmzZtNHHiRI0aNUrjx4+Xn5+fJ6YGAACqiCp1z05eXp4kKTQ0VJKUnp6uM2fOKDY21jUmKipKkZGRSktLkySlpaWpZcuWCg8Pd42Ji4uT0+nUrl27KrF6AABQFXn0zM4fFRcX64knntD111+vFi1aSJKysrLk5+enkJAQt7Hh4eHKyspyjflj0CnpL+k7m/z8fOXn57vWnU5neU0DAABUMVXmzE5CQoJ27typRYsWVfixJk+erODgYNdSv379Cj8mAADwjCoRdhITE7VixQp99tlnuvLKK13tERERKigoUG5urtv47OxsRUREuMb899NZJeslY/5bUlKS8vLyXMvBgwfLcTYAAKAq8WjYsSxLiYmJ+vDDD/Xpp5+qUaNGbv0xMTHy9fVVamqqqy0jI0OZmZlyOBySJIfDoR07dignJ8c1Zu3atbLb7YqOjj7rcf39/WW3290WAABgJo/es5OQkKCFCxfqo48+UlBQkOsem+DgYAUGBio4OFhDhgzRiBEjFBoaKrvdrmHDhsnhcKhjx46SpK5duyo6OloDBgzQlClTlJWVpdGjRyshIUH+/v6enB4AAKgCPBp25syZI0nq3LmzW/vcuXM1aNAgSdK0adPk5eWlXr16KT8/X3FxcZo9e7ZrrLe3t1asWKFHHnlEDodD1atX18CBA5WcnFxZ0wAAAFWYR8OOZVl/OSYgIEApKSlKSUk555gGDRpo5cqV5VkaAAAwRJW4QRkAAKCiEHYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBoHg07X3zxhbp376569erJZrNp6dKlbv2WZWns2LGqW7euAgMDFRsbq3379rmNOXr0qPr37y+73a6QkBANGTJEJ06cqMRZAACAqsyjYefkyZNq3bq1UlJSzto/ZcoUzZw5Uy+//LK++uorVa9eXXFxcTp9+rRrTP/+/bVr1y6tXbtWK1as0BdffKEHH3ywsqYAAACqOB9PHrxbt27q1q3bWfssy9L06dM1evRo3XXXXZKkN998U+Hh4Vq6dKn69Omj3bt3a9WqVdq8ebPatm0rSZo1a5Zuv/12vfDCC6pXr16lzQUAAFRNVfaenR9//FFZWVmKjY11tQUHB6tDhw5KS0uTJKWlpSkkJMQVdCQpNjZWXl5e+uqrryq9ZgAAUPV49MzOn8nKypIkhYeHu7WHh4e7+rKyslSnTh23fh8fH4WGhrrGnE1+fr7y8/Nd606ns7zKBgAAVUyVPbNTkSZPnqzg4GDXUr9+fU+XBAAAKkiVDTsRERGSpOzsbLf27OxsV19ERIRycnLc+gsLC3X06FHXmLNJSkpSXl6eazl48GA5Vw8AAKqKKht2GjVqpIiICKWmprranE6nvvrqKzkcDkmSw+FQbm6u0tPTXWM+/fRTFRcXq0OHDufct7+/v+x2u9sCAADM5NF7dk6cOKH9+/e71n/88Udt27ZNoa
GhioyM1BNPPKFnn31WjRs3VqNGjTRmzBjVq1dPPXr0kCQ1a9ZMt912mx544AG9/PLLOnPmjBITE9WnTx+exAIAAJI8HHa2bNmiLl26uNZHjBghSRo4cKDmzZunf/zjHzp58qQefPBB5ebm6oYbbtCqVasUEBDg2ubtt99WYmKibrnlFnl5ealXr16aOXNmpc8FAABUTR4NO507d5ZlWefst9lsSk5OVnJy8jnHhIaGauHChRVRHgAAMECVvWcHAACgPBB2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBohB0AAGA0wg4AADAaYQcAABiNsAMAAIxG2AEAAEYj7AAAAKMRdgAAgNEIOwAAwGiEHQAAYDTCDgAAMBphBwAAGI2wAwAAjEbYAQAARiPsAAAAoxF2AACA0Qg7AADAaMaEnZSUFDVs2FABAQHq0KGDvv76a0+XBAAAqgAjws7ixYs1YsQIjRs3Tlu3blXr1q0VFxennJwcT5cGAAA8zGZZluXpIi5Whw4d1K5dO7300kuSpOLiYtWvX1/Dhg3T008//ZfbO51OBQcHKy8vT3a7vUJqPH2mSJlH/1Mh+wZQ2jW1a8jLy+bWxucQqDyRodUU4Ovt6TIkST6eLuBiFRQUKD09XUlJSa42Ly8vxcbGKi0tzYOVuQvw9VaT8CBPlwFc1vgcApenSz7s/PbbbyoqKlJ4eLhbe3h4uPbs2XPWbfLz85Wfn+9az8vLk/T7GR4AAHBpCQoKks1mO2f/JR92ymLy5MmaMGFCqfb69et7oBoAAHAx/uo2lEs+7NSqVUve3t7Kzs52a8/OzlZERMRZt0lKStKIESNc68XFxTp69KjCwsL+NBni8uN0OlW/fn0dPHiwwu7nAnBufAZxPoKC/vzy9CUfdvz8/BQTE6PU1FT16NFD0u/hJTU1VYmJiWfdxt/fX/7+/m5tISEhFVwpLmV2u52/aAEP4jOIi3HJhx1JGjFihAYOHKi2bduqffv2mj59uk6ePKnBgwd7ujQAAOBhRoSd3r1769dff9XYsWOVlZWlNm3aaNWqVaVuWgYAAJcfI8KOJCUmJp7zshVQVv7+/ho3blypy54AKgefQZQHI75UEAAA4FyM+HURAAAA50LYAQAARiPsAAAAoxF2cNnp3LmznnjiiXLd5/r162Wz2ZSbm1uu+wVwcRo2bKjp06d7ugx4GGEHAAAYjbADAACMRtjBZamwsFCJiYkKDg5WrVq1NGbMGJV8C8Nbb72ltm3bKigoSBEREerXr59ycnLctl+5cqWaNGmiwMBAdenSRQcOHPDALIBLx/Hjx9W/f39Vr15ddevW1bRp09wuKR87dkz333+/atasqWrVqqlbt27at2+f2z7ef/99NW/eXP7+/mrYsKFefPFFt/6cnBx1795dgYGBatSokd5+++3Kmh6qOMIOLkvz58+Xj4+Pvv76a82YMUNTp07Va6+9Jkk6c+aMJk6cqO3bt2vp0qU6cOCABg0a5Nr24MGD6tmzp7p3765t27Zp6NChevrppz00E+DSMGLECG3YsEHLli3T2rVr9eWXX2rr1q2u/kGDBmnLli1atmyZ0tLSZFmWbr/9dp05c0aSlJ6ernvvvVd9+vTRjh07NH78eI0ZM0bz5s1z28fBgwf12Wef6b333tPs2bNL/aCCy5QFXGY6depkNWvWzCouLna1jRo1ymrWrNlZx2/evNmSZB0/ftyyLMtKSkqyoqOj3caMGjXKkmQdO3aswuoGLlVOp9Py9fW1li
xZ4mrLzc21qlWrZj3++OPW3r17LUnWhg0bXP2//fabFRgYaL377ruWZVlWv379rFtvvdVtvyNHjnR9FjMyMixJ1tdff+3q3717tyXJmjZtWgXODpcCzuzgstSxY0fZbDbXusPh0L59+1RUVKT09HR1795dkZGRCgoKUqdOnSRJmZmZkqTdu3erQ4cObvtzOByVVzxwifnhhx905swZtW/f3tUWHByspk2bSvr9M+Xj4+P2uQoLC1PTpk21e/du15jrr7/ebb/XX3+963Nbso+YmBhXf1RUlEJCQipwZrhUEHaAPzh9+rTi4uJkt9v19ttva/Pmzfrwww8lSQUFBR6uDgBQFoQdXJa++uort/VNmzapcePG2rNnj44cOaLnnntON954o6Kiokpd82/WrJm+/vrrUtsDOLurrrpKvr6+2rx5s6stLy9Pe/fulfT7Z6qwsNDtc3nkyBFlZGQoOjraNWbDhg1u+92wYYOaNGkib29vRUVFqbCwUOnp6a7+jIwMvvsKkgg7uExlZmZqxIgRysjI0DvvvKNZs2bp8ccfV2RkpPz8/DRr1iz98MMPWrZsmSZOnOi27cMPP6x9+/Zp5MiRysjI0MKFC91ukgTgLigoSAMHDtTIkSP12WefadeuXRoyZIi8vLxks9nUuHFj3XXXXXrggQf073//W9u3b9d9992nK664QnfddZck6cknn1RqaqomTpyovXv3av78+XrppZf01FNPSZKaNm2q2267TQ899JC++uorpaena+jQoQoMDPTk1FFVePqmIaCyderUyXr00Uethx9+2LLb7VbNmjWt//3f/3XdsLxw4UKrYcOGlr+/v+VwOKxly5ZZkqxvvvnGtY/ly5db11xzjeXv72/deOON1htvvMENysCfcDqdVr9+/axq1apZERER1tSpU6327dtbTz/9tGVZlnX06FFrwIABVnBwsBUYGGjFxcVZe/fuddvHe++9Z0VHR1u+vr5WZGSk9fzzz7v1Hz582IqPj7f8/f2tyMhI680337QaNGjADcqwbJb1/3+5CAAAleTkyZO64oor9OKLL2rIkCGeLgeG8/F0AQAA833zzTfas2eP2rdvr7y8PCUnJ0uS6zIVUJEIOwCASvHCCy8oIyNDfn5+iomJ0ZdffqlatWp5uixcBriMBQAAjMbTWAAAwGiEHQAAYDTCDgAAMBphBwAAGI2wA+CSdeDAAdlsNm3bts3TpQCowgg7AADAaIQdAABgNMIOgCqvuLhYU6ZM0TXXXCN/f39FRkZq0qRJpcYVFRVpyJAhatSokQIDA9W0aVPNmDHDbcz69evVvn17Va9eXSEhIbr++uv1008/SZK2b9+uLl26KCgoSHa7XTExMdqyZUulzBFAxeEblAFUeUlJSfrXv/6ladOm6YYbbtDhw4e1Z8+eUuOKi4t15ZVXasmSJQoLC9PGjRv14IMPqm7durr33ntVWFioHj166IEHHtA777yjgoICff3117LZbJKk/v3769prr9WcOXPk7e2tbdu2ydfXt7KnC6Cc8Q3KAKq048ePq3bt2nrppZc0dOhQt74DBw6oUaNG+uabb9SmTZuzbp+YmKisrCy99957Onr0qMLCwrR+/Xp16tSp1Fi73a5Zs2Zp4MCBFTEVAB7CZSwAVdru3buVn5+vW2655bzGp6SkKCYmRrVr11aNGjX06quvKjMzU5IUGhqqQYMGKS4uTt27d9eMGTN0+PBh17YjRozQ0KFDFRsbq+eee07ff/99hcwJQOUi7ACo0gIDA8977KJFi/TUU09pyJAhWrNmjbZt26bBgweroKDANWbu3LlKS0vTddddp8WLF6tJkybatGmTJGn8+PHatWuX4uPj9emnnyo6Oloffvhhuc8JQOXiMhaAKu306dMKDQ3VzJkz//Iy1rBhw/Tdd98pNTXVNSY2Nla//fbbOb+Lx+FwqF27dpo5c2apvr59++rkyZNatmxZuc4JQOXizA6AKi0gIECjRo
3SP/7xD7355pv6/vvvtWnTJr3++uulxjZu3FhbtmzR6tWrtXfvXo0ZM0abN2929f/4449KSkpSWlqafvrpJ61Zs0b79u1Ts2bNdOrUKSUmJmr9+vX66aeftGHDBm3evFnNmjWrzOkCqAA8jQWgyhszZox8fHw0duxYHTp0SHXr1tXDDz9catxDDz2kb775Rr1795bNZlPfvn316KOP6pNPPpEkVatWTXv27NH8+fN15MgR1a1bVwkJCXrooYdUWFioI0eO6P7771d2drZq1aqlnj17asKECZU9XQDljMtYAADAaFzGAgAARiPsAAAAoxF2AACA0Qg7AADAaIQdAABgNMIOAAAwGmEHAAAYjbADAACMRtgBAABGI+wAAACjEXYAAIDRCDsAAMBo/x9dWkm/NZ32CAAAAABJRU5ErkJggg==", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# plot the target variable \"class\"\n", + "p = sns.histplot(train[\"class\"], ec=\"w\", lw=4)\n", + "_ = p.set_title(\"Bad vs. Good Loan Count\")\n", + "_ = p.spines[\"top\"].set_visible(False)\n", + "_ = p.spines[\"right\"].set_visible(False)" + ] + }, + { + "cell_type": "markdown", + "id": "c6a697a5-5709-4a69-b644-62779b4f8bc5", + "metadata": {}, + "source": [ + "Now, view the first few records of the context data." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "79424785-129d-4007-84a5-041b6d38457d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
IDclassoutcome_timestampdurationcredit_amountinstallment_commitmentchecking_statusresidence_sinceageexisting_creditsnum_dependentshousing
18473good2023-12-16 03:29:12+00:00612384no checking43612own
764894good2023-11-15 23:19:35+00:001811694no checking32921own
504318good2023-11-23 13:03:53+00:00127014no checking23221own
454340good2023-12-26 17:59:37+00:0024574320<=X<20042421for free
453605good2023-12-18 11:27:02+00:002428284<042211own
\n", + "
"
+      ],
+      "text/plain": [
+       "      ID class         outcome_timestamp  duration  credit_amount  \\\n",
+       "18   473  good 2023-12-16 03:29:12+00:00         6           1238   \n",
+       "764  894  good 2023-11-15 23:19:35+00:00        18           1169   \n",
+       "504  318  good 2023-11-23 13:03:53+00:00        12            701   \n",
+       "454  340  good 2023-12-26 17:59:37+00:00        24           5743   \n",
+       "453  605  good 2023-12-18 11:27:02+00:00        24           2828   \n",
+       "\n",
+       "     installment_commitment checking_status  residence_since  age  \\\n",
+       "18                        4     no checking                4   36   \n",
+       "764                       4     no checking                3   29   \n",
+       "504                       4     no checking                2   32   \n",
+       "454                       2        0<=X<200                4   24   \n",
+       "453                       4              <0                4   22   \n",
+       "\n",
+       "     existing_credits  num_dependents   housing  \n",
+       "18                  1               2       own  \n",
+       "764                 2               1       own  \n",
+       "504                 2               1       own  \n",
+       "454                 2               1  for free  \n",
+       "453                 1               1       own  "
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# View first records in training data\n",
+    "train.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fd52f5bc-aa0f-48db-b356-c52aa7ce3724",
+   "metadata": {},
+   "source": [
+    "### Feature Engineering"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3e5b5c02-ad4d-400e-bdac-bfdf2799f575",
+   "metadata": {},
+   "source": [
+    "Once data columns have been prepared so that they can be used to train an AI model, it is common to refer to them as \"features\". The process of preparing features is referred to as \"feature engineering\".\n",
+    "\n",
+    "Below, we will train a random forest model. Random forests are relatively robust to non-standardized, non-normalized data, making it easier for us to get started. As such, the numerical columns are ready for simple baseline training.\n",
+    "\n",
+    "We have pulled two categorical columns, which we will need to engineer into numerical features."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "45a6fb27-140c-4f5a-b464-1f5e5d81d086",
+   "metadata": {},
+   "source": [
+    "The `checking_status` column tells us roughly how much money the applicant has in their checking account, while the `housing` column shows the applicant's housing status. We presume that more money in checking correlates inversely with credit risk, while owning vs. renting vs. living for free correlates directly with credit risk. Hence, converting these to ordinal features makes sense. Of course, in a real study we would want to quantitatively verify these presumptions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "9e374096-b02d-4cbb-8fca-dcc451c90c50",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "checking_status\n",
+       "no checking    0.39375\n",
+       "0<=X<200       0.27500\n",
+       "<0             0.26125\n",
+       ">=200          0.07000\n",
+       "Name: proportion, dtype: float64"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Inspect the `checking_status` column distribution\n",
+    "train.checking_status.value_counts(normalize=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "0144b525-244b-4526-8e4b-d393cb174d06",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "housing\n",
+       "own         0.7225\n",
+       "rent        0.1675\n",
+       "for free    0.1100\n",
+       "Name: proportion, dtype: float64"
+      ]
+     },
+     "execution_count": 13,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Inspect the `housing` column distribution\n",
+    "train.housing.value_counts(normalize=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2cb340b4-7d21-4810-8be2-1633da2e4396",
+   "metadata": {},
+   "source": [
+    "We define a transformer that can be used to convert `checking_status` and `housing` to ordinal variables. The transformer will also drop the non-feature columns (`class`, `ID`, and `outcome_timestamp`) from the feature data."
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "27796e23-c12e-4e51-8fb4-090b26aff2ef", + "metadata": {}, + "outputs": [], + "source": [ + "# Feature lists\n", + "cat_features = [\"checking_status\", \"housing\"]\n", + "num_features = [\n", + " \"duration\", \"credit_amount\", \"installment_commitment\",\n", + " \"residence_since\", \"age\", \"existing_credits\", \"num_dependents\"\n", + "]\n", + "\n", + "# Ordinal encoder for cat_features\n", + "# (We use a ColumnTransformer to passthrough numerical feature columns)\n", + "col_transform = ColumnTransformer([\n", + " (\"cat_features\", OrdinalEncoder(), cat_features),\n", + " (\"num_features\", \"passthrough\", num_features),\n", + " ],\n", + " remainder=\"drop\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "318429b9-e008-4cc7-8108-779934f9ac2f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
checking_statushousingdurationcredit_amountinstallment_commitmentresidence_sinceageexisting_creditsnum_dependents
183.01.06.01238.04.04.036.01.02.0
7643.01.018.01169.04.03.029.02.01.0
5043.01.012.0701.04.02.032.02.01.0
4540.00.024.05743.02.04.024.02.01.0
4531.01.024.02828.04.04.022.01.01.0
\n", + "
"
+      ],
+      "text/plain": [
+       "     checking_status  housing  duration  credit_amount  \\\n",
+       "18               3.0      1.0       6.0         1238.0   \n",
+       "764              3.0      1.0      18.0         1169.0   \n",
+       "504              3.0      1.0      12.0          701.0   \n",
+       "454              0.0      0.0      24.0         5743.0   \n",
+       "453              1.0      1.0      24.0         2828.0   \n",
+       "\n",
+       "     installment_commitment  residence_since   age  existing_credits  \\\n",
+       "18                      4.0              4.0  36.0               1.0   \n",
+       "764                     4.0              3.0  29.0               2.0   \n",
+       "504                     4.0              2.0  32.0               2.0   \n",
+       "454                     2.0              4.0  24.0               2.0   \n",
+       "453                     4.0              4.0  22.0               1.0   \n",
+       "\n",
+       "     num_dependents  \n",
+       "18              2.0  \n",
+       "764             1.0  \n",
+       "504             1.0  \n",
+       "454             1.0  \n",
+       "453             1.0  "
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Check that the transform outputs features as expected\n",
+    "# (Note: transform output is an array, so we convert it\n",
+    "# back to a dataframe for inspection)\n",
+    "pd.DataFrame(\n",
+    "    index=train.index,\n",
+    "    columns=cat_features + num_features,\n",
+    "    data=col_transform.fit_transform(train)\n",
+    ").head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a3785c93-8830-4fa2-bb9d-31b6e8fecb01",
+   "metadata": {},
+   "source": [
+    "Finally, let's separate out the labels and convert them from categorical (\"good\" | \"bad\") to float (1.0 | 0.0). We do this for both the training and validation data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "30ebff90-a193-43a2-86fb-cf09e7d03777",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Make \"class\" target variable numeric\n",
+    "train_y = (train[\"class\"] == \"good\").astype(float)\n",
+    "validate_y = (validate[\"class\"] == \"good\").astype(float)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b052f6b2-2a34-441d-8a5f-2aad4e4db022",
+   "metadata": {},
+   "source": [
+    "### Train the Model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4f14590-31f4-4680-b1a1-75755a78513e",
+   "metadata": {},
+   "source": [
+    "Now that the features are prepared, we can train (fit) our baseline model on the feature data."
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "0ff48f34-dbb6-4221-aefc-3c9b3f9da3e3", + "metadata": {}, + "outputs": [], + "source": [ + "# Specify the model\n", + "rf_model = RandomForestClassifier(\n", + " n_estimators=400,\n", + " criterion=\"entropy\",\n", + " max_depth=4,\n", + " min_samples_leaf=10,\n", + " class_weight={0:5, 1:1},\n", + " random_state=SEED\n", + ")\n", + "\n", + "# Package transform and model in pipeline\n", + "model = Pipeline([(\"transform\", col_transform), (\"rf_model\", rf_model)])" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1d6ef38a-23b0-4056-a108-960495521164", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Pipeline(steps=[('transform',\n",
+       "                 ColumnTransformer(transformers=[('cat_features',\n",
+       "                                                  OrdinalEncoder(),\n",
+       "                                                  ['checking_status',\n",
+       "                                                   'housing']),\n",
+       "                                                 ('num_features', 'passthrough',\n",
+       "                                                  ['duration', 'credit_amount',\n",
+       "                                                   'installment_commitment',\n",
+       "                                                   'residence_since', 'age',\n",
+       "                                                   'existing_credits',\n",
+       "                                                   'num_dependents'])])),\n",
+       "                ('rf_model',\n",
+       "                 RandomForestClassifier(class_weight={0: 5, 1: 1},\n",
+       "                                        criterion='entropy', max_depth=4,\n",
+       "                                        min_samples_leaf=10, n_estimators=400,\n",
+       "                                        random_state=142))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" + ], + "text/plain": [ + "Pipeline(steps=[('transform',\n", + " ColumnTransformer(transformers=[('cat_features',\n", + " OrdinalEncoder(),\n", + " ['checking_status',\n", + " 'housing']),\n", + " ('num_features', 'passthrough',\n", + " ['duration', 'credit_amount',\n", + " 'installment_commitment',\n", + " 'residence_since', 'age',\n", + " 'existing_credits',\n", + " 'num_dependents'])])),\n", + " ('rf_model',\n", + " RandomForestClassifier(class_weight={0: 5, 1: 1},\n", + " criterion='entropy', max_depth=4,\n", + " min_samples_leaf=10, n_estimators=400,\n", + " random_state=142))])" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Fit the model\n", + "model.fit(train, train_y)" + ] + }, + { + "cell_type": "markdown", + "id": "73c45c39-9d8e-4f76-aca5-9f0c1568d263", + "metadata": {}, + "source": [ + "### Evaluate the Model" + ] + }, + { + "cell_type": "markdown", + "id": "ef58d432-80ba-428f-b59f-621a9e53b331", + "metadata": {}, + "source": [ + "Let's evaluate our baseline model performance. With credit risk, recall is going to be an important measure to look at. We compare the performance on the training data, with the performance on the validation data through a classification report." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "8c5472f6-2ddc-437d-8102-4d5bd2c9f39c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "              precision    recall  f1-score   support\n",
+      "\n",
+      "         0.0       0.42      0.92      0.58       232\n",
+      "         1.0       0.94      0.49      0.64       568\n",
+      "\n",
+      "    accuracy                           0.61       800\n",
+      "   macro avg       0.68      0.70      0.61       800\n",
+      "weighted avg       0.79      0.61      0.63       800\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Evaluate training set performance\n",
+    "train_preds = model.predict(train)\n",
+    "print(classification_report(train_y, train_preds))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "c296bbd3-603e-4615-abbe-2689ebcf5d8c",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "              precision    recall  f1-score   support\n",
+      "\n",
+      "         0.0       0.46      0.87      0.61        68\n",
+      "         1.0       0.88      0.48      0.62       132\n",
+      "\n",
+      "    accuracy                           0.61       200\n",
+      "   macro avg       0.67      0.68      0.61       200\n",
+      "weighted avg       0.74      0.61      0.62       200\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Evaluate validation data performance\n",
+    "print(classification_report(validate_y, model.predict(validate)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d57ffbdc-f0b3-4fb6-9575-5acd983082cf",
+   "metadata": {},
+   "source": [
+    "The recall on the validation set for bad loans (the 0 class) is 0.87, meaning that the model correctly identified close to 90% of the bad loans. However, the precision of 0.46 tells us that the model is also classifying many loans that were actually good as bad. Precision and recall are technical metrics; in order to truly assess the model's value, we would need feedback from the business side on the impact of misclassifications (for both good and bad loans).\n",
+    "\n",
+    "The difference in performance on the training vs. validation data tells us that the model is slightly overfitting the data. Remember that this is just a quick baseline model. To improve further, we could do things like:\n",
+    "- gather more data\n",
+    "- engineer features\n",
+    "- experiment with hyperparameter settings\n",
+    "- experiment with other model types\n",
+    "\n",
+    "In fact, this is just a start. Creating AI models that meet business needs often requires a lot of guided experimentation."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0378d21a-d6db-42f9-851a-ce71f68c6802",
+   "metadata": {},
+   "source": [
+    "### Save the Model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4450a328-f00c-4579-8e08-b2ebe5046961",
+   "metadata": {},
+   "source": [
+    "The last thing we do is save our trained model so that we can pick it up later in the serving environment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "da7a7906-d54f-4f2d-9803-6c82c86b28ad",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['rf_model.pkl']"
+      ]
+     },
+     "execution_count": 21,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Save the model to a pickle file\n",
+    "joblib.dump(model, \"rf_model.pkl\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "299588b8-ab67-4155-97a9-770e8e4a7476",
+   "metadata": {},
+   "source": [
+    "In the next notebook, [04_Credit_Risk_Model_Serving.ipynb](04_Credit_Risk_Model_Serving.ipynb), we will load the trained model and request predictions with input features provided by the Feast online feature server."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/credit-risk-end-to-end/04_Credit_Risk_Model_Serving.ipynb b/examples/credit-risk-end-to-end/04_Credit_Risk_Model_Serving.ipynb new file mode 100644 index 00000000000..f263dd6cd7b --- /dev/null +++ b/examples/credit-risk-end-to-end/04_Credit_Risk_Model_Serving.ipynb @@ -0,0 +1,697 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9c870dcb-c66d-454d-a3fa-5f9a723bf8af", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "id": "339ab741-ac90-4763-9971-3b274f6a90b4", + "metadata": {}, + "source": [ + "# Credit Risk Model Serving" + ] + }, + { + "cell_type": "markdown", + "id": "31d29794-4c33-4bc1-9bb4-e238c59f882d", + "metadata": {}, + "source": [ + "### Introduction" + ] + }, + { + "cell_type": "markdown", + "id": "d6553fe7-5427-4ecc-b638-615b47acf1a8", + "metadata": {}, + "source": [ + "Model serving is an exciting part of AI/ML. All of our previous work was building to this phase where we can actually serve loan predictions. \n", + "\n", + "So what role does Feast play in model serving? We've already seen that Feast can \"materialize\" data from the training offline store to the serving online store. This comes in handy because many models need contextual features at inference time. \n", + "\n", + "With this example, we can imagine a scenario something like this:\n", + "1. A bank customer submits a loan application on a website. \n", + "2. The website backend requests features, supplying the customer's ID as input.\n", + "3. The backend retrieves feature data for the ID in question.\n", + "4. The backend submits the feature data to the model to obtain a prediction.\n", + "5. The backend uses the prediction to make a decision.\n", + "6. The response is recorded and made available to the user.\n", + "\n", + "With online requests like this, time and resource usage often matter a lot. Feast facilitates quickly retrieving the correct feature data.\n", + "\n", + "In real-life, some of the contextual feature data points could be requested from the user, while others are retrieved from data sources. While outside the scope of this example, Feast does facilitate retrieving request data, and joining it with feature data. (See [Request Source](https://rtd.feast.dev/en/master/#request-source)).\n", + "\n", + "In this notebook, we request feature data from the online store for a small batch of users. 
We then get outcome predictions from our trained model. This notebook is a continuation of the work done in the previous notebooks; it comes as the step after [03_Credit_Risk_Model_Training.ipynb](03_Credit_Risk_Model_Training.ipynb)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "53818109-c357-435f-8a8b-2a62982fa9a8",
+   "metadata": {},
+   "source": [
+    "### Setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "92b5ab1b-186d-4b76-aac7-9b5110f8673e",
+   "metadata": {},
+   "source": [
+    "*The following code assumes that you have read the example README.md file, and that you have set up an environment where the code can be run. Please make sure you have addressed the prerequisite needs.*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "378189ed-e967-4b2b-b591-aab980a685b3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Imports\n",
+    "import os\n",
+    "import joblib\n",
+    "import json\n",
+    "import requests\n",
+    "import warnings\n",
+    "import pandas as pd\n",
+    "\n",
+    "from feast import FeatureStore"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "ea90edb2-16f0-4d40-a280-4e6ea79ea5be",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Ignore warnings\n",
+    "warnings.filterwarnings(action=\"ignore\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "55f8ed91-7c13-44f7-a294-b6cacd43f8db",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load the model\n",
+    "model = joblib.load(\"rf_model.pkl\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3093e1b6-66d9-4936-b197-d853631914db",
+   "metadata": {},
+   "source": [
+    "### Query Feast Online Server for Feature Data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2b5bbc4a-e2d3-4b7b-8309-434ff3b3e2cf",
+   "metadata": {},
+   "source": [
+    "Here, we show two different ways to retrieve data from the online feature server. The first is using the Python `requests` library, and the second is using the Feast Python SDK.\n",
+    "\n",
+    "We can use the Python `requests` library to request feature data from the online feature server (that we deployed in notebook [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb)). The request takes the form of an HTTP POST command sent to the server endpoint (`url`). We request the data we need by supplying the entity and feature information in the data payload. We also need to specify an `application/json` content type in the request header."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "c6fd4f1a-917b-4a98-9bf6-101b4a074b64",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# ID examples\n",
+    "ids = [18, 764, 504, 454, 453, 0, 1, 2, 3, 4, 5, 6, 7, 8]\n",
+    "\n",
+    "# Submit get_online_features request to Feast online store server\n",
+    "response = requests.post(\n",
+    "    url=\"http://localhost:6566/get-online-features\",\n",
+    "    headers={'Content-Type': 'application/json'},\n",
+    "    data=json.dumps({\n",
+    "        \"entities\": {\"ID\": ids},\n",
+    "        \"features\": [\n",
+    "            \"data_a:duration\",\n",
+    "            \"data_a:credit_amount\",\n",
+    "            \"data_a:installment_commitment\",\n",
+    "            \"data_a:checking_status\",\n",
+    "            \"data_b:residence_since\",\n",
+    "            \"data_b:age\",\n",
+    "            \"data_b:existing_credits\",\n",
+    "            \"data_b:num_dependents\",\n",
+    "            \"data_b:housing\"\n",
+    "        ]\n",
+    "    })\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8e616a52-c18c-44a9-9e63-3aba071d7e79",
+   "metadata": {},
+   "source": [
+    "The response is returned as JSON, with feature values for each of the IDs."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "cf8948b7-4ed7-4c45-8acf-462331d9e4d2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'{\"metadata\":{\"feature_names\":[\"ID\",\"checking_status\",\"duration\",\"installment_commitment\",\"credit_amount\",\"residence_since\",\"num_dependents\",\"age\",\"housing\",\"existing_credits\"]},\"results\":[{\"values\":[18,764,504,454,453,0,1,2,3,4,5,6,7,8],\"statuses\":[\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\"],\"event_timestamps\":[\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\",\"1970-01-01T00:00:00Z\"]},{\"values\":[\"0<=X<200\",\"no checking\",\"<0\",\"<0\",\"no checking\",\"<0\",\"0<=X<200\",\"no checking\",\"<0\",\"<0\",\"no checking\",\"no checking\",\"0<=X<200\",\"no checking\"],\"statuses\":[\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",\"PRESENT\",'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show first 1000 characters of response\n", + "response.text[:1000]" + ] + }, + { + "cell_type": "markdown", + "id": "c719f702-578a-4f35-b8ff-e41707cda23e", + "metadata": {}, + "source": [ + "As the response data comes in JSON format, there is a little formatting required to organize the data into a dataframe with one record per row (and features as columns)." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "b992063d-8d83-4bf7-8153-f690b0410359", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
IDchecking_statusdurationinstallment_commitmentcredit_amountresidence_sincenum_dependentsagehousingexisting_credits
0180<=X<20024.04.012579.02.01.044.0for free1.0
1764no checking24.04.02463.03.01.027.0own2.0
2504<024.04.01207.04.01.024.0rent1.0
\n", + "
" + ], + "text/plain": [ + " ID checking_status duration installment_commitment credit_amount \\\n", + "0 18 0<=X<200 24.0 4.0 12579.0 \n", + "1 764 no checking 24.0 4.0 2463.0 \n", + "2 504 <0 24.0 4.0 1207.0 \n", + "\n", + " residence_since num_dependents age housing existing_credits \n", + "0 2.0 1.0 44.0 for free 1.0 \n", + "1 3.0 1.0 27.0 own 2.0 \n", + "2 4.0 1.0 24.0 rent 1.0 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Inspect the response\n", + "resp_data = json.loads(response.text)\n", + "\n", + "# Transform JSON into dataframe\n", + "records = pd.DataFrame(\n", + " columns=resp_data[\"metadata\"][\"feature_names\"], \n", + " data=[[r[\"values\"][i] for r in resp_data[\"results\"]] for i in range(len(ids))]\n", + ")\n", + "records.head(3)" + ] + }, + { + "cell_type": "markdown", + "id": "6db9b8ac-146e-40d3-b35a-cf4f4b6bbc8a", + "metadata": {}, + "source": [ + "Now, let's see how we can do the same with the Feast Python SDK. Note that we instantiate our `FeatureStore` object with the configuration that we set up in [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb), by pointing to the `./Feature_Store` directory." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "765dc62b-e1e7-45fe-88b4-cc0235519ff8", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:root:_list_feature_views will make breaking changes. Please use _list_batch_feature_views instead. 
_list_feature_views will behave like _list_all_feature_views in the future.\n", + "WARNING:root:Cannot use sqlite_vec for vector search\n" + ] + } + ], + "source": [ + "# Instantiate FeatureStore object\n", + "store = FeatureStore(repo_path=\"./Feature_Store\")\n", + "\n", + "# Retrieve features\n", + "records = store.get_online_features(\n", + " entity_rows=[{\"ID\":v} for v in ids],\n", + " features=[\n", + " \"data_a:duration\",\n", + " \"data_a:credit_amount\",\n", + " \"data_a:installment_commitment\",\n", + " \"data_a:checking_status\",\n", + " \"data_b:residence_since\",\n", + " \"data_b:age\",\n", + " \"data_b:existing_credits\",\n", + " \"data_b:num_dependents\",\n", + " \"data_b:housing\" \n", + " ]\n", + ").to_df()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1d214e55-df0b-460d-936c-8951f7365a93", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
IDcredit_amountinstallment_commitmentchecking_statusdurationnum_dependentshousingageresidence_sinceexisting_credits
01812579.04.00<=X<20024.01.0for free44.02.01.0
17642463.04.0no checking24.01.0own27.03.02.0
25041207.04.0<024.01.0rent24.04.01.0
\n", + "
" + ], + "text/plain": [ + " ID credit_amount installment_commitment checking_status duration \\\n", + "0 18 12579.0 4.0 0<=X<200 24.0 \n", + "1 764 2463.0 4.0 no checking 24.0 \n", + "2 504 1207.0 4.0 <0 24.0 \n", + "\n", + " num_dependents housing age residence_since existing_credits \n", + "0 1.0 for free 44.0 2.0 1.0 \n", + "1 1.0 own 27.0 3.0 2.0 \n", + "2 1.0 rent 24.0 4.0 1.0 " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "records.head(3)" + ] + }, + { + "cell_type": "markdown", + "id": "fd828758-6c57-4f9e-bbda-3983b6579da2", + "metadata": {}, + "source": [ + "### Get Predictions from the Model" + ] + }, + { + "cell_type": "markdown", + "id": "f446d7ec-0dae-409a-82a2-c0d7016c2001", + "metadata": {}, + "source": [ + "Now we can request predictions from our trained model. \n", + "\n", + "For convenience, we output the predictions along with the implied loan designations. Remember that these are predictions on loan outcomes, given context data from the loan application process. Since we have access to the actual `class` outcomes, we display those as well to see how the model did.|" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "70203f7b-f1e5-46ba-8623-f10bf3a5abf8", + "metadata": {}, + "outputs": [], + "source": [ + "# Get predictions from the model\n", + "preds = model.predict(records)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "27001dde-8bdb-4de1-8c33-a76f030748e0", + "metadata": {}, + "outputs": [], + "source": [ + "# Load labels\n", + "labels = pd.read_parquet(\"Feature_Store/data/labels.parquet\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "ddc958e8-8ff8-49b1-ac10-fc965f3bf21c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
IDPredictionLoan_DesignationTrue_Value
18180.0badbad
7647641.0goodgood
5045040.0badbad
4544540.0badbad
4534531.0goodgood
001.0goodgood
110.0badbad
221.0goodgood
330.0badgood
440.0badbad
551.0goodgood
661.0goodgood
770.0badgood
881.0goodgood
\n", + "
" + ], + "text/plain": [ + " ID Prediction Loan_Designation True_Value\n", + "18 18 0.0 bad bad\n", + "764 764 1.0 good good\n", + "504 504 0.0 bad bad\n", + "454 454 0.0 bad bad\n", + "453 453 1.0 good good\n", + "0 0 1.0 good good\n", + "1 1 0.0 bad bad\n", + "2 2 1.0 good good\n", + "3 3 0.0 bad good\n", + "4 4 0.0 bad bad\n", + "5 5 1.0 good good\n", + "6 6 1.0 good good\n", + "7 7 0.0 bad good\n", + "8 8 1.0 good good" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show preds\n", + "pd.DataFrame({\n", + " \"ID\": ids,\n", + " \"Prediction\": preds,\n", + " \"Loan_Designation\": [\"bad\" if i==0.0 else \"good\" for i in preds],\n", + " \"True_Value\": labels.loc[ids, \"class\"]\n", + "})" + ] + }, + { + "cell_type": "markdown", + "id": "87cd592a-61fc-4553-b84a-941d1785910d", + "metadata": {}, + "source": [ + "It's important to remember that the model's predictions are like educated guesses based on learned patterns. The model will get some predictions right, and other wrong. With the example records above, it looks like the model did pretty good! An AI/ML team's task is generally to make the model's predictions as useful as possible in helping the organization make decisions (for example, on loan approvals).\n", + "\n", + "In this case, we have a baseline model. While not ready for production, this model has set a low bar by which other models can be measured. Teams can also use a model like this to help with early testing, and with proving out things like pipelines and infrastructure before more sophisticated models are available.\n", + "\n", + "We have used Feast to query the feature data in support of model serving. The next notebook, [05_Credit_Risk_Cleanup.ipynb](05_Credit_Risk_Cleanup.ipynb), cleans up resources created in this and previous notebooks." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/credit-risk-end-to-end/05_Credit_Risk_Cleanup.ipynb b/examples/credit-risk-end-to-end/05_Credit_Risk_Cleanup.ipynb new file mode 100644 index 00000000000..846748dc425 --- /dev/null +++ b/examples/credit-risk-end-to-end/05_Credit_Risk_Cleanup.ipynb @@ -0,0 +1,296 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "cf46ec61-7914-4677-b12b-a9e478e88d3f", + "metadata": {}, + "source": [ + "# Credit Risk Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "6ae8aaec-e01d-48d3-b768-98661ad1ec85", + "metadata": {}, + "source": [ + "Run this notebook if you are done experimenting with this demo, or if you wish to start again with a clean slate.\n", + "\n", + "**RUNNING THE FOLLOWING CODE WILL REMOVE FILES AND PROCESSES CREATED BY THE PREVIOUS EXAMPLE NOTEBOOKS.**\n", + "\n", + "The notebook progresses in reverse order of how the files and processes were added. 
(The reverse order makes it possible to partially revert changes by running cells up to a certain point.)" + ] + }, + { + "cell_type": "markdown", + "id": "6feaa771-4226-459f-b6dd-214024cb5c7c", + "metadata": {}, + "source": [ + "#### Setup" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "20a39e94-920d-4108-aa6b-1e29d2224f71", + "metadata": {}, + "outputs": [], + "source": [ + "# Imports\n", + "import os\n", + "import time\n", + "import psutil" + ] + }, + { + "cell_type": "markdown", + "id": "3f124260-a8b2-475d-9103-8d336c543fce", + "metadata": {}, + "source": [ + "#### Remove Trained Model File" + ] + }, + { + "cell_type": "markdown", + "id": "f7a05a2b-9a26-4722-a526-84da99fc0b29", + "metadata": {}, + "source": [ + "This removes the model that was created and saved in [03_Credit_Risk_Model_Training.ipynb](03_Credit_Risk_Model_Training.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a6b21063-ea43-4329-be0c-c1644c705db2", + "metadata": {}, + "outputs": [], + "source": [ + "# Remove the model file that was saved in model training.\n", + "model_path = \"./rf_model.pkl\"\n", + "os.remove(model_path)" + ] + }, + { + "cell_type": "markdown", + "id": "ed97c24a-8f25-4e77-9037-f9cf4ad68dfa", + "metadata": {}, + "source": [ + "#### Shutdown Servers" + ] + }, + { + "cell_type": "markdown", + "id": "2f825d10-c13d-4701-b102-e15ad1c0bd3b", + "metadata": {}, + "source": [ + "Shut down the servers that were launched in [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb); also remove the `server_proc.txt` that held the process PIDs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "66db4d46-a895-4041-ad87-ab0a77f13211", + "metadata": {}, + "outputs": [], + "source": [ + "# Load server process objects\n", + "server_pids = open(\"server_proc.txt\").readlines()\n", + "offline_server_proc = psutil.Process(int(server_pids[0].strip()))\n", + "online_server_proc = psutil.Process(int(server_pids[1].strip()))" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "001fd472-2e28-499e-9eac-0a16ad8187a0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Online server : psutil.Process(pid=44621, name='python3.11', status='running', started='14:19:05')\n", + "Online server is running: True\n", + "\n", + "Offline server PID: psutil.Process(pid=44594, name='python3.11', status='running', started='14:19:03')\n", + "Offline server is running: True\n" + ] + } + ], + "source": [ + "# Verify if servers are running\n", + "def verify_servers():\n", + " # online server\n", + " print(f\"Online server : {online_server_proc}\")\n", + " print(f\"Online server is running: {online_server_proc.is_running()}\", end='\\n\\n')\n", + " # offline server\n", + " print(f\"Offline server PID: {offline_server_proc}\")\n", + " print(f\"Offline server is running: {offline_server_proc.is_running()}\")\n", + " \n", + "verify_servers()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "68376350-790a-4e7e-9325-c7de4d22e54b", + "metadata": {}, + "outputs": [], + "source": [ + "# Terminate offline server\n", + "offline_server_proc.terminate()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "446b6bf9-aef2-4873-b477-8bf595a8eabf", + "metadata": {}, + "outputs": [], + "source": [ + "# Terminate online server (master and worker)\n", + "for child in online_server_proc.children(recursive=True):\n", + " child.terminate()\n", + "online_server_proc.terminate()\n", + "time.sleep(2)" + ] + }, + { + "cell_type": "code", + 
"execution_count": 7, + "id": "774827f6-4dcd-495b-b5c5-186b97148619", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Online server : psutil.Process(pid=44621, name='python3.11', status='terminated', started='14:19:05')\n", + "Online server is running: False\n", + "\n", + "Offline server PID: psutil.Process(pid=44594, name='python3.11', status='terminated', started='14:19:03')\n", + "Offline server is running: False\n" + ] + } + ], + "source": [ + "# Verify termination\n", + "verify_servers()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "f8a155e4-23b3-4fb3-b868-02ba2e0a4a31", + "metadata": {}, + "outputs": [], + "source": [ + "# Remove server_proc.txt (file for keeping track of pids)\n", + "os.remove(\"server_proc.txt\")" + ] + }, + { + "cell_type": "markdown", + "id": "ed7d6f25-d255-4986-9cf2-9876f6c558cc", + "metadata": {}, + "source": [ + "#### Remove Feast Applied Configuration Files" + ] + }, + { + "cell_type": "markdown", + "id": "d73efe15-a1d9-459b-8142-835dc2bf1c9f", + "metadata": {}, + "source": [ + "Remove the registry and online store (SQLite) files created on`feast apply` created in [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "0f13a4ac-d2ad-462b-b65e-4266b7cb4922", + "metadata": {}, + "outputs": [], + "source": [ + "os.remove(\"Feature_Store/data/online_store.db\")\n", + "os.remove(\"Feature_Store/data/registry.db\")" + ] + }, + { + "cell_type": "markdown", + "id": "eb0494cd-0143-4f5f-b7d6-9675e1403d9f", + "metadata": {}, + "source": [ + "#### Remove Feast Configuration Files" + ] + }, + { + "cell_type": "markdown", + "id": "86c33ac7-9e1f-4798-9f14-77773a1c13bd", + "metadata": {}, + "source": [ + "Remove the configution and feature definition files created in [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "a747043f-05fe-4b44-979d-9b30565074ee", + "metadata": {}, + "outputs": [], + "source": [ + "os.remove(\"Feature_Store/feature_store.yaml\")\n", + "os.remove(\"Feature_Store/feature_definitions.py\")" + ] + }, + { + "cell_type": "markdown", + "id": "81975a0f-7fd6-4ed3-91cf-812946df4713", + "metadata": {}, + "source": [ + "#### Remove Data Files" + ] + }, + { + "cell_type": "markdown", + "id": "8182dc1e-d5c1-4739-b7c7-0620e93c5b64", + "metadata": {}, + "source": [ + "Remove the data files created in [01_Credit_Risk_Data_Prep.ipynb](01_Credit_Risk_Data_Prep.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "4ddb4fb2-fea1-4b70-8978-732af9a1cd3f", + "metadata": {}, + "outputs": [], + "source": [ + "for f in [\"data_a.parquet\", \"data_b.parquet\", \"labels.parquet\"]:\n", + " os.remove(f\"Feature_Store/data/{f}\")\n", + "os.rmdir(\"Feature_Store/data\")\n", + "os.rmdir(\"Feature_Store\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/credit-risk-end-to-end/README.md b/examples/credit-risk-end-to-end/README.md new file mode 100644 index 00000000000..5f59c750784 --- /dev/null +++ b/examples/credit-risk-end-to-end/README.md @@ -0,0 +1,39 @@ + +![Feast_Logo](https://raw.githubusercontent.com/feast-dev/feast/master/docs/assets/feast_logo.png) + +# Feast Credit Risk Classification End-to-End Example + +This example starts with an [OpenML](https://openml.org) credit risk dataset, and walks through the steps of preparing the data, setting up feature store resources, and 
serving features; this is all done inside the paradigm of an ML workflow, with the goal of helping users understand how Feast fits into the progression from data preparation to model training and model serving.
+
+The example is organized in five notebooks:
+1. [01_Credit_Risk_Data_Prep.ipynb](01_Credit_Risk_Data_Prep.ipynb)
+2. [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb)
+3. [03_Credit_Risk_Model_Training.ipynb](03_Credit_Risk_Model_Training.ipynb)
+4. [04_Credit_Risk_Model_Serving.ipynb](04_Credit_Risk_Model_Serving.ipynb)
+5. [05_Credit_Risk_Cleanup.ipynb](05_Credit_Risk_Cleanup.ipynb)
+
+Run the notebooks in order to progress through the example. See below for prerequisite setup steps.
+
+### Preparing your Environment
+To run the example, install the Python dependencies. You may wish to do so inside a virtual environment. Open a command terminal, and run the following:
+
+```
+# create venv-example virtual environment
+python -m venv venv-example
+# activate environment
+source venv-example/bin/activate
+```
+
+Install the Python dependencies:
+```
+pip install -r requirements.txt
+```
+
+Note that this example was tested with Python 3.11, but it should also work with other, similar versions.
+
+### Running the Notebooks
+Once you have installed the Python dependencies, you can run the example notebooks. To run the notebooks locally, execute the following command in a terminal window:
+
+```jupyter notebook```
+
+You should see a browser window open to a page where you can navigate to the example notebook (.ipynb) files and open them.
diff --git a/examples/credit-risk-end-to-end/requirements.txt b/examples/credit-risk-end-to-end/requirements.txt
new file mode 100644
index 00000000000..8b9b1313e78
--- /dev/null
+++ b/examples/credit-risk-end-to-end/requirements.txt
@@ -0,0 +1,6 @@
+feast
+jupyter==1.1.1
+scikit-learn==1.5.2
+pandas==2.2.3
+matplotlib==3.9.2
+seaborn==0.13.2
\ No newline at end of file