Skip to content

Latest commit

 

History

History
216 lines (161 loc) · 10.7 KB

README.md

File metadata and controls

216 lines (161 loc) · 10.7 KB

ModelSketchBook — Getting Started

Paper | DOI | Video | Sample NB | Open In Colab

ModelSketchBook logo

ModelSketchBook is a Python package introduced as part of an ACM CHI 2023 paper:

Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design. Michelle S. Lam, Zixian Ma, Anne Li, Izequiel Freitas, Dakuo Wang, James A. Landay, Michael S. Bernstein. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23). PDF | Arxiv

tl;dr

Machine learning practitioners often end up tunneling on low-level technical details like model architectures and performance metrics. Could early model development instead focus on high-level questions of which factors a model ought to pay attention to? ModelSketchBook instantiates a vision of model sketching, a technical framework for rapidly iterating over a machine learning model's decision-making logic. Model sketching refocuses practitioner attention on composing high-level, human-understandable concepts that the model is expected to reason over (e.g., profanity, racism, or sarcasm in a content moderation task) using zero-shot concept instantiation.

Summary of the model sketching process from concepts to sketch models to evaluating and iterating

With ModelSketchBook, you can create concepts using zero-shot methods,

Demo of the create_concept_model function

create sketches that combine these concept building blocks,

Demo of the create_sketch_model function

and continue to iterate on concepts and sketches to explore the ML model design space before diving into technical implementation details.

🚧 This repo is a work-in-progress research prototype and will continue to be updated! It has not yet been tested in a wide variety of development setups, but see our sample Colab notebook for a working example.

Installation

First, you'll need to install this package using PyPI.

pip install model_sketch_book

1: Basic setup

Imports

Then import the package into a Python notebook (i.e., a Jupyter or Colab notebook).

import model_sketch_book as msb

Set up the sketchbook

Data. ModelSketchBook is designed to work with datasets in Pandas DataFrame format. You can load CSVs to dataframes using the Pandas read_csv function.

Schema. To set up your sketchbook, you need to specify the schema of your dataframe. In the schema, the keys are the string names of the columns in your dataframe that you'd like to use, and the values indicate the type of data contained in that column. The currently-supported input types are:

  • InputType.Text: (string) Raw text fields of arbitrary length. Strings may be truncated to a max token length for prompts to GPT.
  • InputType.Image: (string) URLs to remotely-hosted images.
  • InputType.ImageLocal: (string) Filepaths to locally-stored images.
  • InputType.GroundTruth: (float or bool) Ground truth labels. Currently expected to be in the domain [0, 1].

Ground truth. The current API version requires exactly one ground truth column in the schema, though you can later add datasets that don't contain this ground truth column.

Credentials. Please supply your own OpenAI organization and API key if you would like to use GPT-based concepts. Otherwise, you can provide empty strings.

sb = msb.create_model_sketchbook(
    goal='Example sketchbook goal here',
    schema={
        # Specify your data schema here
        "picture_url": msb.InputType.Image,
        "neighborhood_overview": msb.InputType.Text,
        "description": msb.InputType.Text,
        "overall_rating": msb.InputType.GroundTruth,  # Required
    },
    credentials={
        "organization": "org-INSERT",
        "api_key": "sk-INSERT"
    }
)

Add your dataset(s)

Then, add your dataset from a Pandas dataframe (recommended: 40-50 rows). If the dataframe contains images, this step will involve caching the images to speed up processing later.

You should also specify exactly one dataset to be the default dataset with the default argument. It's recommended that your training set be the default dataset. With the notebook-widget functions documented here, the default dataset is used to set up concepts and sketches, and other datasets can only be used to evaluate trained sketches.

your_df = pd.read_csv("path_to_your_data/data.csv") # Insert logic to load your dataframe

sb.add_dataset(
    df=your_df,
    default=True,  # If dataset should be used by default, otherwise omit argument
)

Create concepts

You can then go ahead and create image and text concepts with the following function. This function will display widgets to specify your concept term, input field, and output type.

msb.create_concept_model(sb)

create_concept_model widgets

Models. Currently, text concepts can be served by GPT-3 (using the text-davinci-002 model) or OpenCLIP (using the ViT-B-32-quickgelu model), and image concepts can only be served by OpenCLIP.

Create sketches

Then, you can combine concepts together into sketches with the following function. This function will display widgets to select concepts and an aggregator.

msb.create_sketch_model(sb)

create_sketch_model widgets

2: Other functions

Tune concepts (optional)

You may optionally tune your existing concepts by binarizing them at a threshold, normalizing the values, or calibrating them between specified values. This function will display widgets to select an existing concept, a tuning method, and tuning-related parameters.

msb.tune_concept(sb)

tune_concept widgets

Logical concepts (AND, OR)

Logical concepts can be applied to any number of existing binary concepts. This function will display widgets to select the concepts and the logical operator (AND or OR) to apply to those concept scores.

msb.create_logical_concept_model(sb)

create_logical_concept_model widgets

Keyword concepts

Keyword concepts can be applied to any text-based input fields. The text of each example will be compared to the specified list of keywords, and if any of the keywords appear in the example, the example will be given a positive (True) label. This function will display widgets to select the input field and the comma-separated list of keywords, which can be treated in a case-sensitive or case-insensitive manner.

msb.create_keyword_concept_model(sb)

create_keyword_concept_model widgets

Take notes

You can select from among your created concepts and sketches and enter free-text notes into the text field. These notes will be displayed with your concept or sketch in calls to show_concepts() or show_sketches().

msb.take_note(sb)

take_note widgets

3: Concept helper functions

View existing concepts

Prints out a summary of all existing concepts with concept ID, concept term, input field, output type, and associated notes (if provided).

msb.show_concepts(sb)

Get concept term suggestions

Prints a set of synonyms for the provided concept term to assist in term ideation. Currently surfaced using WordNet synsets.

msb.get_similar_concepts("your concept term")

Compare concepts to ground truth labels

This function displays the Pearson correlation coefficients between each of the selected concepts and the ground truth label column. It also displays a table view that shows the difference between the concept score and ground truth label for each example in the default dataset.

msb.compare_concepts_to_gt(sb)

compare_concepts_to_gt widgets

Compare concepts to each other

This function displays the Pearson correlation coefficients between the two selected concepts. It also displays a table view that shows the difference between the concept scores for each example in the default dataset.

msb.compare_two_concepts(sb)

compare_two_concepts widgets

4: Sketch helper functions

View existing sketches

Prints out a summary of all existing sketches with sketch ID, model type, output type, concepts, and associated notes (if provided).

msb.show_sketches(sb)

Test an existing sketch on a dataset

Runs a selected (trained) sketch model on the selected dataset.

msb.test_sketch(sb)

test_sketch widgets

Compare sketch performance

Plots performance metrics (Mean Absolute Error, F1 Score, Classification accuracy, precision, and recall) for all of the selected sketches on the specified dataset.

msb.compare_sketches(sb)

compare_sketches widgets

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

model_sketch_book was created by Michelle Lam. It is licensed under the terms of the MIT license.

Credits

model_sketch_book was created with cookiecutter and the py-pkgs-cookiecutter template.