Add support for generating models ops test (#998)

1. **Restructure the model analysis script:** Fixes [#1001](#1001)

   Instead of keeping multiple classes (i.e. MarkDownWriter, MatchingExceptionRule, ModelVariantInfo, etc.) and functions in a single Python file, this change creates a Python package named model_analysis and declares the functions and classes in separate files. E.g. the MarkDownWriter class, which creates and writes markdown files, lives in markdown.py, and common_failure_matching_rules_list has its own Python module, expcetion_rules.py.

2. **Created a script for generating models ops tests from the unique ops configuration extracted from all the models present in the `forge/test/models` directory.** Fixes [#874](#761)

   Workflow:

   1. Collect all the tests that don't carry the skip_model_analysis marker in the directory path specified by the user, e.g. `forge/test/models`.
   2. Run all the collected tests to extract the unique ops configuration, and export each model's unique ops configuration as an Excel file and a metadata JSON file. Note: unlike the model analysis pipeline, this step doesn't generate unique op tests.
   3. After extracting the unique ops configuration for every test, extract the unique ops configuration across all the tests (i.e. model variants).
   4. Using the unique ops configuration extracted across all the model variants, create the models ops tests with Forge modules in the directory path specified by the user.
   5. Black formatting and SPDX headers are also applied automatically to the generated nightly/push tests.

   **Note:** The ForgeModules in the generated models ops tests don't load the actual parameter/constant tensor values from the model parameters/buffers via the process_framework_parameter function. Instead, random tensors are generated inside the test function, based on the constant/parameter tensor shapes and dtypes.

   <img width="937" alt="Screenshot 2025-01-07 at 6 32 35 PM" src="https://github.com/user-attachments/assets/851ae7ef-da53-407a-a344-4656dd3e92e5" />

3. **Break down the Model Analysis Weekly workflow:** Fixes [#1002](#1002)

   1. model-analysis.yml -> Common yml file used for running the model analysis pipeline for markdown generation and models ops test generation.
   2. model-analysis-weekly.yml -> Workflow that triggers the Model Analysis workflow (i.e. model-analysis.yml) to run the model analysis and generate the markdown files.
   3. model-analysis-config.sh -> Shell script containing the script and PR configuration/environment variables for the markdown generation and models ops test generation.

The generated models ops test PR for the albert model: #1014
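
A rough sketch of workflow step 3 (merging the per-model unique ops configurations into one set across all model variants) might look like the following. The `OpConfig` record and `merge_unique_ops` helper are hypothetical names introduced for illustration only and are not taken from the model_analysis package; the fields follow the criteria listed in `config.py` (operand type, shape, dtype, and op arguments).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class OpConfig:
    """Hypothetical record of one op configuration (not the real forge type)."""

    op_name: str
    operand_types: Tuple[str, ...]  # e.g. "Activation", "Parameter", "Constant"
    operand_shapes: Tuple[Tuple[int, ...], ...]
    operand_dtypes: Tuple[str, ...]
    op_args: Tuple[Tuple[str, object], ...] = ()


def merge_unique_ops(per_model: Dict[str, Iterable[OpConfig]]) -> Dict[OpConfig, Set[str]]:
    """Map each distinct op configuration to the set of models that use it."""
    merged: Dict[OpConfig, Set[str]] = {}
    for model_name, configs in per_model.items():
        for config in configs:
            # Frozen dataclasses are hashable, so identical configs collapse to one key.
            merged.setdefault(config, set()).add(model_name)
    return merged


# Two made-up model variants that share one "add" configuration.
add_cfg = OpConfig("add", ("Activation", "Parameter"), ((1, 128), (128,)), ("float32", "float32"))
matmul_cfg = OpConfig("matmul", ("Activation", "Parameter"), ((1, 128), (128, 64)), ("float32", "float32"))

per_model_configs = {
    "albert_base": [add_cfg, matmul_cfg],
    "albert_large": [add_cfg],
}
merged_configs = merge_unique_ops(per_model_configs)
```

Keying on the whole configuration means one test can be generated per unique configuration while still recording which model variants it came from.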
tenstorrent · Jan 10, 2025 · d9f3cfb
1 parent 640718a
Showing 116 changed files with 2,254 additions and 1,639 deletions.
diff --git a/.github/model-analysis-config.sh b/.github/model-analysis-config.sh
@@ -0,0 +1,49 @@
+#!/bin/bash
+# SPDX-FileCopyrightText: (c) 2024 Tenstorrent AI ULC
+#
+# SPDX-License-Identifier: Apache-2.0
+
+# If set to true, it will set the environment variables for models ops test generation otherwise markdown generation env variables will be set
+GENERATE_MODELS_OPS_TEST=$1
+
+# Declare an associative array to store environment variables
+declare -A env_vars
+
+# Markdown Generation
+# 1) PR config
+env_vars["BRANCH_NAME"]="model_analysis"
+env_vars["COMMIT_MESSAGE"]="Update model analysis documentation"
+env_vars["TITLE"]="Update model analysis documentation"
+env_vars["BODY"]="This PR will update model analysis documentation."
+env_vars["OUTPUT_PATH"]="model_analysis_docs/"
+
+# 2) Script config
+env_vars["MARDOWN_DIR_PATH"]="./model_analysis_docs"
+env_vars["SCRIPT_OUTPUT_LOG"]="model_analysis.log"
+
+
+# Model ops test generation
+# 1) Script config
+env_vars["MODELS_OPS_TEST_OUTPUT_DIR_PATH"]="forge/test"
+env_vars["MODELS_OPS_TEST_PACKAGE_NAME"]="models_ops"
+
+
+# Common Config for markdown generation and model ops test generation
+env_vars["TEST_DIR_OR_FILE_PATH"]="forge/test/models"
+env_vars["UNIQUE_OPS_OUTPUT_DIR_PATH"]="./models_unique_ops_output"
+
+
+# If GENERATE_MODELS_OPS_TEST is set to true, Modify the PR config to model ops test generation.
+if [[ "$GENERATE_MODELS_OPS_TEST" == "true" ]]; then
+    env_vars["BRANCH_NAME"]="generate_models_ops_test"
+    env_vars["COMMIT_MESSAGE"]="Generate and update models ops tests"
+    env_vars["TITLE"]="Generate and update models ops tests"
+    env_vars["BODY"]="This PR will generate models ops tests by extracting the unique ops configurations across all the pytorch models present inside the forge/test/models directory path."
+    env_vars["OUTPUT_PATH"]="forge/test/models_ops/"
+    env_vars["SCRIPT_OUTPUT_LOG"]="generate_models_ops_test.log"
+fi
+
+
+for key in "${!env_vars[@]}"; do
+  echo "$key=${env_vars[$key]}"
+done
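
For readers less familiar with bash associative arrays, the selection logic of the script above can be mirrored in Python roughly as follows. This is only an illustrative sketch: `build_env_vars` is a hypothetical name, and the keys and values are copied from the script as shown in this diff. Note that the shell script switches modes only when its first argument is exactly the string `true`.

```python
from typing import Dict


def build_env_vars(generate_models_ops_test: bool) -> Dict[str, str]:
    """Return the variables the config script would print for the given mode."""
    env_vars = {
        # Markdown generation: PR config
        "BRANCH_NAME": "model_analysis",
        "COMMIT_MESSAGE": "Update model analysis documentation",
        "TITLE": "Update model analysis documentation",
        "BODY": "This PR will update model analysis documentation.",
        "OUTPUT_PATH": "model_analysis_docs/",
        # Markdown generation: script config (key spelled as in the script)
        "MARDOWN_DIR_PATH": "./model_analysis_docs",
        "SCRIPT_OUTPUT_LOG": "model_analysis.log",
        # Models ops test generation: script config
        "MODELS_OPS_TEST_OUTPUT_DIR_PATH": "forge/test",
        "MODELS_OPS_TEST_PACKAGE_NAME": "models_ops",
        # Common config
        "TEST_DIR_OR_FILE_PATH": "forge/test/models",
        "UNIQUE_OPS_OUTPUT_DIR_PATH": "./models_unique_ops_output",
    }
    if generate_models_ops_test:
        # Only the PR config and the log name are overridden in this mode.
        env_vars.update(
            {
                "BRANCH_NAME": "generate_models_ops_test",
                "COMMIT_MESSAGE": "Generate and update models ops tests",
                "TITLE": "Generate and update models ops tests",
                "BODY": (
                    "This PR will generate models ops tests by extracting the unique ops "
                    "configurations across all the pytorch models present inside the "
                    "forge/test/models directory path."
                ),
                "OUTPUT_PATH": "forge/test/models_ops/",
                "SCRIPT_OUTPUT_LOG": "generate_models_ops_test.log",
            }
        )
    return env_vars


markdown_env = build_env_vars(False)
ops_test_env = build_env_vars(True)
```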
diff --git a/.github/workflows/model-analysis-weekly.yml b/.github/workflows/model-analysis-weekly.yml
@@ -6,116 +6,8 @@ on:
     - cron: '0 23 * * 5' # 11:00 PM UTC Friday (12:00 AM Saturday Serbia)
 
 jobs:
-
-  docker-build:
-    uses: ./.github/workflows/build-image.yml
+  model-analysis-weekly:
+    uses: ./.github/workflows/model-analysis.yml
     secrets: inherit
-
-  model-analysis:
-    needs: docker-build
-    runs-on: runner
-    timeout-minutes: 10080 # Set job execution time to 7 days(default: 6 hours)
-
-    container:
-      image: ${{ needs.docker-build.outputs.docker-image }}
-      options: --device /dev/tenstorrent/0
-      volumes:
-        - /dev/hugepages:/dev/hugepages
-        - /dev/hugepages-1G:/dev/hugepages-1G
-        - /etc/udev/rules.d:/etc/udev/rules.d
-        - /lib/modules:/lib/modules
-        - /opt/tt_metal_infra/provisioning/provisioning_env:/opt/tt_metal_infra/provisioning/provisioning_env
-
-    env:
-      GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}
-
-    steps:
-
-      - name: Set reusable strings
-        id: strings
-        shell: bash
-        run: |
-          echo "work-dir=$(pwd)" >> "$GITHUB_OUTPUT"
-          echo "build-output-dir=$(pwd)/build" >> "$GITHUB_OUTPUT"
-
-      - name: Git safe dir
-        run: git config --global --add safe.directory ${{ steps.strings.outputs.work-dir }}
-
-      - uses: actions/checkout@v4
-        with:
-            submodules: recursive
-            fetch-depth: 0 # Fetch all history and tags
-            token: ${{ env.GITHUB_TOKEN }}
-
-      # Clean everything from submodules (needed to avoid issues
-      # with cmake generated files leftover from previous builds)
-      - name: Cleanup submodules
-        run: |
-            git submodule foreach --recursive git clean -ffdx
-            git submodule foreach --recursive git reset --hard
-
-      - name: ccache
-        uses: hendrikmuhs/[email protected]
-        with:
-          create-symlink: true
-          key: model-analysis-${{ runner.os }}
-
-      - name: Build
-        shell: bash
-        run: |
-          source env/activate
-          cmake -G Ninja \
-          -B ${{ steps.strings.outputs.build-output-dir }} \
-          -DCMAKE_BUILD_TYPE=Release \
-          -DCMAKE_C_COMPILER=clang \
-          -DCMAKE_CXX_COMPILER=clang++ \
-          -DCMAKE_C_COMPILER_LAUNCHER=ccache \
-          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
-          cmake --build ${{ steps.strings.outputs.build-output-dir }}
-
-      - name: Run Model Analysis Script
-        env:
-          HF_TOKEN: ${{ secrets.HF_TOKEN }}
-          HF_HUB_DISABLE_PROGRESS_BARS: 1
-        shell: bash
-        run: |
-          source env/activate
-          apt-get update
-          apt install -y libgl1 libglx-mesa0
-          set -o pipefail # Ensures that the exit code reflects the first command that fails
-          python scripts/model_analysis.py \
-            --test_directory_or_file_path forge/test/models/pytorch \
-            --dump_failure_logs \
-            --markdown_directory_path ./model_analysis_docs \
-            --unique_ops_output_directory_path ./models_unique_ops_output \
-            2>&1 | tee model_analysis.log
-
-      - name: Upload Model Analysis Script Logs
-        uses: actions/upload-artifact@v4
-        if: success() || failure()
-        with:
-          name: model-analysis-outputs
-          path: model_analysis.log
-
-      - name: Upload Models Unique Ops test Failure Logs
-        uses: actions/upload-artifact@v4
-        if: success() || failure()
-        with:
-          name: unique-ops-logs
-          path: ./models_unique_ops_output
-
-      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v7
-        with:
-          branch: model_analysis
-          committer: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
-          author: ${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>
-          base: main
-          commit-message: "Update model analysis docs"
-          title: "Update model analysis docs"
-          body: "This PR will update model analysis docs"
-          labels: automatic_model_analysis
-          delete-branch: true
-          token: ${{ env.GITHUB_TOKEN }}
-          add-paths: |
-              model_analysis_docs/
+    with:
+      generate_models_ops_test: false
diff --git a/.github/workflows/model-analysis.yml b/.github/workflows/model-analysis.yml
@@ -0,0 +1,155 @@
+name: Model Analysis
+
+on:
+  workflow_dispatch:
+    inputs:
+      generate_models_ops_test:
+        description: 'If set to True, it will generate models ops test by extracting the unique ops config across all the models otherwise it will run the model analysis and generate markdown files'
+        required: false
+        type: boolean
+        default: false
+  workflow_call:
+    inputs:
+      generate_models_ops_test:
+        description: 'If set to True, it will generate models ops test by extracting the unique ops config across all the models otherwise it will run the model analysis and generate markdown files'
+        required: false
+        type: boolean
+        default: false
+
+jobs:
+
+  docker-build:
+    uses: ./.github/workflows/build-image.yml
+    secrets: inherit
+
+  model-analysis:
+    needs: docker-build
+    runs-on: runner
+    timeout-minutes: 4320 # Set job execution time to 3 days(default: 6 hours)
+
+    container:
+      image: ${{ needs.docker-build.outputs.docker-image }}
+      options: --device /dev/tenstorrent/0
+      volumes:
+        - /dev/hugepages:/dev/hugepages
+        - /dev/hugepages-1G:/dev/hugepages-1G
+        - /etc/udev/rules.d:/etc/udev/rules.d
+        - /lib/modules:/lib/modules
+        - /opt/tt_metal_infra/provisioning/provisioning_env:/opt/tt_metal_infra/provisioning/provisioning_env
+
+    env:
+      GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}
+      HF_TOKEN: ${{ secrets.HF_TOKEN }}
+      HF_HUB_DISABLE_PROGRESS_BARS: 1
+
+    steps:
+
+      - name: Set reusable strings
+        id: strings
+        shell: bash
+        run: |
+          echo "work-dir=$(pwd)" >> "$GITHUB_OUTPUT"
+          echo "build-output-dir=$(pwd)/build" >> "$GITHUB_OUTPUT"
+
+      - name: Git safe dir
+        run: git config --global --add safe.directory ${{ steps.strings.outputs.work-dir }}
+
+      - uses: actions/checkout@v4
+        with:
+            submodules: recursive
+            fetch-depth: 0 # Fetch all history and tags
+            token: ${{ env.GITHUB_TOKEN }}
+
+      # Clean everything from submodules (needed to avoid issues
+      # with cmake generated files leftover from previous builds)
+      - name: Cleanup submodules
+        run: |
+            git submodule foreach --recursive git clean -ffdx
+            git submodule foreach --recursive git reset --hard
+
+      - name: ccache
+        uses: hendrikmuhs/[email protected]
+        with:
+          create-symlink: true
+          key: model-analysis-${{ runner.os }}
+
+      - name: Set environment variables
+        shell: bash
+        run: |
+            OUTPUT=$(bash .github/model-analysis-config.sh ${{ inputs.generate_models_ops_test }})
+            # Assign the script output to GitHub environment variables
+            echo "$OUTPUT" | while IFS= read -r line; do
+              echo "$line" >> $GITHUB_ENV
+            done
+
+      - name: Build
+        shell: bash
+        run: |
+          source env/activate
+          cmake -G Ninja \
+          -B ${{ steps.strings.outputs.build-output-dir }} \
+          -DCMAKE_BUILD_TYPE=Release \
+          -DCMAKE_C_COMPILER=clang \
+          -DCMAKE_CXX_COMPILER=clang++ \
+          -DCMAKE_C_COMPILER_LAUNCHER=ccache \
+          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
+          cmake --build ${{ steps.strings.outputs.build-output-dir }}
+
+      - name: Run Model Analysis Script
+        if: ${{ !inputs.generate_models_ops_test }}
+        shell: bash
+        run: |
+          source env/activate
+          apt-get update
+          apt install -y libgl1 libglx-mesa0
+          set -o pipefail # Ensures that the exit code reflects the first command that fails
+          python scripts/model_analysis/run_analysis_and_generate_md_files.py \
+            --test_directory_or_file_path ${{ env.TEST_DIR_OR_FILE_PATH }} \
+            --dump_failure_logs \
+            --markdown_directory_path ${{ env.MARDOWN_DIR_PATH }} \
+            --unique_ops_output_directory_path ${{ env.UNIQUE_OPS_OUTPUT_DIR_PATH }} \
+            2>&1 | tee ${{ env.SCRIPT_OUTPUT_LOG }}
+
+      - name: Generate Models Ops test
+        if: ${{ inputs.generate_models_ops_test }}
+        shell: bash
+        run: |
+          source env/activate
+          apt-get update
+          apt install -y libgl1 libglx-mesa0
+          set -o pipefail # Ensures that the exit code reflects the first command that fails
+          python scripts/model_analysis/generate_models_ops_test.py \
+            --test_directory_or_file_path ${{ env.TEST_DIR_OR_FILE_PATH }} \
+            --unique_ops_output_directory_path ${{ env.UNIQUE_OPS_OUTPUT_DIR_PATH }} \
+            --models_ops_test_output_directory_path ${{ env.MODELS_OPS_TEST_OUTPUT_DIR_PATH }} \
+            --models_ops_test_package_name ${{ env.MODELS_OPS_TEST_PACKAGE_NAME }} \
+            2>&1 | tee ${{ env.SCRIPT_OUTPUT_LOG }}
+
+      - name: Upload Script Output Logs
+        uses: actions/upload-artifact@v4
+        if: success() || failure()
+        with:
+          name: script-outputs
+          path: ${{ env.SCRIPT_OUTPUT_LOG }}
+
+      - name: Upload Models Unique Ops test Failure Logs
+        uses: actions/upload-artifact@v4
+        if: success() || failure()
+        with:
+          name: unique-ops-logs
+          path: ${{ env.UNIQUE_OPS_OUTPUT_DIR_PATH }}
+
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v7
+        with:
+          branch: ${{ env.BRANCH_NAME }}
+          committer: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
+          author: ${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>
+          base: main
+          commit-message: ${{ env.COMMIT_MESSAGE }}
+          title: ${{ env.TITLE }}
+          body: ${{ env.BODY }}
+          delete-branch: true
+          token: ${{ env.GITHUB_TOKEN }}
+          add-paths: |
+              ${{ env.OUTPUT_PATH }}
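
The "Set environment variables" step above depends on the config script printing one `KEY=VALUE` pair per line, which is then appended to `$GITHUB_ENV`. As a sketch of that contract, assuming single-line values only, the same output could be parsed in Python as below. `parse_env_lines` is a hypothetical helper and is not part of the workflow.

```python
from typing import Dict


def parse_env_lines(output: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines as printed by the config script."""
    env: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        env[key] = value
    return env


sample_output = (
    "BRANCH_NAME=model_analysis\n"
    "COMMIT_MESSAGE=Update model analysis documentation\n"
    "OUTPUT_PATH=model_analysis_docs/\n"
)
parsed_env = parse_env_lines(sample_output)
```

Values with spaces, such as the commit message, survive because everything after the first `=` up to the end of the line is the value. A multi-line value would need the delimiter syntax that GitHub Actions defines for `$GITHUB_ENV`, which the script above does not use.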
diff --git a/forge/forge/config.py b/forge/forge/config.py
@@ -184,16 +184,19 @@ class CompilerConfig:
     # Number of patterns to match for each module
     tvm_module_to_num_patterns: Dict[str, int] = field(default_factory=lambda: dict())
 
-    # If enabled, for given test, it generates Forge Modules in form of PyTest for each unique operation configuration within the given module.
+    # If enabled, for given test, it only extracts the unique operation configuration.
+    extract_tvm_unique_ops_config: bool = False
+
+    # If enabled, for given test, it extracts the unique operation configuration and generates Forge Modules in form of PyTest for each unique operation configuration within the given module.
     # Each configuration is based on:
     # - Operand Type (e.g., Activation, Parameter, Constant)
     # - Operand Shape
     # - Operand DataType
     # - Operation Arguments (if any)
-    tvm_generate_unique_op_tests: bool = False
+    tvm_generate_unique_ops_tests: bool = False
 
-    # Export the generated unique operations configurations information with test file path to the excel file
-    export_tvm_generated_unique_op_tests_details: bool = False
+    # Export the unique operations configurations information to the excel file
+    export_tvm_unique_ops_config_details: bool = False
 
     # Enables a transform for conv that directly reads input, such that it goes from stride > 1 to stride = 1
     # This usually translates to lower DRAM BW and less math as the input better populates tiles
@@ -359,9 +362,9 @@ def apply_env_config_overrides(self):
                 os.environ["FORGE_OVERRIDE_DEVICE_YAML"]
             )
 
-        if "FORGE_EXPORT_TVM_GENERATED_UNIQUE_OP_TESTS_DETAILS" in os.environ:
-            self.export_tvm_generated_unique_op_tests_details = bool(
-                int(os.environ["FORGE_EXPORT_TVM_GENERATED_UNIQUE_OP_TESTS_DETAILS"])
+        if "FORGE_EXPORT_TVM_UNIQUE_OPS_CONFIG_DETAILS" in os.environ:
+            self.export_tvm_unique_ops_config_details = bool(
+                int(os.environ["FORGE_EXPORT_TVM_UNIQUE_OPS_CONFIG_DETAILS"])
             )
 
     def __post_init__(self):
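
The renamed override is parsed with `bool(int(...))`, so the environment variable is expected to hold an integer string such as `0` or `1`. A minimal standalone sketch of that behaviour follows; `read_flag` is a hypothetical helper written for illustration, not a function in `forge/forge/config.py`.

```python
import os
from typing import Mapping


def read_flag(name: str, default: bool = False, environ: Mapping[str, str] = os.environ) -> bool:
    """Read an integer-valued environment flag the way the override above does."""
    if name not in environ:
        return default
    # int() accepts "0", "1", "2", ... and raises ValueError for "true" or "yes".
    return bool(int(environ[name]))


flag_name = "FORGE_EXPORT_TVM_UNIQUE_OPS_CONFIG_DETAILS"
enabled = read_flag(flag_name, environ={flag_name: "1"})
disabled = read_flag(flag_name, environ={flag_name: "0"})
unset = read_flag(flag_name, environ={})

try:
    read_flag(flag_name, environ={flag_name: "true"})
    rejected_text_value = False
except ValueError:
    rejected_text_value = True
```

Under this parsing, setting the variable to `true` would raise a `ValueError` rather than enable the option, so `FORGE_EXPORT_TVM_UNIQUE_OPS_CONFIG_DETAILS=1` is the form to use.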