From 39ff2240bc03289daf020444d6e771bd070c853c Mon Sep 17 00:00:00 2001
From: Arindam Kulshi
Date: Thu, 9 Jan 2025 11:33:00 -0800
Subject: [PATCH] minor edits

---
 OCR/README.md      | 133 ++++++++++++++++++++++++++++-----------------
 backend/README.md  |  10 ++--
 frontend/README.md |  16 +++---
 user_guide.md      |  38 ++++++++-----
 4 files changed, 121 insertions(+), 76 deletions(-)

diff --git a/OCR/README.md b/OCR/README.md
index 54c829f4..f594baf7 100644
--- a/OCR/README.md
+++ b/OCR/README.md
@@ -8,11 +8,14 @@ The **OCR Layer** in the ReportVision project processes document images, perform
 1. [Introduction](#introduction)
 2. [Installation](#installation)
 3. [Running the Application](#running-the-application)
-4. [Testing](#testing)
-5. [End-to-End Benchmarking](#end-to-end-benchmarking)
-6. [Dockerized Development](#dockerized-development)
-7. [Development Tools](#development-tools)
-8. [Contributing](#contributing)
+4. [Development Tools](#development-tools)
+5. [Testing](#testing)
+6. [End-to-End Benchmarking](#end-to-end-benchmarking)
+7. [Dockerized Development](#dockerized-development)
+8. [Benchmarking](#end-to-end-benchmarking)
+9. [Project Architecture](#project-architecture)
+10. [API Endpoints](#api-endpoints)
+

 ---

@@ -23,8 +26,6 @@ The OCR layer uses **Poetry** for dependency management and virtual environment
 - Support for benchmarking OCR accuracy.
 - Configuration for different OCR models and segmentation templates.

-
-
 ### Installation

 ### Prerequisites
@@ -56,46 +57,6 @@ Run unit tests
 poetry run pytest
 ```

-### End to End Benchmarking
-
-
-#### Overview
-End-to-end benchmarking evaluates OCR accuracy by:
-
-End-to-end benchmarking scripts can:
-
-1. Segment and run OCR on a folder of images using given segmentation template and labels file.
-2. Compare OCR outputs to ground truth data based on matching file names.
-3. Write metrics (confidence, raw distance, Hamming distance, Levenshtein distance) as well as total metrics to a CSV file.
-
-
-To run benchmarking:
-
-1. Locate file `benchmark_main.py`
-2. Ensure all the paths/folders exist by downloading from [Google Drive for all segmentation/label files](https://drive.google.com/drive/folders/1WS2FYn0BTxWv0juh7lblzdMaFlI7zbDd?usp=sharing)
-3. Ensure `ground_truth` folder and files exist
-4. Ensure `labels.json` is in the correct format (see `tax_form_segmented_labels.json` as an example)
-5. When running make sure to pass arguments in this order:
-
-* `/path/to/image/folder` (path to the original image files which we need to run OCR on)
-* `/path/to/segmentation_template.png` (single file)
-* `/path/to/labels.json` (single file)
-* `/path/to/output/folder` (path to folder where the output would be. This should exist but can be empty)
-* `/path/to/ground/truth_folder` (path to folder for metrics that we would compare against)
-* `/path/to/csv_out_folder` (path to folder where all metrics would be. This should exist but can be empty)
-
-By default, segmentation, OCR, and metrics computation are all run together. To disable one or the other, pass the `--no-ocr` or `--no-metrics` flags. You can change the backend model by passing `--model=...` as well.
-
-Run notes:
-* Benchmark takes one second per segment for OCR using the default `trocr` model. Please be patient or set a counter to limit the number of files processed.
-* Only one segment can be input at a time
-
-
-### Test Data Sets
-
-You can run the script `pytest run reportvision-dataset-1/medical_report_import.py` to pull in all relevant data.
-
-
 ### Development Tools

 Adding new dependencies
@@ -147,8 +108,6 @@ To build the OCR service into an executable artifact
 poetry run build
 ```

-
-
 ### Dockerized Development

 It is also possible to run the project in a collection of docker containers. This is useful for development and testing purposes as it doesn't require any additional dependencies to be installed.
@@ -169,6 +128,45 @@ The frontend container will automatically reload when changes are made to the fr

 The OCR service container will restart automatically when changes are made to the OCR code. To access the API, navigate to http://localhost:8000/ in your browser.

+### End to End Benchmarking
+
+#### Overview
+End-to-end benchmarking evaluates OCR accuracy by:
+
+End-to-end benchmarking scripts can:
+
+1. Segment and run OCR on a folder of images using given segmentation template and labels file.
+2. Compare OCR outputs to ground truth data based on matching file names.
+3. Write metrics (confidence, raw distance, Hamming distance, Levenshtein distance) as well as total metrics to a CSV file.
+
+
+To run benchmarking:
+
+1. Locate file `benchmark_main.py`
+2. Ensure all the paths/folders exist by downloading from [Google Drive for all segmentation/label files](https://drive.google.com/drive/folders/1WS2FYn0BTxWv0juh7lblzdMaFlI7zbDd?usp=sharing)
+3. Ensure `ground_truth` folder and files exist
+4. Ensure `labels.json` is in the correct format (see `tax_form_segmented_labels.json` as an example)
+5. When running make sure to pass arguments in this order:
+
+* `/path/to/image/folder` (path to the original image files which we need to run OCR on)
+* `/path/to/segmentation_template.png` (single file)
+* `/path/to/labels.json` (single file)
+* `/path/to/output/folder` (path to folder where the output would be. This should exist but can be empty)
+* `/path/to/ground/truth_folder` (path to folder for metrics that we would compare against)
+* `/path/to/csv_out_folder` (path to folder where all metrics would be. This should exist but can be empty)
+
+By default, segmentation, OCR, and metrics computation are all run together. To disable one or the other, pass the `--no-ocr` or `--no-metrics` flags. You can change the backend model by passing `--model=...` as well.
+
+Run notes:
+* Benchmark takes one second per segment for OCR using the default `trocr` model. Please be patient or set a counter to limit the number of files processed.
+* Only one segment can be input at a time
+
+
+### Test Data Sets
+
+You can run the script `pytest run reportvision-dataset-1/medical_report_import.py` to pull in all relevant data.
+
+
 ## Project Architecture

 The OCR Layer is organized as follows:
@@ -199,3 +197,40 @@ The OCR Layer is organized as follows:

 - **`poetry.lock`**: Lock file generated by Poetry to ensure dependency consistency.

+## API Endpoints
+
+The OCR service exposes the following API endpoints:
+
+#### Health Check
+- **`GET /`**
+  - **Description**: Returns the status of the OCR service.
+  - **Response**: Status message indicating the service's health.
+
+#### Image Alignment
+- **`POST /image_alignment/`**
+  - **Description**: Aligns a source image with a segmentation template.
+  - **Request Body**:
+    - `source_image` (Base64-encoded string): The source image to align.
+    - `segmentation_template` (Base64-encoded string): The segmentation template to align with.
+  - **Response**:
+    - Base64-encoded string of the aligned image.
+
+#### Image File to Text
+- **`POST /image_file_to_text/`**
+  - **Description**: Processes an image file and a segmentation template to extract text based on labeled regions.
+  - **Request Body**:
+    - `source_image` (file): The uploaded source image file.
+    - `segmentation_template` (file): The uploaded segmentation template file.
+    - `labels` (JSON string): Defines labeled regions in the segmentation template.
+  - **Response**:
+    - JSON object containing text extracted from labeled regions.
+
+#### Image to Text
+- **`POST /image_to_text`**
+  - **Description**: Processes Base64-encoded images and extracts text from labeled regions.
+  - **Request Body**:
+    - `source_image` (Base64-encoded string): The source image.
+    - `segmentation_template` (Base64-encoded string): The segmentation template.
+    - `labels` (JSON string): Defines labeled regions in the segmentation template.
+  - **Response**:
+    - JSON object containing text extracted from labeled regions.
diff --git a/backend/README.md b/backend/README.md
index 50e9a732..e3f4f68d 100644
--- a/backend/README.md
+++ b/backend/README.md
@@ -1,6 +1,6 @@
 # Backend Middleware - Spring Boot Application

-This document provides a guide for the **Backend Middleware** of the ReportVision project. This middleware bridges the **frontend React app** with the **OCR backend**
+This document provides a guide for the **Backend Middleware** of the ReportVision project. This middleware bridges the **frontend app** with the **OCR backend**

 ---

@@ -18,7 +18,7 @@ This document provides a guide for the **Backend Middleware** of the ReportVisio
 The backend of ReportVision is a **Spring Boot** application designed to:

 - Serve as middleware connecting the frontend with OCR.
-- Manage template storage
+- Manage storage of template in the DB
 - Act as a middle layer to pass data for OCR extraction


@@ -57,7 +57,7 @@ docker exec -it /bin/bash

 ## Project Architecture

-The backend is organized into the following key directories and files:
+The backend is organized into the following directories and files:

 - **`src/main/java/gov/cdc/reportvision/`**:
   - **`controllers/`**: handle API requests from the frontend.
@@ -65,7 +65,7 @@ The backend is organized into the following key directories and files:
   - **`models/`**: Data models representing application entities
   - **`repositories/`**: Interfaces for database operations,
   - **`config/`**: Configuration files for security, database connections, and CORS policies.
-  - **`utils/`**: Utility classes for tasks like validation, logging, and file manipulation.
+  - **`utils/`**: Utility classes for validation, logging, and file manipulation.

 - **`src/test/`**: Includes unit and integration tests for the backend.
 - **`Dockerfile`**: Docker configuration file for containerizing the application.
@@ -104,7 +104,7 @@ The backend middleware exposes the following RESTful API endpoints:
 #### Health Check
 - **`GET /api/health`**
   - **Description**: Returns the status of the backend server.
-  - **Response**: A status message indicating the server's health.
+  - **Response**: Status message indicating the server's health.

 #### Template Management
 - **`POST /api/templates`**
diff --git a/frontend/README.md b/frontend/README.md
index 8dff74e0..0f7e70c3 100644
--- a/frontend/README.md
+++ b/frontend/README.md
@@ -8,10 +8,9 @@ Welcome to the **Frontend React App** for the ReportVision project. This guide p
 ## Table of Contents
 1. [Introduction](#introduction)
 2. [Setup and Installation](#setup-and-installation)
-3. [Development Workflow](#development-workflow)
-4. [Testing and E2E Commands](#testing-and-e2e-commands)
-5. [Frontend Architecture](#project-architecture)
-8. [Troubleshooting](#troubleshooting)
+3. [Testing](#testing)
+4. [Frontend Architecture](#project-architecture)
+5. [Troubleshooting](#troubleshooting)



@@ -33,7 +32,6 @@ Make sure you have the following installed on your machine:
 ```shell
 git clone https://github.com/CDCgov/ReportVision.git
 cd ReportVision/frontend
-
 2. Install Dependencies:

 ```shell
@@ -52,7 +50,7 @@ npm run dev
 npm run tests
 ```

-### Testing and E2E Commands
+### Testing

 Runs the end-to-end tests.

@@ -67,7 +65,7 @@ Starts the interactive UI mode.
 npx playwright test --ui
 ```

-Runs the tests only on Desktop Chrome.
+Runs the tests only on Chrome.

 ```shell
 npx playwright test --project=chromium
@@ -93,7 +91,7 @@ npx playwright codegen

 #### Fast Refresh

-Currently, two official plugins are available:
+Currently, two plugins are available:

 - [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react/README.md) uses [Babel](https://babeljs.io/) for Fast Refresh
 - [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh
@@ -110,7 +108,7 @@ Currently, two official plugins are available:
 ### Description of Key Directories and Files in the frontend:

 - **`public/`**: Holds public static files like images, logos, and `index.html`. These files are directly served by the development and production servers.
-- **`src/`**: Contains the core application code, including React components, pages, styles, and utilities.
+- **`src/`**: Contains the application code, including React components, pages, styles, and utilities.
   - **`components/`**: Houses UI components.
   - **`pages/`**: Organizes page-level components corresponding to application routes.
   - **`styles/`**: Includes global and component-specific styles.
diff --git a/user_guide.md b/user_guide.md
index ab16fcb1..79b537e1 100644
--- a/user_guide.md
+++ b/user_guide.md
@@ -9,13 +9,15 @@ ReportVision is a tool that automates the reading and extracting of labs from PD
 2. Extract Data based on selected annotations
 3. Conversion of Extracted Data to PDF's

+Please see "how to" instructions in order to understand features of the Application in more detail.
+
 ### Getting Started

 #### Prerequisites

-1. [Python3.8](https://www.python.org/downloads/)
-2. [Node23.1](https://nodejs.org/en/download)
-3. [Tesseract5.5](https://formulae.brew.sh/formula/tesseract) (brew install tesseract)
+1. [Python 3.8](https://www.python.org/downloads/)
+2. [Node 23.1](https://nodejs.org/en/download)
+3. [Tesseract 5.5](https://formulae.brew.sh/formula/tesseract) (brew install tesseract)
 4. [Java21](https://www.oracle.com/java/technologies/downloads/)
 5. [PostgreSQL](https://www.postgresql.org/)
 6. [Docker](https://www.docker.com/) (required for DB and middleware set up)
@@ -30,21 +32,31 @@ ReportVision is a tool that automates the reading and extracting of labs from PD

 ![](arcdiagram.png)

-A React-based Single Page Application: This serves as the front-end user interface for the application.
+The **ReportVision** application is composed of the following core components:
+
+## Components
+
+### 1. **React-Based Single Page Application (SPA)**
+- **Purpose**: Serves as the user interface for the application.
+
+### 2. **ReportVision Middleware**
+- **Purpose**: Acts as middleware to handle communication between the UI, OCR API, and data storage.
+
+### 3. **OCR API**
+- **Purpose**: Performs Optical Character Recognition (OCR) on provided input.
+
+### 4. **Data Storage (PostgreSQL)**
+- **Purpose**: Stores saved templates and extracted data.


-ReportVision Middleware: Acts as middleware to handle communication between the UI, OCR API, and data storage.
-Responsible for coordinating requests, processing logic, and integrating with other components.
-OCR API: Runs the Optical Character Recognition (OCR) process.
-Receives data from the backend, performs OCR on the provided input, and returns the extracted information to the backend.
+## Infrastructure and Cloud Components


-Data Storage (Postgres):A managed database for data persistence.
-Stores data processed by the backend and results generated by the OCR API.
-Handles both structured and unstructured data related to the application.
+### Hosting
+- The application is hosted in **Azure**


-### Infrastructure aod Cloud Components
+### Infrastructure Guide
+- For detailed information on how the application is deployed and managed in Azure, refer to our [Infrastructure Guide](./infrastructure/README.md).


-The application is hosted in Azure. Please see our infrastructure guide here to learn more