From a03dbb46b31f81ba4a7fa0703fe5a12cdbae73eb Mon Sep 17 00:00:00 2001
From: Ismat
Date: Sat, 9 Nov 2024 02:39:13 +0400
Subject: [PATCH] check

---
 README.md | 233 ++++++++++++++++++++++++------------------------------
 1 file changed, 105 insertions(+), 128 deletions(-)

diff --git a/README.md b/README.md
index 4a4e58d..6cf1028 100644
--- a/README.md
+++ b/README.md
@@ -1,180 +1,157 @@
-# Named_Entity_Recognition
+# Named Entity Recognition for Azerbaijani Language
 
-### Custom Named Entity Recognition (NER) Model for Azerbaijani Language
+A custom Named Entity Recognition (NER) model specifically designed for the Azerbaijani language. This project includes a FastAPI application for model deployment and a user-friendly frontend interface for testing and visualizing NER results.
 
-This project provides a custom Named Entity Recognition (NER) model tailored for the Azerbaijani language. It includes a FastAPI application for deploying the model, as well as a frontend interface to test and view the NER results.
+## Demo
 
-### Demo
+Try the live demo: [Named Entity Recognition Demo](https://named-entity-recognition.fly.dev/)
 
-You can try out the deployed model here: [Named Entity Recognition Demo](https://named-entity-recognition.fly.dev/)
+**Note:** The server runs on a free tier and may take 1-2 minutes to initialize if inactive. Please be patient during startup.
 
-**Note:** The server is hosted on a free tier, so it may take 1-2 minutes to wake up if it’s inactive when you access it. Please be patient as the server starts up.
+## Project Structure
 
-## File Structure
-
-```plaintext
+```
 .
-├── Dockerfile # Defines instructions for building the Docker image
-├── README.md # Project overview, setup, and usage instructions
-├── fly.toml # Configuration file for Fly.io deployment
-├── main.py # Main FastAPI app file handling API endpoints and model loading
-├── models # Contains model-related notebooks, scripts, and data
-│   ├── XLM-RoBERTa.ipynb # Notebook for XLM-RoBERTa model training/testing
-│   ├── mBERT.ipynb # Notebook for mBERT model training/testing
-│   ├── push_to_HF.py # Script to push model to Hugging Face hub
-│   └── train-00000-of-00001.parquet # Parquet file with model training/evaluation data
-├── requirements.txt # Lists all Python dependencies for the project
-├── static # Contains frontend assets (JavaScript, CSS)
-│   ├── app.js # JavaScript for handling frontend functionality
-│   └── style.css # CSS for styling the frontend interface
-└── templates # HTML templates for rendering the frontend interface
-    └── index.html # Main HTML file for the user interface
+├── Dockerfile # Docker image configuration
+├── README.md # Project documentation
+├── fly.toml # Fly.io deployment configuration
+├── main.py # FastAPI application entry point
+├── models/ # Model-related files
+│   ├── XLM-RoBERTa.ipynb # XLM-RoBERTa training notebook
+│   ├── mBERT.ipynb # mBERT training notebook
+│   ├── push_to_HF.py # Hugging Face upload script
+│   └── train.parquet # Training data
+├── requirements.txt # Python dependencies
+├── static/ # Frontend assets
+│   ├── app.js # Frontend logic
+│   └── style.css # UI styling
+└── templates/ # HTML templates
+    └── index.html # Main UI template
 ```
 
-## Data and Model Links
-
-- **Dataset**: [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)
-- **mBERT Model**: [mBERT Azerbaijani NER](https://huggingface.co/IsmatS/mbert-az-ner)
-- **XLM-RoBERTa Model**: [XLM-RoBERTa Azerbaijani NER](https://huggingface.co/IsmatS/xlm-roberta-az-ner)
-- **XLM-RoBERTa Large Model**: [XLM-RoBERTa Large Azerbaijani NER](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
-- **Azeri-Turkish-BERT-NER**: [Azerbaijani-Turkish BERT Base NER](https://huggingface.co/IsmatS/azeri-turkish-bert-ner)
+## Models & Dataset
 
+### Available Models
 
-All four models were fine-tuned on a premium A100 GPU in Google Colab for optimized training performance.
+- [mBERT Azerbaijani NER](https://huggingface.co/IsmatS/mbert-az-ner)
+- [XLM-RoBERTa Azerbaijani NER](https://huggingface.co/IsmatS/xlm-roberta-az-ner)
+- [XLM-RoBERTa Large Azerbaijani NER](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
+- [Azerbaijani-Turkish BERT Base NER](https://huggingface.co/IsmatS/azeri-turkish-bert-ner)
 
-**Note**: The XLM-RoBERTa base model was selected for deployment.
+### Dataset
 
+- [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)
 
-## Model Performance Metrics
+**Note:** All models were fine-tuned on an A100 GPU using Google Colab Pro+. The XLM-RoBERTa base model is currently deployed in production.
 
-### mBERT Model
+## Model Performance
 
-| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
-|-------|---------------|----------------|-----------|----------|----------|----------|
-| 1 | 0.295200 | 0.265711 | 0.715424 | 0.622853 | 0.665937 | 0.919136 |
-| 2 | 0.248600 | 0.252083 | 0.721036 | 0.637979 | 0.676970 | 0.921439 |
-| 3 | 0.206800 | 0.253372 | 0.704872 | 0.650684 | 0.676695 | 0.920898 |
+### mBERT Performance
 
-### XLM-RoBERTa Base Model
+| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
+|-------|---------------|-----------------|-----------|---------|-------|-----------|
+| 1 | 0.2952 | 0.2657 | 0.7154 | 0.6229 | 0.6659 | 0.9191 |
+| 2 | 0.2486 | 0.2521 | 0.7210 | 0.6380 | 0.6770 | 0.9214 |
+| 3 | 0.2068 | 0.2534 | 0.7049 | 0.6507 | 0.6767 | 0.9209 |
 
-| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
-|-------|---------------|----------------|-----------|----------|----------|
-| 1 | 0.323100 | 0.275503 | 0.775799 | 0.694886 | 0.733117 |
-| 2 | 0.272500 | 0.262481 | 0.739266 | 0.739900 | 0.739583 |
-| 3 | 0.248600 | 0.252498 | 0.751478 | 0.741152 | 0.746280 |
-| 4 | 0.236800 | 0.249968 | 0.754882 | 0.741449 | 0.748105 |
-| 5 | 0.223800 | 0.252187 | 0.764390 | 0.740460 | 0.752235 |
-| 6 | 0.218600 | 0.249887 | 0.756352 | 0.741646 | 0.748927 |
-| 7 | 0.209700 | 0.250748 | 0.760696 | 0.739438 | 0.749916 |
+### XLM-RoBERTa Base Performance
 
-### XLM-RoBERTa Large Model
+| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
+|-------|---------------|-----------------|-----------|---------|-------|
+| 1 | 0.3231 | 0.2755 | 0.7758 | 0.6949 | 0.7331 |
+| 3 | 0.2486 | 0.2525 | 0.7515 | 0.7412 | 0.7463 |
+| 5 | 0.2238 | 0.2522 | 0.7644 | 0.7405 | 0.7522 |
+| 7 | 0.2097 | 0.2507 | 0.7607 | 0.7394 | 0.7499 |
 
-| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
-|-------|---------------|----------------|-----------|----------|----------|
-| 1 | 0.407500 | 0.253823 | 0.768923 | 0.721350 | 0.744377 |
-| 2 | 0.255600 | 0.249694 | 0.783549 | 0.724464 | 0.752849 |
-| 3 | 0.214400 | 0.248773 | 0.750857 | 0.748900 | 0.749877 |
-| 4 | 0.193400 | 0.257051 | 0.768623 | 0.740371 | 0.754232 |
-| 5 | 0.169800 | 0.275679 | 0.745789 | 0.753740 | 0.749743 |
-| 6 | 0.152600 | 0.288074 | 0.783131 | 0.728423 | 0.754787 |
-| 7 | 0.144300 | 0.303378 | 0.758504 | 0.738069 | 0.748147 |
-| 8 | 0.126800 | 0.311300 | 0.745589 | 0.750863 | 0.748217 |
-| 9 | 0.119400 | 0.331631 | 0.739316 | 0.749475 | 0.744361 |
-| 10 | 0.109400 | 0.344823 | 0.754268 | 0.737189 | 0.745631 |
-| 11 | 0.102900 | 0.354887 | 0.751948 | 0.741285 | 0.746578 |
+### XLM-RoBERTa Large Performance
 
+| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
+|-------|---------------|-----------------|-----------|---------|-------|
+| 1 | 0.4075 | 0.2538 | 0.7689 | 0.7214 | 0.7444 |
+| 3 | 0.2144 | 0.2488 | 0.7509 | 0.7489 | 0.7499 |
+| 6 | 0.1526 | 0.2881 | 0.7831 | 0.7284 | 0.7548 |
+| 9 | 0.1194 | 0.3316 | 0.7393 | 0.7495 | 0.7444 |
 
-### Azeri-Turkish-BERT-NER
+### Azeri-Turkish-BERT Performance
 
-| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
-|-------|---------------|-----------------|-----------|--------|-------|
-| 1 | 0.433100 | 0.306711 | 0.739000 | 0.693282 | 0.715412 |
-| 2 | 0.292700 | 0.275796 | 0.781565 | 0.688937 | 0.732334 |
-| 3 | 0.250600 | 0.275115 | 0.758261 | 0.709425 | 0.733031 |
-| 4 | 0.233700 | 0.273087 | 0.756184 | 0.716277 | 0.735689 |
-| 5 | 0.214800 | 0.278477 | 0.756051 | 0.710996 | 0.732832 |
-| 6 | 0.199200 | 0.286102 | 0.755068 | 0.717012 | 0.735548 |
-| 7 | 0.192800 | 0.297157 | 0.742326 | 0.725802 | 0.733971 |
-| 8 | 0.178900 | 0.304510 | 0.743206 | 0.723930 | 0.733442 |
-| 9 | 0.171700 | 0.313845 | 0.743145 | 0.725535 | 0.734234 |
+| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
+|-------|---------------|-----------------|-----------|---------|-------|
+| 1 | 0.4331 | 0.3067 | 0.7390 | 0.6933 | 0.7154 |
+| 3 | 0.2506 | 0.2751 | 0.7583 | 0.7094 | 0.7330 |
+| 6 | 0.1992 | 0.2861 | 0.7551 | 0.7170 | 0.7355 |
+| 9 | 0.1717 | 0.3138 | 0.7431 | 0.7255 | 0.7342 |
 
+## Setup Instructions
 
-## Setup and Usage
+### Local Development
 
-1. **Clone the repository**:
-   ```bash
-   git clone https://github.com/Ismat-Samadov/Named_Entity_Recognition.git
-   cd named-entity-recognition
-   ```
-
-2. **Create and activate a virtual environment**:
-   ```bash
-   python3 -m venv .venv
-   source .venv/bin/activate
-
-   # On Windows use: .venv\Scripts\activate
-   ```
+1. **Clone the repository**
+```bash
+git clone https://github.com/Ismat-Samadov/Named_Entity_Recognition.git
+cd Named_Entity_Recognition
+```
 
-3. Install dependencies:
-   ```bash
-   pip install -r requirements.txt
-   ```
+2. **Set up Python environment**
+```bash
+# Create virtual environment
+python -m venv .venv
 
-4. **Run the FastAPI app**:
-   ```bash
-   uvicorn main:app --host 0.0.0.0 --port 8080
-   ```
+# Activate virtual environment
+# On Unix/macOS:
+source .venv/bin/activate
+# On Windows:
+.venv\Scripts\activate
 
-5. **Deploy on Fly.io**:
-   Use the following steps to deploy the app on Fly.io.
+# Install dependencies
+pip install -r requirements.txt
+```
 
-## Fly.io Deployment
+3. **Run the application**
+```bash
+uvicorn main:app --host 0.0.0.0 --port 8080
+```
 
-To deploy this FastAPI app on Fly.io, follow these steps:
+### Fly.io Deployment
 
-### Step 1: Install Fly CLI
-If you haven't already, install the Fly.io CLI:
+1. **Install Fly CLI**
 ```bash
+# On Unix/macOS
 curl -L https://fly.io/install.sh | sh
 ```
 
-### Step 2: Authenticate with Fly.io
-Log in to your Fly.io account:
+2. **Configure deployment**
 ```bash
+# Login to Fly.io
 fly auth login
-```
 
-### Step 3: Initialize Fly.io App
-Run the following command in the root directory of your project:
-```bash
+# Initialize app
 fly launch
-```
-During the launch process:
-- Fly will ask you for a unique app name.
-- It will detect your `Dockerfile` automatically.
-- Accept default region recommendations or specify your preferred region.
 
-### Step 4: Scale Resources
-Increase memory allocation for running the model. For example, to set the memory to 2 GB:
-```bash
+# Configure memory (minimum 2GB recommended)
 fly scale memory 2048
 ```
 
-### Step 5: Deploy the App
-Once configured, deploy the app with:
+3. **Deploy application**
 ```bash
 fly deploy
-```
 
-### Step 6: Monitor and Test
-To check logs and ensure the app is running correctly:
-```bash
+# Monitor deployment
 fly logs
 ```
 
-Access your deployed app at the Fly.io-provided URL (e.g., `https://your-app-name.fly.dev`).
-
 ## Usage
 
-Access the web interface through the Fly.io URL or `http://localhost:8080` (if running locally) to test the NER model and view recognized entities.
+1. Access the application:
+   - Local: http://localhost:8080
+   - Production: https://named-entity-recognition.fly.dev
+
+2. Enter Azerbaijani text in the input field
+3. Click "Process" to view the named entities
+4. Results will display recognized entities highlighted in different colors
+
+## Contributing
+
+Contributions are welcome! Please feel free to submit a Pull Request.
+
+## License
 
-This application leverages the XLM-RoBERTa Large model fine-tuned on Azerbaijani language data for high-accuracy named entity recognition.
+This project is open source and available under the MIT License.
\ No newline at end of file