Root Word Frequency Visualization on Maps Using OCR and Stemming

This project visualizes root word frequencies extracted from images using Optical Character Recognition (OCR) and stemming techniques. The extracted words are dynamically represented as circles overlaid on a map, with circle sizes proportional to word frequency. This tool provides an interactive way to analyze text data from images in a geospatial context.

Features

  • OCR Integration: Extracts text from images using the OCR.space API.
  • Root Word Stemming: Applies Snowball stemming to group words by their root form.
  • Dynamic Circle Overlays: Visualizes word frequencies by drawing circles on the map, with sizes relative to word occurrences.
  • Geospatial Visualization: Plots recognized words over map images, allowing for interactive analysis.

Installation

  1. Clone the repository:

    git clone https://github.com/SPVillacorta/OCR-WordFreqMap.git
    cd OCR-WordFreqMap
  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Set up the OCR API:

    • Sign up at OCR.space to get your API key.
    • Replace the placeholder API key in the code (API_KEY = 'XXXXXXXXXX') with your own.
  4. Run the application:

    python main.py
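
For orientation, the key from step 3 is what authenticates each request to OCR.space. The sketch below is not the project's own code. It assumes the public `https://api.ocr.space/parse/image` endpoint with its `apikey`, `language`, and `isOverlayRequired` form fields, and the `ParsedResults` / `TextOverlay` layout of the JSON response. The function names `ocr_image` and `extract_words` are hypothetical.

```python
def ocr_image(image_path, api_key, language="eng"):
    """Send an image to OCR.space and return the decoded JSON response."""
    import requests  # imported lazily so extract_words works without it

    with open(image_path, "rb") as fh:
        response = requests.post(
            "https://api.ocr.space/parse/image",
            files={"file": fh},
            data={
                "apikey": api_key,
                "language": language,
                "isOverlayRequired": "true",  # ask for word positions too
            },
            timeout=60,
        )
    response.raise_for_status()
    return response.json()


def extract_words(result):
    """Return (text, left, top) tuples from an OCR.space JSON response."""
    if result.get("IsErroredOnProcessing"):
        raise RuntimeError(f"OCR failed: {result.get('ErrorMessage')}")
    words = []
    for page in result.get("ParsedResults") or []:
        overlay = page.get("TextOverlay") or {}
        for line in overlay.get("Lines") or []:
            for word in line.get("Words") or []:
                # Left/Top are pixel offsets within the uploaded image
                words.append((word["WordText"], word["Left"], word["Top"]))
    return words
```

A call such as `extract_words(ocr_image("Perth3.jpg", API_KEY))` would then yield each recognized word with its pixel position, which is what a map overlay needs.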

Usage

  1. Prepare an image: Place the image file (e.g., "Perth3.jpg") in the project folder or specify the path in the image_path variable.

  2. Analyze root words: The script will use OCR to extract text from the image, stem the words, and then visualize their frequencies on the map by drawing dynamic circle overlays. Each circle represents a word, with its size proportional to the word's frequency.

  3. Custom Keywords: The script currently focuses on specific root words (sandstone, lake, hill). You can modify the keywords list in the code to track other words relevant to your analysis.
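
The stemming and keyword steps above can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual logic: `count_root_words`, `naive_stem`, and `snowball_stem` are hypothetical names, and `naive_stem` is only a crude stand-in so the example runs without NLTK. Pass `snowball_stem()` as the `stem` argument to use NLTK's Snowball stemmer instead.

```python
import re


def naive_stem(word):
    """Crude stand-in stemmer: strips a trailing plural 's' only."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def snowball_stem():
    """Return NLTK's English Snowball stem function (requires nltk)."""
    from nltk.stem.snowball import SnowballStemmer

    return SnowballStemmer("english").stem


def count_root_words(text, keywords, stem=naive_stem):
    """Count how often each keyword's root form occurs in the text."""
    # Map each keyword's stem back to the keyword it came from
    roots = {stem(k.lower()): k for k in keywords}
    counts = {k: 0 for k in keywords}
    for token in re.findall(r"[a-z]+", text.lower()):
        root = stem(token)
        if root in roots:
            counts[roots[root]] += 1
    return counts


keywords = ["sandstone", "lake", "hill"]  # the default list named above
```

Stemming both the keywords and the OCR tokens with the same function means plural or inflected forms are counted under a single root, whichever stemmer is plugged in.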

Example Output

The program draws a circle on the map for each detected word, sized in proportion to the word's frequency, and labels each circle with the word it represents.
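
A minimal sketch of how such an overlay could be drawn with matplotlib and Pillow is shown here. It makes one assumption worth flagging: the text does not say whether "size" means radius or area, and this version scales the circle area with frequency. The names `circle_radii` and `draw_overlay`, the `max_radius` default, and the `positions` mapping of word to pixel coordinates are all hypothetical.

```python
import math


def circle_radii(counts, max_radius=40.0):
    """Scale radii so that circle area is proportional to frequency."""
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {word: 0.0 for word in counts}
    return {w: max_radius * math.sqrt(n / peak) for w, n in counts.items()}


def draw_overlay(image_path, positions, counts, out_path="overlay.png"):
    """Draw one labelled circle per word on top of the map image."""
    import matplotlib.pyplot as plt
    from matplotlib.patches import Circle
    from PIL import Image

    radii = circle_radii(counts)
    fig, ax = plt.subplots()
    ax.imshow(Image.open(image_path))
    for word, (x, y) in positions.items():
        r = radii.get(word, 0.0)
        if r <= 0:
            continue  # skip words that never occurred
        ax.add_patch(Circle((x, y), r, color="red", alpha=0.4))
        ax.annotate(word, (x + r, y))  # label just right of the circle
    ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```

Scaling by area keeps a word that occurs four times from looking sixteen times larger than a word that occurs once; replacing the square root with a plain ratio gives radius-proportional circles instead.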

Example Screenshot

Dependencies

  • Python 3.x
  • matplotlib
  • requests
  • Pillow
  • spacy
  • nltk
  • wordcloud

To install these packages directly instead of through requirements.txt, run:

pip install matplotlib requests pillow spacy nltk wordcloud

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments