Description

(Jump directly to How to Test)

Code and input files are kept in the genai directory. All unit tests are kept in the test directory.

Initial Task

The initial task has three subtasks: hello world, split PDF and count word frequency in PDF. The first one is straightforward. The script initial_task.py executes the second and third.

Hello World

The script hello.py runs a default ‘Hello World’ job in Ganga on Local backend.

[Note: A visual tree of the working directories may help you easily follow the different code dependencies mentioned below.]

Split PDF

(Go back to Testing)

initial_task.py → task execution script (submits ganga job)

run_initial_task.sh → wrapper script that invokes the actual task script

split_pdf.py → splits PDF file

  • The script initial_task.py creates a bash script called run_initial_task.sh and submits a ganga job that executes this script as an Executable application.
  • This wrapper script, when invoked by the ganga job ‘split_pdf’, calls the python script split_pdf.py that splits the PDF file LHC.pdf into 29 separate PDFs to account for the 29 pages.
  • These extracted files are stored in the folder extracted_pages inside the genai directory.
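
As a rough illustration of the first bullet above, a wrapper such as run_initial_task.sh could be generated from Python along the following lines. This is a minimal sketch, not the code in initial_task.py: the helper name write_wrapper and the exact contents of the wrapper are assumptions made for illustration.

```python
import stat
from pathlib import Path


def write_wrapper(wrapper_path, python_script):
    """Write a bash wrapper that forwards its arguments to a Python script.

    Hypothetical helper: the real initial_task.py may build its wrapper
    differently.
    """
    wrapper = Path(wrapper_path)
    wrapper.write_text(
        "#!/bin/bash\n"
        f'python3 "{python_script}" "$@"\n'
    )
    # Add the owner-execute bit so the file can be run directly.
    wrapper.chmod(wrapper.stat().st_mode | stat.S_IXUSR)
    return wrapper
```

Forwarding "$@" lets the same wrapper serve jobs that pass no arguments as well as jobs that pass a page number and a word.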

Count Word Frequency

(Go back to Testing)

initial_task.py → task execution script (submits ganga job)

run_initial_task.sh → wrapper script that invokes the actual task script

count_it.py → counts the number of occurrences of the word ‘it’ in LHC.pdf

  • The same script initial_task.py submits a ganga job named ‘count_it’ that invokes the bash script run_initial_task.sh.
  • However, this time the job passes individual page numbers and the target word ‘it’ as arguments to the bash script using ArgSplitter. As a result, run_initial_task.sh gets called 29 times to account for every page in LHC.pdf.
  • Each time run_initial_task.sh gets called, it invokes the Python script count_it.py with a different page number as one of the arguments.
  • The Python script then counts the word frequency of ‘it’ in that page and prints it out. Ganga saves the output of each subjob to a file called stdout in the user's local ganga workspace directory.
  • Then the job calls TextMerger to merge the 29 stdout files.
  • Finally, count_it.py parses the merged output by singling out the word counts, adds them up and stores the final count to a text file called count_it.txt in the genai directory.
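
The split-and-merge flow described in the bullets above can be sketched as follows. The Ganga calls mirror the ones quoted in Appendix B; the function names build_splitter_args and submit_count_job are hypothetical, and passing the page numbers as strings is an assumption. The second function only works where the Ganga names are already in the namespace, for example inside the ganga prompt.

```python
def build_splitter_args(num_pages, word):
    """Return one [page_number, word] argument list per page (as strings)."""
    return [[str(page), word] for page in range(1, num_pages + 1)]


def submit_count_job(wrapper_script, num_pages=29, word="it"):
    """Submit a job that runs the wrapper once per page and merges stdout.

    Job, Local, Executable, File, ArgSplitter and TextMerger are assumed to be
    available in the namespace, as they are inside the ganga prompt. This
    function will raise NameError in a plain Python interpreter.
    """
    j = Job(name="count_it", backend=Local())
    j.application = Executable()
    j.application.exe = File(wrapper_script)
    # One subjob per entry in the list of argument lists.
    j.splitter = ArgSplitter(args=build_splitter_args(num_pages, word))
    # Concatenate the stdout of every subjob once they have all finished.
    j.postprocessors.append(TextMerger(files=["stdout"]))
    j.submit()
    return j
```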

Testing

  • There are 4 test files that contain 17 unit tests.
  • The files test_Hello.py, test_SplitPDF.py and test_CountIt.py contain tests that check whether each unit that contributes to executing the tasks works.
  • The file test_CompleteSystem.py contains 2 unit tests. These tests make complete system calls to check whether the subtasks split PDF and count word frequency are executed properly.
  • I used the sleep_until_completed function from ganga’s core testing framework to wait for job completion before making post-job assertions.
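
The idea of waiting for a job before asserting on its results can be illustrated with a plain polling loop. This is not ganga's sleep_until_completed; wait_until is a hypothetical stand-in that only shows the pattern of polling a condition with a timeout.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One final check in case the condition became true during the last sleep.
    return bool(predicate())
```

In a test, the predicate would typically check the job's status, for example wait_until(lambda: j.status == "completed"), before any assertion on the output files.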

Interfacing Ganga

For this task, I chose the LLM deepseek-coder-1.3b-instruct. This model is trained for code generation and completion.

Shortlist LLMs

I used the Extractum LLM search directory, which has details on about 30,000 LLMs, to make a list of models that I would be able to test locally as well as on the online notebook platforms Google Colab and Kaggle for free. These online platforms provide free GPU time, which sped up testing.

After shortlisting, I retrieved the models from Hugging Face and created a test script to test their performance on the prompt.

Choose the Best Model

I tested 33 LLMs (see Appendix C: List of LLMs tested).

Based on quality of output, the best model was deepseek-coder-1.3b-instruct. It was consistently able to generate a perfect Python code snippet to approximate Pi using accept-reject simulation, generate another snippet to submit the Ganga job and also a wrapper bash script.
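
For readers unfamiliar with the method, an accept-reject estimate of Pi can be sketched as below. This is an illustrative version written for this README, not the snippet produced by the LLM; the function name estimate_pi and the optional seed parameter are assumptions.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate Pi by sampling points uniformly in the unit square.

    A point is accepted if it falls inside the quarter circle of radius 1.
    The accepted fraction approximates Pi / 4.
    """
    rng = random.Random(seed)
    accepted = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            accepted += 1
    return 4.0 * accepted / num_samples
```

When the work is split into subjobs, each subjob can report its own accepted count, and the counts can be summed before applying the factor of 4.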

While testing the models, I was also able to fine-tune my prompt (Appendix B shows this version).

I faced some drawbacks and challenges while testing the model.

  • The LLM could not generate proper import statements for ganga.
  • It would not use the bash script that it wrote as an argument to the ganga job. Instead, it kept passing the Python script as the argument to File or started hallucinating.
  • It used different types of markers to delineate the different code snippets. This issue made parsing its output to extract only the code somewhat challenging.
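
To give a sense of the parsing problem in the last bullet, the sketch below extracts snippets delimited by triple-backtick fences, which is only one of the marker styles a model may produce. It is not the extraction logic in InterfaceGanga.py; the names FENCE, BLOCK_RE and extract_code_blocks are hypothetical.

```python
import re

# Built with multiplication so that no literal fence appears in this example.
FENCE = "`" * 3

# Opening fence, optional language tag, newline, body (lazy), closing fence.
BLOCK_RE = re.compile(FENCE + r"(\w+)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(llm_output):
    """Return a list of (language, code) pairs found in the model output.

    The language is an empty string when the fence carries no tag.
    """
    return [(lang or "", code) for lang, code in BLOCK_RE.findall(llm_output)]
```

Other marker styles would each need their own pattern, which is why inconsistent delimiters make the extraction step fragile.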

Complete the Task

With the LLM selected and a working prompt crafted, I created two Python scripts, InterfaceGanga.py and run_InterfaceGanga.py, to programmatically generate output from the LLM. I also created a test file test_GangaLLM.py that executes a unit test to examine if the proposed code by the LLM tries to execute the job in Ganga.

  • InterfaceGanga.py

    Contains the class InterfaceGanga that contains methods to:

    • Initialize model parameters
    • Run inference on the LLM to generate output
    • Store the output
    • Extract necessary code snippets from the output
    • Write the snippets to appropriate scripts
  • run_InterfaceGanga.py

    Creates an InterfaceGanga object to generate code for the task using the LLM and store them as scripts in the genai directory.

  • test_GangaLLM.py

    Executes run_InterfaceGanga.py and checks if the code generated by the LLM attempts to execute the proposed code in Ganga. This test file is kept in the test directory.

Configuration

The setup.py file includes all the required packages to run and test my code.


How to Test

Setup Project

1.setup.webm
  • In the Linux terminal in your favourite directory, clone this repository by replacing [PAT] in the command below with your GitHub PAT (Personal Access Token) and enter the GangaGSoC2024 project directory.

    git clone https://[PAT]@github.com/dg1223/GangaGSoC2024.git
    cd GangaGSoC2024
  • Set up a virtual environment

    python3 -m venv GSoC
    cd GSoC/
    . bin/activate
  • Install dependencies (note the double dots in the second command: they point pip at the project's root directory, one level above GSoC, so the project and its required packages are installed from there)

    python -m pip install --upgrade pip wheel setuptools
    python -m pip install ..
  • Activate ganga

    activate-ganga.webm
    ./bin/ganga

Ganga Initial Task

You can run all three subtasks in the ganga prompt.

Subtask 1

Demonstrate that you can run a simple Hello World Ganga job that executes on a Local backend.

This task is executed by the script hello.py.

2.hello-world.webm

In the ganga prompt, first go to the genai directory which should be at the same level as the GSoC directory.

cd ../genai

Then run:

ganga hello.py

It will run the Hello World Ganga job. If the job runs successfully, you should see the following output:

To check the job's stdout, run the command: jobs(job_id).peek('stdout')

Let’s say the job_id is 100. Running the command shown in the output should show you the stdout of the job:

jobs(100).peek('stdout')

You should see:

Hello World
/path_to_your_ganga_workspace/user/LocalXML/100/output/stdout (END)

Press q to get back to ganga prompt.

Subtask 2

Create a job in Ganga that demonstrates splitting a job into multiple pieces and then collates the results at the end.

This task is executed by the script initial_task.py which takes the script split_pdf.py as an argument. split_pdf.py contains the logic for this subtask.

3.split-pdf.webm

In the ganga prompt, run:

ganga initial_task.py split_pdf.py

If the job runs successfully, you should see the following output:

Extracted pages from LHC.pdf have been saved in the folder /path_to_this_git_repo/GangaGSoC2024/genai/extracted_pages

For a detailed stdout, run the command: jobs(job_id).peek('stdout') in ganga prompt.

Let’s say the job_id is 101. Running the command shown in the output should show you the stdout of the job:

jobs(101).peek('stdout')

You should see 29 lines in the output. Each one of them should look like the following:

Extracted page 1 from /path_to_this_git_repo/GangaGSoC2024/genai/LHC.pdf and saved as /path_to_this_git_repo/GangaGSoC2024/genai/extracted_pages/LHC_page_1.pdf

Press q to get back to ganga prompt.

Check output

In the genai directory, you should see a new folder called extracted_pages. In this folder, there should be 29 PDF files. Page 1 of LHC.pdf has been extracted as LHC_page_1.pdf, page 2 as LHC_page_2.pdf and so on up to LHC_page_29.pdf.
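
The page-by-page extraction and the file naming described above can be sketched with pypdf as follows. This is a minimal sketch under the assumption that one PdfWriter is used per page; it is not the code in split_pdf.py, and the function names page_output_path and split_pdf are chosen for illustration.

```python
from pathlib import Path


def page_output_path(pdf_path, page_number, out_dir):
    """Build the output path, e.g. extracted_pages/LHC_page_1.pdf."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page_{page_number}.pdf"


def split_pdf(pdf_path, out_dir):
    """Write every page of pdf_path to its own single-page PDF."""
    # Imported here so the naming helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(str(pdf_path))
    written = []
    # Page numbers in the file names start at 1, matching the output above.
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        target = page_output_path(pdf_path, number, out_dir)
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```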

Subtask 3

Create a second job in Ganga that will count the number of occurrences of the word "it" in the text of the PDF file.

This task is executed by the script initial_task.py which takes the script count_it.py as an argument. count_it.py contains the logic for this subtask.

4.count_it.webm

In the ganga prompt, run:

ganga initial_task.py count_it.py

As the job executes, you should see the following output that includes a timer. This job times out after 1 minute. It should not take more than a few seconds for this job to finish.

Waiting for job to finish. Maximum wait time: 1 minute

00:01

If the job runs successfully, you should see the following output:

>>> Frequency of the word 'it' = 31 <<<

The word count has been stored in the same directory as this script: /path_to_this_git_repo/GangaGSoC2024/genai/count_it.txt

Run this command to see the stored result: cat /path_to_this_git_repo/GangaGSoC2024/genai/count_it.txt

Run this command to check the output from TextMerger: jobs(746).peek('stdout')

Check output

As the second line in the output suggests, the job should have created a text file called count_it.txt in the genai directory. Open this file to check the word count. It should read 31.

Alternatively, you can check the content of this file by running the command shown by the third line of the job’s stdout:

cat /path_to_this_git_repo/GangaGSoC2024/genai/count_it.txt

You should see 31 in the ganga prompt.

Let’s say the job_id is 102. Running the command shown in the last line of the output should display the stdout from the job:

jobs(102).peek('stdout')

You should see the merged output from Ganga TextMergeTool.

# Ganga TextMergeTool - [date_and_timestamp] #
# Start of file /path_to_your_ganga_workspace/user/LocalXML/746/0/output/stdout #
5
...
# Ganga Merge Ended Successfully #
(END)

Press q to get back to ganga prompt.
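
The merged output shown above mixes TextMergeTool header lines with the per-page counts. One way to add up the counts, assuming every count sits alone on its own line and every other line is a header or footer, is sketched below. The function name sum_counts is hypothetical and the parsing in count_it.py may differ.

```python
def sum_counts(merged_text):
    """Add up the lines of the merged stdout that consist only of digits."""
    total = 0
    for line in merged_text.splitlines():
        line = line.strip()
        # Header and footer lines start with '#', so they are not all digits.
        if line.isdigit():
            total += int(line)
    return total
```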

This is all there is to checking if the ‘initial task’ was successfully executed. How to execute the unit tests is shown in the last section below.

Quit ganga and get back to the GSoC directory:

quit

Edge Cases to Consider in Counting Word Frequency

There were 2 edge cases that I needed to address to get the correct word count. I used pypdf, a widely used PDF processing library, to extract text from LHC.pdf. Upon examining the extracted text, I found the following edge cases:

  • The word ‘It’ appears after a line break and a bullet point in page 3.
    • ‘It is already known that…’
  • Citation markers (square brackets [])
    • page 8: safety systems to contain it.[85]
    • page 16: TV series based on it.[177]
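
One way to cover both edge cases is a case-insensitive whole-word match, sketched below. This is an assumption about how such cases can be handled, not a description of count_it.py; the function name count_word is hypothetical. Note that this pattern would also count the ‘it’ in a contraction such as ‘it's’.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of word in text."""
    # \b marks a word boundary, so bullets, full stops and square brackets
    # next to the word do not prevent a match, while 'exit' is not counted.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))
```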

Interfacing Ganga

The purpose is to demonstrate that you can communicate with a Large Language Model in a programmatic way.

The most straightforward way to test this task is to run the corresponding unit test. If it passes, then the task is complete.

However, this test takes time to complete if it is run on a CPU. In this case, I suggest running all the tests together (see Running unit tests) to save time. The test first executes the run script run_InterfaceGanga.py, which automatically detects whether the system has a CUDA-compatible GPU.
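
GPU detection of this kind is commonly done through PyTorch. The sketch below assumes PyTorch is the framework in use, which this README does not state, and falls back to the CPU when it is not installed; pick_device is a hypothetical name.

```python
def pick_device():
    """Return 'cuda' when a CUDA-capable GPU is usable, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so default to the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```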

If you want to run this unit test specifically from test_GangaLLM.py, go to the test directory (assuming you are in GSoC):

cd ../test

Now run:

python -m pytest test_GangaLLM.py

If it passes, it means the test tried to execute the code in ganga that was proposed by the LLM.

Test Success or Failure Criteria

The test actually calls the function run_ganga_llm() from run_InterfaceGanga.py.

Success

If the LLM remains consistent with the type of answers it produced when I tested it locally, you should see 3 scripts in the test directory:

  • estimate_pi.py or pi_estimation.py, or a Python script with the same name as the function name that the LLM generated for the Pi approximation code.
  • run_ganga.sh: This script is supposed to be the Executable application for the ganga job that invokes the Python script to estimate Pi’s value.
  • run_ganga_job.py: This is the main script that submits the ganga job.

Failure

The test will fail if the script run_ganga_job.py is not found. It means the LLM either provided the code snippets in a different style than what it did during my testing or it hallucinated.

The test will also fail if it fails in its attempt to run the ganga job.

System Requirements

Depending on the system configuration, the test takes 8-25 minutes to finish on a CPU (at least Intel Core i5 3rd generation) or less than a minute on a CUDA-compatible GPU such as the NVIDIA Tesla P100. Minimum memory requirements are 16 GB RAM and 8 GB VRAM (if run on GPU).

Running Unit Tests

(Go back to Subtask 3 or ‘Interfacing Ganga’)

5.test.webm

Assuming you are in the test directory of the project, all of the 18 unit tests can be run by executing:

python -m pytest

Test scripts:

test_ArgSplitter.py
test_CompleteSystem.py
test_CountIt.py
test_GangaLLM.py
test_Hello.py
test_SplitPDF.py
test_trivial.py

Additional References

(Go back to Preparation)

https://github.com/jncraton/languagemodels

languagemodels API documentation

👋 Welcome to MLC LLM — mlc-llm 0.1.0 documentation

New localllm lets you develop gen AI apps locally, without GPUs | Google Cloud Blog

Large Language Models for Code Generation

Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ)

OpenAI GPT2

GitHub - GoogleCloudPlatform/localllm

Appendix A: Directory trees

(Go back to Initial task)

genai

./genai
├── count_it.py
├── hello.py
├── initial_task.py
├── __init__.py
├── InterfaceGanga.py
├── LHC.pdf
├── run_InterfaceGanga.py
└── split_pdf.py

test

./test
├── __init__.py
├── LHC.pdf
├── test_ArgSplitter.py
├── test_CompleteSystem.py
├── test_CountIt.py
├── test_GangaLLM.py
├── test_Hello.py
├── test_SplitPDF.py
└── test_trivial.py

Appendix B: Final Prompt

(Go back to Preparation)

I want to use Ganga to calculate an approximation to the number pi using an accept-reject simulation method with one million simulations. I would like to perform this calculation through a Ganga job. The job should be split into a number of subjobs that each do thousand simulations.The code should be written in Python.

Here are some instructions that you can follow.

  1. Write code to calculate the approximation of pi using the above-mentioned method.
  2. Write a bash script that will execute the code above.
  3. Run a ganga job using local backend: j = Job(name=job_name, backend=Local())
  4. Run the Bash script as an Executable application: j.application = Executable() j.application.exe = File(the_script_to_run)
  5. Use ArgSplitter to split the job: j.splitter = ArgSplitter(args=splitter_args) It should split the job into a number of subjobs that each do thousand simulations.
  6. Merge output from the splitter using TextMerger: j.postprocessors.append(TextMerger(files=['stdout']))
  7. Run the ganga job: j.submit()

Do not give me code as IPython or Jupyter prompts. Give me the python script.

Appendix C: List of LLMs tested

(Go back to Choose the best model)

List of LLMs that were tested:

# 33 models
deepseek-coder-1.3b-base
deepseek-coder-1.3b-instruct
deepseek-coder-6.7b-base
deepseek-coder-6.7b-instruct
Deci/DeciCoder-1b
ramgpt/deepseek-coder-6.7B-GPTQ
mlx-community/stable-code-3b-mlx
mlx-community/CodeLlama-7b-Python-4bit-MLX
mlx-community/CodeLlama-7b-Instruct-hf-4bit-MLX
stabilityai/stable-code-3b
stabilityai/stablecode-instruct-alpha-3b
stabilityai/stablecode-completion-alpha-3b
TheBloke/CodeLlama-7B-GGUF
TheBloke/CodeLlama-7B-GGML
TheBloke/Llama-2-Coder-7B-GGUF
TheBloke/deepseek-coder-1.3b-base-AWQ
TheBloke/deepseek-coder-6.7B-base-GGUF
TheBloke/deepseek-coder-6.7B-instruct-GGUF
TheBloke/stablecode-instruct-alpha-3b-GGML
Salesforce/codegen2-1B
microsoft/phi-1
LoneStriker/deepseek-coder-6.7b-instruct-4.0bpw-h6-exl2-2
casperhansen/mpt-7b-8k-chat-gptq
smangrul/codellama-hugcoder-merged
unsloth/gemma-2b-bnb-4bit
unsloth/codellama-13b-bnb-4bit
davzoku/cria-llama2-7b-v1.3-q4-mlx
Deci/DeciCoder-1b
WizardLM/WizardCoder-1B-V1.0
WizardLM/WizardCoder-3B-V1.0
codellama/CodeLlama-7b-hf
codellama/CodeLlama-7b-Python-hf
smallcloudai/Refact-1_6B-fim