hf_data_generation

This repo contains a simple script for generating machine-learning instruction datasets with open-source models hosted on Hugging Face. It requires either a Hugging Face PRO subscription or a free-tier Groq Cloud account; the Groq free tier allows roughly 14k requests per day (30 per minute).

Installation

pip install -r requirements.txt

HF Configuration

Set up your Hugging Face account.
Subscribe to Hugging Face PRO and get an API key.
Copy config.ini-example to config.ini, then edit config.ini and add your API key.
To use a different model, edit config.ini and change the model name. You can get the endpoint from the Hugging Face model hub by clicking "Deploy model" and copying the "Inference Endpoint (Serverless)" URL.

Groq Configuration

Set up your Groq Cloud account at groq.com.
Copy config.ini-example to config.ini, then edit config.ini and add your Groq API key.
To use a different model, edit config.ini and change the model name. You can find the model and endpoint details in the Groq Cloud documentation (https://console.groq.com/docs/models).
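Both configurations come down to a few values in config.ini. As a rough illustration of how such a file can be read in Python, here is a minimal sketch using the standard-library configparser. The section name and the keys (groq, api_key, model) are assumptions made for this example; check config.ini-example for the names the script actually expects.

```python
import configparser

# Hypothetical layout: the section and key names below are assumptions,
# not the actual contents of config.ini-example.
EXAMPLE_CONFIG = """
[groq]
api_key = YOUR_GROQ_API_KEY
model = YOUR_MODEL_NAME
"""


def load_settings(text, section):
    """Parse INI text and return one section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError(f"missing [{section}] section in config")
    settings = dict(parser[section])
    # Fail early instead of sending requests with an empty key.
    if not settings.get("api_key"):
        raise ValueError("api_key is empty")
    return settings


settings = load_settings(EXAMPLE_CONFIG, "groq")
print(settings["model"])
```

To read a real file instead of the inline string, parser.read("config.ini") can replace read_string.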

Usage

From the root of the project, run python src/main.py.
Your data must contain a column named input that holds the prompt text.
Generation runs in parallel but may still take some time, depending on the data size, the number of jobs and the timeout.
Each response is written to the output file as soon as it arrives, so you don't have to wait for the whole run to finish before inspecting results.
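The behaviour described above (an input column, parallel jobs, results written as they arrive) can be sketched as follows. This is not the code in src/main.py; it is a minimal illustration that assumes CSV input and JSON Lines output, and fake_generate is a hypothetical stand-in for the real API call, which would also apply its own request timeout.

```python
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_generate(prompt):
    """Hypothetical stand-in for the real model request."""
    return prompt.upper()


def read_prompts(csv_text):
    """Return the values of the 'input' column, or fail if it is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if "input" not in (reader.fieldnames or []):
        raise ValueError("data must contain a column named 'input'")
    return [row["input"] for row in reader]


def run(prompts, out, jobs=4, generate=fake_generate):
    """Generate responses in parallel and write each one as soon as it is ready."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(generate, p): p for p in prompts}
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            record = {"input": futures[future]}
            try:
                record["output"] = future.result()
            except Exception as exc:  # keep going if a single request fails
                record["error"] = str(exc)
            out.write(json.dumps(record) + "\n")
            out.flush()  # make the line visible to readers right away


demo_out = io.StringIO()
run(read_prompts("input\nhello\nworld\n"), demo_out, jobs=2)
print(demo_out.getvalue())
```

Because lines are written in completion order, the output order can differ from the input order; keeping the input alongside each output makes it possible to match them up afterwards.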
Please give the repo a star if you like it. Thanks!
Enjoy!
