explain model selection in the readme, fix missing model selection argument in generate_embeddings_batch #7

Merged 1 commit on Oct 13, 2024
20 changes: 20 additions & 0 deletions README.md
@@ -123,6 +123,26 @@ quizgen <command> [options]
quizgen anki --input-dir path/to/csv/files --root-deck-name "Root Deck Name"
```

## Model Selection

QuizGen lets you choose which AI model is used at each stage of the question-generation process. This is useful for experimenting with different models or tailoring the output to your needs.

The following model selection arguments are available:

- `--concept-model`: Specifies the model used for concept extraction. Default is `gpt-4o-mini-2024-07-18`.
- `--questions-model`: Specifies the model used for question generation. Default is `gpt-4o-2024-08-06`.
- `--embedding-model`: Specifies the model used for generating embeddings. Default is `text-embedding-3-small`.

These arguments can be used with both single-chapter and batch commands. For example:

```bash
# Generate questions using a specific model:
quizgen generate --chapter-path path/to/chapter.md --title "Course Title" --output path/to/output.json --questions-model gpt-4
# Generate embeddings using a specific model:
quizgen embeddings --batch --input-dir path/to/json/files --output-dir path/to/embeddings/output --embedding-model text-embedding-ada-002
```
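
As a rough illustration of how the three flags and their defaults fit together, here is a minimal `argparse` sketch. It is not QuizGen's actual CLI code: the `build_parser` helper, the `DEFAULT_MODELS` table, and the `quizgen-sketch` program name are hypothetical, and only the flag names and default values are taken from the list above.

```python
import argparse

# Defaults copied from the README list above.
DEFAULT_MODELS = {
    "concept_model": "gpt-4o-mini-2024-07-18",
    "questions_model": "gpt-4o-2024-08-06",
    "embedding_model": "text-embedding-3-small",
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; QuizGen's real CLI wiring may differ."""
    parser = argparse.ArgumentParser(prog="quizgen-sketch")
    for dest, default in DEFAULT_MODELS.items():
        # "concept_model" becomes the flag "--concept-model".
        flag = "--" + dest.replace("_", "-")
        parser.add_argument(flag, type=str, default=default)
    return parser


# Override one model; the other two fall back to their defaults.
args = build_parser().parse_args(["--questions-model", "gpt-4"])
print(args.concept_model, args.questions_model, args.embedding_model)
```

Any flag left off the command line keeps its documented default, so overriding a single stage does not affect the others.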

## Example Anki Decks

Check out these example Anki decks generated using QuizGen:
12 changes: 10 additions & 2 deletions src/quizgen/scripts/generate_embeddings_batch.py
@@ -7,7 +7,7 @@
 from tqdm import tqdm


-def generate_embeddings(input_dir, output_dir):
+def generate_embeddings(input_dir, output_dir, embedding_model):
     input_dir = pathlib.Path(input_dir).resolve()
     output_dir = pathlib.Path(output_dir).resolve()
     json_files = list(input_dir.rglob("*.json"))
@@ -34,6 +34,8 @@ def generate_embeddings(input_dir, output_dir):
                 str(json_file),
                 "--output-path",
                 str(output_path),
+                "--embedding-model",
+                embedding_model,
             ],
             check=True,
         )
@@ -54,9 +56,15 @@ def main():
     parser.add_argument(
         "--output-dir", required=True, help="Output directory for embedding files"
     )
+    parser.add_argument(
+        "--embedding-model",
+        type=str,
+        default="text-embedding-3-small",
+        help="Model to use for generating embeddings",
+    )
     args = parser.parse_args()

-    generate_embeddings(args.input_dir, args.output_dir)
+    generate_embeddings(args.input_dir, args.output_dir, args.embedding_model)


 if __name__ == "__main__":
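
The change in this file amounts to threading one more value from the batch script's arguments into each per-file command. The sketch below shows that forwarding in isolation, under stated assumptions: `build_embedding_command` is a hypothetical helper, and the `prefix` value stands in for the leading part of the command, which is not visible in the diff. Only the `--output-path` and `--embedding-model` portions mirror the hunk above.

```python
import pathlib


def build_embedding_command(prefix, json_file, output_path, embedding_model):
    """Return the argv list for one input file.

    `prefix` is whatever precedes the input path in the real command;
    that part is hidden in the diff, so it is passed in unchanged.
    """
    return [
        *prefix,
        str(json_file),
        "--output-path",
        str(output_path),
        # Forwarded so the per-file step uses the same model as the batch run.
        "--embedding-model",
        embedding_model,
    ]


# Hypothetical prefix, for illustration only.
prefix = ["single-file-embeddings-tool", "--input-path"]
cmd = build_embedding_command(
    prefix,
    pathlib.Path("in") / "chapter1.json",
    pathlib.Path("out") / "chapter1.json",
    "text-embedding-3-small",
)
print(cmd)
```

A list like this can be handed to `subprocess.run(cmd, check=True)`, as the script does. Without the last two elements, the per-file command would fall back to its own default model regardless of what the batch caller requested.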