Commit

explain model selection in the readme, fix missing model selection argument in generate_embeddings_batch
kdunee committed Oct 13, 2024
1 parent 066aaea commit 33bdabd
Showing 2 changed files with 30 additions and 2 deletions.
README.md (20 changes: 20 additions, 0 deletions)
@@ -123,6 +123,26 @@ quizgen <command> [options]
quizgen anki --input-dir path/to/csv/files --root-deck-name "Root Deck Name"
```

## Model Selection

QuizGen allows you to select specific AI models for different stages of the question generation process. This can be useful for experimenting with different models or fine-tuning the output to your needs.

The following model selection arguments are available:

- `--concept-model`: Specifies the model used for concept extraction. Default is `gpt-4o-mini-2024-07-18`.
- `--questions-model`: Specifies the model used for question generation. Default is `gpt-4o-2024-08-06`.
- `--embedding-model`: Specifies the model used for generating embeddings. Default is `text-embedding-3-small`.

These arguments can be used with both single chapter and batch commands. For example:

```bash
# Generate questions using a specific model:
quizgen generate --chapter-path path/to/chapter.md --title "Course Title" --output path/to/output.json --questions-model gpt-4
# Generate embeddings using a specific model:
quizgen embeddings --batch --input-dir path/to/json/files --output-dir path/to/embeddings/output --embedding-model text-embedding-ada-002
```

## Example Anki Decks

Check out these example Anki decks generated using QuizGen:
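The README hunk says the three model flags work with both single-chapter and batch commands. One way to keep their names and defaults consistent across commands is to register them through a single shared helper. The following is a minimal sketch, assuming argparse-based CLIs. `add_model_arguments` and `DEFAULT_MODELS` are hypothetical names, not taken from QuizGen. Only the flag names and the default model IDs come from the README.

```python
import argparse

# Default model IDs copied from the README; the helper below is a hypothetical
# illustration, not QuizGen's actual implementation.
DEFAULT_MODELS = {
    "concept_model": "gpt-4o-mini-2024-07-18",
    "questions_model": "gpt-4o-2024-08-06",
    "embedding_model": "text-embedding-3-small",
}


def add_model_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the three model-selection flags on an existing parser."""
    # argparse derives dest names from the flags: --concept-model -> concept_model.
    parser.add_argument(
        "--concept-model",
        default=DEFAULT_MODELS["concept_model"],
        help="Model used for concept extraction",
    )
    parser.add_argument(
        "--questions-model",
        default=DEFAULT_MODELS["questions_model"],
        help="Model used for question generation",
    )
    parser.add_argument(
        "--embedding-model",
        default=DEFAULT_MODELS["embedding_model"],
        help="Model used for generating embeddings",
    )
    return parser


if __name__ == "__main__":
    # Hypothetical single-chapter parser, mirroring the README's
    # `quizgen generate ... --questions-model gpt-4` example.
    parser = add_model_arguments(argparse.ArgumentParser(prog="generate"))
    parser.add_argument("--chapter-path", required=True)
    args = parser.parse_args(["--chapter-path", "chapter.md", "--questions-model", "gpt-4"])
    print(args.concept_model, args.questions_model, args.embedding_model)
```

Any flag the user omits falls back to its README default. An explicit value overrides only that stage.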
src/quizgen/scripts/generate_embeddings_batch.py (12 changes: 10 additions, 2 deletions)
@@ -7,7 +7,7 @@
from tqdm import tqdm


-def generate_embeddings(input_dir, output_dir):
+def generate_embeddings(input_dir, output_dir, embedding_model):
input_dir = pathlib.Path(input_dir).resolve()
output_dir = pathlib.Path(output_dir).resolve()
json_files = list(input_dir.rglob("*.json"))
@@ -34,6 +34,8 @@ def generate_embeddings(input_dir, output_dir):
str(json_file),
"--output-path",
str(output_path),
+"--embedding-model",
+embedding_model,
],
check=True,
)
@@ -54,9 +56,15 @@ def main():
parser.add_argument(
"--output-dir", required=True, help="Output directory for embedding files"
)
+parser.add_argument(
+"--embedding-model",
+type=str,
+default="text-embedding-3-small",
+help="Model to use for generating embeddings",
+)
args = parser.parse_args()

-generate_embeddings(args.input_dir, args.output_dir)
+generate_embeddings(args.input_dir, args.output_dir, args.embedding_model)


if __name__ == "__main__":
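The fix in `generate_embeddings_batch.py` has two parts. First, `main()` accepts `--embedding-model`, which defaults to `text-embedding-3-small`. Second, `generate_embeddings` forwards that value to each per-file subprocess call. The following is a minimal sketch of that flow. `build_parser`, `build_command`, and `run_one` are hypothetical helpers. `base_cmd` is a placeholder for the leading part of the per-file command, which the diff does not show.

```python
import argparse
import subprocess
from typing import List, Sequence

DEFAULT_EMBEDDING_MODEL = "text-embedding-3-small"


def build_parser() -> argparse.ArgumentParser:
    """Batch CLI with the --embedding-model option added by this commit (sketch)."""
    parser = argparse.ArgumentParser(description="Generate embeddings for a directory of JSON files")
    parser.add_argument("--input-dir", required=True, help="Input directory of JSON files")
    parser.add_argument("--output-dir", required=True, help="Output directory for embedding files")
    parser.add_argument(
        "--embedding-model",
        type=str,
        default=DEFAULT_EMBEDDING_MODEL,
        help="Model to use for generating embeddings",
    )
    return parser


def build_command(
    base_cmd: Sequence[str], json_file: str, output_path: str, embedding_model: str
) -> List[str]:
    """Assemble the per-file command and forward the selected model.

    `base_cmd` is a hypothetical placeholder for the start of the real
    command, which is elided from the diff.
    """
    return [
        *base_cmd,
        str(json_file),
        "--output-path",
        str(output_path),
        "--embedding-model",
        embedding_model,
    ]


def run_one(base_cmd: Sequence[str], json_file: str, output_path: str, embedding_model: str) -> None:
    # check=True raises subprocess.CalledProcessError if the per-file command fails.
    subprocess.run(build_command(base_cmd, json_file, output_path, embedding_model), check=True)
```

Before this commit, the batch script neither accepted a model option nor passed one to the per-file command, so batch runs could not select an embedding model.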
