Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- RecallEM metric.
- Aggregation steps: filtering, column selection, tagging, value overwrite.
- Local inference step using vLLM; can generate synthetic datasets.
- Minor modifications to the QA system instructions.
- Ruff configuration file.
- Evaluation split in the training script.
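The RecallEM metric itself is not among the diffs shown on this page. As a rough illustration only, here is a minimal Python sketch of a recall-style exact match, assuming the common definition: a prediction scores 1 when any normalized gold answer appears as a substring of the normalized prediction. The SQuAD-style normalization and the names `normalize_answer`, `recall_em` and `corpus_recall_em` are assumptions for this sketch, not the repository's code.

```python
import re
import string
from typing import List, Sequence

# Translation table that deletes ASCII punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def recall_em(prediction: str, gold_answers: Sequence[str]) -> float:
    """1.0 if any normalized gold answer occurs inside the normalized prediction, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(normalize_answer(gold) in pred for gold in gold_answers))


def corpus_recall_em(predictions: List[str], references: List[Sequence[str]]) -> float:
    """Average recall_em over paired predictions and gold-answer lists."""
    scores = [recall_em(p, refs) for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


print(recall_em("The answer is Paris, France.", ["paris"]))  # 1.0
```

Because the check is a plain substring test, a short gold answer can match inside a longer word (for example "art" inside "party"); a token-level comparison is stricter if that matters for a given dataset.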
commit b5ed97f (1 parent: f21cd32)
Showing 17 changed files with 272 additions and 20 deletions.
```diff
@@ -1,4 +1,7 @@
 /.python-version
 /outputs/
 __pycache__/
-/site/
+/site/
+/multirun/
+wandb
+.ipynb_checkpoints
```
@@ -0,0 +1,40 @@
```yaml
name: nq
cache: false
output_path: .
steps:
  - _target_: ragfit.processing.dataset_loaders.loaders.HFLoader
    inputs: train
    dataset_config:
      path: Tevatron/wikipedia-nq
      split: train

  - _target_: ragfit.processing.global_steps.sampling.ShuffleSelect
    inputs: train
    shuffle: 42
    limit: 10000

  - _target_: ragfit.processing.local_steps.prompter.TextPrompter
    inputs: train
    prompt_file: ragfit/processing/prompts/qa-short.txt
    output_key: prompt
    mapping:
      query: query

  - _target_: ragfit.processing.local_steps.inference.HFStep
    inputs: train
    input_key: prompt
    output_key: generated
    model_kwargs:
      model_name_or_path: meta-llama/Meta-Llama-3.1-8B-Instruct
      instruction: ragfit/processing/prompts/prompt_instructions/qa-short.txt
      num_gpus: 2
      llm_params:
        dtype: auto
        max_model_len: 4096
      generation:
        temperature: 0
        max_tokens: 50

  - _target_: ragfit.processing.global_steps.output.OutputData
    inputs: train
    prefix: nq-with-answers
```
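This config chains five steps: load `Tevatron/wikipedia-nq`, sample 10,000 shuffled rows, render a prompt, generate a short answer with Llama 3.1 8B Instruct, and write the results. The sketch below imitates only the sampling and prompting stages in plain Python so the data flow can be followed without a GPU. The function names, the template string and the reading of `mapping` as placeholder-to-column are assumptions; the real `ShuffleSelect` presumably relies on Hugging Face `datasets`, so the rows it samples will not match these.

```python
import random
from typing import Any, Dict, List


def shuffle_select(records: List[Dict[str, Any]], seed: int, limit: int) -> List[Dict[str, Any]]:
    """Shuffle a copy of the records with a fixed seed and keep the first `limit`."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:limit]


def text_prompter(
    records: List[Dict[str, Any]],
    template: str,
    mapping: Dict[str, str],
    output_key: str,
) -> List[Dict[str, Any]]:
    """Fill `template` per record; `mapping` maps placeholder -> record key (assumed)."""
    prompted = []
    for record in records:
        fields = {placeholder: record[key] for placeholder, key in mapping.items()}
        prompted.append({**record, output_key: template.format(**fields)})
    return prompted


# Toy rows standing in for Tevatron/wikipedia-nq (only the `query` column is used here).
rows = [{"query": f"question {i}"} for i in range(100)]
sample = shuffle_select(rows, seed=42, limit=10)
# Hypothetical template; the contents of qa-short.txt are not shown in this commit.
prompted = text_prompter(sample, "Question: {query}\nAnswer:", {"query": "query"}, "prompt")
```

A fixed seed makes the sample reproducible across runs, which is presumably why the config pins `shuffle: 42`.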
@@ -0,0 +1 @@
```markdown
::: ragfit.processing.global_steps.filters
```
@@ -0,0 +1 @@
```markdown
::: ragfit.processing.local_steps.inference
```
@@ -0,0 +1,8 @@
```python
"""Module containing filters"""


def msmarco_positive_filter(x):
    return 1 in x["passages"]["is_selected"]


filters = {"MSMARCO": msmarco_positive_filter}
```
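`filters` is a name-to-predicate registry, so a filtering step can look up a predicate by dataset name and keep only the rows it accepts. A minimal usage sketch on two toy records shaped like MS MARCO, where `passages` holds parallel lists including a 0/1 `is_selected` flag per passage; the records are invented for illustration. With a Hugging Face `datasets.Dataset`, the same predicate could be passed to `Dataset.filter`.

```python
def msmarco_positive_filter(x):
    # True when at least one candidate passage is marked as relevant (is_selected == 1).
    return 1 in x["passages"]["is_selected"]


filters = {"MSMARCO": msmarco_positive_filter}

# Hypothetical toy records mimicking the MS MARCO layout.
examples = [
    {"query": "q1", "passages": {"is_selected": [0, 1, 0], "passage_text": ["a", "b", "c"]}},
    {"query": "q2", "passages": {"is_selected": [0, 0, 0], "passage_text": ["d", "e", "f"]}},
]

# Look the predicate up by name, then drop queries with no positive passage.
predicate = filters["MSMARCO"]
kept = [ex for ex in examples if predicate(ex)]
print([ex["query"] for ex in kept])  # ['q1']
```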
@@ -0,0 +1,33 @@
```python
"""Module for inference steps, which can use LLM output to augment the data."""

from ragfit.models.vllm import VLLMInference

from ..step import LocalStep


class HFStep(LocalStep):
    """
    Class for running inference with a Hugging Face model based on the vLLM engine.
    """

    def __init__(self, input_key, output_key, model_kwargs, **kwargs):
        """
        Initialize the HFStep class.
        Args:
            input_key (str): The key for the input text to be served as the prompt.
            output_key (str): The key for saving the generated text.
            model_kwargs (dict): The keyword arguments to pass to the vLLM model.
            **kwargs: Additional keyword arguments to pass to the LocalStep.
        """
        super().__init__(**kwargs)
        self.input_key = input_key
        self.output_key = output_key
        self.model_kwargs = model_kwargs
        self.model = VLLMInference(**model_kwargs)

    def process_item(self, item, index, datasets, **kwargs):
        prompt = item[self.input_key]
        response = self.model.generate(prompt)
        item[self.output_key] = response
        return item
```
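`HFStep` reads the prompt from `input_key`, calls the vLLM-backed model once per item, and stores the generated text under `output_key`. Because vLLM needs a GPU, the sketch below swaps in a hypothetical `EchoModel` with the same `generate(prompt)` shape and a hypothetical `run_step` driver, assuming `LocalStep` applies `process_item` to every record; it shows the data flow, not RAG-Foundry's actual runner.

```python
from typing import Any, Dict, List


class EchoModel:
    """Hypothetical stand-in for VLLMInference: upper-cases the prompt and counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(self, prompt: str) -> str:
        self.calls += 1
        return prompt.upper()


class InferenceStepSketch:
    """Same per-item logic as HFStep.process_item, with the model injected."""

    def __init__(self, input_key: str, output_key: str, model: EchoModel) -> None:
        self.input_key = input_key
        self.output_key = output_key
        self.model = model

    def process_item(self, item: Dict[str, Any], index: int, datasets=None, **kwargs) -> Dict[str, Any]:
        item[self.output_key] = self.model.generate(item[self.input_key])
        return item


def run_step(step: InferenceStepSketch, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Hypothetical driver: apply process_item to each record in order.
    return [step.process_item(record, i) for i, record in enumerate(records)]


step = InferenceStepSketch("prompt", "generated", EchoModel())
out = run_step(step, [{"prompt": "who wrote hamlet?"}, {"prompt": "capital of france?"}])
print(out[0]["generated"])  # WHO WROTE HAMLET?
```

One `generate` call per record keeps the step simple. vLLM's `LLM.generate` also accepts a list of prompts and batches them internally, which is where much of its throughput comes from, so a batched variant may be worth considering when generating synthetic datasets of this size.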
```diff
@@ -1 +1 @@
-You are a helpful question answerer who can provide an answer given a question and relevant context. Please answer shortly as possible and don't repeat the question.
+You are a helpful question answerer who can provide an answer given a question and relevant context. Answer the following question with a short span. The answer needs to be just in a few words.
```
ragfit/processing/prompts/prompt_instructions/qa-yes-no-maybe.txt (1 addition, 0 deletions)
@@ -0,0 +1 @@
```text
You are a helpful question answerer who can provide an answer given a question and relevant context. Please answer with "yes", "no" or "maybe", if there is not enough information to answer the question.
```
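This instruction constrains the model to three labels, but free-form generation can still add capitalization, punctuation or trailing explanation. A minimal, hypothetical post-processing sketch (`parse_yes_no_maybe` is not part of this commit) that maps a response to one of the three labels by its first word and returns `None` for anything else:

```python
import re
from typing import Optional

LABELS = ("yes", "no", "maybe")


def parse_yes_no_maybe(response: str) -> Optional[str]:
    """Return 'yes', 'no' or 'maybe' if the response starts with one of them, else None."""
    # Lowercase and keep only alphabetic words, so "Yes." and "YES," both become "yes".
    words = re.findall(r"[a-z]+", response.lower())
    if words and words[0] in LABELS:
        return words[0]
    return None
```

Matching only the first word is deliberately strict: a response such as "The answer is no" is treated as unparseable rather than guessed at.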
@@ -0,0 +1,6 @@
```toml
line-length = 90

[lint]
select = ["E", "F", "W", "I", "N", "Q"]
ignore = ["E203", "F841", "E501", "F821"]
exclude = ["*.ipynb"]
```