[BUG] distiset.push_to_hub() seems to have cache pathing issue #1091

Open
MoritzLaurer opened this issue Jan 7, 2025 · 2 comments
Describe the bug
Calling distiset.push_to_hub() at the end of the script below raises the error shown in the traceback further down.

To Reproduce

Code to reproduce

from distilabel.pipeline import Pipeline
from distilabel.llms import InferenceEndpointsLLM
from distilabel.steps import LoadDataFromDisk
from distilabel.steps.tasks import TextGeneration
from prompt_templates import PromptTemplateLoader

pipeline_path = "./v3_modern_bert/"


with Pipeline(name="text-generation-pipeline", cache_dir=pipeline_path) as pipeline:
    load_dataset = LoadDataFromDisk(
        name="load_dataset",
        dataset_path="v3_modern_bert/dataset",
        output_mappings={"prompt": "instruction"},
    )

    text_generation = TextGeneration(
        name="text_generation",
        llm=InferenceEndpointsLLM(
            base_url="https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-1B-Instruct"  #Llama-3.3-70B-Instruct"
        ),
        output_mappings={"generation": "generation_1"},
    )
    
    prompt_template = PromptTemplateLoader.from_local("v3_modern_bert/judge_nli_text.yaml")
    
    
    text_judge = TextGeneration(
        name="text_judge",
        llm=InferenceEndpointsLLM(
            base_url="https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct"
        ),
        template=prompt_template.template,
        columns=["generation_1", "class_statement"],
        output_mappings={"generation": "judgment"},
    )

    load_dataset >> text_generation >> text_judge

if __name__ == "__main__":
    pipeline.save(pipeline_path, format="yaml")
    #pipeline.draw(
    #    "v3_modern_bert/pipeline.png",
    #    top_to_bottom=True,
    #    show_edge_labels=True,
    #)

    distiset = pipeline.run(
        use_cache=False,
        #batch_size=1,
        parameters={
            text_generation.name: {"llm": {
                "generation_kwargs": {
                    "temperature": 0.8,
                    "max_new_tokens": 512, #2048,
                    #"frequency_penalty": 0.2,
                    #"presence_penalty": 0.2,
                }
            }},
            text_judge.name: {"llm": {
                "generation_kwargs": {
                    "temperature": 0,
                    "max_new_tokens": 8
                }
            }},
        },
    )
    
    
    print(distiset.pipeline_path)
    # v3_modern_bert/text-generation-pipeline/9382a823500b899454233c337940521df3fb88eb/executions/16691af245778e4c1ff0bc7d7ba49b34a396a777/pipeline.yaml
    pipeline.save(distiset.pipeline_path, format="yaml")

    print(distiset)
    #print(distiset["default"]["train"][0].keys())
    #print(distiset["default"]["train"][0]["distilabel_metadata"])
    distiset.push_to_hub(
        "MoritzLaurer/distiset-test",
        private=False,
        #token=os.getenv("HF_TOKEN"),
        #generate_card=True,
        #include_script=True,
    )  # https://distilabel.argilla.io/latest/api/distiset/#distilabel.distiset.Distiset.push_to_hub
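
Because cache_dir is given as a relative path ("./v3_modern_bert/"), it may help to check how the cached file paths resolve before calling push_to_hub. The helper below is only a diagnostic sketch using the standard library; describe_path and its base_dir parameter are hypothetical names and not part of distilabel, and it does not claim to identify the actual cause of the error.

```python
from pathlib import Path


def describe_path(path, base_dir=None):
    """Report how a possibly relative path resolves and whether it is a file.

    Hypothetical helper for debugging; not part of distilabel or huggingface_hub.
    """
    candidate = Path(path)
    # Relative paths are interpreted against base_dir, or the current
    # working directory when no base_dir is given.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    resolved = candidate if candidate.is_absolute() else base / candidate
    resolved = resolved.resolve()
    return {
        "given": str(candidate),
        "is_absolute": candidate.is_absolute(),
        "resolved": str(resolved),
        "is_file": resolved.is_file(),
    }
```

In the script above it could be called as print(describe_path(distiset.pipeline_path)) and print(describe_path(distiset.log_filename_path)) right before distiset.push_to_hub(...); both attributes appear in the script and in the traceback.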

Running the reproduction script results in this error from distiset.push_to_hub:

╭─────────────────────────── Traceback (most recent call last) ───────────────────────────╮
│ /Users/moritzlaurer/huggingface/projects/zeroshot/zeroshot-classifier/v3_modern_bert/ge │
│ nerate.py:77 in <module>                                                                │
│                                                                                         │
│   74 │   print(distiset)                                                                │
│   75 │   #print(distiset["default"]["train"][0].keys())                                 │
│   76 │   #print(distiset["default"]["train"][0]["distilabel_metadata"])                 │
│ ❱ 77 │   distiset.push_to_hub(                                                          │
│   78 │   │   "MoritzLaurer/distiset-test",                                              │
│   79 │   │   private=False,                                                             │
│   80 │   │   #token=os.getenv("HF_TOKEN"),                                              │
│                                                                                         │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │        distiset = Distiset({                                                        │ │
│ │                   │   default: DatasetDict({                                        │ │
│ │                   │   │   train: Dataset({                                          │ │
│ │                   │   │   │   features: ['profession', 'task_description',          │ │
│ │                   'class_names', 'class_statements', 'text_types', 'text_styles',   │ │
│ │                   'class_statement', 'text_type', 'text_style',                     │ │
│ │                   'profession_description', 'instruction', 'generation_1',          │ │
│ │                   'distilabel_metadata', 'model_name', 'judgment'],                 │ │
│ │                   │   │   │   num_rows: 350                                         │ │
│ │                   │   │   })                                                        │ │
│ │                   │   })                                                            │ │
│ │                   })                                                                │ │
│ │    load_dataset = LoadDataFromDisk(                                                 │ │
│ │                   │   exclude_from_signature={                                      │ │
│ │                   │   │   'disable_cuda_device_placement',                          │ │
│ │                   │   │   'type_info',                                              │ │
│ │                   │   │   'resources',                                              │ │
│ │                   │   │   'input_batch_size',                                       │ │
│ │                   │   │   'llm_offline_batch_generation_block_until_done',          │ │
│ │                   │   │   'llm_jobs_ids',                                           │ │
│ │                   │   │   'exclude_from_signature',                                 │ │
│ │                   │   │   'gpu_memory_utilization'                                  │ │
│ │                   │   },                                                            │ │
│ │                   │   name='load_dataset',                                          │ │
│ │                   │   resources=StepResources(                                      │ │
│ │                   │   │   replicas=1,                                               │ │
│ │                   │   │   cpus=None,                                                │ │
│ │                   │   │   gpus=None,                                                │ │
│ │                   │   │   memory=None,                                              │ │
│ │                   │   │   resources=None                                            │ │
│ │                   │   ),                                                            │ │
│ │                   │   input_mappings={},                                            │ │
│ │                   │   output_mappings={'prompt': 'instruction'},                    │ │
│ │                   │   use_cache=True,                                               │ │
│ │                   │   batch_size=50,                                                │ │
│ │                   │   repo_id=None,                                                 │ │
│ │                   │   split=None,                                                   │ │
│ │                   │   config='default',                                             │ │
│ │                   │   revision=None,                                                │ │
│ │                   │   streaming=False,                                              │ │
│ │                   │   num_examples=27638,                                           │ │
│ │                   │   storage_options=None,                                         │ │
│ │                   │   dataset_path='v3_modern_bert/dataset',                        │ │
│ │                   │   is_distiset=False,                                            │ │
│ │                   │   keep_in_memory=None                                           │ │
│ │                   )                                                                 │ │
│ │        pipeline = <distilabel.pipeline.local.Pipeline object at 0x104da06b0>        │ │
│ │   pipeline_path = './v3_modern_bert/'                                               │ │
│ │ prompt_template = TextPromptTemplate(template='You are a highly qualified text      │ │
│ │                   evaluator.\n\nYou..., template_variables=['generation_1',         │ │
│ │                   'class_statement'], metadata={}, client_parameters={},            │ │
│ │                   custom_data={}, populator='jinja2',                               │ │
│ │                   jinja2_security_level='standard')                                 │ │
│ │ text_generation = TextGeneration(                                                   │ │
│ │                   │   exclude_from_signature={                                      │ │
│ │                   │   │   'disable_cuda_device_placement',                          │ │
│ │                   │   │   'type_info',                                              │ │
│ │                   │   │   'resources',                                              │ │
│ │                   │   │   'input_batch_size',                                       │ │
│ │                   │   │   'llm_offline_batch_generation_block_until_done',          │ │
│ │                   │   │   'llm_jobs_ids',                                           │ │
│ │                   │   │   'exclude_from_signature',                                 │ │
│ │                   │   │   'gpu_memory_utilization'                                  │ │
│ │                   │   },                                                            │ │
│ │                   │   name='text_generation',                                       │ │
│ │                   │   resources=StepResources(                                      │ │
│ │                   │   │   replicas=1,                                               │ │
│ │                   │   │   cpus=None,                                                │ │
│ │                   │   │   gpus=None,                                                │ │
│ │                   │   │   memory=None,                                              │ │
│ │                   │   │   resources=None                                            │ │
│ │                   │   ),                                                            │ │
│ │                   │   input_mappings={},                                            │ │
│ │                   │   output_mappings={'generation': 'generation_1'},               │ │
│ │                   │   use_cache=True,                                               │ │
│ │                   │   input_batch_size=50,                                          │ │
│ │                   │   llm=InferenceEndpointsLLM(                                    │ │
│ │                   │   │   use_magpie_template=False,                                │ │
│ │                   │   │   magpie_pre_query_template=None,                           │ │
│ │                   │   │   generation_kwargs={                                       │ │
│ │                   │   │   │   'temperature': 0.8,                                   │ │
│ │                   │   │   │   'max_new_tokens': 512                                 │ │
│ │                   │   │   },                                                        │ │
│ │                   │   │   use_offline_batch_generation=False,                       │ │
│ │                   │   │   offline_batch_generation_block_until_done=None,           │ │
│ │                   │   │   jobs_ids=None,                                            │ │
│ │                   │   │   model_id=None,                                            │ │
│ │                   │   │   endpoint_name=None,                                       │ │
│ │                   │   │   endpoint_namespace=None,                                  │ │
│ │                   │   │                                                             │ │
│ │                   base_url='https://api-inference.huggingface.co/models/meta-llama… │ │
│ │                   │   │   api_key=SecretStr('**********'),                          │ │
│ │                   │   │   tokenizer_id=None,                                        │ │
│ │                   │   │   model_display_name=None,                                  │ │
│ │                   │   │   structured_output=None                                    │ │
│ │                   │   ),                                                            │ │
│ │                   │   group_generations=False,                                      │ │
│ │                   │   add_raw_output=True,                                          │ │
│ │                   │   add_raw_input=True,                                           │ │
│ │                   │   num_generations=1,                                            │ │
│ │                   │   use_default_structured_output=False,                          │ │
│ │                   │   system_prompt=None,                                           │ │
│ │                   │   use_system_prompt=True,                                       │ │
│ │                   │   template='{{ instruction }}',                                 │ │
│ │                   │   columns=['instruction']                                       │ │
│ │                   )                                                                 │ │
│ │      text_judge = TextGeneration(                                                   │ │
│ │                   │   exclude_from_signature={                                      │ │
│ │                   │   │   'disable_cuda_device_placement',                          │ │
│ │                   │   │   'type_info',                                              │ │
│ │                   │   │   'resources',                                              │ │
│ │                   │   │   'input_batch_size',                                       │ │
│ │                   │   │   'llm_offline_batch_generation_block_until_done',          │ │
│ │                   │   │   'llm_jobs_ids',                                           │ │
│ │                   │   │   'exclude_from_signature',                                 │ │
│ │                   │   │   'gpu_memory_utilization'                                  │ │
│ │                   │   },                                                            │ │
│ │                   │   name='text_judge',                                            │ │
│ │                   │   resources=StepResources(                                      │ │
│ │                   │   │   replicas=1,                                               │ │
│ │                   │   │   cpus=None,                                                │ │
│ │                   │   │   gpus=None,                                                │ │
│ │                   │   │   memory=None,                                              │ │
│ │                   │   │   resources=None                                            │ │
│ │                   │   ),                                                            │ │
│ │                   │   input_mappings={},                                            │ │
│ │                   │   output_mappings={'generation': 'judgment'},                   │ │
│ │                   │   use_cache=True,                                               │ │
│ │                   │   input_batch_size=50,                                          │ │
│ │                   │   llm=InferenceEndpointsLLM(                                    │ │
│ │                   │   │   use_magpie_template=False,                                │ │
│ │                   │   │   magpie_pre_query_template=None,                           │ │
│ │                   │   │   generation_kwargs={                                       │ │
│ │                   │   │   │   'temperature': 0,                                     │ │
│ │                   │   │   │   'max_new_tokens': 8                                   │ │
│ │                   │   │   },                                                        │ │
│ │                   │   │   use_offline_batch_generation=False,                       │ │
│ │                   │   │   offline_batch_generation_block_until_done=None,           │ │
│ │                   │   │   jobs_ids=None,                                            │ │
│ │                   │   │   model_id=None,                                            │ │
│ │                   │   │   endpoint_name=None,                                       │ │
│ │                   │   │   endpoint_namespace=None,                                  │ │
│ │                   │   │                                                             │ │
│ │                   base_url='https://api-inference.huggingface.co/models/meta-llama… │ │
│ │                   │   │   api_key=SecretStr('**********'),                          │ │
│ │                   │   │   tokenizer_id=None,                                        │ │
│ │                   │   │   model_display_name=None,                                  │ │
│ │                   │   │   structured_output=None                                    │ │
│ │                   │   ),                                                            │ │
│ │                   │   group_generations=False,                                      │ │
│ │                   │   add_raw_output=True,                                          │ │
│ │                   │   add_raw_input=True,                                           │ │
│ │                   │   num_generations=1,                                            │ │
│ │                   │   use_default_structured_output=False,                          │ │
│ │                   │   system_prompt=None,                                           │ │
│ │                   │   use_system_prompt=True,                                       │ │
│ │                   │   template='You are a highly qualified text evaluator.\n\nYour  │ │
│ │                   task is to read the following t'+361,                             │ │
│ │                   │   columns=['generation_1', 'class_statement']                   │ │
│ │                   )                                                                 │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                         │
│ /Users/moritzlaurer/Library/Caches/pypoetry/virtualenvs/zeroshot-classifier-Mmx6XhbH-py │
│ 3.12/lib/python3.12/site-packages/distilabel/distiset.py:156 in push_to_hub             │
│                                                                                         │
│   153 │   │   │   )                                                                     │
│   154 │   │                                                                             │
│   155 │   │   if generate_card:                                                         │
│ ❱ 156 │   │   │   self._generate_card(                                                  │
│   157 │   │   │   │   repo_id, token, include_script=include_script, filename_py=filena │
│   158 │   │   │   )                                                                     │
│   159                                                                                   │
│                                                                                         │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │         dataset = DatasetDict({                                                     │ │
│ │                   │   train: Dataset({                                              │ │
│ │                   │   │   features: ['profession', 'task_description',              │ │
│ │                   'class_names', 'class_statements', 'text_types', 'text_styles',   │ │
│ │                   'class_statement', 'text_type', 'text_style',                     │ │
│ │                   'profession_description', 'instruction', 'generation_1',          │ │
│ │                   'distilabel_metadata', 'model_name', 'judgment'],                 │ │
│ │                   │   │   num_rows: 350                                             │ │
│ │                   │   })                                                            │ │
│ │                   })                                                                │ │
│ │     filename_py = 'generate.py'                                                     │ │
│ │   generate_card = True                                                              │ │
│ │  include_script = False                                                             │ │
│ │          kwargs = {}                                                                │ │
│ │            name = 'default'                                                         │ │
│ │         private = False                                                             │ │
│ │         repo_id = 'MoritzLaurer/distiset-test'                                      │ │
│ │ script_filename = '/Users/moritzlaurer/huggingface/projects/zeroshot/zeroshot-clas… │ │
│ │     script_path = PosixPath('/Users/moritzlaurer/huggingface/projects/zeroshot/zer… │ │
│ │            self = Distiset({                                                        │ │
│ │                   │   default: DatasetDict({                                        │ │
│ │                   │   │   train: Dataset({                                          │ │
│ │                   │   │   │   features: ['profession', 'task_description',          │ │
│ │                   'class_names', 'class_statements', 'text_types', 'text_styles',   │ │
│ │                   'class_statement', 'text_type', 'text_style',                     │ │
│ │                   'profession_description', 'instruction', 'generation_1',          │ │
│ │                   'distilabel_metadata', 'model_name', 'judgment'],                 │ │
│ │                   │   │   │   num_rows: 350                                         │ │
│ │                   │   │   })                                                        │ │
│ │                   │   })                                                            │ │
│ │                   })                                                                │ │
│ │           token = 'hf_**********************************'                           │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                         │
│ /Users/moritzlaurer/Library/Caches/pypoetry/virtualenvs/zeroshot-classifier-Mmx6XhbH-py │
│ 3.12/lib/python3.12/site-packages/distilabel/distiset.py:307 in _generate_card          │
│                                                                                         │
│   304 │   │                                                                             │
│   305 │   │   if self.log_filename_path:                                                │
│   306 │   │   │   # The same we had with "pipeline.yaml" but with the log file.         │
│ ❱ 307 │   │   │   HfApi().upload_file(                                                  │
│   308 │   │   │   │   path_or_fileobj=self.log_filename_path,                           │
│   309 │   │   │   │   path_in_repo=PIPELINE_LOG_FILENAME,                               │
│   310 │   │   │   │   repo_id=repo_id,                                                  │
│                                                                                         │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │           card = <distilabel.utils.card.dataset_card.DistilabelDatasetCard object   │ │
│ │                  at 0x16b716030>                                                    │ │
│ │    filename_py = 'generate.py'                                                      │ │
│ │ include_script = False                                                              │ │
│ │        repo_id = 'MoritzLaurer/distiset-test'                                       │ │
│ │           self = Distiset({                                                         │ │
│ │                  │   default: DatasetDict({                                         │ │
│ │                  │   │   train: Dataset({                                           │ │
│ │                  │   │   │   features: ['profession', 'task_description',           │ │
│ │                  'class_names', 'class_statements', 'text_types', 'text_styles',    │ │
│ │                  'class_statement', 'text_type', 'text_style',                      │ │
│ │                  'profession_description', 'instruction', 'generation_1',           │ │
│ │                  'distilabel_metadata', 'model_name', 'judgment'],                  │ │
│ │                  │   │   │   num_rows: 350                                          │ │
│ │                  │   │   })                                                         │ │
│ │                  │   })                                                             │ │
│ │                  })                                                                 │ │
│ │          token = 'hf_**********************************'                            │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                         │
│ /Users/moritzlaurer/Library/Caches/pypoetry/virtualenvs/zeroshot-classifier-Mmx6XhbH-py │
│ 3.12/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:114 in _inner_fn │
│                                                                                         │
│   111 │   │   if check_use_auth_token:                                                  │
│   112 │   │   │   kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_t │
│   113 │   │                                                                             │
│ ❱ 114 │   │   return fn(*args, **kwargs)                                                │
│   115 │                                                                                 │
│   116 │   return _inner_fn  # type: ignore                                              │
│   117                                                                                   │
│                                                                                         │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │             arg_name = 'token'                                                      │ │
│ │            arg_value = 'hf_**********************************'                      │ │
│ │                 args = (<huggingface_hub.hf_api.HfApi object at 0x16f370bc0>,)      │ │
│ │ check_use_auth_token = True                                                         │ │
│ │            has_token = True                                                         │ │
│ │               kwargs = {                                                            │ │
│ │                        │   'path_or_fileobj':                                       │ │
│ │                        PosixPath('v3_modern_bert/text-generation-pipeline/9382a823… │ │
│ │                        │   'path_in_repo': 'pipeline.log',                          │ │
│ │                        │   'repo_id': 'MoritzLaurer/distiset-test',                 │ │
│ │                        │   'repo_type': 'dataset',                                  │ │
│ │                        │   'token': 'hf_**********************************'         │ │
│ │                        }                                                            │ │
│ │            signature = <Signature (self, *, path_or_fileobj: 'Union[str, Path,      │ │
│ │                        bytes, BinaryIO]', path_in_repo: 'str', repo_id: 'str',      │ │
│ │                        token: 'Union[str, bool, None]' = None, repo_type:           │ │
│ │                        'Optional[str]' = None, revision: 'Optional[str]' = None,    │ │
│ │                        commit_message: 'Optional[str]' = None, commit_description:  │ │
│ │                        'Optional[str]' = None, create_pr: 'Optional[bool]' = None,  │ │
│ │                        parent_commit: 'Optional[str]' = None, run_as_future: 'bool' │ │
│ │                        = False) -> 'Union[CommitInfo, Future[CommitInfo]]'>         │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                         │
│ /Users/moritzlaurer/Library/Caches/pypoetry/virtualenvs/zeroshot-classifier-Mmx6XhbH-py │
│ 3.12/lib/python3.12/site-packages/huggingface_hub/hf_api.py:1559 in _inner              │
│                                                                                         │
│    1556 │   │   │   return self.run_as_future(fn, self, *args, **kwargs)                │
│    1557 │   │                                                                           │
│    1558 │   │   # Otherwise, call the function normally                                 │
│ ❱  1559 │   │   return fn(self, *args, **kwargs)                                        │
│    1560 │                                                                               │
│    1561 │   _inner.is_future_compatible = True  # type: ignore                          │
│    1562 │   return _inner  # type: ignore                                               │
│                                                                                         │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │          args = ()                                                                  │ │
│ │   args_params = [                                                                   │ │
│ │                 │   'path_or_fileobj',                                              │ │
│ │                 │   'path_in_repo',                                                 │ │
│ │                 │   'repo_id',                                                      │ │
│ │                 │   'token',                                                        │ │
│ │                 │   'repo_type',                                                    │ │
│ │                 │   'revision',                                                     │ │
│ │                 │   'commit_message',                                               │ │
│ │                 │   'commit_description',                                           │ │
│ │                 │   'create_pr',                                                    │ │
│ │                 │   'parent_commit',                                                │ │
│ │                 │   ... +1                                                          │ │
│ │                 ]                                                                   │ │
│ │        kwargs = {                                                                   │ │
│ │                 │   'path_or_fileobj':                                              │ │
│ │                 PosixPath('v3_modern_bert/text-generation-pipeline/9382a823500b899… │ │
│ │                 │   'path_in_repo': 'pipeline.log',                                 │ │
│ │                 │   'repo_id': 'MoritzLaurer/distiset-test',                        │ │
│ │                 │   'repo_type': 'dataset',                                         │ │
│ │                 │   'token': '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'                │ │
│ │                 }                                                                   │ │
│ │ run_as_future = False                                                               │ │
│ │          self = <huggingface_hub.hf_api.HfApi object at 0x16f370bc0>                │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                         │
│ /Users/moritzlaurer/Library/Caches/pypoetry/virtualenvs/zeroshot-classifier-Mmx6XhbH-py │
│ 3.12/lib/python3.12/site-packages/huggingface_hub/hf_api.py:4746 in upload_file         │
│                                                                                         │
│    4743 │   │   commit_message = (                                                      │
│    4744 │   │   │   commit_message if commit_message is not None else f"Upload {path_in │
│    4745 │   │   )                                                                       │
│ ❱  4746 │   │   operation = CommitOperationAdd(                                         │
│    4747 │   │   │   path_or_fileobj=path_or_fileobj,                                    │
│    4748 │   │   │   path_in_repo=path_in_repo,                                          │
│    4749 │   │   )                                                                       │
│                                                                                         │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │ commit_description = None                                                           │ │
│ │     commit_message = 'Upload pipeline.log with huggingface_hub'                     │ │
│ │          create_pr = None                                                           │ │
│ │      parent_commit = None                                                           │ │
│ │       path_in_repo = 'pipeline.log'                                                 │ │
│ │    path_or_fileobj = PosixPath('v3_modern_bert/text-generation-pipeline/9382a82350… │ │
│ │            repo_id = 'MoritzLaurer/distiset-test'                                   │ │
│ │          repo_type = 'dataset'                                                      │ │
│ │           revision = None                                                           │ │
│ │      run_as_future = False                                                          │ │
│ │               self = <huggingface_hub.hf_api.HfApi object at 0x16f370bc0>           │ │
│ │              token = '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'                        │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
│ in __init__:5                                                                           │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │    path_in_repo = 'pipeline.log'                                                    │ │
│ │ path_or_fileobj = PosixPath('v3_modern_bert/text-generation-pipeline/9382a823500b8… │ │
│ │            self = CommitOperationAdd(                                               │ │
│ │                   │   path_in_repo='pipeline.log',                                  │ │
│ │                   │                                                                 │ │
│ │                   path_or_fileobj='v3_modern_bert/text-generation-pipeline/9382a82… │ │
│ │                   )                                                                 │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                         │
│ /Users/moritzlaurer/Library/Caches/pypoetry/virtualenvs/zeroshot-classifier-Mmx6XhbH-py │
│ 3.12/lib/python3.12/site-packages/huggingface_hub/_commit_api.py:170 in __post_init__   │
│                                                                                         │
│   167 │   │   if isinstance(self.path_or_fileobj, str):                                 │
│   168 │   │   │   path_or_fileobj = os.path.normpath(os.path.expanduser(self.path_or_fi │
│   169 │   │   │   if not os.path.isfile(path_or_fileobj):                               │
│ ❱ 170 │   │   │   │   raise ValueError(f"Provided path: '{path_or_fileobj}' is not a fi │
│   171 │   │   elif not isinstance(self.path_or_fileobj, (io.BufferedIOBase, bytes)):    │
│   172 │   │   │   # ^^ Inspired from: https://stackoverflow.com/questions/44584829/how- │
│   173 │   │   │   raise ValueError(                                                     │
│                                                                                         │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │ path_or_fileobj = 'v3_modern_bert/text-generation-pipeline/9382a823500b899454233c3… │ │
│ │            self = CommitOperationAdd(                                               │ │
│ │                   │   path_in_repo='pipeline.log',                                  │ │
│ │                   │                                                                 │ │
│ │                   path_or_fileobj='v3_modern_bert/text-generation-pipeline/9382a82… │ │
│ │                   )                                                                 │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Provided path: 'v3_modern_bert/text-generation-pipeline/9382a823500b899454233c337940521df3fb88eb/executions/d923f6b346e482e2258cb30a2dcb7fa23070741b/pipeline.log' is not a file on the local file system
/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Expected behaviour
Successful push to hub.

Desktop (please complete the following information):
[tool.poetry.dependencies]
python = "^3.11"
transformers = "^4.47.1"
datasets = "^3.2.0"
accelerate = "^1.2.1"
mdutils = "^1.6.0"
scikit-learn = "^1.6.0"
tqdm = "^4.67.1"
wandb = "^0.19.1"
pandas = "^2.2.3"
distilabel = {extras = ["hf-inference-endpoints"], version = "^1.4.2"}
beautifulsoup4 = "^4.12.3"
prompt-templates = "^0.0.12"

pip freeze
accelerate==1.2.1
aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.7.0
attrs==24.3.0
beautifulsoup4==4.12.3
certifi==2024.12.14
charset-normalizer==3.4.1
click==8.1.8
datasets==3.2.0
dill==0.3.8
distilabel==1.4.2
docker-pycreds==0.4.0
filelock==3.16.1
frozenlist==1.5.0
fsspec==2024.9.0
gitdb==4.0.11
GitPython==3.1.43
h11==0.14.0
httpcore==1.0.7
httpx==0.28.1
huggingface-hub==0.26.5
idna==3.10
Jinja2==3.1.5
joblib==1.4.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
mdutils==1.6.0
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
numpy==1.26.4
orjson==3.10.12
packaging==24.2
pandas==2.2.3
platformdirs==4.3.6
portalocker==3.0.0
prompt-templates==0.0.12
propcache==0.2.1
protobuf==5.29.2
psutil==6.1.1
pyarrow==18.1.0
pydantic==2.10.4
pydantic_core==2.27.2
Pygments==2.18.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
ruamel.yaml==0.18.8
ruamel.yaml.clib==0.2.12
safetensors==0.4.5
scikit-learn==1.6.0
scipy==1.14.1
sentry-sdk==2.19.2
setproctitle==1.3.4
setuptools==75.6.0
shellingham==1.5.4
six==1.17.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.6
sympy==1.13.1
tblib==3.0.0
threadpoolctl==3.5.0
tokenizers==0.21.0
torch==2.5.1
tqdm==4.67.1
transformers==4.47.1
typer==0.15.1
typing_extensions==4.12.2
tzdata==2024.2
universal_pathlib==0.2.6
urllib3==2.3.0
wandb==0.19.1
xxhash==3.5.0
yarl==1.18.3

@MoritzLaurer
Author

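Note that the pipeline.log file does exist, but at a different path than the library seems to expect. In my case it is at /Users/moritzlaurer/huggingface/projects/zeroshot/zeroshot-classifier/v3_modern_bert/text-generation-pipeline/9382a823500b899454233c337940521df3fb88eb/executions/5c964665b8a351571dce9a7edaa73b8d6465cde8/pipeline.log. The pipeline hash (9382a823…) matches the one in the error, but the directory under executions/ is 5c964665b8a351571dce9a7edaa73b8d6465cde8 rather than the d923f6b346e482e2258cb30a2dcb7fa23070741b that push_to_hub looks for.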
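
For anyone who wants to check this on their own machine, here is a minimal diagnostic sketch. It uses only the standard library and does not call distilabel at all. The helper names `find_pipeline_logs` and `passes_upload_check` are made up for illustration, and the second helper only mirrors the `os.path.isfile` check visible at `_commit_api.py` lines 167-170 in the traceback above; it is not the library's own code.

```python
import os
from pathlib import Path


def find_pipeline_logs(cache_dir):
    """Return every pipeline.log found below cache_dir, sorted for stable output.

    Hypothetical helper: it only walks the directory tree and makes no
    assumption about how distilabel names its execution folders.
    """
    return sorted(Path(cache_dir).rglob("pipeline.log"))


def passes_upload_check(path):
    """Apply the same test as the check shown in the traceback.

    The path is expanded and normalized, then must point at a regular file.
    Relative paths are resolved against the current working directory.
    """
    normalized = os.path.normpath(os.path.expanduser(str(path)))
    return os.path.isfile(normalized)
```

Calling `find_pipeline_logs("./v3_modern_bert/")` from the directory where the script was run should list the execution folders that actually contain a log. Passing the path from the `ValueError` to `passes_upload_check` from that same directory shows whether the lookup fails because of the working directory or because the execution folder itself differs.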

@gabrielmbmb
Member

Hi @MoritzLaurer, thanks for reporting! I'll have a look ASAP.

@gabrielmbmb gabrielmbmb self-assigned this Jan 9, 2025
@gabrielmbmb gabrielmbmb added the bug Something isn't working label Jan 9, 2025
@gabrielmbmb gabrielmbmb added this to the 1.5.0 milestone Jan 9, 2025
@gabrielmbmb gabrielmbmb modified the milestones: 1.5.0, 1.6.0 Jan 16, 2025