Describe the bug
My pipeline crashed and I wanted to recover it from the cache, but it seems to get stuck and does not process anything. As discussed with @plaguss.
To Reproduce
Code to reproduce
import os
import random

os.environ["DISTILABEL_LOG_LEVEL"] = "DEBUG"

from distilabel.llms import InferenceEndpointsLLM
# from distilabel.llms.huggingface import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import GroupColumns, KeepColumns, LoadDataFromHub, StepInput, step
from distilabel.steps.base import StepInput
from distilabel.steps.tasks import TextGeneration
from distilabel.steps.typing import StepOutput

## At the time of writing this, the distilabel library does not support the image generation endpoint.
## This is a temporary fix to allow us to use the image generation endpoint.

## Let's determine the categories and subcategories for the image generation task
# https://huggingface.co/spaces/google/sdxl/blob/main/app.py#L55
categories = {
# included"Cinematic": [
# included"emotional",
"harmonious",
"vignette",
"highly detailed",
"high budget",
"bokeh",
"cinemascope",
"moody",
"epic",
"gorgeous",
"film grain",
"grainy",
],
# included"Photographic": [
# included"film",
"bokeh",
"professional",
"4k",
"highly detailed",
## not included"Landscape",
"Portrait",
"Macro",
"Portra",
"Gold",
"ColorPlus",
"Ektar",
"Superia",
"C200",
"CineStill",
"CineStill 50D",
"CineStill 800T",
"Tri-X",
"HP5",
"Delta",
"T-Max",
"Fomapan",
"StreetPan",
"Provia",
"Ektachrome",
"Velvia",
],
# included"Anime": [
# included"anime style",
"key visual",
"vibrant",
"studio anime",
"highly detailed",
],
# included"Manga": [
# included"vibrant",
"high-energy",
"detailed",
"iconic",
"Japanese comic style",
],
# included"Digital art": [
# included"digital artwork",
"illustrative",
"painterly",
"matte painting",
"highly detailed",
],
# included"Pixel art": [
# included"low-res",
"blocky",
"pixel art style",
"8-bit graphics",
],
# included"Fantasy art": [
# included"magnificent",
"celestial",
"ethereal",
"painterly",
"epic",
"majestic",
"magical",
"fantasy art",
"cover art",
"dreamy",
],
# included"Neonpunk": [
# included"cyberpunk",
"vaporwave",
"neon",
"vibes",
"vibrant",
"stunningly beautiful",
"crisp",
"detailed",
"sleek",
"ultramodern",
"magenta highlights",
"dark purple shadows",
"high contrast",
"cinematic",
"ultra detailed",
"intricate",
"professional",
],
# included"3D Model": [
# included"octane render",
"highly detailed",
"volumetric",
"dramatic lighting",
],
# not included"Painting": [
"Oil",
"Acrylic",
"Watercolor",
"Digital",
"Mural",
"Sketch",
"Gouache",
"Renaissance",
"Baroque",
"Romanticism",
"Impressionism",
"Expressionism",
"Cubism",
"Surrealism",
"Pop Art",
"Minimalism",
"Realism",
"Encaustic",
"Tempera",
"Fresco",
"Ink Wash",
"Spray Paint",
"Mixed Media",
],
# not included"Animation": [
# not included"Animation",
"Stop motion",
"Claymation",
"Pixel Art",
"Vector",
"Hand-drawn",
"Cutout",
"Whiteboard",
],
# not included"Illustration": [
# not included"Book",
"Comics",
"Editorial",
"Advertising",
"Technical",
"Fantasy",
"Scientific",
"Fashion",
"Storyboard",
"Concept Art",
"Manga",
"Anime",
"Digital",
"Vector",
"Design",
],
}
## We will use the Qwen2.5-72B-Instruct model for the text generation task, this will help us to generate the quality and style prompts
model_id = (
    "meta-llama/Llama-3.1-8B-Instruct"
)  # "meta-llama/Meta-Llama-3.1-70B-Instruct"
llm = InferenceEndpointsLLM(
    # model_id=model_id,
    # tokenizer_id=model_id,
    generation_kwargs={"temperature": 0.8, "max_new_tokens": 2048},
    base_url="https://rti2mzernqmo00qy.us-east-1.aws.endpoints.huggingface.cloud",
    api_key=os.getenv("HF_TOKEN"),
)

## We will use two types of prompts: quality and style. The quality prompt will help us to generate the quality-enhanced prompts and the style prompt will help us to generate the style-enhanced prompts.
quality_prompt = """You are an expert at refining prompts for image generation models. Your task is to enhance the given prompt by adding descriptive details and quality-improving elements, while maintaining the original intent and core concept.

Follow these guidelines:
1. Preserve the main subject and action of the original prompt.
2. Add specific, vivid details to enhance visual clarity.
3. Incorporate elements that improve overall image quality and aesthetics.
4. Keep the prompt concise and avoid unnecessary words.
5. Use modifiers that are appropriate for the subject matter.

Example modifiers (use as reference, adapt based on some aspect that's suitable for the original prompt):
- Lighting: "soft golden hour light", "dramatic chiaroscuro", "ethereal glow"
- Composition: "rule of thirds", "dynamic perspective", "symmetrical balance"
- Texture: "intricate details", "smooth gradients", "rich textures"
- Color: "vibrant color palette", "monochromatic scheme", "complementary colors"
- Atmosphere: "misty ambiance", "serene mood", "energetic atmosphere"
- Technical: "high resolution", "photorealistic", "sharp focus"

The enhanced prompt should be short, concise, direct, avoid unnecessary words and written as it was a human expert writing the prompt.

Output only one enhanced prompt without any additional text or explanations.

## Original Prompt
{{ style_prompt }}

## Quality-Enhanced Prompt
"""

style_prompt = """You are an expert at refining prompts for image generation models. Your task is to enhance the given prompt by transforming it into a specific artistic style, technique, or genre, while maintaining the original core concept.

Follow these guidelines:
1. Preserve the main subject and action of the original prompt but rewrite stylistic elements already present in the prompt.
2. Transform the prompt into a distinctive visual style (e.g., impressionism, surrealism, cyberpunk, art nouveau).
3. Incorporate style-specific elements and techniques.
4. Keep the prompt concise and avoid unnecessary words.
5. Use modifiers that are appropriate for the chosen style.

You should use the following style, technique, genre to enhance the prompt:
{{ category }} / {{ subcategory }}

The enhanced prompt should be short, concise, direct, avoid unnecessary words and written as it was a human expert writing the prompt.

Output only one style-enhanced prompt without any additional text or explanations.

## Original Prompt
{{ prompt }}

## Style-Enhanced Prompt
"""

simplification_prompt = """You are an expert at simplifying image descriptions. Your task is to simplify the description by removing any unnecessary words and phrases, while maintaining the original intent and core concept of the description.

Follow these guidelines:
1. Preserve the main subject of the original description.
2. Remove all any unnecessary words and phrases.
3. Ensure the simplified description could have been quickly written by a human.

## Original Description
{{ style_prompt }}

## Simplified Description
"""

## Let's create the pipeline to generate the quality and style prompts
with Pipeline(name="image_preferences_synthetic_data_generation") as pipeline:
    load_data = LoadDataFromHub(name="load_dataset")

    @step(inputs=["prompt"], outputs=["category", "subcategory", "prompt"])
    def CategorySelector(inputs: StepInput) -> "StepOutput":
        result = []
        for input in inputs:
            # Randomly select a category
            category = random.choice(list(categories.keys()))
            # Randomly select a subcategory from the chosen category
            subcategory = random.choice(categories[category])
            result.append(
                {
                    "category": category,
                    "subcategory": subcategory,
                    "prompt": input["prompt"],
                }
            )
        yield result

    category_selector = CategorySelector(name="category_selector")

    style_augmentation = TextGeneration(
        llm=llm,
        template=style_prompt,
        columns=["prompt", "category", "subcategory"],
        name="style_augmentation",
        output_mappings={"generation": "style_prompt"},
        input_batch_size=4,
    )
    simplification_augmentation = TextGeneration(
        llm=llm,
        template=simplification_prompt,
        columns=["style_prompt"],
        name="simplification_augmentation",
        output_mappings={"generation": "simplified_prompt"},
        input_batch_size=2,
    )
    quality_augmentation = TextGeneration(
        llm=llm,
        template=quality_prompt,
        columns=["style_prompt"],
        name="quality_augmentation",
        output_mappings={"generation": "quality_prompt"},
        input_batch_size=2,
    )
    group_columns = GroupColumns(columns=["model_name"])
    keep_columns = KeepColumns(
        columns=[
            "prompt",
            "category",
            "subcategory",
            "style_prompt",
            "quality_prompt",
            "simplified_prompt",
        ]
    )

    (
        load_data >> category_selector >> style_augmentation >> [quality_augmentation, simplification_augmentation]
        >> group_columns >> keep_columns
    )

## Let's run the pipeline and push the resulting dataset to the hub
if __name__ == "__main__":
    num_examples = 15000
    distiset = pipeline.run(
        use_cache=True,
        parameters={
            load_data.name: {
                "num_examples": num_examples,
                "repo_id": "data-is-better-together/imgsys-results-prompts-shuffled-cleaned-deduplicated-english",
            }
        },
    )
    dataset_name = "data-is-better-together/imgsys-results-prompts-style_v2_part1"
    distiset.push_to_hub(
        repo_id=dataset_name,
        include_script=True,
        generate_card=False,
        token=os.getenv("HF_TOKEN"),
    )
Error
/Users/davidberenstein/Documents/programming/argilla/data-is-better-together/community-efforts/image_preferences/01_synthetic_data_generation.py
[11/20/24 11:57:03] INFO ['distilabel.pipeline'] 💾 Loading `_BatchManager` from cache: base.py:818
'/Users/davidberenstein/.cache/distilabel/pipelines/image_preferences_synthetic_data_generation/547690a76b408c68dbc115acd73d686a459f1bb5/executions/d9d5ad105e3564c6a30f68fd97510d36831dba42/batch_manager.json'
INFO ['distilabel.pipeline'] 📝 Pipeline data will be written to base.py:866
'/Users/davidberenstein/.cache/distilabel/pipelines/image_preferences_synthetic_data_generation/547690a76b408c68dbc115acd73d686a459f1bb5/executions/d9d5ad105e3564c6a30f68fd97510d36831dba42/data/steps_outputs'
INFO ['distilabel.pipeline'] ⌛ The steps of the pipeline will be loaded in stages: base.py:889
* Stage 0:
- 'load_dataset' (results cached, won't be loaded and executed)
- 'category_selector' (results cached, won't be loaded and executed)
- 'style_augmentation' (results cached, won't be loaded and executed)
- 'quality_augmentation'
- 'simplification_augmentation' (results cached, won't be loaded and executed)
- 'group_columns_0'
- 'keep_columns_0'
[11/20/24 11:57:04] DEBUG ['distilabel.pipeline'] Steps to be loaded in stage 0: ['quality_augmentation', 'group_columns_0', 'keep_columns_0'] base.py:1177
DEBUG ['distilabel.pipeline'] Running 1 replica of step 'quality_augmentation' with ID 0... base.py:1339
DEBUG ['distilabel.pipeline'] Running 1 replica of step 'group_columns_0' with ID 0... base.py:1339
DEBUG ['distilabel.pipeline'] Running 1 replica of step 'keep_columns_0' with ID 0... base.py:1339
INFO ['distilabel.pipeline'] ⏳ Waiting for all the steps of stage 0 to load... base.py:1183
DEBUG ['distilabel.pipeline'] Steps from stage 0 loaded: {'quality_augmentation': -999, 'group_columns_0': -999, 'keep_columns_0': -999} base.py:1193
[11/20/24 11:57:06] DEBUG ['distilabel.step.quality_augmentation'] Step 'quality_augmentation' loaded! step_wrapper.py:102
DEBUG ['distilabel.step.quality_augmentation'] Notifying load of step 'quality_augmentation' (replica ID 0)... step_wrapper.py:137
DEBUG ['distilabel.pipeline'] Step 'quality_augmentation' loaded replicas: 1 base.py:1129
DEBUG ['distilabel.step.group_columns_0'] Step 'group_columns_0' loaded! step_wrapper.py:102
DEBUG ['distilabel.step.group_columns_0'] Notifying load of step 'group_columns_0' (replica ID 0)... step_wrapper.py:137
DEBUG ['distilabel.pipeline'] Step 'group_columns_0' loaded replicas: 1 base.py:1129
DEBUG ['distilabel.step.keep_columns_0'] Step 'keep_columns_0' loaded! step_wrapper.py:102
DEBUG ['distilabel.step.keep_columns_0'] Notifying load of step 'keep_columns_0' (replica ID 0)... step_wrapper.py:137
DEBUG ['distilabel.pipeline'] Step 'keep_columns_0' loaded replicas: 1 base.py:1129
[11/20/24 11:57:07] DEBUG ['distilabel.pipeline'] Steps from stage 0 loaded: {'quality_augmentation': 1, 'group_columns_0': 1, 'keep_columns_0': 1} base.py:1193
INFO ['distilabel.pipeline'] ⏳ Steps from stage 0 loaded: 3/3 base.py:1216
* 'quality_augmentation' replicas: 1/1
* 'group_columns_0' replicas: 1/1
* 'keep_columns_0' replicas: 1/1
INFO ['distilabel.pipeline'] ✅ All the steps from stage 0 have been loaded! base.py:1220
DEBUG ['distilabel.pipeline'] Waiting for output batch from step... base.py:908
It gets stuck here and nothing else is processed.
Expected behaviour
I would expect it to run from cache.
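Concretely, this is what I mean (a sketch that reuses `pipeline`, `load_data` and `num_examples` from the reproduction script above, not a separate runnable file): re-running the exact same call should reuse the cached batches and only execute the steps that are not cached yet.

# Expected recovery behaviour: the second, identical invocation should pick up
# the cached batches instead of hanging. According to the log, 'load_dataset',
# 'category_selector', 'style_augmentation' and 'simplification_augmentation'
# are cached, so only 'quality_augmentation', 'group_columns_0' and
# 'keep_columns_0' should actually run.
distiset = pipeline.run(
    use_cache=True,  # same flag as in the original run
    parameters={
        load_data.name: {
            "num_examples": num_examples,
            "repo_id": "data-is-better-together/imgsys-results-prompts-shuffled-cleaned-deduplicated-english",
        }
    },
)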
Desktop:
Package version: 1.4.1
Python version: 3.10