[Bug]: RuntimeError: Fastdup execution failed #361

rapidcrawler · 2025-01-07T07:51:14Z

What happened?

imgs_embs_array is a NumPy array of image embeddings.

np.save(ip_dir+ip_file_name, imgs_embs_array)

from fastdup.engine import Fastdup
fd = Fastdup(input_dir=ip_dir)
imgs_embs_array_loaded = np.load(ip_dir+ip_file_name)
fd.run(embeddings=imgs_embs_array_loaded, annotations=annotations_df, overwrite=True)

2025-01-07 07:16:21 [FATAL] Failed to read any features
fastdup C++ error received:  2025-01-07 07:16:21 [FATAL] Failed to read any features
RuntimeError: Fastdup execution failed

What did you expect to see?

No response

What version of fastdup were you running on?

2.14

What version of Python were you running on?

Python 3.10

Operating System

[GCC 13.3.0]

Reproduction steps

No response

Relevant log output

No response


rapidcrawler · 2025-01-07T09:35:48Z

Even when simplified to the code below, I still get the same error message:

np.save("./input_dir/img_embds_numpy.npy", imgs_embs_array)
import fastdup
fd = fastdup.create(work_dir="work_dir/", input_dir="input_dir/")
fd.run()

NoneType: None
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[66], line 4
      2 import fastdup
      3 fd = fastdup.create(work_dir="work_dir/", input_dir="input_dir/")
----> 4 fd.run()

File /opt/conda/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
    154     fastdup_func_params['model_path'] = model_path
    155 fastdup_func_params.update(kwargs)
--> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
    158             overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:146, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    144     else:
    145         fastdup_capture_exception(f"V1:{func.__name__}", ex)
--> 146     raise ex
    148 except Exception as ex:
    149     fastdup_capture_exception(f"V1:{func.__name__}", ex)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:137, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    135 try:
    136     start_time = time.time()
--> 137     ret = func(*args, **kwargs)
    138     fastdup_performance_capture(f"V1:{func.__name__}", start_time)
    139     return ret

File /opt/conda/lib/python3.10/site-packages/fastdup/fastdup_controller.py:618, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, run_explore, dataset_name, verbose, run_fast, **fastdup_kwargs)
    616 if not run_fast:
    617     if fastdup.run(fastdup_input, work_dir=str(self._work_dir), logger=self._logger, **fastdup_kwargs) != 0:
--> 618         raise RuntimeError('Fastdup execution failed')
    620     # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
    621     if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:

RuntimeError: Fastdup execution failed

dbickson · 2025-01-07T12:41:21Z

Hello @rapidcrawler
The proper way to save binary features to be read by fastdup is by the call: https://visual-layer.readme.io/docs/v02xx-api#save_binary_feature

Example for loading the feature is here:
https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/feature_vectors.ipynb
You need to pass the embeddings to fastdup like this:

fd = fastdup.create(input_dir="images/", work_dir='output')  
fd.run(annotations=filenames, embeddings=feature_vec)

Please try it out and let us know if this works.
BTW, to make debugging easier, please pass verbose=True to the run() call.

rapidcrawler · 2025-01-07T16:40:36Z

Thanks @dbickson, it's working now.

The general idea helped. Since I don't have direct access to the images right now, only the image embeddings, I couldn't use save_binary_feature.

But passing the available embeddings via fd.run(embeddings=np.array(embs)) worked for me. Updated working code below:

from fastdup.engine import Fastdup
fd = Fastdup(input_dir="/")
fd.run(embeddings=np.array(embs)
       , annotations=annotations_df
       , overwrite=True)
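Since the embeddings go straight into fd.run(), it may be worth ruling out obvious input problems before the call. The helper below is a minimal sketch and not part of fastdup: check_embeddings is a hypothetical name, and nothing in this thread confirms that any of these conditions causes "Failed to read any features". It only checks that the row count matches the annotations, that every row has the same length, and that all values are finite.

```python
import math
from typing import List, Sequence


def check_embeddings(embs: Sequence[Sequence[float]], num_annotations: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper for illustration only. It iterates row by row, so it
    accepts a list of lists (and is expected to accept a 2-D array as well).
    """
    if len(embs) == 0:
        return ["embeddings are empty"]

    problems = []
    if len(embs) != num_annotations:
        problems.append(
            f"row count mismatch: {len(embs)} embeddings vs {num_annotations} annotations"
        )

    dim = len(embs[0])  # every row is compared against the first row's length
    for i, row in enumerate(embs):
        if len(row) != dim:
            problems.append(f"row {i} has {len(row)} values, expected {dim}")
        elif not all(math.isfinite(float(v)) for v in row):
            problems.append(f"row {i} contains NaN or infinity")
    return problems


if __name__ == "__main__":
    sample = [[0.1, 0.2, 0.3], [0.4, float("nan"), 0.6]]
    for problem in check_embeddings(sample, num_annotations=2):
        print(problem)
```

With the variables from this thread, the call would look like check_embeddings(embs, len(annotations_df)); an empty result means this particular sketch found nothing wrong, not that the input is guaranteed to be accepted.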

rapidcrawler · 2025-01-07T18:07:20Z

BTW @dbickson, is there any reason why the library returns an error if I pass more than 5k embeddings at a time?

For example, the code below slices the first 5k embeddings; it runs successfully and returns the expected result:

start = dt.now()
from fastdup.engine import Fastdup

fd = Fastdup(input_dir="/")


fd.run(embeddings=np.array(embs)[:5000]
       , annotations=annotations_df.head(5000)
       , overwrite=True
       , verbose=True)

df_sim  = fd.similarity()
end = dt.now()
df_sim

However, if I increase the slice size to 6k or 10k, it fails with the error message below (RuntimeError: Fastdup execution failed):

start = dt.now()
from fastdup.engine import Fastdup

fd = Fastdup(input_dir="/")


fd.run(embeddings=np.array(embs)[:6000]
       , annotations=annotations_df.head(6000)
       , overwrite=True
       , verbose=True)

df_sim  = fd.similarity()
end = dt.now()
df_sim

fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "/"
Work directory is set to "work_dir"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

2025-01-07 18:05:25 [FATAL] Failed to read any features
NoneType: None
fastdup C++ error received:  2025-01-07 18:05:25 [FATAL] Failed to read any features
 

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[42], line 7
      2 from fastdup.engine import Fastdup
      4 fd = Fastdup(input_dir="/")
----> 7 fd.run(embeddings=np.array(embs)[:6000]
      8        , annotations=annotations_df.head(6000)
      9        , overwrite=True)
     11 df_sim  = fd.similarity()
     12 end = dt.now()

File /opt/conda/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
    154     fastdup_func_params['model_path'] = model_path
    155 fastdup_func_params.update(kwargs)
--> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
    158             overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:146, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    144     else:
    145         fastdup_capture_exception(f"V1:{func.__name__}", ex)
--> 146     raise ex
    148 except Exception as ex:
    149     fastdup_capture_exception(f"V1:{func.__name__}", ex)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:137, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    135 try:
    136     start_time = time.time()
--> 137     ret = func(*args, **kwargs)
    138     fastdup_performance_capture(f"V1:{func.__name__}", start_time)
    139     return ret

File /opt/conda/lib/python3.10/site-packages/fastdup/fastdup_controller.py:618, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, run_explore, dataset_name, verbose, run_fast, **fastdup_kwargs)
    616 if not run_fast:
    617     if fastdup.run(fastdup_input, work_dir=str(self._work_dir), logger=self._logger, **fastdup_kwargs) != 0:
--> 618         raise RuntimeError('Fastdup execution failed')
    620     # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
    621     if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:

RuntimeError: Fastdup execution failed

rapidcrawler · 2025-01-08T06:32:49Z

Today it only runs if the embeddings have ~500 rows. Anything more than 500 embeddings throws the RuntimeError: Fastdup execution failed error. Is there an API rate limit or internal throttling that reduces, day by day, the number of embeddings that can be processed?
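One way to narrow down where the failure starts is to bisect on the slice size instead of trying sizes by hand. The sketch below is only an illustration: largest_passing_size and succeeds are hypothetical names, and it assumes the failure is monotonic in the number of rows (once a size fails, every larger size fails too). The changing limit described above suggests that assumption may not hold, in which case the result only says that some row in that range is worth inspecting.

```python
from typing import Callable


def largest_passing_size(succeeds: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest n in [low, high) for which succeeds(n) is True.

    Assumes succeeds(low) is True, succeeds(high) is False, and that failures
    are monotonic in n.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if succeeds(mid):
            low = mid   # mid still works, so the boundary is above it
        else:
            high = mid  # mid fails, so the boundary is at or below it
    return low


# Hypothetical wrapper around the call used in this thread (not executed here):
# def succeeds(n):
#     try:
#         fd.run(embeddings=np.array(embs)[:n],
#                annotations=annotations_df.head(n), overwrite=True)
#         return True
#     except RuntimeError:
#         return False

if __name__ == "__main__":
    # Stand-in predicate so the sketch runs without fastdup installed.
    print(largest_passing_size(lambda n: n <= 5432, 5000, 6000))  # prints 5432
```

If the boundary lands on the same index across runs, the embedding and annotation at that index are the first things to look at; if it moves between runs, the problem is probably not tied to a single row.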

dbickson · 2025-01-09T12:33:41Z

Hi @rapidcrawler, this is weird. Can you call run() with verbose=1 so we can see what the issue is?

Alternatively, you can use the v0.2 API:
fastdup.save_binary_feature(work_dir, file_list, embedding) saves the binary files to work_dir.
Then
fastdup.run(input_dir, work_dir, run_mode=2, threshold=0) computes the similarities. The output is written to
work_dir/similarity.csv

Let us know if this worked for you.
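Once work_dir/similarity.csv exists, it can be inspected with the standard library alone. The following is a minimal sketch: load_pairs is a hypothetical helper, and the column name "distance" used in the usage note is an assumption about the file's header that should be checked against the actual output.

```python
import csv
from typing import Dict, List


def load_pairs(csv_path: str, score_column: str, min_score: float) -> List[Dict[str, str]]:
    """Return rows (dicts keyed by the CSV's own header) whose score_column is >= min_score."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Values are read as strings, so convert the score before comparing.
    return [row for row in rows if float(row[score_column]) >= min_score]
```

A call would look like load_pairs("work_dir/similarity.csv", "distance", 0.9). Passing the column name as an argument keeps the helper independent of the exact header fastdup writes.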

rapidcrawler added the bug label on Jan 7, 2025
