[Bug]: RuntimeError: Fastdup execution failed #361

Open
rapidcrawler opened this issue Jan 7, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@rapidcrawler

What happened?

imgs_embs_array is a NumPy array of image embeddings.

np.save(ip_dir+ip_file_name, imgs_embs_array)

from fastdup.engine import Fastdup
fd = Fastdup(input_dir=ip_dir)
imgs_embs_array_loaded = np.load(ip_dir+ip_file_name)
fd.run(embeddings=imgs_embs_array_loaded, annotations=annotations_df, overwrite=True)
2025-01-07 07:16:21 [FATAL] Failed to read any features
fastdup C++ error received:  2025-01-07 07:16:21 [FATAL] Failed to read any features
RuntimeError: Fastdup execution failed

What did you expect to see?

No response

What version of fastdup were you running on?

2.14

What version of Python were you running on?

Python 3.10

Operating System

[GCC 13.3.0]

Reproduction steps

No response

Relevant log output

No response

Attach a screenshot [Optional]

Screen Shot 2025-01-07 at 13 20 00 PM

Contact Details [Optional]

[email protected]

@rapidcrawler added the bug label on Jan 7, 2025
@rapidcrawler
Author

Even with the code simplified as below, I still get the same error message:

np.save("./input_dir/img_embds_numpy.npy", imgs_embs_array)
import fastdup
fd = fastdup.create(work_dir="work_dir/", input_dir="input_dir/")
fd.run()
NoneType: None
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[66], line 4
      2 import fastdup
      3 fd = fastdup.create(work_dir="work_dir/", input_dir="input_dir/")
----> 4 fd.run()

File /opt/conda/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
    154     fastdup_func_params['model_path'] = model_path
    155 fastdup_func_params.update(kwargs)
--> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
    158             overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:146, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    144     else:
    145         fastdup_capture_exception(f"V1:{func.__name__}", ex)
--> 146     raise ex
    148 except Exception as ex:
    149     fastdup_capture_exception(f"V1:{func.__name__}", ex)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:137, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    135 try:
    136     start_time = time.time()
--> 137     ret = func(*args, **kwargs)
    138     fastdup_performance_capture(f"V1:{func.__name__}", start_time)
    139     return ret

File /opt/conda/lib/python3.10/site-packages/fastdup/fastdup_controller.py:618, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, run_explore, dataset_name, verbose, run_fast, **fastdup_kwargs)
    616 if not run_fast:
    617     if fastdup.run(fastdup_input, work_dir=str(self._work_dir), logger=self._logger, **fastdup_kwargs) != 0:
--> 618         raise RuntimeError('Fastdup execution failed')
    620     # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
    621     if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:

RuntimeError: Fastdup execution failed

@dbickson
Collaborator

dbickson commented Jan 7, 2025

Hello @rapidcrawler
The proper way to save binary features to be read by fastdup is by the call: https://visual-layer.readme.io/docs/v02xx-api#save_binary_feature

An example of loading the features is here:
https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/feature_vectors.ipynb
You need to pass the embeddings to fastdup like this:

fd = fastdup.create(input_dir="images/", work_dir='output')  
fd.run(annotations=filenames, embeddings=feature_vec)

Please try it out and let us know if this works.
BTW, to debug this more easily, please pass verbose=True to the run() call.
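
Before calling run() with in-memory embeddings, it may help to rule out malformed input first. Below is a minimal sketch of such a check. The helper name check_embeddings is hypothetical and not part of fastdup, and the requirements it enforces (a 2-D float32 matrix with one row per file name and no NaN or inf values) are assumptions about what the feature reader expects, not behaviour confirmed in this thread.

```python
import numpy as np


def check_embeddings(embeddings, filenames):
    """Return a C-contiguous float32 2-D array, or raise ValueError.

    Hypothetical helper (not a fastdup function). The checks encode
    assumptions about the expected input: rows x dims, numeric,
    one row per file name, finite values only.
    """
    arr = np.asarray(embeddings)
    # Ragged or non-numeric input ends up with a non-numeric dtype.
    if not np.issubdtype(arr.dtype, np.number):
        raise ValueError(f"embeddings must be numeric, got dtype {arr.dtype}")
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array (rows x dims), got shape {arr.shape}")
    if arr.shape[0] != len(filenames):
        raise ValueError(
            f"{arr.shape[0]} embedding rows but {len(filenames)} file names"
        )
    # Convert to float32 in row-major order (assumed to be what fastdup reads).
    arr = np.ascontiguousarray(arr, dtype=np.float32)
    if not np.isfinite(arr).all():
        raise ValueError("embeddings contain NaN or inf values")
    return arr
```

The returned array can then be passed as the embeddings argument of run() in the call shown above, alongside the same list of file names.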

@rapidcrawler
Author

rapidcrawler commented Jan 7, 2025

Thanks @dbickson, it's working now.

The general idea helped. Since I currently have only the image embeddings and no direct access to the images, I couldn't use save_binary_feature.

However, passing the available embeddings via fd.run(embeddings=np.array(embs)) worked for me. Updated working code below:

import numpy as np
from fastdup.engine import Fastdup

# embs and annotations_df are defined earlier in the notebook
fd = Fastdup(input_dir="/")
fd.run(embeddings=np.array(embs),
       annotations=annotations_df,
       overwrite=True)

@rapidcrawler
Author

rapidcrawler commented Jan 7, 2025

BTW @dbickson, is there any reason why the library returns an error if I pass more than 5k embeddings at a time?

For example, the code below slices the first 5k rows; it runs successfully and returns the expected result:

start = dt.now()
from fastdup.engine import Fastdup

fd = Fastdup(input_dir="/")


fd.run(embeddings=np.array(embs)[:5000]
       , annotations=annotations_df.head(5000)
       , overwrite=True
       , verbose=True)

df_sim  = fd.similarity()
end = dt.now()
df_sim

However, if I increase the slice size to 6k or 10k, it fails with the error below (RuntimeError: Fastdup execution failed):

start = dt.now()
from fastdup.engine import Fastdup

fd = Fastdup(input_dir="/")


fd.run(embeddings=np.array(embs)[:6000]
       , annotations=annotations_df.head(6000)
       , overwrite=True
       , verbose=True)

df_sim  = fd.similarity()
end = dt.now()
df_sim
fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "/"
Work directory is set to "work_dir"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

2025-01-07 18:05:25 [FATAL] Failed to read any features
NoneType: None
fastdup C++ error received:  2025-01-07 18:05:25 [FATAL] Failed to read any features
 

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[42], line 7
      2 from fastdup.engine import Fastdup
      4 fd = Fastdup(input_dir="/")
----> 7 fd.run(embeddings=np.array(embs)[:6000]
      8        , annotations=annotations_df.head(6000)
      9        , overwrite=True)
     11 df_sim  = fd.similarity()
     12 end = dt.now()

File /opt/conda/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
    154     fastdup_func_params['model_path'] = model_path
    155 fastdup_func_params.update(kwargs)
--> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
    158             overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:146, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    144     else:
    145         fastdup_capture_exception(f"V1:{func.__name__}", ex)
--> 146     raise ex
    148 except Exception as ex:
    149     fastdup_capture_exception(f"V1:{func.__name__}", ex)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:137, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    135 try:
    136     start_time = time.time()
--> 137     ret = func(*args, **kwargs)
    138     fastdup_performance_capture(f"V1:{func.__name__}", start_time)
    139     return ret

File /opt/conda/lib/python3.10/site-packages/fastdup/fastdup_controller.py:618, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, run_explore, dataset_name, verbose, run_fast, **fastdup_kwargs)
    616 if not run_fast:
    617     if fastdup.run(fastdup_input, work_dir=str(self._work_dir), logger=self._logger, **fastdup_kwargs) != 0:
--> 618         raise RuntimeError('Fastdup execution failed')
    620     # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
    621     if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:

RuntimeError: Fastdup execution failed

@rapidcrawler
Author

Today it only runs if the embeddings have about 500 rows. Anything above 500 embeddings throws the RuntimeError: Fastdup execution failed error. Is there an API rate limit or internal throttling that reduces, day by day, the number of embeddings that can be processed?
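
One way to narrow this down is to binary-search the slice size between a size known to work and a size known to fail. The sketch below assumes the failure is reproducible and monotonic in the slice size (every size up to some limit passes, everything above it fails); if the limit moves between runs, as described above, the result only describes the current run. The helper largest_passing_size and the try_run callback are hypothetical names introduced for illustration.

```python
def largest_passing_size(try_run, known_good, known_bad):
    """Return the largest n in [known_good, known_bad) with try_run(n) True.

    try_run(n) must return True on success and False on failure.
    Assumes try_run(known_good) is True, try_run(known_bad) is False,
    and that success is monotonic in n.
    """
    if known_good >= known_bad:
        raise ValueError("known_good must be smaller than known_bad")
    lo, hi = known_good, known_bad  # invariant: lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if try_run(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical usage with the objects from this thread (embs, annotations_df):
#
# def try_run(n):
#     fd = Fastdup(input_dir="/")
#     try:
#         fd.run(embeddings=np.array(embs)[:n],
#                annotations=annotations_df.head(n),
#                overwrite=True)
#         return True
#     except RuntimeError:
#         return False
#
# print(largest_passing_size(try_run, 5000, 6000))
```

If the search converges on the same row every time, inspecting that row of the embeddings and annotations is a reasonable next step; if it does not, the cause is more likely something outside the input data.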

@dbickson
Collaborator

dbickson commented Jan 9, 2025

Hi @rapidcrawler, this is weird. Can you call run() with verbose=1 so we can see what the issue is?

Alternatively, you can use the v0.2 API. First call
fastdup.save_binary_feature(work_dir, file_list, embedding) to save the binary files to work_dir.
Then call
fastdup.run(input_dir, work_dir, run_mode=2, threshold=0) to compute the similarities. The output will be at
work_dir/similarity.csv

Let us know if this worked for you.
