Only run backend autodetection once per process #83

ssssam · 2019-01-03T00:14:01Z

This is a performance improvement. See #81 for details.

sampsyo

Awesome!! Thank you for doing this rearrangement—the speedup sounds great.

I understand the problem with API layering you described in #81, but I think we can solve it a different way without introducing the _backends global variable. I'm worried about introducing this implicit, hidden global state—this kind of thing can make the library unpredictable in corner cases, possibly problematic in multithreaded programs, and sometimes inconvenient in testing suites, which need to be careful to reset the state between tests.

Can we start first with a more explicit design? A straightforward way might be to keep the new backends parameter to audio_open but to automatically save away the result of available_backends.

It will be a little more work, but then we can also let pyacoustid clients specify their backends. I think it's OK for the beets chroma plugin to directly use audioread to save this backend information.

sampsyo · 2019-01-03T03:29:24Z

audioread/ffdec.py

+    if proc.returncode == 0:
+        return True
+    else:
+        return False


Just return proc.returncode == 0 or return not bool(proc.returncode) would be a little more concise than this if/else.

sampsyo · 2019-01-03T03:32:25Z

audioread/__init__.py

+        result.append(maddec.MadAudioFile)
+
+    # FFmpeg.
+    if _ffmpeg_available():


It's probably fine to just call ffdec.available() here instead of making a _ffmpeg_available function here. The only reason the import is conditional for the MAD backend, for example, is that the import will fail if pymad isn't installed. The ffmpeg backend can be imported anyway (and must be imported to run the check!).

It's also worth noting (but not worth worrying about) that this is introducing a small overhead in the one-call case: we now have to invoke ffmpeg twice, once as a backend availability check and once to actually do the decoding. In the old world, we just ran the ffmpeg command once and let it fail if the command isn't available.

OK, I'll change it.

ssssam · 2019-01-03T18:17:41Z

Thanks for the review. Yes, it's a good idea to avoid global state. We might need to rationalize the situation in pyacoustid and the beets/chroma plugin though:

Currently, beets/chroma doesn't depend on audioread. Can I change it so that it does depend on a future release of audioread that contains this branch?
pyacoustid currently only soft-depends on audioread. The fingerprint_file() function has a conditional which uses audioread and the 'chromaprint' library if both are available, and otherwise the fpcalc program. Are the benefits to supporting all these options? We can easily add a parameter to the fingerprint_file() function that accepts a list of audioread backends, but the behaviour of the function will then depend on 4 different factors: whether audioread is installed, the contents of the backends parameter, whether the chromaprint library is installed, and whether the fpcalc program is installed. It's difficult for API users to test that their code works in each of these different situations, so maybe we could take this opportunity to simplify pyacoustid, and remove either the audioread or the fpcalc option?

sampsyo · 2019-01-03T21:13:05Z

Currently, beets/chroma doesn't depend on audioread. Can I change it so that it does depend on a future release of audioread that contains this branch?

Yes, sounds good!

pyacoustid currently only soft-depends on audioread. The fingerprint_file() function has a conditional which uses audioread and the 'chromaprint' library if both are available, and otherwise the fpcalc program.

Oh right; I completely forgot about that feature. This makes the situation a little more complicated, for both pyacoustid and for beets. It is actually really quite convenient for pyacoustid to support the command-line tool, which sometimes easier to install (e.g., on Windows) than the library. On the other hand, talking directly to the library is the "right way" to do things on systems with a proper package manager.

Here's a strategy that might be nice to have, even if we weren't doing this whole "persistent backend detection" thing: empower pyacoustid with flags that control which backend is used. The appropriate function could behave as it does now when nothing specific is supplied (preserving backwards compatibility), or it could:

Accept a path to the fpcalc executable to use (and fail if it's not available/executable).
Accept a list of audioread backends to use (and not attempt to use fpcalc at all).

Having that additional level of control would be nice, and it wouldn't impact the configuration-free use case either.

We were reading audio data with the Gst.Buffer.extract_dup() method. This allocates new memory using g_malloc() and returns it to the caller. The memory needs to be freed with g_free(), however the PyGObject bindings do not do this. We can avoid problem by reading the audio data directory from the underlying Gst.Memory object. In this case the Python interpreter is responsible for copying the data and so it is able to correctly free the memory after it's no longer needed. I tested this by calling pyacoustid.fingerprint() on 34 .MP3 files in sequence, and I saw the following difference: - memory usage without the patch: 557052 KB - memory usage with the patch: 52752 KB The generated acoustid fingerprints were identical with and without the patch.

The backend detection can be inefficient. For example, detecting CoreAudio may lead to 'ctypes' invoking a C compiler in order to search for the library. If a process calls audio_open() many times, it will be more efficient if backend detection only happens once. This is now possible as there is a new available_backend() method which you can call once and then save the result. The list of backends can be passed to audio_open() in a new 'backends' keyword arg. Fixes beetbox#81

This avoids a circular dependency between __init__.py and the decoder modules.

ssssam · 2019-01-20T13:22:57Z

Since #84 I've weirdly not seen any performance issues related to backend autodetection. I'm not sure why. However, I think it's still a good idea to add a separate available_backends() method, so I've updated this branch and slightly cleaned it up.

Only run backend autodetection once per process

sampsyo · 2019-01-20T17:44:16Z

Looks great; thank you! This will be useful to have in case we find performance problems due to backend detection in the future.

On that note, thanks for all your attention to this project over the last few weeks! I've sent you an org invitation in case you'd like to commit more work in the future. PRs are of course always still an option if you want a code review, but you should now have permission to merge your own if that ever seems appropriate.

ssssam · 2019-01-20T19:35:49Z

OK, thanks! I promise not to sneak in any bitcoin mining code ;)

ssssam changed the title ~~Only run backend audiodetection once per process~~ Only run backend autodetection once per process Jan 3, 2019

ssssam force-pushed the sam/choose-backend branch from 53ae0d9 to 064a29d Compare January 3, 2019 00:20

sampsyo reviewed Jan 3, 2019

View reviewed changes

ssssam mentioned this pull request Jan 3, 2019

ffdec: Avoid AttributeError during destruction #82

Merged

ssssam added 5 commits January 20, 2019 13:58

Add a way to detect if ffmpeg is available

3ff2f68

Simplify ffdec detection

9e8da57

Separate exceptions into their own file

774af29

This avoids a circular dependency between __init__.py and the decoder modules.

ssssam force-pushed the sam/choose-backend branch from 064a29d to 774af29 Compare January 20, 2019 13:19

sampsyo merged commit 774af29 into beetbox:master Jan 20, 2019

sampsyo added a commit that referenced this pull request Jan 20, 2019

Merge pull request #83 from ssssam/sam/choose-backend

8685358

Only run backend autodetection once per process

sampsyo added a commit that referenced this pull request Jan 20, 2019

Simplify docstring for #83

cc45665

sampsyo added a commit that referenced this pull request Jan 20, 2019

Documentation for #83

5ec8a6a

ssssam deleted the sam/choose-backend branch January 20, 2019 21:39

sampsyo mentioned this pull request May 18, 2019

audioread-2.1.7 avconv not found #93

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only run backend autodetection once per process #83

Only run backend autodetection once per process #83

ssssam commented Jan 3, 2019

sampsyo left a comment

sampsyo Jan 3, 2019

ssssam Jan 3, 2019

sampsyo Jan 3, 2019

sampsyo Jan 3, 2019

ssssam Jan 3, 2019

ssssam commented Jan 3, 2019 •

edited

Loading

sampsyo commented Jan 3, 2019

ssssam commented Jan 20, 2019

sampsyo commented Jan 20, 2019

ssssam commented Jan 20, 2019

Only run backend autodetection once per process #83

Only run backend autodetection once per process #83

Conversation

ssssam commented Jan 3, 2019

sampsyo left a comment

Choose a reason for hiding this comment

sampsyo Jan 3, 2019

Choose a reason for hiding this comment

ssssam Jan 3, 2019

Choose a reason for hiding this comment

sampsyo Jan 3, 2019

Choose a reason for hiding this comment

sampsyo Jan 3, 2019

Choose a reason for hiding this comment

ssssam Jan 3, 2019

Choose a reason for hiding this comment

ssssam commented Jan 3, 2019 • edited Loading

sampsyo commented Jan 3, 2019

ssssam commented Jan 20, 2019

sampsyo commented Jan 20, 2019

ssssam commented Jan 20, 2019

ssssam commented Jan 3, 2019 •

edited

Loading