Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only run backend autodetection once per process #83

Merged
merged 5 commits into from
Jan 20, 2019

Conversation

ssssam
Copy link
Contributor

@ssssam ssssam commented Jan 3, 2019

This is a performance improvement. See #81 for details.

@ssssam ssssam changed the title Only run backend audiodetection once per process Only run backend autodetection once per process Jan 3, 2019
@ssssam ssssam force-pushed the sam/choose-backend branch from 53ae0d9 to 064a29d Compare January 3, 2019 00:20
Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!! Thank you for doing this rearrangement—the speedup sounds great.

I understand the problem with API layering you described in #81, but I think we can solve it a different way without introducing the _backends global variable. I'm worried about introducing this implicit, hidden global state—this kind of thing can make the library unpredictable in corner cases, possibly problematic in multithreaded programs, and sometimes inconvenient in testing suites, which need to be careful to reset the state between tests.

Can we start first with a more explicit design? A straightforward way might be to keep the new backends parameter to audio_open but to automatically save away the result of available_backends.

It will be a little more work, but then we can also let pyacoustid clients specify their backends. I think it's OK for the beets chroma plugin to directly use audioread to save this backend information.

if proc.returncode == 0:
return True
else:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return proc.returncode == 0 or return not bool(proc.returncode) would be a little more concise than this if/else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

result.append(maddec.MadAudioFile)

# FFmpeg.
if _ffmpeg_available():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably fine to just call ffdec.available() here instead of making a _ffmpeg_available function here. The only reason the import is conditional for the MAD backend, for example, is that the import will fail if pymad isn't installed. The ffmpeg backend can be imported anyway (and must be imported to run the check!).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also worth noting (but not worth worrying about) that this is introducing a small overhead in the one-call case: we now have to invoke ffmpeg twice, once as a backend availability check and once to actually do the decoding. In the old world, we just ran the ffmpeg command once and let it fail if the command isn't available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll change it.

@ssssam
Copy link
Contributor Author

ssssam commented Jan 3, 2019

Thanks for the review. Yes, it's a good idea to avoid global state. We might need to rationalize the situation in pyacoustid and the beets/chroma plugin though:

  • Currently, beets/chroma doesn't depend on audioread. Can I change it so that it does depend on a future release of audioread that contains this branch?

  • pyacoustid currently only soft-depends on audioread. The fingerprint_file() function has a conditional which uses audioread and the 'chromaprint' library if both are available, and otherwise the fpcalc program. Are the benefits to supporting all these options? We can easily add a parameter to the fingerprint_file() function that accepts a list of audioread backends, but the behaviour of the function will then depend on 4 different factors: whether audioread is installed, the contents of the backends parameter, whether the chromaprint library is installed, and whether the fpcalc program is installed. It's difficult for API users to test that their code works in each of these different situations, so maybe we could take this opportunity to simplify pyacoustid, and remove either the audioread or the fpcalc option?

@sampsyo
Copy link
Member

sampsyo commented Jan 3, 2019

Currently, beets/chroma doesn't depend on audioread. Can I change it so that it does depend on a future release of audioread that contains this branch?

Yes, sounds good!

  • pyacoustid currently only soft-depends on audioread. The fingerprint_file() function has a conditional which uses audioread and the 'chromaprint' library if both are available, and otherwise the fpcalc program.

Oh right; I completely forgot about that feature. This makes the situation a little more complicated, for both pyacoustid and for beets. It is actually really quite convenient for pyacoustid to support the command-line tool, which sometimes easier to install (e.g., on Windows) than the library. On the other hand, talking directly to the library is the "right way" to do things on systems with a proper package manager.

Here's a strategy that might be nice to have, even if we weren't doing this whole "persistent backend detection" thing: empower pyacoustid with flags that control which backend is used. The appropriate function could behave as it does now when nothing specific is supplied (preserving backwards compatibility), or it could:

  • Accept a path to the fpcalc executable to use (and fail if it's not available/executable).
  • Accept a list of audioread backends to use (and not attempt to use fpcalc at all).

Having that additional level of control would be nice, and it wouldn't impact the configuration-free use case either.

We were reading audio data with the Gst.Buffer.extract_dup() method.
This allocates new memory using g_malloc() and returns it to the caller.
The memory needs to be freed with g_free(), however the PyGObject
bindings do not do this.

We can avoid problem by reading the audio data directory from the
underlying Gst.Memory object. In this case the Python interpreter is
responsible for copying the data and so it is able to correctly free
the memory after it's no longer needed.

I tested this by calling pyacoustid.fingerprint() on 34 .MP3 files in
sequence, and I saw the following difference:

  - memory usage without the patch: 557052 KB
  - memory usage with the patch: 52752 KB

The generated acoustid fingerprints were identical with and without the
patch.
The backend detection can be inefficient. For example, detecting
CoreAudio may lead to 'ctypes' invoking a C compiler in order to
search for the library.

If a process calls audio_open() many times, it will be more efficient
if backend detection only happens once. This is now possible as there
is a new available_backend() method which you can call once and then
save the result. The list of backends can be passed to audio_open()
in a new 'backends' keyword arg.

Fixes beetbox#81
This avoids a circular dependency between __init__.py and the decoder
modules.
@ssssam ssssam force-pushed the sam/choose-backend branch from 064a29d to 774af29 Compare January 20, 2019 13:19
@ssssam
Copy link
Contributor Author

ssssam commented Jan 20, 2019

Since #84 I've weirdly not seen any performance issues related to backend autodetection. I'm not sure why. However, I think it's still a good idea to add a separate available_backends() method, so I've updated this branch and slightly cleaned it up.

@sampsyo sampsyo merged commit 774af29 into beetbox:master Jan 20, 2019
sampsyo added a commit that referenced this pull request Jan 20, 2019
Only run backend autodetection once per process
sampsyo added a commit that referenced this pull request Jan 20, 2019
sampsyo added a commit that referenced this pull request Jan 20, 2019
@sampsyo
Copy link
Member

sampsyo commented Jan 20, 2019

Looks great; thank you! This will be useful to have in case we find performance problems due to backend detection in the future.

On that note, thanks for all your attention to this project over the last few weeks! I've sent you an org invitation in case you'd like to commit more work in the future. PRs are of course always still an option if you want a code review, but you should now have permission to merge your own if that ever seems appropriate.

@ssssam
Copy link
Contributor Author

ssssam commented Jan 20, 2019

OK, thanks! I promise not to sneak in any bitcoin mining code ;)

@ssssam ssssam deleted the sam/choose-backend branch January 20, 2019 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants