Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding basic set of visualizations #5

Merged
merged 14 commits into from
Jun 20, 2024

Conversation

SRSteinkamp
Copy link
Collaborator

@SRSteinkamp SRSteinkamp commented Jul 21, 2023

Closes #

Adding the following plotting functions:

  • raw data plot (from peakdet)
  • average peak (using a peak detection step)
  • power spectrum
  • histogram wrapper

Proposed Changes

  • Adding visualization submodule to the new interface folder.

Change Type

  • bugfix (+0.0.1)
  • minor (+0.1.0)
  • major (+1.0.0)
  • refactoring (no version update)
  • test (no version update)
  • infrastructure (no version update)
  • documentation (no version update)
  • other

Checklist before review

  • I added everything I wanted to add to this PR.
  • [Code or tests only] I wrote/updated the necessary docstrings.
  • [Code or tests only] I ran and passed tests locally.
  • [Documentation only] I built the docs locally.
  • My contribution is harmonious with the rest of the code: I'm not introducing repetitions.
  • My code respects the adopted style, especially linting conventions.
  • The title of this PR is explanatory on its own, enough to be understood as part of a changelog.
  • I added or indicated the right labels.
  • I added information regarding the timeline of completion for this PR.
  • Please, comment on my PR while it's a draft and give me feedback on the development!

@SRSteinkamp SRSteinkamp added Enhancement New feature or request Minormod This PR generally closes an "Enhancement" issue. It increments the minor version (0.+1.0) and removed Enhancement New feature or request labels Dec 4, 2023
@SRSteinkamp SRSteinkamp mentioned this pull request Dec 5, 2023
8 tasks
@SRSteinkamp SRSteinkamp requested a review from m-miedema December 6, 2023 08:53
Comment on lines 100 to 103
if phys.fs > 1_000:
# This should probably go to the logs, or warning
print(f"Fs = {phys.fs} Hz too high, resampling to {target_fs} Hz")
phys = pk.operations.interpolate_physio(phys, target_fs, kind="linear")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could consider raising an error if target_fs > 1000 Hz.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the sentiment above, I think it should be possible to re-sample to larger than 1000 Hz if one really wants to and know what they are doing.

The warning / message is probably misleading. I think I would change it to phys.fs > target_fs and display the message: "Fs = xx Hz higher than target_fs, resampling to target_fs or something.

For plotting purposes, I still think it makes sense to enforce an upper limit for the sampling rate - and 1000 Hz seemed reasonable enough for me, at the Brainhack ...

Copy link
Member

@m-miedema m-miedema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the changes are functional and look good! I was testing using Python 3.11.4 in Windows. I am leaving some specific feedback to improve small things throughout, particularly for the plot_average_peak function.

Other things that should be considered, possibly in a future PR (?) but also reasonably in this one, include:

  • functionality to save plots
  • logging functionality
  • raising errors, especially if the plotting function does not receive a Physio-type object
  • adding test coverage

Comment on lines +59 to +63
phys: pk.Physio,
window: List = [-3, 3],
target_fs: float = 1000.0,
peak_dist: float = 1.0,
peak_thr: float = 0.1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, a better way to implement this function would be to check if the Physio object already has undergone peak detection, and if so, to use the existing peaks.

Right now, it seems like these peak detection variables are buried in a fairly unexpected place. It could improve functionality in the long run to add these as fields in the Physio object itself, since they seem to be frequently specified throughout this package and it is important to be consistent with them if they need to be altered for different data.

Copy link
Collaborator Author

@SRSteinkamp SRSteinkamp Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one decision we made, is that peak-detection is fast enough, so we do it here specifically for plotting purposes, on a potentially lower sampling rate than required for other estimates of the signal.

Copy link
Member

@m-miedema m-miedema Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test whether these parameters robustly detected peaks across tricky data and a number of physio-types? My concern is that the way they are specified in the function here may not translate to different data types (e.g. a relatively high heart rate, or data with an unexpected scaling or low SNR). Since these parameters will end up being embedded (maybe unexpectedly to the end user) inside the function call, they might not be obvious to change. If the user has spent some time optimizing peak detection with peak det, it makes sense to propagate those parameters forward here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't really test this (don't really have a good grasp on the parameters). Just remembered, the reason for redoing peak detection was due to this bug https://github.com/physiopy/peakdet/issues/63, and higher fs than 1000Hz is too much for plotting. We've also been operating under the idea of applying QC to rawdata first, we did not want to pre-suppose a fixed pipeline.
Until the bug is fixed, I'd suggest, to leave the function as is. But for the automated pipeline, I think we should have an autodetection of modality and set some default parameters accordingly, but also allow the parameters to be overridden by the user in the CLI.

Comment on lines +114 to +120
peaks = list(
filter(
lambda ps: ((ps + window[0]) >= 0)
and ((ps + window[1] + 1) < phys.data.shape[0]),
phys.peaks,
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part was a bit confusing to me, why are we not using these peaks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, it is discarding peaks where the data is not fully covered by the window. The idea of the figure is to plot the signal shape around peaks. (Hope that makes sense)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does, thanks! That's my understanding of what it's doing, but I'm wondering whether it wouldn't make more sense to include all peaks in the average, regardless of whether some data is being replicated inside the window.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean with "replicated". The filtering makes the process slightly easier, as it avoid indexing errors. Otherwise, some padding would be needed, but that complicates the timing (very slightly).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, I misunderstood. That makes sense to me the way it is then, thanks! Maybe add a comment to say that this is just a question of indexing errors so some peaks at the beginning/end of the series may be excluded.

freqs, psd = multimodal.power_spectrum(phys)

ax.plot(freqs, psd)
ax.set(xlabel="Frequencies", ylabel="V^2/Hz")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to be more specific with the units we do know (Frequencies (Hz)) and not assume volts (Power Spectral Density)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Units are really not my thing (: ... what would that be for power a power spectrum? P/Hz?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PSD, the units depend on the units of your time series. Since this isn't something we ask for (e.g. could be mV, could be V, could be something else entirely) it's better just not to include it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could have a default unit (e.g. [a.u.]), except for when they are specified by the user (e.g. in BIDS metadata or in the CLI), in which case we can be more specific

Copy link

@beccaclements99 beccaclements99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I was testing using Python 3.11.6 on a Mac. I agree with Mary's comments, and also left some comments suggesting minor changes to the function descriptions to improve clarity. I also suggested some non-essential changes to the plot_histogram function

physioqc/interfaces/visualizations.py Outdated Show resolved Hide resolved

fig, ax = check_create_figure(ax, figsize=(7, 5))

ax.hist(signal)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't essential, but you could consider adding axis labels for the histogram and the option for the user to change the number of bins

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with adding bins, but not sure if labels should be added here, as it really is just supposed to be a wrapper for the histogram function.

window : List, optional
window size around the peak in s, by default [-3, 3]
target_fs : float, optional
sampling rate for plotting and peak detection, by default 1000.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like target_fs is only used if the sampling rate of the data is greater than 1000, so it'd be helpful to mention that here

@SRSteinkamp
Copy link
Collaborator Author

In general, the changes are functional and look good! I was testing using Python 3.11.4 in Windows. I am leaving some specific feedback to improve small things throughout, particularly for the plot_average_peak function.

Other things that should be considered, possibly in a future PR (?) but also reasonably in this one, include:

* [ ]  functionality to save plots

* [ ]  logging functionality

* [ ]  raising errors, especially if the plotting function does not receive a Physio-type object

* [ ]  adding test coverage

Hi, thanks for the feedback, the idea for PhysioQC (as far as I remember) was to create MRIQC like reportlets. I have that already implemented in a very quick and dirty fashion, building on physiopy/peakdet#11

In principle, what would happen is that you call the CLI on the raw physiological data in your BIDS directory and it creates html outputs. In that workflow the images would be saved.

That logic might provide some context regarding a few of the coding decisions. But also it has been a long while, so I'll have a closer look :)

Do you have a good guideline for implementing logging? That is something I have no experience in.

@smoia
Copy link
Member

smoia commented Apr 19, 2024

I'm not sure this is exactly what you are looking for, but:
https://betterstack.com/community/guides/logging/loguru/

(see also phys2bids for a standard python logger implementation: https://github.com/physiopy/phys2bids/blob/d796fe17f7af4aed9605bd51f4ab3d8c0609a3ae/phys2bids/phys2bids.py#L46)

In general, you will need to declare a log object at the beginning of each file (always the same object), then any time you would print something, make a log item instead (e.g. here).
If the print message is utterly important (e.g. you're changing an input parameter, or something doesn't seem right with the input parameters, but not wrong enough to raise an exception, or you're doing some unexpected trick to make the input work, e.g. transposing arrays) that's a log warning. If the print message is to inform the user that something is happening (good to do that from time to time), that's a log info, and if you are implementing loguru, it's good to throw a success every now and then after something worked out (e.g. you finished computing a metric, or created a graph, or put together the report).
@maestroque might be able to help you better!

Then, in the workflow & CLI, we can make implement a logger report level (like here)

@m-miedema
Copy link
Member

m-miedema commented Apr 19, 2024

For logging specifically, I think we can make another PR - what do you think @smoia ? It's mostly the code coverage/testing that I think needs to be added in before we can merge this. But I'm not sure of the current approach to tests in this toolbox, so let me know if that's also something better to address later.

@smoia
Copy link
Member

smoia commented Apr 19, 2024

That sounds like a sensible idea - especially after physiopy/peakdet#11 is merged as well, so we have one complete log-related PR. It would be better to open an issue to track the log points raised here though!

@SRSteinkamp
Copy link
Collaborator Author

I know this might break the general development cycle, but it might be worth also move testing to a different PR so we can make progress on physiopy/peakdet#11 .

@m-miedema
Copy link
Member

@SRSteinkamp I agree! In that case, I think that the main thing to address before we merge the PR are the specific comments we've made so far. @smoia what are your thoughts about opening a separate issue for adding test coverage?

@m-miedema
Copy link
Member

Actually, there's already an open Loguru issue, so no need to open a new one for logging.

@smoia
Copy link
Member

smoia commented Apr 19, 2024

We've never been strict about testing, so I'm inclined to say ok, given we're not in beta stage yet - as long as the testing happens (chatGPT is a great friend for that), that's fine!

Copy link
Member

@m-miedema m-miedema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good to go on this for now and move over to the workflow PR, yes? :)

@SRSteinkamp SRSteinkamp merged commit 0855a9c into physiopy:master Jun 20, 2024
1 check passed
@smoia
Copy link
Member

smoia commented Jun 20, 2024

🚀 PR was released in 0.4.0 🚀

@smoia smoia added the released This issue/pull request has been released label Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Minormod This PR generally closes an "Enhancement" issue. It increments the minor version (0.+1.0) released This issue/pull request has been released
Projects
Status: Done
Status: PR review in progress
Development

Successfully merging this pull request may close these issues.

4 participants