load_observations() returns too many deaths #96
Comments
have you tried tracing the data back? The summary you're drawing from is made using |
working on it :) |
Linked to: #93 |
it's taking me a while to track down - do you know off the top of your head where the epinow regional data comes from? |
covid-us-forecasts/models/rt/update-rt.R Line 19 in ab05dee
|
I'm very confused... three different versions to get data, three different results. Presumably I'm just tired and it is really obvious what's going on...
|
load_observations exists to load the truth data used for modelling which is stored by EpiNow2 (hence the file path). This means we can evaluate and ensemble against the correct data rather than using data that is updated retrospectively. Once anomaly correction is added this function needs to draw from another folder in which non-adjusted truth data is stored by date. |
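A minimal sketch of the date-keyed loading described above, assuming the folder layout from the issue body (one sub-folder per target date, each holding a `reported_cases.csv`). The helper name `load_truth_snapshot`, its `root` argument, and the column names `region`, `date` and `confirm` are hypothetical and chosen for illustration; the real `load_observations()` may differ.

```r
library(data.table)

# Hypothetical helper: read the truth data snapshot stored for one target date.
# `root` is assumed to hold one sub-folder per target date.
load_truth_snapshot <- function(target_date, root) {
  path <- file.path(root, as.character(target_date), "reported_cases.csv")
  if (!file.exists(path)) {
    stop("No snapshot stored for target date ", target_date)
  }
  obs <- fread(path)
  obs[, date := as.Date(date)]
  obs[]
}

# Toy example: write a small snapshot to a temporary folder and read it back.
root <- file.path(tempdir(), "summary")
dir.create(file.path(root, "2021-02-22"), recursive = TRUE, showWarnings = FALSE)
fwrite(
  data.table(
    region = c("Ohio", "Texas"),
    date = as.Date("2021-02-20"),
    confirm = c(10L, 20L)
  ),
  file.path(root, "2021-02-22", "reported_cases.csv")
)
snapshot <- load_truth_snapshot("2021-02-22", root)
```

Passing the folder in as an argument means the same function could later point at a separate folder of non-adjusted truth data without other changes.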
The reason data input is different in the time series is that they were written by different people and standardisation was difficult. I don't know why load_observations() gives a different result but it needs more investigation. I'd suggest graphing it. In general the use of data here has always been quite disjointed and a little messy. It would be good to rationalise aside from this potential bug. |
ok it seems like at least the first two do agree (only for some reason the green one has a week of data more when filtering for the same period). Difference apparently mostly comes from Ohio. But for some reason the US curves don't agree even if almost all of the state curves do agree. This is I assume because the data we download with get_us_deaths() gets corrected? Questions are then:
|
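One way to narrow down where two sources diverge, as in the state-by-state comparison above, is to join them on region and date and rank regions by total disagreement. This is only a sketch: the tables, column names and all numbers are made up for illustration and are not the project's data.

```r
library(data.table)

# Toy stand-ins for the stored summary data and the freshly downloaded data.
# All values are invented; only the structure matters here.
stored <- data.table(
  region = rep(c("Ohio", "Texas"), each = 2),
  date = rep(as.Date(c("2021-02-13", "2021-02-20")), times = 2),
  deaths = c(500, 900, 300, 310)
)
downloaded <- data.table(
  region = rep(c("Ohio", "Texas"), each = 2),
  date = rep(as.Date(c("2021-02-13", "2021-02-20")), times = 2),
  deaths = c(500, 450, 300, 310)
)

# Join on region and date, keeping rows that appear in only one source.
both <- merge(
  stored, downloaded,
  by = c("region", "date"), all = TRUE,
  suffixes = c("_stored", "_downloaded")
)
both[, difference := deaths_stored - deaths_downloaded]

# Total absolute disagreement per region, largest first.
by_region <- both[
  , .(abs_diff = sum(abs(difference), na.rm = TRUE)), by = region
][order(-abs_diff)]

# National totals per date from each source, to compare the US curves.
national <- both[
  , .(us_stored = sum(deaths_stored, na.rm = TRUE),
      us_downloaded = sum(deaths_downloaded, na.rm = TRUE)),
  by = date
]
```

Using `all = TRUE` in the join also surfaces dates that exist in only one source, such as an extra week of data, as rows with `NA` on one side.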
This sounds like potentially it is due to the internal anomaly handling in I am not sure a split out package is required to handle data only processing? Though potentially for some of the processing tasks. We need:
Anomaly correction in https://github.com/epiforecasts/EpiNow2/blob/d2b2aa6e76190000d5aad37e66f132f7c44d4644/R/create.R#L34 |
Sounds very reasonable. Should we have a quick chat at some point to discuss how to move forward and divide up work? As this is related to #88 (that one isn't merged because the plotting hasn't happened yet): how should I proceed with the PR? Keep it open until we have solved the data issues, then do all the past plots there and then merge? |
Sounds like a good idea. Not sure why this is blocking #88? I'd prefer to keep PRs modular if possible. |
What is missing from #88 is an update of past plots with all models. I'm
however unsure what data to use for the plotting
Sam Abbott <[email protected]> wrote on Wed, 24 Feb 2021, 20:51:
… Sounds like a good idea.
Not sure why this is blocking #88? I'd prefer
to keep PRs modular if possible.
|
Can you use the structure as present and we can fix the underlying data it draws from later? |
👍 did that. PR can be merged now I think and then we can address this issue |
The function load_observations(), which reads the case data from here:
obs <- fread(here("models", "rt", "data", "summary", target_date, "reported_cases.csv"))
returns too many deaths for the US as a whole.
Numbers in the data look different from the ones on Our World in Data and also differ from the ones returned by get_us_deaths().
They do, however, look similar to the ones shown on Google.