Confusing POSIXlt Warning !!! #4

Open
englianhu opened this issue Dec 29, 2022 · 3 comments
@englianhu
Owner

englianhu commented Dec 29, 2022

Issue

3.6 GiB [世博量化研究院*]❯ 样本
[data.table]: 
# A tibble: 1,324,800 × 12
   年月日时分           年份  季度  月份    周 周日  周分计 日分计 时分计  序列 日期      
   <dttm>              <dbl> <int> <int> <dbl> <chr>  <int>  <int>  <int> <int> <date>    
 1 2015-01-05 00:01:00  2015     1     1     1 周一       1      1      1     1 2015-01-05
 2 2015-01-05 00:02:00  2015     1     1     1 周一       2      2      2     2 2015-01-05
 3 2015-01-05 00:03:00  2015     1     1     1 周一       3      3      3     3 2015-01-05
 4 2015-01-05 00:04:00  2015     1     1     1 周一       4      4      4     4 2015-01-05
 5 2015-01-05 00:05:00  2015     1     1     1 周一       5      5      5     5 2015-01-05
 6 2015-01-05 00:06:00  2015     1     1     1 周一       6      6      6     6 2015-01-05
 7 2015-01-05 00:07:00  2015     1     1     1 周一       7      7      7     7 2015-01-05
 8 2015-01-05 00:08:00  2015     1     1     1 周一       8      8      8     8 2015-01-05
 9 2015-01-05 00:09:00  2015     1     1     1 周一       9      9      9     9 2015-01-05
10 2015-01-05 00:10:00  2015     1     1     1 周一      10     10     10    10 2015-01-05
# … with 1,324,790 more rows, and 1 more variable: 闭市价 <dbl>
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Warning messages:
1: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
2: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
3: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
4: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
5: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
6: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
7: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
8: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
9: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
10: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
3.6 GiB [世博量化研究院*]❯ head(样本$年月日时分)
[1] "2015-01-05 00:01:00 CST" "2015-01-05 00:02:00 CST" "2015-01-05 00:03:00 CST"
[4] "2015-01-05 00:04:00 CST" "2015-01-05 00:05:00 CST" "2015-01-05 00:06:00 CST"
3.6 GiB [世博量化研究院*]❯ class(样本$年月日时分)
[1] "POSIXct" "POSIXt"
3.6 GiB [世博量化研究院*]❯ anyNA(样本$年月日时分)
[1] FALSE
3.6 GiB [世博量化研究院*]❯ anyNA.POSIXlt(样本$年月日时分)
[1] FALSE
3.6 GiB [世博量化研究院*]❯ anyDuplicated.data.frame(样本)
[1] 0

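"Ode of the Great Qin" (《大秦赋》)
Sorrow comes from the shamans and cannot be cut off;
what can relieve this sorrow? Only casting out the shamans.
The Qin people herded horses, beginning between the Qian and Wei rivers;
the shamans' descendants were cast aside entirely, and clay pots rumble like thunder.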

Reporting to the celestial court: the time-series issue is shown above, and a related case follows.

OK so what's happening is that the evaluation environment of j has strptime overwritten locally:

https://github.com/Rdatatable/data.table/blob/a8e926a48a87cd669ffe2ee310a73173be652f2b/R/data.table.R#L1151-L1154

From there, it doesn't discriminate on whether strptime is operating/producing a column. I don't think there's any easy fix to be more selective on this warning, but the message could be more helpful.

Note that AFAIK strptime can always be replaced by an as.POSIXct call (which wraps to as.POSIXlt-->strptime anyway), in which case j will be ignorant to strptime being called "under the hood" (since the call chain will end up at base::as.POSIXct and so base::strptime is used, not SDenv$strptime)
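
To make the quoted point concrete, here is a minimal base-R sketch (the sample strings and the variable names `x`, `lt`, `ct` are made up for illustration, not taken from the data above). It only shows that `strptime()` returns a `POSIXlt` object while `as.POSIXct()` parses the same text into a `POSIXct` vector, which is the type a data.table column would normally hold; it does not reproduce the warning itself.

```r
# Hypothetical input strings in the same layout as the 年月日时分 column
x <- c("2015-01-05 00:01:00", "2015-01-05 00:02:00")
fmt <- "%Y-%m-%d %H:%M:%S"

# strptime() returns POSIXlt (a list of date-time fields)
lt <- strptime(x, fmt, tz = "Asia/Shanghai")

# as.POSIXct() parses the same strings but returns POSIXct (seconds since the epoch)
ct <- as.POSIXct(x, format = fmt, tz = "Asia/Shanghai")

class(lt)
class(ct)

# Both represent the same instants
same_instants <- all(as.POSIXct(lt) == ct)
```

Inside `j`, calling `as.POSIXct()` directly would therefore be the replacement the quoted comment suggests for a bare `strptime()` call.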

Operating system

3.6 GiB [世博量化研究院*]❯ session_info()$platform
 setting  value
 version  R version 4.2.2 (2022-10-31)
 os       RedFlag Desktop 11.0
 system   x86_64, linux-gnu
 ui       RStudio
 language zh_CN:en
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Asia/Shanghai
 date     2022-12-29
 rstudio  2022.12.0+353 Elsbeth Geranium (desktop)
 pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
3.6 GiB [世博量化研究院*]❯ sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: RedFlag Desktop 11.0

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
 [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
...
...

Related resources:

@englianhu
Owner Author

# --------- eval = FALSE ---------
library(stringr)  # provides str_split()

## Check whether the path has already been set.
if (!exists('.蜀道')) {
  .蜀道 <- getwd() |> 
    {\(.) str_split(., '/')}() |> 
    {\(.) c('/', .[[1]][2:5])}() |> 
    {\(.) c(., 'binary.com-interview-question-data/')}() |> 
    {\(.) paste(., collapse = '/')}() |> 
    {\(.) substring(., 2)}()
}

if (!exists('.蜀道仓库')) {
  .蜀道仓库 <- paste0(.蜀道, '文艺数据库/fx/USDJPY/仓库/')
}

## If the data is not yet in the environment, read it from file.
if (!exists('样本')) {
  样本 <- readRDS(paste0(.蜀道, '文艺数据库/fx/USDJPY/样本1.rds'))
}


✖ 3.6 GiB [世博量化研究院*]❯ 样本 %>% filter(is.na(闭市价))
Source: local data table [7,200 x 12]
Call:   `_DT2`[is.na(闭市价)]

  年月日时分           年份  季度  月份    周 周日  周分计 日分计 时分计    序列 日期      
  <dttm>              <dbl> <int> <int> <dbl> <chr>  <int>  <int>  <int>   <int> <date>    
1 2018-01-02 00:01:00  2017     1     1    53 周二       1      1      1 1123201 2018-01-02
2 2018-01-02 00:02:00  2017     1     1    53 周二       3      3      3 1123203 2018-01-02
3 2018-01-02 00:03:00  2017     1     1    53 周二       5      5      5 1123205 2018-01-02
4 2018-01-02 00:04:00  2017     1     1    53 周二       7      7      7 1123207 2018-01-02
5 2018-01-02 00:05:00  2017     1     1    53 周二       9      9      9 1123209 2018-01-02
6 2018-01-02 00:06:00  2017     1     1    53 周二      11     11     11 1123211 2018-01-02
# … with 7,194 more rows, and 1 more variable: 闭市价 <dbl>
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# Use as.data.table()/as.data.frame()/as_tibble() to access results
Warning messages:
1: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
2: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
3: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
4: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
5: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
6: In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
  NAs introduced by coercion to integer range
3.6 GiB [世博量化研究院*]❯ 样本 %>% filter(is.na(闭市价)) %>% data.frame
           年月日时分 年份 季度 月份 周 周日 周分计 日分计 时分计    序列       日期 闭市价
1 2018-01-02 00:01:00 2017    1    1 53 周二      1      1      1 1123201 2018-01-02     NA
2 2018-01-02 00:02:00 2017    1    1 53 周二      3      3      3 1123203 2018-01-02     NA
3 2018-01-02 00:03:00 2017    1    1 53 周二      5      5      5 1123205 2018-01-02     NA
4 2018-01-02 00:04:00 2017    1    1 53 周二      7      7      7 1123207 2018-01-02     NA
5 2018-01-02 00:05:00 2017    1    1 53 周二      9      9      9 1123209 2018-01-02     NA
6 2018-01-02 00:06:00 2017    1    1 53 周二     11     11     11 1123211 2018-01-02     NA
7 2018-01-02 00:07:00 2017    1    1 53 周二     13     13     13 1123213 2018-01-02     NA
8 2018-01-02 00:08:00 2017    1    1 53 周二     15     15     15 1123215 2018-01-02     NA
 [ reached 'max' / getOption("max.print") -- omitted 7192 rows ]
✖ 3.6 GiB [世博量化研究院*]❯ 样本 %>% filter(is.na(闭市价)) %>% data.frame %>% tail
              年月日时分 年份 季度 月份 周 周日 周分计 日分计 时分计    序列       日期 闭市价
7195 2018-01-06 23:55:00 2017    1    1 53 周六   7189   1429     49 1137589 2018-01-06     NA
7196 2018-01-06 23:56:00 2017    1    1 53 周六   7191   1431     51 1137591 2018-01-06     NA
7197 2018-01-06 23:57:00 2017    1    1 53 周六   7193   1433     53 1137593 2018-01-06     NA
7198 2018-01-06 23:58:00 2017    1    1 53 周六   7195   1435     55 1137595 2018-01-06     NA
7199 2018-01-06 23:59:00 2017    1    1 53 周六   7197   1437     57 1137597 2018-01-06     NA
7200 2018-01-07 00:00:00 2017    1    1 53 周日   7199   1439     59 1137599 2018-01-07     NA

@englianhu
Owner Author

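Hacker intrusion, human factor: after running
样本$年月日时分 %<>% ymd_hms(tz = 'Asia/Shanghai')
样本$年月日时分 %<>% as_datetime
the warning still appears, whereas with
样本[, 年月日时分 := format(年月日时分, '%Y-%m-%d %H:%M:%S', tz = 'Asia/Shanghai', usetz = TRUE)]
the date-time column is converted to text.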
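
The type change described here can be seen on a single value. This is a small base-R sketch with a made-up timestamp and hypothetical names (`x`, `txt`, `back`); it shows that `format()` always returns character, so overwriting the column with its result loses the `POSIXct` class, and that keeping the text in a separate variable leaves the typed value intact.

```r
fmt <- "%Y-%m-%d %H:%M:%S"
x <- as.POSIXct("2015-01-05 00:01:00", tz = "Asia/Shanghai")

# format() produces a character string, with the zone abbreviation appended by usetz = TRUE
txt <- format(x, fmt, tz = "Asia/Shanghai", usetz = TRUE)
class(txt)

# Parsing a formatted string (without the zone suffix) gets back to POSIXct
back <- as.POSIXct(format(x, fmt, tz = "Asia/Shanghai"), format = fmt, tz = "Asia/Shanghai")
class(back)
```

If a text rendering is needed, one option is to store it in a new column rather than assigning it over 年月日时分.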

@englianhu
Owner Author

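Side note: the data to use should be 样本2, in which NA values have been filtered out and 周分计, 日分计, 时分计, 序列 and the other derived fields have been reassigned, rather than 样本1.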
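
As a rough illustration of what "filter the NA values and reassign the counters" could look like, here is a base-R sketch on a toy data frame. Everything in it is an assumption: `sample1` and `sample2` stand in for 样本1 and 样本2, `close` for 闭市价, `seq_id` for 序列 and `day_minute` for 日分计, and the way 样本2 was actually built is not shown in this thread.

```r
# Toy stand-in for the thread's data (hypothetical values, six one-minute bars)
sample1 <- data.frame(
  datetime = as.POSIXct("2018-01-01 23:58:00", tz = "Asia/Shanghai") + 60 * 0:5,
  close    = c(112.6, 112.7, NA, NA, 112.8, 112.9)
)
sample1$date <- format(sample1$datetime, "%Y-%m-%d", tz = "Asia/Shanghai")

# Drop rows with a missing closing price
sample2 <- sample1[!is.na(sample1$close), ]
rownames(sample2) <- NULL

# Rebuild the running index, then a per-day counter that restarts at 1 each date
sample2$seq_id <- seq_len(nrow(sample2))
sample2$day_minute <- ave(sample2$seq_id, sample2$date, FUN = seq_along)

sample2
```

The weekly and hourly counters could be rebuilt the same way by grouping on a week or hour key instead of the date.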
