forked from robjhyndman/linear-hierarchical-forecasting
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlhf.tex
1550 lines (1262 loc) · 81.6 KB
/
lhf.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[11pt,a4paper,]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\usepackage{unicode-math}
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\PassOptionsToPackage{hyphens}{url} % url is loaded by hyperref
\usepackage[unicode=true]{hyperref}
\hypersetup{
pdftitle={Fast forecast reconciliation using linear models},
pdfkeywords={hierarchical forecasting, grouped forecasting, reconciling forecast, linear regression},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\usepackage{geometry}
\geometry{left=2.5cm,right=2.5cm,top=2.5cm,bottom=2.5cm}
\usepackage[style=authoryear-comp,]{biblatex}
\addbibresource{references.bib}
\usepackage{longtable,booktabs}
% Fix footnotes in tables (requires footnote package)
\IfFileExists{footnote.sty}{\usepackage{footnote}\makesavenoteenv{long table}}{}
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
% set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\title{Fast forecast reconciliation using linear models}
%% MONASH STUFF
%% CAPTIONS
\RequirePackage{caption}
\DeclareCaptionStyle{italic}[justification=centering]
{labelfont={bf},textfont={it},labelsep=colon}
\captionsetup[figure]{style=italic,format=hang,singlelinecheck=true}
\captionsetup[table]{style=italic,format=hang,singlelinecheck=true}
%% FONT
\RequirePackage{bera}
\RequirePackage{mathpazo}
%% HEADERS AND FOOTERS
\RequirePackage{fancyhdr}
\pagestyle{fancy}
\rfoot{\Large\sffamily\raisebox{-0.1cm}{\textbf{\thepage}}}
\makeatletter
\lhead{\textsf{\expandafter{\@title}}}
\makeatother
\rhead{}
\cfoot{}
\setlength{\headheight}{15pt}
\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0.4pt}
\fancypagestyle{plain}{%
\fancyhf{} % clear all header and footer fields
\fancyfoot[C]{\sffamily\thepage} % except the center
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}}
%% MATHS
\RequirePackage{bm,amsmath}
\allowdisplaybreaks
%% GRAPHICS
\RequirePackage{graphicx}
\setcounter{topnumber}{2}
\setcounter{bottomnumber}{2}
\setcounter{totalnumber}{4}
\renewcommand{\topfraction}{0.85}
\renewcommand{\bottomfraction}{0.85}
\renewcommand{\textfraction}{0.15}
\renewcommand{\floatpagefraction}{0.8}
%\RequirePackage[section]{placeins}
%% SECTION TITLES
\RequirePackage[compact,sf,bf]{titlesec}
\titleformat{\section}[block]
{\fontsize{15}{17}\bfseries\sffamily}
{\thesection}
{0.4em}{}
\titleformat{\subsection}[block]
{\fontsize{12}{14}\bfseries\sffamily}
{\thesubsection}
{0.4em}{}
\titlespacing{\section}{0pt}{*5}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.2}
%% TITLE PAGE
\def\Date{\number\day}
\def\Month{\ifcase\month\or
January\or February\or March\or April\or May\or June\or
July\or August\or September\or October\or November\or December\fi}
\def\Year{\number\year}
\makeatletter
\def\wp#1{\gdef\@wp{#1}}\def\@wp{??/??}
\def\jel#1{\gdef\@jel{#1}}\def\@jel{??}
\def\showjel{{\large\textsf{\textbf{JEL classification:}}~\@jel}}
\def\nojel{\def\showjel{}}
\def\addresses#1{\gdef\@addresses{#1}}\def\@addresses{??}
\def\cover{{\sffamily\setcounter{page}{0}
\thispagestyle{empty}
\placefig{2}{1.5}{width=5cm}{monash2}
\placefig{16.9}{1.5}{width=2.1cm}{MBusSchool}
\begin{textblock}{4}(16.9,4)ISSN 1440-771X\end{textblock}
\begin{textblock}{7}(12.7,27.9)\hfill
\includegraphics[height=0.7cm]{AACSB}~~~
\includegraphics[height=0.7cm]{EQUIS}~~~
\includegraphics[height=0.7cm]{AMBA}
\end{textblock}
\vspace*{2cm}
\begin{center}\Large
Department of Econometrics and Business Statistics\\[.5cm]
\footnotesize http://monash.edu/business/ebs/research/publications
\end{center}\vspace{2cm}
\begin{center}
\fbox{\parbox{14cm}{\begin{onehalfspace}\centering\Huge\vspace*{0.3cm}
\textsf{\textbf{\expandafter{\@title}}}\vspace{1cm}\par
\LARGE\@author\end{onehalfspace}
}}
\end{center}
\vfill
\begin{center}\Large
\Month~\Year\\[1cm]
Working Paper \@wp
\end{center}\vspace*{2cm}}}
\def\pageone{{\sffamily\setstretch{1}%
\thispagestyle{empty}%
\vbox to \textheight{%
\raggedright\baselineskip=1.2cm
{\fontsize{24.88}{30}\sffamily\textbf{\expandafter{\@title}}}
\vspace{2cm}\par
\hspace{1cm}\parbox{14cm}{\sffamily\large\@addresses}\vspace{1cm}\vfill
\hspace{1cm}{\large\Date~\Month~\Year}\\[1cm]
\hspace{1cm}\showjel\vss}}}
\def\blindtitle{{\sffamily
\thispagestyle{plain}\raggedright\baselineskip=1.2cm
{\fontsize{24.88}{30}\sffamily\textbf{\expandafter{\@title}}}\vspace{1cm}\par
}}
\def\titlepage{{\cover\newpage\pageone\newpage\blindtitle}}
\def\blind{\def\titlepage{{\blindtitle}}\let\maketitle\blindtitle}
\def\titlepageonly{\def\titlepage{{\pageone\end{document}}}}
\def\nocover{\def\titlepage{{\pageone\newpage\blindtitle}}\let\maketitle\titlepage}
\let\maketitle\titlepage
\makeatother
%% SPACING
\RequirePackage{setspace}
\spacing{1.5}
%% LINE AND PAGE BREAKING
\sloppy
\clubpenalty = 10000
\widowpenalty = 10000
\brokenpenalty = 10000
\RequirePackage{microtype}
%% PARAGRAPH BREAKS
\setlength{\parskip}{1.4ex}
\setlength{\parindent}{0em}
%% HYPERLINKS
\RequirePackage{xcolor} % Needed for links
\definecolor{darkblue}{rgb}{0,0,.6}
\RequirePackage{url}
\makeatletter
\@ifpackageloaded{hyperref}{}{\RequirePackage{hyperref}}
\makeatother
\hypersetup{
citecolor=0 0 0,
breaklinks=true,
bookmarksopen=true,
bookmarksnumbered=true,
linkcolor=darkblue,
urlcolor=blue,
citecolor=darkblue,
colorlinks=true}
%% KEYWORDS
\newenvironment{keywords}{\par\vspace{0.5cm}\noindent{\sffamily\textbf{Keywords:}}}{\vspace{0.25cm}\par\hrule\vspace{0.5cm}\par}
%% ABSTRACT
\renewenvironment{abstract}{\begin{minipage}{\textwidth}\parskip=1.4ex\noindent
\hrule\vspace{0.1cm}\par{\sffamily\textbf{\abstractname}}\newline}
{\end{minipage}}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[showonlyrefs]{mathtools}
\usepackage[no-weekday]{eukdate}
%% BIBLIOGRAPHY
\makeatletter
\@ifpackageloaded{biblatex}{}{\usepackage[style=authoryear-comp, backend=biber, natbib=true]{biblatex}}
\makeatother
\ExecuteBibliographyOptions{bibencoding=utf8,minnames=1,maxnames=3, maxbibnames=99,dashed=false,terseinits=true,giveninits=true,uniquename=false,uniquelist=false,doi=false, isbn=false,url=true,sortcites=false}
\DeclareFieldFormat{url}{\texttt{\url{#1}}}
\DeclareFieldFormat[article]{pages}{#1}
\DeclareFieldFormat[inproceedings]{pages}{\lowercase{pp.}#1}
\DeclareFieldFormat[incollection]{pages}{\lowercase{pp.}#1}
\DeclareFieldFormat[article]{volume}{\mkbibbold{#1}}
\DeclareFieldFormat[article]{number}{\mkbibparens{#1}}
\DeclareFieldFormat[article]{title}{\MakeCapital{#1}}
\DeclareFieldFormat[inproceedings]{title}{#1}
\DeclareFieldFormat{shorthandwidth}{#1}
% No dot before number of articles
\usepackage{xpatch}
\xpatchbibmacro{volume+number+eid}{\setunit*{\adddot}}{}{}{}
% Remove In: for an article.
\renewbibmacro{in:}{%
\ifentrytype{article}{}{%
\printtext{\bibstring{in}\intitlepunct}}}
\makeatletter
\DeclareDelimFormat[cbx@textcite]{nameyeardelim}{\addspace}
\makeatother
\renewcommand*{\finalnamedelim}{%
%\ifnumgreater{\value{liststop}}{2}{\finalandcomma}{}% there really should be no funny Oxford comma business here
\addspace\&\space}
\wp{29/19}
\jel{C10,C14,C22}
\RequirePackage[absolute,overlay]{textpos}
\setlength{\TPHorizModule}{1cm}
\setlength{\TPVertModule}{1cm}
\def\placefig#1#2#3#4{\begin{textblock}{.1}(#1,#2)\rlap{\includegraphics[#3]{#4}}\end{textblock}}
\author{Mahsa~Ashouri, Rob J~Hyndman, Galit~Shmueli}
\addresses{\textbf{Mahsa Ashouri}\newline
Institute of Service Science, National Tsing Hua University, Taiwan
\newline{Email: \href{mailto:[email protected]}{\nolinkurl{[email protected]}}}\newline Corresponding author\\[1cm]
\textbf{Rob J Hyndman}\newline
Monash University, Clayton VIC 3800, Australia
\newline{Email: \href{mailto:[email protected]}{\nolinkurl{[email protected]}}}\\[1cm]
\textbf{Galit Shmueli}\newline
Institute of Service Science, National Tsing Hua University, Taiwan
\newline{Email: \href{mailto:[email protected]}{\nolinkurl{[email protected]}}}\\[1cm]
}
\date{\sf\Date~\Month~\Year}
\makeatletter
\lfoot{\sf Ashouri, Hyndman, Shmueli: \@date}
\makeatother
%% Any special functions or other packages can be loaded here.
\usepackage{booktabs}
\usepackage{float}
\usepackage{longtable}
\usepackage{cases}
\usepackage{array}
\usepackage{todonotes}
%\usepackage[ruled,vlined,linesnumbered]{algorithm2e}
%\usepackage{algpseudocode}
\usepackage{algorithm,algorithmic,amsmath}
%\usepackage[backend=biber]{biblatex}
%\usepackage[backend=biber, bibencoding=utf8, style=authoryear, citestyle=authoryear]{biblatex}
\mathtoolsset{showonlyrefs=true}
\allowdisplaybreaks
\def\addlinespace{}
\usepackage[section]{placeins}
\usepackage{float}
\let\origfigure\figure
\let\endorigfigure\endfigure
\renewenvironment{figure}[1][2] {
\expandafter\origfigure\expandafter[!htbp]
} {
\endorigfigure
}
\let\origtable\table
\let\endorigtable\endtable
\renewenvironment{table}[1][2] {
\expandafter\origtable\expandafter[!htbp]
} {
\endorigtable
}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{array}
\usepackage{multirow}
\usepackage{wrapfig}
\usepackage{float}
\usepackage{colortbl}
\usepackage{pdflscape}
\usepackage{tabu}
\usepackage{threeparttable}
\usepackage{threeparttablex}
\usepackage[normalem]{ulem}
\usepackage{makecell}
\usepackage{xcolor}
\begin{document}
\maketitle
\begin{abstract}
Forecasting hierarchical or grouped time series using a reconciliation approach involves two steps: computing base forecasts and reconciling the forecasts. Base forecasts can be computed by popular time series forecasting methods such as Exponential Smoothing (ETS) and Autoregressive Integrated Moving Average (ARIMA) models. The reconciliation step is a linear process that adjusts the base forecasts to ensure they are coherent. However using ETS or ARIMA for base forecasts can be computationally challenging when there are a large number of series to forecast, as each model must be numerically optimized for each series. We propose a linear model that avoids this computational problem and handles the forecasting and reconciliation in a single step. The proposed method is very flexible in incorporating external data, handling missing values and model selection. We illustrate our approach using a dataset on monthly Australian domestic tourism, as well as a simulated dataset. We compare our approach to reconciliation using ETS and ARIMA, and show that our approach is much faster while providing similar levels of forecast accuracy.
\end{abstract}
\begin{keywords}
hierarchical forecasting, grouped forecasting, reconciling forecast, linear regression
\end{keywords}
\hypertarget{introduction}{%
\section{Introduction}\label{introduction}}
Modern data collection tools have dramatically increased the amount of available time series data \autocite{januschowski2013forecasting}. For example, the internet of things and point-of-sale scanning produce huge volumes of time series in a short period of time. Naturally, there is an interest in forecasting these time series, yet forecasting large collections of time series is computationally challenging.
\hypertarget{hierarchical-and-grouped-time-series}{%
\subsection{Hierarchical and grouped time series}\label{hierarchical-and-grouped-time-series}}
In many cases, these time series can be structured and disaggregated based on hierarchies or groups such as geographic location, product type, gender, etc. An example of hierarchical time series is the monthly number of Australian domestic tourists, which can be disaggregated into different states and then into different zones. Figure \ref{fig:hierarchicalexample} shows a schematic of such a hierarchical time series structure with three levels. The top level is the total series, formed by aggregating all the bottom level series. In the middle level, series are aggregations of their own child series; for instance, series A is the aggregation of AW and AX. Finally, the bottom level series, includes the most disaggregated series. In our example, A might represent the Northern Territory (NT) state, which can be disaggregated into northern coast NT and central NT.
\begin{figure}
{\centering \includegraphics[width=200px,height=170px,trim=0 0 190 0,clip=true]{Paper-Figures/hierarchical_example}
}
\caption{An example of a two level hierarchical structure.}\label{fig:hierarchicalexample}
\end{figure}
Grouped time series involve more complicated aggregation structures compared to strictly hierarchical time series. To continue our Australian tourism example: suppose we have two grouping factors for each tourist that are not nested: purpose of travel (Business/Holiday) and sex (Male/Female). The disaggregated series for each combination of purpose of travel and sex can be combined to form purpose of travel subtotals, or sex subtotals. These subtotals can be combined to give the overall total. Both subtotals are of interest.
We can think of such structures as hierarchical time series without a unique hierarchy. A schematic of this grouped time series structure is shown in Figure \ref{fig:groupexample} with two grouping factors, each of two levels (A/B and C/D). The series in this structure can be split first into groups A and B and then subdivided further into C and D (left side), or split first into C and D and then subdivided into A and B (right side). The final disaggregation is identical in both cases, but the middle level aggregates are different.
\begin{figure}
{\centering \includegraphics[width=330px,height=180px]{lhf_files/figure-latex/groupexample-1}
}
\caption{An example of a two level grouped structure.}\label{fig:groupexample}
\end{figure}
We use the same notation \autocite[following][]{fpp2} for both hierarchical and grouped time series. We denote the total series at time \(t\) by \(y_t\), and the series at node \(Z\) (subaggregation level \(Z\)) and time \(t\) by \(y_{Z,t}\). For describing the relationships between series, we use an \(N\times M\) matrix, called the ``summing matrix'', denoted by \(\bm{S}\), in which \(N\) is the overall number of nodes and \(M\) is the number of bottom level nodes. For example in Figure \ref{fig:hierarchicalexample}, \(N = 7\) and \(M = 4\), while in Figure \ref{fig:groupexample}, \(N=9\) and \(M=4\). Then we can write \(\bm{y}_t=\bm{S}\bm{b}_t\), where \(\bm{y}_t\) is a vector of all the level nodes at time \(t\) and \(\bm{b}_t\) is the vector of all the bottom level nodes at time \(t\). For the example shown in Figure \ref{fig:groupexample}, the equation can be written as follows:
\begin{equation}\label{eq:Smatrixexample}
\begin{pmatrix}
y_{t}\\y_{A,t}\\y_{B,t}\\y_{C,t}\\y_{D,t}\\y_{AC,t}\\y_{AD,t}\\y_{BC,t}\\y_{BD,t}
\end{pmatrix} =
\begin{pmatrix}
1&1&1&1\\1&1&0&0\\0&0&1&1\\1&0&1&0\\0&1&0&1\\1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\\
\end{pmatrix}
\begin{pmatrix}
y_{AC,t}\\y_{AD,t}\\y_{BC,t}\\y_{BD,t}\\
\end{pmatrix}.
\end{equation}
\hypertarget{forecasting-hierarchical-time-series}{%
\subsection{Forecasting hierarchical time series}\label{forecasting-hierarchical-time-series}}
If we just forecast each series individually, we are ignoring the hierarchical or grouping structure, and the forecasts will not be ``coherent''. That is, forecasts for lower level series will not necessarily add up to forecasts for higher level series. This means forecasts will not add up in a way that is consistent with the aggregation structure of the time series collection \autocite{fpp2}.
There are several available methods that consider the hierarchical structure information when forecasting time series. These include the top-down \autocite{gross1990disaggregation,fliedner2001hierarchical}, bottom-up \autocite{kahn1998revisiting}, middle-out and optimal combination \autocite{hyndman2011optimal} approaches. In the top-down approach, we first forecast the total series and then disaggregate the forecast to form lower level series forecasts based on a set of historical or forecasted proportions \autocite[for details see][]{athanasopoulos2009hierarchical}. In the bottom-up approach, the forecasts in each level of the hierarchy can be computed by aggregating the bottom level series forecasts. However, we may not get good upper-level forecasts because the most disaggregated series can be noisy and so their forecasts are often inaccurate. In the middle-out approach, the process can be started from one of the middle levels and other forecasts can be computed using aggregation for upper levels and disaggregation for lower levels. Finally, optimal combination uses all the \(N\) forecasts for all of the series in the entire structure, and then uses an optimization process to reconcile the resulting forecasts. The advantage of the optimal combination method, compared with the other methods, is that it considers all information in the hierarchy, including any correlations among the series.
In the optimal combination method, reconciled forecasts can be computed using the following generalized least squares equation \autocite{mint2018}
\begin{equation}\label{eq:mint}
\tilde{\bm{y}}_{t+h}=\bm{S}(\bm{S}'\bm{W}_{h}^{-1}\bm{S})^{-1}\bm{S}'\bm{W}_{h}^{-1}\hat{\bm{y}}_{t+h},
\end{equation}
where \(\hat{\bm{y}}_{t+h}\) represents a vector of \(h\)-step-ahead base forecasts for all levels of the hierarchy, and \(\bm{W}_{h}\) is the covariance matrix of forecast errors for the \(h\)-step-ahead base forecasts.
Several possible methods for estimating \(\bm{W}_{h}\) are available. \textcite{mint2018} discuss a simple ``structural scaling'' approximation whereby \(\bm{W}_{h} = k_{h} \bm{\Lambda}\) with \(k_{h}\) being a positive constant, \(\bm{\Lambda} = \text{diag}(\bm{S}\bm{1})\), and \(\bm{1}\) being a unit vector of dimension \(M\) (the number of bottom level series). Note that \(\bm{\Lambda}\) simply contains the row sums of the summing matrix \(\bm{S}\), and that \(k_{h}\) will cancel out in \eqref{eq:mint}.\footnote{For simplicity of exposition, we used the structural scaling (wls\_struct) summing matrix for reconciliation in all the results. In \protect\hyperlink{appendixA}{Appendix A}, we compare two alternatives: Tables \ref{tab:Tourismdatadifrecrolling} and \ref{tab:Tourismdatadifrecfix} display the results for estimates of \(\bm{W}_h\) using a shrinkage estimator (\texttt{mint\_shrink}) and variance scaling (\texttt{wls\_var}) for the Australian tourism example.} Thus
\begin{equation}\label{eq:mint2}
\tilde{\bm{y}}_{t+h}=\bm{S}(\bm{S}'\bm{\Lambda}^{-1}\bm{S})^{-1}\bm{S}'\bm{\Lambda}^{-1}\hat{\bm{y}}_{t+h}.
\end{equation}
The most computationally challenging part of the optimal combination method is to produce all the base forecasts that make up \(\hat{\bm{y}}_{t+h}\). In many applications, there may be thousands or even millions of individual series, and each of them must be forecast independently. The most popular time series forecasting methods such as ETS and ARIMA models \autocite{fpp2} involve non-linear optimization routines to estimate the parameters via maximum likelihood estimation. Usually, multiple models are fitted for each series, and the best is selected by minimizing Akaike's Information Citerion \autocite{akaike1998information}. This computational challenges increases with the number of lower level series as well as in the number of aggregations of interest.
We therefore propose a new approach to compute the base forecasts that is both computationally fast while maintaining an acceptable forecasting accuracy level. Our proposed approach avoids the computational challenge of using ETS or ARIMA that require numerical optimization for each series. It is very flexible in terms of incorporating external data, handling missing values, and model selection. And finally, our approach handles the forecasting and reconciliation in a single step.
\hypertarget{proposed-approach-linear-model}{%
\section{\texorpdfstring{Proposed approach: Linear model \label{sec:proposedapproach1}}{Proposed approach: Linear model }}\label{proposed-approach-linear-model}}
Our proposed approach is based on using linear regression models for computing base forecasts. Suppose we have a linear model that we use for forecasting, and we wish to apply it to \(N\) different series which have some aggregation constraints. We have observations \(y_{t,i}\) from times \(t=1,\dots,T\) and series \(i=1,\dots,N\). Then
\begin{equation}
\label{eq:basicequation}
y_{t,i} = \bm{\beta}_{i}' \bm{x}_{t,i} + \varepsilon_{t,i}
\end{equation}
where \(\bm{x}_{t,i}= (1, x_{t,1,i},\dots,x_{t,p,i})\) is a \((p+1)\)-vector of regression variables, \({\bm{\beta}}_i = (\beta_{0,i}, \beta_{1,i}, \beta_{2,i}, \dots, \beta_{p,i})\) is a \((p+1)\)-vector of coefficients and \({\varepsilon}_{t,i}\) is the error\footnote{If different predictors are chosen for different series, we can still choose a constant \(p\) that includes all the predictors, and then set the relevant coefficients in the \(X\) matrix to \(0\).}. This equation for all the observations in matrix form can be written as follows:
\begin{align}\label{eq:linearmodel}
\begin{pmatrix}
\bm{y}_1\\
\bm{y}_2\\
\bm{y}_3 \\
\vdots\\
\bm{y}_N
\end{pmatrix}&=
\begin{pmatrix}
\bm{X}_1 & 0 & 0 & \dots & 0\\
0 & \bm{X}_2 & 0 & \dots & 0\\
0 & 0 & \bm{X}_3 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & 0\\
0 & 0 & \dots & 0 & \bm{X}_N
\end{pmatrix}
\begin{pmatrix}
\bm{\beta}_1\\
\bm{\beta}_2\\
\bm{\beta}_3\\
\vdots\\
\bm{\beta}_N
\end{pmatrix}+
\begin{pmatrix}
\bm{\varepsilon}_1\\
\bm{\varepsilon}_2\\
\bm{\varepsilon}_3\\
\vdots \\
\bm{\varepsilon}_N
\end{pmatrix}
\\[2ex]
\bm{Y} &= \bm{BX} + \bm{E},
\end{align}
where \(\bm{y}_i = (y_{1,i}, y_{2,i}, \dots, y_{T,i})\) is a \(T\)-vector, \({\bm{\beta}}_i = (\beta_{0,i}, \beta_{1,i}, \beta_{2,i}, \dots, \beta_{p,i})\) is a \((p+1)\)-vector, \({\bm{\varepsilon}}_i = (\varepsilon_{1,i}, \varepsilon_{2,i}, \dots, \varepsilon_{T,i})\) is a \(T\)-vector and \(\bm{X}_i\) is the \(T\times (p+1)\)-matrix
\begin{equation}\label{eq:Xmatrixdefinition}
\bm{X}_i = \begin{pmatrix}
1 & x_{1,i,1} & x_{1,i,2} & \dots & x_{1,i,p}\\
1 & x_{2,i,1} & x_{2,i,2} & \dots & x_{2,i,p}\\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{T,i,1} & x_{T,i,2} & \dots & x_{T,i,p}
\end{pmatrix}.
\end{equation}
Equation \eqref{eq:linearmodel} can be written as \(\bm{Y} = \bm{X} \bm{B} + \bm{E}\), with parameter estimates given by \(\hat{\bm{B}} = (\bm{X}'\bm{X})^{-1} \bm{X}'\bm{Y}\). Then the base forecasts are obtained using
\begin{equation}\label{eq:baseforecasts}
\hat{\bm{y}}_{t+h} = \bm{X}_{t+h}^* \hat{\bm{B}},
\end{equation}
where \(\hat{\bm{y}}_{t+h}\) is an \(N\)-vector of forecasts, \(\hat{\bm{B}}\) comprises \(N\) stacked \((p+1)\)-vectors of estimated coefficients, and \(\bm{X}_{t+h}^*\) is the \(N\times N(p+1)\) matrix
\pagebreak[3]\begin{equation}
\bm{X}_{t+h}^* =
\begin{pmatrix}
\bm{x}_{t+h,1}' & 0 & 0 & \dots & 0\\
0 & \bm{x}_{t+h,2}' & 0 & \dots & 0\\
0 & 0 & \bm{x}_{t+h,3}' & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & 0\\
0 & 0 & \dots & 0 & \bm{x}_{t+h,N}'
\end{pmatrix}.
\end{equation}
Note that we use \(\bm{X}^*_{t}\) to distinguish this matrix, which combines \(\bm{x}_{t,i}\) across all series for one time from \(\bm{X}_i\) which combines \(\bm{x}_{t,i}\) across all time for one series.
Finally, we can combine the two linear equations for computing base forecasts and reconciled forecasts (Equations \eqref{eq:mint2} and \eqref{eq:baseforecasts}) to obtain the reconciled forecasts with a single equation:
\begin{equation}\label{eq:singlestep}
\tilde{\bm{y}}_{t+h} = \bm{S}(\bm{S}'\bm{\Lambda}\bm{S})^{-1}\bm{S}'\bm{\Lambda}
(\bm{X}_{t+h}^* \hat{\bm{B}})
= \bm{S}(\bm{S}'\bm{\Lambda}\bm{S})^{-1}\bm{S}'\bm{\Lambda}
\bm{X}_{t+h}^* (\bm{X}'\bm{X})^{-1} \bm{X}'\bm{Y}.
\end{equation}
\hypertarget{simplified-formulation-for-a-fixed-set-of-predictors-bf-x}{%
\subsection{\texorpdfstring{Simplified formulation for a fixed set of predictors (\(\bf {X}\)) \label{sec:proposedapproach2}}{Simplified formulation for a fixed set of predictors (\textbackslash bf \{X\}) }}\label{simplified-formulation-for-a-fixed-set-of-predictors-bf-x}}
If we have the same set of predictor variables, \(\bm{X}\), for all the series, we can write Equations \eqref{eq:linearmodel} to \eqref{eq:singlestep} more easily using multivariate regression equations, and we can obtain all the reconciled forecasts for all the series in one equation. In that case, Equation \eqref{eq:linearmodel} can be rearranged as follows:
\begin{equation}\label{eq:linearmodelsameX}
\begin{pmatrix}
y_{11} & \dots & y_{1N}\\
y_{21} & \dots & y_{2N}\\
\vdots & & \vdots\\
y_{T1} & \dots & y_{TN}
\end{pmatrix} =
\begin{pmatrix}
1 & X_{11} & \dots & X_{1p}\\
1 & X_{21} & \dots & X_{2p}\\
\vdots & \vdots & & \vdots\\
1 & X_{T1} & \dots & X_{Tp}
\end{pmatrix}
\begin{pmatrix}
\beta_{01} & \dots & \beta_{0N}\\
\beta_{11} & \dots & \beta_{1N}\\
\vdots & & \vdots\\
\beta_{p1} & \dots & \beta_{pN}
\end{pmatrix} \\
+
\begin{pmatrix}
\varepsilon_{11} & \dots & \varepsilon_{1N}\\
\varepsilon_{21} & \dots & \varepsilon_{2N}\\
\vdots & & \vdots\\
\varepsilon_{T1} & \dots & \varepsilon_{TN}
\end{pmatrix},
\end{equation}
where \(\bm{Y}\), \(\bm{X}\), \(\bm{B}\) and \(\bm{E}\) are now matrices of size \(T\times N\), \(T\times (p+1)\), \((p+1)\times N\) and \(T \times N\), respectively. Equations \eqref{eq:baseforecasts} to \eqref{eq:singlestep} can be written accordingly using Equation \eqref{eq:linearmodelsameX} and here \(\bm{X}^*_{t+h,i} = \bm{X}^*_{t+h}\), where \(\bm{X}^*_{t+h}\) is an \(h\times (p+1)\) matrix. This simpler formulation translates into a computational advantage, as the \(\bm X\) matrix is smaller, thereby easing matrix multiplication operations.
\hypertarget{ols-predictors}{%
\subsection{OLS predictors}\label{ols-predictors}}
As an example of the \(\bm{X}_t\) matrix in Equation \eqref{eq:linearmodel}, we can refer to the set of predictors proposed in \textcite{ashouri2018} for modeling trend, seasonality and autocorrelation by using lagged values (\(y_{t-1}\), \(y_{t-2}\), \dots), trend variables and seasonal dummy variables:
\begin{equation}\label{eq:linearmodelexample}
y_t = \alpha_0 + \alpha_1 t + \beta_1 s_{1,t} + \cdots + \beta_{m-1} s_{m-1,t} + \gamma_1 y_{t-1} + \cdots + \gamma_p y_{t-p} + \delta z_t + \varepsilon_t.
\end{equation}
Here, \(s_{j,t}\) is a dummy variable taking value 1 if time \(t\) is in season \(j\) (\(j=1, 2, \dots, m\)), \(y_{t-k}\) is the \(k\)th lagged value for \(y_t\) and \(z_t\) is some external information at time \(t\). The seasonal period \(m\) depends on the problem; for instance, if we have daily data with day-of-week seasonality, then \(m=7\).
When a single OLS model is fitted to a collection of time series (e.g.~several bottom-level series), then trend and seasonality predictors are the same for all series and we can use the simpler multivariate regression models in Equation \eqref{eq:linearmodelsameX}. However, this formulation is inappropriate when including lags, which differ for each series, and/or series-specific external series, in which case we use the formulation in Equation \eqref{eq:linearmodel}.
When there are many options for choosing predictors, such as many seasonal dummy variables, lags, or high order trend terms, we can consider applying a model selection approach such as Akaike's Information Criterion or leave-one-out cross-validation (LOOCV) to select the best set of predictors in terms of prediction. In practice, LOOCV can be computationally heavy except in the special case of linear models \autocite{christensenplane} and therefore using linear models provide a viable solution. Also, when the number of seasons \(m\) is large (e.g.~in hourly data), Fourier terms can result in fewer predictors than dummy variables. The number of Fourier terms can also be determined using the same AIC or LOOCV approach \autocite{fpp2}.
\hypertarget{computational-considerations}{%
\subsection{\texorpdfstring{Computational considerations \label{sec:computationalconsiderations}}{Computational considerations }}\label{computational-considerations}}
There are two ways for computing the above forecasts. First, we could create the matrices \(\bm{Y}\), \(\bm{X}\) and \(\bm{E}\), and then directly use the above equations (taking advantage of sparse matrix routines) to obtain the forecasts. Alternatively, we could use separate regression models to compute the coefficients for each linear model individually. Although the matrix, \(\bm{X}'\bm{X}\), which we need to invert is sparse and block diagonal, it is still faster to use the second approach involving separate regression models.
\hypertarget{prediction-intervals}{%
\subsection{Prediction intervals}\label{prediction-intervals}}
For obtaining prediction intervals, we need to compute the variance of reconciled forecasts as follows \autocite{mint2018}:
\begin{equation}\label{eq:variance}
\text{Var}(\tilde{\bm{y}}_{t+h})
= \bm{S}\bm{G}{\bm{\Sigma}_{t+h}} \bm{G}'\bm{S}',
\end{equation}
where \(\bm{G} = (\bm{S}'\bm{\Lambda}\bm{S})^{-1}\bm{S}'\bm{\Lambda}\) and \({\bm{\Sigma}_{t+h}}\) denotes the variance of the base forecasts given by the usual linear model formula \autocite{fpp2}
\begin{equation}\label{eq:recvariance}
\bm{\Sigma}_{t+h} = \sigma^2\left[1 + \bm{X}_{t+h}^*(\bm{X}'\bm{X})^{-1}(\bm{X}_{t+h}^*)'\right].
\end{equation}
where \(\sigma^2\) is the variance of the base model residuals. Assuming normally distributed errors, we can easily obtain any required prediction intervals corresponding to elements of \(\tilde{\bm{y}}_{t+h}\) using the diagonals of \eqref{eq:variance}.
\hypertarget{applications}{%
\section{Applications}\label{applications}}
In this section we illustrate our approach using a real dataset and a simulated dataset\footnote{All methods were run on a Linux server with Intel Xeon Silver 4108 (1.80GHz / 8-Cores / 11MB Cache)*2 and 8GB DDR4 2666 DIMM ECC Registered Memory. R version 3.6.1. All the displayed computation times are only for the reconciled point forecasts.}. The real data study includes forecasting monthly Australian domestic tourism. This dataset contains 304 series with both hierarchical and grouped structure with strong seasonality. In the simulation studies, we simulate series based on the monthly Australian domestic tourism data and systematically modify the forecasting horizon, noise level, hierarchy levels, and number of series. In \protect\hyperlink{appendixB}{Appendix B} , we use another real dataset involving daily Wikipedia pageviews. In all these cases, we compare the forecasting accuracy of ETS, ARIMA\footnote{For running ETS and ARIMA, we applied \texttt{ETS} and \texttt{ARIMA} functions from the \texttt{fable} package \autocite{o2019fable}. The two sets of functions were run independently and not immediately one after the other.} and the proposed linear OLS forecasting model, with and without the reconciliation step. In these applications, we used the weighted reconciliation approach from Equation \eqref{eq:mint2}. For comparing these methods, we use the average of Root Mean Square Errors (RMSEs) across all series and also display box plots for forecast errors along with the raw forecast errors. To aid visibility, we suppress plotting the outliers.
We apply two methods for generating forecasts that align with two different practical forecasting scenarios. The first approach is \emph{rolling origin} forecasting, where we generate one-step-ahead forecasts (\(\tilde{\bm{y}}_{t+1}\) where \(t\) changes). This mimics the scenario where data are refreshed every time period. In the second \emph{fixed origin} method, forecasts are generated at a fixed time \(t\) for \(h\) steps ahead: \(\tilde{\bm{y}}_{t+1}, \tilde{\bm{y}}_{t+2},\dots, \tilde{\bm{y}}_{t+h}\) (we replace lagged values of \(y\) by their forecasts if they occur at periods after the forecast origin; See algorithms \ref{alg:algorithmrolling} and \ref{alg:algorithmfixed} for more details of the rolling and fixed origin approaches).
\begin{algorithm}[h]
\begin{algorithmic}[1]
\caption{Hierarchical and grouped time series rolling origin OLS forecast reconciliation}\label{alg:algorithmrolling}
\STATE $data \gets matrix(y_{1:T, 1:N}, x_{1:T, 1:N, 1:p})$
\FOR{$i \in \{1,\dots,N\}$}
\FOR{$k \in \{1,\dots,h\}$}
\STATE $data \gets data_{(1:(T-h) +(k-1)),i}$
\STATE $newdata \gets data_{((T-h)+k),i}$
\STATE $fit \gets lm(y_{t,i} \sim x_{t,i,1} + x_{t,i,2} + \dots + x_{t,i,p}, data = data)$
\STATE $\hat{y}_{t+k,i} \gets predict(fit, newdata = newdata)$
\ENDFOR
\ENDFOR
\STATE ${\bm{S}} \gets summing~matrix(groups_{data})$
\STATE $\Lambda \gets diag(\bm{S1})$
\STATE $adjust \gets \bm{S}(\bm{S'}\Lambda\bm{S})^{-1}\bm{S}\Lambda$
\FOR{$i \in {1, \dots,h}$}
\STATE $\tilde{y}_{(T-h)+i,1:N} \gets adjust \times \hat{y}_{(T-h)+i, 1:N}$
\ENDFOR
\STATE \bf{return} $\tilde{y}_{((T-h):T,1:N}$
\end{algorithmic}
\end{algorithm}
\begin{algorithm}[h]
\begin{algorithmic}[1]
\caption{Hierarchical and grouped time series fixed origin OLS forecast reconciliation}\label{alg:algorithmfixed}
\STATE $data \gets matrix(y_{1:T, 1:N}, x_{1:T, 1:N, 1:p})$
\FOR{$i \in \{1,\dots,N\}$}
\STATE $data \gets data_{(1:(T-h),i}$
\STATE $fit \gets lm(y_{t,i} \sim x_{t,i,1} + x_{t,i,2} + \dots + x_{t,i,p}, data = data)$
\STATE $newdata \gets data_{((T-h)+1),i}$ \FOR{$k \in \{1,\dots,h\}$}
\STATE $\hat{y}_{t+k,i} \gets predict(fit, newdata = newdata)$
\STATE $newdata \gets \hat {data}_{((T-h)+k),i}$
\ENDFOR
\ENDFOR
\STATE ${\bm{S}} \gets summing~matrix(groups_{data})$
\STATE $\Lambda \gets diag(\bm{S1})$
\STATE $adjust \gets \bm{S}(\bm{S'}\Lambda\bm{S})^{-1}\bm{S}\Lambda$
\FOR{$i \in {1, \dots,h}$}
\STATE $\tilde{y}_{(T-h)+i,1:N} \gets adjust \times \hat{y}_{(T-h)+i, 1:N}$
\ENDFOR
\STATE \bf{return} $\tilde{y}_{((T-h):T,1:N}$
\end{algorithmic}
\end{algorithm}
\hypertarget{australian-domestic-tourism}{%
\subsection{Australian domestic tourism}\label{australian-domestic-tourism}}
This dataset has 19 years of monthly visitor nights in Australia by Australian tourists, a measure used as an indicator of tourism activity \autocite{mint2018}. The data were collected by computer-assisted telephone interviews with 120,000 Australians aged 15 and over \autocite{researchAustralia2005}. The dataset includes 304 time series each of length 228 observations. The hierarchy and grouping structure for this dataset is made using geographic and purpose of travel information.
\begingroup\fontsize{9}{11}\selectfont
\begin{longtable}[t]{rllrll}
\caption{\label{tab:Australiageographicaldivision}Australian geographic hierarchical structure.}\\
\toprule
Series & Name & Label & Series & Name & Label\\
\midrule
Total & & & Region & & \\
1 & Australia & Total & 55 & Lakes & BCA\\
State & & & 56 & Gippsland & BCB\\
2 & NSW & A & 57 & Phillip Island & BCC\\
3 & VIC & B & 58 & General Murray & BDA\\
4 & QLD & C & 59 & Goulburn & BDB\\
5 & SA & D & 60 & High Country & BDC\\
6 & WA & E & 61 & Melbourne East & BDD\\
7 & TAS & F & 62 & Upper Yarra & BDE\\
8 & NT & G & 63 & Murray East & BDF\\
Zone & & & 64 & Wimmera+Mallee & BEA\\
9 & Metro NSW & AA & 65 & Western Grampians & BEB\\
10 & Nth Coast NSW & AB & 66 & Bendigo Loddon & BEC\\
11 & Sth Coast NSW & AC & 67 & Macedon & BED\\
12 & Sth NSW & AD & 68 & Spa Country & BEE\\
13 & Nth NSW & AE & 69 & Ballarat & BEF\\
14 & ACT & AF & 70 & Central Highlands & BEG\\
15 & Metro VIC & BA & 71 & Gold Coast & CAA\\
16 & West Coast VIC & BB & 72 & Brisbane & CAB\\
17 & East Coast VIC & BC & 73 & Sunshine Coast & CAC\\
18 & Nth East VIC & BD & 74 & Central Queensland & CBA\\
19 & Nth West VIC & BE & 75 & Bundaberg & CBB\\
20 & Metro QLD & CA & 76 & Fraser Coast & CBC\\
21 & Central Coast QLD & CB & 77 & Mackay & CBD\\
22 & Nth Coast QLD & CC & 78 & Whitsundays & CCA\\
23 & Inland QLD & CD & 79 & Northern & CCB\\
24 & Metro SA & DA & 80 & Tropical North Queensland & CCC\\
25 & Sth Coast SA & DB & 81 & Darling Downs & CDA\\
26 & Inland SA & DC & 82 & Outback & CDB\\
27 & West Coast SA & DD & 83 & Adelaide & DAA\\
28 & West Coast WA & EA & 84 & Barossa & DAB\\
29 & Nth WA & EB & 85 & Adelaide Hills & DAC\\
30 & Sth WA & EC & 86 & Limestone Coast & DBA\\
31 & Sth TAS & FA & 87 & Fleurieu Peninsula & DBB\\
32 & Nth East TAS & FB & 88 & Kangaroo Island & DBC\\
33 & Nth West TAS & FC & 89 & Murraylands & DCA\\
34 & Nth Coast NT & GA & 90 & Riverland & DCB\\
35 & Central NT & GB & 91 & Clare Valley & DCC\\
Region & & & 92 & Flinders Range and Outback & DCD\\
36 & Sydney & AAA & 93 & Eyre Peninsula & DDA\\
37 & Central Coast & AAB & 94 & Yorke Peninsula & DDB\\
38 & Hunter & ABA & 95 & Australia's Coral Coast & EAA\\
39 & North Coast NSW & ABB & 96 & Experience Perth & EAB\\
40 & Northern Rivers Tropical NSW & ABC & 97 & Australia's SouthWest & EAC\\
41 & South Coast & ACA & 98 & Australia's North West & EBA\\
42 & Snowy Mountains & ADA & 99 & Australia's Golden Outback & ECA\\
43 & Capital Country & ADB & 100 & Hobart and the South & FAA\\
44 & The Murray & ADC & 101 & East Coast & FBA\\
45 & Riverina & ADD & 102 & Launceston, Tamar and the North & FBB\\
46 & Central NSW & AEA & 103 & North West & FCA\\
47 & New England North West & AEB & 104 & Wilderness West & FCB\\
48 & Outback NSW & AEC & 105 & Darwin & GAA\\
49 & Blue Mountains & AED & 106 & Kakadu Arnhem & GAB\\
50 & Canberra & AFA & 107 & Katherine Daly & GAC\\
51 & Melbourne & BAA & 108 & Barkly & GBA\\
52 & Peninsula & BAB & 109 & Lasseter & GBB\\
53 & Geelong & BAC & 110 & Alice Springs & GBC\\
54 & Western & BBA & 111 & MacDonnell & GBD\\
\bottomrule
\end{longtable}
\endgroup{}
\begin{figure}
{\centering \includegraphics[width=450px,height=150px]{Paper-Figures/Australian_hierarchy_structure}
}
\caption{Australian geographic hierarchical structure.}\label{fig:Australiahierarchystructure}
\end{figure}
\begin{figure}
{\centering \includegraphics[width=450px,height=360px]{Paper-Figures/ausTurRegionsBW}
}
\caption{Australia map, showing the eight states.}\label{fig:Australiahierarchystructuremap}
\end{figure}
In this dataset we have three levels of geographic divisions in Australia. In the first level, Australia is divided into seven ``States'' including New South Wales (NSW), Victoria (VIC), Queensland (QLD), South Australia (SA), Western Australia (WA), Tasmania (TAS) and Northern Territory (NT). In the second and third levels, it is divided into 27 ``Zones'' and 76 ``Regions'' (for details about Australia geographic divisions see Figure \ref{fig:Australiahierarchystructure} and Table \ref{tab:Australiageographicaldivision} and also Figure \ref{fig:Australiahierarchystructuremap} which shows Australia map divided by tourism region and colored by states\footnote{Reproduced from: \url{www.tra.gov.au/tra/2016/Tourism_Region_Profiles/Region_profiles/index.html}}).
We have four purposes of travel: Holiday (Hol), Visiting friends and relatives (Vis), Business (Bus) and Other (Oth). So there are \(76\times4 = 304\) series at the most disaggregate level. Based on the geographic hierarchy and purpose grouping, we end up with 8 aggregation levels with 555 series in total as shown in Table \ref{tab:Australiageographicalpurposedivision}.
\begin{table}[!h]
\caption{\label{tab:Australiageographicalpurposedivision}Number of Australian domestic tourism series at each aggregation level.}
\centering
\begin{tabular}[t]{lr}
\toprule
Division & Series\\
\midrule
Australia & 1\\
State & 7\\
Zone & 27\\
Region & 76\\
Purpose & 4\\
State x Purpose & 28\\
Zone x Purpose & 108\\
Region x Purpose & 304\\
\hline
Total & 555\\
\bottomrule
\end{tabular}
\end{table}
We report the forecast results for all these aggregation levels, as well as the average RMSE across all the levels of the hierarchy. We used the same predictors in the OLS predictor
matrix for the rolling and fixed origin approaches. For the rolling and fixed origin model, we include a quadratic trend, 11 dummy variables, and lags 1 and 12. This is intended to capture the monthly seasonality. In addition, before running the model, we partition the data into training and test sets, with the last 24 months (2 years) as our test set, and the rest as our training set.
\begin{table}[!h]
\caption{\label{tab:Tourismdataresulrolling}Mean(RMSE) on 24 months test set for ETS, ARIMA and OLS with and without reconciliation - Rolling origin.}
\centering
\begin{tabular}[t]{lrrrrrr}
\toprule
\multicolumn{1}{c}{} & \multicolumn{3}{c}{Unreconciled} & \multicolumn{3}{c}{Reconciled} \\
\cmidrule(l{3pt}r{3pt}){2-4} \cmidrule(l{3pt}r{3pt}){5-7}
Level & ETS & ARIMA & OLS & ETS & ARIMA & OLS\\
\midrule
Total & 1516 & 1504 & 1634 & 1733 & 1840 & 1864\\
State & 511 & 501 & 498 & 497 & 482 & 509\\
Zone & 215 & 217 & 213 & 211 & 210 & 213\\
Region & 123 & 125 & 117 & 118 & 120 & 117\\
Purpose & 676 & 674 & 682 & 673 & 713 & 713\\
State x Purpose & 213 & 217 & 213 & 208 & 209 & 213\\
Zone x Purpose & 98 & 103 & 98 & 96 & 99 & 97\\
Region x Purpose & 56 & 58 & 56 & 56 & 57 & 56\\
\bottomrule
\end{tabular}
\end{table}
\begin{table}
\caption{\label{tab:TourismdataresultRMSE}Mean(RMSE) on 24 months test set for ETS, ARIMA and OLS with and without reconciliation - Fixed origin.}
\centering
\begin{tabular}[t]{lrrrrrr}
\toprule
\multicolumn{1}{c}{} & \multicolumn{3}{c}{Unreconciled} & \multicolumn{3}{c}{Reconciled} \\
\cmidrule(l{3pt}r{3pt}){2-4} \cmidrule(l{3pt}r{3pt}){5-7}
Level & ETS & ARIMA & OLS & ETS & ARIMA & OLS\\
\midrule
Total & 2239 & 3433 & 2529 & 2492 & 2744 & 2819\\
State & 594 & 610 & 597 & 573 & 583 & 612\\
Zone & 240 & 230 & 243 & 237 & 234 & 243\\
Region & 133 & 132 & 127 & 127 & 127 & 126\\
Purpose & 767 & 829 & 876 & 822 & 889 & 921\\
State x Purpose & 227 & 233 & 237 & 222 & 226 & 236\\
Zone x Purpose & 103 & 106 & 105 & 102 & 102 & 104\\
Region x Purpose & 59 & 59 & 59 & 58 & 58 & 58\\
\bottomrule
\end{tabular}
\end{table}
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/boxplotrollingtourism-1}
}
\caption{Box plots of rolling origin forecast errors from reconciled and unreconciled ETS, ARIMA and OLS methods at each hierarchical level.}\label{fig:boxplotrollingtourism}
\end{figure}
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/boxplottourism-1}
}
\caption{Box plots of fixed origin forecast errors for reconciled and unreconciled ETS, ARIMA and OLS methods at each hierarchical level.}\label{fig:boxplottourism}
\end{figure}
In Figures \ref{fig:boxplotrollingtourism} and \ref{fig:boxplottourism} we display the error box plots for both reconciled and unreconciled forecasts using all three methods, for the rolling origin and fixed origin forecasts. In these figures we see the error distributions across all the models.
Together with Tables \ref{tab:Tourismdataresulrolling} and \ref{tab:TourismdataresultRMSE}, results show that our proposed OLS forecasting model produces forecast accuracy similar to ETS and ARIMA, which are computationally heavy for many time series (see Table \ref{tab:Tourismdatacomputationtime}). We also see the usefulness of the reconciliation in decreasing the average RMSE in all three methods. Except for the total series, reconciliation improves forecasts in all the hierarchy levels. Also, because the higher level series have higher counts, the errors are larger in magnitude (\protect\hyperlink{appendixA}{Appendix A} shows the box plots with scaled errors\footnote{Scaled errors are computed by subtracting the mean and dividing by the standard deviation.}, to better compare errors across all the hierarchy levels). In addition, we see that (as expected) by applying rolling origin 1-step-ahead forecasts, the error densities are closer and more tightly distributed around zero than the fixed origin multi-step-ahead forecasts.
Figures \ref{fig:forecstrolling24tourismtotal} and \ref{fig:forecstrolling24tourism} show the rolling and fixed origin forecast results for the total series and one of the bottom level series, AAAVis (Sydney - Business). In these plots we have both reconciled (dashed lines) and unreconciled (dotted lines) forecasts and we see that the reconciliation step improves the forecasts in this series. We also see that the OLS model forecast accuracy is similar to the other two methods.
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/forecstrolling24tourismtotal-1}
}
\caption{Comparing ETS, ARIMA and OLS forecasts (reconciled and unreconciled) for 'Total' series. (Top: rolling origin, Bottom: fixed origin).}\label{fig:forecstrolling24tourismtotal}
\end{figure}
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/forecstrolling24tourism-1}
}
\caption{Comparing ETS, ARIMA and OLS forecasts (reconciled and unreconciled) for 'AAAVis' bottom-level series. (Top: rolling origin, Bottom: fixed origin).}\label{fig:forecstrolling24tourism}
\end{figure}
Figures \ref{fig:predinttotal} and \ref{fig:predintAAAVis} display the prediction interval for the OLS approach, with and without reconciliation forecasts for the total series and one of the bottom level series, AAAVis (Sydney - Visiting).
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/predinttotal-1}
}
\caption{Comparing ETS, ARIMA and OLS reconciled forecasts and prediction intervals for 'Total' series. (Top: rolling origin, Bottom: fixed origin).}\label{fig:predinttotal}
\end{figure}
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/predintAAAVis-1}
}
\caption{Comparing ETS, ARIMA and OLS reconciled forecasts and prediction intervals for 'AAAVis' bottom-level series. (Top: rolling origin, Bottom: fixed origin).}\label{fig:predintAAAVis}
\end{figure}
Table \ref{tab:Tourismdatacomputationtime} compares the computation time of the three methods for rolling and fixed origin forecasting. We see that the OLS forecasting model is much faster compared to the other methods. Also, since reconciliation is a linear process, in all methods it is very fast and does not affect computation time significantly.
Since we are using a linear model, we can easily include exogenous variables which can often be helpful in improving forecast accuracy. In this application, we tried including an ``Easter'' dummy variable indicating the timing of Easter, but its affect on forecast accuracy was minimal, so it was omitted in the model reported here.
Finally, Table \ref{tab:Tourismdatacomputationtimeappendix} shows that, as mentioned in Section \ref{sec:computationalconsiderations}, computation is faster using separate regression models compared to the matrix approach (even using sparse matrix algebra).
\begin{table}
\caption{\label{tab:Tourismdatacomputationtime}Computation time (seconds) for ETS, ARIMA and OLS with reconciliation, for Rolling and fixed origin forecasts, on a 24 months test set.}
\centering
\begin{tabular}[t]{lrr}
\toprule
& Rolling origin & Fixed origin\\
\midrule
ETS & 14648 & 618\\
ARIMA & 30346 & 1085\\
OLS & 48 & 18\\
\bottomrule
\end{tabular}
\end{table}
\begin{table}
\caption{\label{tab:Tourismdatacomputationtimeappendix}Computation time (seconds) for OLS using the matrix approach and separate regression models, with reconciliation, for rolling and fixed origin, on a 12 months test set.}
\centering
\begin{tabular}[t]{lrr}
\toprule
& Rolling origin & Fixed origin\\
\midrule
Matrix approach & 210 & 106\\
Separate models & 48 & 18\\
\bottomrule
\end{tabular}
\end{table}
\hypertarget{australian-domestic-tourism-simulation-study}{%
\subsection{Australian domestic tourism simulation study}\label{australian-domestic-tourism-simulation-study}}
We provide results from two simulation studies based on the Australian domestic tourism dataset, to evaluate the sensitivity of our results to several factors. In the first study, we simulate bottom-level series similar to the real bottom-level series of the tourism data, with the same number of series and the same length. We then generate forecasts for four forecast horizons (12, 24, 36 and 48 months) with four different noise levels (standard deviation=0.01, 0.1, 0.5 and 1)\footnote{Since the level of the series are different, we first scale the simulated series (subtracting by mean and dividing by standard deviation), add the white noise series and then we rescale the series.}.
Tables \ref{tab:TourismdatasimrollingnoiseFH} and \ref{tab:TourismdatasimfixnoiseFH} display the average of the RMSEs for 12 to 48 month-ahead forecasts with different noise levels. Results are shown for the base and the reconciled forecasts for both rolling and fixed origin approaches. The results show that, as expected, by increasing the forecast horizon and/or noise level, the average RMSE increases in all the three methods. Also, the proposed OLS approach shows similar results compared with ETS and ARIMA. It should be noted that for both rolling and fixed origin forecasts in the OLS approach we use the same set of predictors as the Australian domestic tourism example.
\begin{table}[!h]
\caption{\label{tab:TourismdatasimrollingnoiseFH}Mean RMSE of rolling origin forecasts for simulated data (304 bottom-level series and 8 levels of hierarchy), for different error levels, methods (ETS, ARIMA, OLS), forecast horizons, with/without reconciliation.}
\centering
\begin{tabular}[t]{lrrrrl}
\toprule
Reconciliation & Error & Forecast horizon & ETS & ARIMA & OLS\\
\midrule
rec & 0.01 & 12 & 123.8 & 119.8 & 130.4\\
rec & 0.01 & 24 & 122.6 & 119.4 & 128.8\\
rec & 0.01 & 36 & 124.6 & 122.3 & 131.8\\
rec & 0.01 & 48 & 122.1 & 120.4 & 129.0\\
rec & 0.10 & 12 & 123.7 & 120.1 & 129.7\\
rec & 0.10 & 24 & 122.9 & 120.3 & 129.2\\
rec & 0.10 & 36 & 125.8 & 123.7 & 133.1\\
rec & 0.10 & 48 & 123.5 & 122.1 & 130.5\\
rec & 0.50 & 12 & 143.9 & 143.2 & 146.9\\
rec & 0.50 & 24 & 149.9 & 146.3 & 153.2\\
rec & 0.50 & 36 & 155.8 & 151.5 & 160.0\\
rec & 0.50 & 48 & 154.5 & 150.8 & 159.7\\
rec & 1.00 & 12 & 192.9 & 198.1 & 193.9\\
rec & 1.00 & 24 & 207.1 & 209.0 & 209.3\\
rec & 1.00 & 36 & 215.7 & 215.4 & 218.0\\
rec & 1.00 & 48 & 218.4 & 217.5 & 220.9\\
unrec & 0.01 & 12 & 132.2 & 125.8 & 132.9\\
unrec & 0.01 & 24 & 128.5 & 125.4 & 131.0\\
unrec & 0.01 & 36 & 130.2 & 128.5 & 134.0\\
unrec & 0.01 & 48 & 127.8 & 126.6 & 131.7\\
unrec & 0.10 & 12 & 132.0 & 126.2 & 132.2\\
unrec & 0.10 & 24 & 129.5 & 126.0 & 131.4\\
unrec & 0.10 & 36 & 131.9 & 129.9 & 135.1\\
unrec & 0.10 & 48 & 129.5 & 128.3 & 133.0\\
unrec & 0.50 & 12 & 149.5 & 149.7 & 148.2\\
unrec & 0.50 & 24 & 156.5 & 153.7 & 154.5\\
unrec & 0.50 & 36 & 163.3 & 158.1 & 160.7\\
unrec & 0.50 & 48 & 161.9 & 157.7 & 160.7\\
unrec & 1.00 & 12 & 200.9 & 205.4 & 194.5\\
unrec & 1.00 & 24 & 214.9 & 216.3 & 210.1\\
unrec & 1.00 & 36 & 222.0 & 221.9 & 218.1\\
unrec & 1.00 & 48 & 223.6 & 223.5 & 221.2\\
\bottomrule
\end{tabular}
\end{table}
\begin{table}[!h]
\caption{\label{tab:TourismdatasimfixnoiseFH}Mean RMSE of fixed origin forecasts for simulated data (304 bottom-level series and 8 levels of hierarchy), for different error levels, methods (ETS, ARIMA, OLS), forecast horizons, with/without reconciliation.}
\centering
\begin{tabular}[t]{lrrrrr}
\toprule
Reconciliation & Error & Forecast horizon & ETS & ARIMA & OLS\\
\midrule
rec & 0.01 & 12 & 123.6 & 119.8 & 131.4\\
rec & 0.01 & 24 & 126.5 & 124.2 & 137.3\\
rec & 0.01 & 36 & 133.4 & 131.9 & 148.6\\
rec & 0.01 & 48 & 133.5 & 133.9 & 152.3\\
rec & 0.10 & 12 & 124.8 & 120.3 & 130.7\\
rec & 0.10 & 24 & 128.0 & 124.5 & 137.2\\
rec & 0.10 & 36 & 135.3 & 132.2 & 148.9\\
rec & 0.10 & 48 & 135.5 & 134.3 & 152.6\\
rec & 0.50 & 12 & 144.6 & 143.4 & 147.8\\
rec & 0.50 & 24 & 154.5 & 151.6 & 158.2\\
rec & 0.50 & 36 & 162.8 & 160.7 & 170.0\\
rec & 0.50 & 48 & 164.1 & 163.5 & 174.3\\
rec & 1.00 & 12 & 194.1 & 197.7 & 194.7\\
rec & 1.00 & 24 & 212.4 & 213.3 & 213.0\\
rec & 1.00 & 36 & 221.9 & 222.1 & 224.8\\
rec & 1.00 & 48 & 225.8 & 227.1 & 231.0\\
unrec & 0.01 & 12 & 133.1 & 125.3 & 133.3\\
unrec & 0.01 & 24 & 136.3 & 129.7 & 139.1\\
unrec & 0.01 & 36 & 143.4 & 138.9 & 150.2\\
unrec & 0.01 & 48 & 143.8 & 141.0 & 153.8\\
unrec & 0.10 & 12 & 133.4 & 126.2 & 132.5\\
unrec & 0.10 & 24 & 136.9 & 130.7 & 138.9\\
unrec & 0.10 & 36 & 144.4 & 140.0 & 150.4\\
unrec & 0.10 & 48 & 145.0 & 142.4 & 154.0\\
unrec & 0.50 & 12 & 150.3 & 147.8 & 148.7\\
unrec & 0.50 & 24 & 161.0 & 156.6 & 159.2\\
unrec & 0.50 & 36 & 170.4 & 167.5 & 170.8\\
unrec & 0.50 & 48 & 172.2 & 171.0 & 175.0\\
unrec & 1.00 & 12 & 202.2 & 204.2 & 195.0\\
unrec & 1.00 & 24 & 220.9 & 220.4 & 213.5\\
unrec & 1.00 & 36 & 230.1 & 229.1 & 225.1\\
unrec & 1.00 & 48 & 233.8 & 233.3 & 231.3\\
\bottomrule
\end{tabular}
\end{table}
Figures \ref{fig:TourismdatasimPIrollingnoiseFH} and \ref{fig:TourismdatasimPIfixnoiseFH} display one of the bottom level series with its ARIMA, ETS, and OLS 12 to 48 months ahead reconciled forecasts and prediction intervals, while changing the noise levels. From these two figures we see the OLS approach prediction intervals are almost identical to those from ARIMA and ETS.
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/TourismdatasimPIrollingnoiseFH-1}
}
\caption{Comparing reconciled 'rolling origin' forecasts and prediction intervals for a sample bottom-level series across different error levels (different panels).}\label{fig:TourismdatasimPIrollingnoiseFH}
\end{figure}
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/TourismdatasimPIfixnoiseFH-1}
}
\caption{Comparing reconciled 'fixed origin' forecasts and prediction intervals for a sample bottom-level series across different error levels (different panels).}\label{fig:TourismdatasimPIfixnoiseFH}
\end{figure}
Figures \ref{fig:TourismsimcomputationtimerollingnoiseFH} and \ref{fig:TourismsimcomputationtimefixednoiseFH} show the computation time (seconds) for ETS, ARIMA and OLS methods on rolling and fixed origin forecasts. From these figures we see that increasing the forecast horizon from one to 48 months increases computation time almost linearly, while the noise level does not change the computation time. Also, the computation time for ARIMA and ETS is much longer than OLS. Note that the computation time for the reconciliation step is less than a second and therefore that would be similar for base and reconciled forecasts.
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/TourismsimcomputationtimerollingnoiseFH-1}
}
\caption{Computation time (seconds) for rolling origin reconciled forecasts using ETS, ARIMA and OLS, by horizon and by different error values (panels), for 304 bottom-level series and 8 levels of hierarchy.}\label{fig:TourismsimcomputationtimerollingnoiseFH}
\end{figure}
\begin{figure}
{\centering \includegraphics[width=1\linewidth]{lhf_files/figure-latex/TourismsimcomputationtimefixednoiseFH-1}
}