Version 7.0.1 Released Oct 2022
===============================
- Require C++14 instead of 17
Version 7.0.0 Released Oct 2022
===============================
- Many bugfixes and general improvements.
- Important fixes in the GPU code, and in the usage of SLATE
(GPU capable ScaLAPACK replacement).
- The default ordering now uses METIS_NodeND, instead of the
(undocumented) METIS_NodeNDP routine. This can impact performance,
or for some problems lead to stack overflow, but for others it
drastically reduces memory usage. The old behavior can be restored
with --sp_enable_METIS_NodeNDP.
- Improvements to error handling, mostly related to zero pivots.
- Added new ordering options: AND, MMD, AMD
- Require C++17
Version 6.3.1 Released March 2022
=================================
- Fix for setting CUDA/HIP device when there are multiple,
but MPI was not initialized
- Memory leak fix in distributed memory GPU code
- Fixed small memory leaks from MPI datatypes
- Change in BLR algorithm selection options
- Changed default blocksize for 2D block cyclic distribution
when using SLATE to 512
- Add 64bit support in the matching (MC64)
- Fix installation of Fortran modules
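
The 2D block cyclic distribution mentioned above can be illustrated with a
small index-mapping sketch. This is not STRUMPACK or SLATE code; the function
names are hypothetical, and the sketch assumes the usual ScaLAPACK-style
convention in which block (0, 0) lives on process (0, 0). The blocksize of 512
is the new default named in this release.

```python
def block_cyclic_owner(i, j, block_size, proc_rows, proc_cols):
    """Return the (process row, process column) that owns global entry (i, j)."""
    return ((i // block_size) % proc_rows, (j // block_size) % proc_cols)


def block_cyclic_local_index(i, block_size, nprocs):
    """Map a global index along one dimension to the owner's local index."""
    block = i // block_size        # global block number
    local_block = block // nprocs  # how many earlier blocks the owner holds
    return local_block * block_size + i % block_size


# Example: blocksize 512 on a 2 x 2 process grid.
owner = block_cyclic_owner(512, 1024, 512, 2, 2)
local_row = block_cyclic_local_index(1024, 512, 2)
```

A larger blocksize means fewer, larger blocks per process, which tends to suit
GPU kernels better at the cost of coarser load balancing.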
Version 6.3.0 Released February 2022
====================================
- Change default sparsity reducing ordering to use METIS_NodeNDP
(from METIS_NodeND)
- Significant performance improvements in the GPU code for the
direct solver, from the NERSC December 2021 Hackathon
- Performance fix in symbolic phase
(affecting only some MPI implementations)
- Bump minimum CMake version to 3.17
- CMake fix for Perlmutter
- Compilation fix for GCC 8
- Add support for single precision HODLR/Butterfly
(now requiring ButterflyPACK >= 2.1.0)
Version 6.2.1 Released November 2021
====================================
- HIP compilation fix
- Change default minimum front size for compression to very large
values, use minimum separator size instead.
Version 6.2.0 Released November 2021
====================================
Bugfix release, with some changes in the C interface.
- Additions to the C interface for the sparse solver:
. Solve with multiple RHS
. Specify grid dimensions, stencil, for geometric ordering
. Fix typo in GPU enabling routine
. Set lossy precision
- Bugfix in passing grid via options object
- Bugfix in LOSSY/LOSSLESS compression using ZFP
- Compilation fix for nvcc not finding MPI when the CXX/C
compilers are set to the MPI wrappers
- Fix Fortran module case in CMake when using CCE
- Disable GPU if CUDA/HIP are not enabled
Version 6.1.0 Released October 2021
===================================
- Change default BLR blocksize, and in the BLR preconditioner use
right-looking algorithm as the default. This has higher peak
memory usage, but is much faster and more robust.
- Fixes in printing of compression statistics.
- Always build the tests when doing 'make', you no longer have to
run 'make tests'.
- Setup CI at github and gitlab, remove Travis testing.
Version 6.0.0 Released September 2021
=====================================
- Block Low-Rank compression is now supported in the sparse
solver, resulting in an efficient, scalable and robust
preconditioner.
- Improved GPU performance for the sparse direct solver
- Add the StructuredMatrix class, for a general interface to
different rank structured formats. This also comes with a C and a
Fortran interface.
- Several performance improvements
Version 5.1.1 Released January 22, 2021
=======================================
- Small change in the build script for using HIP/ROCm. See
example_build tulip for an example of how to build with HIP/ROCm
support
Version 5.1.0 Released January 21, 2021
=======================================
- Improvements in the distributed BLR code, this now works very
well as a general purpose preconditioner.
- Added a mixed precision solver class, taking input and output in
double precision, but doing the factorization in single. Thanks to
Michael Neuder.
- Several bug fixes.
- Small changes in the HIP/ROCm code.
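
The idea behind a mixed precision solver (factor in single, take input and
output in double) can be sketched with classical iterative refinement. This is
only an illustration of the general technique, not the STRUMPACK
implementation: all names are hypothetical, single precision is emulated by
rounding through struct, and the small dense LU skips pivoting, so it assumes a
diagonally dominant matrix.

```python
import struct


def f32(x):
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_single(a):
    """LU factorization without pivoting, every result rounded to single."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu


def lu_solve_single(lu, b):
    """Forward and backward substitution in (emulated) single precision."""
    n = len(lu)
    y = [f32(v) for v in b]
    for i in range(n):                 # solve L y = b, L has unit diagonal
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):       # solve U x = y, overwriting y
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def residual(a, x, b):
    """Compute b - A x in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def mixed_precision_solve(a, b, iterations=5):
    """Single-precision factorization plus double-precision refinement."""
    lu = lu_factor_single(a)
    x = lu_solve_single(lu, b)
    for _ in range(iterations):
        r = residual(a, x, b)          # residual in double
        d = lu_solve_single(lu, r)     # correction from the single factors
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

The factorization, which dominates the cost, runs in the cheaper precision,
while the residual computed in double lets the refinement recover accuracy
beyond what a single-precision solve alone gives.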
Version 5.0.0 Released October 12, 2020
=======================================
- Added distributed memory BLR-based preconditioner. For many
problems, this is much more robust than the HSS or HODLR/HODBF
preconditioners.
- Several bugfixes in the GPU code.
- Added HIP support (for the direct solver).
- Other bugfixes.
Version 4.0.0 Released August 17, 2020
======================================
- The CMake build system has been overhauled completely, now using
modern CMake.
- Added a Fortran interface for the sparse solver.
- Updated C interface to the sparse solver.
- Improved HODLR, with butterfly, preconditioning. Now relying on
ButterflyPACK 1.2.0.
- Much improved CUDA support in the sparse direct solver. CUDA
acceleration is not working yet for the rank-structured solvers.
- Added matrix equilibration to the sparse solver. This slightly
improves numerical accuracy in many cases.
- Added Lossy compression: --sp_compression lossy, using ZFP.
Make sure to configure with ZFP support!
- Removed some options: --sp_enable HSS, --sp_HSS_min_sep_size,
etc. Now one should use: --sp_compression HSS,
--sp_compression_min_sep_size ...
- Fix for compilation without MPI.
- Several bug fixes and performance improvements
- Renamed FC_GLOBAL Fortran to C macro to STRUMPACK_FC_GLOBAL,
in order to avoid conflicts with SuperLU.
- Large refactoring of the headers into cpp files for faster
compilation.
- Update minimum required version of CombBLAS to 2.0.
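
As a rough illustration of what matrix equilibration (added in this release)
does, the sketch below scales the rows and then the columns of a small dense
matrix so that the largest entry in each row and column has magnitude close to
1. It is a generic row/column scaling, not the routine used inside STRUMPACK;
the function name is hypothetical and the matrix is assumed to have no zero
rows or columns.

```python
def equilibrate(a):
    """Return (row_scale, col_scale, scaled) with scaled = Dr * A * Dc."""
    # Scale each row by the inverse of its largest absolute entry.
    row_scale = [1.0 / max(abs(v) for v in row) for row in a]
    scaled = [[r * v for v in row] for r, row in zip(row_scale, a)]
    # Then scale each column of the row-scaled matrix the same way.
    n_cols = len(a[0])
    col_scale = [1.0 / max(abs(row[j]) for row in scaled)
                 for j in range(n_cols)]
    scaled = [[v * c for v, c in zip(row, col_scale)] for row in scaled]
    return row_scale, col_scale, scaled


# Example: entries spanning nine orders of magnitude.
row_scale, col_scale, scaled = equilibrate([[1e6, 2.0], [3.0, 4e-3]])
```

To solve A x = b with the scaled matrix, one solves (Dr A Dc) y = Dr b and
recovers x = Dc y.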
Version 3.3.0 Released November 7, 2019
=======================================
- Initial support for cuBLAS and cuSOLVER.
- Improved performance in the HODLR (+ butterfly) preconditioner
- Added a Helmholtz example
- Change default HSS and HODLR leaf sizes to 512 (from 128)
- Only print command line option descriptions from the root (of
MPI_COMM_WORLD)
- Add ANOVA kernel support
- Fix 32 bit overflow when calling MPI_Alltoall
- Rewrite permutations using xlapmr, using manual loops, for
faster solve
- Many other fixes and improvements
Version 3.2.0 Released August 15, 2019
======================================
- Added interface to Hierarchically Off-Diagonal Low-Rank and
Butterfly matrix codes from ButterflyPACK, see
https://github.com/liuyangzhuan/ButterflyPACK
Version 3.1.1 Released October 25, 2018
=======================================
- Fix check for libatomic, needs to be linked explicitly on some
versions of Clang
- Do not call the IDEAS: xSDK standards module
Version 3.1.0 Released October 18, 2018
=======================================
- Changes to the build system for xSDK compliance.
Third party packages are now enabled using -DTPL_ENABLE_<package>=ON,
and libraries and includes specified as
-DTPL_<package>_INCLUDE_DIRS=.. -DTPL_<package>_LIBRARIES=..
- Always generate position independent code.
- Suppress many warnings when using clang.
- Fix possible memory leak.
- Fix linking issue with atomic operations.
Version 3.0.3 Released October 4, 2018
======================================
- Check for support of the OpenMP priority clause (OpenMP 4.5)
Version 3.0.2 Released October 3, 2018
======================================
- Small fixes in the C interface
Version 3.0.1 Released October 2, 2018
======================================
- Small bugfixes.
Version 3.0.0 Released September 28, 2018
=========================================
- The scalability of the sparse HSS preconditioner has been
drastically improved. Extraction of elements (for the diagonal
blocks and the B_ij generators) is now done for multiple blocks
concurrently.
- Add option to use Combinatorial BLAS approximate weight perfect
matching instead of MC64. Only works in parallel, requires a square
number of processes. This currently requires a special version of
CombBLAS.
- Fix integer overflow in the counting of nonzeros in the factors.
- Allow compilation without MPI (and ScaLAPACK, ParMetis..)
- Add experimental support for Block Low-Rank compression, for both
sparse (preconditioning) and dense. For now, this is only shared
memory.
- Improvements in the build system: In the CMake script, check
whether OpenMP task dependencies and OpenMP taskloops are
supported.
- Removed the CSCMatrix class for compressed sparse column storage,
this was not tested/used.
- Revamped website (thanks Lucy), and doxygen documentation/manual
- Added lots of doxygen documentation/comments. Removed the pdf
manual, all the documentation is now online at:
http://portal.nersc.gov/project/sparse/strumpack/master/index.html
http://portal.nersc.gov/project/sparse/strumpack/v3.0.0/index.html
- Several bugfixes!
Version 2.2.0 Released March 31, 2018
=====================================
- Changes in the build system
- ParMETIS, (PT)Scotch are now optional
- Work to make STRUMPACK compatible with the xSDK policies
- Moved header files in subfolders
- Made interface const correct
- Support for multiple right-hand sides
- Improved threading in HSS code, improved performance for the
hybrid MPI+OpenMP code: many unnecessary matrix copies are
now avoided
- Flops are now counted/reported correctly when running with
MPI and/or OpenMP.
- Geometric nested dissection now supports wider stencils and
multiple degrees of freedom per node (TODO add in MPI code)
- Many performance improvements and several bug fixes
Version 2.1.0 Released October 26, 2017
=======================================
- Cleanup in the C interface.
- Removed (set_)HSS_min_front_size and --sp_hss_min_front_size from
the manual, it is not supported (yet)
- In some of the examples, the PBLAS blocksize was set to 3 (for
debugging). This has been removed, and the default blocksize of 32
is now used, leading to much better performance for those examples.
- Small bugfix in the adaptive HSS compression stopping criterion.
- Disable replacement of tiny pivots by default. It can lead to
convergence problems for large matrices.
Version 2.0.1 Released October 7, 2017
======================================
- Critical bug fix in HSS Schur complement update.
- Some minor performance improvements.
- Valgrind fixes, compiler warning fixes (when not compiling with
OpenMP).
Version 2.0.0 Released October 1, 2017
======================================
This is a major revision. From now on STRUMPACK is released as a
single library, including both the sparse and the dense
components. Unfortunately, the dense code is not documented yet.
Additionally, the development version of the code is now available
from the public github repository
https://github.com/pghysels/STRUMPACK
The main changes since release 1.1.1 are:
- The options for StrumpackSparseSolver are now set through an object
of type SPOptions, stored in the StrumpackSparseSolver class.
- The template parameter for the real type has been removed. It is
now derived from the scalar type.
- The HSS code has been completely rewritten, performance is much
improved, memory leaks and valgrind errors have been fixed.
- A new, more robust adaptive HSS compression algorithm has been
implemented. This was developed in collaboration with Theo Mary
from Université Toulouse.
- We have automated testing of the code.
- Many bugs have been fixed. However, some bugs probably still
remain. If you happen to encounter any problems while running
STRUMPACK, please do not hesitate to contact us.
Version 1.1.1 Released July 16, 2017
====================================
- Add function STRUMPACK_set_from_options_no_warning_unrecognized(..)
to suppress the warning about unrecognized options
Version 1.1: Released November 8, 2016
======================================
- Rewrite reordering code, can now use sequential METIS and SCOTCH
from distributed interface.
- Change default minimum HSS front size to 2500 (as used in ipdps17 paper).
- Performance improvements in HSS code, mainly HSS compression.
- Add RCM ordering.
- Adaptive HSS compression in MPI code. This changes the default
HSS preconditioning strategy to ADAPTIVE.
- Add option to choose between METIS_NodeND and METIS_NodeNDP.
- Add a program to read matrix market file and print out binary file.
- Several other bug fixes.
Version 1.0.4: Released August 4, 2016
======================================
- Moved examples to the examples/ folder, deleted test folder.
- Add pde900.mtx test matrix market file.
- Add README to examples.
- Fixes for memory leaks!
- Fix bug in separator reordering and another fix in distributed
separator reordering.
- Print message if LU fails, tell user to try to enable MC64 if not
already enabled.
- Include missing stdlib.h.
- Small performance enhancements in extend-add.
- Small improvement in proportional mapping.
- Various performance improvements throughout the code:
Several HSS algorithms were done serially, first one child, then the other.
- Added some timers for profiling, enable with -DUSE_TASK_TIMER.
- Performance improvements in front_multiply_2d (for random sampling
of front).
- Add a FAQ in the manual.
- Describe all command line options in the manual.
- Avoid recursion in the e-tree.
- Change the default relative compression tolerance.
Version 1.0.3: Released June 6, 2016
====================================
- Fix for building a shared library (thanks to Barry Smith)
- Fix for complex numbers (thanks to Barry Smith)
- Add an example folder with an example simple Makefile
Version 1.0.2: Released June 2, 2016
====================================
- Explain how to tune the preconditioner in the manual.
- Suppress output from DenseMPI.
- Print warnings/errors to cerr instead of cout.
- Allow compilation without OpenMP.
- Fix some compiler warnings.
- Fixes for (Apple)Clang.
- Improve CMake detection of ScaLAPACK.
Version 1.0.1: Released May 23, 2016
====================================
- Small fix for GCC 6.1
- Update TODO
- FIX: set min_rand_HSS equal to number of columns in R.
- Code to print some stats about front size and nr random vectors and ranks.
- Change to the default rank pattern strategy.
- Check fgets return code.
- Update build script, check for ScaLAPACK.
- Print clear warning if BLAS/LAPACK not found.
- Update STRUMPACK README.
- Print correct Metis error code.
- Micro optimizations.
- Fix MPI communicator bug.
- Fix memory leaks.
- Fix very slow separator reordering for MPIDist interface.
- Add description of block-distributed CSR to the manual.
- Use CSR instead of triplets in prop dist sparse matrix.
- Missing return statements.
- Do not use __gnu_parallel.
- Fix bug in Redistribute (integer_t -> float).
- Add 64 bit MPIDist test/example.
- Add 64 bit integer test.
- Check PETSc input for NULL.
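
For readers unfamiliar with the block-distributed CSR format mentioned above,
the following sketch splits a global CSR matrix into contiguous row blocks, one
per process. It assumes a layout in which each process keeps a local row
pointer array starting at 0 together with global column indices; the function
name and the dist array (dist[p] to dist[p+1] being the rows owned by process
p) are hypothetical names chosen for this illustration, so consult the manual
for the exact interface.

```python
def split_csr_by_rows(row_ptr, col_ind, values, dist):
    """Split a global CSR matrix into one (row_ptr, col_ind, values) per process.

    Process p owns global rows dist[p] .. dist[p+1]-1.
    """
    parts = []
    for p in range(len(dist) - 1):
        lo, hi = dist[p], dist[p + 1]
        start, end = row_ptr[lo], row_ptr[hi]
        # Local row pointers are shifted so that they start at 0.
        local_row_ptr = [row_ptr[r] - start for r in range(lo, hi + 1)]
        # Column indices stay global; only the rows are distributed.
        parts.append((local_row_ptr, col_ind[start:end], values[start:end]))
    return parts


# Example: a 4 x 4 matrix with 6 nonzeros split over 2 processes.
parts = split_csr_by_rows(
    [0, 2, 3, 5, 6],
    [0, 1, 1, 2, 3, 0],
    [4.0, 1.0, 3.0, 2.0, 1.0, 5.0],
    [0, 2, 4],
)
```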
Version 1.0.0: Released May 4, 2016
===================================
- Initial release of the STRUMPACK-sparse code with support for MPI+OpenMP