Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a custom allocator #694

Open
K20shores opened this issue Dec 16, 2024 · 0 comments
Open

Implement a custom allocator #694

K20shores opened this issue Dec 16, 2024 · 0 comments

Comments

@K20shores
Copy link
Collaborator

Much of the memory in the state is used in the hot path to solve chemistry. In our performance benchmarks, we see a large amount of variance compared to some fortran code which solve the same systems with the same numerical routines. The large variance could be due to locality. The fortran data is all stack data allocated in data segment of the program, while the c++ variables are allocated on the heap which could be from each other.

DenseMatrixPolicy variables_;
/// @brief Rate paramters particular to user-defined rate constants, may vary in time
DenseMatrixPolicy custom_rate_parameters_;
/// @brief The reaction rates, may vary in time
DenseMatrixPolicy rate_constants_;
/// @brief Atmospheric conditions, varies in time
std::vector<Conditions> conditions_;
/// @brief The jacobian structure, varies for each solve
SparseMatrixPolicy jacobian_;
/// @brief Immutable data required for the state
std::map<std::string, std::size_t> variable_map_;
std::map<std::string, std::size_t> custom_rate_parameter_map_;
std::vector<std::string> variable_names_{};
SparseMatrixPolicy lower_matrix_;
SparseMatrixPolicy upper_matrix_;
std::size_t state_size_;
std::size_t number_of_grid_cells_;
std::unique_ptr<TemporaryVariables> temporary_variables_;

Also, our LU decomposition routines keep a list of ids used to index into various arrays at runtime. All of these are heap-allocated and may not be close to each other.

std::vector<std::pair<std::size_t, std::size_t>> niLU_;
/// True when A[i][k] is non-zero for each iteration of the middle (k) loop for the upper
/// triangular matrix; False otherwise. Used data type char instead of bool because vector<bool> representation
/// does not support easy retrieval of memory address using data() function.
std::vector<char> do_aik_;
/// Index in A.data_ for A[i][k] for each iteration of the middle (k) loop for the upper
/// triangular matrix when A[i][k] is non-zero
std::vector<std::size_t> aik_;
/// Index in U.data_ for U[i][k] for each iteration of the middle (k) loop for the upper
/// triangular matrix when U[i][k] is non-zero, and the corresponding number of elements
/// in the inner (j) loop
std::vector<std::pair<std::size_t, std::size_t>> uik_nkj_;
/// Index in L.data_ for L[i][j], and in U.data_ for U[j][k] in the upper inner (j) loop
/// when L[i][j] and U[j][k] are both non-zero.
std::vector<std::pair<std::size_t, std::size_t>> lij_ujk_;
/// True when A[k][i] is non-zero for each iteration of the middle (k) loop for the lower
/// triangular matrix; False otherwise. Used data type char instead of bool because vector<bool> representation
/// does not suppor easy retrieval of memory address using data() function.
std::vector<char> do_aki_;
/// Index in A.data_ for A[k][i] for each iteration of the middle (k) loop for the lower
/// triangular matrix when A[k][i] is non-zero.
std::vector<std::size_t> aki_;
/// Index in L.data_ for L[k][i] for each iteration of the middle (k) loop for the lower
/// triangular matrix when L[k][i] is non-zero, and the corresponding number of elements
/// in the inner (j) loop
std::vector<std::pair<std::size_t, std::size_t>> lki_nkj_;
/// Index in L.data_ for L[k][j], and in U.data_ for U[j][i] in the lower inner (j) loop
/// when L[k][j] and U[j][i] are both non-zero.
std::vector<std::pair<std::size_t, std::size_t>> lkj_uji_;
/// Index in U.data_ for U[i][i] for each interation in the middle (k) loop for the lower
/// triangular matrix when L[k][i] is non-zero
std::vector<std::size_t> uii_;

std::vector<std::tuple<std::size_t, std::size_t, std::size_t>> lii_nuij_nlij_;
/// Index in U.data_ and A.data_ for U[i][j] and A[i][j] for each iteration of the inner (j) loop
/// used to set the initial value for the U matrix
std::vector<std::pair<std::size_t, std::size_t>> uij_aij_;
/// Index in L.data_ and A.data_ for L[i][j] and A[i][j] for each iteration of the inner (j) loop
/// used to set the initial value for the L matrix
std::vector<std::pair<std::size_t, std::size_t>> lij_aij_;
/// Index in U.data_ for each non-zero element in U that is zero in A
std::vector<std::size_t> fill_uij_;
/// Index in L.data_ for each non-zero element in L that is zero in A
std::vector<std::size_t> fill_lij_;
/// Index in U.data_ for U[i][i] and the number of elements in the middle (j) and (k) loops
/// for each iteration of the outer (i) loop
std::vector<std::tuple<std::size_t, std::size_t, std::size_t>> uii_nj_nk_;
/// Index in L.data_ for L[j][i] for each iteration of the middle (j) loop
/// for the lower triangular matrix
std::vector<std::size_t> lji_;
/// Number of elements in the inner (j) loops for each iteration of the middle (k) loop for the
/// upper and lower triangular matrices, and the index in U.data_ for U[j][k] for each iteration
std::vector<std::tuple<std::size_t, std::size_t, std::size_t>> nujk_nljk_uik_;
/// Index in U.data_ for U[j][k] and in L.data_ for L[j][i]
/// for each iteration of the inner (j) loop for the upper triangular matrix
std::vector<std::pair<std::size_t, std::size_t>> ujk_lji_;
/// Index in L.data_ for L[j][k] and in L.data_ for L[j][i]
/// for each iteration of the inner (j) loop for the lower triangular matrix
std::vector<std::pair<std::size_t, std::size_t>> ljk_lji_;

Fortunately, for any given state, the size of these data members (in both the state and LU routines) will always be the same size, and, after configuration, we should be able to calculate these sizes.

We should consider using a custom allocator which will place all of the members of state next to each other in memory and all the members of the LU routines near each other in memory.

Acceptance criteria

  • After configuration, the total number of bytes needed for LU and the state are able to be calculated
  • A custom allocator template is added which can dole memory out to the state and LU

Ideas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant