Functions
cudss
CUDSS.cudss – Function
cudss(phase::String, solver::CudssSolver{T}, x::CuVector{T}, b::CuVector{T})
cudss(phase::String, solver::CudssSolver{T}, X::CuMatrix{T}, B::CuMatrix{T})
cudss(phase::String, solver::CudssSolver{T}, X::CudssMatrix{T}, B::CudssMatrix{T})
cudss(phase::String, solver::CudssBatchedSolver{T}, x::Vector{CuVector{T}}, b::Vector{CuVector{T}})
cudss(phase::String, solver::CudssBatchedSolver{T}, X::Vector{CuMatrix{T}}, B::Vector{CuMatrix{T}})
cudss(phase::String, solver::CudssBatchedSolver{T}, X::CudssBatchedMatrix{T}, B::CudssBatchedMatrix{T})
The parameter type T is restricted to Float32, Float64, ComplexF32, or ComplexF64.
The available phases are:
- "reordering": Reordering;
- "symbolic_factorization": Symbolic factorization;
- "analysis": Reordering and symbolic factorization combined;
- "factorization": Numerical factorization;
- "refactorization": Numerical re-factorization;
- "solve_fwd_perm": Applying the reordering permutation to the right-hand side before the forward substitution;
- "solve_fwd": Forward substitution sub-step of the solving phase, including the local permutation due to partial pivoting;
- "solve_diag": Diagonal solve sub-step of the solving phase (only needed for symmetric / Hermitian indefinite matrices);
- "solve_bwd": Backward substitution sub-step of the solving phase, including the local permutation due to partial pivoting;
- "solve_bwd_perm": Applying the inverse reordering permutation to the intermediate solution after the backward substitution. If matching (and scaling) is enabled, this phase also includes applying the inverse matching permutation and inverse scaling (as the matching permutation and scalings were used to modify the matrix before the factorization);
- "solve_refinement": Iterative refinement;
- "solve": Full solving phase, combining all sub-phases and (optional) iterative refinement.
When the Schur complement mode is enabled (option "schur_mode" set to 1), a specific combination of phases is required. For that reason, we added shorthand phases:
- "solve_fwd_schur": combines the phases "solve_fwd_perm", "solve_fwd", and "solve_diag";
- "solve_bwd_schur": combines the phases "solve_bwd" and "solve_bwd_perm".
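As a rough illustration of how the phases chain together, the sketch below solves a small general sparse system with "analysis", "factorization", and "solve". It assumes the CudssSolver(A, "G", 'F') constructor from CUDSS.jl (general structure, full matrix stored), which is not described in this section, and uses a randomly generated, diagonally shifted test matrix. A CUDA-capable GPU is required.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Illustrative test problem: the 20I shift keeps the random matrix well conditioned.
A_cpu = sprand(T, n, n, 0.05) + 20I
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CUDA.zeros(T, n)
b_gpu = CuVector(b_cpu)

# Assumption: "G" selects a general matrix and 'F' the full (not triangular) view.
solver = CudssSolver(A_gpu, "G", 'F')

cudss("analysis", solver, x_gpu, b_gpu)       # reordering + symbolic factorization
cudss("factorization", solver, x_gpu, b_gpu)  # numerical factorization
cudss("solve", solver, x_gpu, b_gpu)          # all solve sub-phases; result lands in x_gpu

# Relative residual of the computed solution.
relres = norm(b_gpu - A_gpu * x_gpu) / norm(b_gpu)
```

The same solver object is passed to every phase, so the analysis and the factors are reused by the later calls.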
cudss_update
CUDSS.cudss_update – Function
cudss_update(solver::CudssSolver{T,INT}, A::CuSparseMatrixCSR{T,INT})
cudss_update(solver::CudssSolver{T,INT}, rowPtr::CuVector{INT}, colVal::CuVector{INT}, nzVal::CuVector{T})
cudss_update(solver::CudssBatchedSolver{T,INT}, A::Vector{CuSparseMatrixCSR{T,INT}})
cudss_update(matrix::CudssMatrix{T}, b::CuVector{T})
cudss_update(matrix::CudssMatrix{T}, B::CuMatrix{T})
cudss_update(matrix::CudssMatrix{T,INT}, A::CuSparseMatrixCSR{T,INT})
cudss_update(matrix::CudssMatrix{T,INT}, rowPtr::CuVector{INT}, colVal::CuVector{INT}, nzVal::CuVector{T})
cudss_update(matrix::CudssBatchedMatrix{T}, b::Vector{CuVector{T}})
cudss_update(matrix::CudssBatchedMatrix{T}, B::Vector{CuMatrix{T}})
cudss_update(matrix::CudssBatchedMatrix{T,INT}, A::Vector{CuSparseMatrixCSR{T,INT}})
The parameter type T is restricted to Float32, Float64, ComplexF32, or ComplexF64, while INT is restricted to Int32 or Int64.
Update the contents of a CudssMatrix (or CudssBatchedMatrix) or of a CudssSolver (or CudssBatchedSolver) with new numerical values.
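A minimal sketch of a typical use: factorize once, then push new values into the solver with cudss_update and run "refactorization" instead of repeating the analysis. The CudssSolver(A, "G", 'F') constructor and the test matrices are assumptions made for illustration, and the sketch assumes the updated matrix keeps the sparsity pattern of the original one.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Illustrative test problem; the diagonal is fully stored thanks to the 20I shift.
A_cpu = sprand(T, n, n, 0.05) + 20I
A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CUDA.zeros(T, n)
b_gpu = CuVector(rand(T, n))

solver = CudssSolver(A_gpu, "G", 'F')  # assumption: general matrix, full view
cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)

# New numerical values on the same sparsity pattern (only the diagonal changes).
A2_cpu = A_cpu + Diagonal(rand(T, n))
A2_gpu = CuSparseMatrixCSR(A2_cpu)
cudss_update(solver, A2_gpu)

# Reuse the existing analysis: only the numerical factorization is redone.
c_gpu = CuVector(rand(T, n))
cudss("refactorization", solver, x_gpu, c_gpu)
cudss("solve", solver, x_gpu, c_gpu)

relres = norm(c_gpu - A2_gpu * x_gpu) / norm(c_gpu)
```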
cudss_set
CUDSS.cudss_set – Function
cudss_set(solver::CudssSolver, parameter::String, value)
cudss_set(solver::CudssBatchedSolver, parameter::String, value)
The available configuration parameters are:
- "reordering_alg": Algorithm for the reordering phase ("default", "algo1", "algo2", "algo3", "algo4", or "algo5");
- "factorization_alg": Algorithm for the factorization phase ("default", "algo1", "algo2", "algo3", "algo4", or "algo5");
- "solve_alg": Algorithm for the solving phase ("default", "algo1", "algo2", "algo3", "algo4", or "algo5");
- "use_matching": A flag to enable (1) or disable (0) the matching;
- "matching_alg": Algorithm for the matching;
- "solve_mode": Potential modifier on the system matrix (transpose or adjoint);
- "ir_n_steps": Number of steps during the iterative refinement;
- "ir_tol": Iterative refinement tolerance;
- "pivot_type": Type of pivoting ('C', 'R' or 'N');
- "pivot_threshold": Pivoting threshold which is used to determine if a diagonal element is subject to pivoting;
- "pivot_epsilon": Pivoting epsilon, absolute value to replace singular diagonal elements;
- "max_lu_nnz": Upper limit on the number of nonzero entries in LU factors for non-symmetric matrices;
- "hybrid_memory_mode": Hybrid memory mode – 0 (default = device-only) or 1 (hybrid = host/device);
- "hybrid_device_memory_limit": User-defined device memory limit (number of bytes) for the hybrid memory mode;
- "use_cuda_register_memory": A flag to enable (1) or disable (0) usage of cudaHostRegister() by the hybrid memory mode;
- "host_nthreads": Number of threads to be used by cuDSS in multi-threaded mode;
- "hybrid_execute_mode": Hybrid execute mode – 0 (default = device-only) or 1 (hybrid = host/device);
- "pivot_epsilon_alg": Algorithm for the pivot epsilon calculation;
- "nd_nlevels": Minimum number of levels for the nested dissection reordering;
- "ubatch_size": The number of matrices in a uniform batch of systems to be processed by cuDSS;
- "ubatch_index": Use -1 (default) to process all matrices in the uniform batch, or a 0-based index to process a single matrix during the factorization or solve phase;
- "use_superpanels": Use superpanel optimization – 1 (default = enabled) or 0 (disabled);
- "device_count": Device count in case of multiple devices;
- "device_indices": A list of device indices as an integer array;
- "schur_mode": Schur complement mode – 0 (default = disabled) or 1 (enabled);
- "deterministic_mode": Enable deterministic mode – 0 (default = disabled) or 1 (enabled).
The available data parameters are:
- "info": Device-side error information;
- "user_perm": User permutation to be used instead of running the reordering algorithms;
- "comm": Communicator for Multi-GPU multi-node mode;
- "user_elimination_tree": User-provided elimination tree information, which is used instead of running the reordering algorithm;
- "user_schur_indices": User-provided Schur complement indices. The provided buffer should be an integer array of size n, where n is the dimension of the matrix. The values should be equal to 1 for the rows / columns which are part of the Schur complement and 0 for the rest;
- "user_host_interrupt": User-provided host interrupt pointer;
- "schur_matrix": Schur complement matrix passed as a cudssMatrix_t object.
The data parameter "info" must be restored to 0 if a Cholesky factorization fails due to indefiniteness and refactorization is performed on an updated matrix.
Note that for the data parameters "perm_reorder_row", "perm_row", "scale_row", "perm_reorder_col", "perm_col", "scale_col", "perm_matching", "diag", and "memory_estimates", this function only specifies which vector to update for a subsequent call to cudss_get.
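For illustration, a configuration parameter can be set on the solver before the phases are run. The sketch below requests two steps of iterative refinement through "ir_n_steps"; the CudssSolver(A, "G", 'F') constructor and the test problem are assumptions made for the example, and the value 2 is arbitrary.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Illustrative, well-conditioned test problem.
A_cpu = sprand(T, n, n, 0.05) + 20I
A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CUDA.zeros(T, n)
b_gpu = CuVector(rand(T, n))

solver = CudssSolver(A_gpu, "G", 'F')  # assumption: general matrix, full view

# Ask for two iterative refinement steps during the "solve" phase.
cudss_set(solver, "ir_n_steps", 2)

cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)

relres = norm(b_gpu - A_gpu * x_gpu) / norm(b_gpu)
```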
cudss_get
CUDSS.cudss_get – Function
value = cudss_get(solver::CudssSolver, parameter::String)
value = cudss_get(solver::CudssBatchedSolver, parameter::String)
The available configuration parameters are:
- "reordering_alg": Algorithm for the reordering phase;
- "factorization_alg": Algorithm for the factorization phase;
- "solve_alg": Algorithm for the solving phase;
- "use_matching": A flag to enable (1) or disable (0) the matching;
- "matching_alg": Algorithm for the matching;
- "solve_mode": Potential modifier on the system matrix (transpose or adjoint);
- "ir_n_steps": Number of steps during the iterative refinement;
- "ir_tol": Iterative refinement tolerance;
- "pivot_type": Type of pivoting;
- "pivot_threshold": Pivoting threshold which is used to determine if a diagonal element is subject to pivoting;
- "pivot_epsilon": Pivoting epsilon, absolute value to replace singular diagonal elements;
- "max_lu_nnz": Upper limit on the number of nonzero entries in LU factors for non-symmetric matrices;
- "hybrid_memory_mode": Hybrid memory mode – 0 (default = device-only) or 1 (hybrid = host/device);
- "hybrid_device_memory_limit": User-defined device memory limit (number of bytes) for the hybrid memory mode;
- "use_cuda_register_memory": A flag to enable (1) or disable (0) usage of cudaHostRegister() by the hybrid memory mode;
- "host_nthreads": Number of threads to be used by cuDSS in multi-threaded mode;
- "hybrid_execute_mode": Hybrid execute mode – 0 (default = device-only) or 1 (hybrid = host/device);
- "pivot_epsilon_alg": Algorithm for the pivot epsilon calculation;
- "nd_nlevels": Minimum number of levels for the nested dissection reordering;
- "ubatch_size": The number of matrices in a uniform batch of systems to be processed by cuDSS;
- "ubatch_index": Use -1 (default) to process all matrices in the uniform batch, or a 0-based index to process a single matrix during the factorization or solve phase;
- "use_superpanels": Use superpanel optimization – 1 (default = enabled) or 0 (disabled);
- "device_count": Device count in case of multiple devices;
- "device_indices": A list of device indices as an integer array;
- "schur_mode": Schur complement mode – 0 (default = disabled) or 1 (enabled);
- "deterministic_mode": Enable deterministic mode – 0 (default = disabled) or 1 (enabled).
The available data parameters are:
- "info": Device-side error information;
- "lu_nnz": Number of non-zero entries in LU factors;
- "npivots": Number of pivots encountered during factorization;
- "inertia": Tuple of positive and negative indices of inertia for symmetric / Hermitian indefinite matrices;
- "perm_reorder_row": Reordering permutation for the rows;
- "perm_reorder_col": Reordering permutation for the columns;
- "perm_row": Final row permutation (which includes effects of both reordering and pivoting);
- "perm_col": Final column permutation (which includes effects of both reordering and pivoting);
- "perm_matching": Matching (column) permutation Q such that A[:,Q] is reordered and then factorized;
- "scale_row": A vector of scaling factors applied to the rows of the factorized matrix;
- "scale_col": A vector of scaling factors applied to the columns of the factorized matrix;
- "diag": Diagonal of the factorized matrix;
- "hybrid_device_memory_min": Minimal amount of device memory (number of bytes) required in the hybrid memory mode;
- "memory_estimates": Memory estimates (in bytes) for host and device memory required for the chosen memory mode;
- "nsuperpanels": Number of superpanels in the matrix;
- "schur_shape": Shape of the Schur complement matrix as a triplet (nrows, ncols, nnz);
- "schur_matrix": Retrieve the Schur complement matrix;
- "elimination_tree": User-provided elimination tree information, which is used instead of running the reordering algorithm. It must be used in combination with "user_perm" to have an effect.
The data parameters "info", "lu_nnz", "perm_reorder_row", "perm_reorder_col", "perm_matching", "scale_row", "scale_col", "hybrid_device_memory_min" and "memory_estimates" require the phase "analysis" performed by cudss. The data parameters "npivots", "inertia" and "diag" require the phases "analysis" and "factorization" performed by cudss. The data parameters "perm_matching", "scale_row", and "scale_col" require matching to be enabled (the configuration parameter "use_matching" must be set to 1).
Note that for the data parameters "perm_reorder_row", "perm_row", "scale_row", "perm_reorder_col", "perm_col", "scale_col", "perm_matching", "diag", and "memory_estimates", a call to cudss_set is required beforehand to specify which vector to update.
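The phase requirements above can be sketched as follows: "lu_nnz" is queried once the analysis has run, and "npivots" once the factorization has run. The CudssSolver(A, "G", 'F') constructor and the test problem are assumptions made for the example. Only scalar data parameters are queried here, so no prior cudss_set call is involved.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Illustrative, well-conditioned test problem.
A_cpu = sprand(T, n, n, 0.05) + 20I
A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CUDA.zeros(T, n)
b_gpu = CuVector(rand(T, n))

solver = CudssSolver(A_gpu, "G", 'F')  # assumption: general matrix, full view

cudss("analysis", solver, x_gpu, b_gpu)
lu_nnz = cudss_get(solver, "lu_nnz")    # available after the analysis

cudss("factorization", solver, x_gpu, b_gpu)
npivots = cudss_get(solver, "npivots")  # available after the factorization

println("nonzeros in LU factors: ", lu_nnz)
println("pivots encountered:     ", npivots)
```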