Functions

cudss

CUDSS.cudss – Function
cudss(phase::String, solver::CudssSolver{T}, x::CuVector{T}, b::CuVector{T})
cudss(phase::String, solver::CudssSolver{T}, X::CuMatrix{T}, B::CuMatrix{T})
cudss(phase::String, solver::CudssSolver{T}, X::CudssMatrix{T}, B::CudssMatrix{T})
cudss(phase::String, solver::CudssBatchedSolver{T}, x::Vector{CuVector{T}}, b::Vector{CuVector{T}})
cudss(phase::String, solver::CudssBatchedSolver{T}, X::Vector{CuMatrix{T}}, B::Vector{CuMatrix{T}})
cudss(phase::String, solver::CudssBatchedSolver{T}, X::CudssBatchedMatrix{T}, B::CudssBatchedMatrix{T})

The parameter type T is restricted to Float32, Float64, ComplexF32, or ComplexF64.

The available phases are:

  • "reordering": Reordering;
  • "symbolic_factorization": Symbolic factorization;
  • "analysis": Reordering and symbolic factorization combined;
  • "factorization": Numerical factorization;
  • "refactorization": Numerical re-factorization;
  • "solve_fwd_perm": Applying the reordering permutation to the right-hand side before the forward substitution;
  • "solve_fwd": Forward substitution sub-step of the solving phase, including the local permutation due to partial pivoting;
  • "solve_diag": Diagonal solve sub-step of the solving phase (only needed for symmetric / Hermitian indefinite matrices);
  • "solve_bwd": Backward substitution sub-step of the solving phase, including the local permutation due to partial pivoting;
  • "solve_bwd_perm": Applying inverse reordering permutation to the intermediate solution after the backward substitution. If matching (and scaling) is enabled, this phase also includes applying the inverse matching permutation and inverse scaling (as the matching permutation and scalings were used to modify the matrix before the factorization);
  • "solve_refinement": Iterative refinement;
  • "solve": Full solving phase, combining all sub-phases and (optional) iterative refinement.

When the Schur complement mode is enabled (option "schur_mode" set to 1), a specific combination of phases is required. For that reason, we added shorthand phases:

  • "solve_fwd_schur": combines the phases "solve_fwd_perm", "solve_fwd", and "solve_diag";
  • "solve_bwd_schur": combines the phases "solve_bwd" and "solve_bwd_perm".
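
As a minimal sketch of how the main phases chain together for a single sparse system, the example below runs "analysis", "factorization", and "solve" in order. It assumes a CUDA-capable GPU. The constructor call `CudssSolver(A_gpu, "G", 'F')` (general matrix, full view) is not described in this section and is used here as an assumption; the test matrix is arbitrary.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Arbitrary test problem: off-diagonal entries lie in [0, 1), so adding n*I
# makes the matrix strictly diagonally dominant and therefore nonsingular.
A_cpu = sprand(T, n, n, 0.05) + n * I
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)
x_gpu = CUDA.zeros(T, n)

# Assumption: "G" = general matrix, 'F' = full view (constructor not covered above).
solver = CudssSolver(A_gpu, "G", 'F')

cudss("analysis", solver, x_gpu, b_gpu)       # reordering + symbolic factorization
cudss("factorization", solver, x_gpu, b_gpu)  # numerical factorization
cudss("solve", solver, x_gpu, b_gpu)          # full solve, solution written to x_gpu

# Residual of the computed solution
r_gpu = b_gpu - A_gpu * x_gpu
println(norm(r_gpu))
```

The same `x_gpu` and `b_gpu` are passed to every phase because all methods of `cudss` share one signature, even though only the solve phase reads the right-hand side.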

cudss_update

CUDSS.cudss_update – Function
cudss_update(solver::CudssSolver{T,INT}, A::CuSparseMatrixCSR{T,INT})
cudss_update(solver::CudssSolver{T,INT}, rowPtr::CuVector{INT}, colVal::CuVector{INT}, nzVal::CuVector{T})
cudss_update(solver::CudssBatchedSolver{T,INT}, A::Vector{CuSparseMatrixCSR{T,INT}})
cudss_update(matrix::CudssMatrix{T}, b::CuVector{T})
cudss_update(matrix::CudssMatrix{T}, B::CuMatrix{T})
cudss_update(matrix::CudssMatrix{T,INT}, A::CuSparseMatrixCSR{T,INT})
cudss_update(matrix::CudssMatrix{T,INT}, rowPtr::CuVector{INT}, colVal::CuVector{INT}, nzVal::CuVector{T})
cudss_update(matrix::CudssBatchedMatrix{T}, b::Vector{CuVector{T}})
cudss_update(matrix::CudssBatchedMatrix{T}, B::Vector{CuMatrix{T}})
cudss_update(matrix::CudssBatchedMatrix{T,INT}, A::Vector{CuSparseMatrixCSR{T,INT}})

The parameter type T is restricted to Float32, Float64, ComplexF32, or ComplexF64, while INT is restricted to Int32 or Int64.

Update the contents of a CudssMatrix / CudssBatchedMatrix or CudssSolver / CudssBatchedSolver with new numerical values.
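
A typical use of `cudss_update` on a solver is to refresh the matrix values and then run the "refactorization" phase instead of repeating the analysis. The sketch below assumes a CUDA-capable GPU, the `CudssSolver(A_gpu, "G", 'F')` constructor (not described in this section), and a new matrix with the same sparsity pattern as the original; the matrices are arbitrary.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Arbitrary, strictly diagonally dominant test matrix.
A_cpu = sprand(T, n, n, 0.05) + n * I
A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CUDA.rand(T, n)
x_gpu = CUDA.zeros(T, n)

# Assumption: "G" = general matrix, 'F' = full view.
solver = CudssSolver(A_gpu, "G", 'F')
cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)

# New numerical values with an unchanged sparsity pattern
# (scaling by a nonzero scalar keeps the stored entries in place).
A_gpu_new = CuSparseMatrixCSR(2 * A_cpu)
cudss_update(solver, A_gpu_new)

# Reuse the existing analysis: only the numerical factorization is redone.
cudss("refactorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)

r_gpu = b_gpu - A_gpu_new * x_gpu
println(norm(r_gpu))
```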


cudss_set

CUDSS.cudss_set – Function
cudss_set(solver::CudssSolver, parameter::String, value)
cudss_set(solver::CudssBatchedSolver, parameter::String, value)

The available configuration parameters are:

  • "reordering_alg": Algorithm for the reordering phase ("default", "algo1", "algo2", "algo3", "algo4", or "algo5");
  • "factorization_alg": Algorithm for the factorization phase ("default", "algo1", "algo2", "algo3", "algo4", or "algo5");
  • "solve_alg": Algorithm for the solving phase ("default", "algo1", "algo2", "algo3", "algo4", or "algo5");
  • "use_matching": A flag to enable (1) or disable (0) the matching;
  • "matching_alg": Algorithm for the matching;
  • "solve_mode": Optional modifier applied to the system matrix (transpose or adjoint);
  • "ir_n_steps": Number of steps during the iterative refinement;
  • "ir_tol": Iterative refinement tolerance;
  • "pivot_type": Type of pivoting ('C', 'R' or 'N');
  • "pivot_threshold": Pivoting threshold used to determine whether a diagonal element is subject to pivoting;
  • "pivot_epsilon": Pivoting epsilon, absolute value to replace singular diagonal elements;
  • "max_lu_nnz": Upper limit on the number of nonzero entries in LU factors for non-symmetric matrices;
  • "hybrid_memory_mode": Hybrid memory mode – 0 (default = device-only) or 1 (hybrid = host/device);
  • "hybrid_device_memory_limit": User-defined device memory limit (number of bytes) for the hybrid memory mode;
  • "use_cuda_register_memory": A flag to enable (1) or disable (0) usage of cudaHostRegister() by the hybrid memory mode;
  • "host_nthreads": Number of threads to be used by cuDSS in multi-threaded mode;
  • "hybrid_execute_mode": Hybrid execute mode – 0 (default = device-only) or 1 (hybrid = host/device);
  • "pivot_epsilon_alg": Algorithm for the pivot epsilon calculation;
  • "nd_nlevels": Minimum number of levels for the nested dissection reordering;
  • "ubatch_size": The number of matrices in a uniform batch of systems to be processed by cuDSS;
  • "ubatch_index": Use -1 (default) to process all matrices in the uniform batch, or a 0-based index to process a single matrix during the factorization or solve phase;
  • "use_superpanels": Use superpanel optimization – 1 (default = enabled) or 0 (disabled);
  • "device_count": Number of devices when multiple devices are used;
  • "device_indices": A list of device indices as an integer array;
  • "schur_mode": Schur complement mode – 0 (default = disabled) or 1 (enabled);
  • "deterministic_mode": Enable deterministic mode – 0 (default = disabled) or 1 (enabled).

The available data parameters are:

  • "info": Device-side error information;
  • "user_perm": User permutation to be used instead of running the reordering algorithms;
  • "comm": Communicator for multi-GPU multi-node mode;
  • "user_elimination_tree": User-provided elimination tree information, which is used instead of running the reordering algorithm;
  • "user_schur_indices": User-provided Schur complement indices. The provided buffer should be an integer array of size n, where n is the dimension of the matrix. The values should be equal to 1 for the rows / columns which are part of the Schur complement and 0 for the rest;
  • "user_host_interrupt": User-provided host interrupt pointer;
  • "schur_matrix": Schur complement matrix passed as a cudssMatrix_t object.

The data parameter "info" must be reset to 0 if a Cholesky factorization fails due to indefiniteness and a refactorization is then performed on an updated matrix.

Note that for the data parameters "perm_reorder_row", "perm_row", "scale_row", "perm_reorder_col", "perm_col", "scale_col", "perm_matching", "diag", and "memory_estimates", this function only specifies which vector to update for a subsequent call to cudss_get.
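
As an illustration, configuration parameters can be set on a solver before running the phases. The sketch below sets three of the parameters listed above, using values taken from that list; it assumes a CUDA-capable GPU and the `CudssSolver(A_gpu, "G", 'F')` constructor, which is not described in this section. The chosen values are arbitrary and not a recommendation.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Arbitrary, strictly diagonally dominant test matrix.
A_cpu = sprand(T, n, n, 0.05) + n * I
A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CUDA.rand(T, n)
x_gpu = CUDA.zeros(T, n)

# Assumption: "G" = general matrix, 'F' = full view.
solver = CudssSolver(A_gpu, "G", 'F')

# Configure the solver before the analysis phase.
cudss_set(solver, "reordering_alg", "default")  # algorithm given as a string
cudss_set(solver, "pivot_type", 'C')            # one of 'C', 'R' or 'N'
cudss_set(solver, "ir_n_steps", 2)              # two iterative refinement steps

cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)

r_gpu = b_gpu - A_gpu * x_gpu
println(norm(r_gpu))
```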


cudss_get

CUDSS.cudss_get – Function
value = cudss_get(solver::CudssSolver, parameter::String)
value = cudss_get(solver::CudssBatchedSolver, parameter::String)

The available configuration parameters are:

  • "reordering_alg": Algorithm for the reordering phase;
  • "factorization_alg": Algorithm for the factorization phase;
  • "solve_alg": Algorithm for the solving phase;
  • "use_matching": A flag to enable (1) or disable (0) the matching;
  • "matching_alg": Algorithm for the matching;
  • "solve_mode": Optional modifier applied to the system matrix (transpose or adjoint);
  • "ir_n_steps": Number of steps during the iterative refinement;
  • "ir_tol": Iterative refinement tolerance;
  • "pivot_type": Type of pivoting;
  • "pivot_threshold": Pivoting threshold used to determine whether a diagonal element is subject to pivoting;
  • "pivot_epsilon": Pivoting epsilon, absolute value to replace singular diagonal elements;
  • "max_lu_nnz": Upper limit on the number of nonzero entries in LU factors for non-symmetric matrices;
  • "hybrid_memory_mode": Hybrid memory mode – 0 (default = device-only) or 1 (hybrid = host/device);
  • "hybrid_device_memory_limit": User-defined device memory limit (number of bytes) for the hybrid memory mode;
  • "use_cuda_register_memory": A flag to enable (1) or disable (0) usage of cudaHostRegister() by the hybrid memory mode;
  • "host_nthreads": Number of threads to be used by cuDSS in multi-threaded mode;
  • "hybrid_execute_mode": Hybrid execute mode – 0 (default = device-only) or 1 (hybrid = host/device);
  • "pivot_epsilon_alg": Algorithm for the pivot epsilon calculation;
  • "nd_nlevels": Minimum number of levels for the nested dissection reordering;
  • "ubatch_size": The number of matrices in a uniform batch of systems to be processed by cuDSS;
  • "ubatch_index": Use -1 (default) to process all matrices in the uniform batch, or a 0-based index to process a single matrix during the factorization or solve phase;
  • "use_superpanels": Use superpanel optimization – 1 (default = enabled) or 0 (disabled);
  • "device_count": Number of devices when multiple devices are used;
  • "device_indices": A list of device indices as an integer array;
  • "schur_mode": Schur complement mode – 0 (default = disabled) or 1 (enabled);
  • "deterministic_mode": Enable deterministic mode – 0 (default = disabled) or 1 (enabled).

The available data parameters are:

  • "info": Device-side error information;
  • "lu_nnz": Number of nonzero entries in LU factors;
  • "npivots": Number of pivots encountered during factorization;
  • "inertia": Tuple of positive and negative indices of inertia for symmetric / Hermitian indefinite matrices;
  • "perm_reorder_row": Reordering permutation for the rows;
  • "perm_reorder_col": Reordering permutation for the columns;
  • "perm_row": Final row permutation (which includes effects of both reordering and pivoting);
  • "perm_col": Final column permutation (which includes effects of both reordering and pivoting);
  • "perm_matching": Matching (column) permutation Q such that A[:,Q] is reordered and then factorized;
  • "scale_row": A vector of scaling factors applied to the rows of the factorized matrix;
  • "scale_col": A vector of scaling factors applied to the columns of the factorized matrix;
  • "diag": Diagonal of the factorized matrix;
  • "hybrid_device_memory_min": Minimal amount of device memory (number of bytes) required in the hybrid memory mode;
  • "memory_estimates": Memory estimates (in bytes) for host and device memory required for the chosen memory mode;
  • "nsuperpanels": Number of superpanels in the matrix;
  • "schur_shape": Shape of the Schur complement matrix as a triplet (nrows, ncols, nnz);
  • "schur_matrix": Retrieve the Schur complement matrix;
  • "elimination_tree": User-provided elimination tree information, which is used instead of running the reordering algorithm. It must be used in combination with "user_perm" to have an effect.

The data parameters "info", "lu_nnz", "perm_reorder_row", "perm_reorder_col", "perm_matching", "scale_row", "scale_col", "hybrid_device_memory_min" and "memory_estimates" require the phase "analysis" to have been performed by cudss. The data parameters "npivots", "inertia" and "diag" require both the "analysis" and "factorization" phases to have been performed by cudss. The data parameters "perm_matching", "scale_row", and "scale_col" additionally require matching to be enabled (the configuration parameter "use_matching" must be set to 1).

Note that for the data parameters "perm_reorder_row", "perm_row", "scale_row", "perm_reorder_col", "perm_col", "scale_col", "perm_matching", "diag", and "memory_estimates", a call to cudss_set is required beforehand to specify which vector to update.
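
The phase requirements above can be sketched with a few scalar data parameters, queried right after the phase that makes them available. The example assumes a CUDA-capable GPU and the `CudssSolver(A_gpu, "G", 'F')` constructor, which is not described in this section. It deliberately avoids the vector-valued parameters that need a prior call to cudss_set.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Arbitrary, strictly diagonally dominant test matrix.
A_cpu = sprand(T, n, n, 0.05) + n * I
A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CUDA.rand(T, n)
x_gpu = CUDA.zeros(T, n)

# Assumption: "G" = general matrix, 'F' = full view.
solver = CudssSolver(A_gpu, "G", 'F')

# "lu_nnz" only requires the analysis phase.
cudss("analysis", solver, x_gpu, b_gpu)
lu_nnz = cudss_get(solver, "lu_nnz")

# "npivots" requires both analysis and factorization.
cudss("factorization", solver, x_gpu, b_gpu)
npivots = cudss_get(solver, "npivots")

# Device-side error information after the factorization.
info = cudss_get(solver, "info")

println((lu_nnz, npivots, info))
```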
