mirror of
https://github.com/NVIDIA/cuda-samples.git
synced 2024-11-25 03:49:14 +08:00
59 lines
2.0 KiB
Plaintext
step 1.1: preparation
step 1.1: read matrix market format
GPU Device 0: "Hopper" with compute capability 9.0

Using default input file [../../../../Samples/4_CUDA_Libraries/cuSolverRf/lap2D_5pt_n100.mtx]
WARNING: cusolverRf only works for base-0
sparse matrix A is 10000 x 10000 with 49600 nonzeros, base=0
step 1.2: set right hand side vector (b) to 1
step 2: reorder the matrix to reduce zero fill-in
Q = symrcm(A) or Q = symamd(A)
step 3: B = Q*A*Q^T
step 4: solve A*x = b by LU(B) in cusolverSp
step 4.1: create opaque info structure
step 4.2: analyze LU(B) to know structure of Q and R, and upper bound for nnz(L+U)
step 4.3: workspace for LU(B)
step 4.4: compute Ppivot*B = L*U
step 4.5: check if the matrix is singular
step 4.6: solve A*x = b
i.e. solve B*(Qx) = Q*b
step 4.7: evaluate residual r = b - A*x (result on CPU)
(CPU) |b - A*x| = 4.547474E-12
(CPU) |A| = 8.000000E+00
(CPU) |x| = 7.513384E+02
(CPU) |b - A*x|/(|A|*|x|) = 7.565621E-16
step 5: extract P, Q, L and U from P*B*Q^T = L*U
L has implicit unit diagonal
nnzL = 671550, nnzU = 681550
step 6: form P*A*Q^T = L*U
step 6.1: P = Plu*Qreroder
step 6.2: Q = Qlu*Qreorder
step 7: create cusolverRf handle
step 8: set parameters for cusolverRf
step 9: assemble P*A*Q = L*U
step 10: analyze to extract parallelism
step 11: import A to cusolverRf
step 12: refactorization
step 13: solve A*x = b
step 14: evaluate residual r = b - A*x (result on GPU)
(GPU) |b - A*x| = 4.320100E-12
(GPU) |A| = 8.000000E+00
(GPU) |x| = 7.513384E+02
(GPU) |b - A*x|/(|A|*|x|) = 7.187340E-16
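The CPU and GPU residual lines each report a scaled residual |b - A*x|/(|A|*|x|). As a rough cross-check, the sketch below recomputes that ratio from the three norms printed in the log. The function name `relative_residual` is hypothetical, and the inputs are the rounded values from the log, so the results should agree with the printed ratios only to about six significant digits.

```python
# Sketch: recompute the scaled residual from the norms printed in the log.
# `relative_residual` is an illustrative helper, not part of the sample.

def relative_residual(r_norm: float, a_norm: float, x_norm: float) -> float:
    """Return |b - A*x| / (|A| * |x|) given the three norms."""
    return r_norm / (a_norm * x_norm)

# Values copied from the (CPU) and (GPU) lines of the log.
cpu = relative_residual(4.547474e-12, 8.000000e+00, 7.513384e+02)
gpu = relative_residual(4.320100e-12, 8.000000e+00, 7.513384e+02)

print(f"(CPU) |b - A*x|/(|A|*|x|) = {cpu:.6E}")
print(f"(GPU) |b - A*x|/(|A|*|x|) = {gpu:.6E}")
```

Both values are on the order of 1E-16, which is close to double-precision machine epsilon.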
===== statistics
nnz(A) = 49600, nnz(L+U) = 1353100, zero fill-in ratio = 27.280242
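The statistics line can be reproduced from numbers printed earlier in the log: nnz(L+U) is the sum of nnzL and nnzU, and the fill-in ratio is that sum divided by nnz(A). A minimal sketch follows, with variable names chosen only for illustration.

```python
# Sketch: reproduce the statistics line from values printed in the log.
nnz_A = 49600                  # "sparse matrix A is 10000 x 10000 with 49600 nonzeros"
nnz_L, nnz_U = 671550, 681550  # "nnzL = 671550, nnzU = 681550"

nnz_LU = nnz_L + nnz_U         # nonzeros stored in the two factors
fill_in_ratio = nnz_LU / nnz_A

print(f"nnz(A) = {nnz_A}, nnz(L+U) = {nnz_LU}, "
      f"zero fill-in ratio = {fill_in_ratio:.6f}")
# prints: nnz(A) = 49600, nnz(L+U) = 1353100, zero fill-in ratio = 27.280242
```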

===== timing profile
reorder A             : 0.003304 sec
B = Q*A*Q^T           : 0.000761 sec

cusolverSp LU analysis: 0.000188 sec
cusolverSp LU factor  : 0.069354 sec
cusolverSp LU solve   : 0.001780 sec
cusolverSp LU extract : 0.005654 sec

cusolverRf assemble   : 0.002426 sec
cusolverRf reset      : 0.000021 sec
cusolverRf refactor   : 0.097122 sec
cusolverRf solve      : 0.123813 sec
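To compare the two phases of the timing profile at a glance, the per-step times can be summed by library prefix. The sketch below hard-codes the values from this one run. The dictionary and the helper `phase_total` are illustrative names, and a single run like this is not a benchmark.

```python
# Sketch: total the timing profile per library, using the values from this log.
timings_sec = {
    "cusolverSp LU analysis": 0.000188,
    "cusolverSp LU factor": 0.069354,
    "cusolverSp LU solve": 0.001780,
    "cusolverSp LU extract": 0.005654,
    "cusolverRf assemble": 0.002426,
    "cusolverRf reset": 0.000021,
    "cusolverRf refactor": 0.097122,
    "cusolverRf solve": 0.123813,
}

def phase_total(prefix: str) -> float:
    """Sum the times of all steps whose label starts with `prefix`."""
    return sum(t for name, t in timings_sec.items() if name.startswith(prefix))

sp_total = phase_total("cusolverSp")
rf_total = phase_total("cusolverRf")

print(f"cusolverSp total: {sp_total:.6f} sec")
print(f"cusolverRf total: {rf_total:.6f} sec")
```

In this run the cusolverRf steps sum to roughly 0.22 sec against roughly 0.08 sec for the cusolverSp steps. The log alone does not show how these numbers vary across runs or hardware.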