cuda-samples/bin/x86_64/linux/release/APM_cuSolverRf.txt

59 lines
2.0 KiB
Plaintext
Raw Normal View History

2023-03-01 09:41:29 +08:00
step 1.1: preparation
step 1.1: read matrix market format
GPU Device 0: "Hopper" with compute capability 9.0
Using default input file [../../../../Samples/4_CUDA_Libraries/cuSolverRf/lap2D_5pt_n100.mtx]
WARNING: cusolverRf only works for base-0
sparse matrix A is 10000 x 10000 with 49600 nonzeros, base=0
step 1.2: set right hand side vector (b) to 1
step 2: reorder the matrix to reduce zero fill-in
Q = symrcm(A) or Q = symamd(A)
step 3: B = Q*A*Q^T
step 4: solve A*x = b by LU(B) in cusolverSp
step 4.1: create opaque info structure
step 4.2: analyze LU(B) to know structure of Q and R, and upper bound for nnz(L+U)
step 4.3: workspace for LU(B)
step 4.4: compute Ppivot*B = L*U
step 4.5: check if the matrix is singular
step 4.6: solve A*x = b
i.e. solve B*(Qx) = Q*b
step 4.7: evaluate residual r = b - A*x (result on CPU)
(CPU) |b - A*x| = 4.547474E-12
(CPU) |A| = 8.000000E+00
(CPU) |x| = 7.513384E+02
(CPU) |b - A*x|/(|A|*|x|) = 7.565621E-16
step 5: extract P, Q, L and U from P*B*Q^T = L*U
L has implicit unit diagonal
nnzL = 671550, nnzU = 681550
step 6: form P*A*Q^T = L*U
step 6.1: P = Plu*Qreroder
step 6.2: Q = Qlu*Qreorder
step 7: create cusolverRf handle
step 8: set parameters for cusolverRf
step 9: assemble P*A*Q = L*U
step 10: analyze to extract parallelism
step 11: import A to cusolverRf
step 12: refactorization
step 13: solve A*x = b
step 14: evaluate residual r = b - A*x (result on GPU)
(GPU) |b - A*x| = 4.320100E-12
(GPU) |A| = 8.000000E+00
(GPU) |x| = 7.513384E+02
(GPU) |b - A*x|/(|A|*|x|) = 7.187340E-16
===== statistics
nnz(A) = 49600, nnz(L+U) = 1353100, zero fill-in ratio = 27.280242
===== timing profile
reorder A : 0.003304 sec
B = Q*A*Q^T : 0.000761 sec
cusolverSp LU analysis: 0.000188 sec
cusolverSp LU factor : 0.069354 sec
cusolverSp LU solve : 0.001780 sec
cusolverSp LU extract : 0.005654 sec
cusolverRf assemble : 0.002426 sec
cusolverRf reset : 0.000021 sec
cusolverRf refactor : 0.097122 sec
cusolverRf solve : 0.123813 sec