step 1.1: preparation step 1.1: read matrix market format GPU Device 0: "Hopper" with compute capability 9.0 Using default input file [../../../../Samples/4_CUDA_Libraries/cuSolverRf/lap2D_5pt_n100.mtx] WARNING: cusolverRf only works for base-0 sparse matrix A is 10000 x 10000 with 49600 nonzeros, base=0 step 1.2: set right hand side vector (b) to 1 step 2: reorder the matrix to reduce zero fill-in Q = symrcm(A) or Q = symamd(A) step 3: B = Q*A*Q^T step 4: solve A*x = b by LU(B) in cusolverSp step 4.1: create opaque info structure step 4.2: analyze LU(B) to know structure of Q and R, and upper bound for nnz(L+U) step 4.3: workspace for LU(B) step 4.4: compute Ppivot*B = L*U step 4.5: check if the matrix is singular step 4.6: solve A*x = b i.e. solve B*(Qx) = Q*b step 4.7: evaluate residual r = b - A*x (result on CPU) (CPU) |b - A*x| = 4.547474E-12 (CPU) |A| = 8.000000E+00 (CPU) |x| = 7.513384E+02 (CPU) |b - A*x|/(|A|*|x|) = 7.565621E-16 step 5: extract P, Q, L and U from P*B*Q^T = L*U L has implicit unit diagonal nnzL = 671550, nnzU = 681550 step 6: form P*A*Q^T = L*U step 6.1: P = Plu*Qreroder step 6.2: Q = Qlu*Qreorder step 7: create cusolverRf handle step 8: set parameters for cusolverRf step 9: assemble P*A*Q = L*U step 10: analyze to extract parallelism step 11: import A to cusolverRf step 12: refactorization step 13: solve A*x = b step 14: evaluate residual r = b - A*x (result on GPU) (GPU) |b - A*x| = 4.320100E-12 (GPU) |A| = 8.000000E+00 (GPU) |x| = 7.513384E+02 (GPU) |b - A*x|/(|A|*|x|) = 7.187340E-16 ===== statistics nnz(A) = 49600, nnz(L+U) = 1353100, zero fill-in ratio = 27.280242 ===== timing profile reorder A : 0.003304 sec B = Q*A*Q^T : 0.000761 sec cusolverSp LU analysis: 0.000188 sec cusolverSp LU factor : 0.069354 sec cusolverSp LU solve : 0.001780 sec cusolverSp LU extract : 0.005654 sec cusolverRf assemble : 0.002426 sec cusolverRf reset : 0.000021 sec cusolverRf refactor : 0.097122 sec cusolverRf solve : 0.123813 sec