Class LibMatrixCUDA
java.lang.Object
org.apache.sysds.runtime.matrix.data.LibMatrixCUDA
Direct Known Subclasses:
LibMatrixCuDNN, LibMatrixCuDNNInputRowFetcher, LibMatrixCuMatMult
public class LibMatrixCUDA extends Object
All CUDA kernels and library calls are redirected through this class.
See Also:
GPUContext, GPUObject
-
Field Summary
Fields (columns: Modifier and Type, Field, Description)
static CudaSupportFunctions
cudaSupportFunctions
static String
customKernelSuffix
static int
sizeOfDataType
-
Constructor Summary
Constructors (columns: Constructor, Description)
LibMatrixCUDA()
-
Method Summary
Methods (columns: Modifier and Type, Method, Description)
static void
abs(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "abs" operation on a matrix on the GPU.
static void
acos(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "acos" operation on a matrix on the GPU.
static void
asin(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "asin" operation on a matrix on the GPU.
static void
atan(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "atan" operation on a matrix on the GPU.
static void
axpy(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName, double constant)
Performs a daxpy operation.
static void
biasAdd(GPUContext gCtx, String instName, MatrixObject input, MatrixObject bias, MatrixObject outputBlock)
Performs the operation corresponding to the DML script: ones = matrix(1, rows=1, cols=Hout*Wout); output = input + matrix(bias %*% ones, rows=1, cols=F*Hout*Wout). This operation is commonly paired with conv2d, hence the built-in function bias_add(input, bias).
static void
biasMultiply(GPUContext gCtx, String instName, MatrixObject input, MatrixObject bias, MatrixObject outputBlock)
Performs the operation corresponding to the DML script: ones = matrix(1, rows=1, cols=Hout*Wout); output = input * matrix(bias %*% ones, rows=1, cols=F*Hout*Wout). This operation is commonly paired with conv2d, hence the built-in function bias_multiply(input, bias).
static void
cbind(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName)
static void
ceil(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "ceil" operation on a matrix on the GPU.
static void
channelSums(GPUContext gCtx, String instName, MatrixObject input, MatrixObject outputBlock, long C, long HW)
Performs the channel_sums operation: out = rowSums(matrix(colSums(A), rows=C, cols=HW)).
static int
computeNNZ(GPUContext gCtx, jcuda.Pointer densePtr, int length)
Utility to compute the number of non-zeroes on the GPU.
static void
cos(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "cos" operation on a matrix on the GPU.
static void
cosh(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "cosh" operation on a matrix on the GPU.
static void
cumulativeScan(ExecutionContext ec, GPUContext gCtx, String instName, String kernelFunction, MatrixObject in, String outputName)
Cumulative scan.
static void
cumulativeSumProduct(ExecutionContext ec, GPUContext gCtx, String instName, String kernelFunction, MatrixObject in, String outputName)
Cumulative sum-product kernel cascade invocation.
static void
denseTranspose(ExecutionContext ec, GPUContext gCtx, String instName, jcuda.Pointer A, jcuda.Pointer C, long numRowsA, long numColsA)
Computes C = t(A).
static void
deviceCopy(String instName, jcuda.Pointer src, jcuda.Pointer dest, int rlen, int clen)
Performs a deep copy of the input device double pointer corresponding to a matrix.
static jcuda.Pointer
double2float(GPUContext gCtx, jcuda.Pointer A, jcuda.Pointer ret, int numElems)
static void
exp(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "exp" operation on a matrix on the GPU.
static jcuda.Pointer
float2double(GPUContext gCtx, jcuda.Pointer A, jcuda.Pointer ret, int numElems)
static void
floor(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "floor" operation on a matrix on the GPU.
static JCudaKernels
getCudaKernels(GPUContext gCtx)
static MatrixObject
getDenseMatrixOutputForGPUInstruction(ExecutionContext ec, String instName, String name, long numRows, long numCols)
Helper method to get the output block (allocated on the GPU). Also records performance information into Statistics.
static MatrixObject
getDenseMatrixOutputForGPUInstruction(ExecutionContext ec, String instName, String name, long numRows, long numCols, boolean initialize)
static jcuda.Pointer
getDensePointer(GPUContext gCtx, MatrixObject input, String instName)
Convenience method to get jcudaDenseMatrixPtr.
static long
getNnz(GPUContext gCtx, String instName, MatrixObject mo, boolean recomputeDenseNNZ)
Note: if the matrix is in dense format, it explicitly re-computes the number of nonzeros.
static boolean
isInSparseFormat(GPUContext gCtx, MatrixObject mo)
static void
log(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "log" operation on a matrix on the GPU.
static void
matmultTSMM(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject left, String outputName, boolean isLeftTransposed)
Performs tsmm, A %*% A' or A' %*% A, on the GPU by exploiting cublasDsyrk(...).
static void
matrixMatrixArithmetic(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName, boolean isLeftTransposed, boolean isRightTransposed, BinaryOperator op)
Performs the elementwise arithmetic operation specified by op on two input matrices in1 and in2.
static void
matrixMatrixRelational(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName, BinaryOperator op)
Performs the elementwise relational operation specified by op on two input matrices in1 and in2.
static void
matrixScalarArithmetic(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName, boolean isInputTransposed, ScalarOperator op)
Entry point to perform the elementwise matrix-scalar arithmetic operation specified by op.
static void
matrixScalarOp(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName, boolean isInputTransposed, ScalarOperator op)
Utility to run the matrix-scalar operation kernel.
static void
matrixScalarRelational(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName, ScalarOperator op)
Entry point to perform the elementwise matrix-scalar relational operation specified by op.
static jcuda.Pointer
one()
Convenience method to get a pointer to the value '1.0' on the device.
static void
rbind(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName)
static void
reluBackward(GPUContext gCtx, String instName, MatrixObject input, MatrixObject dout, MatrixObject outputBlock)
Computes the backpropagation errors for the previous layer of a relu operation.
static void
resetFloatingPointPrecision()
Sets the internal state based on DMLScript.DATA_TYPE.
static void
round(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "round" operation on a matrix on the GPU.
static void
sigmoid(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sigmoid" operation on a matrix on the GPU.
static void
sign(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sign" operation on a matrix on the GPU.
static void
sin(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sin" operation on a matrix on the GPU.
static void
sinh(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sinh" operation on a matrix on the GPU.
static void
sliceOperations(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, IndexRange ixrange, String outputName)
Performs a rightIndex operation for the given lower and upper bounds in the row and column dimensions.
static void
solve(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName)
Implements the "solve" function for SystemDS: Ax = B (A is of size m*n, B is of size m*1, x is of size n*1).
static void
sqrt(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sqrt" operation on a matrix on the GPU.
static void
tan(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "tan" operation on a matrix on the GPU.
static void
tanh(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "tanh" operation on a matrix on the GPU.
static int
toInt(long num)
static void
transpose(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName)
Transposes the input matrix using cublasDgeam.
static void
unaryAggregate(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String output, AggregateUnaryOperator op)
Entry point to perform unary aggregate operations on the GPU.
static jcuda.Pointer
zero()
Convenience method to get a pointer to the value '0.0f' on the device.
-
-
-
Field Detail
-
cudaSupportFunctions
public static CudaSupportFunctions cudaSupportFunctions
-
sizeOfDataType
public static int sizeOfDataType
-
customKernelSuffix
public static String customKernelSuffix
-
-
Method Detail
-
resetFloatingPointPrecision
public static void resetFloatingPointPrecision()
Sets the internal state based on the DMLScript.DATA_TYPE
-
isInSparseFormat
public static boolean isInSparseFormat(GPUContext gCtx, MatrixObject mo)
-
getNnz
public static long getNnz(GPUContext gCtx, String instName, MatrixObject mo, boolean recomputeDenseNNZ)
Note: if the matrix is in dense format, it explicitly re-computes the number of nonzeros.
Parameters:
gCtx - a valid GPU context
instName - instruction name
mo - matrix object
recomputeDenseNNZ - recompute NNZ if dense
Returns:
number of non-zeroes
-
getCudaKernels
public static JCudaKernels getCudaKernels(GPUContext gCtx) throws DMLRuntimeException
Throws:
DMLRuntimeException
-
double2float
public static jcuda.Pointer double2float(GPUContext gCtx, jcuda.Pointer A, jcuda.Pointer ret, int numElems)
-
float2double
public static jcuda.Pointer float2double(GPUContext gCtx, jcuda.Pointer A, jcuda.Pointer ret, int numElems)
-
one
public static jcuda.Pointer one()
Convenience method to get a pointer to the value '1.0' on the device, instead of allocating and deallocating it for every kernel invocation.
Returns:
jcuda pointer
-
zero
public static jcuda.Pointer zero()
Convenience method to get a pointer to the value '0.0f' on the device, instead of allocating and deallocating it for every kernel invocation.
Returns:
jcuda pointer
-
getDensePointer
public static jcuda.Pointer getDensePointer(GPUContext gCtx, MatrixObject input, String instName) throws DMLRuntimeException
Convenience method to get jcudaDenseMatrixPtr. This method explicitly converts sparse to dense format, so use it judiciously.
Parameters:
gCtx - a valid GPUContext
input - input matrix object
instName - the invoking instruction's name, used to record Statistics
Returns:
jcuda pointer
Throws:
DMLRuntimeException
-
reluBackward
public static void reluBackward(GPUContext gCtx, String instName, MatrixObject input, MatrixObject dout, MatrixObject outputBlock)
Computes the backpropagation errors for the previous layer of a relu operation.
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
input - input image
dout - error propagated from the next layer
outputBlock - output
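As a CPU-side illustration of what a relu backward pass computes, the sketch below applies the standard rule: the incoming error passes through wherever the forward input was positive and is zeroed elsewhere. The class and method names are hypothetical and operate on plain arrays; this is a reference sketch, not the GPU kernel used by this class.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class ReluBackwardSketch {
    // out[i] = dout[i] where input[i] > 0, otherwise 0.
    static double[] reluBackward(double[] input, double[] dout) {
        if (input.length != dout.length) {
            throw new IllegalArgumentException("input and dout must have the same length");
        }
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] > 0 ? dout[i] : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] out = reluBackward(new double[] {-1.0, 0.0, 2.0}, new double[] {5.0, 6.0, 7.0});
        System.out.println(Arrays.toString(out)); // prints [0.0, 0.0, 7.0]
    }
}
```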
-
channelSums
public static void channelSums(GPUContext gCtx, String instName, MatrixObject input, MatrixObject outputBlock, long C, long HW)
Performs the channel_sums operation: out = rowSums(matrix(colSums(A), rows=C, cols=HW)).
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
input - input image
outputBlock - output
C - number of channels
HW - height*width
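The DML expression above can be unpacked on the CPU as follows. This is a minimal sketch that assumes the input is an N x (C*HW) row-major matrix and that the reshape fills rows first; the class name is hypothetical and the code is not the GPU implementation.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class ChannelSumsSketch {
    // Equivalent to rowSums(matrix(colSums(A), rows=C, cols=HW)):
    // column j of A belongs to channel j / hw, so every entry is added to that channel's total.
    static double[] channelSums(double[][] a, int c, int hw) {
        double[] out = new double[c];
        for (double[] row : a) {
            if (row.length != c * hw) {
                throw new IllegalArgumentException("each row must have C*HW columns");
            }
            for (int j = 0; j < row.length; j++) {
                out[j / hw] += row[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3, 4}, {5, 6, 7, 8}}; // N=2, C=2, HW=2
        System.out.println(Arrays.toString(channelSums(a, 2, 2))); // prints [14.0, 22.0]
    }
}
```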
-
biasMultiply
public static void biasMultiply(GPUContext gCtx, String instName, MatrixObject input, MatrixObject bias, MatrixObject outputBlock)
Performs the operation corresponding to the DML script: ones = matrix(1, rows=1, cols=Hout*Wout); output = input * matrix(bias %*% ones, rows=1, cols=F*Hout*Wout). This operation is commonly paired with conv2d, hence the built-in function bias_multiply(input, bias).
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
input - input image
bias - bias
outputBlock - output
-
biasAdd
public static void biasAdd(GPUContext gCtx, String instName, MatrixObject input, MatrixObject bias, MatrixObject outputBlock)
Performs the operation corresponding to the DML script: ones = matrix(1, rows=1, cols=Hout*Wout); output = input + matrix(bias %*% ones, rows=1, cols=F*Hout*Wout). This operation is commonly paired with conv2d, hence the built-in function bias_add(input, bias).
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
input - input image
bias - bias
outputBlock - output
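To make the DML scripts for biasAdd and biasMultiply concrete, the sketch below applies them on the CPU. It assumes an N x (F*Hout*Wout) row-major input and a bias vector of length F, so that column j uses bias[j / (Hout*Wout)]. The class and method names are hypothetical; this is not the GPU code path.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class BiasOpsSketch {
    // output[n][j] = input[n][j] + bias[j / hw], where hw = Hout*Wout.
    static double[][] biasAdd(double[][] input, double[] bias, int hw) {
        return apply(input, bias, hw, false);
    }

    // output[n][j] = input[n][j] * bias[j / hw], where hw = Hout*Wout.
    static double[][] biasMultiply(double[][] input, double[] bias, int hw) {
        return apply(input, bias, hw, true);
    }

    private static double[][] apply(double[][] input, double[] bias, int hw, boolean multiply) {
        double[][] out = new double[input.length][];
        for (int n = 0; n < input.length; n++) {
            if (input[n].length != bias.length * hw) {
                throw new IllegalArgumentException("each row must have F*Hout*Wout columns");
            }
            out[n] = new double[input[n].length];
            for (int j = 0; j < input[n].length; j++) {
                double b = bias[j / hw];
                out[n][j] = multiply ? input[n][j] * b : input[n][j] + b;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] in = {{1, 2, 3, 4}}; // N=1, F=2, Hout*Wout=2
        double[] bias = {10, 20};
        System.out.println(Arrays.deepToString(biasAdd(in, bias, 2)));      // prints [[11.0, 12.0, 23.0, 24.0]]
        System.out.println(Arrays.deepToString(biasMultiply(in, bias, 2))); // prints [[10.0, 20.0, 60.0, 80.0]]
    }
}
```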
-
matmultTSMM
public static void matmultTSMM(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject left, String outputName, boolean isLeftTransposed)
Performs tsmm, A %*% A' or A' %*% A, on the GPU by exploiting cublasDsyrk(...).
Memory usage: if dense, the input takes rows * cols, no intermediate memory is needed, and the output takes Max(rows*rows, cols*cols). If sparse, calls matmult.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
left - input matrix; in a tsmm expression such as A %*% A' or A' %*% A only the transposition of the left operand needs to be checked, hence the name 'left'
outputName - output matrix name
isLeftTransposed - if true, left transposed
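The two tsmm variants can be illustrated with a small CPU sketch. Like a syrk-style routine, it computes only one triangle of the symmetric result and mirrors it. It assumes that isLeftTransposed = true selects A' %*% A (a cols x cols result) and false selects A %*% A' (rows x rows); the class name is hypothetical and no cuBLAS call is involved.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class TsmmSketch {
    // leftTransposed = true: t(A) %*% A; false: A %*% t(A).
    static double[][] tsmm(double[][] a, boolean leftTransposed) {
        int rows = a.length;
        int cols = a[0].length;
        int n = leftTransposed ? cols : rows; // output is n x n
        int k = leftTransposed ? rows : cols; // length of each dot product
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) { // upper triangle only
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    sum += leftTransposed ? a[p][i] * a[p][j] : a[i][p] * a[j][p];
                }
                out[i][j] = sum;
                out[j][i] = sum; // mirror into the lower triangle
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        System.out.println(Arrays.deepToString(tsmm(a, true)));  // prints [[35.0, 44.0], [44.0, 56.0]]
        System.out.println(Arrays.deepToString(tsmm(a, false)));
    }
}
```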
-
unaryAggregate
public static void unaryAggregate(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String output, AggregateUnaryOperator op)
Entry point to perform unary aggregate operations on the GPU. The execution context object is used to allocate memory for the GPU.
Parameters:
ec - instance of ExecutionContext, from which the output variable will be allocated
gCtx - a valid GPUContext
instName - name of the invoking instruction, used to record Statistics
in1 - input matrix
output - output matrix/scalar name
op - instance of AggregateUnaryOperator, which encapsulates the direction of reduction/aggregation and the reduction operation
-
matrixScalarRelational
public static void matrixScalarRelational(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName, ScalarOperator op)
Entry point to perform the elementwise matrix-scalar relational operation specified by op.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in - input matrix
outputName - output matrix name
op - scalar operator
-
matrixScalarArithmetic
public static void matrixScalarArithmetic(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName, boolean isInputTransposed, ScalarOperator op)
Entry point to perform the elementwise matrix-scalar arithmetic operation specified by op.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in - input matrix
outputName - output matrix name
isInputTransposed - true if input transposed
op - scalar operator
-
matrixMatrixRelational
public static void matrixMatrixRelational(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName, BinaryOperator op)
Performs the elementwise relational operation specified by op on two input matrices in1 and in2.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix 1
in2 - input matrix 2
outputName - output matrix name
op - binary operator
-
matrixMatrixArithmetic
public static void matrixMatrixArithmetic(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName, boolean isLeftTransposed, boolean isRightTransposed, BinaryOperator op)
Performs the elementwise arithmetic operation specified by op on two input matrices in1 and in2.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix 1
in2 - input matrix 2
outputName - output matrix name
isLeftTransposed - true if left-transposed
isRightTransposed - true if right-transposed
op - binary operator
-
matrixScalarOp
public static void matrixScalarOp(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName, boolean isInputTransposed, ScalarOperator op)
Utility to run the matrix-scalar operation kernel.
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
ec - execution context
in - input matrix
outputName - output variable name
isInputTransposed - true if input is transposed
op - operator
-
deviceCopy
public static void deviceCopy(String instName, jcuda.Pointer src, jcuda.Pointer dest, int rlen, int clen)
Performs a deep copy of the input device double pointer corresponding to a matrix.
Parameters:
instName - the invoking instruction's name, used to record Statistics
src - source matrix
dest - destination matrix
rlen - number of rows
clen - number of columns
-
denseTranspose
public static void denseTranspose(ExecutionContext ec, GPUContext gCtx, String instName, jcuda.Pointer A, jcuda.Pointer C, long numRowsA, long numColsA) throws DMLRuntimeException
Computes C = t(A).
Parameters:
ec - execution context
gCtx - gpu context
instName - name of the instruction
A - pointer to the input matrix
C - pointer to the output matrix
numRowsA - number of rows of the input matrix
numColsA - number of columns of the input matrix
Throws:
DMLRuntimeException - if error
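For reference, C = t(A) on a flat dense buffer amounts to the index mapping below. The sketch assumes row-major storage, which may differ from the layout the GPU kernel expects, and uses a hypothetical class name with plain arrays standing in for device pointers.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class DenseTransposeSketch {
    // a holds numRowsA x numColsA values in row-major order;
    // the result holds numColsA x numRowsA values in row-major order.
    static double[] transpose(double[] a, int numRowsA, int numColsA) {
        if (a.length != numRowsA * numColsA) {
            throw new IllegalArgumentException("buffer length must equal numRowsA * numColsA");
        }
        double[] c = new double[a.length];
        for (int i = 0; i < numRowsA; i++) {
            for (int j = 0; j < numColsA; j++) {
                c[j * numRowsA + i] = a[i * numColsA + j];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5, 6}; // 2 x 3
        System.out.println(Arrays.toString(transpose(a, 2, 3))); // prints [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
    }
}
```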
-
transpose
public static void transpose(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName)
Transposes the input matrix using cublasDgeam.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in - input matrix
outputName - output matrix name
-
toInt
public static int toInt(long num)
-
sliceOperations
public static void sliceOperations(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, IndexRange ixrange, String outputName)
Performs a rightIndex operation for the given lower and upper bounds in the row and column dimensions.
Parameters:
ec - current execution context
gCtx - current gpu context
instName - name of the instruction for maintaining statistics
in1 - input matrix object
ixrange - index range (0-based)
outputName - name of the output matrix
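A rightIndex (slice) operation can be pictured with the CPU sketch below. The 0-based indexing comes from the parameter description above; treating both the lower and upper bounds as inclusive is an assumption, as are the class name and the four-integer parameter list standing in for IndexRange.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class SliceSketch {
    // Copies rows rl..ru and columns cl..cu (0-based; inclusive bounds are an assumption).
    static double[][] slice(double[][] in, int rl, int ru, int cl, int cu) {
        if (rl < 0 || ru >= in.length || rl > ru || cl < 0 || cu >= in[0].length || cl > cu) {
            throw new IndexOutOfBoundsException("invalid index range");
        }
        double[][] out = new double[ru - rl + 1][];
        for (int i = rl; i <= ru; i++) {
            out[i - rl] = Arrays.copyOfRange(in[i], cl, cu + 1); // copyOfRange's end is exclusive
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] in = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        System.out.println(Arrays.deepToString(slice(in, 1, 2, 0, 1))); // prints [[4.0, 5.0], [7.0, 8.0]]
    }
}
```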
-
cbind
public static void cbind(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName)
-
rbind
public static void rbind(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName)
-
exp
public static void exp(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "exp" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
sqrt
public static void sqrt(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sqrt" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
round
public static void round(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "round" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
abs
public static void abs(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "abs" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
log
public static void log(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "log" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
floor
public static void floor(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "floor" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
ceil
public static void ceil(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "ceil" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
sin
public static void sin(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sin" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
cos
public static void cos(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "cos" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
tan
public static void tan(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "tan" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
sinh
public static void sinh(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sinh" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
cosh
public static void cosh(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "cosh" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
tanh
public static void tanh(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "tanh" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
asin
public static void asin(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "asin" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
acos
public static void acos(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "acos" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
atan
public static void atan(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs an "atan" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
sign
public static void sign(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sign" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
-
sigmoid
public static void sigmoid(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a "sigmoid" operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix
outputName - output matrix name
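The unary methods from exp through sigmoid all describe the same pattern: one function applied independently to every cell. As a CPU illustration using sigmoid, the sketch below assumes the usual definition 1 / (1 + exp(-x)); the class name is hypothetical and the code does not reflect how the GPU kernel is launched.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class SigmoidSketch {
    // Applies sigmoid(x) = 1 / (1 + exp(-x)) to each element independently.
    static double[] sigmoid(double[] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = 1.0 / (1.0 + Math.exp(-x[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sigmoid(new double[] {0.0}))); // prints [0.5]
    }
}
```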
-
cumulativeScan
public static void cumulativeScan(ExecutionContext ec, GPUContext gCtx, String instName, String kernelFunction, MatrixObject in, String outputName)
Cumulative scan.
Parameters:
ec - valid execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
kernelFunction - the name of the CUDA kernel to call
in - input matrix
outputName - output matrix name
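The description does not say which scans kernelFunction can select. As one plausible instance, the sketch below computes a column-wise cumulative sum on the CPU, the behaviour of DML's cumsum. The choice of sum as the scan operator and the class name are assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class CumulativeScanSketch {
    // out[i][j] = in[0][j] + in[1][j] + ... + in[i][j] (running sum down each column).
    static double[][] cumsum(double[][] in) {
        int rows = in.length;
        int cols = in[0].length;
        double[][] out = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            double running = 0.0;
            for (int i = 0; i < rows; i++) {
                running += in[i][j];
                out[i][j] = running;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] in = {{1, 2}, {3, 4}, {5, 6}};
        System.out.println(Arrays.deepToString(cumsum(in))); // prints [[1.0, 2.0], [4.0, 6.0], [9.0, 12.0]]
    }
}
```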
-
cumulativeSumProduct
public static void cumulativeSumProduct(ExecutionContext ec, GPUContext gCtx, String instName, String kernelFunction, MatrixObject in, String outputName)
Cumulative sum-product kernel cascade invocation.
Parameters:
ec - valid execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
kernelFunction - the name of the CUDA kernel to call
in - input matrix
outputName - output matrix name
-
axpy
public static void axpy(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName, double constant)
Performs a daxpy operation.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix 1
in2 - input matrix 2
outputName - output matrix name
constant - the scalar constant (alpha in daxpy)
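BLAS daxpy computes y = alpha * x + y. Mapped onto this signature, one plausible reading is out = in1 + constant * in2. The assignment of operand roles to in1 and in2 is an assumption, as is the hypothetical class name; the sketch runs on plain arrays rather than device memory.

```java
import java.util.Arrays;

// Hypothetical CPU reference; not part of LibMatrixCUDA.
class AxpySketch {
    // out[i] = in1[i] + constant * in2[i] (operand roles are an assumption).
    static double[] axpy(double[] in1, double[] in2, double constant) {
        if (in1.length != in2.length) {
            throw new IllegalArgumentException("inputs must have the same length");
        }
        double[] out = new double[in1.length];
        for (int i = 0; i < in1.length; i++) {
            out[i] = in1[i] + constant * in2[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] out = axpy(new double[] {1, 2, 3}, new double[] {10, 20, 30}, 0.5);
        System.out.println(Arrays.toString(out)); // prints [6.0, 12.0, 18.0]
    }
}
```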
-
solve
public static void solve(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, MatrixObject in2, String outputName)
Implements the "solve" function for SystemDS: Ax = B (A is of size m*n, B is of size m*1, x is of size n*1).
Parameters:
ec - a valid ExecutionContext
gCtx - a valid GPUContext
instName - the invoking instruction's name, used to record Statistics
in1 - input matrix A
in2 - input matrix B
outputName - name of the output matrix
-
getDenseMatrixOutputForGPUInstruction
public static MatrixObject getDenseMatrixOutputForGPUInstruction(ExecutionContext ec, String instName, String name, long numRows, long numCols)
Helper method to get the output block (allocated on the GPU). Also records performance information into Statistics.
Parameters:
ec - active ExecutionContext
instName - the invoking instruction's name, used to record Statistics
name - name of input matrix (that the ExecutionContext is aware of)
numRows - number of rows of output matrix object
numCols - number of columns of output matrix object
Returns:
the matrix object
-
getDenseMatrixOutputForGPUInstruction
public static MatrixObject getDenseMatrixOutputForGPUInstruction(ExecutionContext ec, String instName, String name, long numRows, long numCols, boolean initialize)
-
computeNNZ
public static int computeNNZ(GPUContext gCtx, jcuda.Pointer densePtr, int length)
Utility to compute the number of non-zeroes on the GPU.
Parameters:
gCtx - the associated GPUContext
densePtr - device pointer to the dense matrix
length - length of the dense pointer
Returns:
the number of non-zeroes
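What the utility returns can be stated with a short CPU sketch: the count of entries in the first length cells of a dense buffer that are not equal to zero. The class name is hypothetical and a plain array stands in for the device pointer; the GPU version presumably uses a parallel reduction rather than this loop.

```java
// Hypothetical CPU reference; not part of LibMatrixCUDA.
class NnzSketch {
    // Counts the entries among the first `length` cells that are not equal to zero.
    static int countNonZeros(double[] dense, int length) {
        if (length < 0 || length > dense.length) {
            throw new IllegalArgumentException("length out of range");
        }
        int nnz = 0;
        for (int i = 0; i < length; i++) {
            if (dense[i] != 0.0) {
                nnz++;
            }
        }
        return nnz;
    }

    public static void main(String[] args) {
        System.out.println(countNonZeros(new double[] {0.0, 1.5, 0.0, -2.0, 0.0}, 5)); // prints 2
    }
}
```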
-
-