Class GPUContext
- java.lang.Object
  - org.apache.sysds.runtime.instructions.gpu.context.GPUContext
public class GPUContext extends Object
Represents one context per GPU accessible through the same JVM. Each context holds its own library handles (cublas, cusparse, cudnn, and others), which are separate for each GPU.
-
-
Method Summary
Entries are listed as: Modifier and Type Method: Description
- jcuda.Pointer allocate(String instructionName, long size): Default behavior for GPU memory allocation (initialized to zero).
- jcuda.Pointer allocate(String instructionName, long size, boolean initialize): Invokes the memory manager's malloc method.
- void clearMemory(): Clears all memory used by this GPUContext.
- void clearTemporaryMemory()
- GPUObject createGPUObject(MatrixObject mo): Instantiates a new GPUObject initialized with the given MatrixObject.
- void cudaFreeHelper(String instructionName, jcuda.Pointer toFree, boolean eager): Does cudaFree calls, lazily.
- static int cudaGetDevice(): Returns which device is currently being used.
- void destroy(): Destroys this GPUContext object.
- void ensureComputeCapability(): Makes sure that the GPU that SystemDS is trying to use has the minimum compute capability needed.
- long getAvailableMemory(): Gets the available memory on the GPU that SystemDS can use.
- jcuda.jcublas.cublasHandle getCublasHandle(): Returns cublasHandle for BLAS operations on the GPU.
- jcuda.jcudnn.cudnnHandle getCudnnHandle(): Returns the cudnnHandle for Deep Neural Network operations on the GPU.
- jcuda.jcusolver.cusolverDnHandle getCusolverDnHandle(): Returns cusolverDnHandle for invoking the solve() function on dense matrices on the GPU.
- jcuda.jcusparse.cusparseHandle getCusparseHandle(): Returns cusparseHandle for certain sparse BLAS operations on the GPU.
- int getDeviceNum(): Returns which device is assigned to this GPUContext instance.
- jcuda.runtime.cudaDeviceProp getGPUProperties(): Gets the device properties for the active GPU (set with cudaSetDevice()).
- JCudaKernels getKernels(): Returns the utility class used to launch custom CUDA kernels, specific to the active GPU for this GPUContext.
- int getMaxBlocks(): Gets the maximum number of blocks supported by the active CUDA device.
- long getMaxSharedMemory(): Gets the shared memory per block supported by the active CUDA device.
- int getMaxThreadsPerBlock(): Gets the maximum number of threads per block for the active GPU.
- GPUMemoryManager getMemoryManager()
- int getWarpSize(): Gets the warp size supported by the active CUDA device.
- void initializeThread(): Sets the device for the calling thread.
- void printMemoryInfo(String opcode): Prints memory usage information.
- GPUObject shallowCopyGPUObject(GPUObject source, MatrixObject mo): Shallow-copies the given source GPUObject to a new GPUObject and assigns that to the given MatrixObject.
- String toString()
-
-
-
Method Detail
-
getMemoryManager
public GPUMemoryManager getMemoryManager()
-
cudaGetDevice
public static int cudaGetDevice()
Returns which device is currently being used.
Returns:
- the current device for the calling host thread
-
printMemoryInfo
public void printMemoryInfo(String opcode)
Prints memory usage information.
Parameters:
- opcode - opcode of the caller
-
getDeviceNum
public int getDeviceNum()
Returns which device is assigned to this GPUContext instance.
Returns:
- active device assigned to this GPUContext instance
-
initializeThread
public void initializeThread()
Sets the device for the calling thread. This method must be called after ExecutionContext.getGPUContext(int).
In a multithreaded environment such as parfor, it must be called from within the appropriate thread, that is, the thread that will use this GPUContext.
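The reason for the per-thread call is that the CUDA runtime tracks the current device separately for each host thread, and a new thread starts on device 0. The following is a minimal sketch of that behavior using a plain ThreadLocal. ThreadDeviceModel and its methods are hypothetical stand-ins for illustration; no JCuda or SystemDS calls are made.

```java
// Toy model of per-thread device selection; hypothetical names, no JCuda calls.
final class ThreadDeviceModel {
    // Every host thread starts on device 0, as with the CUDA runtime default.
    private static final ThreadLocal<Integer> CURRENT = ThreadLocal.withInitial(() -> 0);

    // Stand-in for the documented effect of initializeThread(): bind the calling thread.
    static void initializeThread(int deviceNum) {
        CURRENT.set(deviceNum);
    }

    static int currentDevice() {
        return CURRENT.get();
    }

    // Starts a fresh worker thread and reports which device that worker ends up on.
    static int deviceSeenByWorker(int deviceNum, boolean initialize) {
        int[] seen = new int[1];
        Thread worker = new Thread(() -> {
            if (initialize) {
                initializeThread(deviceNum);
            }
            seen[0] = currentDevice();
        });
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen[0];
    }
}
```

In this model, a worker that skips the initialization step stays on device 0 even if the parent thread selected device 1, which is the situation the note about parfor guards against.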
-
allocate
public jcuda.Pointer allocate(String instructionName, long size, boolean initialize)
Invokes the memory manager's malloc method.
Parameters:
- instructionName - name of the instruction for which to record per-instruction performance statistics; null if you don't want to record
- size - size of data (in bytes) to allocate
- initialize - whether cudaMemset() should be called
Returns:
- jcuda pointer
-
allocate
public jcuda.Pointer allocate(String instructionName, long size)
Default behavior for GPU memory allocation (initialized to zero).
Parameters:
- instructionName - name of the instruction calling allocate
- size - size in bytes
Returns:
- jcuda pointer
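Because size is given in bytes rather than in elements, callers have to convert matrix dimensions first. A minimal sketch of that conversion follows, assuming dense storage with 8-byte double elements. AllocationSize and denseDoubleBytes are hypothetical helpers and not part of SystemDS.

```java
// Hypothetical helper; assumes dense storage with 8-byte double elements.
final class AllocationSize {
    static long denseDoubleBytes(long rows, long cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("dimensions must be non-negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(Math.multiplyExact(rows, cols), (long) Double.BYTES);
    }
}
```

The result is the kind of value that would be passed as the size argument of allocate. A 1000 x 1000 matrix, for example, needs 8,000,000 bytes under these assumptions.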
-
cudaFreeHelper
public void cudaFreeHelper(String instructionName, jcuda.Pointer toFree, boolean eager)
Performs cudaFree calls, lazily unless eager is true.
Parameters:
- instructionName - name of the instruction for which to record per-instruction free time; null if you do not want to record
- toFree - Pointer instance to be freed
- eager - true if the free is to be done eagerly
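To illustrate the difference between the two modes, here is a toy model in which an eager free releases memory immediately while a lazy free retains the buffer so that a later allocation of the same size can reuse it. This is an assumption about what "lazily" means here, made for illustration. LazyFreeModel is hypothetical and is not the SystemDS memory manager.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of lazy vs. eager freeing; hypothetical, not the SystemDS memory manager.
final class LazyFreeModel {
    // size in bytes -> number of freed-but-retained buffers of that size
    private final Map<Long, Integer> retained = new HashMap<>();
    private long releasedBytes = 0;

    // Eager: give the memory back immediately. Lazy: keep the buffer for reuse.
    void free(long sizeBytes, boolean eager) {
        if (eager) {
            releasedBytes += sizeBytes;
        } else {
            retained.merge(sizeBytes, 1, Integer::sum);
        }
    }

    // Returns true if a retained buffer of exactly this size was reused.
    boolean allocate(long sizeBytes) {
        Integer count = retained.get(sizeBytes);
        if (count == null) {
            return false; // nothing retained: a fresh device allocation would be needed
        }
        if (count == 1) {
            retained.remove(sizeBytes);
        } else {
            retained.put(sizeBytes, count - 1);
        }
        return true;
    }

    // Releases everything still retained, as a full cleanup would.
    void clear() {
        for (Map.Entry<Long, Integer> e : retained.entrySet()) {
            releasedBytes += e.getKey() * e.getValue();
        }
        retained.clear();
    }

    long releasedBytes() {
        return releasedBytes;
    }
}
```

The trade-off in this model is that lazy frees avoid repeated device allocations for same-sized buffers, at the cost of holding memory until it is reused or cleared.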
-
getAvailableMemory
public long getAvailableMemory()
Gets the available memory on the GPU that SystemDS can use.
Returns:
- the available memory in bytes
-
ensureComputeCapability
public void ensureComputeCapability()
Makes sure that the GPU that SystemDS is trying to use has the minimum compute capability needed.
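A compute capability is a major.minor version pair, so a check of this kind amounts to an ordered comparison of two pairs. The sketch below shows one way to write it. ComputeCapabilityCheck is hypothetical, and the required version is passed in as a parameter because this page does not state the minimum that SystemDS enforces.

```java
// Hypothetical sketch of a major.minor compute capability check.
final class ComputeCapabilityCheck {
    // True if (major, minor) is at least (reqMajor, reqMinor) in version order.
    static boolean meets(int major, int minor, int reqMajor, int reqMinor) {
        return major > reqMajor || (major == reqMajor && minor >= reqMinor);
    }

    // Throws if the device does not meet the caller-supplied minimum.
    static void ensure(int major, int minor, int reqMajor, int reqMinor) {
        if (!meets(major, minor, reqMajor, reqMinor)) {
            throw new IllegalStateException("compute capability " + major + "." + minor
                + " is below the required " + reqMajor + "." + reqMinor);
        }
    }
}
```

With the real API, the device's values would presumably come from the cudaDeviceProp returned by getGPUProperties().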
-
createGPUObject
public GPUObject createGPUObject(MatrixObject mo)
Instantiates a new GPUObject initialized with the given MatrixObject.
Parameters:
- mo - a MatrixObject that represents a matrix
Returns:
- a new GPUObject instance
-
shallowCopyGPUObject
public GPUObject shallowCopyGPUObject(GPUObject source, MatrixObject mo)
Shallow-copies the given source GPUObject to a new GPUObject and assigns that to the given MatrixObject. This copy does not memcopy the device memory.
Parameters:
- source - a GPUObject which is the source of the copy
- mo - a MatrixObject to associate with the new GPUObject
Returns:
- a new GPUObject instance
-
getGPUProperties
public jcuda.runtime.cudaDeviceProp getGPUProperties()
Gets the device properties for the active GPU (set with cudaSetDevice()).
Returns:
- the device properties
-
getMaxThreadsPerBlock
public int getMaxThreadsPerBlock()
Gets the maximum number of threads per block for the active GPU.
Returns:
- the maximum number of threads per block
-
getMaxBlocks
public int getMaxBlocks()
Gets the maximum number of blocks supported by the active CUDA device.
Returns:
- the maximum number of blocks supported
-
getMaxSharedMemory
public long getMaxSharedMemory()
Gets the shared memory per block supported by the active CUDA device.
Returns:
- the shared memory per block
-
getWarpSize
public int getWarpSize()
Gets the warp size supported by the active CUDA device.
Returns:
- the warp size
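Device limits such as these are typically used to size kernel launches. The following is a minimal sketch, assuming a one-dimensional launch with one thread per element, of how the three values could be combined. LaunchConfigSketch is hypothetical, and the limits are passed in as plain integers rather than read from a GPUContext.

```java
// Hypothetical 1-D launch sizing from device limits; not SystemDS code.
final class LaunchConfigSketch {
    // Returns {gridDimX, blockDimX} for one thread per element.
    static int[] forVector(long numElements, int maxThreadsPerBlock, int maxBlocks, int warpSize) {
        if (numElements <= 0 || maxThreadsPerBlock <= 0 || maxBlocks <= 0 || warpSize <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Round small inputs up to a whole number of warps, capped at the per-block limit.
        long roundedToWarp = ((numElements + warpSize - 1) / warpSize) * warpSize;
        int threads = (int) Math.min(maxThreadsPerBlock, roundedToWarp);
        // Ceiling division: enough blocks to cover every element.
        long blocks = (numElements + threads - 1) / threads;
        if (blocks > maxBlocks) {
            throw new IllegalStateException("needs " + blocks + " blocks, device allows " + maxBlocks);
        }
        return new int[] { (int) blocks, threads };
    }
}
```

For 1,000,000 elements with limits of 1024 threads per block, 65535 blocks, and a warp size of 32, this yields 977 blocks of 1024 threads. The kernel would still need a bounds check, since the last block covers more indices than there are elements.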
-
getCudnnHandle
public jcuda.jcudnn.cudnnHandle getCudnnHandle()
Returns the cudnnHandle for Deep Neural Network operations on the GPU.
Returns:
- cudnnHandle for current thread
-
getCublasHandle
public jcuda.jcublas.cublasHandle getCublasHandle()
Returns cublasHandle for BLAS operations on the GPU.
Returns:
- cublasHandle for current thread
-
getCusparseHandle
public jcuda.jcusparse.cusparseHandle getCusparseHandle()
Returns cusparseHandle for certain sparse BLAS operations on the GPU.
Returns:
- cusparseHandle for current thread
-
getCusolverDnHandle
public jcuda.jcusolver.cusolverDnHandle getCusolverDnHandle()
Returns cusolverDnHandle for invoking the solve() function on dense matrices on the GPU.
Returns:
- cusolverDnHandle for current thread
-
getKernels
public JCudaKernels getKernels()
Returns the utility class used to launch custom CUDA kernels, specific to the active GPU for this GPUContext.
Returns:
- JCudaKernels for current thread
-
destroy
public void destroy()
Destroys this GPUContext object.
-
clearMemory
public void clearMemory()
Clears all memory used by this GPUContext. Before invoking this method, make sure that no temporary memory is still in use. Memory that is used between MLContext invocations is pointed to by a GPUObject instance, which is part of a MatrixObject. Cleaning up that MatrixObject instance frees the memory associated with that block on the GPU.
-
clearTemporaryMemory
public void clearTemporaryMemory()
-
-