Using SystemML with GPU
User Guide
To use SystemML on GPUs, please ensure that CUDA 9 and cuDNN 7 are installed on your system.
Python users
Please install SystemML using pip:
- For the released version: pip install systemml
- For the bleeding-edge version: pip install https://sparktc.ibmcloud.com/repo/latest/systemml-1.2.0-SNAPSHOT-python.tar.gz
Then you can use the setGPU(True) method of the MLContext and MLLearn APIs to enable GPU usage.
python
from systemml.mllearn import Caffe2DML
lenet = Caffe2DML(spark, solver='lenet_solver.proto', input_shape=(1, 28, 28))
lenet.setGPU(True)
To skip memory checking and force all GPU-enabled operations onto the GPU, call the setForceGPU(True) method after the setGPU(True) method.
python
from systemml.mllearn import Caffe2DML
lenet = Caffe2DML(spark, solver='lenet_solver.proto', input_shape=(1, 28, 28))
lenet.setGPU(True).setForceGPU(True)
Command-line users
To enable the GPU backend from the command line, add systemml-1.*-extra.jar to the classpath and pass the -gpu flag.
spark-submit --jars systemml-1.*-extra.jar SystemML.jar -f myDML.dml -gpu
To skip memory checking and force all GPU-enabled operations onto the GPU, pass the force option to the -gpu flag.
spark-submit --jars systemml-1.*-extra.jar SystemML.jar -f myDML.dml -gpu force
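When the job is launched from a Python script rather than typed by hand, the two command lines above can be assembled programmatically. The following is a minimal sketch; spark_submit_args is a hypothetical helper, and it only reproduces the flags shown in this guide.

```python
def spark_submit_args(script, extra_jar="systemml-1.*-extra.jar",
                      main_jar="SystemML.jar", gpu=True, force=False):
    """Build the spark-submit argument list used in this guide."""
    args = ["spark-submit", "--jars", extra_jar, main_jar, "-f", script]
    if gpu:
        args.append("-gpu")
        if force:
            # "force" is an option of the -gpu flag, so it must directly follow it.
            args.append("force")
    return args


print(" ".join(spark_submit_args("myDML.dml", force=True)))
```

If the list is handed to subprocess.run without a shell, the wildcard in the jar name is not expanded, so pass the actual file name of the extra jar in that case.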
Scala users
To enable the GPU backend, add systemml-1.*-extra.jar to the classpath and use the setGPU(true) method of the MLContext API.
spark-shell --jars systemml-1.*-extra.jar,SystemML.jar
Troubleshooting guide
- If you have an older gcc (< 5.0) and get the error
libstdc++.so.6: version CXXABI_1.3.8 not found
please upgrade to gcc 5 or later. On CentOS 5, you may have to compile gcc from source:
sudo yum install libmpc-devel mpfr-devel gmp-devel zlib-devel*
curl ftp://ftp.gnu.org/pub/gnu/gcc/gcc-5.3.0/gcc-5.3.0.tar.bz2 -O
tar xvfj gcc-5.3.0.tar.bz2
cd gcc-5.3.0
./configure --with-system-zlib --disable-multilib --enable-languages=c,c++
num_cores=$(grep -c ^processor /proc/cpuinfo)
make -j $num_cores
sudo make install
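Before rebuilding gcc, it may help to confirm that the installed compiler really is older than 5.0. The sketch below parses the output of gcc -dumpversion; the function names are hypothetical, and the check only looks at the major version number.

```python
import subprocess


def needs_gcc_upgrade(version_text, minimum_major=5):
    """Return True if a `gcc -dumpversion` string is older than minimum_major."""
    # -dumpversion prints e.g. "4.8.5" or just "9"; only the major part matters here.
    major = int(version_text.strip().split(".")[0])
    return major < minimum_major


def installed_gcc_needs_upgrade():
    """Query the gcc found on PATH (raises if gcc is not installed)."""
    result = subprocess.run(["gcc", "-dumpversion"],
                            capture_output=True, text=True, check=True)
    return needs_gcc_upgrade(result.stdout)


print(needs_gcc_upgrade("4.8.5"))
print(needs_gcc_upgrade("5.3.0"))
```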
Advanced Configuration
Using single precision
By default, SystemML stores its matrices in GPU memory in double precision. To use single precision, set the configuration property ‘sysml.floating.point.precision’ to ‘single’. However, with the exception of BLAS operations, SystemML always performs all CPU operations in double precision.
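From Python, the property could be set as sketched below. This assumes that the MLContext or MLLearn object exposes a setConfigProperty(name, value) method; please verify this against the SystemML version you have installed. use_single_precision and _RecordingTarget are hypothetical names, and the recording class only stands in for a real estimator so that the snippet runs without SystemML.

```python
def use_single_precision(target):
    """Ask SystemML to keep GPU matrices in single precision.

    Assumption: `target` offers setConfigProperty(name, value).
    """
    target.setConfigProperty("sysml.floating.point.precision", "single")
    return target


class _RecordingTarget:
    """Hypothetical stand-in for an MLContext or MLLearn estimator."""

    def __init__(self):
        self.properties = {}

    def setConfigProperty(self, name, value):
        self.properties[name] = value
        return self


demo = use_single_precision(_RecordingTarget())
print(demo.properties)
```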
Training very deep networks
Shadow buffer
Training a very deep network with double precision requires no additional configuration.
When training a very deep network with single precision, however, the user can speed up eviction by
using a shadow buffer. The fraction of the driver memory allocated to the shadow buffer is
set with the configuration property ‘sysml.gpu.eviction.shadow.bufferSize’.
In the current version, SystemML does not guard the shadow buffer,
so it can lead to an out-of-memory (OOM) error if the network is both deep and wide.
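Because the property is a fraction rather than an absolute size, it can be useful to work out how much memory a given value reserves. The sketch below does that arithmetic; shadow_buffer_bytes is a hypothetical helper, and it assumes that the fraction lies between 0 and 1 and is applied to the total driver memory, which this guide does not spell out.

```python
def shadow_buffer_bytes(driver_memory_bytes, fraction):
    """Estimate the shadow buffer size for a given driver memory and fraction."""
    # Assumption: sysml.gpu.eviction.shadow.bufferSize is a fraction in [0, 1].
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return int(driver_memory_bytes * fraction)


GIB = 1024 ** 3
# A 20 GiB driver with a fraction of 0.25 would reserve 5 GiB.
print(shadow_buffer_bytes(20 * GIB, 0.25) / GIB)
```

Since the buffer is not guarded, a conservative fraction leaves more headroom for the rest of the driver.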
Unified memory allocator
By default, SystemML uses CUDA’s memory allocator and performs on-demand eviction using the eviction policy set by the configuration property ‘sysml.gpu.eviction.policy’. To use CUDA’s unified memory allocator that performs page-level eviction instead, please set the configuration property ‘sysml.gpu.memory.allocator’ to ‘unified_memory’.
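Command-line users typically supply such properties through a configuration file. The sketch below generates one; it assumes the layout of SystemML-config.xml.template, in which each property is a child element of a root element, and that the resulting file is passed to SystemML with the -config option. Both points should be checked against your SystemML version. build_config_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_config_xml(properties):
    """Serialize configuration properties as <root><name>value</name>...</root>."""
    root = ET.Element("root")  # assumed top-level element of the config file
    for name, value in properties.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


config_xml = build_config_xml({
    "sysml.gpu.memory.allocator": "unified_memory",
})
print(config_xml)
```

The same helper accepts the other properties described in this section, such as ‘sysml.floating.point.precision’.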