Class SparkExecutionContext

java.lang.Object
  org.apache.sysds.runtime.controlprogram.context.ExecutionContext
    org.apache.sysds.runtime.controlprogram.context.SparkExecutionContext

public class SparkExecutionContext extends ExecutionContext
Nested Class Summary

- static class SparkExecutionContext.SparkClusterConfig
  Captures relevant spark cluster configuration properties, e.g., memory budgets and degree of parallelism.
Field Summary

- static boolean FAIR_SCHEDULER_MODE
Method Summary

- void addLineage(String varParent, String varChild, boolean broadcast)
- void addLineageBroadcast(String varParent, String varChild)
  Adds a child broadcast object to the lineage of a parent rdd.
- void addLineageRDD(String varParent, String varChild)
  Adds a child rdd object to the lineage of a parent rdd.
- org.apache.spark.broadcast.Broadcast<CacheBlock> broadcastVariable(CacheableData<CacheBlock> cd)
- void cacheMatrixObject(String var)
- static void cleanupBroadcastVariable(org.apache.spark.broadcast.Broadcast<?> bvar)
  This call destroys a broadcast variable at all executors and the driver.
- void cleanupCacheableData(CacheableData<?> mo)
- static void cleanupRDDVariable(org.apache.spark.api.java.JavaPairRDD<?,?> rvar)
  This call removes an rdd variable from executor memory and disk if required.
- void cleanupThreadLocalSchedulerPool(int pool)
- void close()
- static org.apache.spark.SparkConf createSystemDSSparkConf()
  Sets up a SystemDS-preferred Spark configuration based on the implicit default configuration (as passed via configurations from outside).
- org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getBinaryMatrixBlockRDDHandleForVariable(String varname)
  Spark instructions should call this for all matrix inputs except broadcast variables.
- org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getBinaryMatrixBlockRDDHandleForVariable(String varname, int numParts, boolean inclEmpty)
- org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> getBinaryTensorBlockRDDHandleForVariable(String varname)
  Spark instructions should call this for all tensor inputs except broadcast variables.
- org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> getBinaryTensorBlockRDDHandleForVariable(String varname, int numParts, boolean inclEmpty)
- PartitionedBroadcast<FrameBlock> getBroadcastForFrameVariable(String varname)
- PartitionedBroadcast<MatrixBlock> getBroadcastForMatrixObject(MatrixObject mo)
- PartitionedBroadcast<TensorBlock> getBroadcastForTensorObject(TensorObject to)
- PartitionedBroadcast<TensorBlock> getBroadcastForTensorVariable(String varname)
- PartitionedBroadcast<MatrixBlock> getBroadcastForVariable(String varname)
- static double getBroadcastMemoryBudget()
  Obtains the available memory budget for broadcast variables in bytes.
- static double getDataMemoryBudget(boolean min, boolean refresh)
  Obtains the available memory budget for data storage in bytes.
- static int getDefaultParallelism(boolean refresh)
  Obtains the default degree of parallelism (cores in the cluster).
- org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> getFrameBinaryBlockRDDHandleForVariable(String varname)
  Spark instructions should call this for all frame inputs except broadcast variables.
- static int getNumExecutors()
  Obtains the number of executors in the cluster (excluding the driver).
- org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForFrameObject(FrameObject fo, Types.FileFormat fmt)
  FIXME: currently this implementation assumes matrix representations but frame signature in order to support the old transform implementation.
- org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForMatrixObject(MatrixObject mo, Types.FileFormat fmt)
- org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForMatrixObject(MatrixObject mo, Types.FileFormat fmt, int numParts, boolean inclEmpty)
- org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForTensorObject(TensorObject to, Types.FileFormat fmt, int numParts, boolean inclEmpty)
- org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForVariable(String varname, Types.FileFormat fmt, int numParts, boolean inclEmpty)
- static SparkExecutionContext.SparkClusterConfig getSparkClusterConfig()
  Obtains the lazily analyzed spark cluster configuration.
- org.apache.spark.api.java.JavaSparkContext getSparkContext()
  Returns the used singleton spark context.
- static org.apache.spark.api.java.JavaSparkContext getSparkContextStatic()
- static boolean isLazySparkContextCreation()
- static boolean isLocalMaster()
- boolean isRDDCached(int rddID)
- static boolean isSparkContextCreated()
  Indicates if the spark context has been created or has been passed in from outside.
- void repartitionAndCacheMatrixObject(String var)
- static void resetSparkContextStatic()
- void setBroadcastHandle(MatrixObject mo)
- void setRDDHandleForVariable(String varname, org.apache.spark.api.java.JavaPairRDD<?,?> rdd)
  Keeps the output RDD of Spark RDD operations as metadata of matrix/frame objects in the symbol table.
- int setThreadLocalSchedulerPool()
- static FrameBlock toFrameBlock(org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> rdd, Types.ValueType[] schema, int rlen, int clen)
- static FrameBlock toFrameBlock(RDDObject rdd, Types.ValueType[] schema, int rlen, int clen)
- static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> toFrameJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, FrameBlock src)
- static MatrixBlock toMatrixBlock(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> rdd, int rlen, int clen, int blen, long nnz)
  Utility method for creating a single matrix block out of a binary block RDD.
- static MatrixBlock toMatrixBlock(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> rdd, int rlen, int clen, long nnz)
  Utility method for creating a single matrix block out of a binary cell RDD.
- static MatrixBlock toMatrixBlock(RDDObject rdd, int rlen, int clen, int blen, long nnz)
  This method is a generic abstraction for calls from the buffer pool.
- static MatrixBlock toMatrixBlock(RDDObject rdd, int rlen, int clen, long nnz)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> toMatrixJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, MatrixBlock src, int blen)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> toMatrixJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, MatrixBlock src, int blen, int numParts, boolean inclEmpty)
- static PartitionedBlock<MatrixBlock> toPartitionedMatrixBlock(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> rdd, int rlen, int clen, int blen, long nnz)
- static TensorBlock toTensorBlock(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> rdd, DataCharacteristics dc)
- static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> toTensorJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, TensorBlock src, int blen)
- static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> toTensorJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, TensorBlock src, int blen, int numParts, boolean inclEmpty)
- static void writeFrameRDDtoHDFS(RDDObject rdd, String path, Types.FileFormat fmt)
- static long writeMatrixRDDtoHDFS(RDDObject rdd, String path, Types.FileFormat fmt)
Methods inherited from class org.apache.sysds.runtime.controlprogram.context.ExecutionContext

addTmpParforFunction, allocateGPUMatrixObject, cleanupDataObject, containsVariable, containsVariable, createCacheableData, createFrameObject, createFrameObject, createMatrixObject, createMatrixObject, getCacheableData, getCacheableData, getDataCharacteristics, getDenseMatrixOutputForGPUInstruction, getDenseMatrixOutputForGPUInstruction, getFrameInput, getFrameObject, getFrameObject, getGPUContext, getGPUContexts, getGPUDensePointerAddress, getGPUSparsePointerAddress, getLineage, getLineageItem, getLineageItem, getListObject, getListObject, getMatrixInput, getMatrixInput, getMatrixInputForGPUInstruction, getMatrixInputs, getMatrixInputs, getMatrixLineagePair, getMatrixLineagePair, getMatrixObject, getMatrixObject, getMetaData, getNumGPUContexts, getOrCreateLineageItem, getProgram, getScalarInput, getScalarInput, getScalarInputs, getSealClient, getSparseMatrixOutputForGPUInstruction, getSparseMatrixOutputForGPUInstruction, getTensorInput, getTensorObject, getTID, getTmpParforFunctions, getVariable, getVariable, getVariables, getVarList, getVarListPartitioned, isAutoCreateVars, isFederated, isFederated, isFrameObject, isMatrixObject, maintainLineageDebuggerInfo, pinVariables, releaseCacheableData, releaseFrameInput, releaseMatrixInput, releaseMatrixInput, releaseMatrixInputForGPUInstruction, releaseMatrixInputs, releaseMatrixInputs, releaseMatrixOutputForGPUInstruction, releaseTensorInput, releaseTensorInput, removeVariable, setAutoCreateVars, setFrameOutput, setGPUContexts, setLineage, setMatrixOutput, setMatrixOutput, setMatrixOutputAndLineage, setMetaData, setMetaData, setProgram, setScalarOutput, setSealClient, setTensorOutput, setTID, setVariable, setVariables, toString, traceLineage, unpinVariables
 
Field Detail

FAIR_SCHEDULER_MODE

public static final boolean FAIR_SCHEDULER_MODE

See Also:
- Constant Field Values
 
 
Method Detail
getSparkContext

public org.apache.spark.api.java.JavaSparkContext getSparkContext()

Returns the used singleton spark context. In case of lazy spark context creation, this method blocks until the spark context is created.

Returns:
- java spark context
 
getSparkContextStatic

public static org.apache.spark.api.java.JavaSparkContext getSparkContextStatic()

isSparkContextCreated

public static boolean isSparkContextCreated()

Indicates if the spark context has been created or has been passed in from outside.

Returns:
- true if spark context created
 
resetSparkContextStatic

public static void resetSparkContextStatic()

close

public void close()

isLazySparkContextCreation

public static boolean isLazySparkContextCreation()

createSystemDSSparkConf

public static org.apache.spark.SparkConf createSystemDSSparkConf()

Sets up a SystemDS-preferred Spark configuration based on the implicit default configuration (as passed via configurations from outside).

Returns:
- spark configuration
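The description above can be read as an overlay: SystemDS-preferred settings fill in gaps, while configuration supplied from outside is respected. A minimal, hypothetical sketch of that pattern, using plain maps instead of a real org.apache.spark.SparkConf (the property values here are illustrative, not SystemDS's actual defaults, and the precedence rule is an assumption):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of layering preferred defaults under an externally
// supplied configuration: outside settings win, defaults fill the gaps.
public class ConfOverlay {
    public static Map<String, String> withPreferredDefaults(
            Map<String, String> outside, Map<String, String> preferred) {
        Map<String, String> conf = new HashMap<>(preferred);
        conf.putAll(outside); // externally supplied settings take precedence
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> outside = Map.of("spark.executor.memory", "8g");
        Map<String, String> preferred = Map.of(
            "spark.executor.memory", "4g",
            "spark.scheduler.mode", "FAIR");
        Map<String, String> conf = withPreferredDefaults(outside, preferred);
        System.out.println(conf.get("spark.executor.memory")); // 8g, outside wins
        System.out.println(conf.get("spark.scheduler.mode"));  // FAIR, default fills gap
    }
}
```

With a real SparkConf, the same effect could be achieved by only setting a preferred value when the key is absent from the incoming configuration.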
 
isLocalMaster

public static boolean isLocalMaster()

getBinaryMatrixBlockRDDHandleForVariable

public org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getBinaryMatrixBlockRDDHandleForVariable(String varname)

Spark instructions should call this for all matrix inputs except broadcast variables.

Parameters:
- varname - variable name

Returns:
- JavaPairRDD of MatrixIndexes-MatrixBlocks
 
getBinaryMatrixBlockRDDHandleForVariable

public org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getBinaryMatrixBlockRDDHandleForVariable(String varname, int numParts, boolean inclEmpty)

getBinaryTensorBlockRDDHandleForVariable

public org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> getBinaryTensorBlockRDDHandleForVariable(String varname)

Spark instructions should call this for all tensor inputs except broadcast variables.

Parameters:
- varname - variable name

Returns:
- JavaPairRDD of TensorIndexes-HomogTensors

getBinaryTensorBlockRDDHandleForVariable

public org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> getBinaryTensorBlockRDDHandleForVariable(String varname, int numParts, boolean inclEmpty)

getFrameBinaryBlockRDDHandleForVariable

public org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> getFrameBinaryBlockRDDHandleForVariable(String varname)

Spark instructions should call this for all frame inputs except broadcast variables.

Parameters:
- varname - variable name

Returns:
- JavaPairRDD of Longs-FrameBlocks
 
getRDDHandleForVariable

public org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForVariable(String varname, Types.FileFormat fmt, int numParts, boolean inclEmpty)

getRDDHandleForMatrixObject

public org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForMatrixObject(MatrixObject mo, Types.FileFormat fmt)

getRDDHandleForMatrixObject

public org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForMatrixObject(MatrixObject mo, Types.FileFormat fmt, int numParts, boolean inclEmpty)

getRDDHandleForTensorObject

public org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForTensorObject(TensorObject to, Types.FileFormat fmt, int numParts, boolean inclEmpty)

getRDDHandleForFrameObject

public org.apache.spark.api.java.JavaPairRDD<?,?> getRDDHandleForFrameObject(FrameObject fo, Types.FileFormat fmt)

FIXME: currently this implementation assumes matrix representations but frame signature in order to support the old transform implementation.

Parameters:
- fo - frame object
- fmt - file format type

Returns:
- JavaPairRDD handle for a frame object
 
broadcastVariable

public org.apache.spark.broadcast.Broadcast<CacheBlock> broadcastVariable(CacheableData<CacheBlock> cd)

getBroadcastForMatrixObject

public PartitionedBroadcast<MatrixBlock> getBroadcastForMatrixObject(MatrixObject mo)

setBroadcastHandle

public void setBroadcastHandle(MatrixObject mo)

getBroadcastForTensorObject

public PartitionedBroadcast<TensorBlock> getBroadcastForTensorObject(TensorObject to)

getBroadcastForVariable

public PartitionedBroadcast<MatrixBlock> getBroadcastForVariable(String varname)

getBroadcastForTensorVariable

public PartitionedBroadcast<TensorBlock> getBroadcastForTensorVariable(String varname)

getBroadcastForFrameVariable

public PartitionedBroadcast<FrameBlock> getBroadcastForFrameVariable(String varname)

setRDDHandleForVariable

public void setRDDHandleForVariable(String varname, org.apache.spark.api.java.JavaPairRDD<?,?> rdd)

Keeps the output RDD of Spark RDD operations as metadata of matrix/frame objects in the symbol table.

Parameters:
- varname - variable name
- rdd - JavaPairRDD handle for variable
 
toMatrixJavaPairRDD

public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> toMatrixJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, MatrixBlock src, int blen)

toMatrixJavaPairRDD

public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> toMatrixJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, MatrixBlock src, int blen, int numParts, boolean inclEmpty)

toTensorJavaPairRDD

public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> toTensorJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, TensorBlock src, int blen)

toTensorJavaPairRDD

public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> toTensorJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, TensorBlock src, int blen, int numParts, boolean inclEmpty)

toFrameJavaPairRDD

public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> toFrameJavaPairRDD(org.apache.spark.api.java.JavaSparkContext sc, FrameBlock src)
toMatrixBlock

public static MatrixBlock toMatrixBlock(RDDObject rdd, int rlen, int clen, int blen, long nnz)

This method is a generic abstraction for calls from the buffer pool.

Parameters:
- rdd - rdd object
- rlen - number of rows
- clen - number of columns
- blen - block length
- nnz - number of non-zeros

Returns:
- matrix block
 
toMatrixBlock

public static MatrixBlock toMatrixBlock(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> rdd, int rlen, int clen, int blen, long nnz)

Utility method for creating a single matrix block out of a binary block RDD. Note that this collect call might trigger execution of any pending transformations.

NOTE: This is an unguarded utility function, which requires memory for both the output matrix and its collected, blocked representation.

Parameters:
- rdd - JavaPairRDD for matrix block
- rlen - number of rows
- clen - number of columns
- blen - block length
- nnz - number of non-zeros

Returns:
- Local matrix block
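The rlen/clen/blen parameters of these conversion utilities describe a blocked layout: an rlen x clen matrix tiled into blocks of at most blen x blen. A small self-contained sketch of that index arithmetic, with illustrative helper names that are not part of the SystemDS API (the 1-based block indexing mirrors the usual convention for MatrixIndexes, stated here as an assumption):

```java
// Sketch of the blocked matrix layout assumed by toMatrixBlock and
// toMatrixJavaPairRDD: blocks addressed by 1-based (rowIndex, colIndex),
// with boundary blocks possibly smaller than blen.
public class BlockLayout {
    // number of blocks along one dimension: ceil(len / blen)
    public static long numBlocks(long len, int blen) {
        return (len + blen - 1) / blen;
    }

    // 1-based block index containing a 0-based cell coordinate
    public static long blockIndex(long cell, int blen) {
        return cell / blen + 1;
    }

    // actual extent of a block along one dimension
    public static int blockSize(long blockIndex, long len, int blen) {
        return (int) Math.min(blen, len - (blockIndex - 1) * blen);
    }

    public static void main(String[] args) {
        long rlen = 2500, clen = 1300;
        int blen = 1000;
        System.out.println(numBlocks(rlen, blen));    // 3 row-blocks
        System.out.println(numBlocks(clen, blen));    // 2 column-blocks
        System.out.println(blockIndex(2400, blen));   // row 2400 lies in block 3
        System.out.println(blockSize(3, rlen, blen)); // last row-block holds 500 rows
    }
}
```

This also makes the memory caveat above concrete: collecting all numBlocks(rlen, blen) * numBlocks(clen, blen) blocks plus the assembled output matrix must fit in driver memory at the same time.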
 
toMatrixBlock

public static MatrixBlock toMatrixBlock(RDDObject rdd, int rlen, int clen, long nnz)

toMatrixBlock

public static MatrixBlock toMatrixBlock(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> rdd, int rlen, int clen, long nnz)

Utility method for creating a single matrix block out of a binary cell RDD. Note that this collect call might trigger execution of any pending transformations.

Parameters:
- rdd - JavaPairRDD for matrix block
- rlen - number of rows
- clen - number of columns
- nnz - number of non-zeros

Returns:
- matrix block
 
toTensorBlock

public static TensorBlock toTensorBlock(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> rdd, DataCharacteristics dc)

toPartitionedMatrixBlock

public static PartitionedBlock<MatrixBlock> toPartitionedMatrixBlock(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> rdd, int rlen, int clen, int blen, long nnz)

toFrameBlock

public static FrameBlock toFrameBlock(RDDObject rdd, Types.ValueType[] schema, int rlen, int clen)

toFrameBlock

public static FrameBlock toFrameBlock(org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> rdd, Types.ValueType[] schema, int rlen, int clen)

writeMatrixRDDtoHDFS

public static long writeMatrixRDDtoHDFS(RDDObject rdd, String path, Types.FileFormat fmt)

writeFrameRDDtoHDFS

public static void writeFrameRDDtoHDFS(RDDObject rdd, String path, Types.FileFormat fmt)
addLineageRDD

public void addLineageRDD(String varParent, String varChild)

Adds a child rdd object to the lineage of a parent rdd.

Parameters:
- varParent - parent variable
- varChild - child variable
 
addLineageBroadcast

public void addLineageBroadcast(String varParent, String varChild)

Adds a child broadcast object to the lineage of a parent rdd.

Parameters:
- varParent - parent variable
- varChild - child variable
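Both lineage methods attach a derived object (an RDD or a broadcast) to the lineage of a parent RDD, so that management of the parent can account for everything created from it. A generic sketch of that parent-child bookkeeping, using a hypothetical LineageNode class rather than SystemDS's actual lineage data structures:

```java
import java.util.ArrayList;
import java.util.List;

// Generic sketch of the pattern behind addLineageRDD/addLineageBroadcast:
// each tracked object keeps a list of children derived from it, so that a
// cleanup of the parent can discover all dependent objects.
public class LineageNode {
    final String name;
    final boolean isBroadcast;
    final List<LineageNode> children = new ArrayList<>();

    public LineageNode(String name, boolean isBroadcast) {
        this.name = name;
        this.isBroadcast = isBroadcast;
    }

    // analogous to addLineageRDD/addLineageBroadcast(varParent, varChild)
    public void addChild(LineageNode child) {
        children.add(child);
    }

    // count all reachable descendants, e.g. the scope of a cascading cleanup
    public int countDescendants() {
        int n = 0;
        for (LineageNode c : children)
            n += 1 + c.countDescendants();
        return n;
    }

    public static void main(String[] args) {
        LineageNode parent = new LineageNode("X", false);
        parent.addChild(new LineageNode("Y", false));     // derived RDD
        parent.addChild(new LineageNode("X_bc", true));   // derived broadcast
        System.out.println(parent.countDescendants());    // 2
    }
}
```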
 
cleanupCacheableData

public void cleanupCacheableData(CacheableData<?> mo)

Overrides:
- cleanupCacheableData in class ExecutionContext
 
cleanupBroadcastVariable

public static void cleanupBroadcastVariable(org.apache.spark.broadcast.Broadcast<?> bvar)

This call destroys a broadcast variable at all executors and the driver. Hence, it is intended to be used on rmvar only. Depending on the ASYNCHRONOUS_VAR_DESTROY configuration, this is asynchronous or not.

Parameters:
- bvar - broadcast variable

cleanupRDDVariable

public static void cleanupRDDVariable(org.apache.spark.api.java.JavaPairRDD<?,?> rvar)

This call removes an rdd variable from executor memory and disk if required. Hence, it is intended to be used on rmvar only. Depending on the ASYNCHRONOUS_VAR_DESTROY configuration, this is asynchronous or not.

Parameters:
- rvar - rdd variable to remove
 
repartitionAndCacheMatrixObject

public void repartitionAndCacheMatrixObject(String var)

cacheMatrixObject

public void cacheMatrixObject(String var)

setThreadLocalSchedulerPool

public int setThreadLocalSchedulerPool()

cleanupThreadLocalSchedulerPool

public void cleanupThreadLocalSchedulerPool(int pool)

isRDDCached

public boolean isRDDCached(int rddID)
getSparkClusterConfig

public static SparkExecutionContext.SparkClusterConfig getSparkClusterConfig()

Obtains the lazily analyzed spark cluster configuration.

Returns:
- spark cluster configuration

getBroadcastMemoryBudget

public static double getBroadcastMemoryBudget()

Obtains the available memory budget for broadcast variables in bytes.

Returns:
- broadcast memory budget
 
getDataMemoryBudget

public static double getDataMemoryBudget(boolean min, boolean refresh)

Obtains the available memory budget for data storage in bytes.

Parameters:
- min - flag for minimum data budget
- refresh - flag for refresh with spark context

Returns:
- data memory budget

getNumExecutors

public static int getNumExecutors()

Obtains the number of executors in the cluster (excluding the driver).

Returns:
- number of executors

getDefaultParallelism

public static int getDefaultParallelism(boolean refresh)

Obtains the default degree of parallelism (cores in the cluster).

Parameters:
- refresh - flag for refresh with spark context

Returns:
- default degree of parallelism
 
 