Class SparkExecutionContext

    • Method Detail

      • getSparkContext

        public org.apache.spark.api.java.JavaSparkContext getSparkContext()
        Returns the singleton spark context in use. In case of lazy spark context creation, this method blocks until the spark context has been created.
        Returns:
        java spark context
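
        For illustration, a minimal usage sketch (assuming sec is a SparkExecutionContext instance available to a Spark instruction, with the usual Spark imports; the input path is purely illustrative):

        // Obtain the singleton context; with lazy creation this call may block
        // until the context is available.
        org.apache.spark.api.java.JavaSparkContext jsc = sec.getSparkContext();
        // Run an arbitrary Spark job with it (illustrative only).
        long lines = jsc.textFile("hdfs:///tmp/example.csv").count();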
      • getSparkContextStatic

        public static org.apache.spark.api.java.JavaSparkContext getSparkContextStatic()
      • isSparkContextCreated

        public static boolean isSparkContextCreated()
        Indicates whether the spark context has been created or has been passed in from outside.
        Returns:
        true if spark context created
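
        An illustrative guard, not prescribed by this API: query Spark-derived properties only if a context already exists, so that the check itself does not force a lazy context creation.

        if (SparkExecutionContext.isSparkContextCreated()) {
            int par = SparkExecutionContext.getDefaultParallelism(false);
            System.out.println("default parallelism: " + par);
        }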
      • resetSparkContextStatic

        public static void resetSparkContextStatic()
      • close

        public void close()
      • isLazySparkContextCreation

        public static boolean isLazySparkContextCreation()
      • handleIllegalReflectiveAccessSpark

        public static void handleIllegalReflectiveAccessSpark()
      • createSystemDSSparkConf

        public static org.apache.spark.SparkConf createSystemDSSparkConf()
        Sets up a SystemDS-preferred spark configuration based on the implicit default configuration (as passed in from outside).
        Returns:
        spark configuration
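
        A hedged sketch of how an external driver might reuse this configuration to create its own context (application name and master are illustrative assumptions):

        org.apache.spark.SparkConf conf = SparkExecutionContext.createSystemDSSparkConf()
            .setAppName("SystemDS-example")
            .setMaster("local[*]");
        org.apache.spark.api.java.JavaSparkContext jsc =
            new org.apache.spark.api.java.JavaSparkContext(conf);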
      • isLocalMaster

        public static boolean isLocalMaster()
      • getBinaryMatrixBlockRDDHandleForVariable

        public org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> getBinaryMatrixBlockRDDHandleForVariable​(String varname)
        Spark instructions should call this for all matrix inputs except broadcast variables.
        Parameters:
        varname - variable name
        Returns:
        JavaPairRDD of MatrixIndexes-MatrixBlocks
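
        For illustration, a read-only sketch of consuming such an input (the variable name "X" and the surrounding sec instance are assumptions; imports of JavaPairRDD, MatrixIndexes, and MatrixBlock are assumed):

        JavaPairRDD<MatrixIndexes, MatrixBlock> in =
            sec.getBinaryMatrixBlockRDDHandleForVariable("X");
        // Count the non-empty blocks of the input matrix (illustrative only).
        long nonEmpty = in.filter(kv -> kv._2().getNonZeros() > 0).count();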
      • getBinaryMatrixBlockRDDHandleForVariable

        public org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> getBinaryMatrixBlockRDDHandleForVariable​(String varname,
                                                                                                                               int numParts,
                                                                                                                               boolean inclEmpty)
      • getBinaryTensorBlockRDDHandleForVariable

        public org.apache.spark.api.java.JavaPairRDD<TensorIndexes,​TensorBlock> getBinaryTensorBlockRDDHandleForVariable​(String varname)
        Spark instructions should call this for all tensor inputs except broadcast variables.
        Parameters:
        varname - variable name
        Returns:
        JavaPairRDD of TensorIndexes-TensorBlocks
      • getBinaryTensorBlockRDDHandleForVariable

        public org.apache.spark.api.java.JavaPairRDD<TensorIndexes,​TensorBlock> getBinaryTensorBlockRDDHandleForVariable​(String varname,
                                                                                                                               int numParts,
                                                                                                                               boolean inclEmpty)
      • getFrameBinaryBlockRDDHandleForVariable

        public org.apache.spark.api.java.JavaPairRDD<Long,​FrameBlock> getFrameBinaryBlockRDDHandleForVariable​(String varname)
        Spark instructions should call this for all frame inputs except broadcast variables.
        Parameters:
        varname - variable name
        Returns:
        JavaPairRDD of Longs-FrameBlocks
      • getRDDHandleForVariable

        public org.apache.spark.api.java.JavaPairRDD<?,​?> getRDDHandleForVariable​(String varname,
                                                                                        Types.FileFormat fmt,
                                                                                        int numParts,
                                                                                        boolean inclEmpty)
      • getRDDHandleForMatrixObject

        public org.apache.spark.api.java.JavaPairRDD<?,​?> getRDDHandleForMatrixObject​(MatrixObject mo,
                                                                                            Types.FileFormat fmt)
      • getRDDHandleForMatrixObject

        public org.apache.spark.api.java.JavaPairRDD<?,​?> getRDDHandleForMatrixObject​(MatrixObject mo,
                                                                                            Types.FileFormat fmt,
                                                                                            int numParts,
                                                                                            boolean inclEmpty)
      • getRDDHandleForTensorObject

        public org.apache.spark.api.java.JavaPairRDD<?,​?> getRDDHandleForTensorObject​(TensorObject to,
                                                                                            Types.FileFormat fmt,
                                                                                            int numParts,
                                                                                            boolean inclEmpty)
      • getRDDHandleForFrameObject

        public org.apache.spark.api.java.JavaPairRDD<?,​?> getRDDHandleForFrameObject​(FrameObject fo,
                                                                                           Types.FileFormat fmt)
        FIXME: This implementation currently assumes matrix representations behind a frame signature, in order to support the old transform implementation.
        Parameters:
        fo - frame object
        fmt - file format type
        Returns:
        JavaPairRDD handle for a frame object
      • setBroadcastHandle

        public void setBroadcastHandle​(MatrixObject mo)
      • setRDDHandleForVariable

        public void setRDDHandleForVariable​(String varname,
                                            org.apache.spark.api.java.JavaPairRDD<?,​?> rdd)
        Keeps the output rdd of spark rdd operations as metadata of matrix/frame objects in the symbol table.
        Parameters:
        varname - variable name
        rdd - JavaPairRDD handle for variable
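
        A hedged sketch of the typical produce-and-register pattern inside a Spark instruction (variable names and the surrounding sec instance are assumptions; imports are assumed):

        JavaPairRDD<MatrixIndexes, MatrixBlock> in =
            sec.getBinaryMatrixBlockRDDHandleForVariable("X");
        // Placeholder block-local operation; a real instruction would apply its
        // actual matrix operation here.
        JavaPairRDD<MatrixIndexes, MatrixBlock> out = in.mapValues(blk -> blk);
        // Register the result RDD as metadata of the output variable.
        sec.setRDDHandleForVariable("Y", out);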
      • setRDDHandleForVariable

        public void setRDDHandleForVariable​(String varname,
                                            RDDObject rddhandle)
      • toMatrixJavaPairRDD

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> toMatrixJavaPairRDD​(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                                 MatrixBlock src,
                                                                                                                 int blen)
      • toMatrixJavaPairRDD

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> toMatrixJavaPairRDD​(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                                 MatrixBlock src,
                                                                                                                 int blen,
                                                                                                                 int numParts,
                                                                                                                 boolean inclEmpty)
      • toTensorJavaPairRDD

        public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,​TensorBlock> toTensorJavaPairRDD​(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                                 TensorBlock src,
                                                                                                                 int blen)
      • toTensorJavaPairRDD

        public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,​TensorBlock> toTensorJavaPairRDD​(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                                 TensorBlock src,
                                                                                                                 int blen,
                                                                                                                 int numParts,
                                                                                                                 boolean inclEmpty)
      • toFrameJavaPairRDD

        public static org.apache.spark.api.java.JavaPairRDD<Long,​FrameBlock> toFrameJavaPairRDD​(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                      FrameBlock src)
      • toMatrixBlock

        public static MatrixBlock toMatrixBlock​(RDDObject rdd,
                                                int rlen,
                                                int clen,
                                                int blen,
                                                long nnz)
        This method is a generic abstraction for calls from the buffer pool.
        Parameters:
        rdd - rdd object
        rlen - number of rows
        clen - number of columns
        blen - block length
        nnz - number of non-zeros
        Returns:
        matrix block
      • toMatrixBlock

        public static MatrixBlock toMatrixBlock​(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> rdd,
                                                int rlen,
                                                int clen,
                                                int blen,
                                                long nnz)
        Utility method for creating a single matrix block out of a binary block RDD. Note that this collect call might trigger execution of any pending transformations. NOTE: This is an unguarded utility function, which requires memory for both the output matrix and its collected, blocked representation.
        Parameters:
        rdd - JavaPairRDD for matrix block
        rlen - number of rows
        clen - number of columns
        blen - block length
        nnz - number of non-zeros
        Returns:
        Local matrix block
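
        A minimal sketch of such a collect as a hypothetical helper (not part of this API); per the note above, this is only safe when the full matrix fits into driver memory:

        static MatrixBlock collectSmallResult(
                org.apache.spark.api.java.JavaPairRDD<MatrixIndexes, MatrixBlock> rdd,
                int rlen, int clen, int blen, long nnz) {
            // Triggers any pending transformations and materializes one local block.
            return SparkExecutionContext.toMatrixBlock(rdd, rlen, clen, blen, nnz);
        }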
      • toMatrixBlock

        public static MatrixBlock toMatrixBlock​(RDDObject rdd,
                                                int rlen,
                                                int clen,
                                                long nnz)
      • toMatrixBlock

        public static MatrixBlock toMatrixBlock​(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixCell> rdd,
                                                int rlen,
                                                int clen,
                                                long nnz)
        Utility method for creating a single matrix block out of a binary cell RDD. Note that this collect call might trigger execution of any pending transformations.
        Parameters:
        rdd - JavaPairRDD for matrix block
        rlen - number of rows
        clen - number of columns
        nnz - number of non-zeros
        Returns:
        matrix block
      • addLineageRDD

        public void addLineageRDD​(String varParent,
                                  String varChild)
        Adds a child rdd object to the lineage of a parent rdd.
        Parameters:
        varParent - parent variable
        varChild - child variable
      • addLineageBroadcast

        public void addLineageBroadcast​(String varParent,
                                        String varChild)
        Adds a child broadcast object to the lineage of a parent rdd.
        Parameters:
        varParent - parent variable
        varChild - child variable
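
        For illustration, a hedged sketch of how lineage is typically added right after registering an output (variable names are assumptions; the cleanup rationale is an interpretation of the lineage mechanism):

        // After sec.setRDDHandleForVariable("Y", out): record that output "Y"
        // (the parent) was derived from RDD input "X" and broadcast input "W"
        // (the children), so their Spark-side state is not cleaned up too early.
        sec.addLineageRDD("Y", "X");
        sec.addLineageBroadcast("Y", "W");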
      • addLineage

        public void addLineage​(String varParent,
                               String varChild,
                               boolean broadcast)
      • cleanupBroadcastVariable

        public static void cleanupBroadcastVariable​(org.apache.spark.broadcast.Broadcast<?> bvar)
        This call destroys a broadcast variable at all executors and the driver. Hence, it is intended to be used on rmvar only. Depending on the ASYNCHRONOUS_VAR_DESTROY configuration, the destroy is performed either asynchronously or synchronously.
        Parameters:
        bvar - broadcast variable
      • cleanupRDDVariable

        public static void cleanupRDDVariable​(org.apache.spark.api.java.JavaPairRDD<?,​?> rvar)
        This call removes an rdd variable from executor memory and disk if required. Hence, it is intended to be used on rmvar only. Depending on the ASYNCHRONOUS_VAR_DESTROY configuration, the unpersist is performed either asynchronously or synchronously.
        Parameters:
        rvar - rdd variable to remove
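
        A hedged sketch combining both cleanup calls for a variable removal, as a hypothetical helper (names are assumptions):

        static void releaseSparkState(org.apache.spark.broadcast.Broadcast<?> bvar,
                                      org.apache.spark.api.java.JavaPairRDD<?, ?> rvar) {
            // Destroy the broadcast at all executors and the driver, and unpersist
            // the RDD from executor memory/disk; whether these calls block depends
            // on the ASYNCHRONOUS_VAR_DESTROY configuration.
            SparkExecutionContext.cleanupBroadcastVariable(bvar);
            SparkExecutionContext.cleanupRDDVariable(rvar);
        }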
      • repartitionAndCacheMatrixObject

        public void repartitionAndCacheMatrixObject​(String var)
      • cacheMatrixObject

        public void cacheMatrixObject​(String var)
      • setThreadLocalSchedulerPool

        public int setThreadLocalSchedulerPool()
      • cleanupThreadLocalSchedulerPool

        public void cleanupThreadLocalSchedulerPool​(int pool)
      • isRDDCached

        public boolean isRDDCached​(int rddID)
      • getBroadcastMemoryBudget

        public static double getBroadcastMemoryBudget()
        Obtains the available memory budget for broadcast variables in bytes.
        Returns:
        broadcast memory budget
      • getDataMemoryBudget

        public static double getDataMemoryBudget​(boolean min,
                                                 boolean refresh)
        Obtains the available memory budget for data storage in bytes.
        Parameters:
        min - flag for minimum data budget
        refresh - flag for refresh with spark context
        Returns:
        data memory budget
      • getNumExecutors

        public static int getNumExecutors()
        Obtains the number of executors in the cluster (excluding the driver).
        Returns:
        number of executors
      • getDefaultParallelism

        public static int getDefaultParallelism​(boolean refresh)
        Obtains the default degree of parallelism (cores in the cluster).
        Parameters:
        refresh - flag for refresh with spark context
        Returns:
        default degree of parallelism
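
        For illustration, a hedged sketch of how these budgets might inform physical decisions (the 70% safety factor and the size estimates are assumptions, not SystemDS constants):

        // Broadcast an input only if its estimated size fits the broadcast budget.
        static boolean fitsBroadcastBudget(long estimatedSizeInBytes) {
            return estimatedSizeInBytes < 0.7 * SparkExecutionContext.getBroadcastMemoryBudget();
        }

        // Derive a partition count from the default parallelism and the data size.
        static int chooseNumPartitions(long estimatedSizeInBytes, long hdfsBlockSize) {
            int defaultPar = SparkExecutionContext.getDefaultParallelism(false);
            return (int) Math.max(defaultPar, estimatedSizeInBytes / hdfsBlockSize);
        }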