Class SparkUtils
java.lang.Object
  org.apache.sysds.runtime.instructions.spark.utils.SparkUtils

public class SparkUtils extends Object
Field Summary
- static org.apache.spark.storage.StorageLevel DEFAULT_TMP
Constructor Summary
- SparkUtils()
Method Summary (all methods are static and concrete)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> cacheBinaryCellRDD(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
- static void checkSparsity(String varname, ExecutionContext ec)
- static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
  Utility to compute dimensions and non-zeros in a given RDD of binary cells.
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
  Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deep)
  Creates a partitioning-preserving copy of the input matrix RDD.
- static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
  Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.
- static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in, boolean deep)
  Creates a partitioning-preserving copy of the input tensor RDD.
- static List<scala.Tuple2<Long,FrameBlock>> fromIndexedFrameBlock(List<Pair<Long,FrameBlock>> in)
- static scala.Tuple2<Long,FrameBlock> fromIndexedFrameBlock(Pair<Long,FrameBlock> in)
- static List<scala.Tuple2<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlock(List<IndexedMatrixValue> in)
- static scala.Tuple2<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlock(IndexedMatrixValue in)
- static List<Pair<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlockToPair(List<IndexedMatrixValue> in)
- static Pair<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlockToPair(IndexedMatrixValue in)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc, DataCharacteristics mc)
  Creates an RDD of empty blocks according to the given matrix characteristics.
- static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)
- static long getNonZeros(MatrixObject mo)
- static int getNumPreferredPartitions(DataCharacteristics dc)
- static int getNumPreferredPartitions(DataCharacteristics dc, boolean outputEmptyBlocks)
- static int getNumPreferredPartitions(DataCharacteristics dc, org.apache.spark.api.java.JavaPairRDD<?,?> in)
- static String getPrefixFromSparkDebugInfo(String line)
- static String getStartLineFromSparkDebugInfo(String line)
- static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
  Indicates if the input RDD is hash partitioned, i.e., it has a partitioner of type org.apache.spark.HashPartitioner.
- static void postprocessUltraSparseOutput(MatrixObject mo, DataCharacteristics mcOut)
- static Pair<Long,FrameBlock> toIndexedFrameBlock(scala.Tuple2<Long,FrameBlock> in)
- static List<Pair<Long,Long>> toIndexedLong(List<scala.Tuple2<Long,Long>> in)
- static IndexedMatrixValue toIndexedMatrixBlock(MatrixIndexes ix, MatrixBlock mb)
- static IndexedMatrixValue toIndexedMatrixBlock(scala.Tuple2<MatrixIndexes,MatrixBlock> in)
- static IndexedTensorBlock toIndexedTensorBlock(TensorIndexes ix, TensorBlock mb)
- static IndexedTensorBlock toIndexedTensorBlock(scala.Tuple2<TensorIndexes,TensorBlock> in)
 
Method Detail
toIndexedMatrixBlock
public static IndexedMatrixValue toIndexedMatrixBlock(scala.Tuple2<MatrixIndexes,MatrixBlock> in)

toIndexedMatrixBlock
public static IndexedMatrixValue toIndexedMatrixBlock(MatrixIndexes ix, MatrixBlock mb)

toIndexedTensorBlock
public static IndexedTensorBlock toIndexedTensorBlock(scala.Tuple2<TensorIndexes,TensorBlock> in)

toIndexedTensorBlock
public static IndexedTensorBlock toIndexedTensorBlock(TensorIndexes ix, TensorBlock mb)

fromIndexedMatrixBlock
public static scala.Tuple2<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlock(IndexedMatrixValue in)

fromIndexedMatrixBlock
public static List<scala.Tuple2<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlock(List<IndexedMatrixValue> in)

fromIndexedMatrixBlockToPair
public static Pair<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlockToPair(IndexedMatrixValue in)

fromIndexedMatrixBlockToPair
public static List<Pair<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlockToPair(List<IndexedMatrixValue> in)

fromIndexedFrameBlock
public static scala.Tuple2<Long,FrameBlock> fromIndexedFrameBlock(Pair<Long,FrameBlock> in)

fromIndexedFrameBlock
public static List<scala.Tuple2<Long,FrameBlock>> fromIndexedFrameBlock(List<Pair<Long,FrameBlock>> in)

toIndexedLong
public static List<Pair<Long,Long>> toIndexedLong(List<scala.Tuple2<Long,Long>> in)

toIndexedFrameBlock
public static Pair<Long,FrameBlock> toIndexedFrameBlock(scala.Tuple2<Long,FrameBlock> in)

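The to*/from* converters above translate between Spark's scala.Tuple2 pairs and SystemDS' internal indexed-value wrappers. A minimal local round trip, sketched with an illustrative block position and size; the class name and import paths are assumptions following the usual SystemDS package layout:

  import org.apache.sysds.runtime.instructions.spark.utils.SparkUtils;
  import org.apache.sysds.runtime.matrix.data.MatrixBlock;
  import org.apache.sysds.runtime.matrix.data.MatrixIndexes;
  import scala.Tuple2;

  public class IndexedBlockRoundTrip {
    public static void main(String[] args) {
      // One dense, empty 1000 x 1000 block at block position (1,1); sizes are illustrative.
      MatrixIndexes ix = new MatrixIndexes(1, 1);
      MatrixBlock mb = new MatrixBlock(1000, 1000, false);

      // Wrap into an IndexedMatrixValue and unwrap back into a Spark-friendly Tuple2,
      // the element type of a JavaPairRDD<MatrixIndexes,MatrixBlock>.
      Tuple2<MatrixIndexes, MatrixBlock> t =
        SparkUtils.fromIndexedMatrixBlock(SparkUtils.toIndexedMatrixBlock(ix, mb));

      System.out.println(t._1() + " -> " + t._2().getNumRows() + "x" + t._2().getNumColumns());
    }
  }
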
isHashPartitioned
public static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
Indicates if the input RDD is hash partitioned, i.e., it has a partitioner of type org.apache.spark.HashPartitioner.
Parameters:
  in - input JavaPairRDD
Returns:
  true if input is hash partitioned

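Hash partitioning matters because many block operations rely on keys already being co-located. A hedged sketch of using the check to avoid a redundant shuffle; the helper name and imports are illustrative:

  import org.apache.spark.HashPartitioner;
  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.sysds.runtime.instructions.spark.utils.SparkUtils;
  import org.apache.sysds.runtime.matrix.data.MatrixBlock;
  import org.apache.sysds.runtime.matrix.data.MatrixIndexes;

  public class HashPartitionCheck {
    // Repartitions 'blocks' with a HashPartitioner only if it is not hash partitioned already.
    public static JavaPairRDD<MatrixIndexes, MatrixBlock> ensureHashPartitioned(
        JavaPairRDD<MatrixIndexes, MatrixBlock> blocks, int numPartitions) {
      if( SparkUtils.isHashPartitioned(blocks) )
        return blocks; // already has an org.apache.spark.HashPartitioner
      return blocks.partitionBy(new HashPartitioner(numPartitions));
    }
  }
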
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc, org.apache.spark.api.java.JavaPairRDD<?,?> in)

getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc)

getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc, boolean outputEmptyBlocks)

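These overloads derive a preferred degree of parallelism from the data characteristics (dimensions, block size, non-zeros) and, optionally, the input RDD. A sketch of using the result to repartition; the matrix dimensions are made up and the wrapper class is purely illustrative:

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.sysds.runtime.instructions.spark.utils.SparkUtils;
  import org.apache.sysds.runtime.matrix.data.MatrixBlock;
  import org.apache.sysds.runtime.matrix.data.MatrixIndexes;
  import org.apache.sysds.runtime.meta.DataCharacteristics;
  import org.apache.sysds.runtime.meta.MatrixCharacteristics;

  public class PreferredPartitions {
    public static JavaPairRDD<MatrixIndexes, MatrixBlock> repartitionToPreferred(
        JavaPairRDD<MatrixIndexes, MatrixBlock> blocks) {
      // 100000 x 1000 matrix, blocksize 1000, 50M non-zeros: illustrative numbers only.
      DataCharacteristics dc = new MatrixCharacteristics(100000, 1000, 1000, 50_000_000L);
      int numParts = SparkUtils.getNumPreferredPartitions(dc, blocks);
      return blocks.repartition(numParts);
    }
  }
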
copyBinaryBlockMatrix
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.
Parameters:
  in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
Returns:
  matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>

copyBinaryBlockMatrix
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deep)
Creates a partitioning-preserving copy of the input matrix RDD. If a deep copy is requested, indexes and values are copied, otherwise they are simply passed through.
Parameters:
  in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
  deep - if true, perform deep copy
Returns:
  matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>

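A brief sketch of choosing between the deep and shallow variants; the wrapper method and its name are purely illustrative:

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.sysds.runtime.instructions.spark.utils.SparkUtils;
  import org.apache.sysds.runtime.matrix.data.MatrixBlock;
  import org.apache.sysds.runtime.matrix.data.MatrixIndexes;

  public class CopyExample {
    public static JavaPairRDD<MatrixIndexes, MatrixBlock> safeCopy(
        JavaPairRDD<MatrixIndexes, MatrixBlock> in, boolean modifyInPlace) {
      // Deep copy when downstream code updates blocks in place; otherwise a cheap
      // pass-through copy that still preserves the input partitioning.
      return SparkUtils.copyBinaryBlockMatrix(in, modifyInPlace);
    }
  }
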
copyBinaryBlockTensor
public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.
Parameters:
  in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
Returns:
  tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>

copyBinaryBlockTensor
public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in, boolean deep)
Creates a partitioning-preserving copy of the input tensor RDD. If a deep copy is requested, indexes and values are copied, otherwise they are simply passed through.
Parameters:
  in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
  deep - if true, perform deep copy
Returns:
  tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>

checkSparsity
public static void checkSparsity(String varname, ExecutionContext ec)

getEmptyBlockRDD
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc, DataCharacteristics mc)
Creates an RDD of empty blocks according to the given matrix characteristics. This is done in a scalable manner by parallelizing block ranges and generating empty blocks in a distributed manner, under awareness of preferred output partition sizes.
Parameters:
  sc - spark context
  mc - matrix characteristics
Returns:
  pair rdd of empty matrix blocks

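A sketch of materializing an all-empty matrix as a block RDD; the dimensions, block size, and class name are illustrative:

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.sysds.runtime.instructions.spark.utils.SparkUtils;
  import org.apache.sysds.runtime.matrix.data.MatrixBlock;
  import org.apache.sysds.runtime.matrix.data.MatrixIndexes;
  import org.apache.sysds.runtime.meta.MatrixCharacteristics;

  public class EmptyBlocks {
    public static JavaPairRDD<MatrixIndexes, MatrixBlock> emptyMatrix(JavaSparkContext sc) {
      // 10000 x 10000 matrix with blocksize 1000 and zero non-zeros: illustrative sizes.
      MatrixCharacteristics mc = new MatrixCharacteristics(10000, 10000, 1000, 0);
      // Generates the 10 x 10 grid of empty 1000 x 1000 blocks in a distributed manner.
      return SparkUtils.getEmptyBlockRDD(sc, mc);
    }
  }
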
cacheBinaryCellRDD
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> cacheBinaryCellRDD(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)

computeDataCharacteristics
public static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
Utility to compute dimensions and non-zeros in a given RDD of binary cells.
Parameters:
  input - matrix as JavaPairRDD<MatrixIndexes,MatrixCell>
Returns:
  matrix characteristics

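Computing dimensions and non-zeros requires a full pass over the cell RDD, so it is natural to prepare the RDD with cacheBinaryCellRDD first. A hedged sketch; the wrapper method is illustrative:

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.sysds.runtime.instructions.spark.utils.SparkUtils;
  import org.apache.sysds.runtime.matrix.data.MatrixCell;
  import org.apache.sysds.runtime.matrix.data.MatrixIndexes;
  import org.apache.sysds.runtime.meta.DataCharacteristics;

  public class CellMetadata {
    public static DataCharacteristics dimsAndNnz(JavaPairRDD<MatrixIndexes, MatrixCell> cells) {
      // Prepare the cell RDD for reuse, then scan it once for dimensions and non-zeros.
      JavaPairRDD<MatrixIndexes, MatrixCell> cached = SparkUtils.cacheBinaryCellRDD(cells);
      return SparkUtils.computeDataCharacteristics(cached);
    }
  }
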
getNonZeros
public static long getNonZeros(MatrixObject mo)

getNonZeros
public static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)

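A sketch of aggregating the non-zero count over a block RDD, for example to drive a sparsity heuristic; the threshold and names are illustrative:

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.sysds.runtime.instructions.spark.utils.SparkUtils;
  import org.apache.sysds.runtime.matrix.data.MatrixBlock;
  import org.apache.sysds.runtime.matrix.data.MatrixIndexes;

  public class NnzCheck {
    public static boolean isVerySparse(JavaPairRDD<MatrixIndexes, MatrixBlock> blocks, long numCells) {
      long nnz = SparkUtils.getNonZeros(blocks); // aggregates nnz across all blocks
      return nnz < 0.01 * numCells;              // illustrative 1% threshold
    }
  }
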
postprocessUltraSparseOutput
public static void postprocessUltraSparseOutput(MatrixObject mo, DataCharacteristics mcOut)