Class SparkUtils

java.lang.Object
  org.apache.sysds.runtime.instructions.spark.utils.SparkUtils

public class SparkUtils
extends Object
Field Summary

Fields
  static org.apache.spark.storage.StorageLevel DEFAULT_TMP
Constructor Summary

Constructors
  SparkUtils()
Method Summary

All Methods | Static Methods | Concrete Methods

  static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> cacheBinaryCellRDD(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)

  static void checkSparsity(String varname, ExecutionContext ec)

  static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
      Utility to compute dimensions and non-zeros in a given RDD of binary cells.

  static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
      Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.

  static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deep)
      Creates a partitioning-preserving copy of the input matrix RDD.

  static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
      Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.

  static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in, boolean deep)
      Creates a partitioning-preserving copy of the input tensor RDD.

  static List<scala.Tuple2<Long,FrameBlock>> fromIndexedFrameBlock(List<Pair<Long,FrameBlock>> in)

  static scala.Tuple2<Long,FrameBlock> fromIndexedFrameBlock(Pair<Long,FrameBlock> in)

  static List<scala.Tuple2<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlock(List<IndexedMatrixValue> in)

  static scala.Tuple2<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlock(IndexedMatrixValue in)

  static List<Pair<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlockToPair(List<IndexedMatrixValue> in)

  static Pair<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlockToPair(IndexedMatrixValue in)

  static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc, DataCharacteristics mc)
      Creates an RDD of empty blocks according to the given matrix characteristics.

  static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)

  static long getNonZeros(MatrixObject mo)

  static int getNumPreferredPartitions(DataCharacteristics dc)

  static int getNumPreferredPartitions(DataCharacteristics dc, boolean outputEmptyBlocks)

  static int getNumPreferredPartitions(DataCharacteristics dc, org.apache.spark.api.java.JavaPairRDD<?,?> in)

  static String getPrefixFromSparkDebugInfo(String line)

  static String getStartLineFromSparkDebugInfo(String line)

  static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
      Indicates if the input RDD is hash partitioned, i.e., it has a partitioner of type org.apache.spark.HashPartitioner.

  static void postprocessUltraSparseOutput(MatrixObject mo, DataCharacteristics mcOut)

  static Pair<Long,FrameBlock> toIndexedFrameBlock(scala.Tuple2<Long,FrameBlock> in)

  static List<Pair<Long,Long>> toIndexedLong(List<scala.Tuple2<Long,Long>> in)

  static IndexedMatrixValue toIndexedMatrixBlock(MatrixIndexes ix, MatrixBlock mb)

  static IndexedMatrixValue toIndexedMatrixBlock(scala.Tuple2<MatrixIndexes,MatrixBlock> in)

  static IndexedTensorBlock toIndexedTensorBlock(TensorIndexes ix, TensorBlock mb)

  static IndexedTensorBlock toIndexedTensorBlock(scala.Tuple2<TensorIndexes,TensorBlock> in)
Method Detail
- 
toIndexedMatrixBlock
public static IndexedMatrixValue toIndexedMatrixBlock(scala.Tuple2<MatrixIndexes,MatrixBlock> in)
 
- 
toIndexedMatrixBlock
public static IndexedMatrixValue toIndexedMatrixBlock(MatrixIndexes ix, MatrixBlock mb)
 
- 
toIndexedTensorBlock
public static IndexedTensorBlock toIndexedTensorBlock(scala.Tuple2<TensorIndexes,TensorBlock> in)
 
- 
toIndexedTensorBlock
public static IndexedTensorBlock toIndexedTensorBlock(TensorIndexes ix, TensorBlock mb)
 
- 
fromIndexedMatrixBlock
public static scala.Tuple2<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlock(IndexedMatrixValue in)
 
- 
fromIndexedMatrixBlock
public static List<scala.Tuple2<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlock(List<IndexedMatrixValue> in)
 
- 
fromIndexedMatrixBlockToPair
public static Pair<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlockToPair(IndexedMatrixValue in)
 
- 
fromIndexedMatrixBlockToPair
public static List<Pair<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlockToPair(List<IndexedMatrixValue> in)
 
- 
fromIndexedFrameBlock
public static scala.Tuple2<Long,FrameBlock> fromIndexedFrameBlock(Pair<Long,FrameBlock> in)
 
- 
fromIndexedFrameBlock
public static List<scala.Tuple2<Long,FrameBlock>> fromIndexedFrameBlock(List<Pair<Long,FrameBlock>> in)
 
- 
toIndexedLong
public static List<Pair<Long,Long>> toIndexedLong(List<scala.Tuple2<Long,Long>> in)
 
- 
toIndexedFrameBlock
public static Pair<Long,FrameBlock> toIndexedFrameBlock(scala.Tuple2<Long,FrameBlock> in)
 
- 
isHashPartitioned
public static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
Indicates if the input RDD is hash partitioned, i.e., it has a partitioner of type org.apache.spark.HashPartitioner.

Parameters:
  in - input JavaPairRDD
Returns:
  true if input is hash partitioned
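As a plain-Java sketch of what such a check amounts to (no Spark dependency; `Partitioner` and `HashPartitioner` here are minimal stand-ins for the Spark classes, not the real ones): the RDD's optional partitioner is present and is a `HashPartitioner`.

```java
import java.util.Optional;

public class HashPartitionCheck {
    // Minimal stand-ins for org.apache.spark.Partitioner / HashPartitioner.
    static class Partitioner {}
    static class HashPartitioner extends Partitioner {}

    // The check: a partitioner is present and has type HashPartitioner.
    static boolean isHashPartitioned(Optional<? extends Partitioner> partitioner) {
        return partitioner.isPresent()
            && partitioner.get() instanceof HashPartitioner;
    }

    public static void main(String[] args) {
        System.out.println(isHashPartitioned(Optional.of(new HashPartitioner()))); // true
        System.out.println(isHashPartitioned(Optional.empty()));                   // false
    }
}
```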
 
 
- 
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc, org.apache.spark.api.java.JavaPairRDD<?,?> in)
 
- 
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc)
 
- 
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc, boolean outputEmptyBlocks)
 
- 
copyBinaryBlockMatrix
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.

Parameters:
  in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
Returns:
  matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
 
- 
copyBinaryBlockMatrix
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deep)
Creates a partitioning-preserving copy of the input matrix RDD. If a deep copy is requested, indexes and values are copied, otherwise they are simply passed through.

Parameters:
  in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
  deep - if true, perform deep copy
Returns:
  matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
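The deep flag trades copy cost for isolation. A plain-Java sketch of the difference (a double[] stands in for a MatrixBlock; no Spark involved):

```java
import java.util.Arrays;

public class CopySemantics {
    public static void main(String[] args) {
        double[] block = {1.0, 2.0, 3.0};

        // deep == false: the same block object is passed through,
        // so later mutations of the input are visible in the copy.
        double[] shallow = block;

        // deep == true: the values are duplicated,
        // so the copy is isolated from later mutations.
        double[] deep = Arrays.copyOf(block, block.length);

        block[0] = 99.0;
        System.out.println(shallow[0]); // 99.0 (shared)
        System.out.println(deep[0]);    // 1.0  (isolated)
    }
}
```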
 
- 
copyBinaryBlockTensor
public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.

Parameters:
  in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
Returns:
  tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
 
- 
copyBinaryBlockTensor
public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in, boolean deep)
Creates a partitioning-preserving copy of the input tensor RDD. If a deep copy is requested, indexes and values are copied, otherwise they are simply passed through.

Parameters:
  in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
  deep - if true, perform deep copy
Returns:
  tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
 
- 
checkSparsity
public static void checkSparsity(String varname, ExecutionContext ec)
 
- 
getEmptyBlockRDD
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc, DataCharacteristics mc)
Creates an RDD of empty blocks according to the given matrix characteristics. This is done in a scalable manner by parallelizing block ranges and generating the empty blocks in a distributed fashion, with awareness of preferred output partition sizes.

Parameters:
  sc - spark context
  mc - matrix characteristics
Returns:
  pair rdd of empty matrix blocks
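For intuition, the number of blocks such an RDD contains follows from standard ceiling division over the blocksize (a sketch of the arithmetic, not code taken from SystemDS; the 1,000 blocksize used below is an illustrative value):

```java
public class EmptyBlockCount {
    // Number of blocks a rows x cols matrix occupies at blocksize blen.
    static long numBlocks(long rows, long cols, int blen) {
        long rowBlocks = (rows + blen - 1) / blen; // ceil(rows / blen)
        long colBlocks = (cols + blen - 1) / blen; // ceil(cols / blen)
        return rowBlocks * colBlocks;
    }

    public static void main(String[] args) {
        // 10,000 x 10,000 matrix at blocksize 1,000 -> 10 * 10 blocks
        System.out.println(numBlocks(10_000, 10_000, 1_000)); // 100
        // one extra row spills into a new row of blocks -> 11 * 10
        System.out.println(numBlocks(10_001, 10_000, 1_000)); // 110
    }
}
```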
 
 
- 
cacheBinaryCellRDD
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> cacheBinaryCellRDD(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
 
- 
computeDataCharacteristics
public static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
Utility to compute dimensions and non-zeros in a given RDD of binary cells.

Parameters:
  input - matrix as JavaPairRDD<MatrixIndexes,MatrixCell>
Returns:
  matrix characteristics
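A plain-Java sketch of this kind of aggregation (double[] {row, col, value} triples stand in for (MatrixIndexes, MatrixCell) pairs): dimensions come from the maximum row/column index seen, and non-zeros are counted.

```java
import java.util.List;

public class CellStats {
    // Aggregates max row index, max column index, and non-zero count
    // over (row, col, value) cells: {maxRow, maxCol, nnz}.
    static long[] characteristics(List<double[]> cells) {
        long maxRow = 0, maxCol = 0, nnz = 0;
        for (double[] c : cells) {
            maxRow = Math.max(maxRow, (long) c[0]);
            maxCol = Math.max(maxCol, (long) c[1]);
            if (c[2] != 0)
                nnz++;
        }
        return new long[] {maxRow, maxCol, nnz};
    }

    public static void main(String[] args) {
        List<double[]> cells = List.of(
            new double[]{1, 1, 7.0},
            new double[]{3, 2, 0.0},
            new double[]{2, 5, -1.5});
        long[] dc = characteristics(cells);
        System.out.println(dc[0] + " x " + dc[1] + ", nnz=" + dc[2]); // 3 x 5, nnz=2
    }
}
```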
 
 
- 
getNonZeros
public static long getNonZeros(MatrixObject mo)
 
- 
getNonZeros
public static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)
 
- 
postprocessUltraSparseOutput
public static void postprocessUltraSparseOutput(MatrixObject mo, DataCharacteristics mcOut)
 