Class RDDAggregateUtils
- java.lang.Object
-
- org.apache.sysds.runtime.instructions.spark.utils.RDDAggregateUtils
-
public class RDDAggregateUtils extends Object
Collection of utility methods for aggregating binary-block RDDs. As a general policy, always call the stable algorithms, which maintain corrections over blocks per key. The performance overhead over a simple reduceByKey is roughly 7-10%, which is acceptable.
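To illustrate what "maintaining corrections" can mean, the following minimal sketch contrasts naive summation with Kahan (compensated) summation on plain double arrays. It is an illustration only: the class name KahanSumSketch and its methods are hypothetical, and whether RDDAggregateUtils uses exactly this variant of compensated summation is an assumption, not something this page states.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of SystemDS.
final class KahanSumSketch {

    /** Naive left-to-right summation; rounding errors are silently dropped. */
    public static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values)
            sum += v;
        return sum;
    }

    /** Kahan summation: carries a correction term alongside the running sum. */
    public static double kahanSum(double[] values) {
        double sum = 0.0;
        double corr = 0.0; // low-order bits lost by previous additions
        for (double v : values) {
            double y = v - corr;   // apply the pending correction to the next value
            double t = sum + y;    // low-order bits of y may be lost here
            corr = (t - sum) - y;  // recover what was lost
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        // One large value followed by a million values too small to change it individually.
        double[] values = new double[1_000_001];
        values[0] = 1.0;
        Arrays.fill(values, 1, values.length, 1e-16);

        System.out.println("naive: " + naiveSum(values)); // the small values are lost entirely
        System.out.println("kahan: " + kahanSum(values)); // close to 1.0 + 1e-10
    }
}
```

In a distributed aggregation the same idea applies per key: the running sum and its correction travel together, so partial results from different partitions can be combined without discarding the low-order bits.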
-
-
Constructor Summary
Constructor, Description
RDDAggregateUtils()
-
Method Summary
Modifier and Type, Method, Description
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, boolean deepCopyCombiner)
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, int numPartitions, boolean deepCopyCombiner)
static MatrixBlock
aggStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
Single block aggregation over pair RDDs with corrections for numerical stability.
static MatrixBlock
aggStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in, AggregateOperator aop)
Single block aggregation over RDDs with corrections for numerical stability.
static TensorBlock
aggStableTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> in, AggregateOperator aop)
Single block aggregation over pair RDDs with corrections for numerical stability.
static TensorBlock
aggStableTensor(org.apache.spark.api.java.JavaRDD<TensorBlock> in, AggregateOperator aop)
Single block aggregation over RDDs with corrections for numerical stability.
static double
max(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
Merges disjoint data of all blocks per key.
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
Merges disjoint data of all blocks per key.
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
Merges disjoint data of all blocks per key.
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
mergeRowsByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,RowMatrixBlock> in)
Merges disjoint data of all blocks per key.
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double>
sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in)
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double>
sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in, int numParts)
static MatrixBlock
sumStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
static MatrixBlock
sumStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in)
-
-
-
Method Detail
-
sumStable
public static MatrixBlock sumStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
-
sumStable
public static MatrixBlock sumStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in)
-
sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
-
sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
-
sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
-
sumCellsByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in)
-
sumCellsByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in, int numParts)
-
aggStable
public static MatrixBlock aggStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
Single block aggregation over pair RDDs with corrections for numerical stability.
Parameters:
in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
aop - aggregate operator
Returns:
matrix block
-
aggStable
public static MatrixBlock aggStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in, AggregateOperator aop)
Single block aggregation over RDDs with corrections for numerical stability.
Parameters:
in - matrix as JavaRDD<MatrixBlock>
aop - aggregate operator
Returns:
matrix block
-
aggStableTensor
public static TensorBlock aggStableTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> in, AggregateOperator aop)
Single block aggregation over pair RDDs with corrections for numerical stability.
Parameters:
in - tensor as JavaPairRDD<TensorIndexes, TensorBlock>
aop - aggregate operator
Returns:
tensor block
-
aggStableTensor
public static TensorBlock aggStableTensor(org.apache.spark.api.java.JavaRDD<TensorBlock> in, AggregateOperator aop)
Single block aggregation over RDDs with corrections for numerical stability.
Parameters:
in - tensor as JavaRDD<TensorBlock>
aop - aggregate operator
Returns:
tensor block
-
aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
-
aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, boolean deepCopyCombiner)
-
aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, int numPartitions, boolean deepCopyCombiner)
-
max
public static double max(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
-
mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
Parameters:
in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
Returns:
matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
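The disjointness assumption can be pictured with a small sketch on dense arrays. This is an illustration under stated assumptions, not the SystemDS implementation: the class DisjointMergeSketch and its method mergeDisjoint are hypothetical, and "disjoint" is taken here to mean that each cell is non-zero in at most one of the blocks sharing a key.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of SystemDS.
final class DisjointMergeSketch {

    /**
     * Merges two equally sized dense blocks, assuming every cell is
     * non-zero in at most one of them.
     */
    public static double[] mergeDisjoint(double[] a, double[] b) {
        if (a.length != b.length)
            throw new IllegalArgumentException("blocks must have the same size");
        double[] out = a.clone();
        for (int i = 0; i < b.length; i++) {
            if (b[i] != 0) // copy only the non-zero cells of b
                out[i] = b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] left  = {1, 2, 0, 0};
        double[] right = {0, 0, 3, 4};
        System.out.println(Arrays.toString(mergeDisjoint(left, right)));
        // prints [1.0, 2.0, 3.0, 4.0]
    }
}
```

If the assumption is violated, such a merge depends on the order in which blocks arrive: in this sketch, merging {1} with {2} yields {2}, while merging {2} with {1} yields {1}. Since a distributed shuffle does not guarantee arrival order, the result is effectively undefined, which matches the note above.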
-
mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
Parameters:
in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
deepCopyCombiner - indicator of whether the createCombiner function needs to deep copy the input block
Returns:
matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
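Why a combiner might need a deep copy can be sketched without Spark. In Spark's combineByKey, a createCombiner function turns the first value seen for a key into an accumulator, and later values are merged into it. The sketch below imitates that locally with a map; the class CombinerCopySketch, the method sumByKey, and the use of string keys and double arrays in place of MatrixIndexes and MatrixBlock are all hypothetical. It assumes the merge step updates the accumulator in place, which is one plausible reason for the flag rather than a documented fact.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of SystemDS or Spark.
final class CombinerCopySketch {

    /** Sums blocks per key, accumulating in place into the first block seen for each key. */
    public static Map<String, double[]> sumByKey(String[] keys, double[][] blocks,
                                                 boolean deepCopyCombiner) {
        Map<String, double[]> combiners = new HashMap<>();
        for (int k = 0; k < keys.length; k++) {
            double[] acc = combiners.get(keys[k]);
            if (acc == null) {
                // createCombiner: the first block becomes the accumulator,
                // either as a private copy or as the input object itself
                combiners.put(keys[k], deepCopyCombiner ? blocks[k].clone() : blocks[k]);
            } else {
                // mergeValue: in-place update of the accumulator
                for (int i = 0; i < acc.length; i++)
                    acc[i] += blocks[k][i];
            }
        }
        return combiners;
    }

    public static void main(String[] args) {
        double[] first = {1, 1};
        double[][] blocks = {first, {2, 2}};
        String[] keys = {"b1", "b1"};

        sumByKey(keys, blocks, true);
        System.out.println(first[0]); // prints 1.0: the input block is untouched

        sumByKey(keys, blocks, false);
        System.out.println(first[0]); // prints 3.0: the input block was overwritten
    }
}
```

Under these assumptions, skipping the copy saves an allocation per key but is only safe when the input blocks are not read again afterwards, for example when the input RDD is neither cached nor reused.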
-
mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
Parameters:
in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
numPartitions - number of output partitions
deepCopyCombiner - indicator of whether the createCombiner function needs to deep copy the input block
Returns:
matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
-
mergeRowsByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeRowsByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,RowMatrixBlock> in)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
Parameters:
in - matrix as JavaPairRDD<MatrixIndexes, RowMatrixBlock>
Returns:
matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
-
-