public class OptimizerUtils extends Object
Modifier and Type | Class and Description |
---|---|
static class |
OptimizerUtils.OptimizationLevel
Optimization Types for Compilation
O0 STATIC - Decisions for scheduling operations on CP/MR are based on
a predefined set of rules, which check if the dimensions are below a
fixed/static threshold (the old method of choosing between CP and MR).
|
Modifier and Type | Field and Description |
---|---|
static boolean |
ALLOW_ALGEBRAIC_SIMPLIFICATION |
static boolean |
ALLOW_AUTO_VECTORIZATION |
static boolean |
ALLOW_BRANCH_REMOVAL
Enables if-else branch removal for constant predicates (original literals or
results of constant folding).
|
static boolean |
ALLOW_CODE_MOTION
Enables a specific rewrite for code motion, i.e., hoisting loop invariant code
out of while, for, and parfor loops.
|
static boolean |
ALLOW_COMBINE_FILE_INPUT_FORMAT
Enables the use of CombineSequenceFileInputFormat with splitsize = 2x HDFS blocksize,
if the sort buffer size is large enough and parallelism is not hurt.
|
static boolean |
ALLOW_COMMON_SUBEXPRESSION_ELIMINATION
Enables common subexpression elimination in dags.
|
static boolean |
ALLOW_CONSTANT_FOLDING
Enables constant folding in dags.
|
static boolean |
ALLOW_INTER_PROCEDURAL_ANALYSIS
Enables interprocedural analysis between main script and functions as well as functions
and other functions.
|
static boolean |
ALLOW_LOOP_UPDATE_IN_PLACE
Enables a specific rewrite that allows update in place for loop variables that are
only read/updated via cp leftindexing.
|
static boolean |
ALLOW_OPERATOR_FUSION |
static boolean |
ALLOW_RAND_JOB_RECOMPILE |
static boolean |
ALLOW_RUNTIME_PIGGYBACKING
Enables parfor runtime piggybacking of MR jobs into the packed jobs for
scan sharing.
|
static boolean |
ALLOW_SIZE_EXPRESSION_EVALUATION
Enables simple expression evaluation for datagen parameters 'rows', 'cols'.
|
static boolean |
ALLOW_SPLIT_HOP_DAGS
Enables a specific hop dag rewrite that splits hop dags after csv persistent reads with
unknown size in order to allow for recompile.
|
static boolean |
ALLOW_SUM_PRODUCT_REWRITES
Enables sum product rewrites such as mapmultchains.
|
static boolean |
ALLOW_WORSTCASE_SIZE_EXPRESSION_EVALUATION
Enables simple expression evaluation for datagen parameters 'rows', 'cols'.
|
static long |
BOOLEAN_SIZE |
static long |
CHAR_SIZE |
static int |
DEFAULT_BLOCKSIZE
Default blocksize if unspecified or for testing purposes
|
static int |
DEFAULT_FRAME_BLOCKSIZE
Default frame blocksize
|
static OptimizerUtils.OptimizationLevel |
DEFAULT_OPTLEVEL
Default optimization level if unspecified
|
static double |
DEFAULT_SIZE
Default memory size, which is used if the actual estimate cannot be computed,
e.g., when input/output dimensions are unknown.
|
static long |
DOUBLE_SIZE |
static long |
INT_SIZE |
static double |
INVALID_SIZE |
static int |
IPA_NUM_REPETITIONS
Number of inter-procedural analysis (IPA) repetitions.
|
static long |
MAX_NNZ_CP_SPARSE |
static long |
MAX_NUMCELLS_CP_DENSE |
static double |
MEM_UTIL_FACTOR
Utilization factor used in deciding whether an operation is to be scheduled on CP or MR.
|
static double |
PARALLEL_CP_READ_PARALLELISM_MULTIPLIER
Specifies a multiplier computing the degree of parallelism of parallel
text read/write out of the available degree of parallelism.
|
static double |
PARALLEL_CP_WRITE_PARALLELISM_MULTIPLIER |
static long |
SAFE_REP_CHANGE_THRES |
Constructor and Description |
---|
OptimizerUtils() |
Modifier and Type | Method and Description |
---|---|
static boolean |
allowsToFilterEmptyBlockOutputs(Hop hop) |
static boolean |
checkSparkBroadcastMemoryBudget(double size) |
static boolean |
checkSparkBroadcastMemoryBudget(long rlen,
long clen,
long blen,
long nnz) |
static boolean |
checkSparkCollectMemoryBudget(DataCharacteristics dc,
long memPinned) |
static boolean |
checkSparkCollectMemoryBudget(DataCharacteristics dc,
long memPinned,
boolean checkBP) |
static boolean |
checkSparseBlockCSRConversion(DataCharacteristics dcIn) |
static CompilerConfig |
constructCompilerConfig(CompilerConfig cconf,
DMLConfig dmlconf) |
static CompilerConfig |
constructCompilerConfig(DMLConfig dmlconf) |
static long |
estimatePartitionedSizeExactSparsity(DataCharacteristics dc)
Estimates the footprint (in bytes) for a partitioned in-memory representation of a
matrix with the given matrix characteristics
|
static long |
estimatePartitionedSizeExactSparsity(long rlen,
long clen,
long blen,
double sp)
Estimates the footprint (in bytes) for a partitioned in-memory representation of a
matrix with dimensions=(nrows,ncols) and sparsity=sp.
|
static long |
estimatePartitionedSizeExactSparsity(long rlen,
long clen,
long blen,
long nnz)
Estimates the footprint (in bytes) for a partitioned in-memory representation of a
matrix with dimensions=(nrows,ncols) and number of non-zeros nnz.
|
static long |
estimateSize(DataCharacteristics dc) |
static long |
estimateSize(long nrows,
long ncols)
Similar to estimate() except that it provides worst-case estimates
when the optimization type is ROBUST.
|
static long |
estimateSizeEmptyBlock(long nrows,
long ncols) |
static long |
estimateSizeExactSparsity(DataCharacteristics dc) |
static long |
estimateSizeExactSparsity(long nrows,
long ncols,
double sp)
Estimates the footprint (in bytes) for an in-memory representation of a
matrix with dimensions=(nrows,ncols) and sparsity=sp.
|
static long |
estimateSizeExactSparsity(long nrows,
long ncols,
long nnz)
Estimates the footprint (in bytes) for an in-memory representation of a
matrix with dimensions=(nrows,ncols) and number of non-zeros nnz.
|
static long |
estimateSizeTextOutput(int[] dims,
long nnz,
Types.FileFormat fmt) |
static long |
estimateSizeTextOutput(long rows,
long cols,
long nnz,
Types.FileFormat fmt) |
static boolean |
exceedsCachingThreshold(long dim2,
double outMem)
Indicates if the given matrix characteristics exceed the threshold for
caching, i.e., the matrix should be cached.
|
static double |
getBinaryOpSparsity(double sp1,
double sp2,
Types.OpOp2 op,
boolean worstcase)
Estimates the result sparsity for matrix-matrix binary operations (A op B)
|
static double |
getBinaryOpSparsityConditionalSparseSafe(double sp1,
Types.OpOp2 op,
LiteralOp lit) |
static int |
getConstrainedNumThreads(int maxNumThreads) |
static Types.ExecMode |
getDefaultExecutionMode() |
static int |
getDefaultFrameSize() |
static org.apache.log4j.Level |
getDefaultLogLevel() |
static long |
getDefaultSize() |
static double |
getLeftIndexingSparsity(long rlen1,
long clen1,
long nnz1,
long rlen2,
long clen2,
long nnz2) |
static double |
getLocalMemBudget()
Returns memory budget (according to util factor) in bytes
|
static long |
getMatMultNnz(double sp1,
double sp2,
long m,
long k,
long n,
boolean worstcase) |
static double |
getMatMultSparsity(double sp1,
double sp2,
long m,
long k,
long n,
boolean worstcase)
Estimates the result sparsity for Matrix Multiplication A %*% B.
|
static long |
getNnz(long dim1,
long dim2,
double sp) |
static long |
getNumIterations(ForProgramBlock fpb,
LocalVariableMap vars,
long defaultValue) |
static long |
getNumIterations(ForProgramBlock fpb,
long defaultValue) |
static int |
getNumMappers() |
static int |
getNumReducers(boolean configOnly)
Returns the number of reducers that potentially run in parallel.
|
static OptimizerUtils.OptimizationLevel |
getOptLevel() |
static long |
getOuterNonZeros(long n1,
long n2,
long nnz1,
long nnz2,
Types.OpOp2 op) |
static int |
getParallelBinaryReadParallelism() |
static int |
getParallelBinaryWriteParallelism() |
static int |
getParallelTextReadParallelism()
Returns the degree of parallelism used for parallel text read.
|
static int |
getParallelTextWriteParallelism()
Returns the degree of parallelism used for parallel text write.
|
static double |
getSparsity(DataCharacteristics dc) |
static double |
getSparsity(long[] dims,
long nnz) |
static double |
getSparsity(long dim1,
long dim2,
long nnz) |
static double |
getTotalMemEstimate(Hop[] in,
Hop out) |
static double |
getTotalMemEstimate(Hop[] in,
Hop out,
boolean denseOut) |
static String |
getUniqueTempFileName()
Wrapper over internal filename construction for external usage.
|
static boolean |
isBinaryOpConditionalSparseSafe(Types.OpOp2 op)
Determines if a given binary op is potentially conditional sparse safe.
|
static boolean |
isBinaryOpConditionalSparseSafeExact(Types.OpOp2 op,
LiteralOp lit)
Determines if a given binary op with a scalar literal guarantees an output
sparsity which is exactly the same as its matrix input sparsity.
|
static boolean |
isBinaryOpSparsityConditionalSparseSafe(Types.OpOp2 op,
LiteralOp lit) |
static boolean |
isHybridExecutionMode() |
static boolean |
isIndexingRangeBlockAligned(IndexRange ixrange,
DataCharacteristics mc)
Indicates if the given indexing range is block aligned, i.e., it does not require
global aggregation of blocks.
|
static boolean |
isIndexingRangeBlockAligned(long rl,
long ru,
long cl,
long cu,
long blen)
Indicates if the given indexing range is block aligned, i.e., it does not require
global aggregation of blocks.
|
static boolean |
isMaxLocalParallelism(int k) |
static boolean |
isMemoryBasedOptLevel() |
static boolean |
isOptLevel(OptimizerUtils.OptimizationLevel level) |
static boolean |
isSparkExecutionMode() |
static boolean |
isTopLevelParFor() |
static boolean |
isValidCPDimensions(DataCharacteristics mc) |
static boolean |
isValidCPDimensions(long rows,
long cols)
Returns false if dimensions are known to be invalid; otherwise true
|
static boolean |
isValidCPDimensions(Types.ValueType[] schema,
String[] names)
Returns false if schema and names are not properly specified; otherwise true.
Both must have length > 0, and their lengths must be equal.
|
static boolean |
isValidCPMatrixSize(long rows,
long cols,
double sparsity)
Determines if valid matrix size to be represented in CP data structures.
|
static void |
resetDefaultSize() |
static void |
resetStaticCompilerFlags() |
static double |
rEvalSimpleDoubleExpression(Hop root,
HashMap<Long,Double> valMemo) |
static double |
rEvalSimpleDoubleExpression(Hop root,
HashMap<Long,Double> valMemo,
LocalVariableMap vars) |
static long |
rEvalSimpleLongExpression(Hop root,
HashMap<Long,Long> valMemo)
Function to evaluate simple size expressions over literals and nrow/ncol.
|
static long |
rEvalSimpleLongExpression(Hop root,
HashMap<Long,Long> valMemo,
LocalVariableMap vars) |
static String |
toMB(double inB) |
public static double MEM_UTIL_FACTOR
public static final int DEFAULT_BLOCKSIZE
public static final int DEFAULT_FRAME_BLOCKSIZE
public static final OptimizerUtils.OptimizationLevel DEFAULT_OPTLEVEL
public static double DEFAULT_SIZE
public static final long DOUBLE_SIZE
public static final long INT_SIZE
public static final long CHAR_SIZE
public static final long BOOLEAN_SIZE
public static final double INVALID_SIZE
public static final long MAX_NUMCELLS_CP_DENSE
public static final long MAX_NNZ_CP_SPARSE
public static final long SAFE_REP_CHANGE_THRES
public static boolean ALLOW_COMMON_SUBEXPRESSION_ELIMINATION
public static boolean ALLOW_CONSTANT_FOLDING
public static boolean ALLOW_ALGEBRAIC_SIMPLIFICATION
public static boolean ALLOW_OPERATOR_FUSION
public static boolean ALLOW_BRANCH_REMOVAL
public static boolean ALLOW_AUTO_VECTORIZATION
public static boolean ALLOW_SIZE_EXPRESSION_EVALUATION
public static boolean ALLOW_WORSTCASE_SIZE_EXPRESSION_EVALUATION
public static boolean ALLOW_RAND_JOB_RECOMPILE
public static boolean ALLOW_RUNTIME_PIGGYBACKING
public static boolean ALLOW_INTER_PROCEDURAL_ANALYSIS
public static int IPA_NUM_REPETITIONS
public static boolean ALLOW_SUM_PRODUCT_REWRITES
public static boolean ALLOW_SPLIT_HOP_DAGS
public static boolean ALLOW_LOOP_UPDATE_IN_PLACE
public static boolean ALLOW_CODE_MOTION
public static final double PARALLEL_CP_READ_PARALLELISM_MULTIPLIER
public static final double PARALLEL_CP_WRITE_PARALLELISM_MULTIPLIER
public static final boolean ALLOW_COMBINE_FILE_INPUT_FORMAT
public static OptimizerUtils.OptimizationLevel getOptLevel()
public static boolean isMemoryBasedOptLevel()
public static boolean isOptLevel(OptimizerUtils.OptimizationLevel level)
public static CompilerConfig constructCompilerConfig(DMLConfig dmlconf)
public static CompilerConfig constructCompilerConfig(CompilerConfig cconf, DMLConfig dmlconf)
public static void resetStaticCompilerFlags()
public static long getDefaultSize()
public static void resetDefaultSize()
public static int getDefaultFrameSize()
public static double getLocalMemBudget()
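getLocalMemBudget() is summarized above as returning the memory budget in bytes according to the utilization factor, and MEM_UTIL_FACTOR as the factor used when deciding between CP and MR. The idea can be sketched as follows, assuming the budget is the JVM's maximum heap scaled by that factor. The class name, the helper names, and the 0.7 value are illustrative assumptions, not the library's implementation.

```java
final class LocalMemBudgetSketch {
    // Hypothetical utilization factor; the actual value of MEM_UTIL_FACTOR is not stated on this page.
    static final double UTIL_FACTOR = 0.7;

    /** Budget in bytes: the maximum heap size scaled by a utilization factor in [0, 1]. */
    static double localMemBudget(long maxHeapBytes, double utilFactor) {
        if (maxHeapBytes < 0 || utilFactor < 0 || utilFactor > 1) {
            throw new IllegalArgumentException("invalid heap size or utilization factor");
        }
        return maxHeapBytes * utilFactor;
    }

    /** An operation is a candidate for local (CP) execution if its estimate fits the budget. */
    static boolean fitsLocally(double memEstimateBytes, double budgetBytes) {
        return memEstimateBytes <= budgetBytes;
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() reports the maximum heap the JVM will attempt to use.
        double budget = localMemBudget(Runtime.getRuntime().maxMemory(), UTIL_FACTOR);
        System.out.println("local budget (bytes): " + budget);
        System.out.println("1 MB operation fits: " + fitsLocally(1024.0 * 1024.0, budget));
    }
}
```

Keeping the factor below 1 leaves headroom for intermediates and JVM overhead that the per-operation estimate does not capture.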
public static boolean isMaxLocalParallelism(int k)
public static boolean isTopLevelParFor()
public static boolean checkSparkBroadcastMemoryBudget(double size)
public static boolean checkSparkBroadcastMemoryBudget(long rlen, long clen, long blen, long nnz)
public static boolean checkSparkCollectMemoryBudget(DataCharacteristics dc, long memPinned)
public static boolean checkSparkCollectMemoryBudget(DataCharacteristics dc, long memPinned, boolean checkBP)
public static boolean checkSparseBlockCSRConversion(DataCharacteristics dcIn)
public static int getNumReducers(boolean configOnly)
configOnly - true if configured value
public static int getNumMappers()
public static Types.ExecMode getDefaultExecutionMode()
public static boolean isSparkExecutionMode()
public static boolean isHybridExecutionMode()
public static int getParallelTextReadParallelism()
public static int getParallelBinaryReadParallelism()
public static int getParallelTextWriteParallelism()
public static int getParallelBinaryWriteParallelism()
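PARALLEL_CP_READ_PARALLELISM_MULTIPLIER is described as a multiplier for computing the degree of parallelism of parallel text read/write out of the available degree of parallelism. A minimal sketch of such a computation, assuming the result is rounded and never drops below one thread; the class and method names are hypothetical, and the rounding rule is an assumption:

```java
final class ReadParallelismSketch {
    /**
     * Scales the available thread count by a multiplier and rounds to the
     * nearest integer, with a floor of one thread.
     */
    static int degreeOfParallelism(int availableThreads, double multiplier) {
        if (availableThreads < 1 || multiplier <= 0) {
            throw new IllegalArgumentException("threads must be >= 1 and multiplier > 0");
        }
        return (int) Math.max(1L, Math.round(availableThreads * multiplier));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("read parallelism: " + degreeOfParallelism(cores, 1.0));
    }
}
```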
public static long estimateSize(DataCharacteristics dc)
public static long estimateSizeExactSparsity(DataCharacteristics dc)
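The estimateSizeExactSparsity overloads estimate the in-memory footprint in bytes from the dimensions and either the sparsity or the number of non-zeros. The sketch below illustrates the general idea with a deliberately simplified cost model: a dense layout costs 8 bytes per cell, a CSR-like sparse layout costs 12 bytes per non-zero plus one 4-byte row pointer per row, and the smaller of the two is returned. The model, constants, and names are assumptions for illustration and ignore object headers and the library's actual block formats.

```java
final class SizeEstimateSketch {
    static final long DOUBLE_BYTES = 8; // assumed size of one double value
    static final long INT_BYTES = 4;    // assumed size of one int index

    /** Dense layout: one double per cell. */
    static long denseBytes(long nrows, long ncols) {
        return Math.multiplyExact(Math.multiplyExact(nrows, ncols), DOUBLE_BYTES);
    }

    /** CSR-like layout: value + column index per non-zero, plus nrows + 1 row pointers. */
    static long sparseCsrBytes(long nrows, long nnz) {
        return nnz * (DOUBLE_BYTES + INT_BYTES) + (nrows + 1) * INT_BYTES;
    }

    /** Estimate from an exact number of non-zeros: the cheaper of the two layouts. */
    static long estimateFromNnz(long nrows, long ncols, long nnz) {
        if (nrows < 0 || ncols < 0 || nnz < 0) {
            throw new IllegalArgumentException("dimensions and nnz must be non-negative");
        }
        return Math.min(denseBytes(nrows, ncols), sparseCsrBytes(nrows, nnz));
    }

    /** Estimate from a sparsity in [0, 1] by converting it to a non-zero count first. */
    static long estimateFromSparsity(long nrows, long ncols, double sp) {
        long nnz = (long) Math.ceil(sp * nrows * ncols);
        return estimateFromNnz(nrows, ncols, nnz);
    }

    public static void main(String[] args) {
        System.out.println("dense 1000x1000:   " + estimateFromNnz(1000, 1000, 1_000_000));
        System.out.println("sparse 1000x1000:  " + estimateFromSparsity(1000, 1000, 0.5));
    }
}
```

Under this model the sparse layout wins only while fewer than roughly two thirds of the cells are non-zero, which is why an estimate based on exact sparsity can be much smaller than a worst-case dense estimate.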
public static long estimateSizeExactSparsity(long nrows, long ncols, long nnz)
nrows - number of rows
ncols - number of cols
nnz - number of non-zeros
public static long estimateSizeExactSparsity(long nrows, long ncols, double sp)
Use this method only when sp is guaranteed to give a worst-case estimate
(e.g., Rand with a fixed sparsity). In all other cases, estimateSize()
must be used so that worst-case estimates are computed whenever
applicable.
nrows - number of rows
ncols - number of cols
sp - sparsity
public static long estimatePartitionedSizeExactSparsity(DataCharacteristics dc)
dc - matrix characteristics
public static long estimatePartitionedSizeExactSparsity(long rlen, long clen, long blen, long nnz)
rlen - number of rows
clen - number of cols
blen - rows/cols per block
nnz - number of non-zeros
public static long estimatePartitionedSizeExactSparsity(long rlen, long clen, long blen, double sp)
rlen - number of rows
clen - number of cols
blen - rows/cols per block
sp - sparsity
public static long estimateSize(long nrows, long ncols)
nrows - number of rows
ncols - number of cols
public static long estimateSizeEmptyBlock(long nrows, long ncols)
public static long estimateSizeTextOutput(long rows, long cols, long nnz, Types.FileFormat fmt)
public static long estimateSizeTextOutput(int[] dims, long nnz, Types.FileFormat fmt)
public static boolean isIndexingRangeBlockAligned(IndexRange ixrange, DataCharacteristics mc)
ixrange - indexing range
mc - matrix characteristics
public static boolean isIndexingRangeBlockAligned(long rl, long ru, long cl, long cu, long blen)
rl - rows lower
ru - rows upper
cl - cols lower
cu - cols upper
blen - rows/cols per block
public static boolean isValidCPDimensions(DataCharacteristics mc)
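For isIndexingRangeBlockAligned(rl, ru, cl, cu, blen), the summary states that an aligned range does not require global aggregation of blocks. One way to read this is that every output block can be produced from a single input block. The sketch below encodes that reading under stated assumptions: indexes are 1-based and inclusive, non-positive or inverted bounds are treated as unknown, and the class and method names are hypothetical. It is not a copy of the library's logic.

```java
final class BlockAlignmentSketch {
    /**
     * Returns true if each output block of the indexed range maps to exactly one
     * input block: either both lower bounds start on a block boundary, or one
     * dimension starts on a boundary while the other stays within a single block.
     */
    static boolean isBlockAligned(long rl, long ru, long cl, long cu, long blen) {
        if (rl < 1 || ru < rl || cl < 1 || cu < cl || blen < 1) {
            return false; // unknown or invalid range (assumption)
        }
        boolean rowStartAligned = (rl - 1) % blen == 0;
        boolean colStartAligned = (cl - 1) % blen == 0;
        boolean singleBlockRow = (rl - 1) / blen == (ru - 1) / blen;
        boolean singleBlockCol = (cl - 1) / blen == (cu - 1) / blen;
        return (rowStartAligned && colStartAligned)
            || (singleBlockRow && colStartAligned)
            || (rowStartAligned && singleBlockCol);
    }

    public static void main(String[] args) {
        System.out.println(isBlockAligned(1001, 2500, 1, 10, 1000));
        System.out.println(isBlockAligned(2, 1001, 1, 1000, 1000));
    }
}
```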
public static boolean isValidCPDimensions(long rows, long cols)
rows - number of rows
cols - number of cols
public static boolean isValidCPDimensions(Types.ValueType[] schema, String[] names)
schema - the schema
names - the names
public static boolean isValidCPMatrixSize(long rows, long cols, double sparsity)
rows - number of rows
cols - number of cols
sparsity - the sparsity
public static boolean exceedsCachingThreshold(long dim2, double outMem)
dim2 - dimension 2
outMem - ?
public static String getUniqueTempFileName()
public static boolean allowsToFilterEmptyBlockOutputs(Hop hop)
public static int getConstrainedNumThreads(int maxNumThreads)
public static org.apache.log4j.Level getDefaultLogLevel()
public static long getMatMultNnz(double sp1, double sp2, long m, long k, long n, boolean worstcase)
public static double getMatMultSparsity(double sp1, double sp2, long m, long k, long n, boolean worstcase)
sp1 - sparsity of A
sp2 - sparsity of B
m - nrow(A)
k - ncol(A), nrow(B)
n - ncol(B)
worstcase - true if worst case
public static double getLeftIndexingSparsity(long rlen1, long clen1, long nnz1, long rlen2, long clen2, long nnz2)
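getMatMultSparsity estimates the result sparsity of A %*% B from the input sparsities, the common dimension k, and a worst-case flag. The sketch below shows one commonly used model for each mode, as an assumption rather than the library's formula. In the average case, non-zeros are assumed to be uniformly and independently distributed, so an output cell is zero only if all k products are zero. In the worst case, the bound multiplies the maximum fraction of non-empty rows of A by the maximum fraction of non-empty columns of B. Class and method names are hypothetical; m and n are omitted because they cancel out in this model.

```java
final class MatMultSparsitySketch {
    /**
     * Estimated output sparsity of A %*% B for input sparsities sp1, sp2 in [0, 1]
     * and common dimension k.
     */
    static double matMultSparsity(double sp1, double sp2, long k, boolean worstcase) {
        if (sp1 < 0 || sp1 > 1 || sp2 < 0 || sp2 > 1 || k < 1) {
            throw new IllegalArgumentException("sparsities must be in [0,1] and k >= 1");
        }
        if (worstcase) {
            // A row of A holds sp1 * k non-zeros on average, so at most
            // min(1, sp1 * k) of the rows are non-empty; likewise for columns of B.
            return Math.min(1.0, sp1 * k) * Math.min(1.0, sp2 * k);
        }
        // Average case: P(cell != 0) = 1 - P(all k products are zero).
        return 1.0 - Math.pow(1.0 - sp1 * sp2, k);
    }

    public static void main(String[] args) {
        System.out.println("average:    " + matMultSparsity(0.01, 0.01, 1000, false));
        System.out.println("worst case: " + matMultSparsity(0.01, 0.01, 1000, true));
    }
}
```

The worst-case value is always at least as large as the average-case value in this model, which is the property a conservative memory estimate needs.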
public static boolean isBinaryOpConditionalSparseSafe(Types.OpOp2 op)
op - the HOP OpOp2
public static boolean isBinaryOpConditionalSparseSafeExact(Types.OpOp2 op, LiteralOp lit)
op - the HOP OpOp2
lit - literal operator
public static boolean isBinaryOpSparsityConditionalSparseSafe(Types.OpOp2 op, LiteralOp lit)
public static double getBinaryOpSparsityConditionalSparseSafe(double sp1, Types.OpOp2 op, LiteralOp lit)
public static double getBinaryOpSparsity(double sp1, double sp2, Types.OpOp2 op, boolean worstcase)
sp1 - sparsity of A
sp2 - sparsity of B
op - binary operation
worstcase - true if worst case
public static long getOuterNonZeros(long n1, long n2, long nnz1, long nnz2, Types.OpOp2 op)
public static long getNnz(long dim1, long dim2, double sp)
public static double getSparsity(DataCharacteristics dc)
public static double getSparsity(long dim1, long dim2, long nnz)
public static double getSparsity(long[] dims, long nnz)
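getNnz and getSparsity convert between a non-zero count and a sparsity for given dimensions. A minimal sketch of that relationship, sparsity = nnz / (dim1 * dim2), is shown below. Treating unknown (non-positive) dimensions or a negative nnz as fully dense is an assumption made here for a conservative default, and the class and method names are hypothetical.

```java
final class SparsitySketch {
    /** Fraction of non-zero cells, capped at 1.0; unknown inputs are treated as dense. */
    static double sparsity(long dim1, long dim2, long nnz) {
        if (dim1 <= 0 || dim2 <= 0 || nnz < 0) {
            return 1.0; // assumption: unknown characteristics default to dense
        }
        // Divide in two steps to avoid overflowing dim1 * dim2 as a long.
        return Math.min(1.0, (double) nnz / dim1 / dim2);
    }

    /** Inverse direction: number of non-zeros implied by a sparsity in [0, 1]. */
    static long nnz(long dim1, long dim2, double sp) {
        return Math.round(sp * dim1 * dim2);
    }

    public static void main(String[] args) {
        System.out.println("sparsity: " + sparsity(1000, 1000, 50_000));
        System.out.println("nnz:      " + nnz(1000, 1000, 0.25));
    }
}
```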
public static String toMB(double inB)
public static long getNumIterations(ForProgramBlock fpb, long defaultValue)
public static long getNumIterations(ForProgramBlock fpb, LocalVariableMap vars, long defaultValue)
public static long rEvalSimpleLongExpression(Hop root, HashMap<Long,Long> valMemo)
root - the root high-level operator
valMemo - ?
public static long rEvalSimpleLongExpression(Hop root, HashMap<Long,Long> valMemo, LocalVariableMap vars)
public static double rEvalSimpleDoubleExpression(Hop root, HashMap<Long,Double> valMemo)
public static double rEvalSimpleDoubleExpression(Hop root, HashMap<Long,Double> valMemo, LocalVariableMap vars)
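rEvalSimpleLongExpression is summarized as evaluating simple size expressions over literals and nrow/ncol, with a HashMap keyed by Long passed alongside the root operator. The sketch below illustrates the general technique of recursive evaluation with memoization by node ID. The Node type, the operator encoding, and the Long.MAX_VALUE sentinel for "cannot be evaluated" are all hypothetical stand-ins and do not reflect the Hop API.

```java
import java.util.HashMap;

final class SizeExprSketch {
    static final long UNKNOWN = Long.MAX_VALUE; // hypothetical "not evaluable" sentinel

    /** Minimal expression node: 'L' = literal, '?' = unknown size, else a binary operator. */
    static final class Node {
        final long id; final char op; final long value; final Node left; final Node right;
        private Node(long id, char op, long value, Node left, Node right) {
            this.id = id; this.op = op; this.value = value; this.left = left; this.right = right;
        }
        static Node literal(long id, long v) { return new Node(id, 'L', v, null, null); }
        static Node unknown(long id) { return new Node(id, '?', 0, null, null); }
        static Node binary(long id, char op, Node l, Node r) { return new Node(id, op, 0, l, r); }
    }

    /** Evaluates the expression bottom-up, caching each node's result under its ID. */
    static long eval(Node root, HashMap<Long, Long> memo) {
        Long cached = memo.get(root.id);
        if (cached != null) return cached;
        long result;
        switch (root.op) {
            case 'L': result = root.value; break;
            case '?': result = UNKNOWN; break;
            default: {
                long l = eval(root.left, memo);
                long r = eval(root.right, memo);
                // An unknown operand makes the whole expression unknown.
                result = (l == UNKNOWN || r == UNKNOWN) ? UNKNOWN : apply(root.op, l, r);
            }
        }
        memo.put(root.id, result);
        return result;
    }

    static long apply(char op, long l, long r) {
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return r == 0 ? UNKNOWN : l / r;
            default:  return UNKNOWN; // unsupported operator
        }
    }

    public static void main(String[] args) {
        Node rows = Node.literal(1, 1000);                 // e.g. a known nrow(X)
        Node expr = Node.binary(3, '*', rows, Node.literal(2, 2));
        System.out.println(eval(expr, new HashMap<>()));
    }
}
```

The memo table matters when the same sub-expression is shared by several parents in a DAG: each node is then evaluated once rather than once per path.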
Copyright © 2020 The Apache Software Foundation. All rights reserved.