Class AColGroup
- java.lang.Object
  - org.apache.sysds.runtime.compress.colgroup.AColGroup
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
AColGroupCompressed, ColGroupUncompressed
public abstract class AColGroup extends Object implements Serializable
Abstract class that is the lowest class type in the compression framework. An AColGroup stores information about a number of columns.
See Also:
- Serialized Form
Nested Class Summary
- static class AColGroup.CompressionType: Public supertypes of the supported compression ColGroups.
Method Summary
- abstract AColGroup append(AColGroup g): Append the other column group to this column group.
- static AColGroup appendN(AColGroup[] groups): Append all column groups in the list provided together in one go, allocating the output once.
- abstract AColGroup binaryRowOpLeft(BinaryOperator op, double[] v, boolean isRowSafe): Perform a binary row operation.
- abstract AColGroup binaryRowOpRight(BinaryOperator op, double[] v, boolean isRowSafe): Perform a binary row operation.
- abstract CM_COV_Object centralMoment(CMOperator op, int nRows): Central Moment instruction executed on a column group.
- static double[] colSum(Collection<AColGroup> groups, double[] res, int nRows): Compute the column sum of the given list of groups.
- abstract void computeColSums(double[] c, int nRows): Compute the column sum.
- abstract boolean containsValue(double pattern): Detect if the column group contains a specific value.
- void decompressToDenseBlock(DenseBlock db, int rl, int ru): Decompress a range of rows into a dense block.
- abstract void decompressToDenseBlock(DenseBlock db, int rl, int ru, int offR, int offC): Decompress into the DenseBlock.
- void decompressToSparseBlock(SparseBlock sb, int rl, int ru): Decompress a range of rows into a sparse block. Note that this uses append, so the sparse column indexes need to be sorted afterwards.
- abstract void decompressToSparseBlock(SparseBlock sb, int rl, int ru, int offR, int offC): Decompress into the SparseBlock.
- long estimateInMemorySize(): Get an upper-bound estimate of the in-memory allocation for the column group.
- double get(int r, int c): Get the value at a global row/column position.
- IColIndex getColIndices(): Obtain the offsets of the columns in the matrix block that make up the group.
- abstract ICLAScheme getCompressionScheme(): Get the compression scheme for this column group to enable compression of other data.
- abstract AColGroup.CompressionType getCompType(): Obtain the compression type.
- abstract double getCost(ComputationCostEstimator e, int nRows): Get the computation cost associated with this column group.
- long getExactSizeOnDisk(): Returns the exact serialized size of the column group.
- abstract double getIdx(int r, int colIdx): Get the value at a colGroup-specific row/column index position.
- abstract double getMax(): Shorthand method for getting the maximum value contained in this column group.
- abstract double getMin(): Shorthand method for getting the minimum value contained in this column group.
- abstract long getNumberNonZeros(int nRows): Get the number of non-zeros contained in this column group.
- int getNumCols(): Obtain the number of columns in this column group.
- abstract int getNumValues(): Obtain the number of distinct tuples in the sets of values associated with this column group.
- abstract double getSum(int nRows): Shorthand method for getting the sum of this column group.
- abstract boolean isEmpty(): Get whether the group contains only zeros.
- abstract void leftMultByAColGroup(AColGroup lhs, MatrixBlock result, int nRows): Left-side matrix multiplication with a column group that is transposed.
- abstract void leftMultByMatrixNoPreAgg(MatrixBlock matrix, MatrixBlock result, int rl, int ru, int cl, int cu): Left multiply with this column group.
- abstract AColGroup replace(double pattern, double replace): Make a copy of the column group values, and replace all values that match pattern with the replacement value.
- abstract AColGroup rexpandCols(int max, boolean ignore, boolean cast, int nRows): Expand the column group to multiple columns.
- AColGroup rightMultByMatrix(MatrixBlock right): Right matrix multiplication with this column group.
- abstract AColGroup rightMultByMatrix(MatrixBlock right, IColIndex allCols): Right matrix multiplication with this column group.
- abstract AColGroup scalarOperation(ScalarOperator op): Perform the specified scalar operation directly on the compressed column group, without decompressing individual cells if possible.
- AColGroup shiftColIndices(int offset): Shift all column indexes contained by an offset.
- AColGroup sliceColumn(int col): Slice out a single column from the column group.
- AColGroup sliceColumns(int cl, int cu): Slice out the columns within the range [cl, cu), dropping the dictionary values of all other columns.
- abstract AColGroup sliceRows(int rl, int ru): Slice a range of rows out of the column group and return a new column group containing only the row segment.
- String toString()
- abstract void tsmm(MatrixBlock ret, int nRows): Do a transposed self matrix multiplication on the left side, t(x) %*% x.
- abstract void tsmmAColGroup(AColGroup other, MatrixBlock result): Matrix multiply with this other column group, but only output the upper-triangle values and multiply both ways.
- abstract void unaryAggregateOperations(AggregateUnaryOperator op, double[] c, int nRows, int rl, int ru): Unary aggregate operator; since aggregate operators require new object output, the output becomes an uncompressed matrix.
- abstract AColGroup unaryOperation(UnaryOperator op): Perform a unary operation on the column group and return a new column group.
Method Detail
-
getColIndices
public final IColIndex getColIndices()
Obtain the offsets of the columns in the matrix block that make up the group.
Returns:
- offsets of the columns in the matrix block that make up the group
-
getNumCols
public final int getNumCols()
Obtain the number of columns in this column group.
Returns:
- number of columns in this column group
-
shiftColIndices
public final AColGroup shiftColIndices(int offset)
Shift all column indexes contained by an offset. This is used for cbind to combine compressed matrices. Since column indexes are reused between operations, a new index list is allocated here to be safe.
Parameters:
- offset - The offset to move all columns
Returns:
- A new column group object with the shifted columns
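A minimal sketch of the shifting step, assuming a plain sorted `int[]` in place of the real `IColIndex` type; `ShiftColIndicesSketch` and its method are hypothetical names, not SystemDS API. It shows why a new array is allocated: an index list that may be shared with other groups is never mutated.

```java
import java.util.Arrays;

/** Hypothetical sketch: shift a group's column indexes without touching the original. */
final class ShiftColIndicesSketch {

    /** Returns a new array; the input may be shared with other groups, so it is not modified. */
    static int[] shiftColIndices(int[] colIndexes, int offset) {
        int[] shifted = new int[colIndexes.length];
        for (int i = 0; i < colIndexes.length; i++)
            shifted[i] = colIndexes[i] + offset;
        return shifted;
    }

    public static void main(String[] args) {
        int[] cols = {0, 2, 3};
        // Place the group after a matrix that already has 5 columns.
        int[] shifted = shiftColIndices(cols, 5);
        System.out.println(Arrays.toString(cols));    // [0, 2, 3]
        System.out.println(Arrays.toString(shifted)); // [5, 7, 8]
    }
}
```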
-
estimateInMemorySize
public long estimateInMemorySize()
Get an upper-bound estimate of the in-memory allocation for the column group.
Returns:
- an upper bound on the number of bytes used to store this ColGroup in memory.
-
decompressToSparseBlock
public final void decompressToSparseBlock(SparseBlock sb, int rl, int ru)
Decompress a range of rows into a sparse block. Note that this uses append, so the sparse column indexes need to be sorted afterwards.
Parameters:
- sb - Sparse target block
- rl - Row to start at
- ru - Row to end at
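A hypothetical sketch of why appending requires a sort afterwards: when several column groups decompress into the same sparse row, each one appends its own columns, so the row ends up ordered by group rather than by column. `SparseRow` is an illustrative stand-in, not the SystemDS `SparseBlock` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of append-then-sort decompression into one sparse row. */
final class SparseAppendSketch {

    /** A sparse row kept as parallel lists of column indexes and non-zero values. */
    static final class SparseRow {
        final List<Integer> cols = new ArrayList<>();
        final List<Double> vals = new ArrayList<>();

        /** Cheap append: no search and no shifting, so column order is not maintained. */
        void append(int col, double val) {
            if (val != 0) { // only non-zeros are stored
                cols.add(col);
                vals.add(val);
            }
        }

        /** Restores ascending column order after all groups have appended. */
        void sort() {
            Integer[] order = new Integer[cols.size()];
            for (int i = 0; i < order.length; i++)
                order[i] = i;
            Arrays.sort(order, (a, b) -> Integer.compare(cols.get(a), cols.get(b)));
            List<Integer> sortedCols = new ArrayList<>();
            List<Double> sortedVals = new ArrayList<>();
            for (int i : order) {
                sortedCols.add(cols.get(i));
                sortedVals.add(vals.get(i));
            }
            cols.clear();
            cols.addAll(sortedCols);
            vals.clear();
            vals.addAll(sortedVals);
        }
    }

    public static void main(String[] args) {
        SparseRow row = new SparseRow();
        row.append(0, 1.5); // group A owns columns {0, 2}
        row.append(2, 4.0);
        row.append(1, 7.0); // group B owns column {1}
        System.out.println(row.cols); // [0, 2, 1]
        row.sort();
        System.out.println(row.cols); // [0, 1, 2]
        System.out.println(row.vals); // [1.5, 7.0, 4.0]
    }
}
```

Sorting once at the end keeps each individual append O(1), which is the trade-off the note above describes.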
-
decompressToDenseBlock
public final void decompressToDenseBlock(DenseBlock db, int rl, int ru)
Decompress a range of rows into a dense block.
Parameters:
- db - Dense target block
- rl - Row to start at
- ru - Row to end at
-
getExactSizeOnDisk
public long getExactSizeOnDisk()
Returns the exact serialized size of the column group. This can be used, for example, for buffer preallocation.
Returns:
- exact serialized size for the column group
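To illustrate buffer preallocation, here is a sketch with a hypothetical toy group whose on-disk format (two length fields, the column indexes, then the dictionary values) is an assumption made for this example, not the SystemDS format. Because the size is exact, the output buffer is allocated once and never grows.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical toy group: column indexes plus a flat array of dictionary values. */
final class ExactSizeSketch {
    final int[] colIndexes;
    final double[] values;

    ExactSizeSketch(int[] colIndexes, double[] values) {
        this.colIndexes = colIndexes;
        this.values = values;
    }

    /** Exact number of bytes written by write(): two length ints, the indexes, the values. */
    long getExactSizeOnDisk() {
        return Integer.BYTES + (long) Integer.BYTES * colIndexes.length
            + Integer.BYTES + (long) Double.BYTES * values.length;
    }

    void write(DataOutputStream out) throws IOException {
        out.writeInt(colIndexes.length);
        for (int c : colIndexes)
            out.writeInt(c);
        out.writeInt(values.length);
        for (double v : values)
            out.writeDouble(v);
    }

    public static void main(String[] args) throws IOException {
        ExactSizeSketch g = new ExactSizeSketch(new int[] {0, 3}, new double[] {1.0, 2.5, -4.0});
        long size = g.getExactSizeOnDisk();
        // Preallocate the buffer once with the exact size, so it never needs to grow.
        ByteArrayOutputStream buf = new ByteArrayOutputStream((int) size);
        try (DataOutputStream out = new DataOutputStream(buf)) {
            g.write(out);
        }
        System.out.println(size + " " + buf.size()); // 40 40
    }
}
```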
-
sliceColumns
public final AColGroup sliceColumns(int cl, int cu)
Slice out the columns within the range [cl, cu), dropping the dictionary values of all columns outside this range. If the column group being sliced does not contain any columns within the range, null is returned.
Parameters:
- cl - The lower bound of the columns to select
- cu - The upper bound of the columns to select (not inclusive).
Returns:
- A cloned column group, with a copied pointer to the old column group's index structure, but a reduced dictionary and _columnIndexes correctly aligned with the expected sliced compressed matrix.
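A sketch on a hypothetical dictionary-encoded toy group (a dense dictionary of distinct tuples plus a per-row mapping), assuming that "correctly aligned" means the kept column indexes are shifted down by `cl`. Only the dictionary shrinks; the mapping array is shared by reference, mirroring the "copied pointer" to the index structure.

```java
import java.util.Arrays;

/** Hypothetical dictionary-encoded toy group used to sketch sliceColumns. */
final class SliceColumnsSketch {
    final int[] colIndexes; // sorted global column indexes
    final double[][] dict;  // dict[t][j]: value of column colIndexes[j] in distinct tuple t
    final int[] map;        // map[r]: tuple used by row r (the index structure)

    SliceColumnsSketch(int[] colIndexes, double[][] dict, int[] map) {
        this.colIndexes = colIndexes;
        this.dict = dict;
        this.map = map;
    }

    /** Keeps the columns in [cl, cu); returns null if the group has none of them. */
    SliceColumnsSketch sliceColumns(int cl, int cu) {
        int n = 0;
        for (int c : colIndexes)
            if (c >= cl && c < cu)
                n++;
        if (n == 0)
            return null;
        int[] newCols = new int[n];
        int[] keep = new int[n]; // positions of the kept columns inside this group
        for (int j = 0, k = 0; j < colIndexes.length; j++)
            if (colIndexes[j] >= cl && colIndexes[j] < cu) {
                newCols[k] = colIndexes[j] - cl; // assumption: re-align with the sliced matrix
                keep[k++] = j;
            }
        double[][] newDict = new double[dict.length][n];
        for (int t = 0; t < dict.length; t++)
            for (int k = 0; k < n; k++)
                newDict[t][k] = dict[t][keep[k]];
        return new SliceColumnsSketch(newCols, newDict, map); // mapping shared, not copied
    }

    public static void main(String[] args) {
        SliceColumnsSketch g = new SliceColumnsSketch(
            new int[] {1, 4, 6}, new double[][] {{1, 2, 3}, {4, 5, 6}}, new int[] {0, 1, 0});
        SliceColumnsSketch s = g.sliceColumns(2, 7);
        System.out.println(Arrays.toString(s.colIndexes)); // [2, 4]
        System.out.println(Arrays.deepToString(s.dict));   // [[2.0, 3.0], [5.0, 6.0]]
        System.out.println(g.sliceColumns(7, 9));          // null
    }
}
```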
-
sliceColumn
public final AColGroup sliceColumn(int col)
Slice out a single column from the column group.
Parameters:
- col - The column to slice; the column might not be inside the column group
Returns:
- A new column group that is a single column. If the requested column is not in this column group, null is returned.
-
colSum
public static double[] colSum(Collection<AColGroup> groups, double[] res, int nRows)
Compute the column sum of the given list of groups.
Parameters:
- groups - The groups to sum
- res - The result to put the values into
- nRows - The number of rows in the groups
Returns:
- The given res array, to which the column sums of the groups are added
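A sketch of how column sums can be computed from distinct tuples, using a hypothetical dictionary-encoded `Group` class (not the SystemDS type). Each tuple's values are multiplied by the number of rows that use it; the row count is implied by the mapping length, which plays the role of `nRows` here. The groups are assumed to cover disjoint columns.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: column sums of dictionary-encoded groups computed from tuple counts. */
final class ColSumSketch {

    /** Toy group: sorted column indexes, distinct tuples, and a per-row tuple mapping. */
    static final class Group {
        final int[] colIndexes;
        final double[][] dict;
        final int[] map;

        Group(int[] colIndexes, double[][] dict, int[] map) {
            this.colIndexes = colIndexes;
            this.dict = dict;
            this.map = map;
        }

        /** Adds this group's column sums into c, visiting each distinct tuple once. */
        void computeColSums(double[] c) {
            int[] counts = new int[dict.length];
            for (int t : map)
                counts[t]++; // how many rows use each tuple
            for (int t = 0; t < dict.length; t++)
                for (int j = 0; j < colIndexes.length; j++)
                    c[colIndexes[j]] += counts[t] * dict[t][j];
        }
    }

    /** Accumulates every group into res and returns res, like the static colSum. */
    static double[] colSum(List<Group> groups, double[] res) {
        for (Group g : groups)
            g.computeColSums(res);
        return res;
    }

    public static void main(String[] args) {
        Group a = new Group(new int[] {0, 2}, new double[][] {{1, 10}, {2, 20}}, new int[] {0, 1, 1});
        Group b = new Group(new int[] {1}, new double[][] {{3}}, new int[] {0, 0, 0});
        double[] res = colSum(Arrays.asList(a, b), new double[3]);
        System.out.println(Arrays.toString(res)); // [5.0, 9.0, 50.0]
    }
}
```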
-
get
public double get(int r, int c)
Get the value at a global row/column position. In general this is slow, since a binary search of the column indexes is performed for each lookup.
Parameters:
- r - row
- c - column
Returns:
- value at the row/column position
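A sketch contrasting the two lookups on a hypothetical toy group: `get` must first binary-search the sorted column indexes for the matrix column, while `getIdx` receives the group-local position directly. Returning 0 for a column outside the group is an assumption of this sketch, not documented behavior.

```java
import java.util.Arrays;

/** Hypothetical toy group contrasting global get(r, c) with group-local getIdx(r, colIdx). */
final class GetSketch {
    final int[] colIndexes; // sorted global column indexes
    final double[][] dict;  // distinct tuples
    final int[] map;        // per-row tuple index

    GetSketch(int[] colIndexes, double[][] dict, int[] map) {
        this.colIndexes = colIndexes;
        this.dict = dict;
        this.map = map;
    }

    /** colIdx is a position inside colIndexes, so no search is needed. */
    double getIdx(int r, int colIdx) {
        return dict[map[r]][colIdx];
    }

    /** c is a matrix column, so every call pays for a binary search over colIndexes. */
    double get(int r, int c) {
        int colIdx = Arrays.binarySearch(colIndexes, c);
        return colIdx < 0 ? 0.0 : getIdx(r, colIdx); // assumption: 0 for columns outside the group
    }

    public static void main(String[] args) {
        GetSketch g = new GetSketch(new int[] {2, 5}, new double[][] {{1, 2}, {3, 4}}, new int[] {1, 0});
        System.out.println(g.get(0, 5));    // 4.0
        System.out.println(g.getIdx(0, 1)); // 4.0
        System.out.println(g.get(1, 2));    // 1.0
        System.out.println(g.get(0, 3));    // 0.0
    }
}
```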
-
getIdx
public abstract double getIdx(int r, int colIdx)
Get the value at a colGroup-specific row/column index position.
Parameters:
- r - row
- colIdx - column index in the _colIndexes.
Returns:
- value at the row/column index position
-
getNumValues
public abstract int getNumValues()
Obtain the number of distinct tuples in the sets of values associated with this column group. If the column group is uncompressed, the number of rows is returned.
Returns:
- the number of distinct sets of values associated with the bitmaps in this column group
-
getCompType
public abstract AColGroup.CompressionType getCompType()
Obtain the compression type.
Returns:
- How the elements of the column group are compressed.
-
decompressToDenseBlock
public abstract void decompressToDenseBlock(DenseBlock db, int rl, int ru, int offR, int offC)
Decompress into the DenseBlock (no NNZ handling).
Parameters:
- db - Target DenseBlock
- rl - Row to start decompression from
- ru - Row to end decompression at
- offR - Row offset into the target to decompress
- offC - Column offset into the target to decompress
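A sketch of how the offsets might be applied, using a plain row-major `double[]` in place of `DenseBlock` and a hypothetical toy group. It assumes that `ru` is exclusive, that group row r lands in target row r + offR and group column c in target column c + offC, and that values are added into the target; none of this is confirmed by the text above. No non-zero count is maintained, matching the "no NNZ handling" note.

```java
import java.util.Arrays;

/** Hypothetical sketch of decompressing a row range into a row-major dense array. */
final class DecompressDenseSketch {

    /**
     * Adds rows [rl, ru) of a toy dictionary-encoded group into target, which has
     * nColsTarget columns; row r lands in target row r + offR, column c in c + offC.
     */
    static void decompressToDense(double[] target, int nColsTarget, int[] colIndexes,
            double[][] dict, int[] map, int rl, int ru, int offR, int offC) {
        for (int r = rl; r < ru; r++) {
            double[] tuple = dict[map[r]];
            int rowStart = (r + offR) * nColsTarget + offC;
            for (int j = 0; j < colIndexes.length; j++)
                target[rowStart + colIndexes[j]] += tuple[j];
        }
    }

    public static void main(String[] args) {
        double[] target = new double[2 * 4]; // 2 x 4 block
        // Rows 1 and 2 of the group go to target rows 0 and 1, shifted one column right.
        decompressToDense(target, 4, new int[] {0, 2}, new double[][] {{1, 2}, {3, 4}},
            new int[] {0, 1, 1}, 1, 3, -1, 1);
        System.out.println(Arrays.toString(target)); // [0.0, 3.0, 0.0, 4.0, 0.0, 3.0, 0.0, 4.0]
    }
}
```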
-
decompressToSparseBlock
public abstract void decompressToSparseBlock(SparseBlock sb, int rl, int ru, int offR, int offC)
Decompress into the SparseBlock (no NNZ handling). Note that this method is allowed to call append, since it is assumed that the sparse column indexes are sorted afterwards.
Parameters:
- sb - Target SparseBlock
- rl - Row to start decompression from
- ru - Row to end decompression at
- offR - Row offset into the target to decompress
- offC - Column offset into the target to decompress
-
rightMultByMatrix
public final AColGroup rightMultByMatrix(MatrixBlock right)
Right matrix multiplication with this column group. This method can return null, meaning that the output overlapping group would have been empty.
Parameters:
- right - The MatrixBlock on the right of this matrix multiplication
Returns:
- The new column group, or null, that is the result of the matrix multiplication.
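For a dictionary-encoded group, one way to realize right multiplication is to multiply only the distinct tuples with the rows of the right matrix selected by the group's columns, keeping the per-row mapping unchanged. The sketch below assumes dense 2D arrays and a hypothetical method name. Each result tuple spans all output columns, which is why other groups' results overlap with it and must be summed.

```java
import java.util.Arrays;

/** Hypothetical sketch: right multiplication only needs to transform the distinct tuples. */
final class RightMultSketch {

    /**
     * For each distinct tuple t, computes dict[t] times the rows of right selected by
     * colIndexes. The per-row mapping is unchanged, so it can be reused as-is.
     */
    static double[][] rightMultDictionary(int[] colIndexes, double[][] dict, double[][] right) {
        int m = right[0].length;
        double[][] out = new double[dict.length][m];
        for (int t = 0; t < dict.length; t++)
            for (int j = 0; j < colIndexes.length; j++) {
                double v = dict[t][j];
                if (v == 0)
                    continue; // zero entries contribute nothing
                double[] rightRow = right[colIndexes[j]];
                for (int k = 0; k < m; k++)
                    out[t][k] += v * rightRow[k];
            }
        return out;
    }

    public static void main(String[] args) {
        // The group covers columns 0 and 2 of a 3-column matrix; right is 3 x 2.
        double[][] right = {{1, 0}, {5, 5}, {2, 1}};
        double[][] out = rightMultDictionary(new int[] {0, 2}, new double[][] {{1, 2}, {0, 3}}, right);
        System.out.println(Arrays.deepToString(out)); // [[5.0, 2.0], [6.0, 3.0]]
    }
}
```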
-
rightMultByMatrix
public abstract AColGroup rightMultByMatrix(MatrixBlock right, IColIndex allCols)
Right matrix multiplication with this column group. This method can return null, meaning that the output overlapping group would have been empty.
Parameters:
- right - The MatrixBlock on the right of this matrix multiplication
- allCols - A pre-materialized list of all column indexes that can be shared across all column groups if useful; can be set to null.
Returns:
- The new column group, or null, that is the result of the matrix multiplication.
-
tsmm
public abstract void tsmm(MatrixBlock ret, int nRows)
Do a transposed self matrix multiplication on the left side, t(x) %*% x, but only with this column group. This gives better performance, since there is no need to iterate through all the rows of the matrix; instead, the execution can be limited to the number of distinct values. Note that it only calculates the upper triangle.
Parameters:
- ret - The return matrix block [numColumns x numColumns]
- nRows - The number of rows in the column group
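A sketch of the distinct-value idea on a hypothetical dictionary-encoded toy group: each tuple contributes count × tuple[i] × tuple[j] to cell (i, j), and only cells with j ≥ i are written. The tuple counts are computed inline to keep the example self-contained; the point is that the multiply-add work scales with the number of distinct tuples rather than the number of rows.

```java
import java.util.Arrays;

/** Hypothetical sketch: upper triangle of t(X) %*% X for one dictionary-encoded group. */
final class TsmmSketch {

    /** Adds the group's contribution to ret using tuple counts instead of individual rows. */
    static void tsmm(double[][] ret, int[] colIndexes, double[][] dict, int[] map) {
        int[] counts = new int[dict.length];
        for (int t : map)
            counts[t]++;
        for (int t = 0; t < dict.length; t++)
            for (int i = 0; i < colIndexes.length; i++)
                for (int j = i; j < colIndexes.length; j++) // j >= i: upper triangle only
                    ret[colIndexes[i]][colIndexes[j]] += counts[t] * dict[t][i] * dict[t][j];
    }

    public static void main(String[] args) {
        // X = [[1, 2], [1, 2], [3, 0]] encoded as two distinct tuples.
        double[][] ret = new double[2][2];
        tsmm(ret, new int[] {0, 1}, new double[][] {{1, 2}, {3, 0}}, new int[] {0, 0, 1});
        System.out.println(Arrays.deepToString(ret)); // [[11.0, 4.0], [0.0, 8.0]]
    }
}
```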
-
leftMultByMatrixNoPreAgg
public abstract void leftMultByMatrixNoPreAgg(MatrixBlock matrix, MatrixBlock result, int rl, int ru, int cl, int cu)
Left multiply with this column group.
Parameters:
- matrix - The matrix to multiply with on the left
- result - The result to output the values into; always dense, to allow the column groups to be processed in parallel
- rl - The row to begin the multiplication from on the lhs matrix
- ru - The row to end the multiplication at on the lhs matrix
- cl - The column to begin the multiplication from on the lhs matrix
- cu - The column to end the multiplication at on the lhs matrix
-
leftMultByAColGroup
public abstract void leftMultByAColGroup(AColGroup lhs, MatrixBlock result, int nRows)
Left-side matrix multiplication with a column group that is transposed.
Parameters:
- lhs - The left-hand-side column group to multiply with; the left-hand side should be considered transposed. It should also be guaranteed that this column group is not empty.
- result - The result matrix to insert the result of the multiplication into
- nRows - Number of rows in the lhs colGroup
-
tsmmAColGroup
public abstract void tsmmAColGroup(AColGroup other, MatrixBlock result)
Matrix multiply with this other column group, but:
1. Only output upper-triangle values.
2. Multiply both ways, with "this" being on the left and on the right.
It should be guaranteed that the input is not the same as the caller of the method. The second step is achieved by transposing the initial multiplied matrix and adding its values to the correct locations in the output.
Parameters:
- other - The other column group to multiply with
- result - The result matrix to put the results into
-
scalarOperation
public abstract AColGroup scalarOperation(ScalarOperator op)
Perform the specified scalar operation directly on the compressed column group, without decompressing individual cells if possible.
Parameters:
- op - operation to perform
Returns:
- version of this column group with the operation applied
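A sketch of the "without decompressing individual cells" idea, using `java.util.function.DoubleUnaryOperator` as a stand-in for `ScalarOperator` and a hypothetical method name. It assumes a layout in which every row maps to a dictionary tuple; a group that keeps zero rows implicit would additionally need to handle operations that change zero (such as `x + 1`).

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: a scalar operation applied to the distinct values only. */
final class ScalarOpSketch {

    /** Returns a new dictionary; the per-row mapping of the group can be reused unchanged. */
    static double[][] scalarOperation(double[][] dict, DoubleUnaryOperator op) {
        double[][] out = new double[dict.length][];
        for (int t = 0; t < dict.length; t++) {
            out[t] = new double[dict[t].length];
            for (int j = 0; j < dict[t].length; j++)
                out[t][j] = op.applyAsDouble(dict[t][j]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] dict = {{1, 2}, {3, 4}};
        // x * 10 + 1 touches 4 values, however many rows map to them.
        System.out.println(Arrays.deepToString(scalarOperation(dict, x -> x * 10 + 1)));
        // [[11.0, 21.0], [31.0, 41.0]]
    }
}
```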
-
binaryRowOpLeft
public abstract AColGroup binaryRowOpLeft(BinaryOperator op, double[] v, boolean isRowSafe)
Perform a binary row operation.
Parameters:
- op - The operation to execute
- v - The vector of values to apply; should be the same length as the dictionary.
- isRowSafe - True if applying the binary op to an entire zero row yields all zeros
Returns:
- An updated column group with the new values.
-
binaryRowOpRight
public abstract AColGroup binaryRowOpRight(BinaryOperator op, double[] v, boolean isRowSafe)
Perform a binary row operation.
Parameters:
- op - The operation to execute
- v - The vector of values to apply; should be the same length as the dictionary.
- isRowSafe - True if applying the binary op to an entire zero row yields all zeros
Returns:
- An updated column group with the new values.
-
unaryAggregateOperations
public abstract void unaryAggregateOperations(AggregateUnaryOperator op, double[] c, int nRows, int rl, int ru)
Unary aggregate operator. Since aggregate operators require new object output, the output becomes an uncompressed matrix. The range of rl to ru only applies to row aggregates (ReduceCol).
Parameters:
- op - The operator used
- c - The output matrix block
- nRows - The total number of rows in the column group
- rl - The starting row to do aggregation from
- ru - The last row to do aggregation to (not included)
-
sliceRows
public abstract AColGroup sliceRows(int rl, int ru)
Slice a range of rows out of the column group and return a new column group containing only the row segment. Note that this slice should maintain pointers back to the original dictionaries and only modify index structures.
Parameters:
- rl - The row to start at
- ru - The row to end at (not included)
Returns:
- A new column group containing the specified row range.
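A sketch of the "keep the dictionary pointer, slice only the index structure" contract on a hypothetical toy group; the class name and layout are illustrative assumptions. The slice copies the row range [rl, ru) of the mapping and shares the dictionary reference.

```java
import java.util.Arrays;

/** Hypothetical toy group sketching sliceRows on a dictionary-encoded layout. */
final class SliceRowsSketch {
    final double[][] dict; // shared between the original and the slice
    final int[] map;       // per-row tuple index: the index structure

    SliceRowsSketch(double[][] dict, int[] map) {
        this.dict = dict;
        this.map = map;
    }

    /** Rows [rl, ru): only the mapping is copied; the dictionary reference is kept. */
    SliceRowsSketch sliceRows(int rl, int ru) {
        return new SliceRowsSketch(dict, Arrays.copyOfRange(map, rl, ru));
    }

    public static void main(String[] args) {
        SliceRowsSketch g = new SliceRowsSketch(new double[][] {{1}, {2}}, new int[] {0, 1, 1, 0});
        SliceRowsSketch s = g.sliceRows(1, 3);
        System.out.println(Arrays.toString(s.map)); // [1, 1]
        System.out.println(s.dict == g.dict);       // true
    }
}
```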
-
getMin
public abstract double getMin()
Shorthand method for getting the minimum value contained in this column group.
Returns:
- The minimum value contained in this column group
-
getMax
public abstract double getMax()
Shorthand method for getting the maximum value contained in this column group.
Returns:
- The maximum value contained in this column group
-
getSum
public abstract double getSum(int nRows)
Shorthand method for getting the sum of this column group.
Parameters:
- nRows - The number of rows in the column group
Returns:
- The sum of this column group
-
containsValue
public abstract boolean containsValue(double pattern)
Detect if the column group contains a specific value.
Parameters:
- pattern - The value to look for.
Returns:
- true if the value is contained.
-
getNumberNonZeros
public abstract long getNumberNonZeros(int nRows)
Get the number of non-zeros contained in this column group.
Parameters:
- nRows - The number of rows in the column group; this is used for groups that do not contain information about how many rows they have.
Returns:
- The number of non-zeros.
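A sketch of counting non-zeros from distinct tuples on a hypothetical dictionary-encoded toy layout: each tuple's non-zero count is multiplied by the number of rows that use it. Here the mapping length supplies the row count that `nRows` provides in the real method.

```java
/** Hypothetical sketch: counting non-zeros from tuple counts instead of scanning cells. */
final class NnzSketch {

    static long getNumberNonZeros(double[][] dict, int[] map) {
        long[] counts = new long[dict.length];
        for (int t : map)
            counts[t]++; // rows using each distinct tuple
        long nnz = 0;
        for (int t = 0; t < dict.length; t++) {
            int tupleNnz = 0;
            for (double v : dict[t])
                if (v != 0)
                    tupleNnz++;
            nnz += counts[t] * tupleNnz;
        }
        return nnz;
    }

    public static void main(String[] args) {
        double[][] dict = {{0, 5}, {1, 2}, {0, 0}};
        int[] map = {0, 1, 1, 2, 0};
        System.out.println(getNumberNonZeros(dict, map)); // 6
    }
}
```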
-
replace
public abstract AColGroup replace(double pattern, double replace)
Make a copy of the column group values, and replace all values that match pattern with the replacement value.
Parameters:
- pattern - The value to look for
- replace - The value to replace the other value with
Returns:
- A new column group, reusing the index structure but with new values.
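A sketch of "copy the values, reuse the index structure" on a hypothetical dictionary-encoded toy layout in which every row maps to a tuple (groups with implicit zeros would need extra handling when the pattern is 0). Treating a NaN pattern as matching NaN values is an assumption of this sketch, needed because `NaN == NaN` is false in Java.

```java
import java.util.Arrays;

/** Hypothetical sketch: replace works on a copy of the distinct values only. */
final class ReplaceSketch {

    /** Copies the dictionary, replacing every value that matches pattern. */
    static double[][] replace(double[][] dict, double pattern, double replacement) {
        double[][] out = new double[dict.length][];
        for (int t = 0; t < dict.length; t++) {
            out[t] = dict[t].clone(); // never modify the original values
            for (int j = 0; j < out[t].length; j++)
                if (matches(out[t][j], pattern))
                    out[t][j] = replacement;
        }
        return out;
    }

    /** NaN is never == to itself, so a NaN pattern needs an explicit check. */
    static boolean matches(double v, double pattern) {
        return Double.isNaN(pattern) ? Double.isNaN(v) : v == pattern;
    }

    public static void main(String[] args) {
        double[][] dict = {{1, Double.NaN}, {2, 1}};
        System.out.println(Arrays.deepToString(replace(dict, 1, 9)));          // [[9.0, NaN], [2.0, 9.0]]
        System.out.println(Arrays.deepToString(replace(dict, Double.NaN, 0))); // [[1.0, 0.0], [2.0, 1.0]]
        System.out.println(Arrays.deepToString(dict));                         // [[1.0, NaN], [2.0, 1.0]]
    }
}
```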
-
computeColSums
public abstract void computeColSums(double[] c, int nRows)
Compute the column sum.
Parameters:
- c - The array to add the column sum to.
- nRows - The number of rows in the column group.
-
centralMoment
public abstract CM_COV_Object centralMoment(CMOperator op, int nRows)
Central Moment instruction executed on a column group.
Parameters:
- op - The operator to use.
- nRows - The number of rows contained in the column group.
Returns:
- A Central Moment object.
-
rexpandCols
public abstract AColGroup rexpandCols(int max, boolean ignore, boolean cast, int nRows)
Expand the column group to multiple columns (one-hot encode the column group).
Parameters:
- max - The number of columns to expand to and cut off values at.
- ignore - If zero and negative values should be ignored.
- cast - If the contained double values should be cast to whole numbers.
- nRows - The number of rows in the column group.
Returns:
- A new column group containing max number of columns.
-
getCost
public abstract double getCost(ComputationCostEstimator e, int nRows)
Get the computation cost associated with this column group.
Parameters:
- e - The computation cost estimator
- nRows - The number of rows in the column group
Returns:
- The cost of this column group
-
unaryOperation
public abstract AColGroup unaryOperation(UnaryOperator op)
Perform a unary operation on the column group and return a new column group.
Parameters:
- op - The operation to perform
Returns:
- The new column group
-
isEmpty
public abstract boolean isEmpty()
Get whether the group contains only zeros.
Returns:
- true if empty
-
append
public abstract AColGroup append(AColGroup g)
Append the other column group to this column group. This method tries to combine them into a new column group containing both. In some cases this is possible in reasonable time; in others it is not. The result contains this column group's rows first, followed by the other column group's rows at higher row indexes. If combining is not possible or very inefficient, null is returned.
Parameters:
- g - The other column group
Returns:
- A combined column group or null
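A sketch of the simplest appendable case on a hypothetical dictionary-encoded toy group: when both groups cover the same columns with identical dictionaries, the row mappings can just be concatenated, with the other group's rows following this group's rows. Every other case returns null here; the real implementations may handle more cases (for example by merging dictionaries), which this sketch does not attempt.

```java
import java.util.Arrays;

/** Hypothetical toy group sketching append for the simplest case. */
final class AppendSketch {
    final int[] colIndexes;
    final double[][] dict;
    final int[] map;

    AppendSketch(int[] colIndexes, double[][] dict, int[] map) {
        this.colIndexes = colIndexes;
        this.dict = dict;
        this.map = map;
    }

    /** Rows of other follow the rows of this group; null when not cheaply combinable. */
    AppendSketch append(AppendSketch other) {
        if (!Arrays.equals(colIndexes, other.colIndexes) || !Arrays.deepEquals(dict, other.dict))
            return null; // this sketch only handles identical columns and dictionaries
        int[] combined = Arrays.copyOf(map, map.length + other.map.length);
        System.arraycopy(other.map, 0, combined, map.length, other.map.length);
        return new AppendSketch(colIndexes, dict, combined);
    }

    public static void main(String[] args) {
        double[][] dict = {{1}, {2}};
        AppendSketch a = new AppendSketch(new int[] {0}, dict, new int[] {0, 1});
        AppendSketch b = new AppendSketch(new int[] {0}, dict, new int[] {1, 1, 0});
        AppendSketch c = new AppendSketch(new int[] {0}, new double[][] {{3}}, new int[] {0});
        System.out.println(Arrays.toString(a.append(b).map)); // [0, 1, 1, 1, 0]
        System.out.println(a.append(c));                      // null
    }
}
```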
-
appendN
public static AColGroup appendN(AColGroup[] groups)
Append all column groups in the list provided together in one go, allocating the output once. If this is not possible or very inefficient, null is returned.
Parameters:
- groups - The groups to combine.
Returns:
- A combined column group or null
-
getCompressionScheme
public abstract ICLAScheme getCompressionScheme()
Get the compression scheme for this column group to enable compression of other data.
Returns:
- The compression scheme of this column group