Class ColGroupDDCFOR
- java.lang.Object
- org.apache.sysds.runtime.compress.colgroup.AColGroup
- org.apache.sysds.runtime.compress.colgroup.AColGroupCompressed
- org.apache.sysds.runtime.compress.colgroup.ADictBasedColGroup
- org.apache.sysds.runtime.compress.colgroup.AColGroupValue
- org.apache.sysds.runtime.compress.colgroup.AMorphingMMColGroup
- org.apache.sysds.runtime.compress.colgroup.ColGroupDDCFOR
- All Implemented Interfaces:
Serializable, IContainADictionary, IContainDefaultTuple, IFrameOfReferenceGroup
public class ColGroupDDCFOR extends AMorphingMMColGroup implements IFrameOfReferenceGroup
Class to encapsulate information about a column group that is encoded with dense dictionary encoding (DDC).
- See Also:
- Serialized Form
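The summary above only mentions DDC, but the class name, the `IFrameOfReferenceGroup` interface, and the `reference` argument of `create` suggest that a per-column reference value is stored next to the dictionary. The following is a minimal sketch of that idea for a single column, assuming the reference is the column minimum; `DdcForEncodeSketch` and its fields are hypothetical and are not part of SystemDS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-column DDC + frame-of-reference encoder (not the SystemDS implementation). */
final class DdcForEncodeSketch {
    final double reference; // value subtracted from every cell (assumption: the column minimum)
    final double[] dict;    // distinct offsets from the reference, in order of first appearance
    final int[] map;        // one dictionary index per row

    DdcForEncodeSketch(double[] column) {
        if (column.length == 0)
            throw new IllegalArgumentException("column must not be empty");
        double min = column[0];
        for (double v : column)
            min = Math.min(min, v);
        reference = min;

        Map<Double, Integer> ids = new HashMap<>();
        List<Double> offsets = new ArrayList<>();
        map = new int[column.length];
        for (int r = 0; r < column.length; r++) {
            double offset = column[r] - reference;
            Integer id = ids.get(offset);
            if (id == null) { // first time this offset is seen: add a dictionary entry
                id = offsets.size();
                ids.put(offset, id);
                offsets.add(offset);
            }
            map[r] = id;
        }
        dict = new double[offsets.size()];
        for (int i = 0; i < dict.length; i++)
            dict[i] = offsets.get(i);
    }

    /** Reconstructs the original cell: dictionary offset plus reference. */
    double get(int r) {
        return dict[map[r]] + reference;
    }
}
```

For the column `{100, 102, 100, 105}` this sketch stores the reference `100`, the dictionary `{0, 2, 5}`, and the map `{0, 1, 0, 2}`.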
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.sysds.runtime.compress.colgroup.AColGroup
AColGroup.CompressionType
-
-
Method Summary
Modifier and Type | Method | Description
AColGroup | append(AColGroup g) | Append the other column group to this column group.
AColGroup | appendNInternal(AColGroup[] g, int blen, int rlen) |
AColGroup | binaryRowOpLeft(BinaryOperator op, double[] v, boolean isRowSafe) | Perform a binary row operation.
AColGroup | binaryRowOpRight(BinaryOperator op, double[] v, boolean isRowSafe) | Perform a binary row operation.
CM_COV_Object | centralMoment(CMOperator op, int nRows) | Central Moment instruction executed on a column group.
AColGroup | combineWithSameIndex(int nRow, int nCol, List<AColGroup> right) | C bind the list of column groups with this column group.
AColGroup | combineWithSameIndex(int nRow, int nCol, AColGroup right) | C bind the given column group to this.
void | computeColSums(double[] c, int nRows) | Compute the column sum.
boolean | containsValue(double pattern) | Detect if the column group contains a specific value.
static AColGroup | create(IColIndex colIndexes, IDictionary dict, AMapToData data, int[] cachedCounts, double[] reference) |
long | estimateInMemorySize() | Get the upper bound estimate of in memory allocation for the column group.
AColGroup | extractCommon(double[] constV) | Extract common value from group and return non morphing group.
org.apache.sysds.runtime.compress.colgroup.AColGroup.ColGroupType | getColGroupType() |
double[] | getCommon() | Get common vector, note this should not materialize anything but simply point to things that are already allocated.
CompressedSizeInfoColGroup | getCompressionInfo(int nRow) | Get the compression info for this column group.
ICLAScheme | getCompressionScheme() | Get the compression scheme for this column group to enable compression of other data.
AColGroup.CompressionType | getCompType() | Obtain the compression type.
double | getCost(ComputationCostEstimator e, int nRows) | Get the computation cost associated with this column group.
int[] | getCounts(int[] counts) |
double[] | getDefaultTuple() |
IEncode | getEncoding() | Get encoding of this column group.
long | getExactSizeOnDisk() | Returns the exact serialized size of column group.
double | getIdx(int r, int colIdx) | Get the value at a colGroup specific row/column index position.
long | getNumberNonZeros(int nRows) | Get the number of nonZeros contained in this column group.
static ColGroupDDCFOR | read(DataInput in) |
AColGroup | recompress() | Recompress this column group into a new column group.
AColGroup | replace(double pattern, double replace) | Make a copy of the column group values, and replace all values that match pattern with replacement value.
AColGroup | rexpandCols(int max, boolean ignore, boolean cast, int nRows) | Expand the column group to multiple columns.
boolean | sameIndexStructure(AColGroupCompressed that) |
AColGroup | scalarOperation(ScalarOperator op) | Perform the specified scalar operation directly on the compressed column group, without decompressing individual cells if possible.
AColGroup | sliceRows(int rl, int ru) | Slice range of rows out of the column group and return a new column group only containing the row segment.
static AColGroup | sparsifyFOR(ColGroupDDC g) |
AColGroup[] | splitReshape(int multiplier, int nRow, int nColOrg) | This method returns a list of column groups that are naive splits of this column group as if it is reshaped.
String | toString() |
AColGroup | unaryOperation(UnaryOperator op) | Perform unary operation on the column group and return a new column group.
void | write(DataOutput out) |
-
Methods inherited from class org.apache.sysds.runtime.compress.colgroup.AMorphingMMColGroup
decompressToSparseBlockTransposed, leftMultByAColGroup, leftMultByMatrixNoPreAgg, tsmmAColGroup
-
Methods inherited from class org.apache.sysds.runtime.compress.colgroup.AColGroupValue
clear, getCounts, getNumValues
-
Methods inherited from class org.apache.sysds.runtime.compress.colgroup.ADictBasedColGroup
copyAndSet, copyAndSet, decompressToDenseBlock, decompressToDenseBlockTransposed, decompressToSparseBlock, getDictionary, getSparsity, reduceCols, rightMultByMatrix
-
Methods inherited from class org.apache.sysds.runtime.compress.colgroup.AColGroupCompressed
getMax, getMin, getSum, isEmpty, preAggRows, sameIndexStructure, tsmm, unaryAggregateOperations, unaryAggregateOperations
-
Methods inherited from class org.apache.sysds.runtime.compress.colgroup.AColGroup
addVector, appendN, colSum, combine, decompressToDenseBlock, decompressToSparseBlock, get, getColIndices, getNumCols, morph, rightDecompressingMult, rightMultByMatrix, selectionMultiply, shiftColIndices, sliceColumn, sliceColumns, sortColumnIndexes, splitReshapePushDown
-
-
-
-
Method Detail
-
create
public static AColGroup create(IColIndex colIndexes, IDictionary dict, AMapToData data, int[] cachedCounts, double[] reference)
-
sparsifyFOR
public static AColGroup sparsifyFOR(ColGroupDDC g)
-
getCompType
public AColGroup.CompressionType getCompType()
Description copied from class: AColGroup
Obtain the compression type.
- Specified by:
getCompType in class AColGroup
- Returns:
- How the elements of the column group are compressed.
-
getDefaultTuple
public double[] getDefaultTuple()
- Specified by:
getDefaultTuple in interface IContainDefaultTuple
-
getIdx
public double getIdx(int r, int colIdx)
Description copied from class: AColGroup
Get the value at a colGroup specific row/column index position.
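As a rough illustration of what such a lookup involves in a dictionary-plus-reference layout, the sketch below uses plain arrays. The row-major dictionary layout, the array names, and the class `DdcForLookupSketch` are assumptions made for this example; they stand in for `IDictionary`, `AMapToData`, and the `reference` array and are not the SystemDS code.

```java
/** Hypothetical array-based view of a DDC group with a frame of reference. */
final class DdcForLookupSketch {
    /**
     * @param dict      row-major dictionary: nVals tuples of nCols offsets each
     * @param map       dictionary tuple index for every row
     * @param reference one reference value per column of the group
     * @param r         row index
     * @param colIdx    column index local to the group (0 .. nCols - 1)
     */
    static double getIdx(double[] dict, int[] map, double[] reference, int r, int colIdx) {
        int nCols = reference.length;
        // Offset stored in the dictionary tuple of row r, shifted by the column's reference.
        return dict[map[r] * nCols + colIdx] + reference[colIdx];
    }

    public static void main(String[] args) {
        double[] dict = {0, 0, 1, 2, 3, 4}; // three tuples of two columns
        int[] map = {2, 0, 1};
        double[] reference = {10, 20};
        // Row 0 points at tuple 2, whose second offset is 4; 4 + 20 = 24.0
        System.out.println(getIdx(dict, map, reference, 0, 1));
    }
}
```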
-
getCounts
public int[] getCounts(int[] counts)
-
getColGroupType
public org.apache.sysds.runtime.compress.colgroup.AColGroup.ColGroupType getColGroupType()
-
estimateInMemorySize
public long estimateInMemorySize()
Description copied from class: AColGroup
Get the upper bound estimate of in memory allocation for the column group.
- Overrides:
estimateInMemorySize in class AColGroupValue
- Returns:
- an upper bound on the number of bytes used to store this ColGroup in memory.
-
scalarOperation
public AColGroup scalarOperation(ScalarOperator op)
Description copied from class: AColGroup
Perform the specified scalar operation directly on the compressed column group, without decompressing individual cells if possible.
- Specified by:
scalarOperation in class AColGroup
- Parameters:
op - operation to perform
- Returns:
- version of this column group with the operation applied
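To make "without decompressing individual cells" concrete, here is a minimal sketch of how a scalar operation could be applied to a dictionary and reference alone, leaving the per-row map untouched. It assumes cell values are `dict + reference`; the class `DdcForScalarSketch` and its three strategies are hypothetical and do not claim to mirror how SystemDS dispatches `ScalarOperator`.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical scalar operations on a (dictionary, reference) pair; the row map is never read. */
final class DdcForScalarSketch {
    /** x + s: only the reference needs to shift, (d + ref) + s == d + (ref + s). */
    static double[] plus(double[] reference, double s) {
        double[] out = reference.clone();
        for (int j = 0; j < out.length; j++)
            out[j] += s;
        return out;
    }

    /** x * s: apply to dictionary and reference separately, (d + ref) * s == d * s + ref * s. */
    static double[] scale(double[] values, double s) {
        double[] out = values.clone();
        for (int i = 0; i < out.length; i++)
            out[i] *= s;
        return out;
    }

    /**
     * Any other function: fold the reference into the row-major dictionary and apply f once
     * per dictionary entry. The result is a plain dictionary with an implicit reference of zero.
     */
    static double[] applyGeneral(double[] dict, double[] reference, DoubleUnaryOperator f) {
        int nCols = reference.length;
        double[] out = new double[dict.length];
        for (int i = 0; i < dict.length; i++)
            out[i] = f.applyAsDouble(dict[i] + reference[i % nCols]);
        return out;
    }
}
```

In all three cases the work is proportional to the number of dictionary entries rather than the number of rows.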
-
unaryOperation
public AColGroup unaryOperation(UnaryOperator op)
Description copied from class: AColGroup
Perform unary operation on the column group and return a new column group.
- Specified by:
unaryOperation in class AColGroup
- Parameters:
op - The operation to perform
- Returns:
- The new column group
-
binaryRowOpLeft
public AColGroup binaryRowOpLeft(BinaryOperator op, double[] v, boolean isRowSafe)
Description copied from class: AColGroup
Perform a binary row operation.
- Specified by:
binaryRowOpLeft in class AColGroup
- Parameters:
op - The operation to execute
v - The vector of values to apply. Its length should be at least the highest value in the column index.
isRowSafe - True if the binary op is applied to an entire zero row and all results are zero
- Returns:
- An updated column group with the new values.
-
binaryRowOpRight
public AColGroup binaryRowOpRight(BinaryOperator op, double[] v, boolean isRowSafe)
Description copied from class: AColGroup
Perform a binary row operation.
- Specified by:
binaryRowOpRight in class AColGroup
- Parameters:
op - The operation to execute
v - The vector of values to apply. Its length should be at least the highest value in the column index.
isRowSafe - True if the binary op is applied to an entire zero row and all results are zero
- Returns:
- An updated column group with the new values.
-
write
public void write(DataOutput out) throws IOException
- Overrides:
write in class ADictBasedColGroup
- Throws:
IOException
-
read
public static ColGroupDDCFOR read(DataInput in) throws IOException
- Throws:
IOException
-
getExactSizeOnDisk
public long getExactSizeOnDisk()
Description copied from class: AColGroup
Returns the exact serialized size of column group. This can be used for example for buffer preallocation.
- Overrides:
getExactSizeOnDisk in class ADictBasedColGroup
- Returns:
- exact serialized size for column group
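The buffer-preallocation use case can be sketched with standard `java.io` streams. The byte layout below (three length prefixes followed by reference, dictionary, and map) is invented for this example and is not the SystemDS serialized form; `DdcForSerializeSketch` is a hypothetical stand-in for the column group.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical group with a made-up byte layout, used to show size-exact preallocation. */
final class DdcForSerializeSketch {
    final double[] dict;
    final int[] map;
    final double[] reference;

    DdcForSerializeSketch(double[] dict, int[] map, double[] reference) {
        this.dict = dict;
        this.map = map;
        this.reference = reference;
    }

    /** Three int length prefixes, then 8 bytes per double and 4 bytes per map entry. */
    long getExactSizeOnDisk() {
        return 3 * Integer.BYTES
            + (long) Double.BYTES * (reference.length + dict.length)
            + (long) Integer.BYTES * map.length;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(reference.length);
        for (double v : reference) out.writeDouble(v);
        out.writeInt(dict.length);
        for (double v : dict) out.writeDouble(v);
        out.writeInt(map.length);
        for (int v : map) out.writeInt(v);
    }

    static DdcForSerializeSketch read(DataInput in) throws IOException {
        double[] reference = new double[in.readInt()];
        for (int i = 0; i < reference.length; i++) reference[i] = in.readDouble();
        double[] dict = new double[in.readInt()];
        for (int i = 0; i < dict.length; i++) dict[i] = in.readDouble();
        int[] map = new int[in.readInt()];
        for (int i = 0; i < map.length; i++) map[i] = in.readInt();
        return new DdcForSerializeSketch(dict, map, reference);
    }

    /** Preallocates the buffer with the exact size so it never has to grow while writing. */
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream((int) getExactSizeOnDisk());
        try {
            write(new DataOutputStream(buf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static DdcForSerializeSketch fromBytes(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The value of an exact size (rather than an estimate) is that the number of bytes produced by `write` matches it, which the round trip through `toBytes` and `fromBytes` makes easy to check.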
-
getCost
public double getCost(ComputationCostEstimator e, int nRows)
Description copied from class: AColGroup
Get the computation cost associated with this column group.
-
replace
public AColGroup replace(double pattern, double replace)
Description copied from class: AColGroup
Make a copy of the column group values, and replace all values that match pattern with replacement value.
- Overrides:
replace in class AColGroupValue
- Parameters:
pattern - The value to look for
replace - The value to replace the other value with
- Returns:
- A new Column Group, reusing the index structure but with new values.
-
computeColSums
public void computeColSums(double[] c, int nRows)
Description copied from class: AColGroup
Compute the column sum.
- Overrides:
computeColSums in class AColGroupValue
- Parameters:
c - The array to add the column sum to.
nRows - The number of rows in the column group.
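One way a column sum can be computed on a dictionary-encoded group with a reference is to weight each dictionary tuple by how often it occurs and then add the reference once per row. The sketch below assumes that layout; the array parameters and the class `DdcForColSumSketch` are hypothetical, and this is not presented as the SystemDS implementation.

```java
/** Hypothetical column sums from tuple counts, dictionary offsets, and a reference. */
final class DdcForColSumSketch {
    /** Counts how many rows point at each dictionary tuple. */
    static int[] counts(int[] map, int nVals) {
        int[] counts = new int[nVals];
        for (int idx : map)
            counts[idx]++;
        return counts;
    }

    /**
     * Adds the sums of this group's columns into c.
     *
     * @param c          output array indexed by column of the full matrix
     * @param colIndexes full-matrix column index for each column of the group
     * @param dict       row-major dictionary of offsets
     * @param counts     occurrences of each dictionary tuple
     * @param reference  one reference value per column of the group
     * @param nRows      number of rows in the group
     */
    static void computeColSums(double[] c, int[] colIndexes, double[] dict, int[] counts,
        double[] reference, int nRows) {
        int nCols = reference.length;
        for (int k = 0; k < counts.length; k++)
            for (int j = 0; j < nCols; j++)
                c[colIndexes[j]] += counts[k] * dict[k * nCols + j];
        // Every row carries the reference exactly once.
        for (int j = 0; j < nCols; j++)
            c[colIndexes[j]] += nRows * reference[j];
    }
}
```

The cost depends on the number of distinct tuples once the counts are known, which is presumably why `create` accepts `cachedCounts`.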
-
containsValue
public boolean containsValue(double pattern)
Description copied from class: AColGroup
Detect if the column group contains a specific value.
- Specified by:
containsValue in class AColGroup
- Parameters:
pattern - The value to look for.
- Returns:
- true if the value is contained.
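For a group that stores offsets from a reference, a containment check has to compare against the reconstructed value rather than the raw dictionary entry. The sketch below shows this under two assumptions: cell values are `dict + reference`, and every dictionary tuple is used by at least one row. `DdcForContainsSketch` is a hypothetical name, not a SystemDS class.

```java
/** Hypothetical containment check over a row-major dictionary and per-column reference. */
final class DdcForContainsSketch {
    static boolean containsValue(double[] dict, double[] reference, double pattern) {
        int nCols = reference.length;
        for (int i = 0; i < dict.length; i++) {
            double v = dict[i] + reference[i % nCols];
            // NaN never equals itself, so it needs an explicit check.
            if (Double.isNaN(pattern) ? Double.isNaN(v) : v == pattern)
                return true;
        }
        return false;
    }
}
```

With the dictionary `{0, 0, 1, 2}` and the reference `{10, 20}`, the value `2` is stored as an offset but is not contained in the group, while `22` is.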
-
getNumberNonZeros
public long getNumberNonZeros(int nRows)
Description copied from class: AColGroup
Get the number of nonZeros contained in this column group.
- Overrides:
getNumberNonZeros in class AColGroupValue
- Parameters:
nRows - The number of rows in the column group. This is used for groups that do not contain information about how many rows they have.
- Returns:
- The number of non-zero values (nnz).
-
extractCommon
public AColGroup extractCommon(double[] constV)
Description copied from class: AMorphingMMColGroup
Extract the common value from the group and return a non morphing group.
- Specified by:
extractCommon in interface IFrameOfReferenceGroup
- Specified by:
extractCommon in class AMorphingMMColGroup
- Parameters:
constV - a vector to contain all values, note length = nCols in total matrix.
- Returns:
- A non morphing column group with decompression instructions.
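A plausible reading of this contract is that the reference is moved out of the group into `constV`, so that the remaining group is a plain dictionary encoding and the full value is recovered as "plain value plus `constV`". The sketch below illustrates that reading only; the array-based signature and the class `ExtractCommonSketch` are assumptions, not the SystemDS code.

```java
/** Hypothetical extraction of a per-column reference into a matrix-wide constant vector. */
final class ExtractCommonSketch {
    /**
     * Adds each group-local reference value to the slot of its full-matrix column.
     * After this call the group's dictionary and map can be used without a reference.
     *
     * @param constV     vector with one entry per column of the full matrix
     * @param colIndexes full-matrix column index for each column of the group
     * @param reference  one reference value per column of the group
     */
    static void extractCommon(double[] constV, int[] colIndexes, double[] reference) {
        for (int j = 0; j < reference.length; j++)
            constV[colIndexes[j]] += reference[j];
    }

    public static void main(String[] args) {
        double[] constV = new double[4];
        extractCommon(constV, new int[] {1, 3}, new double[] {10, 20});
        System.out.println(java.util.Arrays.toString(constV)); // [0.0, 10.0, 0.0, 20.0]
    }
}
```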
-
rexpandCols
public AColGroup rexpandCols(int max, boolean ignore, boolean cast, int nRows)
Description copied from class: AColGroup
Expand the column group to multiple columns (one hot encode the column group).
- Overrides:
rexpandCols in class AColGroupValue
- Parameters:
max - The number of columns to expand to and cutoff values at.
ignore - Whether zero and negative values should be ignored.
cast - Whether the double values contained should be cast to whole numbers.
nRows - The number of rows in the column group.
- Returns:
- A new column group containing max number of columns.
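The parameters can be illustrated with a sketch of one-hot encoding on an uncompressed column. Several details here are assumptions chosen for the example rather than documented behavior: values are treated as 1-based column positions, values above `max` produce an empty row, `cast` rounds down, and non-positive or fractional values raise an error unless handled by `ignore` or `cast`. `RexpandSketch` is a hypothetical name.

```java
/** Hypothetical one-hot expansion of a single uncompressed column. */
final class RexpandSketch {
    /** Value v (assumed 1-based) sets out[r][v - 1] to 1; rows without a valid value stay zero. */
    static double[][] rexpandCols(double[] column, int max, boolean ignore, boolean cast) {
        double[][] out = new double[column.length][max];
        for (int r = 0; r < column.length; r++) {
            double v = cast ? Math.floor(column[r]) : column[r];
            if (v <= 0) {
                if (ignore)
                    continue; // leave the row empty
                throw new IllegalArgumentException("non-positive value at row " + r);
            }
            if (v != Math.floor(v)) // also rejects NaN
                throw new IllegalArgumentException("non-integer value at row " + r);
            if (v <= max) // assumption: values above max are cut off
                out[r][(int) v - 1] = 1;
        }
        return out;
    }
}
```

On a dictionary-encoded group the same expansion would only need to be decided once per distinct value, since every row that shares a dictionary entry expands to the same output row.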
-
centralMoment
public CM_COV_Object centralMoment(CMOperator op, int nRows)
Description copied from class: AColGroup
Central Moment instruction executed on a column group.
- Overrides:
centralMoment in class AColGroupValue
- Parameters:
op - The Operator to use.
nRows - The number of rows contained in the ColumnGroup.
- Returns:
- A Central Moment object.
-
getCommon
public double[] getCommon()
Description copied from class: AMorphingMMColGroup
Get common vector, note this should not materialize anything but simply point to things that are already allocated.
- Specified by:
getCommon in class AMorphingMMColGroup
- Returns:
- the common double vector
-
sliceRows
public AColGroup sliceRows(int rl, int ru)
Description copied from class: AColGroup
Slice range of rows out of the column group and return a new column group only containing the row segment. Note that this slice should maintain pointers back to the original dictionaries and only modify index structures.
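The note about keeping pointers to the original dictionaries can be sketched as follows: only the per-row map is copied, while the dictionary and reference arrays are shared by reference. `DdcForSliceSketch` is hypothetical, and treating `ru` as exclusive is an assumption of this example.

```java
import java.util.Arrays;

/** Hypothetical group whose row slice shares the dictionary and reference with its source. */
final class DdcForSliceSketch {
    final double[] dict;
    final int[] map;
    final double[] reference;

    DdcForSliceSketch(double[] dict, int[] map, double[] reference) {
        this.dict = dict;
        this.map = map;
        this.reference = reference;
    }

    /** Rows rl (inclusive) to ru (exclusive, by assumption); only the index structure is copied. */
    DdcForSliceSketch sliceRows(int rl, int ru) {
        return new DdcForSliceSketch(dict, Arrays.copyOfRange(map, rl, ru), reference);
    }
}
```

The slice costs memory proportional to the number of selected rows, independent of the dictionary size.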
-
append
public AColGroup append(AColGroup g)
Description copied from class: AColGroup
Append the other column group to this column group. This method tries to combine them to return a new column group containing both. In some cases it is possible in reasonable time, in others it is not. The result is first this column group followed by the other column group in higher row values. If it is not possible or very inefficient, null is returned.
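A minimal sketch of the "combine if cheap, otherwise return null" contract: when two groups use equal dictionaries and references, their row maps can simply be concatenated; otherwise the sketch gives up. Which cases SystemDS actually treats as combinable is not stated in this page, so the equality condition and the class `DdcForAppendSketch` are assumptions.

```java
import java.util.Arrays;

/** Hypothetical row-wise append of two dictionary-encoded groups. */
final class DdcForAppendSketch {
    /**
     * Returns the concatenated row map (rows of A first, then rows of B),
     * or null when the groups do not share dictionary and reference values.
     */
    static int[] appendMaps(double[] dictA, double[] refA, int[] mapA,
        double[] dictB, double[] refB, int[] mapB) {
        if (!Arrays.equals(dictA, dictB) || !Arrays.equals(refA, refB))
            return null; // combining would require re-encoding; signal "not possible"
        int[] out = Arrays.copyOf(mapA, mapA.length + mapB.length);
        System.arraycopy(mapB, 0, out, mapA.length, mapB.length);
        return out;
    }
}
```

A caller would need to handle the null result, for example by decompressing or re-compressing the two groups together.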
-
getCompressionScheme
public ICLAScheme getCompressionScheme()
Description copied from class: AColGroup
Get the compression scheme for this column group to enable compression of other data.
- Specified by:
getCompressionScheme in class AColGroup
- Returns:
- The compression scheme of this column group
-
recompress
public AColGroup recompress()
Description copied from class: AColGroup
Recompress this column group into a new column group.
- Specified by:
recompress in class AColGroup
- Returns:
- A new or the same column group depending on optimization goal.
-
getCompressionInfo
public CompressedSizeInfoColGroup getCompressionInfo(int nRow)
Description copied from class: AColGroup
Get the compression info for this column group.
- Specified by:
getCompressionInfo in class AColGroup
- Parameters:
nRow - The number of rows in this column group.
- Returns:
- The compression info for this group.
-
getEncoding
public IEncode getEncoding()
Description copied from class: AColGroup
Get encoding of this column group.
- Overrides:
getEncoding in class AColGroup
- Returns:
- The encoding of the index structure.
-
sameIndexStructure
public boolean sameIndexStructure(AColGroupCompressed that)
- Specified by:
sameIndexStructure in class AColGroupCompressed
-
combineWithSameIndex
public AColGroup combineWithSameIndex(int nRow, int nCol, List<AColGroup> right)
Description copied from class: AColGroup
C bind the list of column groups with this column group. The list of elements provided in the index of each list is guaranteed to have the same index structures.
- Overrides:
combineWithSameIndex in class AColGroup
- Parameters:
nRow - The number of rows contained in all right and this column group.
nCol - The number of columns to shift the right hand side column groups over when combining. This should only affect the column indexes.
right - The right hand side column groups to combine. NOTE only the index offset of the second nested list should be used. The reason for providing this nested list is to avoid redundant allocations in calling methods.
- Returns:
- A combined compressed column group of the same type as this.
-
combineWithSameIndex
public AColGroup combineWithSameIndex(int nRow, int nCol, AColGroup right)
Description copied from class: AColGroup
C bind the given column group to this.
- Overrides:
combineWithSameIndex in class AColGroup
- Parameters:
nRow - The number of rows contained in the right and this column group.
nCol - The number of columns in this.
right - The column group to c-bind.
- Returns:
- a new combined column group.
-
splitReshape
public AColGroup[] splitReshape(int multiplier, int nRow, int nColOrg)
Description copied from class: AColGroup
This method returns a list of column groups that are naive splits of this column group as if it is reshaped. This means the column group's rows are split into x other column groups, where x is the multiplier. The indexes are assigned round robin to each of the output groups, meaning the first index is assigned. For instance, if the 4th column group is split by a multiplier of 2 and there were 5 columns in total originally, the output becomes 2 column groups, one at column index 4 and one at 9. If possible, the split column groups should reuse pointers back to the original dictionaries!
- Specified by:
splitReshape in class AColGroup
- Parameters:
multiplier - The number of column groups to split into
nRow - The number of rows in this column group in case the underlying column group does not know
nColOrg - The number of overall columns in the host CompressedMatrixBlock.
- Returns:
- a list of split column groups
-
toString
public String toString()
- Overrides:
toString in class AColGroupValue
-
-