Class AColGroup

    • Method Detail

      • getColIndices

        public final IColIndex getColIndices()
        Obtain the offsets of the columns in the matrix block that make up the group
        Returns:
        offsets of the columns in the matrix block that make up the group
      • getNumCols

        public final int getNumCols()
        Obtain the number of columns in this column group.
        Returns:
        number of columns in this column group
      • shiftColIndices

        public final AColGroup shiftColIndices​(int offset)
        Shift all contained column indexes by an offset. This is used for rbind to combine compressed matrices. Since column indexes are reused between operations, a new list is allocated here to be safe.
        Parameters:
        offset - The offset to move all columns
        Returns:
        A new column group object with the shifted columns
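        A minimal sketch of the copy-on-shift idea, using a plain int[] in place of IColIndex. The names ShiftSketch and shift are hypothetical and are not part of the SystemDS API.

```java
import java.util.Arrays;

// Hypothetical stand-in for a column index list; not the SystemDS IColIndex type.
final class ShiftSketch {
    // Return a new array with every column index moved by offset.
    // The input is left untouched because index lists may be shared between groups.
    static int[] shift(int[] colIndexes, int offset) {
        int[] shifted = new int[colIndexes.length];
        for (int i = 0; i < colIndexes.length; i++)
            shifted[i] = colIndexes[i] + offset;
        return shifted;
    }

    public static void main(String[] args) {
        int[] original = {0, 2, 5};
        int[] moved = shift(original, 10);
        System.out.println(Arrays.toString(original)); // [0, 2, 5]
        System.out.println(Arrays.toString(moved));    // [10, 12, 15]
    }
}
```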
      • copyAndSet

        public abstract AColGroup copyAndSet​(IColIndex colIndexes)
        Copy the content of the column group with pointers to the previous content, but with the new column indexes given. Note that this method does not verify whether the specified colIndexes are valid and have the correct dimensions for the underlying column group.
        Parameters:
        colIndexes - the new indexes to use in the copy
        Returns:
        a new object with pointers to underlying data.
      • estimateInMemorySize

        public long estimateInMemorySize()
        Get the upper bound estimate of in memory allocation for the column group.
        Returns:
        an upper bound on the number of bytes used to store this ColGroup in memory.
      • decompressToSparseBlock

        public final void decompressToSparseBlock​(SparseBlock sb,
                                                  int rl,
                                                  int ru)
        Decompress a range of rows into a sparse block. Note that this uses append, so the sparse column indexes need to be sorted afterwards.
        Parameters:
        sb - Sparse Target block
        rl - Row to start at
        ru - Row to end at
      • decompressToDenseBlock

        public final void decompressToDenseBlock​(DenseBlock db,
                                                 int rl,
                                                 int ru)
        Decompress a range of rows into a dense block
        Parameters:
        db - Dense target block
        rl - Row to start at
        ru - Row to end at
      • decompressToDenseBlockTransposed

        public abstract void decompressToDenseBlockTransposed​(DenseBlock db,
                                                              int rl,
                                                              int ru)
        Decompress a range of rows into a dense transposed block.
        Parameters:
        db - Dense target block
        rl - Row in this column group to start at.
        ru - Row in this column group to end at.
      • decompressToSparseBlockTransposed

        public abstract void decompressToSparseBlockTransposed​(SparseBlockMCSR sb,
                                                               int nColOut)
        Decompress the column group to the sparse transposed block. Note that the column groups would only need to decompress into specific sub rows of the Sparse block
        Parameters:
        sb - Sparse target block
        nColOut - The number of columns in the sb.
      • getExactSizeOnDisk

        public long getExactSizeOnDisk()
        Returns the exact serialized size of column group. This can be used for example for buffer preallocation.
        Returns:
        exact serialized size for column group
      • sliceColumns

        public final AColGroup sliceColumns​(int cl,
                                            int cu)
        Slice out the columns within the range from cl to cu, removing the dictionary values that do not belong to these columns. If the column group being sliced does not contain any columns within the range, null is returned.
        Parameters:
        cl - The lower bound of the columns to select
        cu - The upper bound of the columns to select (not inclusive).
        Returns:
        A cloned Column Group, with a copied pointer to the old column groups index structure, but reduced dictionary and _columnIndexes correctly aligned with the expected sliced compressed matrix.
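        A minimal sketch of the range check and the null result, using a plain int[] for the column indexes. Re-basing the surviving indexes so that cl maps to 0 is an assumption about what "correctly aligned" means here; SliceColumnsSketch and sliceIndexes are hypothetical names.

```java
import java.util.Arrays;

final class SliceColumnsSketch {
    // Return the column indexes inside [cl, cu), re-based so that cl maps to 0
    // (assumption), or null when the group has no column in the range.
    static int[] sliceIndexes(int[] colIndexes, int cl, int cu) {
        int[] tmp = new int[colIndexes.length];
        int n = 0;
        for (int c : colIndexes)
            if (c >= cl && c < cu) // cu is not inclusive
                tmp[n++] = c - cl;
        return n == 0 ? null : Arrays.copyOf(tmp, n);
    }
}
```

        Callers of such a method have to handle the null case before using the result.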
      • sliceColumn

        public final AColGroup sliceColumn​(int col)
        Slice out a single column from the column group.
        Parameters:
        col - The column to slice. The column is not required to be inside the column group.
        Returns:
        A new column group that is a single column. If the requested column is not in this column group, null is returned.
      • colSum

        public static double[] colSum​(Collection<AColGroup> groups,
                                      double[] res,
                                      int nRows)
        Compute the column sum of the given list of groups
        Parameters:
        groups - The Groups to sum
        res - The result to put the values into
        nRows - The number of rows in the groups
        Returns:
        The given res list, where the sum of the column groups is added
      • get

        public double get​(int r,
                          int c)
        Get the value at a global row/column position. In general this performs poorly, since a binary search of colIndexes is performed for each lookup.
        Parameters:
        r - row
        c - column
        Returns:
        value at the row/column position
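        A minimal sketch of a lookup that first maps the global column to a group-local index by binary search. The dense row-major values array and the choice to return 0 for a column outside the group are assumptions made for illustration; GetSketch is a hypothetical name.

```java
import java.util.Arrays;

final class GetSketch {
    // colIndexes: sorted global column indexes of the group.
    // values: row-major cells of the group, values[r * colIndexes.length + colIdx].
    static double get(int[] colIndexes, double[] values, int r, int c) {
        int colIdx = Arrays.binarySearch(colIndexes, c); // O(log n) on every lookup
        if (colIdx < 0)
            return 0.0; // assumption: column c is not part of this group
        return values[r * colIndexes.length + colIdx];
    }
}
```

        When many cells of the same column are needed, resolving the local index once and then reading by that index avoids repeating the search.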
      • getIdx

        public abstract double getIdx​(int r,
                                      int colIdx)
        Get the value at a colGroup specific row/column index position.
        Parameters:
        r - row
        colIdx - column index in the _colIndexes.
        Returns:
        value at the row/column index position
      • getNumValues

        public abstract int getNumValues()
        Obtain the number of distinct tuples in the contained sets of values associated with this column group. If the column group is uncompressed, the number of rows is returned.
        Returns:
        the number of distinct sets of values associated with the bitmaps in this column group
      • getCompType

        public abstract AColGroup.CompressionType getCompType()
        Obtain the compression type.
        Returns:
        How the elements of the column group are compressed.
      • decompressToDenseBlock

        public abstract void decompressToDenseBlock​(DenseBlock db,
                                                    int rl,
                                                    int ru,
                                                    int offR,
                                                    int offC)
        Decompress into the DenseBlock. (no NNZ handling)
        Parameters:
        db - Target DenseBlock
        rl - Row to start decompression from
        ru - Row to end decompression at (not inclusive)
        offR - Row offset into the target to decompress
        offC - Column offset into the target to decompress
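        A minimal sketch of how the row range and the two offsets interact, assuming a dictionary-encoded group (distinct tuples plus a row-to-tuple map) and a double[][] target. The encoding, the accumulation with +=, and the name DecompressSketch are assumptions for illustration and not the SystemDS implementation.

```java
final class DecompressSketch {
    // dict: distinct tuples stored row-major; map[r]: tuple id of row r.
    static void decompress(double[][] target, int[] colIndexes, double[] dict, int[] map,
            int rl, int ru, int offR, int offC) {
        final int nCol = colIndexes.length;
        for (int r = rl; r < ru; r++) {        // ru is not inclusive
            final int tuple = map[r] * nCol;   // start of this row's tuple in dict
            for (int j = 0; j < nCol; j++)
                target[r + offR][colIndexes[j] + offC] += dict[tuple + j];
        }
    }
}
```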
      • decompressToSparseBlock

        public abstract void decompressToSparseBlock​(SparseBlock sb,
                                                     int rl,
                                                     int ru,
                                                     int offR,
                                                     int offC)
        Decompress into the SparseBlock. (no NNZ handling) Note that this method is allowed to call append, since it is assumed that the sparse column indexes are sorted afterwards.
        Parameters:
        sb - Target SparseBlock
        rl - Row to start decompression from
        ru - Row to end decompression at (not inclusive)
        offR - Row offset into the target to decompress
        offC - Column offset into the target to decompress
      • rightMultByMatrix

        public final AColGroup rightMultByMatrix​(MatrixBlock right)
        Right matrix multiplication with this column group. This method can return null, meaning that the output overlapping group would have been empty.
        Parameters:
        right - The MatrixBlock on the right of this matrix multiplication
        Returns:
        The new Column Group or null that is the result of the matrix multiplication.
      • rightMultByMatrix

        public abstract AColGroup rightMultByMatrix​(MatrixBlock right,
                                                    IColIndex allCols,
                                                    int k)
        Right matrix multiplication with this column group. This method can return null, meaning that the output overlapping group would have been empty.
        Parameters:
        right - The MatrixBlock on the right of this matrix multiplication
        allCols - A pre-materialized list of all column indexes that can be shared across all column groups if useful. Can be set to null.
        k - The parallelization degree allowed internally in this operation.
        Returns:
        The new Column Group or null that is the result of the matrix multiplication.
      • rightDecompressingMult

        public void rightDecompressingMult​(MatrixBlock right,
                                           MatrixBlock ret,
                                           int rl,
                                           int ru,
                                           int nRows,
                                           int crl,
                                           int cru)
        Right side matrix multiplication, iterating through this column group and adding to the ret.
        Parameters:
        right - Right side matrix to multiply with.
        ret - The return matrix to add results to
        rl - The row of this column group to multiply from
        ru - The row of this column group to multiply to (not inclusive)
        crl - The right hand side column lower
        cru - The right hand side column upper
        nRows - The number of rows in this column group
      • tsmm

        public abstract void tsmm​(MatrixBlock ret,
                                  int nRows)
        Do a transposed self matrix multiplication on the left side, t(x) %*% x, but only with this column group. This gives better performance since there is no need to iterate through all the rows of the matrix; the execution can be limited to the number of distinct values. Note that it only calculates the upper triangle.
        Parameters:
        ret - The return matrix block [numColumns x numColumns]
        nRows - The number of rows in the column group
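        A minimal sketch of why the work can be bounded by the number of distinct values: with a dictionary of distinct tuples and a count per tuple, each tuple contributes count * v_i * v_j to the result. The dictionary-plus-counts representation and the name TsmmSketch are assumptions for illustration; colIndexes is assumed sorted so that only the upper triangle is written.

```java
final class TsmmSketch {
    // dict: distinct tuples, row-major; counts[t]: number of rows holding tuple t.
    static void tsmm(double[][] ret, int[] colIndexes, double[] dict, int[] counts) {
        final int nCol = colIndexes.length;
        for (int t = 0; t < counts.length; t++) {
            final int off = t * nCol;
            for (int i = 0; i < nCol; i++)
                for (int j = i; j < nCol; j++) // j >= i: upper triangle only
                    ret[colIndexes[i]][colIndexes[j]] += counts[t] * dict[off + i] * dict[off + j];
        }
    }
}
```

        For the rows (1,2), (1,2), (3,4) this touches two tuples instead of three rows and leaves the lower triangle at zero.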
      • leftMultByMatrixNoPreAgg

        public abstract void leftMultByMatrixNoPreAgg​(MatrixBlock matrix,
                                                      MatrixBlock result,
                                                      int rl,
                                                      int ru,
                                                      int cl,
                                                      int cu)
        Left multiply with this column group.
        Parameters:
        matrix - The matrix to multiply with on the left
        result - The result to output the values into, always dense for the purpose of the column groups parallelizing
        rl - The row to begin the multiplication from on the lhs matrix
        ru - The row to end the multiplication at on the lhs matrix
        cl - The column to begin the multiplication from on the lhs matrix
        cu - The column to end the multiplication at on the lhs matrix
      • leftMultByAColGroup

        public abstract void leftMultByAColGroup​(AColGroup lhs,
                                                 MatrixBlock result,
                                                 int nRows)
        Left side matrix multiplication with a column group that is transposed.
        Parameters:
        lhs - The left hand side Column group to multiply with, the left hand side should be considered transposed. Also it should be guaranteed that this column group is not empty.
        result - The result matrix to insert the result of the multiplication into
        nRows - Number of rows in the lhs colGroup
      • tsmmAColGroup

        public abstract void tsmmAColGroup​(AColGroup other,
                                           MatrixBlock result)
        Matrix multiply with this other column group, but: 1. Only output upper triangle values. 2. Multiply both ways with "this" being on the left and on the right. It should be guaranteed that the input is not the same as the caller of the method. The second step is achievable by treating the initial multiplied matrix, and adding its values to the correct locations in the output.
        Parameters:
        other - The other Column group to multiply with
        result - The result matrix to put the results into
      • scalarOperation

        public abstract AColGroup scalarOperation​(ScalarOperator op)
        Perform the specified scalar operation directly on the compressed column group, without decompressing individual cells if possible.
        Parameters:
        op - operation to perform
        Returns:
        version of this column group with the operation applied
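        A minimal sketch of applying an operation on the compressed form: the operator runs once per dictionary entry rather than once per cell, and the row-to-tuple mapping is reused unchanged. A plain double[] dictionary and java.util.function.DoubleUnaryOperator stand in for the SystemDS dictionary and ScalarOperator; ScalarOpSketch is a hypothetical name.

```java
import java.util.function.DoubleUnaryOperator;

final class ScalarOpSketch {
    // Apply op once per dictionary entry; the index structure is not touched.
    static double[] applyToDictionary(double[] dict, DoubleUnaryOperator op) {
        double[] out = new double[dict.length];
        for (int i = 0; i < dict.length; i++)
            out[i] = op.applyAsDouble(dict[i]);
        return out;
    }
}
```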
      • binaryRowOpLeft

        public abstract AColGroup binaryRowOpLeft​(BinaryOperator op,
                                                  double[] v,
                                                  boolean isRowSafe)
        Perform a binary row operation.
        Parameters:
        op - The operation to execute
        v - The vector of values to apply. Its length should be at least the highest value in the column index.
        isRowSafe - True if the binary op is applied to an entire zero row and all results are zero
        Returns:
        An updated column group with the new values.
      • addVector

        public AColGroup addVector​(double[] v)
        Short hand add operator call on column group to add a row vector to the column group
        Parameters:
        v - The vector to add
        Returns:
        A new column group where the vector is added.
      • binaryRowOpRight

        public abstract AColGroup binaryRowOpRight​(BinaryOperator op,
                                                   double[] v,
                                                   boolean isRowSafe)
        Perform a binary row operation.
        Parameters:
        op - The operation to execute
        v - The vector of values to apply. Its length should be at least the highest value in the column index.
        isRowSafe - True if the binary op is applied to an entire zero row and all results are zero
        Returns:
        An updated column group with the new values.
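        A minimal sketch contrasting the left and right variants on a plain dictionary. It assumes that "left" places the vector on the left of the operator, op(v[col], value), and "right" places it on the right, op(value, v[col]); that reading, the double[] dictionary, and the name BinaryRowOpSketch are assumptions for illustration.

```java
import java.util.function.DoubleBinaryOperator;

final class BinaryRowOpSketch {
    // vOnLeft = true computes op(v[col], value); false computes op(value, v[col]).
    static double[] apply(double[] dict, int[] colIndexes, double[] v,
            DoubleBinaryOperator op, boolean vOnLeft) {
        final int nCol = colIndexes.length;
        double[] out = new double[dict.length];
        for (int i = 0; i < dict.length; i++) {
            double rowVal = v[colIndexes[i % nCol]]; // v is indexed by global column
            out[i] = vOnLeft ? op.applyAsDouble(rowVal, dict[i]) : op.applyAsDouble(dict[i], rowVal);
        }
        return out;
    }
}
```

        Because v is read at the global column positions, it has to be long enough to cover the largest column index of the group.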
      • unaryAggregateOperations

        public abstract void unaryAggregateOperations​(AggregateUnaryOperator op,
                                                      double[] c,
                                                      int nRows,
                                                      int rl,
                                                      int ru)
        Unary aggregate operator. Since aggregate operators require a new output object, the output becomes an uncompressed matrix. The range of rl to ru only applies to row aggregates. (ReduceCol)
        Parameters:
        op - The operator used
        c - The output matrix block
        nRows - The total number of rows in the Column Group
        rl - The starting row to do aggregation from
        ru - The last row to do aggregation to (not included)
      • sliceRows

        public abstract AColGroup sliceRows​(int rl,
                                            int ru)
        Slice range of rows out of the column group and return a new column group only containing the row segment. Note that this slice should maintain pointers back to the original dictionaries and only modify index structures.
        Parameters:
        rl - The row to start at
        ru - The row to end at (not included)
        Returns:
        A new column group containing the specified row range.
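        A minimal sketch of a row slice that keeps a pointer to the original dictionary and copies only the index structure. The two-field layout (dictionary plus row-to-tuple map) and the class name SliceRowsSketch are assumptions for illustration.

```java
import java.util.Arrays;

final class SliceRowsSketch {
    final double[] dict; // shared between slices, never copied
    final int[] map;     // row -> tuple id

    SliceRowsSketch(double[] dict, int[] map) {
        this.dict = dict;
        this.map = map;
    }

    // Keep rows [rl, ru); only the index structure is copied.
    SliceRowsSketch sliceRows(int rl, int ru) {
        return new SliceRowsSketch(dict, Arrays.copyOfRange(map, rl, ru));
    }
}
```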
      • getMin

        public abstract double getMin()
        Short hand method for getting minimum value contained in this column group.
        Returns:
        The minimum value contained in this ColumnGroup
      • getMax

        public abstract double getMax()
        Short hand method for getting maximum value contained in this column group.
        Returns:
        The maximum value contained in this ColumnGroup
      • getSum

        public abstract double getSum​(int nRows)
        Short hand method for getting the sum of this column group
        Parameters:
        nRows - The number of rows in the column group
        Returns:
        The sum of this column group
      • containsValue

        public abstract boolean containsValue​(double pattern)
        Detect if the column group contains a specific value.
        Parameters:
        pattern - The value to look for.
        Returns:
        boolean saying true if the value is contained.
      • getNumberNonZeros

        public abstract long getNumberNonZeros​(int nRows)
        Get the number of nonZeros contained in this column group.
        Parameters:
        nRows - The number of rows in the column group. This is used for groups that do not contain information about how many rows they have.
        Returns:
        The nnz.
      • replace

        public abstract AColGroup replace​(double pattern,
                                          double replace)
        Make a copy of the column group values, and replace all values that match pattern with replacement value.
        Parameters:
        pattern - The value to look for
        replace - The value to replace the other value with
        Returns:
        A new Column Group, reusing the index structure but with new values.
      • computeColSums

        public abstract void computeColSums​(double[] c,
                                            int nRows)
        Compute the column sum
        Parameters:
        c - The array to add the column sum to.
        nRows - The number of rows in the column group.
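        A minimal sketch of adding column sums into an existing array, assuming a dictionary of distinct tuples with one count per tuple. Existing values in c are kept and added to. The representation and the name ColSumSketch are assumptions for illustration.

```java
final class ColSumSketch {
    // dict: distinct tuples, row-major; counts[t]: number of rows holding tuple t.
    static void computeColSums(double[] c, int[] colIndexes, double[] dict, int[] counts) {
        final int nCol = colIndexes.length;
        for (int t = 0; t < counts.length; t++)
            for (int j = 0; j < nCol; j++)
                c[colIndexes[j]] += counts[t] * dict[t * nCol + j]; // add, do not overwrite
    }
}
```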
      • centralMoment

        public abstract CM_COV_Object centralMoment​(CMOperator op,
                                                    int nRows)
        Central Moment instruction executed on a column group.
        Parameters:
        op - The Operator to use.
        nRows - The number of rows contained in the ColumnGroup.
        Returns:
        A Central Moment object.
      • rexpandCols

        public abstract AColGroup rexpandCols​(int max,
                                              boolean ignore,
                                              boolean cast,
                                              int nRows)
        Expand the column group to multiple columns. (one hot encode the column group)
        Parameters:
        max - The number of columns to expand to and cutoff values at.
        ignore - If zero and negative values should be ignored.
        cast - If the double values contained should be cast to whole numbers.
        nRows - The number of rows in the column group.
        Returns:
        A new column group containing max number of columns.
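        A minimal sketch of one-hot expansion of a single column into max columns. Several details are assumptions made for illustration and are not confirmed by the text above: values are treated as 1-based (value v sets column v - 1), cast is modelled as Math.floor, values above max leave the row empty, and non-positive or non-integer values raise an exception when they are not ignored or cast. RexpandSketch is a hypothetical name.

```java
final class RexpandSketch {
    // One-hot encode a single column: value v (1-based) sets column v - 1 to 1.
    static double[][] rexpand(double[] col, int max, boolean ignore, boolean cast) {
        double[][] out = new double[col.length][max];
        for (int r = 0; r < col.length; r++) {
            double v = cast ? Math.floor(col[r]) : col[r];
            if (v <= 0) {
                if (ignore)
                    continue; // leave the row all zero
                throw new IllegalArgumentException("non-positive value at row " + r);
            }
            if (v != Math.floor(v))
                throw new IllegalArgumentException("non-integer value at row " + r);
            if (v <= max)
                out[r][(int) v - 1] = 1.0; // values above max are cut off
        }
        return out;
    }
}
```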
      • getCost

        public abstract double getCost​(ComputationCostEstimator e,
                                       int nRows)
        Get the computation cost associated with this column group.
        Parameters:
        e - The computation cost estimator
        nRows - the number of rows in the column group
        Returns:
        The cost of this column group
      • unaryOperation

        public abstract AColGroup unaryOperation​(UnaryOperator op)
        Perform unary operation on the column group and return a new column group
        Parameters:
        op - The operation to perform
        Returns:
        The new column group
      • isEmpty

        public abstract boolean isEmpty()
        Check whether the group contains only zeros.
        Returns:
        true if empty
      • append

        public abstract AColGroup append​(AColGroup g)
        Append the other column group to this column group. This method tries to combine them to return a new column group containing both. In some cases it is possible in reasonable time, in others it is not. The result is first this column group followed by the other column group in higher row values. If it is not possible or very inefficient null is returned.
        Parameters:
        g - The other column group
        Returns:
        A combined column group or null
      • appendN

        public static AColGroup appendN​(AColGroup[] groups,
                                        int blen,
                                        int rlen)
        Append all column groups in the list provided together in one go allocating the output once. If it is not possible or very inefficient null is returned.
        Parameters:
        groups - The groups to combine.
        blen - The normal number of rows in the groups
        rlen - The total number of rows of all combined.
        Returns:
        A combined column group or null
      • getCompressionScheme

        public abstract ICLAScheme getCompressionScheme()
        Get the compression scheme for this column group to enable compression of other data.
        Returns:
        The compression scheme of this column group
      • clear

        public void clear()
        Clear variables that can be recomputed from the allocation of this column group.
      • recompress

        public abstract AColGroup recompress()
        Recompress this column group into a new column group.
        Returns:
        A new or the same column group depending on optimization goal.
      • morph

        public AColGroup morph​(AColGroup.CompressionType ct,
                               int nRow)
        Recompress this column group into a new column group of the given type.
        Parameters:
        ct - The compressionType that the column group should morph into
        nRow - The number of rows in this columngroup.
        Returns:
        A new column group
      • getCompressionInfo

        public abstract CompressedSizeInfoColGroup getCompressionInfo​(int nRow)
        Get the compression info for this column group.
        Parameters:
        nRow - The number of rows in this column group.
        Returns:
        The compression info for this group.
      • combine

        public AColGroup combine​(AColGroup other,
                                 int nRow)
        Combine this column group with another
        Parameters:
        other - The other column group to combine with.
        nRow - The number of rows in both column groups.
        Returns:
        A combined representation as a column group.
      • getEncoding

        public IEncode getEncoding()
        Get encoding of this column group.
        Returns:
        The encoding of the index structure.
      • sortColumnIndexes

        public AColGroup sortColumnIndexes()
      • reduceCols

        public abstract AColGroup reduceCols()
        Perform row sum on the internal dictionaries, and return the same index structure. This method returns null on empty column groups. Note this method does not guarantee correct behavior if the given group is AMorphingGroup, instead it should be morphed to a valid columngroup via extractCommon first.
        Returns:
        The reduced colgroup.
      • selectionMultiply

        public final void selectionMultiply​(MatrixBlock selection,
                                            ColGroupUtils.P[] points,
                                            MatrixBlock ret,
                                            int rl,
                                            int ru)
        Selection (left matrix multiply)
        Parameters:
        selection - A sparse matrix with at most a single one in each row; all other values are zero.
        points - The coordinates in the selection matrix to extract.
        ret - The MatrixBlock to decompress the selected rows into
        rl - The row to start at in the selection matrix
        ru - the row to end at in the selection matrix (not inclusive)
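        A minimal sketch of multiplying by a selection matrix as a row copy: a one in cell (r, c) of the selection matrix picks source row c into output row r. The Point class below is a hypothetical stand-in for ColGroupUtils.P, whose fields are not described here, and the dense double[][] source replaces the compressed group.

```java
final class SelectionSketch {
    // Hypothetical coordinate: the single one in selection row r sits in column c.
    static final class Point {
        final int r, c;
        Point(int r, int c) { this.r = r; this.c = c; }
    }

    // ret[p.r] += source[p.c] for every point whose row lies in [rl, ru).
    static void select(double[][] source, Point[] points, double[][] ret, int rl, int ru) {
        for (Point p : points) {
            if (p.r < rl || p.r >= ru)
                continue; // outside the requested selection rows
            for (int j = 0; j < source[p.c].length; j++)
                ret[p.r][j] += source[p.c][j];
        }
    }
}
```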
      • getSparsity

        public abstract double getSparsity()
        Get an approximate sparsity of this column group
        Returns:
        the approximate sparsity of this columngroup
      • sameIndexStructure

        public boolean sameIndexStructure​(AColGroup that)
        Method to determine if the column group has the same index structure as another. Note that the column indexes and dictionaries are allowed to be different.
        Parameters:
        that - the other column group
        Returns:
        if the index is the same.
      • combineWithSameIndex

        public AColGroup combineWithSameIndex​(int nRow,
                                              int nCol,
                                              List<AColGroup> right)
        C-bind the list of column groups with this column group. The list of elements provided in the index of each list is guaranteed to have the same index structures.
        Parameters:
        nRow - The number of rows contained in all right and this column group.
        nCol - The number of columns to shift the right hand side column groups over when combining. This should only affect the column indexes.
        right - The right hand side column groups to combine. NOTE only the index offset of the second nested list should be used. The reason for providing this nested list is to avoid redundant allocations in calling methods.
        Returns:
        A combined compressed column group of the same type as this.
      • combineWithSameIndex

        public AColGroup combineWithSameIndex​(int nRow,
                                              int nCol,
                                              AColGroup right)
        C bind the given column group to this.
        Parameters:
        nRow - The number of rows contained in the right and this column group.
        nCol - The number of columns in this.
        right - The column group to c-bind.
        Returns:
        A new combined column group.
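        A minimal sketch of a c-bind between two groups that share one index structure: since every row maps to the same tuple id in both groups, the dictionaries can be concatenated tuple by tuple and only the right-hand column indexes need shifting by nCol. The flat double[] dictionaries and the name CBindSketch are assumptions for illustration.

```java
final class CBindSketch {
    // Concatenate tuple t of left with tuple t of right; valid only because both
    // groups share one row-to-tuple mapping.
    static double[] combineDict(double[] left, int nColLeft, double[] right, int nColRight) {
        int nTuples = left.length / nColLeft;
        int w = nColLeft + nColRight;
        double[] out = new double[nTuples * w];
        for (int t = 0; t < nTuples; t++) {
            System.arraycopy(left, t * nColLeft, out, t * w, nColLeft);
            System.arraycopy(right, t * nColRight, out, t * w + nColLeft, nColRight);
        }
        return out;
    }

    // Column indexes of this group, followed by the right indexes shifted by nCol.
    static int[] combineIndexes(int[] left, int[] right, int nCol) {
        int[] out = new int[left.length + right.length];
        System.arraycopy(left, 0, out, 0, left.length);
        for (int i = 0; i < right.length; i++)
            out[left.length + i] = right[i] + nCol;
        return out;
    }
}
```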
      • splitReshape

        public abstract AColGroup[] splitReshape​(int multiplier,
                                                 int nRow,
                                                 int nColOrg)
        This method returns a list of column groups that are naive splits of this column group as if it is reshaped. This means the rows of the column group are split into x other column groups, where x is the multiplier. The indexes are assigned round robin to each of the output groups, meaning the first index is assigned to the first output group, the second to the next, and so on. If, for instance, a column group at column index 4 is split by a multiplier of 2 and there were 5 columns in total originally, the output becomes 2 column groups: one at column index 4 and one at column index 9. If possible the split column groups should reuse pointers back to the original dictionaries!
        Parameters:
        multiplier - The number of column groups to split into
        nRow - The number of rows in this column group in case the underlying column group does not know
        nColOrg - The number of overall columns in the host CompressedMatrixBlock.
        Returns:
        a list of split column groups
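        A minimal sketch of the index arithmetic behind the example of columns 4 and 9: output group g lands at column col + g * nColOrg, and row r goes to output group r % multiplier at row r / multiplier. The single-column double[] input, the divisibility assumption, and the name SplitReshapeSketch are assumptions for illustration.

```java
final class SplitReshapeSketch {
    // Column index of output group g for an original column col.
    static int outputColumn(int col, int g, int nColOrg) {
        return col + g * nColOrg;
    }

    // Split the rows of a single-column group round robin into multiplier parts.
    static double[][] splitRows(double[] rows, int multiplier) {
        int nRowOut = rows.length / multiplier; // assumes nRow is divisible by multiplier
        double[][] parts = new double[multiplier][nRowOut];
        for (int r = 0; r < nRowOut * multiplier; r++)
            parts[r % multiplier][r / multiplier] = rows[r];
        return parts;
    }
}
```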
      • splitReshapePushDown

        public AColGroup[] splitReshapePushDown​(int multiplier,
                                                int nRow,
                                                int nColOrg,
                                                ExecutorService pool)
                                         throws Exception
        This method returns a list of column groups that are naive splits of this column group as if it is reshaped. This means the rows of the column group are split into x other column groups, where x is the multiplier. The indexes are assigned round robin to each of the output groups, meaning the first index is assigned to the first output group, the second to the next, and so on. If, for instance, a column group at column index 4 is split by a multiplier of 2 and there were 5 columns in total originally, the output becomes 2 column groups: one at column index 4 and one at column index 9. If possible the split column groups should reuse pointers back to the original dictionaries! This specific variation pushes down the parallelization via the provided executor service. If not overridden, the default is to call the normal split reshape.
        Parameters:
        multiplier - The number of column groups to split into
        nRow - The number of rows in this column group in case the underlying column group does not know
        nColOrg - The number of overall columns in the host CompressedMatrixBlock
        pool - The executor service to submit parallel tasks to
        Returns:
        a list of split column groups
        Throws:
        Exception - In case there is an error we throw the exception out instead of handling it