Class LibMatrixReorg


  • public class LibMatrixReorg
    extends Object
    MB: Library for selected matrix reorg operations, including special cases and all combinations of dense and sparse representations. Currently supported operations: reshape, r' (transpose), rdiag (diagV2M/diagM2V), rsort (sorting data/indexes), rmempty (remove empty), and rexpand (outer/table-seq expansion).
    • Field Detail

      • PAR_NUMCELL_THRESHOLD

        public static long PAR_NUMCELL_THRESHOLD
      • PAR_NUMCELL_THRESHOLD_SORT

        public static final int PAR_NUMCELL_THRESHOLD_SORT
        See Also:
        Constant Field Values
      • SPARSE_OUTPUTS_IN_CSR

        public static final boolean SPARSE_OUTPUTS_IN_CSR
        See Also:
        Constant Field Values
      • TRANSPOSE_IN_PLACE_DENSE_LEGACY

        public static final boolean TRANSPOSE_IN_PLACE_DENSE_LEGACY
        See Also:
        Constant Field Values
    • Method Detail

      • isSupportedReorgOperator

        public static boolean isSupportedReorgOperator(ReorgOperator op)
      • sort

        public static MatrixBlock sort(MatrixBlock in,
                                       MatrixBlock out,
                                       int[] by,
                                       boolean desc,
                                       boolean ixret,
                                       int k)
        Parameters:
        in - Input matrix to sort
        out - Output matrix into which the sorted result is written
        by - The column index(es) to sort by
        desc - If true, sort in descending order; otherwise in ascending order.
        ixret - If true, return the sorted row indexes instead of the sorted data.
        k - Number of parallel threads
        Returns:
        The sorted output matrix.
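To illustrate the desc and ixret semantics, the following sketch sorts the rows of a small dense matrix by a single column and returns either the sorted rows or the permutation of row indexes. It is a simplified stand-in, not the library's implementation: it supports only one sort column, and the class name SortSketch, the double[][] representation, and the 1-based sort column and returned indexes (mirroring DML's order function) are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of sort(in, out, by, desc, ixret, k) for a single
// sort column on a dense double[][]; not the LibMatrixReorg implementation.
final class SortSketch {
    /**
     * @param in    dense input, in[row][col]
     * @param byCol 1-based column to sort by (assumption)
     * @param desc  sort in descending order if true
     * @param ixret return 1-based row indexes (as a column vector) instead of sorted rows
     */
    static double[][] sort(double[][] in, int byCol, boolean desc, boolean ixret) {
        int n = in.length;
        Integer[] perm = new Integer[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        Comparator<Integer> cmp = Comparator.comparingDouble(i -> in[i][byCol - 1]);
        // Arrays.sort on objects is stable, so ties keep their input order
        Arrays.sort(perm, desc ? cmp.reversed() : cmp);
        double[][] out = new double[n][];
        for (int i = 0; i < n; i++)
            out[i] = ixret ? new double[] { perm[i] + 1 } : in[perm[i]].clone();
        return out;
    }
}
```

Sorting a permutation array rather than the rows themselves makes the ixret case free: the same permutation either indexes the input rows or is returned directly.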
      • reshape

        public static MatrixBlock reshape(MatrixBlock in,
                                          int rows,
                                          int cols,
                                          boolean rowwise)
        CP reshape operation (single input, single output matrix). NOTE: In contrast to R, the rowwise parameter specifies both the read and the write order, with row-wise being the default; R always reads column-wise, uses its row-wise flag only to specify the write order, and defaults to column-wise.
        Parameters:
        in - input matrix
        rows - number of rows of the output matrix
        cols - number of columns of the output matrix
        rowwise - if true, read and write cells in row-major order; otherwise in column-major order
        Returns:
        output matrix
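The difference between row-wise and column-wise reshape can be illustrated on a dense array. In the following sketch (the class ReshapeSketch and the row-major double[] layout are assumptions for illustration), rowwise=true reads and writes cells in row-major order, while rowwise=false reads and writes them in column-major order, as described in the note above.

```java
// Hypothetical sketch of the CP reshape semantics on a dense, row-major
// double[] (not the LibMatrixReorg implementation).
final class ReshapeSketch {
    static double[] reshape(double[] in, int inRows, int inCols,
                            int rows, int cols, boolean rowwise) {
        if ((long) inRows * inCols != (long) rows * cols)
            throw new IllegalArgumentException("cell counts must match");
        double[] out = new double[rows * cols];
        for (int i = 0; i < inRows; i++) {
            for (int j = 0; j < inCols; j++) {
                // linear position of cell (i,j) in the chosen read order
                long pos = rowwise ? (long) i * inCols + j : (long) j * inRows + i;
                // the same linear position interpreted in the chosen write order
                int oi = (int) (rowwise ? pos / cols : pos % rows);
                int oj = (int) (rowwise ? pos % cols : pos / rows);
                out[oi * cols + oj] = in[i * inCols + j];
            }
        }
        return out;
    }
}
```

For the 2 x 3 input [[1,2,3],[4,5,6]] reshaped to 3 x 2, rowwise=true yields [[1,2],[3,4],[5,6]], whereas rowwise=false yields [[1,5],[4,3],[2,6]].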
      • reshape

        public static MatrixBlock reshape(MatrixBlock in,
                                          MatrixBlock out,
                                          int rows,
                                          int cols,
                                          boolean rowwise)
        CP reshape operation (single input, single output matrix). NOTE: In contrast to R, the rowwise parameter specifies both the read and the write order, with row-wise being the default; R always reads column-wise, uses its row-wise flag only to specify the write order, and defaults to column-wise.
        Parameters:
        in - input matrix
        out - output matrix
        rows - number of rows of the output matrix
        cols - number of columns of the output matrix
        rowwise - if true, read and write cells in row-major order; otherwise in column-major order
        Returns:
        output matrix
      • reshape

        public static MatrixBlock reshape(MatrixBlock in,
                                          MatrixBlock out,
                                          int rows,
                                          int cols,
                                          boolean rowwise,
                                          int k)
        CP reshape operation (single input, single output matrix). NOTE: In contrast to R, the rowwise parameter specifies both the read and the write order, with row-wise being the default; R always reads column-wise, uses its row-wise flag only to specify the write order, and defaults to column-wise.
        Parameters:
        in - input matrix
        out - output matrix
        rows - number of rows of the output matrix
        cols - number of columns of the output matrix
        rowwise - if true, read and write cells in row-major order; otherwise in column-major order
        k - degree of parallelism
        Returns:
        output matrix
      • reshape

        public static List<IndexedMatrixValue> reshape(IndexedMatrixValue in,
                                                       DataCharacteristics mcIn,
                                                       DataCharacteristics mcOut,
                                                       boolean rowwise,
                                                       boolean outputEmptyBlocks)
        MR/Spark reshape interface. Since reshape cannot process blocks independently, there are separate CP and MR interfaces.
        Parameters:
        in - indexed matrix value
        mcIn - input matrix characteristics
        mcOut - output matrix characteristics
        rowwise - if true, read and write cells in row-major order; otherwise in column-major order
        outputEmptyBlocks - if true, also output blocks with nnz=0
        Returns:
        list of indexed matrix values
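Because a reshape moves cells across block boundaries, the distributed interface has to map each input cell to its output block. The following sketch computes, for one cell given by global 0-based coordinates, the 1-based output block indexes and the 0-based position inside that block for a row-wise reshape. The class name and the uniform square blocks of size blen are assumptions for illustration.

```java
// Hypothetical sketch: where does input cell (i, j) land after a row-wise
// reshape to an output with outCols columns, using square blocks of size blen?
// Returns {blockRow, blockCol, rowInBlock, colInBlock}, with 1-based block
// indexes and 0-based in-block positions.
final class ReshapeBlockSketch {
    static long[] target(long i, long j, long inCols, long outCols, int blen) {
        long pos = i * inCols + j;      // row-major linear cell position
        long oi = pos / outCols;        // global output row (0-based)
        long oj = pos % outCols;        // global output column (0-based)
        return new long[] { oi / blen + 1, oj / blen + 1, oi % blen, oj % blen };
    }
}
```

Since neighboring cells of one input block can land in different output blocks, a single input block may contribute to several output blocks, which is why this interface returns a list.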
      • rmempty

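        public static MatrixBlock rmempty(MatrixBlock in,
                                          MatrixBlock ret,
                                          boolean rows,
                                          boolean emptyReturn,
                                          MatrixBlock select)
        CP rmempty operation (single input, single output matrix).
        Parameters:
        in - input matrix
        ret - output matrix
        rows - if true, remove empty rows; otherwise remove empty columns
        emptyReturn - return a row/column of zeros for empty input
        select - optional selection vector; if not null, it determines which rows/columns are kept instead of their emptiness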
        Returns:
        matrix block
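The semantics of removing empty rows can be sketched on a dense double[][] as follows. This is an illustrative stand-in, not the library's code path; the class name, the dense representation, and treating select as a 0/1 indicator vector with one entry per row are assumptions. Removing empty columns is symmetric.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of rmempty(in, ret, rows=true, emptyReturn, select)
// on a dense double[][]; not the LibMatrixReorg implementation.
final class RmEmptySketch {
    static double[][] removeEmptyRows(double[][] in, boolean emptyReturn, double[] select) {
        int cols = in.length > 0 ? in[0].length : 0;
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < in.length; i++) {
            boolean keep;
            if (select != null) {
                keep = select[i] != 0;          // keep rows chosen by the indicator (assumption)
            } else {
                keep = false;                   // keep rows with at least one non-zero
                for (double v : in[i]) keep |= v != 0;
            }
            if (keep) kept.add(in[i].clone());
        }
        // with emptyReturn, an all-empty result becomes a single row of zeros
        if (kept.isEmpty() && emptyReturn) kept.add(new double[cols]);
        return kept.toArray(new double[0][]);
    }
}
```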
      • rmempty

        public static void rmempty(IndexedMatrixValue data,
                                   IndexedMatrixValue offset,
                                   boolean rmRows,
                                   long len,
                                   long blen,
                                   ArrayList<IndexedMatrixValue> outList)
        MR rmempty interface. Since rmempty cannot process blocks independently, there are separate CP and MR interfaces.
        Parameters:
        data - input indexed matrix block
        offset - ?
        rmRows - if true, remove empty rows; otherwise remove empty columns
        len - ?
        blen - block length
        outList - list of indexed matrix values
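In the distributed setting, each input block needs to know where its non-empty rows end up in the output. One way to express this, sketched below, is a precomputed offset per input row: the running count of non-empty rows (a 1-based target row), or 0 for an empty row, from which the 1-based target block row follows as (offset-1)/blen+1. The class and method names and this exact offset encoding are assumptions for illustration, not a description of the data actually passed as offset.

```java
// Hypothetical sketch: compute per-row offsets (1-based target row, or 0 if
// the row is empty) and derive the output block row of a non-empty row.
final class RmEmptyOffsetSketch {
    static long[] offsets(boolean[] nonEmpty) {
        long[] off = new long[nonEmpty.length];
        long next = 0;
        for (int i = 0; i < nonEmpty.length; i++)
            off[i] = nonEmpty[i] ? ++next : 0;  // running count of kept rows
        return off;
    }

    static long targetBlockRow(long offset, long blen) {
        return (offset - 1) / blen + 1;         // 1-based block row index
    }
}
```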
      • rexpand

        public static MatrixBlock rexpand(MatrixBlock in,
                                          MatrixBlock ret,
                                          double max,
                                          boolean rows,
                                          boolean cast,
                                          boolean ignore,
                                          int k)
        CP rexpand operation (single input, single output). The classic example of this operation is one-hot encoding of a column into multiple columns.
        Parameters:
        in - Input matrix
        ret - Output matrix
        max - Number of rows/cols of the output
        rows - If true, expand in the row direction; otherwise in the column direction.
        cast - If true, cast the input values to integers (by rounding) before using them as indexes.
        ignore - If true, ignore input values less than or equal to zero, which are technically invalid input, instead of raising an error.
        k - Degree of parallelism
        Returns:
        The rexpanded output matrix.
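The one-hot encoding case can be sketched on a dense column vector as follows: for a column-direction expansion, row i of the output receives a 1 in column in[i]. The class name and dense representation are assumptions, as are rounding only when cast is set (truncating otherwise) and silently dropping values larger than max; this is not the LibMatrixReorg code.

```java
// Hypothetical sketch of rexpand(in, ret, max, rows=false, cast, ignore, k)
// for a dense column vector: row i of the output gets a 1 in column in[i].
final class RexpandSketch {
    static double[][] expandColumns(double[] in, int max, boolean cast, boolean ignore) {
        double[][] out = new double[in.length][max];
        for (int i = 0; i < in.length; i++) {
            // with cast, round to the nearest integer; otherwise truncate (assumption)
            long v = cast ? Math.round(in[i]) : (long) in[i];
            if (v <= 0) {
                if (!ignore)
                    throw new IllegalArgumentException("invalid value " + in[i] + " at row " + i);
                continue;                       // leave the row empty
            }
            if (v <= max)                       // values beyond max are dropped (assumption)
                out[i][(int) (v - 1)] = 1;
        }
        return out;
    }
}
```

For example, the input vector (2, 1, 3) with max=3 expands to the rows (0,1,0), (1,0,0), and (0,0,1).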
      • rexpand

        public static MatrixBlock rexpand(MatrixBlock in,
                                          MatrixBlock ret,
                                          int max,
                                          boolean rows,
                                          boolean cast,
                                          boolean ignore,
                                          int k)
        CP rexpand operation (single input, single output). The classic example of this operation is one-hot encoding of a column into multiple columns.
        Parameters:
        in - Input matrix
        ret - Output matrix
        max - Number of rows/cols of the output
        rows - If true, expand in the row direction; otherwise in the column direction.
        cast - If true, cast the input values to integers (by rounding) before using them as indexes.
        ignore - If true, ignore input values less than or equal to zero, which are technically invalid input, instead of raising an error.
        k - Degree of parallelism
        Returns:
        The rexpanded output matrix.
      • fusedSeqRexpand

        public static MatrixBlock fusedSeqRexpand(int seqHeight,
                                                  MatrixBlock A,
                                                  double w)
        This function is activated by the following DML code:

        ret = table(seq(1, nrow(A)), A, w)

        Parameters:
        seqHeight - The height of the sequence vector.
        A - The MatrixBlock vector to encode.
        w - The weight scalar to multiply onto the output cells.
        Returns:
        A new MatrixBlock with the table result.
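For intuition, table(seq(1, nrow(A)), A, w) places the weight w at cell (i, A[i]) of an nrow(A) x max(A) result, so every output row holds at most one non-zero. That structure admits a compact per-row representation, sketched below with one column index per row (-1 for an empty row). The class name, the -1 convention, and producing empty rows for non-positive entries are illustrative assumptions and do not describe how fusedSeqRexpand or rexpandSingleRow actually encode their output.

```java
// Hypothetical sketch of ret = table(seq(1, nrow(A)), A, w) for a column
// vector A: each output row has at most one non-zero, at column A[i].
final class SeqTableSketch {
    final int[] colIx;    // 0-based column per row, or -1 if the row is empty
    final double[] vals;  // value per row (w, or 0 if empty)
    final int numCols;    // output columns: max(A), derived from the data

    SeqTableSketch(double[] a, double w) {
        colIx = new int[a.length];
        vals = new double[a.length];
        int maxCol = 0;
        for (int i = 0; i < a.length; i++) {
            int c = (int) a[i];
            if (c > 0) {                 // non-positive entries yield empty rows (assumption)
                colIx[i] = c - 1;
                vals[i] = w;
                maxCol = Math.max(maxCol, c);
            } else {
                colIx[i] = -1;
            }
        }
        numCols = maxCol;                // analogous to updateClen = true
    }
}
```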
      • fusedSeqRexpand

        public static MatrixBlock fusedSeqRexpand(int seqHeight,
                                                  MatrixBlock A,
                                                  double w,
                                                  MatrixBlock ret,
                                                  boolean updateClen)
        This function is activated by the following DML code:

        ret = table(seq(1, nrow(A)), A, w)

        Parameters:
        seqHeight - The height of the sequence vector.
        A - The MatrixBlock vector to encode.
        w - The weight scalar to multiply onto the output cells.
        ret - The output MatrixBlock. It is not necessarily used for the result, but depending on updateClen it determines the output size.
        updateClen - If true, update clen and ignore the dimensions of ret; otherwise use the column dimension of ret.
        Returns:
        A new MatrixBlock or ret.
      • fusedSeqRexpand

        public static MatrixBlock fusedSeqRexpand(int seqHeight,
                                                  MatrixBlock A,
                                                  double w,
                                                  MatrixBlock ret,
                                                  boolean updateClen,
                                                  int k)
        This function is activated by the following DML code:

        ret = table(seq(1, nrow(A)), A, w)

        Parameters:
        seqHeight - The height of the sequence vector.
        A - The MatrixBlock vector to encode.
        w - The weight scalar to multiply onto the output cells.
        ret - The output MatrixBlock. It is not necessarily used for the result, but depending on updateClen it determines the output size.
        updateClen - If true, update clen and ignore the dimensions of ret; otherwise use the column dimension of ret.
        k - Parallelization degree
        Returns:
        A new MatrixBlock or ret.
      • rexpandSingleRow

        public static int rexpandSingleRow(int row,
                                           double v2,
                                           double w,
                                           int[] retIx,
                                           double[] retVals,
                                           boolean updateClen,
                                           int maxOutCol)
      • checkRexpand

        public static void checkRexpand(MatrixBlock in,
                                        boolean ignore)
        Quick check whether the input is valid for rexpand. Passing this check does not guarantee that the input is valid for rexpand.
        Parameters:
        in - Input matrix block
        ignore - If zero-valued cells should be ignored
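Why such a check is only a quick, partial test can be seen in a sketch that compares the number of non-zeros with the number of rows: when zeros are not ignored, every row of the input vector must hold a non-zero, yet negative or out-of-range values still pass. The class name, the dense double[] input, and the exception type are assumptions for illustration.

```java
// Hypothetical sketch of a quick rexpand input check for a column vector:
// when zeros are not ignored, every row must hold a non-zero value.
final class RexpandCheckSketch {
    static void check(double[] in, boolean ignore) {
        long nnz = 0;
        for (double v : in) if (v != 0) nnz++;
        if (!ignore && nnz < in.length)
            throw new IllegalArgumentException(
                "input contains zeros (rows=" + in.length + ", nnz=" + nnz + ")");
        // negative or too-large values still pass, so passing is not a guarantee
    }
}
```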
      • rexpand

        public static void rexpand(IndexedMatrixValue data,
                                   double max,
                                   boolean rows,
                                   boolean cast,
                                   boolean ignore,
                                   long blen,
                                   ArrayList<IndexedMatrixValue> outList)
        MR/Spark rexpand operation (single input, multiple outputs including empty blocks)
        Parameters:
        data - Input indexed matrix block
        max - Total number of rows/cols of the output
        rows - If true, expand in the row direction; otherwise in the column direction.
        cast - If true, cast the input values to integers (by rounding) before using them as indexes.
        ignore - If true, ignore input values less than or equal to zero, which are technically invalid input, instead of raising an error.
        blen - The block size used to slice the output
        outList - The list of output IndexedMatrixValues, to which all output blocks are added
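The "multiple outputs including empty blocks" aspect can be sketched by slicing one expanded row block into column blocks of width blen and keeping empty blocks, so that every block index of the output is produced. The class and method names, the dense double[][] representation, and the column-direction expansion are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a dense rows x max expanded output into column
// blocks of width blen, keeping empty blocks so every block index exists.
final class RexpandBlockSketch {
    static List<double[][]> sliceColumns(double[][] out, int max, int blen) {
        List<double[][]> blocks = new ArrayList<>();
        for (int c0 = 0; c0 < max; c0 += blen) {
            int w = Math.min(blen, max - c0);       // the last block may be narrower
            double[][] b = new double[out.length][w];
            for (int i = 0; i < out.length; i++)
                System.arraycopy(out[i], c0, b[i], 0, w);
            blocks.add(b);
        }
        return blocks;
    }
}
```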
      • countNnzPerColumn

        public static int[] countNnzPerColumn(MatrixBlock in)
      • countNnzPerColumn

        public static int[] countNnzPerColumn(MatrixBlock in,
                                              int rl,
                                              int ru)
      • mergeNnzCounts

        public static int[] mergeNnzCounts(int[] cnt,
                                           int[] cnt2)
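The purpose of countNnzPerColumn over a row range and of mergeNnzCounts can be illustrated by counting non-zeros per column in separate row ranges (as parallel tasks might) and then merging the partial counts. The dense double[][] input, the half-open range [rl, ru), in-place merging into the first array, and the class name are assumptions for illustration, not documented behavior of these methods.

```java
// Hypothetical sketch of per-column non-zero counting over a row range
// [rl, ru) and merging of partial counts (e.g., from parallel tasks).
final class NnzCountSketch {
    static int[] countNnzPerColumn(double[][] in, int rl, int ru) {
        int[] cnt = new int[in.length > 0 ? in[0].length : 0];
        for (int i = rl; i < ru; i++)
            for (int j = 0; j < in[i].length; j++)
                if (in[i][j] != 0) cnt[j]++;
        return cnt;
    }

    static int[] mergeNnzCounts(int[] cnt, int[] cnt2) {
        if (cnt == null) return cnt2;             // tolerate a missing partial result
        for (int j = 0; j < cnt.length; j++) cnt[j] += cnt2[j];
        return cnt;                               // merged in place into cnt
    }
}
```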
      • copyMtx

        public static void copyMtx(MatrixBlock in,
                                   MatrixBlock out,
                                   int inStart,
                                   int outStart,
                                   int copyLen,
                                   boolean isAllocated,
                                   boolean copyTotalNonZeros)
      • copyDenseMtx

        public static void copyDenseMtx(MatrixBlock in,
                                        MatrixBlock out,
                                        int inIdx,
                                        int outIdx,
                                        int copyLen,
                                        boolean isAllocated,
                                        boolean copyTotalNonZeros)
      • transposeInPlaceDenseBrenner

        public static void transposeInPlaceDenseBrenner(MatrixBlock in,
                                                        int k)
        Transposes a dense matrix in place by following cycles, based on Brenner's method. The method shifts cycles with a focus on low storage overhead, using cycle leaders based on prime factorization; the additional storage is in O(n+m). Square matrices should be handled outside this method (using the trivial method) for better performance. This method is based on: Algorithm 467, Brenner, https://dl.acm.org/doi/pdf/10.1145/355611.362542.
        Parameters:
        in - The input matrix to be transposed.
        k - The number of threads.
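The core idea of cycle-following transposition can be sketched on a row-major rows x cols array: the element at linear position p moves to (p * rows) mod (rows*cols - 1), and following these moves partitions the array into disjoint cycles. This sketch is a simpler, single-threaded variant that tracks visited positions in a BitSet, which costs O(rows*cols) bits; it deliberately does not implement Brenner's prime-factorization cycle leaders, which avoid that overhead. The class name and the double[] layout are assumptions for illustration.

```java
import java.util.BitSet;

// Cycle-following in-place transpose of a dense row-major rows x cols array.
// Simplified variant: visited positions are tracked in a BitSet (O(rows*cols)
// bits), unlike Brenner's method, which finds cycle leaders without it.
final class InPlaceTransposeSketch {
    static void transpose(double[] a, int rows, int cols) {
        int size = rows * cols;
        if (size < 3) return;                 // 1x1, 1x2, and 2x1 layouts are unchanged
        int last = size - 1;                  // positions 0 and last never move
        BitSet visited = new BitSet(size);
        for (int start = 1; start < last; start++) {
            if (visited.get(start)) continue;
            // walk the cycle starting at 'start', carrying the displaced value
            double carry = a[start];
            int p = start;
            do {
                int next = (int) ((long) p * rows % last); // target of the element at p
                double tmp = a[next];
                a[next] = carry;
                carry = tmp;
                visited.set(p);
                p = next;
            } while (p != start);
        }
    }
}
```

For the 2 x 3 array [1,2,3,4,5,6], the single non-trivial cycle 1 → 2 → 4 → 3 → 1 produces [1,4,2,5,3,6], the row-major layout of the 3 x 2 transpose.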