Class MatrixBlockDictionary

    • Method Summary

      Modifier and Type Method Description
      void addToEntry​(double[] v, int fr, int to, int nCol)
      Adds the dictionary entry at index fr of this dictionary to the entry at index to of the target array v.
      void addToEntry​(double[] v, int fr, int to, int nCol, int rep)
      Adds the dictionary entry at index fr of this dictionary to the entry at index to of the target array v, rep times.
      void addToEntryVectorized​(double[] v, int f1, int f2, int f3, int f4, int f5, int f6, int f7, int f8, int t1, int t2, int t3, int t4, int t5, int t6, int t7, int t8, int nCol)  
      double aggregate​(double init, Builtin fn)
      Aggregate all contained values. Useful in value-only computations where the operation iterates over every value in the dictionary.
      void aggregateCols​(double[] c, Builtin fn, int[] colIndexes)
      Aggregates the columns into the target double array provided.
      void aggregateColsWithReference​(double[] c, Builtin fn, int[] colIndexes, double[] reference, boolean def)
      Aggregates the columns into the target double array provided.
      double[] aggregateRows​(Builtin fn, int nCol)
      Aggregate all entries in the rows.
      double[] aggregateRowsWithDefault​(Builtin fn, double[] defaultTuple)
      Aggregate all entries in the rows of the dictionary, with an extra cell at the end that contains the aggregate of the given defaultTuple.
      double[] aggregateRowsWithReference​(Builtin fn, double[] reference)
      Aggregate all entries in the rows with an offset value reference added.
      double aggregateWithReference​(double init, Builtin fn, double[] reference, boolean def)
      Aggregate all the contained values, with a reference offset.
      ADictionary applyScalarOp​(ScalarOperator op)
      Allocate a new dictionary, apply the scalar operation to each cell, and return the new dictionary.
      ADictionary applyScalarOpWithReference​(ScalarOperator op, double[] reference, double[] newReference)
      Allocate a new dictionary, apply the scalar operation to each cell with a reference offset, and return the new dictionary.
      ADictionary applyUnaryOp​(UnaryOperator op)
      Allocate a new dictionary and apply the unary operator on each cell.
      ADictionary applyUnaryOpWithReference​(UnaryOperator op, double[] reference, double[] newReference)
      Allocate a new dictionary, apply the unary operator to each cell with a reference offset, and return the new dictionary.
      ADictionary binOpLeft​(BinaryOperator op, double[] v, int[] colIndexes)
      Apply binary row operation on the left side in place
      Dictionary binOpLeftWithReference​(BinaryOperator op, double[] v, int[] colIndexes, double[] reference, double[] newReference)
      Apply the binary operator such that each value is offset by the reference before application.
      MatrixBlockDictionary binOpRight​(BinaryOperator op, double[] v)
      Apply binary row operation on the right side.
      MatrixBlockDictionary binOpRight​(BinaryOperator op, double[] v, int[] colIndexes)
      Apply binary row operation on the right side.
      Dictionary binOpRightWithReference​(BinaryOperator op, double[] v, int[] colIndexes, double[] reference, double[] newReference)
      Apply the binary operator such that each value is offset by the reference before application.
      CM_COV_Object centralMoment​(CM_COV_Object ret, ValueFunction fn, int[] counts, int nRows)
      Central moment function to calculate the central moment of this column group.
      CM_COV_Object centralMomentWithReference​(CM_COV_Object ret, ValueFunction fn, int[] counts, double reference, int nRows)
      Central moment function to calculate the central moment of this column group with a reference offset on each tuple.
      ADictionary clone()
      Returns a deep clone of the dictionary.
      void colSum​(double[] c, int[] counts, int[] colIndexes)
      Get the column sum of the values contained in the dictionary
      void colSumSq​(double[] c, int[] counts, int[] colIndexes)
      Get the column sum of the squared values contained in the dictionary
      void colSumSqWithReference​(double[] c, int[] counts, int[] colIndexes, double[] reference)
      Get the column sum of the squared values contained in the dictionary, with an offset reference value added to each cell before squaring.
      boolean containsValue​(double pattern)
      Detect if the dictionary contains a specific value.
      boolean containsValueWithReference​(double pattern, double[] reference)
      Detect if the dictionary contains a specific value with reference offset.
      static MatrixBlockDictionary createDictionary​(double[] values, int nCol)  
      long getExactSizeOnDisk()
      Calculate the space consumption if the dictionary is stored on disk.
      long getInMemorySize()
      Returns the memory usage of the dictionary.
      static long getInMemorySize​(int numberValues, int numberColumns, double sparsity)  
      MatrixBlock getMatrixBlock()  
      MatrixBlockDictionary getMBDict​(int nCol)
      Get this dictionary as a MatrixBlock dictionary.
      long getNumberNonZeros​(int[] counts, int nCol)
      Calculate the number of non zeros in the dictionary.
      long getNumberNonZerosWithReference​(int[] counts, double[] reference, int nRows)
      Calculate the number of non zeros in the dictionary.
      int getNumberOfValues​(int ncol)
      Get the number of distinct tuples given that the column group has ncol columns
      double getSparsity()
      Get the sparsity of the dictionary.
      String getString​(int colIndexes)
      Get a string representation of the dictionary, that considers the layout of the data.
      double getValue​(int i)
      Get the specific value contained in the dictionary at the given index.
      double[] getValues()
      Get all the values contained in the dictionary as a linearized double array.
      ADictionary inplaceScalarOp​(ScalarOperator op)
      Applies the scalar operation on the dictionary.
      boolean isLossy()
      Specify if the Dictionary is lossy.
      void multiplyScalar​(double v, double[] ret, int off, int dictIdx, int[] cols)
      Multiply the v value with the dictionary entry at dictIdx and add it to the ret matrix at the columns specified in the int array.
      MatrixBlockDictionary preaggValuesFromDense​(int numVals, int[] colIndexes, int[] aggregateColumns, double[] b, int cut)
      Pre Aggregate values for Right Matrix Multiplication.
      void product​(double[] ret, int[] counts, int nCol)
      Calculate the product of the dictionary weighted by counts.
      void productWithDefault​(double[] ret, int[] counts, double[] def, int defCount)
      Calculate the product of the dictionary weighted by counts, with a default value added.
      void productWithReference​(double[] ret, int[] counts, double[] reference, int refCount)
      Calculate the product of the dictionary weighted by counts and offset by reference
      static MatrixBlockDictionary read​(DataInput in)  
      ADictionary replace​(double pattern, double replace, int nCol)
      Make a copy of the values, and replace all values that match pattern with replacement value.
      ADictionary replaceWithReference​(double pattern, double replace, double[] reference)
      Make a copy of the values, and replace all values that match pattern with replacement value.
      ADictionary rexpandCols​(int max, boolean ignore, boolean cast, int nCol)
      Rexpand the dictionary (one hot encode)
      ADictionary rexpandColsWithReference​(int max, boolean ignore, boolean cast, double reference)
      Rexpand the dictionary (one hot encode)
      ADictionary scaleTuples​(int[] scaling, int nCol)
      Scale all tuples contained in the dictionary by the scaling factor given in the int list.
      ADictionary sliceOutColumnRange​(int idxStart, int idxEnd, int previousNumberOfColumns)
      Modify the dictionary by removing columns not within the index range.
      ADictionary subtractTuple​(double[] tuple)
      Allocate a new dictionary where the tuple given is subtracted from all tuples in the previous dictionary.
      double sum​(int[] counts, int ncol)
      Get the sum of the values contained in the dictionary
      double[] sumAllRowsToDouble​(int nrColumns)
      Method used as a pre-aggregate of each tuple in the dictionary, to single double values.
      double[] sumAllRowsToDoubleSq​(int nrColumns)
      Method used as a pre-aggregate of each tuple in the dictionary, to single double values.
      double[] sumAllRowsToDoubleSqWithDefault​(double[] defaultTuple)
      Method used as a pre-aggregate of each tuple in the dictionary, to single double values.
      double[] sumAllRowsToDoubleSqWithReference​(double[] reference)
      Method used as a pre-aggregate of each tuple in the dictionary, to single double values.
      double[] sumAllRowsToDoubleWithDefault​(double[] defaultTuple)
      Do exactly the same as sumAllRowsToDouble, but also sum the given array into an extra index at the end of the returned array.
      double[] sumAllRowsToDoubleWithReference​(double[] reference)
      Method used as a pre-aggregate of each tuple in the dictionary, to single double values with a reference.
      double sumSq​(int[] counts, int ncol)
      Get the square sum of the values contained in the dictionary
      double sumSqWithReference​(int[] counts, double[] reference)
      Get the square sum of the values contained in the dictionary with a reference offset on each value.
      String toString()  
      void write​(DataOutput out)
      Write the dictionary to a DataOutput.
    • Constructor Detail

      • MatrixBlockDictionary

        public MatrixBlockDictionary​(MatrixBlock data,
                                     int nCol)
    • Method Detail

      • createDictionary

        public static MatrixBlockDictionary createDictionary​(double[] values,
                                                             int nCol)
      • getValues

        public double[] getValues()
        Description copied from class: ADictionary
        Get all the values contained in the dictionary as a linearized double array.
        Specified by:
        getValues in class ADictionary
        Returns:
        linearized double array
      • getValue

        public double getValue​(int i)
        Description copied from class: ADictionary
        Get the specific value contained in the dictionary at the given index.
        Specified by:
        getValue in class ADictionary
        Parameters:
        i - The index to extract the value from
        Returns:
        The value contained at the index
      • getInMemorySize

        public long getInMemorySize()
        Description copied from class: ADictionary
        Returns the memory usage of the dictionary.
        Specified by:
        getInMemorySize in class ADictionary
        Returns:
        a long value in number of bytes for the dictionary.
      • getInMemorySize

        public static long getInMemorySize​(int numberValues,
                                           int numberColumns,
                                           double sparsity)
      • aggregate

        public double aggregate​(double init,
                                Builtin fn)
        Description copied from class: ADictionary
        Aggregate all contained values. Useful in value-only computations where the operation iterates over every value in the dictionary.
        Specified by:
        aggregate in class ADictionary
        Parameters:
        init - The initial value; for an aggregate such as max this could be -infinity
        fn - The Function to apply to values
        Returns:
        The aggregated value as a double.
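The role of init can be illustrated with a minimal sketch on a plain double array. This is not the SystemDS implementation: the class name AggregateSketch is hypothetical, and a java.util.function.DoubleBinaryOperator stands in for the Builtin function.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical stand-in for a dictionary; not part of SystemDS.
final class AggregateSketch {

    // Folds every value into a running result that starts at init.
    static double aggregate(double[] values, double init, DoubleBinaryOperator fn) {
        double ret = init;
        for (double v : values)
            ret = fn.applyAsDouble(ret, v);
        return ret;
    }

    public static void main(String[] args) {
        double[] values = {3.0, -1.0, 7.5, 2.0};
        // For a max aggregate, negative infinity never wins a comparison,
        // so it is a safe starting value.
        System.out.println(aggregate(values, Double.NEGATIVE_INFINITY, Math::max));
    }
}
```

With an empty value array the sketch simply returns init, which is why the starting value should be neutral for the chosen function.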
      • aggregateWithReference

        public double aggregateWithReference​(double init,
                                             Builtin fn,
                                             double[] reference,
                                             boolean def)
        Description copied from class: ADictionary
        Aggregate all the contained values, with a reference offset.
        Specified by:
        aggregateWithReference in class ADictionary
        Parameters:
        init - The initial value; for an aggregate such as max this could be -infinity.
        fn - The function to apply to the values
        reference - The reference offset to each value in the dictionary
        def - Whether the reference should also be treated as an instance (a tuple of its own) or only as a reference
        Returns:
        The aggregated value as a double.
      • aggregateRows

        public double[] aggregateRows​(Builtin fn,
                                      int nCol)
        Description copied from class: ADictionary
        Aggregate all entries in the rows.
        Specified by:
        aggregateRows in class ADictionary
        Parameters:
        fn - The aggregate function
        nCol - The number of columns contained in the dictionary.
        Returns:
        Aggregates for this dictionary tuples.
      • aggregateRowsWithDefault

        public double[] aggregateRowsWithDefault​(Builtin fn,
                                                 double[] defaultTuple)
        Description copied from class: ADictionary
        Aggregate all entries in the rows of the dictionary, with an extra cell at the end that contains the aggregate of the given defaultTuple.
        Specified by:
        aggregateRowsWithDefault in class ADictionary
        Parameters:
        fn - The aggregate function
        defaultTuple - The default tuple to aggregate in last cell
        Returns:
        Aggregates for this dictionary tuples.
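A minimal sketch of the "extra cell at the end" behavior, under the assumption that the dictionary values are stored as a row-major linearized array and that defaultTuple has one entry per column. The class and helper names are hypothetical, and a DoubleBinaryOperator stands in for Builtin.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical illustration; not the SystemDS implementation.
final class AggregateRowsSketch {

    // Returns one aggregate per tuple, plus a final cell for defaultTuple.
    static double[] aggregateRowsWithDefault(double[] values, DoubleBinaryOperator fn,
            double[] defaultTuple) {
        int nCol = defaultTuple.length; // assumed: one default value per column
        int nRows = values.length / nCol;
        double[] ret = new double[nRows + 1];
        for (int r = 0; r < nRows; r++)
            ret[r] = foldTuple(values, r * nCol, nCol, fn);
        ret[nRows] = foldTuple(defaultTuple, 0, nCol, fn);
        return ret;
    }

    // Aggregates len consecutive cells starting at off.
    private static double foldTuple(double[] a, int off, int len, DoubleBinaryOperator fn) {
        double acc = a[off];
        for (int i = 1; i < len; i++)
            acc = fn.applyAsDouble(acc, a[off + i]);
        return acc;
    }

    public static void main(String[] args) {
        double[] values = {1, 5, 4, 2}; // two tuples with two columns each
        double[] out = aggregateRowsWithDefault(values, Math::max, new double[] {9, 3});
        System.out.println(java.util.Arrays.toString(out)); // [5.0, 4.0, 9.0]
    }
}
```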
      • aggregateRowsWithReference

        public double[] aggregateRowsWithReference​(Builtin fn,
                                                   double[] reference)
        Description copied from class: ADictionary
        Aggregate all entries in the rows with an offset value reference added.
        Specified by:
        aggregateRowsWithReference in class ADictionary
        Parameters:
        fn - The aggregate function
        reference - The reference offset to each value in the dictionary
        Returns:
        Aggregates for this dictionary tuples.
      • aggregateCols

        public void aggregateCols​(double[] c,
                                  Builtin fn,
                                  int[] colIndexes)
        Description copied from class: ADictionary
        Aggregates the columns into the target double array provided.
        Specified by:
        aggregateCols in class ADictionary
        Parameters:
        c - The target double array. It spans the full number of columns, so the colIndexes of this specific dictionary are needed to locate the target cells.
        fn - The function to apply to individual columns
        colIndexes - The mapping to the target columns from the individual columns
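How colIndexes maps dictionary columns into the wider target array can be sketched as follows. The sketch assumes row-major linearized values and passes the column count explicitly; the class name is hypothetical and a DoubleBinaryOperator stands in for Builtin.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

// Hypothetical illustration; not the SystemDS implementation.
final class AggregateColsSketch {

    // Aggregates column j of the dictionary into c[colIndexes[j]].
    static void aggregateCols(double[] values, int nCol, double[] c,
            DoubleBinaryOperator fn, int[] colIndexes) {
        int nRows = values.length / nCol;
        for (int r = 0; r < nRows; r++) {
            for (int j = 0; j < nCol; j++) {
                int target = colIndexes[j];
                c[target] = fn.applyAsDouble(c[target], values[r * nCol + j]);
            }
        }
    }

    public static void main(String[] args) {
        double[] values = {1, 5, 4, 2};  // two tuples, two columns
        double[] c = new double[4];      // the full matrix has four columns
        Arrays.fill(c, Double.NEGATIVE_INFINITY);
        aggregateCols(values, 2, c, Math::max, new int[] {1, 3});
        // Only cells 1 and 3 are touched; cells 0 and 2 keep their initial value.
        System.out.println(Arrays.toString(c));
    }
}
```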
      • aggregateColsWithReference

        public void aggregateColsWithReference​(double[] c,
                                               Builtin fn,
                                               int[] colIndexes,
                                               double[] reference,
                                               boolean def)
        Description copied from class: ADictionary
        Aggregates the columns into the target double array provided.
        Specified by:
        aggregateColsWithReference in class ADictionary
        Parameters:
        c - The target double array. It spans the full number of columns, so the colIndexes of this specific dictionary are needed to locate the target cells.
        fn - The function to apply to individual columns
        colIndexes - The mapping to the target columns from the individual columns
        reference - The reference offset values to add to each cell.
        def - If the reference should be treated as a tuple as well
      • applyScalarOp

        public ADictionary applyScalarOp​(ScalarOperator op)
        Description copied from class: ADictionary
        Allocate a new dictionary, apply the scalar operation to each cell, and return the new dictionary.
        Specified by:
        applyScalarOp in class ADictionary
        Parameters:
        op - The operator.
        Returns:
        The new dictionary to return.
      • applyUnaryOp

        public ADictionary applyUnaryOp​(UnaryOperator op)
        Description copied from class: ADictionary
        Allocate a new dictionary and apply the unary operator on each cell.
        Specified by:
        applyUnaryOp in class ADictionary
        Parameters:
        op - the operator.
        Returns:
        The new dictionary to return.
      • applyScalarOpWithReference

        public ADictionary applyScalarOpWithReference​(ScalarOperator op,
                                                      double[] reference,
                                                      double[] newReference)
        Description copied from class: ADictionary
        Allocate a new dictionary, apply the scalar operation to each cell, and return the new dictionary. Each cell is computed as:
        outValues[j] = op(this.values[j] + reference[i]) - newReference[i]
        Specified by:
        applyScalarOpWithReference in class ADictionary
        Parameters:
        op - The operator to apply to each cell.
        reference - The reference value to add before the operator.
        newReference - The reference value to subtract after the operator.
        Returns:
        A New Dictionary.
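The formula outValues[j] = op(this.values[j] + reference[i]) - newReference[i] can be sketched on plain arrays. The sketch assumes row-major linearized values and that i is the column of cell j (i = j % nCol, with nCol taken from reference.length); the class name is hypothetical and a DoubleUnaryOperator stands in for the ScalarOperator.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration; not the SystemDS implementation.
final class ScalarOpWithReferenceSketch {

    static double[] applyWithReference(double[] values, DoubleUnaryOperator op,
            double[] reference, double[] newReference) {
        int nCol = reference.length; // assumed: one reference value per column
        double[] out = new double[values.length];
        for (int j = 0; j < values.length; j++) {
            int i = j % nCol; // column of cell j in a row-major layout
            // Add the old reference, apply the operator, subtract the new reference.
            out[j] = op.applyAsDouble(values[j] + reference[i]) - newReference[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4};
        double[] reference = {10, 20};
        double[] newReference = {20, 40}; // here chosen as op(reference)
        double[] out = applyWithReference(values, x -> x * 2, reference, newReference);
        System.out.println(Arrays.toString(out)); // [2.0, 4.0, 6.0, 8.0]
    }
}
```

In this example the new reference is the operator applied to the old reference, so for the linear operator x * 2 the stored offsets are simply doubled.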
      • applyUnaryOpWithReference

        public ADictionary applyUnaryOpWithReference​(UnaryOperator op,
                                                     double[] reference,
                                                     double[] newReference)
        Description copied from class: ADictionary
        Allocate a new dictionary, apply the unary operator to each cell, and return the new dictionary. Each cell is computed as:
        outValues[j] = op(this.values[j] + reference[i]) - newReference[i]
        Specified by:
        applyUnaryOpWithReference in class ADictionary
        Parameters:
        op - The unary operator to apply to each cell.
        reference - The reference value to add before the operator.
        newReference - The reference value to subtract after the operator.
        Returns:
        A New Dictionary.
      • inplaceScalarOp

        public ADictionary inplaceScalarOp​(ScalarOperator op)
        Description copied from class: ADictionary
        Applies the scalar operation on the dictionary. Note that this operation modifies the underlying data, and normally requires a copy of the original dictionary to preserve old objects.
        Specified by:
        inplaceScalarOp in class ADictionary
        Parameters:
        op - The operator to apply to the dictionary values.
        Returns:
        this dictionary with modified values.
      • binOpLeft

        public ADictionary binOpLeft​(BinaryOperator op,
                                     double[] v,
                                     int[] colIndexes)
        Description copied from class: ADictionary
        Apply binary row operation on the left side in place
        Specified by:
        binOpLeft in class ADictionary
        Parameters:
        op - The operation to this dictionary
        v - The values to use on the left hand side.
        colIndexes - The column indexes to consider inside v.
        Returns:
        A new dictionary containing the updated values.
      • binOpLeftWithReference

        public Dictionary binOpLeftWithReference​(BinaryOperator op,
                                                 double[] v,
                                                 int[] colIndexes,
                                                 double[] reference,
                                                 double[] newReference)
        Description copied from class: ADictionary
        Apply the binary operator such that each value is offset by the reference before application. Then put the result into the new dictionary, but offset it by the new reference:
        outValues[j] = op(v[colIndexes[i]], this.values[j] + reference[i]) - newReference[i]
        Specified by:
        binOpLeftWithReference in class ADictionary
        Parameters:
        op - The operation to apply on the dictionary values.
        v - The values to use on the left side of the operator.
        colIndexes - The column indexes to use.
        reference - The reference value to add before operator.
        newReference - The reference value to subtract after operator.
        Returns:
        A new dictionary.
      • binOpRight

        public MatrixBlockDictionary binOpRight​(BinaryOperator op,
                                                double[] v,
                                                int[] colIndexes)
        Description copied from class: ADictionary
        Apply binary row operation on the right side.
        Specified by:
        binOpRight in class ADictionary
        Parameters:
        op - The operation to this dictionary
        v - The values to use on the right hand side.
        colIndexes - The column indexes to consider inside v.
        Returns:
        A new dictionary containing the updated values.
      • binOpRight

        public MatrixBlockDictionary binOpRight​(BinaryOperator op,
                                                double[] v)
        Description copied from class: ADictionary
        Apply binary row operation on the right side.
        Specified by:
        binOpRight in class ADictionary
        Parameters:
        op - The operation to this dictionary
        v - The values to apply on the dictionary (same number of cols as the dictionary)
        Returns:
        A new dictionary containing the updated values.
      • binOpRightWithReference

        public Dictionary binOpRightWithReference​(BinaryOperator op,
                                                  double[] v,
                                                  int[] colIndexes,
                                                  double[] reference,
                                                  double[] newReference)
        Description copied from class: ADictionary
        Apply the binary operator such that each value is offset by the reference before application. Then put the result into the new dictionary, but offset it by the new reference:
        outValues[j] = op(this.values[j] + reference[i], v[colIndexes[i]]) - newReference[i]
        Specified by:
        binOpRightWithReference in class ADictionary
        Parameters:
        op - The operation to apply on the dictionary values.
        v - The values to use on the right side of the operator.
        colIndexes - The column indexes to use.
        reference - The reference value to add before operator.
        newReference - The reference value to subtract after operator.
        Returns:
        A new dictionary.
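The left and right variants differ only in operand order, which matters for non-commutative operators such as subtraction. A minimal sketch on plain arrays, assuming row-major linearized values and i = j % nCol; the class and method names are hypothetical and a DoubleBinaryOperator stands in for BinaryOperator.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

// Hypothetical illustration; not the SystemDS implementation.
final class BinOpWithReferenceSketch {

    // Right variant: dictionary value on the left of op, v on the right.
    static double[] right(double[] values, DoubleBinaryOperator op, double[] v,
            int[] colIndexes, double[] reference, double[] newReference) {
        int nCol = colIndexes.length;
        double[] out = new double[values.length];
        for (int j = 0; j < values.length; j++) {
            int i = j % nCol;
            out[j] = op.applyAsDouble(values[j] + reference[i], v[colIndexes[i]]) - newReference[i];
        }
        return out;
    }

    // Left variant: v on the left of op, dictionary value on the right.
    static double[] left(double[] values, DoubleBinaryOperator op, double[] v,
            int[] colIndexes, double[] reference, double[] newReference) {
        int nCol = colIndexes.length;
        double[] out = new double[values.length];
        for (int j = 0; j < values.length; j++) {
            int i = j % nCol;
            out[j] = op.applyAsDouble(v[colIndexes[i]], values[j] + reference[i]) - newReference[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4};
        double[] v = {100, 0, 5, 0, 7}; // full row vector; columns 2 and 4 are used
        int[] cols = {2, 4};
        double[] ref = {10, 20};
        double[] newRef = {0, 0};
        DoubleBinaryOperator minus = (a, b) -> a - b;
        System.out.println(Arrays.toString(right(values, minus, v, cols, ref, newRef)));
        System.out.println(Arrays.toString(left(values, minus, v, cols, ref, newRef)));
    }
}
```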
      • isLossy

        public boolean isLossy()
        Description copied from class: ADictionary
        Specify if the Dictionary is lossy.
        Specified by:
        isLossy in class ADictionary
        Returns:
        A boolean
      • getNumberOfValues

        public int getNumberOfValues​(int ncol)
        Description copied from class: ADictionary
        Get the number of distinct tuples given that the column group has ncol columns
        Specified by:
        getNumberOfValues in class ADictionary
        Parameters:
        ncol - The number of Columns in the ColumnGroup.
        Returns:
        the number of value tuples contained in the dictionary.
      • sumAllRowsToDouble

        public double[] sumAllRowsToDouble​(int nrColumns)
        Description copied from class: ADictionary
        Method used as a pre-aggregate of each tuple in the dictionary, to single double values. Note that if the number of columns is one, the dictionary's actual values are simply returned.
        Specified by:
        sumAllRowsToDouble in class ADictionary
        Parameters:
        nrColumns - The number of columns in the ColGroup to know how to get the values from the dictionary.
        Returns:
        a double array containing the row sums from this dictionary.
      • sumAllRowsToDoubleWithDefault

        public double[] sumAllRowsToDoubleWithDefault​(double[] defaultTuple)
        Description copied from class: ADictionary
        Do exactly the same as sumAllRowsToDouble, but also sum the given array into an extra index at the end of the returned array.
        Specified by:
        sumAllRowsToDoubleWithDefault in class ADictionary
        Parameters:
        defaultTuple - The default row to sum in the end index returned.
        Returns:
        a double array containing the row sums from this dictionary.
      • sumAllRowsToDoubleWithReference

        public double[] sumAllRowsToDoubleWithReference​(double[] reference)
        Description copied from class: ADictionary
        Method used as a pre-aggregate of each tuple in the dictionary, to single double values with a reference.
        Specified by:
        sumAllRowsToDoubleWithReference in class ADictionary
        Parameters:
        reference - The reference values to add to each cell.
        Returns:
        a double array containing the row sums from this dictionary.
      • sumAllRowsToDoubleSq

        public double[] sumAllRowsToDoubleSq​(int nrColumns)
        Description copied from class: ADictionary
        Method used as a pre-aggregate of each tuple in the dictionary, to single double values. Note that if the number of columns is one, the dictionary's actual values are simply returned.
        Specified by:
        sumAllRowsToDoubleSq in class ADictionary
        Parameters:
        nrColumns - The number of columns in the ColGroup to know how to get the values from the dictionary.
        Returns:
        a double array containing the row sums from this dictionary.
      • sumAllRowsToDoubleSqWithDefault

        public double[] sumAllRowsToDoubleSqWithDefault​(double[] defaultTuple)
        Description copied from class: ADictionary
        Method used as a pre-aggregate of each tuple in the dictionary, to single double values. It also appends one extra cell to the returned array that holds the sum of the given defaultTuple.
        Specified by:
        sumAllRowsToDoubleSqWithDefault in class ADictionary
        Parameters:
        defaultTuple - The default row to sum in the end index returned.
        Returns:
        a double array containing the row sums from this dictionary.
      • sumAllRowsToDoubleSqWithReference

        public double[] sumAllRowsToDoubleSqWithReference​(double[] reference)
        Description copied from class: ADictionary
        Method used as a pre-aggregate of each tuple in the dictionary, to single double values.
        Specified by:
        sumAllRowsToDoubleSqWithReference in class ADictionary
        Parameters:
        reference - The reference values to add to each cell.
        Returns:
        a double array containing the row sums from this dictionary.
      • colSum

        public void colSum​(double[] c,
                           int[] counts,
                           int[] colIndexes)
        Description copied from class: ADictionary
        Get the column sum of the values contained in the dictionary
        Specified by:
        colSum in class ADictionary
        Parameters:
        c - The output array allocated to contain all column groups output.
        counts - The counts of the individual tuples.
        colIndexes - The column indexes of the parent column group, indicating where to put each column sum in the c output.
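The interplay of counts and colIndexes can be sketched on plain arrays: each tuple contributes to its column sum as many times as it occurs. The sketch assumes row-major linearized values with colIndexes.length columns; the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration; not the SystemDS implementation.
final class ColSumSketch {

    // Adds counts[r] * values[r][j] into c[colIndexes[j]] for every tuple r.
    static void colSum(double[] values, double[] c, int[] counts, int[] colIndexes) {
        int nCol = colIndexes.length;
        int nRows = values.length / nCol;
        for (int r = 0; r < nRows; r++)
            for (int j = 0; j < nCol; j++)
                c[colIndexes[j]] += counts[r] * values[r * nCol + j];
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4}; // tuples (1,2) and (3,4)
        int[] counts = {3, 1};          // tuple 0 occurs three times, tuple 1 once
        double[] c = new double[3];     // output covers three matrix columns
        colSum(values, c, counts, new int[] {0, 2});
        System.out.println(Arrays.toString(c)); // [6.0, 0.0, 10.0]
    }
}
```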
      • colSumSq

        public void colSumSq​(double[] c,
                             int[] counts,
                             int[] colIndexes)
        Description copied from class: ADictionary
        Get the column sum of the squared values contained in the dictionary
        Specified by:
        colSumSq in class ADictionary
        Parameters:
        c - The output array allocated to contain all column groups output.
        counts - The counts of the individual tuples.
        colIndexes - The column indexes of the parent column group, indicating where to put each column sum in the c output.
      • colSumSqWithReference

        public void colSumSqWithReference​(double[] c,
                                          int[] counts,
                                          int[] colIndexes,
                                          double[] reference)
        Description copied from class: ADictionary
        Get the column sum of the squared values contained in the dictionary, with an offset reference value added to each cell before squaring.
        Specified by:
        colSumSqWithReference in class ADictionary
        Parameters:
        c - The output array allocated to contain all column groups output.
        counts - The counts of the individual tuples.
        colIndexes - The column indexes of the parent column group, indicating where to put each column sum in the c output.
        reference - The reference values to add to each cell.
      • sum

        public double sum​(int[] counts,
                          int ncol)
        Description copied from class: ADictionary
        Get the sum of the values contained in the dictionary
        Specified by:
        sum in class ADictionary
        Parameters:
        counts - The counts of the individual tuples
        ncol - The number of columns contained
        Returns:
        The sum scaled by the counts provided.
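"Scaled by the counts" means that each tuple's sum is multiplied by the number of times the tuple occurs in the column group. A minimal sketch on a row-major linearized array; the class name is hypothetical.

```java
// Hypothetical illustration; not the SystemDS implementation.
final class CountScaledSumSketch {

    // Sums every tuple once, then weights it by its occurrence count.
    static double sum(double[] values, int[] counts, int nCol) {
        double total = 0;
        for (int r = 0; r < counts.length; r++) {
            double tupleSum = 0;
            for (int j = 0; j < nCol; j++)
                tupleSum += values[r * nCol + j];
            total += counts[r] * tupleSum;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4}; // tuples (1,2) and (3,4)
        int[] counts = {3, 1};
        // 3 * (1 + 2) + 1 * (3 + 4) = 16
        System.out.println(sum(values, counts, 2));
    }
}
```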
      • sumSq

        public double sumSq​(int[] counts,
                            int ncol)
        Description copied from class: ADictionary
        Get the square sum of the values contained in the dictionary
        Specified by:
        sumSq in class ADictionary
        Parameters:
        counts - The counts of the individual tuples
        ncol - The number of columns contained
        Returns:
        The square sum scaled by the counts provided.
      • sumSqWithReference

        public double sumSqWithReference​(int[] counts,
                                         double[] reference)
        Description copied from class: ADictionary
        Get the square sum of the values contained in the dictionary with a reference offset on each value.
        Specified by:
        sumSqWithReference in class ADictionary
        Parameters:
        counts - The counts of the individual tuples
        reference - The reference value
        Returns:
        The square sum scaled by the counts and reference.
      • sliceOutColumnRange

        public ADictionary sliceOutColumnRange​(int idxStart,
                                               int idxEnd,
                                               int previousNumberOfColumns)
        Description copied from class: ADictionary
        Modify the dictionary by removing columns not within the index range.
        Specified by:
        sliceOutColumnRange in class ADictionary
        Parameters:
        idxStart - The column index to start at.
        idxEnd - The column index to end at (not inclusive)
        previousNumberOfColumns - The number of columns contained in the dictionary.
        Returns:
        A dictionary containing the sliced out columns values only.
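Slicing a column range out of a row-major linearized dictionary amounts to copying the cells in [idxStart, idxEnd) from every tuple. The sketch below works on a plain array under that layout assumption; the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration; not the SystemDS implementation.
final class SliceColumnsSketch {

    // Keeps columns idxStart (inclusive) to idxEnd (exclusive) of every tuple.
    static double[] sliceOutColumnRange(double[] values, int idxStart, int idxEnd,
            int previousNumberOfColumns) {
        int nRows = values.length / previousNumberOfColumns;
        int newCols = idxEnd - idxStart;
        double[] out = new double[nRows * newCols];
        for (int r = 0; r < nRows; r++)
            System.arraycopy(values, r * previousNumberOfColumns + idxStart,
                out, r * newCols, newCols);
        return out;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6}; // tuples (1,2,3) and (4,5,6)
        System.out.println(Arrays.toString(sliceOutColumnRange(values, 1, 3, 3)));
        // [2.0, 3.0, 5.0, 6.0]
    }
}
```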
      • containsValue

        public boolean containsValue​(double pattern)
        Description copied from class: ADictionary
        Detect if the dictionary contains a specific value.
        Specified by:
        containsValue in class ADictionary
        Parameters:
        pattern - The value to search for
        Returns:
        true if the value is contained else false.
      • containsValueWithReference

        public boolean containsValueWithReference​(double pattern,
                                                  double[] reference)
        Description copied from class: ADictionary
        Detect if the dictionary contains a specific value with reference offset.
        Specified by:
        containsValueWithReference in class ADictionary
        Parameters:
        pattern - The pattern/value to search for
        reference - The reference double array.
        Returns:
        true if the value is contained else false.
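        The lookup with a reference offset might be pictured as below. This is a sketch under assumptions: the class ContainsSketch, the row-major values array, and the explicit NaN handling are illustrative choices, not confirmed library behavior.

```java
final class ContainsSketch {
    // Hypothetical helper: each stored cell is compared after adding the
    // reference value of its column.
    static boolean containsValueWithReference(double[] values, double pattern,
            double[] reference) {
        final int nCol = reference.length;
        for (int i = 0; i < values.length; i++) {
            final double v = values[i] + reference[i % nCol];
            // NaN never equals itself, so this sketch matches it explicitly.
            if (v == pattern || (Double.isNaN(v) && Double.isNaN(pattern))) {
                return true;
            }
        }
        return false;
    }
}
```

        Note that the search is on the offset values: with stored tuples (1, 2) and (3, 4) and reference {10, 20}, the value 22 is found while the raw stored value 2 is not.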
      • getNumberNonZeros

        public long getNumberNonZeros​(int[] counts,
                                      int nCol)
        Description copied from class: ADictionary
        Calculate the number of non zeros in the dictionary. The number of non zeros should be scaled with the counts given. This gives the exact number of non zero values in the parent column group.
        Specified by:
        getNumberNonZeros in class ADictionary
        Parameters:
        counts - The counts of each dictionary entry
        nCol - The number of columns in this dictionary
        Returns:
        The nonZero count
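        To illustrate how the counts scale the result, here is a minimal sketch on a dense row-major array. The class NnzSketch and the flat layout are assumptions made for this example only.

```java
final class NnzSketch {
    // Hypothetical helper: counts the non-zero cells of each tuple and
    // multiplies by the number of rows that reference the tuple.
    static long getNumberNonZeros(double[] values, int[] counts, int nCol) {
        long nnz = 0;
        for (int i = 0; i < counts.length; i++) {
            int nonZerosInTuple = 0;
            for (int j = 0; j < nCol; j++) {
                if (values[i * nCol + j] != 0) {
                    nonZerosInTuple++;
                }
            }
            nnz += (long) nonZerosInTuple * counts[i];
        }
        return nnz;
    }
}
```

        For tuples (0, 1) and (2, 3) with counts {5, 2}, the first tuple contributes 1 * 5 and the second 2 * 2.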
      • getNumberNonZerosWithReference

        public long getNumberNonZerosWithReference​(int[] counts,
                                                   double[] reference,
                                                   int nRows)
        Description copied from class: ADictionary
        Calculate the number of non zeros in the dictionary. Each value in the dictionary should be added to the reference value. The number of non zeros should be scaled with the given counts.
        Specified by:
        getNumberNonZerosWithReference in class ADictionary
        Parameters:
        counts - The Counts of each dict entry.
        reference - The reference vector.
        nRows - The number of rows in the input.
        Returns:
        The NonZero Count.
      • addToEntry

        public void addToEntry​(double[] v,
                               int fr,
                               int to,
                               int nCol)
        Description copied from class: ADictionary
        Copies and adds the dictionary entry from this dictionary to the target array v.
        Specified by:
        addToEntry in class ADictionary
        Parameters:
        v - the target dictionary (dense double array)
        fr - the from index
        to - the to index
        nCol - the number of columns
      • addToEntry

        public void addToEntry​(double[] v,
                               int fr,
                               int to,
                               int nCol,
                               int rep)
        Description copied from class: ADictionary
        Copies and adds the dictionary entry from this dictionary to the target array v, rep times.
        Specified by:
        addToEntry in class ADictionary
        Parameters:
        v - the target dictionary (dense double array)
        fr - the from index
        to - the to index
        nCol - the number of columns
        rep - the number of repetitions to apply (the entry is multiplied by rep rather than added in a loop)
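        A minimal sketch of the repeated add, assuming that fr and to are tuple indexes that are scaled by nCol to obtain array offsets, and that the dictionary is a dense row-major array. The class AddToEntrySketch and the dict parameter are hypothetical.

```java
final class AddToEntrySketch {
    // Hypothetical helper: adds tuple 'fr' of 'dict' into tuple 'to' of 'v',
    // weighted by 'rep'.
    static void addToEntry(double[] dict, double[] v, int fr, int to, int nCol, int rep) {
        final int sourceOffset = fr * nCol; // assumed row-major addressing
        final int targetOffset = to * nCol;
        for (int j = 0; j < nCol; j++) {
            // One multiply replaces 'rep' repeated additions.
            v[targetOffset + j] += dict[sourceOffset + j] * rep;
        }
    }
}
```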
      • addToEntryVectorized

        public void addToEntryVectorized​(double[] v,
                                         int f1,
                                         int f2,
                                         int f3,
                                         int f4,
                                         int f5,
                                         int f6,
                                         int f7,
                                         int f8,
                                         int t1,
                                         int t2,
                                         int t3,
                                         int t4,
                                         int t5,
                                         int t6,
                                         int t7,
                                         int t8,
                                         int nCol)
        Specified by:
        addToEntryVectorized in class ADictionary
      • subtractTuple

        public ADictionary subtractTuple​(double[] tuple)
        Description copied from class: ADictionary
        Allocate a new dictionary where the tuple given is subtracted from all tuples in the previous dictionary.
        Specified by:
        subtractTuple in class ADictionary
        Parameters:
        tuple - a double array representing a tuple; the tuple width is assumed to be the same as this dictionary's.
        Returns:
        a new instance of dictionary with the tuple subtracted.
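        The subtraction can be sketched as follows on a dense row-major array. The class SubtractTupleSketch is hypothetical, and the flat layout is an assumption; the point is that a new array is allocated and the input is left untouched.

```java
final class SubtractTupleSketch {
    // Hypothetical helper: subtracts 'tuple' column-wise from every stored tuple.
    static double[] subtractTuple(double[] values, double[] tuple) {
        final int nCol = tuple.length;
        final double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] - tuple[i % nCol];
        }
        return out;
    }
}
```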
      • getMBDict

        public MatrixBlockDictionary getMBDict​(int nCol)
        Description copied from class: ADictionary
        Get this dictionary as a MatrixBlock dictionary. This allows us to use optimized kernels coded elsewhere in the system, such as matrix multiplication. Returns null if the matrix is empty.
        Specified by:
        getMBDict in class ADictionary
        Parameters:
        nCol - The number of columns contained in this column group.
        Returns:
        A Dictionary containing a MatrixBlock.
      • getString

        public String getString​(int colIndexes)
        Description copied from class: ADictionary
        Get a string representation of the dictionary, that considers the layout of the data.
        Specified by:
        getString in class ADictionary
        Parameters:
        colIndexes - The number of columns in the dictionary.
        Returns:
        A string that is nicer to print.
      • scaleTuples

        public ADictionary scaleTuples​(int[] scaling,
                                       int nCol)
        Description copied from class: ADictionary
        Scale all tuples contained in the dictionary by the scaling factor given in the int list.
        Specified by:
        scaleTuples in class ADictionary
        Parameters:
        scaling - The amount to multiply the given tuples with
        nCol - The number of columns contained in this column group.
        Returns:
        A new dictionary (the underlying dictionary is not modified)
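        As an illustration of the scaling, the sketch below multiplies every cell of tuple i by scaling[i] and returns a fresh array. The class ScaleTuplesSketch and the dense row-major layout are assumptions, not the library's internals.

```java
final class ScaleTuplesSketch {
    // Hypothetical helper: scales tuple i by scaling[i] into a new array.
    static double[] scaleTuples(double[] values, int[] scaling, int nCol) {
        final double[] out = new double[values.length];
        for (int i = 0; i < scaling.length; i++) {
            for (int j = 0; j < nCol; j++) {
                out[i * nCol + j] = values[i * nCol + j] * scaling[i];
            }
        }
        return out;
    }
}
```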
      • getExactSizeOnDisk

        public long getExactSizeOnDisk()
        Description copied from class: ADictionary
        Calculate the space consumption if the dictionary is stored on disk.
        Specified by:
        getExactSizeOnDisk in class ADictionary
        Returns:
        the long count of bytes to store the dictionary.
      • preaggValuesFromDense

        public MatrixBlockDictionary preaggValuesFromDense​(int numVals,
                                                           int[] colIndexes,
                                                           int[] aggregateColumns,
                                                           double[] b,
                                                           int cut)
        Description copied from class: ADictionary
        Pre Aggregate values for Right Matrix Multiplication.
        Specified by:
        preaggValuesFromDense in class ADictionary
        Parameters:
        numVals - The number of values contained in this dictionary
        colIndexes - The column indexes that are associated with the parent column group
        aggregateColumns - The columns to aggregate; this list is preprocessed to exclude empty columns
        b - The values in the right hand side matrix
        cut - The number of columns in b.
        Returns:
        A new dictionary with the pre aggregated values.
      • replace

        public ADictionary replace​(double pattern,
                                   double replace,
                                   int nCol)
        Description copied from class: ADictionary
        Make a copy of the values, and replace all values that match pattern with replacement value. If needed add a new column index.
        Specified by:
        replace in class ADictionary
        Parameters:
        pattern - The value to look for
        replace - The value to replace the other value with
        nCol - The number of columns contained in the dictionary.
        Returns:
        A new dictionary with the matching values replaced.
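        The copy-and-replace behavior might look like the sketch below. The class ReplaceSketch is hypothetical, the dictionary is assumed to be a dense array, and matching a NaN pattern via Double.isNaN is a choice made for this sketch rather than documented behavior.

```java
final class ReplaceSketch {
    // Hypothetical helper: returns a copy of 'values' in which every cell
    // equal to 'pattern' is set to 'replacement'.
    static double[] replace(double[] values, double pattern, double replacement) {
        final double[] out = values.clone();
        final boolean nanPattern = Double.isNaN(pattern);
        for (int i = 0; i < out.length; i++) {
            // NaN != NaN in Java, so a NaN pattern needs its own test.
            final boolean match = nanPattern ? Double.isNaN(out[i]) : out[i] == pattern;
            if (match) {
                out[i] = replacement;
            }
        }
        return out;
    }
}
```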
      • replaceWithReference

        public ADictionary replaceWithReference​(double pattern,
                                                double replace,
                                                double[] reference)
        Description copied from class: ADictionary
        Make a copy of the values, and replace all values that match pattern with replacement value. If needed add a new column index. The reference is applied such that each value in the dictionary is considered offset by the values contained in the reference.
        Specified by:
        replaceWithReference in class ADictionary
        Parameters:
        pattern - The value to look for
        replace - The value to replace the other value with
        reference - The reference tuple to add to all entries when replacing
        Returns:
        A new dictionary with the matching values replaced.
      • product

        public void product​(double[] ret,
                            int[] counts,
                            int nCol)
        Description copied from class: ADictionary
        Calculate the product of the dictionary weighted by counts.
        Specified by:
        product in class ADictionary
        Parameters:
        ret - The result dense double array (containing one value)
        counts - The count of individual tuples
        nCol - Number of columns in the dictionary.
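        One way to read "weighted by counts" is that each cell contributes its value raised to the power of its tuple's count. The sketch below follows that reading; the class ProductSketch, the dense row-major layout, and the convention that ret[0] already holds the running product (1 to start) are assumptions.

```java
final class ProductSketch {
    // Hypothetical helper: multiplies ret[0] by value^count for every cell.
    static void product(double[] values, double[] ret, int[] counts, int nCol) {
        for (int i = 0; i < counts.length; i++) {
            for (int j = 0; j < nCol; j++) {
                // A tuple occurring counts[i] times contributes value^counts[i].
                ret[0] *= Math.pow(values[i * nCol + j], counts[i]);
            }
        }
    }
}
```

        With single-column tuples 2 and 3 and counts {2, 1}, starting from ret[0] = 1, the sketch computes 2 * 2 * 3.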
      • productWithDefault

        public void productWithDefault​(double[] ret,
                                       int[] counts,
                                       double[] def,
                                       int defCount)
        Description copied from class: ADictionary
        Calculate the product of the dictionary weighted by counts with a default value added.
        Specified by:
        productWithDefault in class ADictionary
        Parameters:
        ret - The result dense double array (containing one value)
        counts - The count of individual tuples
        def - The default tuple
        defCount - The count of the default tuple
      • productWithReference

        public void productWithReference​(double[] ret,
                                         int[] counts,
                                         double[] reference,
                                         int refCount)
        Description copied from class: ADictionary
        Calculate the product of the dictionary weighted by counts and offset by reference.
        Specified by:
        productWithReference in class ADictionary
        Parameters:
        ret - The result dense double array (containing one value)
        counts - The counts of each entry in the dictionary
        reference - The reference value.
        refCount - The number of occurrences of the reference value.
      • centralMoment

        public CM_COV_Object centralMoment​(CM_COV_Object ret,
                                           ValueFunction fn,
                                           int[] counts,
                                           int nRows)
        Description copied from class: ADictionary
        Central moment function to calculate the central moment of this column group. MUST be on a single column dictionary.
        Specified by:
        centralMoment in class ADictionary
        Parameters:
        ret - The Central Moment object to be modified and returned
        fn - The value function to apply
        counts - The weight of individual tuples
        nRows - The number of rows in total of the column group
        Returns:
        The central moment Object
      • centralMomentWithReference

        public CM_COV_Object centralMomentWithReference​(CM_COV_Object ret,
                                                        ValueFunction fn,
                                                        int[] counts,
                                                        double reference,
                                                        int nRows)
        Description copied from class: ADictionary
        Central moment function to calculate the central moment of this column group with a reference offset on each tuple. MUST be on a single column dictionary.
        Specified by:
        centralMomentWithReference in class ADictionary
        Parameters:
        ret - The Central Moment object to be modified and returned
        fn - The value function to apply
        counts - The weight of individual tuples
        reference - The reference value to offset the tuples with
        nRows - The number of rows in total of the column group
        Returns:
        The central moment Object
      • rexpandCols

        public ADictionary rexpandCols​(int max,
                                       boolean ignore,
                                       boolean cast,
                                       int nCol)
        Description copied from class: ADictionary
        Rexpand the dictionary (one hot encode)
        Specified by:
        rexpandCols in class ADictionary
        Parameters:
        max - the tuple width of the output
        ignore - If we should ignore zero and negative values
        cast - If we should cast all double values to whole integer values
        nCol - The number of columns in the dictionary already (should be 1)
        Returns:
        A new dictionary
      • rexpandColsWithReference

        public ADictionary rexpandColsWithReference​(int max,
                                                    boolean ignore,
                                                    boolean cast,
                                                    double reference)
        Description copied from class: ADictionary
        Rexpand the dictionary (one hot encode)
        Specified by:
        rexpandColsWithReference in class ADictionary
        Parameters:
        max - the tuple width of the output
        ignore - If we should ignore zero and negative values
        cast - If we should cast all double values to whole integer values
        reference - A reference value to add to all tuples before expanding
        Returns:
        A new dictionary
      • getSparsity

        public double getSparsity()
        Description copied from class: ADictionary
        Get the sparsity of the dictionary.
        Specified by:
        getSparsity in class ADictionary
        Returns:
        a sparsity between 0 and 1
      • multiplyScalar

        public void multiplyScalar​(double v,
                                   double[] ret,
                                   int off,
                                   int dictIdx,
                                   int[] cols)
        Description copied from class: ADictionary
        Multiply the v value with the dictionary entry at dictIdx and add it to the ret matrix at the columns specified in the int array.
        Specified by:
        multiplyScalar in class ADictionary
        Parameters:
        v - Value to multiply
        ret - Output dense double array location
        off - Offset into the ret array that the "row" output starts at
        dictIdx - The dictionary entry to multiply.
        cols - The columns to multiply into of the output.
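        The multiply-and-add described above can be sketched as follows. The class MultiplyScalarSketch and the dict parameter are hypothetical; the sketch assumes a dense row-major dictionary whose tuple width equals cols.length.

```java
final class MultiplyScalarSketch {
    // Hypothetical helper: scales dictionary tuple 'dictIdx' by 'v' and
    // accumulates it into the output row starting at 'off'.
    static void multiplyScalar(double[] dict, double v, double[] ret, int off,
            int dictIdx, int[] cols) {
        final int nCol = cols.length;      // assumed tuple width
        final int base = dictIdx * nCol;   // start of the tuple in 'dict'
        for (int j = 0; j < nCol; j++) {
            // cols[j] maps the j-th dictionary column to its output column.
            ret[off + cols[j]] += v * dict[base + j];
        }
    }
}
```

        The result is added to ret rather than assigned, so several column groups can accumulate into the same output row.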