Interface LibMatrixCountDistinct


  • public interface LibMatrixCountDistinct
    This interface provides methods for counting the number of distinct values in a MatrixBlock.
    • Field Detail

      • LOG

        static final org.apache.commons.logging.Log LOG
      • minimumSize

        static final int minimumSize
        The minimum number of non-zero cells in the input before approximate techniques are used to count the number of distinct values.
        See Also:
        Constant Field Values
    • Method Detail

      • estimateDistinctValues

        static int estimateDistinctValues(MatrixBlock in,
                                          CountDistinctOperator op)
        Public method to count the number of distinct values inside a matrix. Depending on which CountDistinctOperator is selected, it returns either the exact count or an estimated value.
        TODO: Support counting the number of distinct values along the row or column axis.
        TODO: Add support for distributed Spark operations.
        TODO: If the MatrixBlock type is CompressedMatrix, simply read the values from the ColGroups.
        Parameters:
        in - the input matrix in which to count distinct values
        op - the selected operator to use
        Returns:
        the distinct count
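
        Example (illustrative sketch only): the description above suggests a dispatch between exact and approximate counting, with small inputs always counted exactly. The following self-contained sketch shows one way such a dispatch could look on a plain double[] array. It does not use MatrixBlock or CountDistinctOperator. The class name DistinctCountSketch, the threshold value 1024, the boolean flag, and the choice of a K-Minimum-Values (KMV) estimator are all assumptions made for illustration and are not taken from this interface.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class DistinctCountSketch {
    // Hypothetical threshold; the actual value of minimumSize is not stated here.
    static final int MINIMUM_SIZE = 1024;

    /** Exact count: insert every value into a hash set and return its size. */
    static int countExact(double[] values) {
        Set<Double> seen = new HashSet<>();
        for (double v : values)
            seen.add(v);
        return seen.size();
    }

    /** 64-bit mixing function (SplitMix64 finalizer) used as the hash. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /**
     * Approximate count using K-Minimum-Values: keep the k smallest distinct
     * hashes and estimate the cardinality from the k-th smallest one.
     */
    static int countApproxKmv(double[] values, int k) {
        TreeSet<Long> smallest = new TreeSet<>();
        for (double v : values) {
            long h = mix(Double.doubleToLongBits(v)) >>> 1; // non-negative 63-bit hash
            if (smallest.size() < k)
                smallest.add(h);
            else if (h < smallest.last() && smallest.add(h))
                smallest.pollLast(); // drop the largest to keep exactly k hashes
        }
        // Fewer than k distinct hashes seen: the set size is the count itself.
        if (smallest.size() < k)
            return smallest.size();
        // Normalize the k-th smallest hash to (0, 1] and apply the KMV estimate.
        double kth = (smallest.last() + 1.0) / 0x1p63;
        return (int) Math.round((k - 1) / kth);
    }

    /** Dispatch: small inputs or an exact request use the exact count. */
    static int estimateDistinct(double[] values, boolean approximate) {
        int nonZeros = 0;
        for (double v : values)
            if (v != 0)
                nonZeros++;
        if (!approximate || nonZeros < MINIMUM_SIZE)
            return countExact(values);
        return countApproxKmv(values, 64);
    }

    public static void main(String[] args) {
        double[] small = {1, 2, 2, 3, 3, 3};
        System.out.println(estimateDistinct(small, true)); // below threshold, counted exactly
    }
}
```

        In this sketch the approximate path is taken only when an estimate was requested and the input has at least MINIMUM_SIZE non-zero values, which mirrors the role that minimumSize is documented to play. The real operator may select among different estimators.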