Class CompressionSettingsBuilder


  • public class CompressionSettingsBuilder
    extends Object
    Builder for CompressionSettings objects. See CompressionSettings for details on the individual values.
    • Constructor Detail

      • CompressionSettingsBuilder

        public CompressionSettingsBuilder()
    • Method Detail

      • copySettings

        public CompressionSettingsBuilder copySettings​(CompressionSettings that)
        Copy the settings from an existing CompressionSettings object. This modifies this builder, not that.
        Parameters:
        that - The CompressionSettings to copy settings from.
        Returns:
        The modified CompressionSettingsBuilder (the same object).
      • setLossy

        public CompressionSettingsBuilder setLossy​(boolean lossy)
        Specify whether the compression should be lossy.
        Parameters:
        lossy - A boolean specifying if the compression should be lossy.
        Returns:
        The CompressionSettingsBuilder
      • setSamplingRatio

        public CompressionSettingsBuilder setSamplingRatio​(double samplingRatio)
        Set the sampling ratio used to sample the input matrix, expressed as a fraction in the range 0.0 to 1.0.
        Parameters:
        samplingRatio - The ratio to sample from the input
        Returns:
        The CompressionSettingsBuilder
      • setSortValuesByLength

        public CompressionSettingsBuilder setSortValuesByLength​(boolean sortValuesByLength)
        Set the sortValuesByLength flag. When enabled, the dictionaries containing the data are sorted by the number of occurrences of their values in the ColGroup, which improves cache efficiency, especially for diverse column groups.
        Parameters:
        sortValuesByLength - A boolean specifying if the values should be sorted
        Returns:
        The CompressionSettingsBuilder
      • setAllowSharedDictionary

        public CompressionSettingsBuilder setAllowSharedDictionary​(boolean allowSharedDictionary)
        Allow the Dictionaries to be shared between different column groups.
        Parameters:
        allowSharedDictionary - A boolean specifying if the dictionary can be shared between column groups.
        Returns:
        The CompressionSettingsBuilder
      • setTransposeInput

        public CompressionSettingsBuilder setTransposeInput​(String transposeInput)
        Specify if the input matrix should be transposed before compression. Transposing improves cache efficiency while compressing the input matrix.
        Parameters:
        transposeInput - A string specifying if the input should be transposed before compression; must be one of "auto", "true" or "false".
        Returns:
        The CompressionSettingsBuilder
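The three accepted strings can be illustrated with a small validation sketch. The class TransposeInputSketch and its validate method are hypothetical and only mirror the documented values. How the library reacts to any other string is not stated here, so rejecting it with an IllegalArgumentException is an assumption of this example.

```java
// Hypothetical helper, not part of the documented API.
final class TransposeInputSketch {

    // Returns the value unchanged if it is one of the three documented options.
    // Throwing IllegalArgumentException for anything else is an assumption of this sketch.
    static String validate(String transposeInput) {
        switch (transposeInput) {
            case "auto":
            case "true":
            case "false":
                return transposeInput;
            default:
                throw new IllegalArgumentException(
                    "transposeInput must be \"auto\", \"true\" or \"false\", got: " + transposeInput);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("auto"));
        try {
            validate("yes");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that the parameter is a String rather than a boolean, so the values "true" and "false" must be passed as text.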
      • setSeed

        public CompressionSettingsBuilder setSeed​(int seed)
        Set the seed for the compression operation.
        Parameters:
        seed - The seed used in sampling the matrix and general operations in the compression.
        Returns:
        The CompressionSettingsBuilder
      • setValidCompressions

        public CompressionSettingsBuilder setValidCompressions​(EnumSet<AColGroup.CompressionType> validCompressions)
        Set the valid compression strategies used for the compression.
        Parameters:
        validCompressions - An EnumSet of CompressionTypes to use in the compression
        Returns:
        The CompressionSettingsBuilder
      • addValidCompression

        public CompressionSettingsBuilder addValidCompression​(AColGroup.CompressionType cp)
        Add a single valid compression type to the EnumSet of valid compressions.
        Parameters:
        cp - The compression type to add to the valid ones.
        Returns:
        The CompressionSettingsBuilder
      • clearValidCompression

        public CompressionSettingsBuilder clearValidCompression()
        Clear all the compression types allowed in the compression. Afterwards only the Uncompressed ColGroup type is allowed, since it is required for the compression to operate.
        Returns:
        The CompressionSettingsBuilder
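The interplay of addValidCompression and clearValidCompression can be sketched with a plain java.util.EnumSet. Everything below is a stand-in: the enum CompressionType lists illustrative constants only (the real ones live in AColGroup.CompressionType and may differ), and starting with UNCOMPRESSED as the only valid type is an assumption.

```java
import java.util.EnumSet;

// Hypothetical stand-in for the valid-compression handling of the builder.
final class ValidCompressionSketch {

    // Illustrative constants only; not the library's enum.
    enum CompressionType { UNCOMPRESSED, DDC, RLE, OLE }

    // Assumption: the uncompressed type is always part of the set.
    private EnumSet<CompressionType> validCompressions = EnumSet.of(CompressionType.UNCOMPRESSED);

    // Adds one type and returns this instance so calls can be chained.
    ValidCompressionSketch addValidCompression(CompressionType cp) {
        validCompressions.add(cp);
        return this;
    }

    // Resets the set so that only the uncompressed type remains, as documented.
    ValidCompressionSketch clearValidCompression() {
        validCompressions = EnumSet.of(CompressionType.UNCOMPRESSED);
        return this;
    }

    // Returns a defensive copy of the current set.
    EnumSet<CompressionType> validCompressions() {
        return EnumSet.copyOf(validCompressions);
    }

    public static void main(String[] args) {
        ValidCompressionSketch sketch = new ValidCompressionSketch()
            .addValidCompression(CompressionType.DDC)
            .addValidCompression(CompressionType.RLE);
        System.out.println(sketch.validCompressions());
        System.out.println(sketch.clearValidCompression().validCompressions());
    }
}
```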
      • setColumnPartitioner

        public CompressionSettingsBuilder setColumnPartitioner​(CoCoderFactory.PartitionerType columnPartitioner)
        Set the type of CoCoding partitioner to use for combining columns.
        Parameters:
        columnPartitioner - The Strategy to select from PartitionerType
        Returns:
        The CompressionSettingsBuilder
      • setMaxColGroupCoCode

        public CompressionSettingsBuilder setMaxColGroupCoCode​(int maxColGroupCoCode)
        Set the maximum number of columns to CoCode together in the CoCoding strategy. Compression time increases with higher numbers.
        Parameters:
        maxColGroupCoCode - The maximum number of columns to CoCode together.
        Returns:
        The CompressionSettingsBuilder
      • setCoCodePercentage

        public CompressionSettingsBuilder setCoCodePercentage​(double coCodePercentage)
        Set the coCode percentage. The effect differs between coCoding strategies, but in general higher values result in more coCoding and lower values in less. Note that with high coCoding the compression ratio may be lower.
        Parameters:
        coCodePercentage - The percentage to set.
        Returns:
        The CompressionSettingsBuilder
      • setMinimumSampleSize

        public CompressionSettingsBuilder setMinimumSampleSize​(int minimumSampleSize)
        Set the minimum sample size to extract from a given matrix. This overrides the sampling ratio if the sample size it yields is lower than this minimum bound.
        Parameters:
        minimumSampleSize - The minimum sample size to extract
        Returns:
        The CompressionSettingsBuilder
      • setMaxSampleSize

        public CompressionSettingsBuilder setMaxSampleSize​(int maxSampleSize)
        Set the maximum sample size to extract from a given matrix. This overrides the sampling ratio if the sample size it yields is higher than this maximum bound.
        Parameters:
        maxSampleSize - The maximum sample size to extract
        Returns:
        The CompressionSettingsBuilder
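How the sampling ratio and the two sample size bounds could combine can be sketched as a simple clamp. The helper effectiveSampleSize is hypothetical, and both the rounding and the final cap at the number of rows are assumptions of this example; the library may compute the sample size differently.

```java
// Hypothetical helper, not part of the documented API.
final class SampleSizeSketch {

    // Derives a sample size from the ratio, then applies the minimum and maximum bounds.
    static int effectiveSampleSize(int nRows, double samplingRatio, int minimumSampleSize, int maxSampleSize) {
        // Size suggested by the ratio alone (rounding up is an assumption).
        int byRatio = (int) Math.ceil(nRows * samplingRatio);
        // The bounds override the ratio when it falls outside them.
        int bounded = Math.max(minimumSampleSize, Math.min(maxSampleSize, byRatio));
        // Assumption: a sample can never contain more rows than the matrix has.
        return Math.min(nRows, bounded);
    }

    public static void main(String[] args) {
        // Ratio alone suggests 50000 rows; the maximum bound applies.
        System.out.println(effectiveSampleSize(100_000, 0.5, 2_000, 5_000));
        // Ratio alone suggests 4000 rows; neither bound applies.
        System.out.println(effectiveSampleSize(8_000, 0.5, 2_000, 5_000));
        // Ratio alone suggests 1500 rows; the minimum bound applies.
        System.out.println(effectiveSampleSize(3_000, 0.5, 2_000, 5_000));
    }
}
```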
      • setCostType

        public CompressionSettingsBuilder setCostType​(CostEstimatorFactory.CostType costType)
        Set the cost type used for estimating the cost of column groups. The default is memory based.
        Parameters:
        costType - The Cost type wanted
        Returns:
        The CompressionSettingsBuilder
      • setMinimumCompressionRatio

        public CompressionSettingsBuilder setMinimumCompressionRatio​(double ratio)
        Set the minimum compression ratio to be achieved by the compression.
        Parameters:
        ratio - The ratio to achieve while compressing
        Returns:
        The CompressionSettingsBuilder
      • setIsInSparkInstruction

        public CompressionSettingsBuilder setIsInSparkInstruction()
        Inform the compression that it is executed in a Spark instruction.
        Returns:
        The CompressionSettingsBuilder
      • create

        public CompressionSettings create()
        Create the CompressionSettings object to use in the compression.
        Returns:
        The CompressionSettings
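The overall calling pattern (chained setters, copySettings, then create) can be sketched with a reduced stand-in. The nested classes Settings and Builder are hypothetical and carry only two of the documented options; they mirror the documented method shapes, not the library implementation, and the default values are arbitrary.

```java
// Hypothetical stand-in that mirrors the shape of the documented builder.
final class BuilderSketch {

    // Immutable result object, standing in for CompressionSettings.
    static final class Settings {
        final boolean lossy;
        final double samplingRatio;

        Settings(boolean lossy, double samplingRatio) {
            this.lossy = lossy;
            this.samplingRatio = samplingRatio;
        }
    }

    // Reduced builder, standing in for CompressionSettingsBuilder.
    static final class Builder {
        private boolean lossy = false;       // arbitrary default for this sketch
        private double samplingRatio = 1.0;  // arbitrary default for this sketch

        // Copies values into this builder; the source object is left untouched.
        Builder copySettings(Settings that) {
            this.lossy = that.lossy;
            this.samplingRatio = that.samplingRatio;
            return this;
        }

        // Each setter returns this builder so calls can be chained.
        Builder setLossy(boolean lossy) {
            this.lossy = lossy;
            return this;
        }

        Builder setSamplingRatio(double samplingRatio) {
            this.samplingRatio = samplingRatio;
            return this;
        }

        // Produces the settings object from the current builder state.
        Settings create() {
            return new Settings(lossy, samplingRatio);
        }
    }

    public static void main(String[] args) {
        Settings base = new Builder().setSamplingRatio(0.5).create();
        Settings derived = new Builder().copySettings(base).setLossy(true).create();
        System.out.println(base.lossy + " " + derived.lossy + " " + derived.samplingRatio);
    }
}
```

Because every setter returns the builder itself, a configuration reads as one chained expression that ends in create().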