Class DDCArray<T>
java.lang.Object
  org.apache.sysds.runtime.frame.data.columns.Array<T>
    org.apache.sysds.runtime.frame.data.columns.ACompressedArray<T>
      org.apache.sysds.runtime.frame.data.columns.DDCArray<T>

- All Implemented Interfaces:
 org.apache.hadoop.io.Writable

public class DDCArray<T> extends ACompressedArray<T>
A dense dictionary version of a column array.
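
The idea behind a dense dictionary column can be sketched without any SystemDS classes. The example below is a minimal illustration, assuming the usual dictionary-coding layout: a dictionary of distinct values plus one dictionary index per row. The class `DdcSketch`, its fields, and its methods are hypothetical names; the real class holds an `Array<T>` dictionary and an `AMapToData` map instead of the plain arrays used here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dense dictionary coding; not SystemDS code. */
final class DdcSketch {
    final String[] dict; // distinct values, in order of first appearance
    final int[] map;     // one dictionary index per row

    DdcSketch(String[] dict, int[] map) {
        this.dict = dict;
        this.map = map;
    }

    /** Build the dictionary and the row-to-dictionary map from a plain column. */
    static DdcSketch encode(String[] column) {
        Map<String, Integer> ids = new HashMap<>();
        List<String> distinct = new ArrayList<>();
        int[] map = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            Integer id = ids.get(column[i]);
            if (id == null) {
                // First time this value is seen: give it the next dictionary slot.
                id = distinct.size();
                ids.put(column[i], id);
                distinct.add(column[i]);
            }
            map[i] = id;
        }
        return new DdcSketch(distinct.toArray(new String[0]), map);
    }

    /** Look up a row by following its map entry into the dictionary. */
    String get(int row) {
        return dict[map[row]];
    }

    public static void main(String[] args) {
        DdcSketch c = encode(new String[] {"a", "b", "a", "a", "b"});
        System.out.println(c.dict.length + " distinct values, row 2 = " + c.get(2));
    }
}
```

The payoff of this layout is that each distinct value is stored once, so columns with few distinct values shrink to roughly one small index per row.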
- 
Nested Class Summary
- 
Nested classes/interfaces inherited from class org.apache.sysds.runtime.frame.data.columns.Array
Array.ArrayIterator 
- 
Field Summary
- 
Fields inherited from class org.apache.sysds.runtime.frame.data.columns.Array
ROW_PARALLELIZATION_THRESHOLD 
- 
Constructor Summary
Constructors
DDCArray(Array<T> dict, AMapToData map)
- 
Method Summary
Pair<Types.ValueType,Boolean> analyzeValueType(int maxCells)
 Analyze the column to figure out if the value type can be refined to a better type.
Array<T> append(Array<T> other)
 Append other array, if the other array is fitting in current allocated size use that allocated size, otherwise allocate new array to combine the other with this.
Array<?> changeType(Types.ValueType t)
 Change the allocated array to a different type.
Array<?> changeTypeWithNulls(Types.ValueType t)
Array<T> clone()
 Overwrite of the java internal clone function for arrays, return a clone of underlying data that is mutable, (not immutable data.) Immutable data is dependent on the individual allocated arrays
static <T> Array<T> compressToDDC(Array<T> arr)
static <T> Array<T> compressToDDC(Array<T> arr, int estimateUnique)
 Try to compress array into DDC format.
boolean containsNull()
 analyze if the array contains null values.
boolean equals(Array<T> other)
 Equals operation on arrays.
static long estimateInMemorySize(int memSizeBitPerElement, int estDistinct, int nRow)
double[] extractDouble(double[] ret, int rl, int ru)
 Extract the sub array into the ret array as doubles.
T get(int index)
 Get the value at a given index.
byte[] getAsByteArray()
 Return the current allocated Array as a byte[], this is used to serialize the allocated Arrays out to the PythonAPI.
double getAsDouble(int i)
 Get the index's value.
double getAsNaNDouble(int i)
 Get the index's value.
Array<T> getDict()
long getExactSerializedSize()
 Get the exact serialized size on disk of this array.
ArrayFactory.FrameArrayType getFrameArrayType()
 Get the internal FrameArrayType, to specify the encoding of the Types, note there are more Frame Array Types than there is ValueTypes.
long getInMemorySize()
 Get in memory size, not counting reference to this object.
T getInternal(int index)
 Get the internal value at a given index.
AMapToData getMap()
Types.ValueType getValueType()
 Get the current value type of this array.
double hashDouble(int idx)
 Hash the given index of the array.
boolean isEmpty()
 Get if this array is empty, aka filled with empty values.
boolean isNotEmpty(int i)
boolean isShallowSerialize()
 analyze if this array can be shallow serialized.
double[] minMax()
 Get the minimum and maximum double value of this array.
double[] minMax(int l, int u)
 Get the minimum and maximum double value of a specific sub part of this array.
DDCArray<T> nullDict()
boolean possiblyContainsNaN()
static DDCArray<?> read(DataInput in)
void readFields(DataInput in)
Array<T> select(boolean[] select, int nTrue)
 Slice out the true indices in the select input and return the sub array.
Array<T> select(int[] indices)
 Slice out the specified indices and return the sub array.
void set(int rl, int ru, Array<T> value, int rlSrc)
 Set range to given arrays value with an offset into other array
<J> DDCArray<J> setDict(Array<J> dict)
DDCArray<T> setMap(AMapToData map)
Array<T> slice(int rl, int ru)
 Slice out the sub range and return new array with the specified type.
ArrayCompressionStatistics statistics(int nSamples)
 Get the compression statistics of this array allocation.
String toString()
void write(DataOutput out)
- 
Methods inherited from class org.apache.sysds.runtime.frame.data.columns.ACompressedArray
append, append, fill, fill, get, reset, set, set, set, setFromOtherType, setFromOtherTypeNz, setNz 
- 
Methods inherited from class org.apache.sysds.runtime.frame.data.columns.Array
analyzeValueType, baseMemoryCost, changeType, changeType, changeType, changeTypeWithNulls, changeTypeWithNulls, equals, findEmpty, findEmptyInverse, getCache, getIterator, getMinMaxLength, getNulls, getRecodeMap, getRecodeMap, getRecodeMap, set, setCache, setFromOtherTypeNz, setM, setM, setNz, size 
- 
Constructor Detail
- 
DDCArray
public DDCArray(Array<T> dict, AMapToData map)
 
- 
Method Detail
- 
getMap
public AMapToData getMap()
 
- 
setMap
public DDCArray<T> setMap(AMapToData map)
 
- 
compressToDDC
public static <T> Array<T> compressToDDC(Array<T> arr, int estimateUnique)
Try to compress array into DDC format.
- Type Parameters:
 T - The type of the Array
- Parameters:
 arr - The array to try to compress
 estimateUnique - The estimated number of unique values
- Returns:
 Either a compressed version or the original.
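
The "either compressed or original" contract suggests a cost check before encoding. The criterion SystemDS actually applies is not stated on this page, so the sketch below assumes a simple size comparison: encode only when the dictionary plus the per-row indexes are estimated to be smaller than the plain column. `CompressOrKeepSketch`, its cost model, and the index widths are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical compress-or-keep decision; the cost model is an assumption. */
final class CompressOrKeepSketch {

    /** Bytes per map entry needed to address nDistinct dictionary slots (assumed widths). */
    static int bytesPerIndex(int nDistinct) {
        if (nDistinct <= 256) return 1;
        if (nDistinct <= 65536) return 2;
        return 4;
    }

    /** Assumed cost model: the dictionary entries plus one index per row. */
    static long encodedSize(int bytesPerValue, int nDistinct, int nRow) {
        return (long) bytesPerValue * nDistinct + (long) bytesPerIndex(nDistinct) * nRow;
    }

    /** True when the encoded form is estimated to be smaller than the plain long column. */
    static boolean worthCompressing(long[] column) {
        Set<Long> distinct = new HashSet<>();
        for (long v : column) distinct.add(v);
        long plainSize = 8L * column.length; // 8 bytes per long
        return encodedSize(8, distinct.size(), column.length) < plainSize;
    }

    public static void main(String[] args) {
        long[] repetitive = new long[1000];
        for (int i = 0; i < repetitive.length; i++) repetitive[i] = i % 2;
        System.out.println("repetitive column: " + worthCompressing(repetitive));
        System.out.println("all-distinct column: " + worthCompressing(new long[] {1, 2, 3, 4}));
    }
}
```

A caller following this pattern would return the original array unchanged whenever the check fails, which is why the return type is the general `Array<T>` rather than `DDCArray<T>`.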
 
 
- 
write
public void write(DataOutput out) throws IOException
- Throws:
 IOException
 
- 
readFields
public void readFields(DataInput in) throws IOException
- Throws:
 IOException
 
- 
read
public static DDCArray<?> read(DataInput in) throws IOException
- Throws:
 IOException
 
- 
get
public T get(int index)
Description copied from class: Array
Get the value at a given index. This method returns objects that have a high overhead in allocation. Therefore it is not as efficient as using the vectorized operations specified in the object.
- 
getInternal
public T getInternal(int index)
Description copied from class: Array
Get the internal value at a given index. For instance, HashIntegerArray would return the underlying long, not a string.
- Overrides:
 getInternal in class Array<T>
- Parameters:
 index - the index to get
- Returns:
 The value to get
 
 
- 
extractDouble
public double[] extractDouble(double[] ret, int rl, int ru)
Description copied from class: Array
Extract the sub array into the ret array as doubles. The ret array is filled at an offset of -rl, so it should be of length ru - rl.
- Overrides:
 extractDouble in class Array<T>
- Parameters:
 ret - The array to return
 rl - The row to start at
 ru - The row to end at (not inclusive)
- Returns:
 The ret array given as argument
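
The offset convention described above can be sketched as follows. This is an illustration only, assuming a dictionary of doubles and an int map; `ExtractDoubleSketch` and its parameters `dict` and `map` are hypothetical and stand in for the real dictionary and map objects.

```java
/** Hypothetical range extraction over a dictionary-encoded column. */
final class ExtractDoubleSketch {

    /** Decode rows [rl, ru) into ret, writing row i to ret[i - rl]. */
    static double[] extractDouble(double[] dict, int[] map, double[] ret, int rl, int ru) {
        for (int i = rl; i < ru; i++)
            ret[i - rl] = dict[map[i]];
        return ret;
    }

    public static void main(String[] args) {
        double[] dict = {1.5, 2.5};
        int[] map = {0, 1, 1, 0, 1};
        // Rows 1, 2 and 3 are requested, so ret needs length ru - rl = 3.
        double[] ret = extractDouble(dict, map, new double[3], 1, 4);
        System.out.println(java.util.Arrays.toString(ret));
    }
}
```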
 
 
- 
getAsDouble
public double getAsDouble(int i)
Description copied from class: Array
Get the value at the given index. Returns 0 in case of null.
- Specified by:
 getAsDouble in class Array<T>
- Parameters:
 i - index to get value from
- Returns:
 the value
 
 
- 
getAsNaNDouble
public double getAsNaNDouble(int i)
Description copied from class: Array
Get the value at the given index. Returns Double.NaN in case of null.
- Overrides:
 getAsNaNDouble in class Array<T>
- Parameters:
 i - index to get value from
- Returns:
 the value
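
The only documented difference between getAsDouble and getAsNaNDouble is how a null entry is reported. The sketch below shows that difference on a hypothetical dictionary of boxed doubles; `NullAsDoubleSketch` and its parameters are illustrative names, not SystemDS code.

```java
/** Hypothetical null handling for double access on a dictionary-encoded column. */
final class NullAsDoubleSketch {

    /** Null entries are reported as 0. */
    static double getAsDouble(Double[] dict, int[] map, int i) {
        Double v = dict[map[i]];
        return v == null ? 0.0 : v;
    }

    /** Null entries are reported as NaN, so they stay distinguishable from a real 0. */
    static double getAsNaNDouble(Double[] dict, int[] map, int i) {
        Double v = dict[map[i]];
        return v == null ? Double.NaN : v;
    }

    public static void main(String[] args) {
        Double[] dict = {null, 3.0};
        int[] map = {0, 1};
        System.out.println(getAsDouble(dict, map, 0) + " vs " + getAsNaNDouble(dict, map, 0));
    }
}
```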
 
 
- 
append
public Array<T> append(Array<T> other)
Description copied from class: Array
Append the other array. If the other array fits in the currently allocated size, use that allocation; otherwise allocate a new array that combines this array with the other. This method should use the set range function and should be preferred over appending single values.
- 
slice
public Array<T> slice(int rl, int ru)
Description copied from class: Array
Slice out the sub range and return a new array with the specified type. If the conversion fails, fall back to a normal slice.
- 
getAsByteArray
public byte[] getAsByteArray()
Description copied from class: Array
Return the currently allocated Array as a byte[]. This is used to serialize the allocated Arrays out to the Python API.
- Specified by:
 getAsByteArray in class Array<T>
- Returns:
 The array as bytes
 
 
- 
getValueType
public Types.ValueType getValueType()
Description copied from class: Array
Get the current value type of this array.
- Specified by:
 getValueType in class Array<T>
- Returns:
 The current value type.
 
 
- 
analyzeValueType
public Pair<Types.ValueType,Boolean> analyzeValueType(int maxCells)
Description copied from class: Array
Analyze the column to figure out if the value type can be refined to a better type. The return is in two parts, first the type it can be, second if it contains nulls.
- Specified by:
 analyzeValueType in class Array<T>
- Parameters:
 maxCells - maximum number of cells to analyze
- Returns:
 A better or equivalent value type to represent the column, including null information.
 
 
- 
set
public void set(int rl, int ru, Array<T> value, int rlSrc)
Description copied from class: Array
Set a range to the given array's values, with an offset into the other array.
- 
getFrameArrayType
public ArrayFactory.FrameArrayType getFrameArrayType()
Description copied from class: Array
Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than ValueTypes.
- Specified by:
 getFrameArrayType in class Array<T>
- Returns:
 The FrameArrayType
 
 
- 
getExactSerializedSize
public long getExactSerializedSize()
Description copied from class: Array
Get the exact serialized size on disk of this array.
- Specified by:
 getExactSerializedSize in class Array<T>
- Returns:
 The exact size on disk
 
 
- 
changeType
public Array<?> changeType(Types.ValueType t)
Description copied from class: Array
Change the allocated array to a different type. If the type is the same, a deep copy is returned for safety.
- Specified by:
 changeType in class ACompressedArray<T>
- Parameters:
 t - The type to change to
- Returns:
 A new column array.
 
 
- 
changeTypeWithNulls
public Array<?> changeTypeWithNulls(Types.ValueType t)
- Overrides:
 changeTypeWithNulls in class Array<T>
 
- 
isShallowSerialize
public boolean isShallowSerialize()
Description copied from class: Array
Analyze if this array can be shallow serialized, to allow caching without modification.
- Specified by:
 isShallowSerialize in class Array<T>
- Returns:
 true if shallow serialization is available
 
 
- 
isEmpty
public boolean isEmpty()
Description copied from class: Array
Check whether this array is empty, i.e. filled with empty values.
- 
select
public Array<T> select(int[] indices)
Description copied from class: Array
Slice out the specified indices and return the sub array.
- 
select
public Array<T> select(boolean[] select, int nTrue)
Description copied from class: Array
Slice out the true indices in the select input and return the sub array.
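
For a dictionary-encoded column, one plausible way to support a boolean selection is to filter only the row map and keep the dictionary as it is. That approach is an assumption made for illustration and is not taken from the SystemDS source; `SelectSketch` and its parameters are hypothetical.

```java
/** Hypothetical boolean-mask selection that only touches the row map. */
final class SelectSketch {

    /** Keep the map entries whose mask position is true; nTrue sizes the result up front. */
    static int[] select(int[] map, boolean[] select, int nTrue) {
        int[] out = new int[nTrue];
        int k = 0;
        for (int i = 0; i < select.length; i++)
            if (select[i])
                out[k++] = map[i];
        return out;
    }

    public static void main(String[] args) {
        int[] map = {0, 1, 2, 1};
        boolean[] mask = {true, false, true, true};
        System.out.println(java.util.Arrays.toString(select(map, mask, 3)));
    }
}
```

Passing nTrue avoids a second pass over the mask to count the selected rows before allocating the output.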
- 
isNotEmpty
public boolean isNotEmpty(int i)
- Specified by:
 isNotEmpty in class Array<T>
 
- 
clone
public Array<T> clone()
Description copied from class: Array
Override of the Java internal clone function for arrays. Returns a clone of the underlying data that is mutable (not immutable data). Immutable data is dependent on the individual allocated arrays.
- 
hashDouble
public double hashDouble(int idx)
Description copied from class: Array
Hash the given index of the array. It is allowed to return NaN on null elements.
- Specified by:
 hashDouble in class Array<T>
- Parameters:
 idx - The index to hash
- Returns:
 The hash value of that index.
 
 
- 
getInMemorySize
public long getInMemorySize()
Description copied from class: Array
Get in memory size, not counting reference to this object.
- Overrides:
 getInMemorySize in class Array<T>
- Returns:
 the size in memory of this object.
 
 
- 
estimateInMemorySize
public static long estimateInMemorySize(int memSizeBitPerElement, int estDistinct, int nRow) 
- 
containsNull
public boolean containsNull()
Description copied from class: Array
Analyze if the array contains null values.
- Overrides:
 containsNull in class Array<T>
- Returns:
 If the array contains null.
 
 
- 
equals
public boolean equals(Array<T> other)
Description copied from class: Array
Equals operation on arrays.
- 
possiblyContainsNaN
public boolean possiblyContainsNaN()
- Specified by:
 possiblyContainsNaN in class Array<T>
 
- 
minMax
public double[] minMax()
Description copied from class: Array
Get the minimum and maximum double value of this array. Note that we ignore NaN Values.
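
Ignoring NaN while scanning for the minimum and maximum can be sketched as below. The result order {min, max} and the infinities returned for an input without any non-NaN value are assumptions of this sketch; the page does not specify either. `MinMaxSketch` is a hypothetical name.

```java
/** Hypothetical NaN-skipping min/max scan. */
final class MinMaxSketch {

    /** Returns {min, max} over all non-NaN values. */
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (Double.isNaN(v))
                continue; // NaN never participates in the comparison
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] r = minMax(new double[] {3.0, Double.NaN, -1.0, 7.0});
        System.out.println("min=" + r[0] + " max=" + r[1]);
    }
}
```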
- 
minMax
public double[] minMax(int l, int u)
Description copied from class: Array
Get the minimum and maximum double value of a specific sub part of this array. Note that we ignore NaN Values.
- 
statistics
public ArrayCompressionStatistics statistics(int nSamples)
Description copied from class: Array
Get the compression statistics of this array allocation.
- Specified by:
 statistics in class ACompressedArray<T>
- Parameters:
 nSamples - The number of sample elements suggested (not forced) to be used.
- Returns:
 The compression statistics of this array.
 
 