Class Array<T>
- java.lang.Object
-
- org.apache.sysds.runtime.frame.data.columns.Array<T>
-
- All Implemented Interfaces:
org.apache.hadoop.io.Writable
- Direct Known Subclasses:
ABooleanArray, ACompressedArray, CharArray, DoubleArray, FloatArray, HashLongArray, IntegerArray, LongArray, OptionalArray, RaggedArray, StringArray
public abstract class Array<T> extends Object implements org.apache.hadoop.io.Writable
Generic, resizable native arrays for the internal representation of the columns in the FrameBlock. We use this custom class hierarchy instead of Trove or other libraries in order to avoid unnecessary dependencies.
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
class | Array.ArrayIterator |
-
Method Summary
All Methods | Static Methods | Instance Methods | Abstract Methods | Concrete Methods
Modifier and Type | Method | Description
Pair<Types.ValueType,Boolean> | analyzeValueType() | Analyze the column to figure out if the value type can be refined to a better type.
abstract Pair<Types.ValueType,Boolean> | analyzeValueType(int maxCells) | Analyze the column to figure out if the value type can be refined to a better type.
abstract void | append(String value) | Append a string value to the current Array. This should in general be avoided; appending larger blocks at a time is preferred.
abstract Array<T> | append(Array<T> other) | Append the other array. If the other array fits in the currently allocated size, that allocation is used; otherwise a new array is allocated to combine the other with this.
abstract void | append(T value) | Append a value of the same type as the Array.
static long | baseMemoryCost() | Get the base memory cost of the Array's allocation.
Array<?> | changeType(Types.ValueType t) | Change the allocated array to a different type.
Array<?> | changeType(Types.ValueType t, boolean containsNull) |
Array<?> | changeTypeWithNulls(Types.ValueType t) |
abstract Array<T> | clone() | Override of Java's built-in clone function for arrays. Returns a clone of the underlying data that is mutable (not immutable data). Immutable data is dependent on the individual allocated arrays.
boolean | containsNull() | Analyze if the array contains null values.
AMapToData | createMapping(Map<T,Integer> d) |
boolean | equals(Object other) |
abstract boolean | equals(Array<T> other) |
double[] | extractDouble(double[] ret, int rl, int ru) |
abstract void | fill(String val) | Fill the entire array with a specific value.
abstract void | fill(T val) | Fill the entire array with a specific value.
void | findEmpty(boolean[] select) | Find the empty rows. It is assumed that the input is only modified by setting entries to true.
void | findEmptyInverse(boolean[] select) | Find the filled rows. It is assumed that the input is only modified by setting entries to true.
abstract Object | get() | Get the underlying array out of the column group. It is the responsibility of the caller to know what type it is.
abstract T | get(int index) | Get the value at a given index.
abstract byte[] | getAsByteArray() | Return the currently allocated Array as a byte[]. This is used to serialize the allocated Arrays out to the Python API.
abstract double | getAsDouble(int i) | Get the value at the given index as a double.
double | getAsNaNDouble(int i) | Get the value at the given index as a double.
SoftReference<Map<T,Long>> | getCache() | Get the current cached element.
abstract long | getExactSerializedSize() | Get the exact serialized size on disk of this array.
abstract ArrayFactory.FrameArrayType | getFrameArrayType() | Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than ValueTypes.
long | getInMemorySize() | Get the in-memory size, not counting the reference to this object.
Array.ArrayIterator | getIterator() |
Pair<Integer,Integer> | getMinMaxLength() | Get the minimum and maximum length of the contained values as string type.
ABooleanArray | getNulls() |
Map<T,Long> | getRecodeMap() | Get a recode map that maps each unique value in the array to a long ID.
abstract Types.ValueType | getValueType() | Get the current value type of this array.
abstract double | hashDouble(int idx) | Hash the given index of the array.
abstract boolean | isEmpty() | Check if this array is empty, i.e. filled with empty values.
abstract boolean | isNotEmpty(int i) |
abstract boolean | isShallowSerialize() | Analyze if this array can be shallow serialized.
abstract boolean | possiblyContainsNaN() |
abstract void | reset(int size) | Reset the Array and set it to a different size.
Array<?> | safeChangeType(Types.ValueType t, boolean containsNull) |
abstract Array<T> | select(boolean[] select, int nTrue) | Slice out the true indices in the select input and return the sub array.
abstract Array<T> | select(int[] indices) | Slice out the specified indices and return the sub array.
abstract void | set(int index, double value) | Set index to the given double value (cast to the correct type of this array).
void | set(int rl, int ru, Array<T> value) | Set the range to the given array's values.
void | set(int rl, int ru, Array<T> value, int rlSrc) | Set the range to the given array's values, with an offset into the other array.
abstract void | set(int index, String value) | Set index to the value parsed from the given string.
abstract void | set(int index, T value) | Set index to the given value of the same type.
void | setCache(SoftReference<Map<T,Long>> m) | Set the cached hash map of this Array allocation, to be used in transformEncode.
abstract void | setFromOtherType(int rl, int ru, Array<?> value) | Set the range to the given array's values.
abstract void | setFromOtherTypeNz(int rl, int ru, Array<?> value) | Set non-default values in the range from the given value array.
void | setFromOtherTypeNz(Array<?> value) | Set non-default values from the given value array.
abstract void | setNz(int rl, int ru, Array<T> value) | Set non-default values in the range from the given value array.
void | setNz(Array<T> value) | Set non-default values from the given value array.
int | size() | Get the number of elements in the array. This does not necessarily reflect the currently allocated size.
abstract Array<T> | slice(int rl, int ru) | Slice out the sub range and return a new array with the specified type.
ArrayCompressionStatistics | statistics(int nSamples) |
String | toString() |
-
-
-
Method Detail
-
getCache
public final SoftReference<Map<T,Long>> getCache()
Get the current cached element.
- Returns:
- The cached object
-
setCache
public final void setCache(SoftReference<Map<T,Long>> m)
Set the cached hash map of this Array allocation, to be used in transformEncode.
- Parameters:
m - The element to cache.
-
getRecodeMap
public final Map<T,Long> getRecodeMap()
Get a recode map that maps each unique value in the array to a long ID. Null values are ignored and not included in the mapping. The resulting recode map is stored in a soft reference to speed up repeated calls on the same column.
- Returns:
- A recode map
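The caching behaviour described above can be sketched in plain Java. This is a minimal illustration, not the SystemDS implementation: the class `RecodeMapSketch`, its `String[]` backing column, and the choice to hand out IDs starting at 1 in order of first appearance are all assumptions made for the example.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a column; not part of SystemDS.
final class RecodeMapSketch {
    private final String[] column;
    // Plays the role of the value handled by getCache()/setCache().
    private SoftReference<Map<String, Long>> cache;

    RecodeMapSketch(String[] column) {
        this.column = column;
    }

    Map<String, Long> getRecodeMap() {
        Map<String, Long> cached = (cache == null) ? null : cache.get();
        if (cached != null)
            return cached; // reuse as long as the GC has not cleared the reference

        Map<String, Long> map = new HashMap<>();
        for (String v : column) {
            if (v == null)
                continue; // null values are not part of the mapping
            if (!map.containsKey(v))
                map.put(v, (long) map.size() + 1); // assumption: IDs start at 1
        }
        cache = new SoftReference<>(map);
        return map;
    }

    public static void main(String[] args) {
        RecodeMapSketch col = new RecodeMapSketch(new String[] {"a", null, "b", "a"});
        System.out.println(col.getRecodeMap());
    }
}
```

Because the map is only softly referenced, the garbage collector may discard it under memory pressure, in which case the next call simply rebuilds it.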
-
size
public final int size()
Get the number of elements in the array. This does not necessarily reflect the currently allocated size.
- Returns:
- the current number of elements
-
get
public abstract T get(int index)
Get the value at a given index. This method returns objects that are expensive to allocate, so it is less efficient than the vectorized operations this class provides.
- Parameters:
index - The index to query
- Returns:
- The value returned as an object
-
get
public abstract Object get()
Get the underlying array out of the column group. It is the responsibility of the caller to know what type it is. The underlying data structure is not guaranteed to return its internal storage directly; it may have to allocate a new object to answer the call. In practice this means the entire array may be allocated again, so this method should only be used for debugging, not in performance-critical code.
- Returns:
- The underlying array.
-
getAsDouble
public abstract double getAsDouble(int i)
Get the value at the given index as a double. Returns 0 if the value is null.
- Parameters:
i - index to get value from
- Returns:
- the value
-
getAsNaNDouble
public double getAsNaNDouble(int i)
Get the value at the given index as a double. Returns Double.NaN if the value is null.
- Parameters:
i - index to get value from
- Returns:
- the value
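The difference between the two accessors only shows up on null cells. A minimal sketch, assuming a boxed `Integer[]` as the backing storage (the class `NullToDoubleSketch` is hypothetical and stands in for a nullable column):

```java
// Hypothetical helper; mirrors the documented null handling only.
final class NullToDoubleSketch {
    // Null cells are reported as 0, like getAsDouble.
    static double getAsDouble(Integer[] data, int i) {
        return data[i] == null ? 0.0 : data[i];
    }

    // Null cells are reported as NaN, like getAsNaNDouble.
    static double getAsNaNDouble(Integer[] data, int i) {
        return data[i] == null ? Double.NaN : data[i];
    }

    public static void main(String[] args) {
        Integer[] data = {4, null};
        System.out.println(getAsDouble(data, 1));    // 0.0
        System.out.println(getAsNaNDouble(data, 1)); // NaN
    }
}
```

The NaN variant is the one to use when a missing value must stay distinguishable from a genuine zero.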
-
set
public abstract void set(int index, T value)
Set index to the given value of the same type.
- Parameters:
index - The index to set
value - The value to assign
-
set
public abstract void set(int index, double value)
Set index to the given double value (cast to the correct type of this array).
- Parameters:
index - the index to set
value - the value to set it to (before casting to the correct value type)
-
set
public abstract void set(int index, String value)
Set index to the value parsed from the given string.
- Parameters:
index - The index to set
value - The value to assign
-
setFromOtherType
public abstract void setFromOtherType(int rl, int ru, Array<?> value)
Set the range to the given array's values.
- Parameters:
rl - row lower
ru - row upper (inclusive)
value - value array to take values from (other type)
-
set
public void set(int rl, int ru, Array<T> value)
Set the range to the given array's values.
- Parameters:
rl - row lower
ru - row upper (inclusive)
value - value array to take values from (same type)
-
set
public void set(int rl, int ru, Array<T> value, int rlSrc)
Set the range to the given array's values, with an offset into the other array.
- Parameters:
rl - row lower
ru - row upper (inclusive)
value - value array to take values from
rlSrc - the offset into the value array to take values from
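Since `ru` is inclusive in the range setters, the number of copied cells is `ru - rl + 1`. The sketch below illustrates that arithmetic on primitive `double[]` arrays. `RangeSetSketch` is a hypothetical helper, and the assumption that the overload without `rlSrc` starts reading at index 0 of the source is not stated in the documentation above.

```java
import java.util.Arrays;

// Hypothetical helper illustrating inclusive range semantics.
final class RangeSetSketch {
    // Copy src[rlSrc .. rlSrc + (ru - rl)] into dst[rl .. ru]; ru is inclusive.
    static void set(double[] dst, int rl, int ru, double[] src, int rlSrc) {
        System.arraycopy(src, rlSrc, dst, rl, ru - rl + 1);
    }

    // Assumption: without rlSrc, reading starts at index 0 of the source.
    static void set(double[] dst, int rl, int ru, double[] src) {
        set(dst, rl, ru, src, 0);
    }

    public static void main(String[] args) {
        double[] dst = new double[5];
        set(dst, 1, 3, new double[] {9, 8, 7, 6}, 1);
        System.out.println(Arrays.toString(dst)); // [0.0, 8.0, 7.0, 6.0, 0.0]
    }
}
```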
-
setNz
public final void setNz(Array<T> value)
Set non-default values from the given value array.
- Parameters:
value - array of same type and length
-
setNz
public abstract void setNz(int rl, int ru, Array<T> value)
Set non-default values in the range from the given value array.
- Parameters:
rl - row start
ru - row upper (inclusive)
value - value array of same type
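A sketch of what "set non-default values" can look like for an integer column. It assumes that the default value is 0 and that source and target are read at the same index; both are assumptions for illustration, and `SetNzSketch` is a hypothetical helper rather than SystemDS code.

```java
import java.util.Arrays;

// Hypothetical helper: overwrite only where the source holds a non-default value.
final class SetNzSketch {
    // ru is inclusive; assumption: "default" means 0 for an int column.
    static void setNz(int[] dst, int rl, int ru, int[] value) {
        for (int i = rl; i <= ru; i++)
            if (value[i] != 0)
                dst[i] = value[i];
    }

    // Whole-array variant; expects value to have the same length as dst.
    static void setNz(int[] dst, int[] value) {
        setNz(dst, 0, dst.length - 1, value);
    }

    public static void main(String[] args) {
        int[] dst = {1, 2, 3, 4};
        setNz(dst, new int[] {0, 9, 0, 7});
        System.out.println(Arrays.toString(dst)); // [1, 9, 3, 7]
    }
}
```

Cells where the source holds the default keep their previous content, which is what distinguishes this from a plain range set.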
-
setFromOtherTypeNz
public final void setFromOtherTypeNz(Array<?> value)
Set non-default values from the given value array.
- Parameters:
value - array of other type
-
setFromOtherTypeNz
public abstract void setFromOtherTypeNz(int rl, int ru, Array<?> value)
Set non-default values in the range from the given value array.
- Parameters:
rl - row start
ru - row end (inclusive)
value - value array of different type
-
append
public abstract void append(String value)
Append a string value to the current Array. This should in general be avoided; appending larger blocks at a time is preferred.
- Parameters:
value - The value to append
-
append
public abstract void append(T value)
Append a value of the same type as the Array. This should in general be avoided; appending larger blocks at a time is preferred.
- Parameters:
value - The value to append
-
append
public abstract Array<T> append(Array<T> other)
Append the other array. If the other array fits in the currently allocated size, that allocation is used; otherwise a new array is allocated to combine the other with this. This method should use the set range function, and should be preferred over appending single values.
- Parameters:
other - The other array of same type to append to this.
- Returns:
- The combined arrays.
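The allocation rule for block appends, and the distinction between `size()` and the allocated size, can be sketched with a small growable buffer. `GrowableIntsSketch` is hypothetical, and unlike the documented method it mutates in place and returns nothing, to keep the example short.

```java
import java.util.Arrays;

// Hypothetical growable int buffer; not the SystemDS Array class.
final class GrowableIntsSketch {
    private int[] data;
    private int size; // number of valid elements, may be less than data.length

    GrowableIntsSketch(int capacity) {
        data = new int[capacity];
    }

    int size() {
        return size;
    }

    int allocated() {
        return data.length;
    }

    int get(int i) {
        return data[i];
    }

    // Append a whole block: reuse the allocation if it fits, otherwise reallocate once.
    void append(int[] other) {
        int newSize = size + other.length;
        if (newSize > data.length)
            data = Arrays.copyOf(data, newSize);
        System.arraycopy(other, 0, data, size, other.length);
        size = newSize;
    }

    public static void main(String[] args) {
        GrowableIntsSketch a = new GrowableIntsSketch(4);
        a.append(new int[] {1, 2});    // fits: size 2, allocated 4
        a.append(new int[] {3, 4, 5}); // does not fit: one reallocation to 5
        System.out.println(a.size() + " / " + a.allocated()); // 5 / 5
    }
}
```

Appending a block costs at most one reallocation, whereas appending the same values one at a time could trigger several, which is the reason the block variant is preferred.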
-
slice
public abstract Array<T> slice(int rl, int ru)
Slice out the sub range and return a new array with the specified type. If the conversion fails, fall back to a normal slice.
- Parameters:
rl - row start
ru - row end (not included)
- Returns:
- A new array of sub range.
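Note that `ru` is exclusive here, while the range setters treat `ru` as inclusive. A minimal sketch of the exclusive bound on a primitive array (the helper `SliceSketch` is hypothetical):

```java
import java.util.Arrays;

// Hypothetical helper illustrating the exclusive upper bound of slice.
final class SliceSketch {
    // Returns a copy of data[rl .. ru - 1]; ru itself is not included.
    static int[] slice(int[] data, int rl, int ru) {
        return Arrays.copyOfRange(data, rl, ru);
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 50};
        System.out.println(Arrays.toString(slice(data, 1, 3))); // [20, 30]
    }
}
```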
-
reset
public abstract void reset(int size)
Reset the Array and set it to a different size. This method is used to reuse an already allocated Array without extra allocation. It should only be called when the Array is no longer in use in any FrameBlock.
- Parameters:
size - The size to reallocate into.
-
getAsByteArray
public abstract byte[] getAsByteArray()
Return the currently allocated Array as a byte[]. This is used to serialize the allocated Arrays out to the Python API.
- Returns:
- The array as bytes
-
getValueType
public abstract Types.ValueType getValueType()
Get the current value type of this array.
- Returns:
- The current value type.
-
analyzeValueType
public final Pair<Types.ValueType,Boolean> analyzeValueType()
Analyze the column to figure out if the value type can be refined to a better type. The return is in two parts: first the type it can be, second whether it contains nulls.
- Returns:
- A better or equivalent value type to represent the column, including null information.
-
analyzeValueType
public abstract Pair<Types.ValueType,Boolean> analyzeValueType(int maxCells)
Analyze the column to figure out if the value type can be refined to a better type. The return is in two parts: first the type it can be, second whether it contains nulls.
- Parameters:
maxCells - maximum number of cells to analyze
- Returns:
- A better or equivalent value type to represent the column, including null information.
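To make the two-part result concrete, here is a rough sketch of type refinement over a string column. Everything in it is an assumption for illustration: the local enum `VT` with three members stands in for `Types.ValueType`, `Map.Entry` stands in for `Pair`, and the parse-based rules are not taken from SystemDS.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical sketch of value type analysis; not the SystemDS algorithm.
final class AnalyzeTypeSketch {
    enum VT { INT64, FP64, STRING } // hypothetical subset of value types

    static boolean isLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Inspect at most maxCells cells; return (narrowest fitting type, contains null).
    static Map.Entry<VT, Boolean> analyze(String[] col, int maxCells) {
        boolean allLong = true, allDouble = true, hasNull = false;
        int n = Math.min(maxCells, col.length);
        for (int i = 0; i < n; i++) {
            String s = col[i];
            if (s == null) {
                hasNull = true;
                continue;
            }
            if (allLong && !isLong(s))
                allLong = false;
            if (allDouble && !isDouble(s))
                allDouble = false;
        }
        // In this sketch an all-null or empty input falls through to INT64.
        VT t = allLong ? VT.INT64 : allDouble ? VT.FP64 : VT.STRING;
        return new SimpleEntry<>(t, hasNull);
    }

    public static void main(String[] args) {
        Map.Entry<VT, Boolean> r = analyze(new String[] {"1", null, "2"}, 3);
        System.out.println(r.getKey() + ", containsNull=" + r.getValue());
    }
}
```

Limiting the scan with `maxCells` trades accuracy for speed: a value that does not fit the inferred type can still appear after the inspected prefix.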
-
getFrameArrayType
public abstract ArrayFactory.FrameArrayType getFrameArrayType()
Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than ValueTypes.
- Returns:
- The FrameArrayType
-
getInMemorySize
public long getInMemorySize()
Get the in-memory size, not counting the reference to this object.
- Returns:
- the size in memory of this object.
-
baseMemoryCost
public static long baseMemoryCost()
Get the base memory cost of the Array's allocation.
- Returns:
- The base memory cost
-
getExactSerializedSize
public abstract long getExactSerializedSize()
Get the exact serialized size on disk of this array.
- Returns:
- The exact size on disk
-
getNulls
public ABooleanArray getNulls()
-
containsNull
public boolean containsNull()
Analyze if the array contains null values.
- Returns:
- If the array contains null.
-
possiblyContainsNaN
public abstract boolean possiblyContainsNaN()
-
safeChangeType
public Array<?> safeChangeType(Types.ValueType t, boolean containsNull)
-
changeType
public Array<?> changeType(Types.ValueType t, boolean containsNull)
-
changeTypeWithNulls
public Array<?> changeTypeWithNulls(Types.ValueType t)
-
changeType
public final Array<?> changeType(Types.ValueType t)
Change the allocated array to a different type. If the type is the same, a deep copy is returned for safety.
- Parameters:
t - The type to change to
- Returns:
- A new column array.
-
getMinMaxLength
public Pair<Integer,Integer> getMinMaxLength()
Get the minimum and maximum length of the contained values as string type.
- Returns:
- A Pair of first the minimum length, second the maximum length
-
fill
public abstract void fill(String val)
Fill the entire array with a specific value.
- Parameters:
val - the value to fill with.
-
fill
public abstract void fill(T val)
Fill the entire array with a specific value.
- Parameters:
val - the value to fill with.
-
isShallowSerialize
public abstract boolean isShallowSerialize()
Analyze if this array can be shallow serialized, to allow caching without modification.
- Returns:
- boolean saying true if shallow serialization is available
-
isEmpty
public abstract boolean isEmpty()
Check if this array is empty, i.e. filled with empty values.
- Returns:
- boolean saying true if empty
-
select
public abstract Array<T> select(int[] indices)
Slice out the specified indices and return the sub array.
- Parameters:
indices - The indices to slice out
- Returns:
- the sliced out indices in an array format
-
select
public abstract Array<T> select(boolean[] select, int nTrue)
Slice out the true indices in the select input and return the sub array.
- Parameters:
select - a boolean vector specifying what to select
nTrue - number of true values inside select
- Returns:
- the sliced out indices in an array format
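Both selection variants can be sketched on a primitive array. The helper `SelectSketch` is hypothetical. It shows why `nTrue` is passed in: the result can be allocated at its final size without first counting the mask.

```java
import java.util.Arrays;

// Hypothetical helper illustrating index-based and mask-based selection.
final class SelectSketch {
    // Gather the cells at the given positions, in the given order.
    static int[] select(int[] data, int[] indices) {
        int[] ret = new int[indices.length];
        for (int i = 0; i < indices.length; i++)
            ret[i] = data[indices[i]];
        return ret;
    }

    // Gather the cells whose mask entry is true; nTrue must equal the number of true entries.
    static int[] select(int[] data, boolean[] select, int nTrue) {
        int[] ret = new int[nTrue];
        int k = 0;
        for (int i = 0; i < select.length; i++)
            if (select[i])
                ret[k++] = data[i];
        return ret;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40};
        System.out.println(Arrays.toString(select(data, new int[] {3, 0})));                          // [40, 10]
        System.out.println(Arrays.toString(select(data, new boolean[] {true, false, true, false}, 2))); // [10, 30]
    }
}
```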
-
findEmpty
public final void findEmpty(boolean[] select)
Find the empty rows. It is assumed that the input is only modified by setting entries to true.
- Parameters:
select - Modify this to true in indexes that are not empty.
-
isNotEmpty
public abstract boolean isNotEmpty(int i)
-
findEmptyInverse
public void findEmptyInverse(boolean[] select)
Find the filled rows. It is assumed that the input is only modified by setting entries to true.
- Parameters:
select - modify this to true in indexes that are empty.
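The "only set to true" contract means the same boolean vector can be passed through several columns and accumulates their results. The sketch below follows the parameter descriptions above (findEmpty marks non-empty indexes, findEmptyInverse marks empty ones). The helper `FindEmptySketch` is hypothetical, and so is its notion of an empty cell for a String column (null or the empty string).

```java
import java.util.Arrays;

// Hypothetical helper following the documented parameter behaviour.
final class FindEmptySketch {
    // Assumption: a String cell is non-empty if it is neither null nor "".
    static boolean isNotEmpty(String[] col, int i) {
        return col[i] != null && !col[i].isEmpty();
    }

    // Sets select[i] to true where the cell is non-empty; never resets to false.
    static void findEmpty(String[] col, boolean[] select) {
        for (int i = 0; i < select.length; i++)
            if (isNotEmpty(col, i))
                select[i] = true;
    }

    // Sets select[i] to true where the cell is empty; never resets to false.
    static void findEmptyInverse(String[] col, boolean[] select) {
        for (int i = 0; i < select.length; i++)
            if (!isNotEmpty(col, i))
                select[i] = true;
    }

    public static void main(String[] args) {
        String[] colA = {"x", null, ""};
        String[] colB = {null, "y", null};
        boolean[] select = new boolean[3];
        findEmpty(colA, select);
        findEmpty(colB, select); // accumulates over both columns
        System.out.println(Arrays.toString(select)); // [true, true, false]
    }
}
```

After both passes, an entry that is still false marks a row that is empty in every column.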
-
clone
public abstract Array<T> clone()
Override of Java's built-in clone function for arrays. Returns a clone of the underlying data that is mutable (not immutable data). Immutable data is dependent on the individual allocated arrays.
- Returns:
- A clone
-
hashDouble
public abstract double hashDouble(int idx)
Hash the given index of the array. It is allowed to return NaN on null elements.
- Parameters:
idx - The index to hash
- Returns:
- The hash value of that index.
-
getIterator
public Array.ArrayIterator getIterator()
-
extractDouble
public double[] extractDouble(double[] ret, int rl, int ru)
-
statistics
public ArrayCompressionStatistics statistics(int nSamples)
-
createMapping
public AMapToData createMapping(Map<T,Integer> d)
-
-