Class Array<T>

  • All Implemented Interfaces:
    org.apache.hadoop.io.Writable
    Direct Known Subclasses:
    ABooleanArray, CharArray, DoubleArray, FloatArray, IntegerArray, LongArray, OptionalArray, StringArray

    public abstract class Array<T>
    extends Object
    implements org.apache.hadoop.io.Writable
    Generic, resizable native arrays for the internal representation of the columns in the FrameBlock. We use this custom class hierarchy instead of Trove or other libraries in order to avoid unnecessary dependencies.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      class  Array.ArrayIterator  
    • Method Summary

      All Methods Static Methods Instance Methods Abstract Methods Concrete Methods 
      Modifier and Type Method Description
      abstract Pair<Types.ValueType,​Boolean> analyzeValueType()
      Analyze the column to figure out if the value type can be refined to a better type.
      abstract void append​(String value)
      Append a string value to the current Array. This should generally be avoided; prefer appending larger blocks at a time.
      abstract Array<T> append​(Array<T> other)
      Append the other array. If it fits in the currently allocated size, that allocation is reused; otherwise a new array is allocated to combine the other with this.
      abstract void append​(T value)
      Append a value of the same type as the Array.
      static long baseMemoryCost()
      Get the base memory cost of the Arrays allocation.
      Array<?> changeType​(Types.ValueType t)
      Change the allocated array to a different type.
      Array<?> changeTypeWithNulls​(Types.ValueType t)  
      abstract Array<T> clone()
      Override of Java's clone method for arrays. Returns a clone of the underlying data that is mutable (not immutable). Whether data is immutable depends on the individual allocated arrays.
      boolean containsNull()
      Analyze whether the array contains null values.
      abstract void fill​(String val)
      Fill the entire array with the specified value.
      abstract void fill​(T val)
      Fill the entire array with the specified value.
      void findEmpty​(boolean[] select)
      Find the empty rows. The input is assumed to be modified only by setting entries to true.
      void findEmptyInverse​(boolean[] select)
      Find the filled rows. The input is assumed to be modified only by setting entries to true.
      abstract Object get()
      Get the underlying array out of the column group. It is the caller's responsibility to know its type.
      abstract T get​(int index)
      Get the value at a given index.
      abstract byte[] getAsByteArray()
      Return the currently allocated Array as a byte[]. This is used to serialize the allocated Arrays out to the Python API.
      abstract double getAsDouble​(int i)  
      double getAsNaNDouble​(int i)  
      SoftReference<HashMap<T,​Long>> getCache()
      Get the current cached element.
      abstract long getExactSerializedSize()
      Get the exact serialized size on disk of this array.
      abstract ArrayFactory.FrameArrayType getFrameArrayType()
      Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than ValueTypes.
      long getInMemorySize()
      Get the in-memory size, not counting the reference to this object.
      Array.ArrayIterator getIterator()  
      Pair<Integer,​Integer> getMinMaxLength()
      Get the minimum and maximum length of the contained values as string type.
      ABooleanArray getNulls()  
      HashMap<T,​Long> getRecodeMap()  
      abstract Types.ValueType getValueType()
      Get the current value type of this array.
      abstract boolean isEmpty()
      Check whether this array is empty, i.e. filled with empty values.
      abstract boolean isNotEmpty​(int i)  
      abstract boolean isShallowSerialize()
      Analyze whether this array can be shallow serialized.
      abstract void reset​(int size)
      Reset the Array and set to a different size.
      abstract Array<T> select​(boolean[] select, int nTrue)
      Slice out the true indices in the select input and return the sub array.
      abstract Array<T> select​(int[] indices)
      Slice out the specified indices and return the sub array.
      abstract void set​(int index, double value)
      Set index to given double value (cast to the correct type of this array)
      abstract void set​(int rl, int ru, Array<T> value)
      Set the range to the given array's values.
      abstract void set​(int rl, int ru, Array<T> value, int rlSrc)
      Set the range to the given array's values, with an offset into the other array.
      abstract void set​(int index, String value)
      Set index to the value parsed from the given string.
      abstract void set​(int index, T value)
      Set index to the given value of the same type.
      void setCache​(SoftReference<HashMap<T,​Long>> m)
      Set the cached hash map of this Array allocation, to be used in transformEncode.
      abstract void setFromOtherType​(int rl, int ru, Array<?> value)
      Set the range to the given array's values.
      abstract void setFromOtherTypeNz​(int rl, int ru, Array<?> value)
      Set non-default values in the range from the given value array.
      void setFromOtherTypeNz​(Array<?> value)
      Set non-default values from the given value array.
      abstract void setNz​(int rl, int ru, Array<T> value)
      Set non-default values in the range from the given value array.
      void setNz​(Array<T> value)
      Set non-default values from the given value array.
      int size()
      Get the number of elements in the array. This does not necessarily reflect the currently allocated size.
      abstract Array<T> slice​(int rl, int ru)
      Slice out the sub-range and return it as a new array of the specified type.
      String toString()  
      • Methods inherited from interface org.apache.hadoop.io.Writable

        readFields, write
    • Method Detail

      • getCache

        public final SoftReference<HashMap<T,​Long>> getCache()
        Get the current cached element.
        Returns:
        The cached object
      • setCache

        public final void setCache​(SoftReference<HashMap<T,​Long>> m)
        Set the cached hash map of this Array allocation, to be used in transformEncode.
        Parameters:
        m - The element to cache.
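        Because the cache is held through a SoftReference, the garbage collector may clear it under memory pressure, so callers need a rebuild path. The following is a minimal sketch of that pattern using only the JDK. The class RecodeCacheSketch, its recodeMap method, and the choice to assign IDs from 1 in first-seen order are assumptions for illustration, not the SystemDS implementation.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;

// Hypothetical stand-in for a column that caches a value-to-ID map.
final class RecodeCacheSketch {
    private final String[] values;
    // Same shape as the SoftReference<HashMap<T, Long>> used by getCache/setCache.
    private SoftReference<HashMap<String, Long>> cache;

    RecodeCacheSketch(String[] values) {
        this.values = values;
    }

    // Return the cached map if it is still reachable, otherwise rebuild it.
    HashMap<String, Long> recodeMap() {
        HashMap<String, Long> m = (cache == null) ? null : cache.get();
        if (m == null) {
            m = new HashMap<>();
            for (String v : values) {
                if (v != null && !m.containsKey(v)) {
                    m.put(v, (long) m.size() + 1); // assumed: IDs start at 1
                }
            }
            cache = new SoftReference<>(m);
        }
        return m;
    }
}
```

        As long as the caller keeps a strong reference to the returned map, repeated calls return the same instance; once no strong reference remains, the map may be collected and is rebuilt on the next call.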
      • size

        public final int size()
        Get the number of elements in the array. This does not necessarily reflect the currently allocated size.
        Returns:
        the current number of elements
      • get

        public abstract T get​(int index)
        Get the value at a given index. This method returns objects, which have a high allocation overhead, so it is less efficient than the vectorized operations this class provides.
        Parameters:
        index - The index to query
        Returns:
        The value returned as an object
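        The overhead mentioned above comes from boxing: an accessor that returns T must wrap each primitive in an object. The sketch below contrasts a boxed accessor with a primitive one on a plain double[]. BoxedAccessSketch and its sum method are hypothetical names used only for illustration.

```java
// Hypothetical double column backed by a primitive array.
final class BoxedAccessSketch {
    private final double[] data;

    BoxedAccessSketch(double[] data) {
        this.data = data;
    }

    // Object-returning accessor: each call boxes the value into a Double.
    Double get(int index) {
        return data[index];
    }

    // Primitive accessor: no boxing on the hot path.
    double getAsDouble(int i) {
        return data[i];
    }

    // Sum using the primitive accessor only.
    double sum() {
        double s = 0;
        for (int i = 0; i < data.length; i++) {
            s += getAsDouble(i);
        }
        return s;
    }
}
```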
      • get

        public abstract Object get()
        Get the underlying array out of the column group. It is the caller's responsibility to know its type. The underlying data structure is not guaranteed to hold the data in the returned form, so in practice a call may allocate the entire array again. Use this method only for debugging, not in performance-sensitive code.
        Returns:
        The underlying array.
      • getAsDouble

        public abstract double getAsDouble​(int i)
      • getAsNaNDouble

        public double getAsNaNDouble​(int i)
      • set

        public abstract void set​(int index,
                                 T value)
        Set index to the given value of the same type.
        Parameters:
        index - The index to set
        value - The value to assign
      • set

        public abstract void set​(int index,
                                 double value)
        Set index to given double value (cast to the correct type of this array)
        Parameters:
        index - the index to set
        value - the value to set it to (before casting to correct value type)
      • set

        public abstract void set​(int index,
                                 String value)
        Set index to the value parsed from the given string.
        Parameters:
        index - The index to set
        value - The value to assign
      • setFromOtherType

        public abstract void setFromOtherType​(int rl,
                                              int ru,
                                              Array<?> value)
        Set the range to the given array's values.
        Parameters:
        rl - row lower
        ru - row upper (inclusive)
        value - value array to take values from (other type)
      • set

        public abstract void set​(int rl,
                                 int ru,
                                 Array<T> value)
        Set the range to the given array's values.
        Parameters:
        rl - row lower
        ru - row upper (inclusive)
        value - value array to take values from (same type)
      • set

        public abstract void set​(int rl,
                                 int ru,
                                 Array<T> value,
                                 int rlSrc)
        Set the range to the given array's values, with an offset into the other array.
        Parameters:
        rl - row lower
        ru - row upper (inclusive)
        value - value array to take values from
        rlSrc - the offset into the value array to take values from
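        Since ru is inclusive here, the number of copied elements is ru - rl + 1. A minimal sketch of the range semantics on plain int arrays follows. RangeSetSketch is a hypothetical helper, and the use of System.arraycopy is an assumption about how such a copy could be done, not a statement about the SystemDS subclasses.

```java
// Hypothetical helper mirroring set(rl, ru, value, rlSrc) on int arrays.
final class RangeSetSketch {
    // Copy src[rlSrc .. rlSrc + (ru - rl)] into dst[rl .. ru], with ru inclusive.
    static void set(int[] dst, int rl, int ru, int[] src, int rlSrc) {
        int len = ru - rl + 1; // inclusive upper bound, so add one
        System.arraycopy(src, rlSrc, dst, rl, len);
    }
}
```

        For example, calling set(dst, 1, 3, src, 1) on a zeroed dst of length 5 with src = {10, 20, 30, 40} leaves dst as {0, 20, 30, 40, 0}.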
      • setNz

        public final void setNz​(Array<T> value)
        Set non-default values from the given value array.
        Parameters:
        value - array of same type and length
      • setNz

        public abstract void setNz​(int rl,
                                   int ru,
                                   Array<T> value)
        Set non-default values in the range from the given value array.
        Parameters:
        rl - row start
        ru - row upper inclusive
        value - value array of same type
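        One way to read "non-default" for a numeric column is "not zero": positions where the source holds the default value keep their existing content. The sketch below shows that reading on double arrays. SetNzSketch is a hypothetical name, and both the zero default and the use of the same index in source and target are assumptions.

```java
// Hypothetical helper illustrating setNz(rl, ru, value) on double arrays.
final class SetNzSketch {
    // Overwrite dst[i] with src[i] for rl <= i <= ru, but only where src[i] is non-zero.
    static void setNz(double[] dst, int rl, int ru, double[] src) {
        for (int i = rl; i <= ru; i++) {
            if (src[i] != 0) {
                dst[i] = src[i]; // assumed: 0 is the default value for numeric types
            }
        }
    }
}
```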
      • setFromOtherTypeNz

        public final void setFromOtherTypeNz​(Array<?> value)
        Set non-default values from the given value array.
        Parameters:
        value - array of other type
      • setFromOtherTypeNz

        public abstract void setFromOtherTypeNz​(int rl,
                                                int ru,
                                                Array<?> value)
        Set non-default values in the range from the given value array.
        Parameters:
        rl - row start
        ru - row end inclusive
        value - value array of different type
      • append

        public abstract void append​(String value)
        Append a string value to the current Array. This should generally be avoided; prefer appending larger blocks at a time.
        Parameters:
        value - The value to append
      • append

        public abstract void append​(T value)
        Append a value of the same type as the Array. This should generally be avoided; prefer appending larger blocks at a time.
        Parameters:
        value - The value to append
      • append

        public abstract Array<T> append​(Array<T> other)
        Append the other array. If it fits in the currently allocated size, that allocation is reused; otherwise a new array is allocated to combine the other with this. This method should use the set range function and should be preferred over appending single values.
        Parameters:
        other - The other array of same type to append to this.
        Returns:
        The combined arrays.
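        The preference for block appends follows from how a growable array works: the element count can be smaller than the allocated capacity, and a block append needs at most one reallocation followed by one bulk copy. The sketch below illustrates this on a plain int[]. AppendSketch, its initial capacity of 4, and the doubling growth policy are assumptions for illustration only.

```java
import java.util.Arrays;

// Hypothetical growable int column; not the SystemDS IntegerArray.
final class AppendSketch {
    private int[] data = new int[4]; // allocated capacity
    private int size = 0;            // number of valid elements

    int size() { return size; }
    int capacity() { return data.length; }

    // Single-value append: any call may trigger a reallocation.
    void append(int v) {
        ensureCapacity(size + 1);
        data[size++] = v;
    }

    // Block append: at most one reallocation, then a single bulk copy.
    AppendSketch append(int[] other) {
        ensureCapacity(size + other.length);
        System.arraycopy(other, 0, data, size, other.length);
        size += other.length;
        return this;
    }

    private void ensureCapacity(int needed) {
        if (needed > data.length) {
            // Assumed growth policy: at least double the current capacity.
            data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
        }
    }

    // Copy of the valid elements only, without the unused capacity.
    int[] toArray() { return Arrays.copyOf(data, size); }
}
```

        After appending one value and then a block of two, size() is 3 while capacity() is still 4, which is the distinction the size() documentation draws between element count and allocated size.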
      • slice

        public abstract Array<T> slice​(int rl,
                                       int ru)
        Slice out the sub-range and return it as a new array of the specified type. If the conversion fails, fall back to a normal slice.
        Parameters:
        rl - row start
        ru - row end (not included)
        Returns:
        A new array of sub range.
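        Note that the upper bound of slice is exclusive, whereas the range setters treat ru as inclusive. A minimal sketch of the slice bounds on a plain String[] follows. SliceSketch is a hypothetical helper built on java.util.Arrays.copyOfRange, whose upper bound is also exclusive.

```java
import java.util.Arrays;

// Hypothetical helper illustrating slice(rl, ru) bounds.
final class SliceSketch {
    // Return a new array holding src[rl .. ru - 1]; ru itself is not included.
    static String[] slice(String[] src, int rl, int ru) {
        return Arrays.copyOfRange(src, rl, ru);
    }
}
```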
      • reset

        public abstract void reset​(int size)
        Reset the Array and set to a different size. This method is used to reuse an already allocated Array, without extra allocation. It should only be done in cases where the Array is no longer in use in any FrameBlocks.
        Parameters:
        size - The size to reallocate into.
      • getAsByteArray

        public abstract byte[] getAsByteArray()
        Return the currently allocated Array as a byte[]. This is used to serialize the allocated Arrays out to the Python API.
        Returns:
        The array as bytes
      • getValueType

        public abstract Types.ValueType getValueType()
        Get the current value type of this array.
        Returns:
        The current value type.
      • analyzeValueType

        public abstract Pair<Types.ValueType,​Boolean> analyzeValueType()
        Analyze the column to figure out if the value type can be refined to a better type. The return is in two parts, first the type it can be, second if it contains nulls.
        Returns:
        A better or equivalent value type to represent the column, including null information.
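        To make the two-part return value concrete, the sketch below scans a string column, tracks the narrowest type that fits every non-null value, and records whether a null was seen. Everything here is hypothetical: the class AnalyzeTypeSketch, the reduced three-value Type enum (the real enum is Types.ValueType), and the parsing rules based on Long.parseLong and Double.parseDouble.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical analysis of a string column; not the SystemDS logic.
final class AnalyzeTypeSketch {
    // Ordered from narrowest to widest: every INT64 is a valid FP64,
    // and every value is a valid STRING.
    enum Type { INT64, FP64, STRING }

    // Returns (narrowest type that fits all non-null values, contains nulls).
    static Map.Entry<Type, Boolean> analyze(String[] col) {
        boolean nulls = false;
        Type best = Type.INT64; // start narrow, widen as needed
        for (String s : col) {
            if (s == null) {
                nulls = true;
                continue;
            }
            Type t = typeOf(s);
            if (t.ordinal() > best.ordinal()) {
                best = t;
            }
        }
        return new SimpleEntry<>(best, nulls);
    }

    // Classify a single non-null value by trying the narrowest parse first.
    private static Type typeOf(String s) {
        try {
            Long.parseLong(s);
            return Type.INT64;
        } catch (NumberFormatException e) {
            // not an integer, try floating point next
        }
        try {
            Double.parseDouble(s);
            return Type.FP64;
        } catch (NumberFormatException e) {
            return Type.STRING;
        }
    }
}
```

        A column holding "1", "2" and a null would be reported as (INT64, true) by this sketch, while adding "2.5" widens the result to FP64.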
      • getFrameArrayType

        public abstract ArrayFactory.FrameArrayType getFrameArrayType()
        Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than ValueTypes.
        Returns:
        The FrameArrayType
      • getInMemorySize

        public long getInMemorySize()
        Get the in-memory size, not counting the reference to this object.
        Returns:
        the size in memory of this object.
      • baseMemoryCost

        public static long baseMemoryCost()
        Get the base memory cost of the Arrays allocation.
        Returns:
        The base memory cost
      • getExactSerializedSize

        public abstract long getExactSerializedSize()
        Get the exact serialized size on disk of this array.
        Returns:
        The exact size on disk
      • containsNull

        public boolean containsNull()
        Analyze whether the array contains null values.
        Returns:
        If the array contains null.
      • changeType

        public final Array<?> changeType​(Types.ValueType t)
        Change the allocated array to a different type. If the type is the same, a deep copy is returned for safety.
        Parameters:
        t - The type to change to
        Returns:
        A new column array.
      • getMinMaxLength

        public Pair<Integer,​Integer> getMinMaxLength()
        Get the minimum and maximum length of the contained values as string type.
        Returns:
        A Pair of first the minimum length, second the maximum length
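        As an illustration of what "length as string type" could mean, the sketch below converts each value with toString() and tracks the shortest and longest result. MinMaxLengthSketch is a hypothetical helper; skipping nulls and returning an int[] instead of a Pair are assumptions.

```java
// Hypothetical helper illustrating a min/max string-length scan.
final class MinMaxLengthSketch {
    // Returns {min, max} over the string lengths of all non-null values.
    // If there are no non-null values, min stays at Integer.MAX_VALUE.
    static int[] minMaxLength(Object[] values) {
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (Object v : values) {
            if (v == null) {
                continue; // assumed: nulls do not contribute a length
            }
            int len = v.toString().length();
            min = Math.min(min, len);
            max = Math.max(max, len);
        }
        return new int[] {min, max};
    }
}
```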
      • fill

        public abstract void fill​(String val)
        Fill the entire array with the specified value.
        Parameters:
        val - the value to fill with.
      • fill

        public abstract void fill​(T val)
        Fill the entire array with the specified value.
        Parameters:
        val - the value to fill with.
      • isShallowSerialize

        public abstract boolean isShallowSerialize()
        Analyze whether this array can be shallow serialized, to allow caching without modification.
        Returns:
        true if shallow serialization is available
      • isEmpty

        public abstract boolean isEmpty()
        Check whether this array is empty, i.e. filled with empty values.
        Returns:
        true if empty
      • select

        public abstract Array<T> select​(int[] indices)
        Slice out the specified indices and return the sub array.
        Parameters:
        indices - The indices to slice out
        Returns:
        the sliced out indices in an array format
      • select

        public abstract Array<T> select​(boolean[] select,
                                        int nTrue)
        Slice out the true indices in the select input and return the sub array.
        Parameters:
        select - a boolean vector specifying what to select
        nTrue - number of true values inside select
        Returns:
        the sliced out indices in an array format
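        The two select variants can be sketched on a plain double[] as follows. Passing nTrue lets the output be allocated up front without a separate counting pass over the mask. SelectSketch is a hypothetical helper, and the sketch assumes nTrue matches the number of true entries in select.

```java
// Hypothetical helper illustrating both select variants on double arrays.
final class SelectSketch {
    // Gather src at the given indices, in the order listed.
    static double[] select(double[] src, int[] indices) {
        double[] out = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            out[i] = src[indices[i]];
        }
        return out;
    }

    // Keep the positions where select is true; nTrue sizes the output.
    static double[] select(double[] src, boolean[] select, int nTrue) {
        double[] out = new double[nTrue];
        int k = 0;
        for (int i = 0; i < select.length; i++) {
            if (select[i]) {
                out[k++] = src[i];
            }
        }
        return out;
    }
}
```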
      • findEmpty

        public final void findEmpty​(boolean[] select)
        Find the empty rows. The input is assumed to be modified only by setting entries to true.
        Parameters:
        select - Modify this to true in indexes that are not empty.
      • isNotEmpty

        public abstract boolean isNotEmpty​(int i)
      • findEmptyInverse

        public void findEmptyInverse​(boolean[] select)
        Find the filled rows. The input is assumed to be modified only by setting entries to true.
        Parameters:
        select - modify this to true in indexes that are empty.
      • clone

        public abstract Array<T> clone()
        Override of Java's clone method for arrays. Returns a clone of the underlying data that is mutable (not immutable). Whether data is immutable depends on the individual allocated arrays.
        Returns:
        A clone