Class FrameBlock

    • Field Detail

      • BUFFER_SIZE

        public static final int BUFFER_SIZE
        Buffer size variable: 1M elements, size of default matrix block
        See Also:
        Constant Field Values
    • Constructor Detail

      • FrameBlock

        public FrameBlock()
      • FrameBlock

        public FrameBlock​(FrameBlock that)
        Copy constructor for frame blocks, which uses a shallow copy for the schema (column types and names) but a deep copy for meta data and actual column data.
        Parameters:
        that - frame block
      • FrameBlock

        public FrameBlock​(Types.ValueType[] schema,
                          String constant,
                          int nRow)
        FrameBlock constructor with constant
        Parameters:
        schema - The schema to allocate (also specifying number of columns)
        constant - The constant to allocate in all cells
        nRow - the number of rows
      • FrameBlock

        public FrameBlock​(Types.ValueType[] schema,
                          String[] names,
                          String[][] data)
        Allocates a FrameBlock with the given data arrays. The data is in row-major order: the first dimension is the number of rows, the second the number of columns.
        Parameters:
        schema - the schema to allocate
        names - The names of the columns
        data - The data.
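        The row-major layout described above can be illustrated with a minimal sketch. RowMajorSketch and toColumns are hypothetical names for illustration only; they are not part of FrameBlock, and how the library stores the data internally is not stated here.

```java
import java.util.Arrays;

/** Hypothetical stand-in, not a SystemDS class: shows the row-major convention only. */
final class RowMajorSketch {
    /** Transposes row-major cells (data[row][col]) into one array per column. */
    static String[][] toColumns(String[][] data) {
        int nRow = data.length;
        int nCol = nRow == 0 ? 0 : data[0].length;
        String[][] cols = new String[nCol][nRow];
        for (int r = 0; r < nRow; r++) {
            if (data[r].length != nCol)
                throw new IllegalArgumentException("ragged row " + r);
            for (int c = 0; c < nCol; c++)
                cols[c][r] = data[r][c];
        }
        return cols;
    }

    public static void main(String[] args) {
        // 3 rows, 2 columns: the first dimension indexes rows
        String[][] data = { {"1", "a"}, {"2", "b"}, {"3", "c"} };
        System.out.println(Arrays.deepToString(toColumns(data)));
    }
}
```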
    • Method Detail

      • getNumRows

        public int getNumRows()
        Get the number of rows of the frame block.
        Specified by:
        getNumRows in interface CacheBlock<FrameBlock>
        Returns:
        number of rows
      • getDouble

        public double getDouble​(int r,
                                int c)
        Description copied from interface: CacheBlock
        Returns the double value at the passed row and column. If the value is missing, 0 is returned.
        Specified by:
        getDouble in interface CacheBlock<FrameBlock>
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        double value at the passed row and column
      • getDoubleNaN

        public double getDoubleNaN​(int r,
                                   int c)
        Description copied from interface: CacheBlock
        Returns the double value at the passed row and column. If the value is missing, NaN is returned.
        Specified by:
        getDoubleNaN in interface CacheBlock<FrameBlock>
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        double value at the passed row and column
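        The difference between getDouble and getDoubleNaN for missing values can be sketched as follows. This is an illustration under the assumption that a missing cell is represented by null; MissingValueSketch is a hypothetical class, not the library's implementation.

```java
/** Hypothetical stand-in: a single column where null marks a missing cell. */
final class MissingValueSketch {
    /** Missing cells read as 0, following the getDouble description. */
    static double getDouble(Double[] column, int r) {
        Double v = column[r];
        return v == null ? 0.0 : v;
    }

    /** Missing cells read as NaN, following the getDoubleNaN description. */
    static double getDoubleNaN(Double[] column, int r) {
        Double v = column[r];
        return v == null ? Double.NaN : v;
    }
}
```

        The NaN variant lets a caller distinguish a missing cell from a stored zero, which the plain variant cannot.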
      • getString

        public String getString​(int r,
                                int c)
        Description copied from interface: CacheBlock
        Returns the string of the value at the passed row and column. If the value is missing or NaN, null is returned.
        Specified by:
        getString in interface CacheBlock<FrameBlock>
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        string of the value at the passed row and column
      • getNumColumns

        public int getNumColumns()
        Get the number of columns of the frame block, that is the number of columns defined in the schema.
        Specified by:
        getNumColumns in interface CacheBlock<FrameBlock>
        Returns:
        number of columns
      • getSchema

        public Types.ValueType[] getSchema()
        Returns the schema of the frame block.
        Returns:
        schema as array of ValueTypes
      • setSchema

        public void setSchema​(Types.ValueType[] schema)
        Sets the schema of the frame block.
        Parameters:
        schema - schema as array of ValueTypes
      • getColumnNames

        public String[] getColumnNames()
        Returns the column names of the frame block. This method allocates default column names if required.
        Returns:
        column names
      • getColumnNamesAsFrame

        public FrameBlock getColumnNamesAsFrame()
      • getColumnNames

        public String[] getColumnNames​(boolean alloc)
        Returns the column names of the frame block. If alloc is true, this method allocates default column names if required.
        Parameters:
        alloc - if true, create column names
        Returns:
        array of column names
      • getColumnName

        public String getColumnName​(int c)
        Returns the column name for the requested column. This method allocates default column names if required.
        Parameters:
        c - column index
        Returns:
        column name
      • setColumnNames

        public void setColumnNames​(String[] colnames)
      • setColumnName

        public void setColumnName​(int index,
                                  String name)
      • getColumnMetadata

        public ColumnMetadata getColumnMetadata​(int c)
      • getColumns

        public Array<?>[] getColumns()
      • isColumnMetadataDefault

        public boolean isColumnMetadataDefault()
      • isColumnMetadataDefault

        public boolean isColumnMetadataDefault​(int c)
      • setColumnMetadata

        public void setColumnMetadata​(ColumnMetadata[] colmeta)
      • setColumnMetadata

        public void setColumnMetadata​(int c,
                                      ColumnMetadata colmeta)
      • getColumnNameIDMap

        public Map<String,​Integer> getColumnNameIDMap()
        Creates a mapping from column names to column IDs, i.e., 1-based column indexes.
        Returns:
        map of column name keys and id values
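        A minimal sketch of such a name-to-ID mapping, assuming the names are given as a plain array. ColumnNameIdSketch and nameToId are hypothetical names; only the 1-based numbering is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in, not a SystemDS class. */
final class ColumnNameIdSketch {
    /** Maps each column name to its 1-based column ID. */
    static Map<String, Integer> nameToId(String[] names) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < names.length; i++)
            map.put(names[i], i + 1); // array position i corresponds to ID i + 1
        return map;
    }
}
```

        Note that get and set take 0-based indexes, so an ID from this map would need to be decremented before it is used as a column index there.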
      • ensureAllocatedColumns

        public void ensureAllocatedColumns​(int numRows)
        Allocate column data structures if necessary, i.e., if schema specified but not all column data structures created yet.
        Parameters:
        numRows - number of rows
      • ensureColumnCompatibility

        public void ensureColumnCompatibility​(int newLen)
        Checks for matching column sizes in case of existing columns. If the check passes, the number of rows is reassigned to the given newLen.
        Parameters:
        newLen - number of rows to compare with existing number of rows
      • createColNames

        public static String[] createColNames​(int size)
      • createColNames

        public static String[] createColNames​(int off,
                                              int size)
      • createColName

        public static String createColName​(int i)
      • isColNamesDefault

        public boolean isColNamesDefault()
      • isColNameDefault

        public boolean isColNameDefault​(int i)
      • recomputeColumnCardinality

        public void recomputeColumnCardinality()
      • get

        public Object get​(int r,
                          int c)
        Gets a boxed object of the value in position (r,c).
        Parameters:
        r - row index, 0-based
        c - column index, 0-based
        Returns:
        object of the value at specified position
      • set

        public void set​(int r,
                        int c,
                        Object val)
        Sets the value in position (r,c), where the input is assumed to be a boxed object consistent with the schema definition.
        Parameters:
        r - row index
        c - column index
        val - value to set at specified position
      • set

        public void set​(int r,
                        int c,
                        String val)
        Sets the value in position (r,c) to the input string value; the individual column array converts the string to the correct type.
        Parameters:
        r - row index
        c - column index
        val - value to set at specified position
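        The string-to-type conversion mentioned for the String overload can be pictured with the following sketch. The Kind enum and the parse method are assumptions made for illustration; the actual conversion rules of the column arrays are not described here.

```java
/** Hypothetical stand-in: converts a string cell according to a column type. */
final class StringSetSketch {
    /** Illustrative type tags, not the library's Types.ValueType. */
    enum Kind { STRING, BOOLEAN, INT, LONG, DOUBLE }

    /** Parses val into a boxed object that matches the given column kind. */
    static Object parse(Kind kind, String val) {
        if (val == null)
            return null; // keep missing values missing
        switch (kind) {
            case BOOLEAN: return Boolean.parseBoolean(val);
            case INT:     return Integer.parseInt(val);
            case LONG:    return Long.parseLong(val);
            case DOUBLE:  return Double.parseDouble(val);
            default:      return val;
        }
    }
}
```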
      • reset

        public void reset​(int nrow,
                          boolean clearMeta)
      • reset

        public void reset()
      • appendRow

        public void appendRow​(Object[] row)
        Append a row to the end of the data frame, where all row fields are boxed objects according to the schema. Append row should be avoided if possible.
        Parameters:
        row - array of objects
      • appendRow

        public void appendRow​(String[] row)
        Append a row to the end of the data frame, where all row fields are string encoded. Append row should be avoided if possible.
        Parameters:
        row - array of strings
      • appendColumn

        public void appendColumn​(String[] col)
        Append a column of value type STRING as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of strings
      • appendColumn

        public void appendColumn​(boolean[] col)
        Append a column of value type BOOLEAN as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of booleans
      • appendColumn

        public void appendColumn​(int[] col)
        Append a column of value type INT as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of ints
      • appendColumn

        public void appendColumn​(long[] col)
        Append a column of value type LONG as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of longs
      • appendColumn

        public void appendColumn​(float[] col)
        Append a column of value type FLOAT as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of floats
      • appendColumn

        public void appendColumn​(double[] col)
        Append a column of value type DOUBLE as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of doubles
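        The "wrapped but not copied" behavior of the appendColumn overloads means that the caller's array and the column share storage. A minimal sketch of that aliasing in plain Java, using a hypothetical WrappedColumnSketch class rather than the library's Array type:

```java
/** Hypothetical stand-in: a column that wraps the caller's array without copying. */
final class WrappedColumnSketch {
    private final double[] values; // same array object as the caller's

    WrappedColumnSketch(double[] values) {
        this.values = values; // no defensive copy
    }

    double get(int r) {
        return values[r];
    }
}
```

        A later write to the original array is visible through the wrapper, so callers that need isolation would have to pass a copy, for example via clone().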
      • appendColumns

        public void appendColumns​(double[][] cols)
        Append a set of columns of value type DOUBLE at the end of the frame in order to avoid repeated allocation with appendColumn. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        cols - 2d array of doubles
      • appendColumn

        public void appendColumn​(Array col)
        Add a column of already allocated Array type.
        Parameters:
        col - column to add.
      • getColumnData

        public Object getColumnData​(int c)
      • getColumn

        public Array<?> getColumn​(int c)
      • setColumn

        public void setColumn​(int c,
                              Array<?> column)
      • readFields

        public void readFields​(DataInput in)
                        throws IOException
        Specified by:
        readFields in interface org.apache.hadoop.io.Writable
        Throws:
        IOException
      • getInMemorySize

        public long getInMemorySize()
        Description copied from interface: CacheBlock
        Get the in-memory size in bytes of the cache block.
        Specified by:
        getInMemorySize in interface CacheBlock<FrameBlock>
        Returns:
        in-memory size in bytes of cache block
      • getExactSerializedSize

        public long getExactSerializedSize()
        Description copied from interface: CacheBlock
        Get the exact serialized size in bytes of the cache block.
        Specified by:
        getExactSerializedSize in interface CacheBlock<FrameBlock>
        Returns:
        exact serialized size in bytes of cache block
      • isShallowSerialize

        public boolean isShallowSerialize()
        Description copied from interface: CacheBlock
        Indicates if the cache block is subject to shallow serialization, which is generally true if in-memory size and serialized size are almost identical, so that an unnecessary deep serialization can be avoided.
        Specified by:
        isShallowSerialize in interface CacheBlock<FrameBlock>
        Returns:
        true if shallow serialized
      • isShallowSerialize

        public boolean isShallowSerialize​(boolean inclConvert)
        Description copied from interface: CacheBlock
        Indicates if the cache block is subject to shallow serialization, which is generally true if in-memory size and serialized size are almost identical, so that an unnecessary deep serialization can be avoided.
        Specified by:
        isShallowSerialize in interface CacheBlock<FrameBlock>
        Parameters:
        inclConvert - if true, report blocks as shallow serialize that are currently not amenable but can be brought into an amenable form via toShallowSerializeBlock.
        Returns:
        true if shallow serialized
      • toShallowSerializeBlock

        public void toShallowSerializeBlock()
        Description copied from interface: CacheBlock
        Converts a cache block that is not shallow serializable into a form that is shallow serializable. This method has no effect if the given cache block is not amenable.
        Specified by:
        toShallowSerializeBlock in interface CacheBlock<FrameBlock>
      • binaryOperations

        public FrameBlock binaryOperations​(BinaryOperator bop,
                                           FrameBlock that,
                                           FrameBlock out)
        Performs a cell-wise value comparison on two frames. Depending on the binary operator (equal, not equal, less than, greater than, less than or equal, greater than or equal), the output frame stores a boolean value for each comparison.
        Parameters:
        bop - binary operator
        that - frame block of rhs of m * n dimensions
        out - output frame block
        Returns:
        a boolean frameBlock
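        The cell-wise comparison can be sketched on plain arrays. FrameCompareSketch is a hypothetical class, and a java.util.function.BiPredicate stands in for BinaryOperator, whose interface is not shown here.

```java
import java.util.function.BiPredicate;

/** Hypothetical stand-in: cell-wise comparison of two equally sized string tables. */
final class FrameCompareSketch {
    /** Applies op to every pair of cells and collects the boolean results. */
    static boolean[][] compare(String[][] lhs, String[][] rhs, BiPredicate<String, String> op) {
        if (lhs.length != rhs.length)
            throw new IllegalArgumentException("row count mismatch");
        boolean[][] out = new boolean[lhs.length][];
        for (int r = 0; r < lhs.length; r++) {
            if (lhs[r].length != rhs[r].length)
                throw new IllegalArgumentException("column count mismatch in row " + r);
            out[r] = new boolean[lhs[r].length];
            for (int c = 0; c < lhs[r].length; c++)
                out[r][c] = op.test(lhs[r][c], rhs[r][c]);
        }
        return out;
    }
}
```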
      • slice

        public final FrameBlock slice​(IndexRange ixrange,
                                      FrameBlock ret)
        Description copied from interface: CacheBlock
        Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
        Specified by:
        slice in interface CacheBlock<FrameBlock>
        Parameters:
        ixrange - index range inclusive
        ret - outputBlock
        Returns:
        sub-block of cache block
      • slice

        public final FrameBlock slice​(int rl,
                                      int ru)
        Description copied from interface: CacheBlock
        Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
        Specified by:
        slice in interface CacheBlock<FrameBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        Returns:
        sub-block of cache block
      • slice

        public final FrameBlock slice​(int rl,
                                      int ru,
                                      boolean deep)
        Description copied from interface: CacheBlock
        Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
        Specified by:
        slice in interface CacheBlock<FrameBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        deep - enforce deep-copy
        Returns:
        sub-block of cache block
      • slice

        public final FrameBlock slice​(int rl,
                                      int ru,
                                      int cl,
                                      int cu)
        Description copied from interface: CacheBlock
        Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
        Specified by:
        slice in interface CacheBlock<FrameBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        cl - column lower
        cu - column upper inclusive
        Returns:
        sub-block of cache block
      • slice

        public final FrameBlock slice​(int rl,
                                      int ru,
                                      int cl,
                                      int cu,
                                      FrameBlock ret)
        Description copied from interface: CacheBlock
        Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
        Specified by:
        slice in interface CacheBlock<FrameBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        cl - column lower
        cu - column upper inclusive
        ret - cache block
        Returns:
        sub-block of cache block
      • slice

        public final FrameBlock slice​(int rl,
                                      int ru,
                                      int cl,
                                      int cu,
                                      boolean deep)
        Description copied from interface: CacheBlock
        Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
        Specified by:
        slice in interface CacheBlock<FrameBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        cl - column lower
        cu - column upper inclusive
        deep - enforce deep-copy
        Returns:
        sub-block of cache block
      • slice

        public FrameBlock slice​(int rl,
                                int ru,
                                int cl,
                                int cu,
                                boolean deep,
                                FrameBlock ret)
        Description copied from interface: CacheBlock
        Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
        Specified by:
        slice in interface CacheBlock<FrameBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        cl - column lower
        cu - column upper inclusive
        deep - enforce deep-copy
        ret - cache block
        Returns:
        sub-block of cache block
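        Because the upper bounds ru and cu are inclusive, a slice spans ru - rl + 1 rows and cu - cl + 1 columns. The sketch below shows this on a plain array, assuming 0-based indexes as documented for get; InclusiveSliceSketch is a hypothetical class and does not model the deep or ret parameters.

```java
/** Hypothetical stand-in: inclusive-range slicing of a row-major table. */
final class InclusiveSliceSketch {
    /** Copies rows rl..ru and columns cl..cu (both ends inclusive) into a new table. */
    static String[][] slice(String[][] data, int rl, int ru, int cl, int cu) {
        int nRow = ru - rl + 1;
        int nCol = cu - cl + 1;
        String[][] out = new String[nRow][nCol];
        for (int r = rl; r <= ru; r++)
            System.arraycopy(data[r], cl, out[r - rl], 0, nCol);
        return out;
    }
}
```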
      • append

        public FrameBlock append​(FrameBlock that,
                                 boolean cbind)
        Appends the given argument FrameBlock 'that' to this FrameBlock by creating a deep copy to prevent side effects. For cbind, the frames are appended column-wise (same number of rows), while for rbind the frames are appended row-wise (same number of columns).
        Parameters:
        that - frame block to append to current frame block
        cbind - if true, column append
        Returns:
        frame block
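        The two append modes can be sketched on row-major arrays as follows. AppendSketch is a hypothetical class; it copies all rows so that the result shares no storage with its inputs, in the spirit of the deep copy described above.

```java
import java.util.Arrays;

/** Hypothetical stand-in: cbind/rbind on row-major string tables. */
final class AppendSketch {
    static String[][] append(String[][] a, String[][] b, boolean cbind) {
        if (cbind) {
            // column-wise: both inputs need the same number of rows
            if (a.length != b.length)
                throw new IllegalArgumentException("row count mismatch");
            String[][] out = new String[a.length][];
            for (int r = 0; r < a.length; r++) {
                out[r] = Arrays.copyOf(a[r], a[r].length + b[r].length);
                System.arraycopy(b[r], 0, out[r], a[r].length, b[r].length);
            }
            return out;
        }
        // row-wise: both inputs need the same number of columns
        if (a.length > 0 && b.length > 0 && a[0].length != b[0].length)
            throw new IllegalArgumentException("column count mismatch");
        String[][] out = new String[a.length + b.length][];
        for (int r = 0; r < a.length; r++)
            out[r] = a[r].clone();
        for (int r = 0; r < b.length; r++)
            out[a.length + r] = b[r].clone();
        return out;
    }
}
```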
      • copy

        public void copy​(int rl,
                         int ru,
                         int cl,
                         int cu,
                         FrameBlock src)
        Copy the src frame block into the index range of the current frame block.
        Parameters:
        rl - row start
        ru - row end inclusive
        cl - col start
        cu - col end inclusive
        src - source FrameBlock
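        A minimal sketch of such a range copy on plain arrays, assuming 0-based inclusive bounds and a source whose dimensions match the target range. RangeCopySketch is a hypothetical class, not the library's implementation.

```java
/** Hypothetical stand-in: copies a source table into an inclusive index range. */
final class RangeCopySketch {
    /** Overwrites dest[rl..ru][cl..cu] with src, where src has (ru-rl+1) x (cu-cl+1) cells. */
    static void copy(String[][] dest, int rl, int ru, int cl, int cu, String[][] src) {
        int nCol = cu - cl + 1;
        for (int r = rl; r <= ru; r++)
            System.arraycopy(src[r - rl], 0, dest[r], cl, nCol);
    }
}
```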
      • getRecodeMap

        public HashMap<Object,​Long> getRecodeMap​(int col)
        Splits every recode map entry in the given column using the delimiter Lop.DATATYPE_PREFIX, since the recode map was generated earlier in the form Code+Lop.DATATYPE_PREFIX+Token, and stores the result in a map that contains the token and code for every unique token.
        Parameters:
        col - index of the column in the frame data that contains the previously generated recode map.
        Returns:
        map of token and code for every element in the input column of a frame containing the recode map
      • merge

        public void merge​(FrameBlock that,
                          boolean appendOnly)
        Description copied from interface: CacheBlock
        Merge the given block into the current block. Both blocks need to be of equal dimensions and contain disjoint non-zero cells.
        Specified by:
        merge in interface CacheBlock<FrameBlock>
        Parameters:
        that - cache block
        appendOnly - ?
      • zeroOutOperations

        public FrameBlock zeroOutOperations​(FrameBlock result,
                                            IndexRange range,
                                            boolean complementary,
                                            int iRowStartSrc,
                                            int iRowStartDest,
                                            int blen,
                                            int iMaxRowsToCopy)
        This function zeros out the data in the slicing window applicable for this block.
        Parameters:
        result - frame block
        range - index range
        complementary - ?
        iRowStartSrc - ?
        iRowStartDest - ?
        blen - ?
        iMaxRowsToCopy - ?
        Returns:
        frame block
      • getSchemaTypeOf

        public FrameBlock getSchemaTypeOf()
      • detectSchema

        public final FrameBlock detectSchema​(int k)
      • detectSchema

        public final FrameBlock detectSchema​(double sampleFraction,
                                             int k)
      • dropInvalidType

        public FrameBlock dropInvalidType​(FrameBlock schema)
        Drops cell values that do not conform to the data type of their column.
        Parameters:
        schema - of the frame
        Returns:
        original frame where invalid values are replaced with null
      • invalidByLength

        public FrameBlock invalidByLength​(MatrixBlock feaLen)
        This method validates the frame data against an attribute length constraint. If the length of the data value in any cell is greater than the specified threshold of that attribute, the output frame stores a null at that cell position, thus removing the length-violating values.
        Parameters:
        feaLen - vector of valid lengths
        Returns:
        FrameBlock with invalid values converted into missing values (null)
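        The length check can be sketched as follows, assuming one maximum length per column and string cells. LengthCheckSketch is a hypothetical class that uses an int[] in place of the MatrixBlock vector feaLen.

```java
/** Hypothetical stand-in: nulls out cells whose string length exceeds a per-column limit. */
final class LengthCheckSketch {
    /** Returns a copy of data in which over-long cells are replaced by null. */
    static String[][] invalidByLength(String[][] data, int[] maxLen) {
        String[][] out = new String[data.length][];
        for (int r = 0; r < data.length; r++) {
            out[r] = data[r].clone();
            for (int c = 0; c < out[r].length; c++) {
                String v = out[r][c];
                if (v != null && v.length() > maxLen[c])
                    out[r][c] = null; // length-violating value becomes missing
            }
        }
        return out;
    }
}
```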
      • removeEmptyOperations

        public FrameBlock removeEmptyOperations​(boolean rows,
                                                boolean emptyReturn,
                                                MatrixBlock select)