Class FrameBlock

    • Constructor Detail

      • FrameBlock

        public FrameBlock()
      • FrameBlock

        public FrameBlock​(FrameBlock that)
        Copy constructor for frame blocks, which uses a shallow copy for the schema (column types and names) but a deep copy for meta data and actual column data.
        Parameters:
        that - frame block
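        The shallow-versus-deep distinction described for the copy constructor can be illustrated with a small stand-in class. This is only a sketch: `CopySketch`, its fields, and the use of plain arrays are assumptions made for illustration and are not the FrameBlock implementation.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not the SystemDS FrameBlock.
final class CopySketch {
    final String[] schema;     // column types, shared between copies (shallow)
    final double[][] columns;  // column data, duplicated per copy (deep)

    CopySketch(String[] schema, double[][] columns) {
        this.schema = schema;
        this.columns = columns;
    }

    // Copy constructor: shallow copy of the schema, deep copy of the column data.
    CopySketch(CopySketch that) {
        this.schema = that.schema;
        this.columns = new double[that.columns.length][];
        for (int c = 0; c < that.columns.length; c++)
            this.columns[c] = Arrays.copyOf(that.columns[c], that.columns[c].length);
    }

    public static void main(String[] args) {
        CopySketch a = new CopySketch(new String[] {"DOUBLE"}, new double[][] {{1.0, 2.0}});
        CopySketch b = new CopySketch(a);
        b.columns[0][0] = 42.0;                    // changes only the copy
        System.out.println(a.schema == b.schema);  // schema reference is shared
        System.out.println(a.columns[0][0]);       // original data is untouched
    }
}
```

        In this sketch, writing to the copied column data leaves the original untouched, while both objects refer to the same schema array.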
    • Method Detail

      • getNumRows

        public int getNumRows()
        Get the number of rows of the frame block.
        Specified by:
        getNumRows in interface CacheBlock
        Returns:
        number of rows
      • getDouble

        public double getDouble​(int r,
                                int c)
        Description copied from interface: CacheBlock
        Returns the double value at the passed row and column. If the value is missing, 0 is returned.
        Specified by:
        getDouble in interface CacheBlock
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        double value at the passed row and column
      • getDoubleNaN

        public double getDoubleNaN​(int r,
                                   int c)
        Description copied from interface: CacheBlock
        Returns the double value at the passed row and column. If the value is missing, NaN is returned.
        Specified by:
        getDoubleNaN in interface CacheBlock
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        double value at the passed row and column
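        The difference between getDouble and getDoubleNaN lies only in how a missing value is reported. The following sketch models a single column as a `Double[]` in which `null` marks a missing cell; the class name and this representation are assumptions for illustration, not the internal FrameBlock layout.

```java
// Hypothetical stand-in; null marks a missing cell in this model.
final class MissingValueSketch {
    // Follows the documented getDouble contract: a missing value yields 0.
    static double getDouble(Double[] col, int r) {
        return col[r] == null ? 0.0 : col[r];
    }

    // Follows the documented getDoubleNaN contract: a missing value yields NaN.
    static double getDoubleNaN(Double[] col, int r) {
        return col[r] == null ? Double.NaN : col[r];
    }

    public static void main(String[] args) {
        Double[] col = {1.5, null};
        System.out.println(getDouble(col, 1));
        System.out.println(getDoubleNaN(col, 1));
    }
}
```

        A sum built on the first variant silently treats missing cells as zero, whereas the second variant lets the caller detect them with `Double.isNaN`.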
      • getString

        public String getString​(int r,
                                int c)
        Description copied from interface: CacheBlock
        Returns the string of the value at the passed row and column. If the value is missing or NaN, null is returned.
        Specified by:
        getString in interface CacheBlock
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        string of the value at the passed row and column
      • setNumRows

        public void setNumRows​(int numRows)
      • getNumColumns

        public int getNumColumns()
        Get the number of columns of the frame block, that is the number of columns defined in the schema.
        Specified by:
        getNumColumns in interface CacheBlock
        Returns:
        number of columns
      • getSchema

        public Types.ValueType[] getSchema()
        Returns the schema of the frame block.
        Returns:
        schema as array of ValueTypes
      • setSchema

        public void setSchema​(Types.ValueType[] schema)
        Sets the schema of the frame block.
        Parameters:
        schema - schema as array of ValueTypes
      • getColumnNames

        public String[] getColumnNames()
        Returns the column names of the frame block. This method allocates default column names if required.
        Returns:
        column names
      • getColumnNamesAsFrame

        public FrameBlock getColumnNamesAsFrame()
      • getColumnNames

        public String[] getColumnNames​(boolean alloc)
        Returns the column names of the frame block. If alloc is true, this method allocates default column names if required.
        Parameters:
        alloc - if true, create column names
        Returns:
        array of column names
      • getColumnName

        public String getColumnName​(int c)
        Returns the column name for the requested column. This method allocates default column names if required.
        Parameters:
        c - column index
        Returns:
        column name
      • setColumnNames

        public void setColumnNames​(String[] colnames)
      • isColumnMetadataDefault

        public boolean isColumnMetadataDefault()
      • isColumnMetadataDefault

        public boolean isColumnMetadataDefault​(int c)
      • getColumnNameIDMap

        public Map<String,​Integer> getColumnNameIDMap()
        Creates a mapping from column names to column IDs, i.e., 1-based column indexes.
        Returns:
        map of column name keys and id values
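        As a minimal sketch of the 1-based mapping described for getColumnNameIDMap, the helper below derives such a map from an array of column names. The class and method names are hypothetical and the code does not reproduce the FrameBlock internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
final class ColumnNameIdSketch {
    // Builds a map from column name to 1-based column ID.
    static Map<String, Integer> columnNameIdMap(String[] colNames) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < colNames.length; i++)
            map.put(colNames[i], i + 1); // array index is 0-based, ID is 1-based
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = columnNameIdMap(new String[] {"id", "name", "price"});
        System.out.println(ids.get("name"));
    }
}
```

        Because the IDs are 1-based, a caller that wants to index into a 0-based array has to subtract one from the looked-up value.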
      • ensureAllocatedColumns

        public void ensureAllocatedColumns​(int numRows)
        Allocate column data structures if necessary, i.e., if schema specified but not all column data structures created yet.
        Parameters:
        numRows - number of rows
      • ensureColumnCompatibility

        public void ensureColumnCompatibility​(int newlen)
        Checks for matching column sizes in case of existing columns.
        Parameters:
        newlen - number of rows to compare with existing number of rows
      • createColNames

        public static String[] createColNames​(int size)
      • createColNames

        public static String[] createColNames​(int off,
                                              int size)
      • createColName

        public static String createColName​(int i)
      • isColNamesDefault

        public boolean isColNamesDefault()
      • isColNameDefault

        public boolean isColNameDefault​(int i)
      • recomputeColumnCardinality

        public void recomputeColumnCardinality()
      • get

        public Object get​(int r,
                          int c)
        Gets a boxed object of the value in position (r,c).
        Parameters:
        r - row index, 0-based
        c - column index, 0-based
        Returns:
        object of the value at specified position
      • set

        public void set​(int r,
                        int c,
                        Object val)
        Sets the value in position (r,c), where the input is assumed to be a boxed object consistent with the schema definition.
        Parameters:
        r - row index
        c - column index
        val - value to set at specified position
      • reset

        public void reset​(int nrow,
                          boolean clearMeta)
      • reset

        public void reset()
      • appendRow

        public void appendRow​(Object[] row)
        Append a row to the end of the data frame, where all row fields are boxed objects according to the schema.
        Parameters:
        row - array of objects
      • appendRow

        public void appendRow​(String[] row)
        Append a row to the end of the data frame, where all row fields are string encoded.
        Parameters:
        row - array of strings
      • appendColumn

        public void appendColumn​(String[] col)
        Append a column of value type STRING as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of strings
      • appendColumn

        public void appendColumn​(boolean[] col)
        Append a column of value type BOOLEAN as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of booleans
      • appendColumn

        public void appendColumn​(int[] col)
        Append a column of value type INT as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of ints
      • appendColumn

        public void appendColumn​(long[] col)
        Append a column of value type LONG as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of longs
      • appendColumn

        public void appendColumn​(float[] col)
        Append a column of value type FLOAT as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of floats
      • appendColumn

        public void appendColumn​(double[] col)
        Append a column of value type DOUBLE as the last column of the data frame. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        col - array of doubles
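        The note that the given array is "wrapped but not copied" means that the frame and the caller share one array. The sketch below shows the consequence with a hypothetical stand-in class (`WrapSketch` is not part of SystemDS): a later write to the caller's array is visible through the frame, unless the caller passes a clone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that stores column arrays by reference.
final class WrapSketch {
    private final List<double[]> columns = new ArrayList<>();

    // Wraps the caller's array directly; no copy is made.
    void appendColumn(double[] col) {
        columns.add(col);
    }

    // Reads the value at row r, column c (both 0-based).
    double get(int r, int c) {
        return columns.get(c)[r];
    }

    public static void main(String[] args) {
        WrapSketch frame = new WrapSketch();
        double[] data = {1.0, 2.0};
        frame.appendColumn(data);          // shared with the caller
        frame.appendColumn(data.clone());  // defensive copy made by the caller
        data[0] = 9.0;
        System.out.println(frame.get(0, 0)); // sees the later write
        System.out.println(frame.get(0, 1)); // unaffected by the later write
    }
}
```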
      • appendColumns

        public void appendColumns​(double[][] cols)
        Append a set of columns of value type DOUBLE at the end of the frame in order to avoid repeated allocation with appendColumn. The given array is wrapped but not copied and hence might be updated in the future.
        Parameters:
        cols - 2d array of doubles
      • appendColumn

        public void appendColumn​(Types.ValueType vt,
                                 org.apache.sysds.runtime.matrix.data.FrameBlock.Array col)
      • getColumnData

        public Object getColumnData​(int c)
      • getColumnType

        public String getColumnType​(int c)
      • getIndexAsBytes

        public byte[] getIndexAsBytes​(int c,
                                      int r)
        Get a specific index as bytes. This method is used to parse the strings into Python. It should only be used on columns where the datatype is String, since in other cases it might be faster to return other types. Note that P
        Parameters:
        c - The column index.
        r - The row index.
        Returns:
        The returned byte array.
      • getColumnAsBytes

        public byte[] getColumnAsBytes​(int c)
      • getColumn

        public org.apache.sysds.runtime.matrix.data.FrameBlock.Array getColumn​(int c)
      • setColumn

        public void setColumn​(int c,
                              org.apache.sysds.runtime.matrix.data.FrameBlock.Array column)
      • getStringRowIterator

        public Iterator<String[]> getStringRowIterator()
        Get a row iterator over the frame where all fields are encoded as strings independent of their value types.
        Returns:
        string array iterator
      • getStringRowIterator

        public Iterator<String[]> getStringRowIterator​(int[] cols)
        Get a row iterator over the frame where all selected fields are encoded as strings independent of their value types.
        Parameters:
        cols - column selection, 1-based
        Returns:
        string array iterator
      • getStringRowIterator

        public Iterator<String[]> getStringRowIterator​(int colID)
        Get a row iterator over the frame where all selected fields are encoded as strings independent of their value types.
        Parameters:
        colID - column selection, 1-based
        Returns:
        string array iterator
      • getStringRowIterator

        public Iterator<String[]> getStringRowIterator​(int rl,
                                                       int ru)
        Get a row iterator over the frame where all fields are encoded as strings independent of their value types.
        Parameters:
        rl - lower row index
        ru - upper row index
        Returns:
        string array iterator
      • getStringRowIterator

        public Iterator<String[]> getStringRowIterator​(int rl,
                                                       int ru,
                                                       int[] cols)
        Get a row iterator over the frame where all selected fields are encoded as strings independent of their value types.
        Parameters:
        rl - lower row index
        ru - upper row index
        cols - column selection, 1-based
        Returns:
        string array iterator
      • getStringRowIterator

        public Iterator<String[]> getStringRowIterator​(int rl,
                                                       int ru,
                                                       int colID)
        Get a row iterator over the frame where all selected fields are encoded as strings independent of their value types.
        Parameters:
        rl - lower row index
        ru - upper row index
        colID - columnID, 1-based
        Returns:
        string array iterator
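        A row iterator with a row range and a 1-based column selection could look roughly like the sketch below. Several details are assumptions, because the descriptions above do not state them: the row-major `Object[][]` layout, 0-based row indexes, an exclusive upper bound `ru`, and the rendering of missing cells as `null`. The class name is hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator over a row-major table; not the SystemDS implementation.
final class StringRowIteratorSketch implements Iterator<String[]> {
    private final Object[][] rows; // rows[r][c], assumed layout
    private final int[] cols;      // 1-based column selection
    private final int ru;          // upper row bound, assumed exclusive
    private int pos;               // current row, 0-based

    StringRowIteratorSketch(Object[][] rows, int rl, int ru, int[] cols) {
        this.rows = rows;
        this.cols = cols;
        this.ru = ru;
        this.pos = rl;
    }

    @Override
    public boolean hasNext() {
        return pos < ru;
    }

    @Override
    public String[] next() {
        if (!hasNext())
            throw new NoSuchElementException();
        String[] out = new String[cols.length];
        for (int j = 0; j < cols.length; j++) {
            Object v = rows[pos][cols[j] - 1]; // convert 1-based ID to 0-based index
            out[j] = (v == null) ? null : v.toString();
        }
        pos++;
        return out;
    }
}
```

        Every field is converted with `toString`, so the caller receives strings regardless of the underlying value type of each column.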
      • getObjectRowIterator

        public Iterator<Object[]> getObjectRowIterator()
        Get a row iterator over the frame where all fields are encoded as boxed objects according to their value types.
        Returns:
        object array iterator
      • getObjectRowIterator

        public Iterator<Object[]> getObjectRowIterator​(Types.ValueType[] schema)
        Get a row iterator over the frame where all fields are encoded as boxed objects according to the value types of the provided target schema.
        Parameters:
        schema - target schema of objects
        Returns:
        object array iterator
      • getObjectRowIterator

        public Iterator<Object[]> getObjectRowIterator​(int[] cols)
        Get a row iterator over the frame where all selected fields are encoded as boxed objects according to their value types.
        Parameters:
        cols - column selection, 1-based
        Returns:
        object array iterator
      • getObjectRowIterator

        public Iterator<Object[]> getObjectRowIterator​(int rl,
                                                       int ru)
        Get a row iterator over the frame where all fields are encoded as boxed objects according to their value types.
        Parameters:
        rl - lower row index
        ru - upper row index
        Returns:
        object array iterator
      • getObjectRowIterator

        public Iterator<Object[]> getObjectRowIterator​(int rl,
                                                       int ru,
                                                       int[] cols)
        Get a row iterator over the frame where all selected fields are encoded as boxed objects according to their value types.
        Parameters:
        rl - lower row index
        ru - upper row index
        cols - column selection, 1-based
        Returns:
        object array iterator
      • readFields

        public void readFields​(DataInput in)
                        throws IOException
        Specified by:
        readFields in interface org.apache.hadoop.io.Writable
        Throws:
        IOException
      • getInMemorySize

        public long getInMemorySize()
        Description copied from interface: CacheBlock
        Get the in-memory size in bytes of the cache block.
        Specified by:
        getInMemorySize in interface CacheBlock
        Returns:
        in-memory size in bytes of cache block
      • getExactSerializedSize

        public long getExactSerializedSize()
        Description copied from interface: CacheBlock
        Get the exact serialized size in bytes of the cache block.
        Specified by:
        getExactSerializedSize in interface CacheBlock
        Returns:
        exact serialized size in bytes of cache block
      • isShallowSerialize

        public boolean isShallowSerialize()
        Description copied from interface: CacheBlock
        Indicates if the cache block is subject to shallow serialization, which is generally true if in-memory size and serialized size are almost identical, allowing unnecessary deep serialization to be avoided.
        Specified by:
        isShallowSerialize in interface CacheBlock
        Returns:
        true if shallow serialized
      • isShallowSerialize

        public boolean isShallowSerialize​(boolean inclConvert)
        Description copied from interface: CacheBlock
        Indicates if the cache block is subject to shallow serialization, which is generally true if in-memory size and serialized size are almost identical, allowing unnecessary deep serialization to be avoided.
        Specified by:
        isShallowSerialize in interface CacheBlock
        Parameters:
        inclConvert - if true, also report blocks as shallow serializable that are currently not amenable but can be brought into an amenable form via toShallowSerializeBlock.
        Returns:
        true if shallow serialized
      • toShallowSerializeBlock

        public void toShallowSerializeBlock()
        Description copied from interface: CacheBlock
        Converts a cache block that is not shallow serializable into a form that is shallow serializable. This method has no effect if the given cache block is not amenable.
        Specified by:
        toShallowSerializeBlock in interface CacheBlock
      • compactEmptyBlock

        public void compactEmptyBlock()
        Description copied from interface: CacheBlock
        Free unnecessarily allocated empty block.
        Specified by:
        compactEmptyBlock in interface CacheBlock
      • binaryOperations

        public FrameBlock binaryOperations​(BinaryOperator bop,
                                           FrameBlock that,
                                           FrameBlock out)
        This method performs a cell-wise value comparison of two frames. Depending on the binary operator, it tests whether the values are equal, not equal, less than, greater than, less than or equal, or greater than or equal. The output frame stores a boolean value for each comparison.
        Parameters:
        bop - binary operator
        that - right-hand-side frame block of m x n dimensions
        out - output frame block
        Returns:
        a boolean frameBlock
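        The cell-wise comparison can be sketched as follows. This is a simplified model and not the FrameBlock code: frames are reduced to `double[][]` arrays, and the `Op` enum is a hypothetical replacement for the BinaryOperator argument.

```java
// Hypothetical model of a cell-wise comparison with a boolean result.
final class CompareSketch {
    enum Op { EQ, NE, LT, GT, LE, GE }

    // Compares a and b cell by cell; both must have identical dimensions.
    static boolean[][] compare(double[][] a, double[][] b, Op op) {
        if (a.length != b.length)
            throw new IllegalArgumentException("row count mismatch");
        boolean[][] out = new boolean[a.length][];
        for (int r = 0; r < a.length; r++) {
            if (a[r].length != b[r].length)
                throw new IllegalArgumentException("column count mismatch");
            out[r] = new boolean[a[r].length];
            for (int c = 0; c < a[r].length; c++)
                out[r][c] = apply(op, a[r][c], b[r][c]);
        }
        return out;
    }

    // Evaluates a single comparison.
    static boolean apply(Op op, double x, double y) {
        switch (op) {
            case EQ: return x == y;
            case NE: return x != y;
            case LT: return x < y;
            case GT: return x > y;
            case LE: return x <= y;
            case GE: return x >= y;
            default: throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }
}
```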
      • slice

        public FrameBlock slice​(int rl,
                                int ru,
                                int cl,
                                int cu,
                                CacheBlock retCache)
        Description copied from interface: CacheBlock
        Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
        Specified by:
        slice in interface CacheBlock
        Parameters:
        rl - row lower
        ru - row upper
        cl - column lower
        cu - column upper
        retCache - cache block
        Returns:
        sub-block of cache block
      • slice

        public FrameBlock slice​(int rl,
                                int ru,
                                int cl,
                                int cu,
                                boolean deep,
                                CacheBlock retCache)
        Right indexing operations to slice a subframe out of this frame block. Note that the existing column value types are preserved.
        Specified by:
        slice in interface CacheBlock
        Parameters:
        rl - row lower index, inclusive, 0-based
        ru - row upper index, inclusive, 0-based
        cl - column lower index, inclusive, 0-based
        cu - column upper index, inclusive, 0-based
        deep - enforce deep-copy
        retCache - cache block
        Returns:
        frame block
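        Since all four bounds are inclusive and 0-based, the result has ru - rl + 1 rows and cu - cl + 1 columns. The sketch below applies this index arithmetic to a plain row-major array; the class name and the array representation are assumptions for illustration only.

```java
// Hypothetical helper showing inclusive, 0-based slicing bounds.
final class SliceSketch {
    // Returns rows rl..ru and columns cl..cu (all inclusive) as a new array.
    static Object[][] slice(Object[][] data, int rl, int ru, int cl, int cu) {
        Object[][] out = new Object[ru - rl + 1][cu - cl + 1];
        for (int r = rl; r <= ru; r++)
            for (int c = cl; c <= cu; c++)
                out[r - rl][c - cl] = data[r][c];
        return out;
    }

    public static void main(String[] args) {
        Object[][] data = {
            {1L, "a", true},
            {2L, "b", false},
            {3L, "c", true}
        };
        Object[][] sub = slice(data, 1, 2, 0, 1); // rows 1..2, columns 0..1
        System.out.println(sub.length + " x " + sub[0].length);
    }
}
```

        Passing the same value for the lower and upper bound selects exactly one row or column.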
      • append

        public FrameBlock append​(FrameBlock that,
                                 FrameBlock ret,
                                 boolean cbind)
        Appends the given argument frameblock 'that' to this frameblock by creating a deep copy to prevent side effects. For cbind, the frames are appended column-wise (same number of rows), while for rbind the frames are appended row-wise (same number of columns).
        Parameters:
        that - frame block to append to current frame block
        ret - frame block to return, can be null
        cbind - if true, column append
        Returns:
        frame block
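        The two append modes can be sketched on a column-major array model, where `frame[c][r]` holds the value of column c in row r. This is an illustrative assumption and not the FrameBlock storage format; `AppendSketch` is a hypothetical name. The inputs are copied so that later changes to the result do not affect them.

```java
import java.util.Arrays;

// Hypothetical column-major model: frame[c][r].
final class AppendSketch {
    static Object[][] append(Object[][] a, Object[][] b, boolean cbind) {
        if (cbind) {
            // Column-wise append: both inputs need the same number of rows.
            if (a.length > 0 && b.length > 0 && a[0].length != b[0].length)
                throw new IllegalArgumentException("cbind requires equal row counts");
            Object[][] out = new Object[a.length + b.length][];
            for (int c = 0; c < a.length; c++)
                out[c] = a[c].clone();
            for (int c = 0; c < b.length; c++)
                out[a.length + c] = b[c].clone();
            return out;
        }
        // Row-wise append: both inputs need the same number of columns.
        if (a.length != b.length)
            throw new IllegalArgumentException("rbind requires equal column counts");
        Object[][] out = new Object[a.length][];
        for (int c = 0; c < a.length; c++) {
            out[c] = Arrays.copyOf(a[c], a[c].length + b[c].length);
            System.arraycopy(b[c], 0, out[c], a[c].length, b[c].length);
        }
        return out;
    }
}
```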
      • copy

        public void copy​(int rl,
                         int ru,
                         int cl,
                         int cu,
                         FrameBlock src)
      • getRecodeMap

        public HashMap<String,​Long> getRecodeMap​(int col)
        This function splits every recode map entry in the column using the delimiter Lop.DATATYPE_PREFIX, since the recode map was generated earlier in the form Code+Lop.DATATYPE_PREFIX+Token, and stores the result in a map that contains the token and code for every unique token.
        Parameters:
        col - index of the frame column that contains the previously generated recode map
        Returns:
        map of token and code for every element in the input column of a frame containing Recode map
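        The splitting step can be sketched as below, following the entry form given in the description (code, then delimiter, then token). The delimiter is passed as a parameter because the value of Lop.DATATYPE_PREFIX is not stated here; the `"|"` used in the example, the class name, and the handling of null entries are assumptions.

```java
import java.util.HashMap;

// Hypothetical parser for recode map entries of the form code + delim + token.
final class RecodeMapSketch {
    static HashMap<String, Long> parse(String[] entries, String delim) {
        HashMap<String, Long> map = new HashMap<>();
        for (String e : entries) {
            if (e == null)
                continue; // assumed: missing cells are skipped
            int pos = e.indexOf(delim); // the numeric code precedes the first delimiter
            if (pos < 0)
                throw new IllegalArgumentException("missing delimiter in entry: " + e);
            long code = Long.parseLong(e.substring(0, pos));
            String token = e.substring(pos + delim.length());
            map.put(token, code);
        }
        return map;
    }

    public static void main(String[] args) {
        // "|" is a placeholder delimiter chosen for this example only.
        HashMap<String, Long> map = parse(new String[] {"1|red", "2|blue", null}, "|");
        System.out.println(map.get("blue"));
    }
}
```

        Splitting at the first occurrence of the delimiter keeps tokens intact even when they contain the delimiter themselves.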
      • merge

        public void merge​(CacheBlock that,
                          boolean bDummy)
        Description copied from interface: CacheBlock
        Merge the given block into the current block. Both blocks need to be of equal dimensions and contain disjoint non-zero cells.
        Specified by:
        merge in interface CacheBlock
        Parameters:
        that - cache block
        bDummy - ?
      • zeroOutOperations

        public FrameBlock zeroOutOperations​(FrameBlock result,
                                            IndexRange range,
                                            boolean complementary,
                                            int iRowStartSrc,
                                            int iRowStartDest,
                                            int blen,
                                            int iMaxRowsToCopy)
        This function zeroes out the data in the slicing window applicable for this block.
        Parameters:
        result - frame block
        range - index range
        complementary - ?
        iRowStartSrc - ?
        iRowStartDest - ?
        blen - ?
        iMaxRowsToCopy - ?
        Returns:
        frame block
      • getSchemaTypeOf

        public FrameBlock getSchemaTypeOf()
      • detectSchemaFromRow

        public FrameBlock detectSchemaFromRow​(double sampleFraction)
      • dropInvalidType

        public FrameBlock dropInvalidType​(FrameBlock schema)
        Drops the cell values that do not conform to the data type of their column.
        Parameters:
        schema - schema of the frame
        Returns:
        original frame where invalid values are replaced with null
      • invalidByLength

        public FrameBlock invalidByLength​(MatrixBlock feaLen)
        This method validates the frame data against an attribute length constraint. If the length of the data value in any cell is greater than the specified threshold of that attribute, the output frame stores a null at that cell position, thus removing the length-violating values.
        Parameters:
        feaLen - vector of valid lengths
        Returns:
        FrameBlock with invalid values converted into missing values (null)
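        A minimal sketch of this length check, assuming string-valued columns stored column-major (`data[c][r]`) and one maximum length per column given as an `int[]` in place of the MatrixBlock vector. The class name and these representations are hypothetical.

```java
// Hypothetical model of a per-column length validation.
final class LengthCheckSketch {
    // Returns a copy of data in which values longer than maxLen[c] are replaced by null.
    static String[][] invalidByLength(String[][] data, int[] maxLen) {
        if (data.length != maxLen.length)
            throw new IllegalArgumentException("one length threshold per column is required");
        String[][] out = new String[data.length][];
        for (int c = 0; c < data.length; c++) {
            out[c] = new String[data[c].length];
            for (int r = 0; r < data[c].length; r++) {
                String v = data[c][r];
                // Keep values within the threshold; null out the violating ones.
                out[c][r] = (v != null && v.length() > maxLen[c]) ? null : v;
            }
        }
        return out;
    }
}
```

        Values whose length equals the threshold are kept in this sketch, since only values greater than the threshold are described as invalid.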
      • removeEmptyOperations

        public FrameBlock removeEmptyOperations​(boolean rows,
                                                boolean emptyReturn,
                                                MatrixBlock select)