Package org.apache.sysds.runtime.data
Class TensorBlock
- java.lang.Object
-
- org.apache.sysds.runtime.data.TensorBlock
-
- All Implemented Interfaces:
Externalizable, Serializable, org.apache.hadoop.io.Writable, CacheBlock<TensorBlock>
public class TensorBlock extends Object implements CacheBlock<TensorBlock>, Externalizable
A TensorBlock is the top-level representation of a tensor. Two data representations can be used: Basic (homogeneous) and Data (heterogeneous). Basic supports only one ValueType, while Data supports multiple ValueTypes along the column axis. The format determines whether the TensorBlock uses a BasicTensorBlock or a DataTensorBlock for storing the data.
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type Field - Description
static int[] DEFAULT_DIMS
static Types.ValueType DEFAULT_VTYPE
-
Constructor Summary
Constructors
Constructor - Description
TensorBlock() - Create a TensorBlock with [0,0] dimension and homogeneous representation (aka. basic).
TensorBlock(double value) - Create a [1,1] basic FP64 TensorBlock containing the given value.
TensorBlock(int[] dims, boolean basic) - Create a TensorBlock with the given dimensions and the given data representation (basic/data).
TensorBlock(Types.ValueType[] schema, int[] dims) - Create a data TensorBlock with the given schema and the given dimensions.
TensorBlock(Types.ValueType vt, int[] dims) - Create a basic TensorBlock with the given ValueType and the given dimensions.
TensorBlock(BasicTensorBlock basicTensor) - Wrap the given BasicTensorBlock inside a TensorBlock.
TensorBlock(DataTensorBlock dataTensor) - Wrap the given DataTensorBlock inside a TensorBlock.
TensorBlock(TensorBlock that) - Copy constructor.
-
Method Summary
Modifier and Type Method - Description
TensorBlock allocateBlock() - If data is not yet allocated, allocate.
TensorBlock binaryOperations(BinaryOperator op, TensorBlock thatValue, TensorBlock result)
void compactEmptyBlock() - Free unnecessarily allocated empty block.
TensorBlock copy(int[] lower, int[] upper, TensorBlock src) - Copy a part of another TensorBlock.
TensorBlock copy(TensorBlock src)
TensorBlock copyExact(int[] lower, int[] upper, TensorBlock src) - Copy a part of another TensorBlock.
Object get(int[] ix)
double get(int r, int c)
BasicTensorBlock getBasicTensor()
DataCharacteristics getDataCharacteristics()
DataTensorBlock getDataTensor()
int getDim(int i)
int[] getDims()
double getDouble(int r, int c) - Returns the double value at the passed row and column.
double getDoubleNaN(int r, int c) - Returns the double value at the passed row and column.
long getExactBlockDataSerializedSize(BasicTensorBlock bt) - Get the exact serialized size of a BasicTensorBlock if written by TensorBlock.writeBlockData(DataOutput,BasicTensorBlock).
long getExactSerializedSize() - Get the exact serialized size in bytes of the cache block.
long getInMemorySize() - Get the in-memory size in bytes of the cache block.
long getLength()
long[] getLongDims()
void getNextIndexes(int[] ix) - Calculates the next index array.
static void getNextIndexes(int[] dims, int[] ix) - Calculates the next index array.
long getNonZeros()
int getNumColumns()
int getNumDims()
int getNumRows()
Types.ValueType[] getSchema() - Get the schema if this TensorBlock is heterogeneous.
String getString(int r, int c) - Returns the string of the value at the passed row and column.
Types.ValueType getValueType() - Get the ValueType if this TensorBlock is homogeneous.
boolean isAllocated()
boolean isBasic()
boolean isEmpty()
boolean isEmpty(boolean safe)
boolean isMatrix()
boolean isShallowSerialize() - Indicates whether the cache block is subject to shallow serialization, which is generally the case if the in-memory size and the serialized size are almost identical, so that an unnecessary deep serialization can be avoided.
boolean isShallowSerialize(boolean inclConvert) - Indicates whether the cache block is subject to shallow serialization, which is generally the case if the in-memory size and the serialized size are almost identical, so that an unnecessary deep serialization can be avoided.
boolean isVector()
TensorBlock merge(TensorBlock that, boolean appendOnly) - Merge disjoint: merges all non-zero values of the given input into the current block.
void readExternal(ObjectInput in)
void readFields(DataInput in)
void reset() - Reset all cells to 0.
void reset(int[] dims) - Reset data with new dimensions.
static Types.ValueType resultValueType(Types.ValueType in1, Types.ValueType in2)
void set(int[] ix, Object v) - Set a cell to the value given as an `Object`.
void set(int r, int c, double v) - Set a cell in a 2-dimensional tensor.
void set(Object v)
void set(MatrixBlock other)
TensorBlock slice(int[] offsets, TensorBlock outBlock) - Slice the current block and write into the outBlock.
TensorBlock slice(int rl, int ru) - Slice a sub block out of the current block and write into the given output block.
TensorBlock slice(int rl, int ru, boolean deep) - Slice a sub block out of the current block and write into the given output block.
TensorBlock slice(int rl, int ru, int cl, int cu) - Slice a sub block out of the current block and write into the given output block.
TensorBlock slice(int rl, int ru, int cl, int cu, boolean deep) - Slice a sub block out of the current block and write into the given output block.
TensorBlock slice(int rl, int ru, int cl, int cu, boolean deep, TensorBlock block) - Slice a sub block out of the current block and write into the given output block.
TensorBlock slice(int rl, int ru, int cl, int cu, TensorBlock ret) - Slice a sub block out of the current block and write into the given output block.
TensorBlock slice(IndexRange ixrange, TensorBlock ret) - Slice a sub block out of the current block and write into the given output block.
void toShallowSerializeBlock() - Converts a cache block that is not shallow serializable into a form that is shallow serializable.
void write(DataOutput out)
void writeBlockData(DataOutput out, BasicTensorBlock bt) - Write a BasicTensorBlock.
void writeExternal(ObjectOutput out)
-
-
-
Field Detail
-
DEFAULT_DIMS
public static final int[] DEFAULT_DIMS
-
DEFAULT_VTYPE
public static final Types.ValueType DEFAULT_VTYPE
-
-
Constructor Detail
-
TensorBlock
public TensorBlock()
Create a TensorBlock with [0,0] dimension and homogeneous representation (aka. basic).
-
TensorBlock
public TensorBlock(int[] dims, boolean basic)
Create a TensorBlock with the given dimensions and the given data representation (basic/data).
- Parameters:
dims - dimensions
basic - if true, create a basic TensorBlock; otherwise, create a data TensorBlock
-
TensorBlock
public TensorBlock(Types.ValueType vt, int[] dims)
Create a basic TensorBlock with the given ValueType and the given dimensions.
- Parameters:
vt - value type
dims - dimensions
-
TensorBlock
public TensorBlock(Types.ValueType[] schema, int[] dims)
Create a data TensorBlock with the given schema and the given dimensions.
- Parameters:
schema - schema of the columns
dims - dimensions
-
TensorBlock
public TensorBlock(double value)
Create a [1,1] basic FP64 TensorBlock containing the given value.
- Parameters:
value - value to put inside
-
TensorBlock
public TensorBlock(BasicTensorBlock basicTensor)
Wrap the given BasicTensorBlock inside a TensorBlock.
- Parameters:
basicTensor - basic tensor block
-
TensorBlock
public TensorBlock(DataTensorBlock dataTensor)
Wrap the given DataTensorBlock inside a TensorBlock.
- Parameters:
dataTensor - data tensor block
-
TensorBlock
public TensorBlock(TensorBlock that)
Copy constructor.
- Parameters:
that - TensorBlock to copy
-
-
Method Detail
-
reset
public void reset()
Reset all cells to 0.
-
reset
public void reset(int[] dims)
Reset data with new dimensions.
- Parameters:
dims - new dimensions
-
isBasic
public boolean isBasic()
-
isAllocated
public boolean isAllocated()
-
allocateBlock
public TensorBlock allocateBlock()
If data is not yet allocated, allocate.
- Returns:
- this TensorBlock
-
getBasicTensor
public BasicTensorBlock getBasicTensor()
-
getDataTensor
public DataTensorBlock getDataTensor()
-
getValueType
public Types.ValueType getValueType()
Get the ValueType if this TensorBlock is homogeneous.
- Returns:
- ValueType if homogeneous, null otherwise
-
getSchema
public Types.ValueType[] getSchema()
Get the schema if this TensorBlock is heterogeneous.
- Returns:
- schema (one value type per column) if heterogeneous, null otherwise
-
getNumDims
public int getNumDims()
-
getNumRows
public int getNumRows()
- Specified by: getNumRows in interface CacheBlock<TensorBlock>
-
getNumColumns
public int getNumColumns()
- Specified by: getNumColumns in interface CacheBlock<TensorBlock>
-
getDataCharacteristics
public DataCharacteristics getDataCharacteristics()
- Specified by: getDataCharacteristics in interface CacheBlock<TensorBlock>
-
getInMemorySize
public long getInMemorySize()
Description copied from interface: CacheBlock
Get the in-memory size in bytes of the cache block.
- Specified by: getInMemorySize in interface CacheBlock<TensorBlock>
- Returns:
- in-memory size in bytes of cache block
-
isShallowSerialize
public boolean isShallowSerialize()
Description copied from interface: CacheBlock
Indicates whether the cache block is subject to shallow serialization, which is generally the case if the in-memory size and the serialized size are almost identical, so that an unnecessary deep serialization can be avoided.
- Specified by: isShallowSerialize in interface CacheBlock<TensorBlock>
- Returns:
- true if shallow serialized
-
isShallowSerialize
public boolean isShallowSerialize(boolean inclConvert)
Description copied from interface: CacheBlock
Indicates whether the cache block is subject to shallow serialization, which is generally the case if the in-memory size and the serialized size are almost identical, so that an unnecessary deep serialization can be avoided.
- Specified by: isShallowSerialize in interface CacheBlock<TensorBlock>
- Parameters:
inclConvert - if true, also report blocks as shallow serializable that are currently not amenable but can be brought into an amenable form via toShallowSerializeBlock.
- Returns:
- true if shallow serialized
-
toShallowSerializeBlock
public void toShallowSerializeBlock()
Description copied from interface: CacheBlock
Converts a cache block that is not shallow serializable into a form that is shallow serializable. This method has no effect if the given cache block is not amenable.
- Specified by: toShallowSerializeBlock in interface CacheBlock<TensorBlock>
-
compactEmptyBlock
public void compactEmptyBlock()
Description copied from interface: CacheBlock
Free unnecessarily allocated empty block.
- Specified by: compactEmptyBlock in interface CacheBlock<TensorBlock>
-
slice
public final TensorBlock slice(IndexRange ixrange, TensorBlock ret)
Description copied from interface: CacheBlock
Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
- Specified by: slice in interface CacheBlock<TensorBlock>
- Parameters:
ixrange - index range inclusive
ret - output block
- Returns:
- sub-block of cache block
-
slice
public final TensorBlock slice(int rl, int ru)
Description copied from interface: CacheBlock
Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
- Specified by: slice in interface CacheBlock<TensorBlock>
- Parameters:
rl - row lower
ru - row upper inclusive
- Returns:
- sub-block of cache block
-
slice
public final TensorBlock slice(int rl, int ru, boolean deep)
Description copied from interface: CacheBlock
Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
- Specified by: slice in interface CacheBlock<TensorBlock>
- Parameters:
rl - row lower
ru - row upper inclusive
deep - enforce deep-copy
- Returns:
- sub-block of cache block
-
slice
public final TensorBlock slice(int rl, int ru, int cl, int cu)
Description copied from interface: CacheBlock
Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
- Specified by: slice in interface CacheBlock<TensorBlock>
- Parameters:
rl - row lower
ru - row upper inclusive
cl - column lower
cu - column upper inclusive
- Returns:
- sub-block of cache block
-
slice
public final TensorBlock slice(int rl, int ru, int cl, int cu, TensorBlock ret)
Description copied from interface: CacheBlock
Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
- Specified by: slice in interface CacheBlock<TensorBlock>
- Parameters:
rl - row lower
ru - row upper inclusive
cl - column lower
cu - column upper inclusive
ret - cache block
- Returns:
- sub-block of cache block
-
slice
public final TensorBlock slice(int rl, int ru, int cl, int cu, boolean deep)
Description copied from interface: CacheBlock
Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
- Specified by: slice in interface CacheBlock<TensorBlock>
- Parameters:
rl - row lower
ru - row upper inclusive
cl - column lower
cu - column upper inclusive
deep - enforce deep-copy
- Returns:
- sub-block of cache block
-
slice
public TensorBlock slice(int rl, int ru, int cl, int cu, boolean deep, TensorBlock block)
Description copied from interface: CacheBlock
Slice a sub block out of the current block and write into the given output block. This method returns the passed instance if not null.
- Specified by: slice in interface CacheBlock<TensorBlock>
- Parameters:
rl - row lower
ru - row upper inclusive
cl - column lower
cu - column upper inclusive
deep - enforce deep-copy
block - cache block
- Returns:
- sub-block of cache block
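The row and column upper bounds of the slice overloads above are documented as inclusive, which is easy to confuse with the exclusive end indexes common in Java APIs. The following is a minimal sketch of inclusive slicing on a plain double[][], not the SystemDS implementation; the class InclusiveSliceSketch and its slice method are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not part of SystemDS.
final class InclusiveSliceSketch {

    // Returns rows rl..ru and columns cl..cu of src, with both upper bounds inclusive.
    static double[][] slice(double[][] src, int rl, int ru, int cl, int cu) {
        double[][] out = new double[ru - rl + 1][];
        for (int r = rl; r <= ru; r++) {
            // Arrays.copyOfRange takes an exclusive end index, hence cu + 1.
            out[r - rl] = Arrays.copyOfRange(src[r], cl, cu + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {
            {1, 2, 3},
            {4, 5, 6},
            {7, 8, 9}
        };
        // Rows 0..1 and columns 1..2 give a 2 x 2 result.
        double[][] s = slice(m, 0, 1, 1, 2);
        System.out.println(Arrays.deepToString(s)); // [[2.0, 3.0], [5.0, 6.0]]
    }
}
```

With inclusive bounds, the result has ru - rl + 1 rows and cu - cl + 1 columns.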
-
merge
public TensorBlock merge(TensorBlock that, boolean appendOnly)
Description copied from interface: CacheBlock
Merge disjoint: merges all non-zero values of the given input into the current block. Note that this method does NOT check for overlapping entries; it is the caller's responsibility to ensure that the blocks are disjoint. The appendOnly parameter is only relevant for sparse target blocks; if true, values are only appended and sparse rows are not sorted on each call, which is useful whenever iterators of matrix blocks are merged into one target block.
- Specified by: merge in interface CacheBlock<TensorBlock>
- Parameters:
that - cache block
appendOnly - indicates whether the merge can be append-only on sparse rows
- Returns:
- the merged group, in most implementations 'this' is modified.
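The disjoint-merge contract described above can be sketched on plain dense arrays. This is a minimal illustration under stated assumptions, not the SystemDS implementation: MergeDisjointSketch and mergeDisjoint are hypothetical names, sparse storage and the appendOnly flag are not modeled, and the overwrite-on-overlap behavior is a choice of this sketch only.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not part of SystemDS.
final class MergeDisjointSketch {

    // Copies every non-zero cell of 'that' into 'target' and returns 'target'.
    // No overlap check is made: in this sketch, a non-zero cell of 'that'
    // simply overwrites whatever 'target' holds at the same position.
    static double[][] mergeDisjoint(double[][] target, double[][] that) {
        for (int r = 0; r < that.length; r++) {
            for (int c = 0; c < that[r].length; c++) {
                if (that[r][c] != 0) {
                    target[r][c] = that[r][c];
                }
            }
        }
        return target;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 0}, {0, 0}};
        double[][] b = {{0, 2}, {0, 3}};
        // The non-zero cells of a and b do not overlap, so nothing is lost.
        System.out.println(Arrays.deepToString(mergeDisjoint(a, b))); // [[1.0, 2.0], [0.0, 3.0]]
    }
}
```

Because the target is modified in place and returned, the caller receives the same instance it passed in, matching the note that 'this' is modified in most implementations.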
-
getDouble
public double getDouble(int r, int c)
Description copied from interface: CacheBlock
Returns the double value at the passed row and column. If the value is missing, 0 is returned.
- Specified by: getDouble in interface CacheBlock<TensorBlock>
- Parameters:
r - row of the value
c - column of the value
- Returns:
- double value at the passed row and column
-
getDoubleNaN
public double getDoubleNaN(int r, int c)
Description copied from interface: CacheBlock
Returns the double value at the passed row and column. If the value is missing, NaN is returned.
- Specified by: getDoubleNaN in interface CacheBlock<TensorBlock>
- Parameters:
r - row of the value
c - column of the value
- Returns:
- double value at the passed row and column
-
getString
public String getString(int r, int c)
Description copied from interface: CacheBlock
Returns the string of the value at the passed row and column. If the value is missing or NaN, null is returned.
- Specified by: getString in interface CacheBlock<TensorBlock>
- Parameters:
r - row of the value
c - column of the value
- Returns:
- string of the value at the passed row and column
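The three accessors getDouble, getDoubleNaN and getString differ only in what they report for a missing value. The sketch below mirrors that contract on a nullable Double[] and is not the SystemDS implementation: MissingValueSketch is a hypothetical class, and modeling a missing value as null is an assumption made here for illustration.

```java
// Hypothetical stand-in for illustration only; not part of SystemDS.
// A null entry models a missing value (an assumption of this sketch).
final class MissingValueSketch {

    // Missing values are reported as 0.
    static double getDouble(Double[] cells, int i) {
        return cells[i] == null ? 0.0 : cells[i];
    }

    // Missing values are reported as NaN.
    static double getDoubleNaN(Double[] cells, int i) {
        return cells[i] == null ? Double.NaN : cells[i];
    }

    // Missing values and NaN are both reported as null.
    static String getString(Double[] cells, int i) {
        Double v = cells[i];
        return (v == null || v.isNaN()) ? null : v.toString();
    }

    public static void main(String[] args) {
        Double[] cells = {1.5, null, Double.NaN};
        System.out.println(getDouble(cells, 1));    // 0.0
        System.out.println(getDoubleNaN(cells, 1)); // NaN
        System.out.println(getString(cells, 0));    // 1.5
        System.out.println(getString(cells, 2));    // null
    }
}
```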
-
getDim
public int getDim(int i)
-
getDims
public int[] getDims()
-
getLongDims
public long[] getLongDims()
-
getNextIndexes
public static void getNextIndexes(int[] dims, int[] ix)
Calculates the next index array. Note that if the given index array was the last element, the next index will be the first one.
- Parameters:
dims - the dims array for which we have to decide the next index
ix - the index array which will be incremented to the next index array
-
getNextIndexes
public void getNextIndexes(int[] ix)
Calculates the next index array. Note that if the given index array was the last element, the next index will be the first one.
- Parameters:
ix - the index array which will be incremented to the next index array
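Advancing an index array in place, with wrap-around from the last element to the first, can be sketched as an odometer-style increment. This is a minimal illustration rather than the SystemDS implementation: NextIndexSketch and nextIndex are hypothetical names, and the sketch assumes row-major order, where the last dimension varies fastest.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not part of SystemDS.
final class NextIndexSketch {

    // Increments ix in place to the next position within dims,
    // assuming row-major order (the last dimension varies fastest).
    static void nextIndex(int[] dims, int[] ix) {
        for (int d = ix.length - 1; d >= 0; d--) {
            ix[d]++;
            if (ix[d] < dims[d]) {
                return; // no overflow in this dimension, done
            }
            ix[d] = 0; // overflow: reset and carry into the next dimension
        }
        // If every dimension overflowed, ix is now all zeros (wrap-around).
    }

    public static void main(String[] args) {
        int[] dims = {2, 3};
        int[] ix = {0, 0};
        // Prints [0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], one per line.
        for (int k = 0; k < 6; k++) {
            System.out.println(Arrays.toString(ix));
            nextIndex(dims, ix);
        }
        // After the last element the index wraps around to the first one.
        System.out.println(Arrays.toString(ix)); // [0, 0]
    }
}
```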
-
isVector
public boolean isVector()
-
isMatrix
public boolean isMatrix()
-
getLength
public long getLength()
-
isEmpty
public boolean isEmpty()
-
isEmpty
public boolean isEmpty(boolean safe)
-
getNonZeros
public long getNonZeros()
-
get
public Object get(int[] ix)
-
get
public double get(int r, int c)
-
set
public void set(Object v)
-
set
public void set(MatrixBlock other)
-
set
public void set(int[] ix, Object v)
Set a cell to the value given as an `Object`.
- Parameters:
ix - indexes in each dimension, starting with 0
v - value to set
-
set
public void set(int r, int c, double v)
Set a cell in a 2-dimensional tensor.
- Parameters:
r - row of the cell
c - column of the cell
v - value to set
-
slice
public TensorBlock slice(int[] offsets, TensorBlock outBlock)
Slice the current block and write into the outBlock. The offsets determine where the slice starts; the extent of the slice in each dimension is given by the outBlock dimensions.
- Parameters:
offsets - offsets where the slice starts
outBlock - sliced result block
- Returns:
- the sliced result block
-
copy
public TensorBlock copy(TensorBlock src)
-
copy
public TensorBlock copy(int[] lower, int[] upper, TensorBlock src)
Copy a part of another TensorBlock.
- Parameters:
lower - lower index of elements to copy (inclusive)
upper - upper index of elements to copy (exclusive)
src - source TensorBlock
- Returns:
- the shallow copy of the src TensorBlock
-
copyExact
public TensorBlock copyExact(int[] lower, int[] upper, TensorBlock src)
Copy a part of another TensorBlock. The difference to copy() is that this allows for exact sub-blocks instead of taking all consecutive data elements from lower to upper.
- Parameters:
lower - lower index of elements to copy (inclusive)
upper - upper index of elements to copy (exclusive)
src - source TensorBlock
- Returns:
- the deep copy of the src TensorBlock
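The distinction between "all consecutive data elements from lower to upper" and an "exact sub-block" can be made concrete by listing which cells each reading selects. The sketch below is one interpretation of that sentence for a two-dimensional, row-major layout; it is not the SystemDS implementation, and the class CopyRangeSketch with its methods consecutive and exact is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of SystemDS.
final class CopyRangeSketch {

    // Cells visited when walking a row-major grid with 'cols' columns as one
    // consecutive run from 'lower' (inclusive) to 'upper' (exclusive).
    static List<String> consecutive(int cols, int[] lower, int[] upper) {
        List<String> cells = new ArrayList<>();
        int start = lower[0] * cols + lower[1];
        int end = upper[0] * cols + upper[1];
        for (int k = start; k < end; k++) {
            cells.add("(" + (k / cols) + "," + (k % cols) + ")");
        }
        return cells;
    }

    // Cells inside the rectangle lower[d] <= index[d] < upper[d] for each dimension d.
    static List<String> exact(int[] lower, int[] upper) {
        List<String> cells = new ArrayList<>();
        for (int r = lower[0]; r < upper[0]; r++) {
            for (int c = lower[1]; c < upper[1]; c++) {
                cells.add("(" + r + "," + c + ")");
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        int[] lower = {0, 1};
        int[] upper = {2, 3};
        // Consecutive run in a grid with 4 columns: linear positions 1..10, 10 cells.
        System.out.println(consecutive(4, lower, upper));
        // Exact sub-block: rows 0..1 and columns 1..2, 4 cells.
        System.out.println(exact(lower, upper)); // [(0,1), (0,2), (1,1), (1,2)]
    }
}
```

The consecutive reading includes the cells between the end of one row and the start of the next, while the exact reading keeps only the rectangle.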
-
getExactSerializedSize
public long getExactSerializedSize()
Description copied from interface: CacheBlock
Get the exact serialized size in bytes of the cache block.
- Specified by: getExactSerializedSize in interface CacheBlock<TensorBlock>
- Returns:
- exact serialized size in bytes of cache block
-
getExactBlockDataSerializedSize
public long getExactBlockDataSerializedSize(BasicTensorBlock bt)
Get the exact serialized size of a BasicTensorBlock if written by TensorBlock.writeBlockData(DataOutput,BasicTensorBlock).
- Parameters:
bt - BasicTensorBlock
- Returns:
- the size of the block data in serialized form
-
write
public void write(DataOutput out) throws IOException
- Specified by: write in interface org.apache.hadoop.io.Writable
- Throws:
IOException
-
writeBlockData
public void writeBlockData(DataOutput out, BasicTensorBlock bt) throws IOException
Write a BasicTensorBlock.
- Parameters:
out - output stream
bt - source BasicTensorBlock
- Throws:
IOException - if writing with the output stream fails
-
readFields
public void readFields(DataInput in) throws IOException
- Specified by: readFields in interface org.apache.hadoop.io.Writable
- Throws:
IOException
-
writeExternal
public void writeExternal(ObjectOutput out) throws IOException
- Specified by: writeExternal in interface Externalizable
- Throws:
IOException
-
readExternal
public void readExternal(ObjectInput in) throws IOException
- Specified by: readExternal in interface Externalizable
- Throws:
IOException
-
binaryOperations
public TensorBlock binaryOperations(BinaryOperator op, TensorBlock thatValue, TensorBlock result)
-
resultValueType
public static Types.ValueType resultValueType(Types.ValueType in1, Types.ValueType in2)
-
-