Package org.apache.sysds.runtime.data
Class SparseBlockCOO
- java.lang.Object
-
- org.apache.sysds.runtime.data.SparseBlock
-
- org.apache.sysds.runtime.data.SparseBlockCOO
-
- All Implemented Interfaces:
Serializable
public class SparseBlockCOO extends SparseBlock
SparseBlock implementation that realizes a traditional 'coordinate matrix' representation, where the entire sparse block is stored as triples in three arrays: row indexes, column indexes, and values, where row indexes and colunm indexes are sorted in order to allow binary search. This format is very memory efficient for ultra-sparse matrices, allows fast incremental construction but has performance drawbacks for row-major access through our sparse block abstraction since there is no constant-time random access to individual rows. Similar to CSR, the nnz is limited to Integer.MAX_VALUE. In contrast to COO matrix formats with three arrays, we use 1+#dims arrays to represent the values and indexes of all dimensions.- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.sysds.runtime.data.SparseBlock
SparseBlock.Type
-
-
Constructor Summary
Constructors Constructor Description SparseBlockCOO(int rlen)
SparseBlockCOO(int rlen, int capacity)
SparseBlockCOO(SparseBlock sblock)
Copy constructor sparse block abstraction.SparseBlockCOO(SparseRow[] rows, int nnz)
Copy constructor old sparse row representation.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description boolean
add(int r, int c, double v)
Add a value to a matrix cell (r,c).void
allocate(int r)
Allocate the underlying data structure holding non-zero values of row r if necessary.void
allocate(int r, int nnz)
Allocate the underlying data structure holding non-zero values of row r if necessary, w/ given size.void
allocate(int r, int ennz, int maxnnz)
Allocate the underlying data structure holding non-zero values of row r w/ the specified estimated nnz and max nnz.void
append(int r, int c, double v)
Append a value to the end of the physical representation.boolean
checkValidity(int rlen, int clen, long nnz, boolean strict)
Validate the correctness of the internal data structures of the different sparse block implementations.void
compact(int r)
Re-allocate physical row if physical size exceeds logical size plus resize factor.void
deleteIndexRange(int r, int cl, int cu)
Deletes all non-zero values of the given column range [cl,cu) in row r.static long
estimateSizeInMemory(long nrows, long ncols, double sparsity)
Get the estimated in-memory size of the sparse block in COO with the given dimensions w/o accounting for overallocation.SparseRow
get(int r)
Get values of row r in the format of a sparse row.double
get(int r, int c)
Get value of matrix cell (r,c).Iterator<IJV>
getIterator()
Get a non-zero iterator over the entire sparse block.Iterator<IJV>
getIterator(int ru)
Get a non-zero iterator over the partial sparse block [0,ru).Iterator<IJV>
getIterator(int rl, int ru)
Get a non-zero iterator over the subblock [rl, ru).int[]
indexes()
Get raw access to underlying array of column indices For use in GPU codeint[]
indexes(int r)
Get the sorted array of column indexes of all non-zero entries in row r.boolean
isAllocated(int r)
Indicates if the underlying data structure for a given row is already allocated.boolean
isContiguous()
Indicates if the underlying data structures returned by values and indexes are contiguous arrays, which can be exploited for more efficient operations.boolean
isEmpty(int r)
Get information if row r is empty, i.e., does not contain non-zero values.boolean
isThreadSafe()
Indicates if the underlying implementation allows thread-safe row updates if concurrent threads update disjoint rows.int
numRows()
Get the number of rows in the sparse block.int
pos(int r)
Get the starting position of row r in the indexes/values arrays returned by indexes(r) and values(r).int
posFIndexGT(int r, int c)
Get position of first column index greater than column c in row r.int
posFIndexGTE(int r, int c)
Get position of first column index greater than or equal column c in row r.int
posFIndexLTE(int r, int c)
Get position of first column index lower than or equal column c in row r.void
reset()
Clears the sparse block by deleting non-zero values.void
reset(int ennz, int maxnnz)
Clears the sparse block by deleting non-zero values.void
reset(int r, int ennz, int maxnnz)
Clears row r of the sparse block by deleting non-zero values.int[]
rowIndexes()
Get raw access to underlying array of row indices For use in GPU codeboolean
set(int r, int c, double v)
Set the value of a matrix cell (r,c).void
set(int r, SparseRow row, boolean deep)
Set the values of row r to the given sparse row.void
setIndexRange(int r, int cl, int cu, double[] v, int[] vix, int vpos, int vlen)
Sets a sparse array of non-zeros values and indexes into the column range [cl,cu) in row r.void
setIndexRange(int r, int cl, int cu, double[] v, int vix, int vlen)
Sets a dense array of non-zeros values into the column range [cl,cu) in row r.long
size()
Get the number of non-zero values in the sparse block.int
size(int r)
Get the number of non-zero values in row r.long
size(int rl, int ru)
Get the number of non-zeros values in the row range of [rl, ru).long
size(int rl, int ru, int cl, int cu)
Get the number of non-zeros values in the row and column range of [rl/cl, ru/cu);void
sort()
Sort all non-zero value/index pairs of the sparse block by row and column index.void
sort(int r)
Sort all non-zero value/index pairs of row r column index.String
toString()
double[]
values()
Get raw access to underlying array of values For use in GPU codedouble[]
values(int r)
Get the array of all non-zero entries in row r, sorted by their column indexes.-
Methods inherited from class org.apache.sysds.runtime.data.SparseBlock
isAligned, isAligned
-
-
-
-
Constructor Detail
-
SparseBlockCOO
public SparseBlockCOO(int rlen)
-
SparseBlockCOO
public SparseBlockCOO(int rlen, int capacity)
-
SparseBlockCOO
public SparseBlockCOO(SparseBlock sblock)
Copy constructor sparse block abstraction.- Parameters:
sblock
- sparse block to copy
-
SparseBlockCOO
public SparseBlockCOO(SparseRow[] rows, int nnz)
Copy constructor old sparse row representation.- Parameters:
rows
- array of sparse rowsnnz
- number of non-zeros
-
-
Method Detail
-
estimateSizeInMemory
public static long estimateSizeInMemory(long nrows, long ncols, double sparsity)
Get the estimated in-memory size of the sparse block in COO with the given dimensions w/o accounting for overallocation.- Parameters:
nrows
- number of rowsncols
- number of columnssparsity
- sparsity ratio- Returns:
- memory estimate
-
allocate
public void allocate(int r)
Description copied from class:SparseBlock
Allocate the underlying data structure holding non-zero values of row r if necessary.- Specified by:
allocate
in classSparseBlock
- Parameters:
r
- row index
-
allocate
public void allocate(int r, int nnz)
Description copied from class:SparseBlock
Allocate the underlying data structure holding non-zero values of row r if necessary, w/ given size.- Specified by:
allocate
in classSparseBlock
- Parameters:
r
- row indexnnz
- number of non-zeros
-
allocate
public void allocate(int r, int ennz, int maxnnz)
Description copied from class:SparseBlock
Allocate the underlying data structure holding non-zero values of row r w/ the specified estimated nnz and max nnz.- Specified by:
allocate
in classSparseBlock
- Parameters:
r
- row indexennz
- estimated non-zerosmaxnnz
- max non-zeros
-
compact
public void compact(int r)
Description copied from class:SparseBlock
Re-allocate physical row if physical size exceeds logical size plus resize factor.- Specified by:
compact
in classSparseBlock
- Parameters:
r
- row index
-
numRows
public int numRows()
Description copied from class:SparseBlock
Get the number of rows in the sparse block.- Specified by:
numRows
in classSparseBlock
- Returns:
- number of rows
-
isThreadSafe
public boolean isThreadSafe()
Description copied from class:SparseBlock
Indicates if the underlying implementation allows thread-safe row updates if concurrent threads update disjoint rows.- Specified by:
isThreadSafe
in classSparseBlock
- Returns:
- true if thread-safe row updates
-
isContiguous
public boolean isContiguous()
Description copied from class:SparseBlock
Indicates if the underlying data structures returned by values and indexes are contiguous arrays, which can be exploited for more efficient operations.- Specified by:
isContiguous
in classSparseBlock
- Returns:
- true if underlying data structures are contiguous arrays
-
isAllocated
public boolean isAllocated(int r)
Description copied from class:SparseBlock
Indicates if the underlying data structure for a given row is already allocated.- Specified by:
isAllocated
in classSparseBlock
- Parameters:
r
- row index- Returns:
- true if already allocated
-
checkValidity
public boolean checkValidity(int rlen, int clen, long nnz, boolean strict)
Description copied from class:SparseBlock
Validate the correctness of the internal data structures of the different sparse block implementations.- Specified by:
checkValidity
in classSparseBlock
- Parameters:
rlen
- number of rowsclen
- number of columnsnnz
- number of non zerosstrict
- enforce optional properties- Returns:
- true if the sparse block is valid wrt the corresponding format such as COO, CSR, MCSR.
-
reset
public void reset()
Description copied from class:SparseBlock
Clears the sparse block by deleting non-zero values. After this call all size() calls are guaranteed to return 0.- Specified by:
reset
in classSparseBlock
-
reset
public void reset(int ennz, int maxnnz)
Description copied from class:SparseBlock
Clears the sparse block by deleting non-zero values. After this call all size() calls are guaranteed to return 0.- Specified by:
reset
in classSparseBlock
- Parameters:
ennz
- estimated non-zerosmaxnnz
- max non-zeros
-
reset
public void reset(int r, int ennz, int maxnnz)
Description copied from class:SparseBlock
Clears row r of the sparse block by deleting non-zero values. After this call size(r) is guaranteed to return 0.- Specified by:
reset
in classSparseBlock
- Parameters:
r
- row indexennz
- estimated non-zerosmaxnnz
- max non-zeros
-
size
public long size()
Description copied from class:SparseBlock
Get the number of non-zero values in the sparse block.- Specified by:
size
in classSparseBlock
- Returns:
- number of non-zero values in sparse block
-
size
public int size(int r)
Description copied from class:SparseBlock
Get the number of non-zero values in row r.- Specified by:
size
in classSparseBlock
- Parameters:
r
- row index starting at 0- Returns:
- number of non-zero values in row r
-
size
public long size(int rl, int ru)
Description copied from class:SparseBlock
Get the number of non-zeros values in the row range of [rl, ru).- Specified by:
size
in classSparseBlock
- Parameters:
rl
- row lower indexru
- row upper index- Returns:
- number of non-zero values in the row range
-
size
public long size(int rl, int ru, int cl, int cu)
Description copied from class:SparseBlock
Get the number of non-zeros values in the row and column range of [rl/cl, ru/cu);- Specified by:
size
in classSparseBlock
- Parameters:
rl
- row lower indexru
- row upper indexcl
- column lower indexcu
- column upper index- Returns:
- number of non-zero values in the row and column range
-
isEmpty
public boolean isEmpty(int r)
Description copied from class:SparseBlock
Get information if row r is empty, i.e., does not contain non-zero values. Equivalent to size(r)==0. Users should do this check if it is unknown if the underlying row data structure is allocated.- Specified by:
isEmpty
in classSparseBlock
- Parameters:
r
- row index starting at 0- Returns:
- true if row does not contain non-zero values
-
indexes
public int[] indexes(int r)
Description copied from class:SparseBlock
Get the sorted array of column indexes of all non-zero entries in row r. Note that - for flexibility of the implementing format - the returned array may be larger, where the range for row r is given by [pos(r),pos(r)+size(r)).- Specified by:
indexes
in classSparseBlock
- Parameters:
r
- row index starting at 0- Returns:
- sorted array of column indexes
-
values
public double[] values(int r)
Description copied from class:SparseBlock
Get the array of all non-zero entries in row r, sorted by their column indexes. Note that - for flexibility of the implementing format - the returned array may be larger, where the range for row r is given by [pos(r),pos(r)+size(r)).- Specified by:
values
in classSparseBlock
- Parameters:
r
- row index starting at 0- Returns:
- array of all non-zero entries in row r sorted by column indexes
-
pos
public int pos(int r)
Description copied from class:SparseBlock
Get the starting position of row r in the indexes/values arrays returned by indexes(r) and values(r).- Specified by:
pos
in classSparseBlock
- Parameters:
r
- row index starting at 0- Returns:
- starting position of row r
-
set
public boolean set(int r, int c, double v)
Description copied from class:SparseBlock
Set the value of a matrix cell (r,c). This might update an existing non-zero value, insert a new non-zero value, or delete a non-zero value.- Specified by:
set
in classSparseBlock
- Parameters:
r
- row index starting at 0c
- column index starting at 0v
- zero or non-zero value- Returns:
- true, if number of non-zeros changed
-
set
public void set(int r, SparseRow row, boolean deep)
Description copied from class:SparseBlock
Set the values of row r to the given sparse row. This might update existing non-zero values, insert a new row, or delete a row. NOTE: This method exists for incremental runtime integration and might be deleted in the future.- Specified by:
set
in classSparseBlock
- Parameters:
r
- row index starting at 0row
- sparse rowdeep
- indicator to create deep copy of sparse row
-
add
public boolean add(int r, int c, double v)
Description copied from class:SparseBlock
Add a value to a matrix cell (r,c). This might update an existing non-zero value, or insert a new non-zero value.- Specified by:
add
in classSparseBlock
- Parameters:
r
- row index starting at 0c
- column index starting at 0v
- zero or non-zero value- Returns:
- true, if number of non-zeros changed
-
append
public void append(int r, int c, double v)
Description copied from class:SparseBlock
Append a value to the end of the physical representation. This should only be used for operations with sequential write pattern or if followed by a sort() operation. Note that this operation does not perform any matrix cell updates.- Specified by:
append
in classSparseBlock
- Parameters:
r
- row index starting at 0c
- column index starting at 0v
- zero or non-zero value
-
setIndexRange
public void setIndexRange(int r, int cl, int cu, double[] v, int vix, int vlen)
Description copied from class:SparseBlock
Sets a dense array of non-zeros values into the column range [cl,cu) in row r. The passed value array may be larger and the relevant range is given by [vix,vix+len).- Specified by:
setIndexRange
in classSparseBlock
- Parameters:
r
- row index starting at 0cl
- lower column index starting at 0cu
- upper column index starting at 0v
- value arrayvix
- start index in value arrayvlen
- number of relevant values
-
setIndexRange
public void setIndexRange(int r, int cl, int cu, double[] v, int[] vix, int vpos, int vlen)
Description copied from class:SparseBlock
Sets a sparse array of non-zeros values and indexes into the column range [cl,cu) in row r. The passed value array may be larger.- Specified by:
setIndexRange
in classSparseBlock
- Parameters:
r
- row index starting at 0cl
- lower column index starting at 0cu
- upper column index starting at 0v
- value arrayvix
- column index arrayvpos
- start index in value and index arraysvlen
- number of relevant values
-
deleteIndexRange
public void deleteIndexRange(int r, int cl, int cu)
Description copied from class:SparseBlock
Deletes all non-zero values of the given column range [cl,cu) in row r.- Specified by:
deleteIndexRange
in classSparseBlock
- Parameters:
r
- row index starting at 0cl
- lower column index starting at 0cu
- upper column index starting at 0
-
sort
public void sort()
Description copied from class:SparseBlock
Sort all non-zero value/index pairs of the sparse block by row and column index.- Specified by:
sort
in classSparseBlock
-
sort
public void sort(int r)
Description copied from class:SparseBlock
Sort all non-zero value/index pairs of row r column index.- Specified by:
sort
in classSparseBlock
- Parameters:
r
- row index starting at 0
-
get
public double get(int r, int c)
Description copied from class:SparseBlock
Get value of matrix cell (r,c). In case of non existing values this call returns 0.- Specified by:
get
in classSparseBlock
- Parameters:
r
- row index starting at 0c
- column index starting at 0- Returns:
- value of cell at position (r,c)
-
get
public SparseRow get(int r)
Description copied from class:SparseBlock
Get values of row r in the format of a sparse row. NOTE: This method exists for incremental runtime integration and might be deleted in the future.- Specified by:
get
in classSparseBlock
- Parameters:
r
- row index starting at 0- Returns:
- values of row r as a sparse row
-
posFIndexLTE
public int posFIndexLTE(int r, int c)
Description copied from class:SparseBlock
Get position of first column index lower than or equal column c in row r. The position is relative to the indexes/values arrays returned by indexes(r) and values(r). If no such value exists, this call returns -1.- Specified by:
posFIndexLTE
in classSparseBlock
- Parameters:
r
- row index starting at 0c
- column index starting at 0- Returns:
- position of the first column index lower than or equal to column c in row r
-
posFIndexGTE
public int posFIndexGTE(int r, int c)
Description copied from class:SparseBlock
Get position of first column index greater than or equal column c in row r. The position is relative to the indexes/values arrays returned by indexes(r) and values(r). If no such value exists, this call returns -1. Note if CSR the pos(r) is subtracted from the result.- Specified by:
posFIndexGTE
in classSparseBlock
- Parameters:
r
- row index starting at 0c
- column index starting at 0- Returns:
- position of the first column index greater than or equal to column c in row r
-
posFIndexGT
public int posFIndexGT(int r, int c)
Description copied from class:SparseBlock
Get position of first column index greater than column c in row r. The position is relative to the indexes/values arrays returned by indexes(r) and values(r). If no such value exists, this call returns -1.- Specified by:
posFIndexGT
in classSparseBlock
- Parameters:
r
- row index starting at 0c
- column index starting at 0- Returns:
- position of the first column index greater than column c in row r
-
getIterator
public Iterator<IJV> getIterator()
Description copied from class:SparseBlock
Get a non-zero iterator over the entire sparse block. Note that the returned IJV object is reused across next calls and should be directly consumed or deep copied.- Overrides:
getIterator
in classSparseBlock
- Returns:
- IJV iterator
-
getIterator
public Iterator<IJV> getIterator(int ru)
Description copied from class:SparseBlock
Get a non-zero iterator over the partial sparse block [0,ru). Note that the returned IJV object is reused across next calls and should be directly consumed or deep copied.- Overrides:
getIterator
in classSparseBlock
- Parameters:
ru
- exclusive upper row index starting at 0- Returns:
- IJV iterator
-
getIterator
public Iterator<IJV> getIterator(int rl, int ru)
Description copied from class:SparseBlock
Get a non-zero iterator over the subblock [rl, ru). Note that the returned IJV object is reused across next calls and should be directly consumed or deep copied.- Overrides:
getIterator
in classSparseBlock
- Parameters:
rl
- inclusive lower row index starting at 0ru
- exclusive upper row index starting at 0- Returns:
- IJV iterator
-
toString
public String toString()
- Specified by:
toString
in classSparseBlock
-
rowIndexes
public int[] rowIndexes()
Get raw access to underlying array of row indices For use in GPU code- Returns:
- array of row indices
-
indexes
public int[] indexes()
Get raw access to underlying array of column indices For use in GPU code- Returns:
- array of column indices
-
values
public double[] values()
Get raw access to underlying array of values For use in GPU code- Returns:
- array of values
-
-