Package org.apache.sysds.runtime.data
Class SparseBlockCSC
- java.lang.Object
 - 
- org.apache.sysds.runtime.data.SparseBlock
 - 
- org.apache.sysds.runtime.data.SparseBlockCSC
 
 
 
- 
- All Implemented Interfaces:
 Serializable,Block
public class SparseBlockCSC extends SparseBlock
SparseBlock implementation that realizes a traditional 'compressed sparse column' representation, where the entire sparse block is stored as three arrays: ptr of length clen+1 to store offsets per column, and indexes/values of length nnz to store row indexes and values of non-zero entries. This format is very memory efficient for sparse (but not ultra-sparse) matrices and provides very good performance for common operations, partially due to lower memory bandwidth requirements. However, this format is slow on incremental construction (because it does not allow append/sort per row) without reshifting. For these types of operations, the 'modified compressed sparse row/column' representations are better suited since they are update friendlier. Further, this representation implements both the row oriented and the column oriented one. By implementing the row-oriented API, we ensure interoperability with existing operators. Finally, the total nnz is limited to INTEGER_MAX, whereas for SparseBlockMCSR only the nnz per row are limited to INTEGER_MAX. TODO performance - currently this sparse block is very slow and this slows down even the component tests- See Also:
 - Serialized Form
 
 
- 
- 
Nested Class Summary
Nested Classes Modifier and Type Class Description classSparseBlockCSC.NonEmptyColumnsIteratorCSCclassSparseBlockCSC.NonEmptyRowsIteratorCSC- 
Nested classes/interfaces inherited from class org.apache.sysds.runtime.data.SparseBlock
SparseBlock.Type 
 - 
 
- 
Constructor Summary
Constructors Constructor Description SparseBlockCSC(int[] rowPtr, int[] rowInd, double[] values, int nnz)SparseBlockCSC(int rlen, int clen)SparseBlockCSC(int cols, int[] rowInd, int[] colInd, double[] values)SparseBlockCSC(int clen, int capacity, int size)SparseBlockCSC(int cols, int nnz, int[] rowInd)SparseBlockCSC(SparseBlock sblock)SparseBlockCSC(SparseBlock sblock, int clen)SparseBlockCSC(SparseRow[] cols, int nnz) 
- 
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description booleanadd(int r, int c, double v)Add a value to a matrix cell (r,c).voidallocate(int r)Allocate the underlying data structure holding non-zero values of row r if necessary.voidallocate(int r, int nnz)Allocate the underlying data structure holding non-zero values of row r if necessary, w/ given size.voidallocate(int r, int ennz, int maxnnz)Allocate the underlying data structure holding non-zero values of row r w/ the specified estimated nnz and max nnz.voidappend(int r, int c, double v)Append a value to the end of the physical representation.booleancheckValidity(int rlen, int clen, long nnz, boolean strict)Validate the correctness of the internal data structures of the different sparse block implementations.int[]colPointers()Get raw access to underlying array of column pointers For use in GPU codevoidcompact(int r)Re-allocate physical row if physical size exceeds logical size plus resize factor.voiddeleteIndexRange(int r, int cl, int cu)Deletes all non-zero values of the given column range [cl,cu) in row r.voiddeleteIndexRangeCol(int c, int rl, int ru)static longestimateSizeInMemory(long nrows, long ncols, double sparsity)Get the estimated in-memory size of the sparse block in CSC with the given dimensions w/o accounting for overallocation.SparseRowget(int r)Get values of row r in the format of a sparse row.doubleget(int r, int c)Get value of matrix cell (r,c).SparseRowgetCol(int c)longgetExactSizeInMemory()Computes the exact size in memory of the materialized blockIterator<Integer>getNonEmptyColumnsIterator(int cl, int cu)Iterator<Integer>getNonEmptyRowsIterator(int rl, int ru)int[]indexes()Get raw access to underlying array of row indices For use in GPU codeint[]indexes(int r)Get the sorted array of column indexes of all non-zero entries in row r.int[]indexesCol(int c)voidinitSparse(int clen, int nnz, DataInput in)Initializes the CSC sparse block from an ordered input stream of sparse columns (colnnz, iv-pairs*).voidinitUltraSparse(int nnz, DataInput in)Initializes the CSC sparse block from an ordered input stream of ultra-sparse ijv triples.booleanisAllocated(int r)Indicates if the underlying data structure for a given row is already allocated.booleanisContiguous()Indicates if the underlying data structures returned by values and indexes are contiguous arrays, which can be exploited for more efficient operations.booleanisEmpty(int r)Get information if row r is empty, i.e., does not contain non-zero values.booleanisEmptyCol(int c)booleanisThreadSafe()Indicates if the underlying implementation allows thread-safe row updates if concurrent threads update disjoint rows.intnumCols()Get the number of columns in the CSC blockintnumRows()Get the number of rows in the sparse block.intpos(int r)Get the starting position of row r in the indexes/values arrays returned by indexes(r) and values(r).intposCol(int c)intposFIndexGT(int r, int c)Get position of first column index greater than column c in row r.intposFIndexGTCol(int r, int c)intposFIndexGTE(int r, int c)Get position of first column index greater than or equal column c in row r.intposFIndexGTECol(int r, int c)intposFIndexLTE(int r, int c)Get position of first column index lower than or equal column c in row r.intposFIndexLTECol(int r, int c)voidreset()Clears the sparse block by deleting non-zero values.voidreset(int ennz, int maxnnz)Clears the sparse block by deleting non-zero values.voidreset(int r, int ennz, int maxnnz)Clears row r of the sparse block by deleting non-zero values.voidresetCol(int c)booleanset(int r, int c, double v)Set the value of a matrix cell (r,c).voidset(int r, SparseRow row, boolean deep)Set the values of row r to the given sparse row.voidsetCol(int c, SparseRow col, boolean deep)voidsetIndexRange(int r, int cl, int cu, double[] v, int[] vix, int vpos, int vlen)Sets a sparse array of non-zeros values and indexes into the column range [cl,cu) in row r.voidsetIndexRange(int r, int cl, int cu, double[] v, int vix, int vlen)Sets a dense array of non-zeros values into the column range [cl,cu) in row r.voidsetIndexRangeCol(int c, int rl, int ru, double[] v, int[] vix, int vpos, int vlen)voidsetIndexRangeCol(int c, int rl, int ru, double[] v, int vix, int vlen)longsize()Get the number of non-zero values in the sparse block.intsize(int r)Get the number of non-zero values in row r.longsize(int rl, int ru)Get the number of non-zeros values in the row range of [rl, ru).longsize(int rl, int ru, int cl, int cu)Get the number of non-zeros values in the row and column range of [rl/cl, ru/cu);intsizeCol(int c)longsizeCol(int cl, int cu)voidsort()Sort all non-zero value/index pairs of the sparse block by row and column index.voidsort(int r)Sort all non-zero value/index pairs of row r column index.voidsortCol(int c)StringtoString()double[]values()Get raw access to underlying array of values For use in GPU codedouble[]values(int r)Get the array of all non-zero entries in row r, sorted by their column indexes.double[]valuesCol(int c)- 
Methods inherited from class org.apache.sysds.runtime.data.SparseBlock
contains, contains, equals, equals, equals, equals, getIterator, getIterator, getIterator, getIterator, getNonEmptyRows, getNonEmptyRows, isAligned, isAligned 
 - 
 
 - 
 
- 
- 
Constructor Detail
- 
SparseBlockCSC
public SparseBlockCSC(int rlen, int clen) 
- 
SparseBlockCSC
public SparseBlockCSC(int clen, int capacity, int size) 
- 
SparseBlockCSC
public SparseBlockCSC(int[] rowPtr, int[] rowInd, double[] values, int nnz) 
- 
SparseBlockCSC
public SparseBlockCSC(SparseBlock sblock, int clen)
 
- 
SparseBlockCSC
public SparseBlockCSC(SparseBlock sblock)
 
- 
SparseBlockCSC
public SparseBlockCSC(SparseRow[] cols, int nnz)
 
- 
SparseBlockCSC
public SparseBlockCSC(int cols, int[] rowInd, int[] colInd, double[] values) 
- 
SparseBlockCSC
public SparseBlockCSC(int cols, int nnz, int[] rowInd) 
 - 
 
- 
Method Detail
- 
initUltraSparse
public void initUltraSparse(int nnz, DataInput in) throws IOExceptionInitializes the CSC sparse block from an ordered input stream of ultra-sparse ijv triples.- Parameters:
 nnz- number of non-zeros to readin- data input stream of ijv triples, ordered by column and row indices- Throws:
 IOException- if deserialization error occurs
 
- 
initSparse
public void initSparse(int clen, int nnz, DataInput in) throws IOExceptionInitializes the CSC sparse block from an ordered input stream of sparse columns (colnnz, iv-pairs*).- Parameters:
 clen- number of columnsnnz- number of non-zeros to readin- data input stream of sparse columns, ordered by column index- Throws:
 IOException- if deserialization error occurs
 
- 
estimateSizeInMemory
public static long estimateSizeInMemory(long nrows, long ncols, double sparsity)Get the estimated in-memory size of the sparse block in CSC with the given dimensions w/o accounting for overallocation.- Parameters:
 nrows- number of rowsncols- number of columnssparsity- sparsity ratio- Returns:
 - memory estimate
 
 
- 
colPointers
public int[] colPointers()
Get raw access to underlying array of column pointers For use in GPU code- Returns:
 - array of column pointers
 
 
- 
indexes
public int[] indexes()
Get raw access to underlying array of row indices For use in GPU code- Returns:
 - array of row indexes
 
 
- 
values
public double[] values()
Get raw access to underlying array of values For use in GPU code- Returns:
 - array of values
 
 
- 
allocate
public void allocate(int r)
Description copied from class:SparseBlockAllocate the underlying data structure holding non-zero values of row r if necessary.- Specified by:
 allocatein classSparseBlock- Parameters:
 r- row index
 
- 
allocate
public void allocate(int r, int nnz)Description copied from class:SparseBlockAllocate the underlying data structure holding non-zero values of row r if necessary, w/ given size.- Specified by:
 allocatein classSparseBlock- Parameters:
 r- row indexnnz- number of non-zeros
 
- 
allocate
public void allocate(int r, int ennz, int maxnnz)Description copied from class:SparseBlockAllocate the underlying data structure holding non-zero values of row r w/ the specified estimated nnz and max nnz.- Specified by:
 allocatein classSparseBlock- Parameters:
 r- row indexennz- estimated non-zerosmaxnnz- max non-zeros
 
- 
compact
public void compact(int r)
Description copied from class:SparseBlockRe-allocate physical row if physical size exceeds logical size plus resize factor.- Specified by:
 compactin classSparseBlock- Parameters:
 r- row index
 
- 
numRows
public int numRows()
Description copied from class:SparseBlockGet the number of rows in the sparse block.- Specified by:
 numRowsin classSparseBlock- Returns:
 - number of rows
 
 
- 
numCols
public int numCols()
Get the number of columns in the CSC block- Returns:
 - number of columns
 
 
- 
isThreadSafe
public boolean isThreadSafe()
Description copied from class:SparseBlockIndicates if the underlying implementation allows thread-safe row updates if concurrent threads update disjoint rows.- Specified by:
 isThreadSafein classSparseBlock- Returns:
 - true if thread-safe row updates
 
 
- 
isContiguous
public boolean isContiguous()
Description copied from class:SparseBlockIndicates if the underlying data structures returned by values and indexes are contiguous arrays, which can be exploited for more efficient operations.- Specified by:
 isContiguousin classSparseBlock- Returns:
 - true if underlying data structures are contiguous arrays
 
 
- 
isAllocated
public boolean isAllocated(int r)
Description copied from class:SparseBlockIndicates if the underlying data structure for a given row is already allocated.- Specified by:
 isAllocatedin classSparseBlock- Parameters:
 r- row index- Returns:
 - true if already allocated
 
 
- 
reset
public void reset()
Description copied from class:SparseBlockClears the sparse block by deleting non-zero values. After this call all size() calls are guaranteed to return 0.- Specified by:
 resetin classSparseBlock
 
- 
reset
public void reset(int ennz, int maxnnz)Description copied from class:SparseBlockClears the sparse block by deleting non-zero values. After this call all size() calls are guaranteed to return 0.- Specified by:
 resetin classSparseBlock- Parameters:
 ennz- estimated non-zerosmaxnnz- max non-zeros
 
- 
reset
public void reset(int r, int ennz, int maxnnz)Description copied from class:SparseBlockClears row r of the sparse block by deleting non-zero values. After this call size(r) is guaranteed to return 0.- Specified by:
 resetin classSparseBlock- Parameters:
 r- row indexennz- estimated non-zerosmaxnnz- max non-zeros
 
- 
resetCol
public void resetCol(int c)
 
- 
size
public long size()
Description copied from class:SparseBlockGet the number of non-zero values in the sparse block.- Specified by:
 sizein classSparseBlock- Returns:
 - number of non-zero values in sparse block
 
 
- 
size
public int size(int r)
Description copied from class:SparseBlockGet the number of non-zero values in row r.- Specified by:
 sizein classSparseBlock- Parameters:
 r- row index starting at 0- Returns:
 - number of non-zero values in row r
 
 
- 
sizeCol
public int sizeCol(int c)
 
- 
size
public long size(int rl, int ru)Description copied from class:SparseBlockGet the number of non-zeros values in the row range of [rl, ru).- Specified by:
 sizein classSparseBlock- Parameters:
 rl- row lower indexru- row upper index- Returns:
 - number of non-zero values in the row range
 
 
- 
sizeCol
public long sizeCol(int cl, int cu) 
- 
size
public long size(int rl, int ru, int cl, int cu)Description copied from class:SparseBlockGet the number of non-zeros values in the row and column range of [rl/cl, ru/cu);- Specified by:
 sizein classSparseBlock- Parameters:
 rl- row lower indexru- row upper indexcl- column lower indexcu- column upper index- Returns:
 - number of non-zero values in the row and column range
 
 
- 
isEmpty
public boolean isEmpty(int r)
Description copied from class:SparseBlockGet information if row r is empty, i.e., does not contain non-zero values. Equivalent to size(r)==0. Users should do this check if it is unknown if the underlying row data structure is allocated.- Specified by:
 isEmptyin classSparseBlock- Parameters:
 r- row index starting at 0- Returns:
 - true if row does not contain non-zero values
 
 
- 
isEmptyCol
public boolean isEmptyCol(int c)
 
- 
checkValidity
public boolean checkValidity(int rlen, int clen, long nnz, boolean strict)Description copied from class:SparseBlockValidate the correctness of the internal data structures of the different sparse block implementations.- Specified by:
 checkValidityin classSparseBlock- Parameters:
 rlen- number of rowsclen- number of columnsnnz- number of non zerosstrict- enforce optional properties- Returns:
 - true if the sparse block is valid wrt the corresponding format such as COO, CSR, MCSR.
 
 
- 
getExactSizeInMemory
public long getExactSizeInMemory()
Description copied from class:SparseBlockComputes the exact size in memory of the materialized block- Specified by:
 getExactSizeInMemoryin classSparseBlock- Returns:
 - the exact size in memory
 
 
- 
indexes
public int[] indexes(int r)
Description copied from class:SparseBlockGet the sorted array of column indexes of all non-zero entries in row r. Note that - for flexibility of the implementing format - the returned array may be larger, where the range for row r is given by [pos(r),pos(r)+size(r)).- Specified by:
 indexesin classSparseBlock- Parameters:
 r- row index starting at 0- Returns:
 - sorted array of column indexes
 
 
- 
indexesCol
public int[] indexesCol(int c)
 
- 
values
public double[] values(int r)
Description copied from class:SparseBlockGet the array of all non-zero entries in row r, sorted by their column indexes. Note that - for flexibility of the implementing format - the returned array may be larger, where the range for row r is given by [pos(r),pos(r)+size(r)).- Specified by:
 valuesin classSparseBlock- Parameters:
 r- row index starting at 0- Returns:
 - array of all non-zero entries in row r sorted by column indexes
 
 
- 
valuesCol
public double[] valuesCol(int c)
 
- 
pos
public int pos(int r)
Description copied from class:SparseBlockGet the starting position of row r in the indexes/values arrays returned by indexes(r) and values(r).- Specified by:
 posin classSparseBlock- Parameters:
 r- row index starting at 0- Returns:
 - starting position of row r
 
 
- 
posCol
public int posCol(int c)
 
- 
set
public boolean set(int r, int c, double v)Description copied from class:SparseBlockSet the value of a matrix cell (r,c). This might update an existing non-zero value, insert a new non-zero value, or delete a non-zero value.- Specified by:
 setin classSparseBlock- Parameters:
 r- row index starting at 0c- column index starting at 0v- zero or non-zero value- Returns:
 - true, if number of non-zeros changed
 
 
- 
set
public void set(int r, SparseRow row, boolean deep)Description copied from class:SparseBlockSet the values of row r to the given sparse row. This might update existing non-zero values, insert a new row, or delete a row. NOTE: This method exists for incremental runtime integration and might be deleted in the future.- Specified by:
 setin classSparseBlock- Parameters:
 r- row index starting at 0row- sparse rowdeep- indicator to create deep copy of sparse row
 
- 
setCol
public void setCol(int c, SparseRow col, boolean deep) 
- 
add
public boolean add(int r, int c, double v)Description copied from class:SparseBlockAdd a value to a matrix cell (r,c). This might update an existing non-zero value, or insert a new non-zero value.- Specified by:
 addin classSparseBlock- Parameters:
 r- row index starting at 0c- column index starting at 0v- zero or non-zero value- Returns:
 - true, if number of non-zeros changed
 
 
- 
append
public void append(int r, int c, double v)Description copied from class:SparseBlockAppend a value to the end of the physical representation. This should only be used for operations with sequential write pattern or if followed by a sort() operation. Note that this operation does not perform any matrix cell updates.- Specified by:
 appendin classSparseBlock- Parameters:
 r- row index starting at 0c- column index starting at 0v- zero or non-zero value
 
- 
setIndexRange
public void setIndexRange(int r, int cl, int cu, double[] v, int vix, int vlen)Description copied from class:SparseBlockSets a dense array of non-zeros values into the column range [cl,cu) in row r. The passed value array may be larger and the relevant range is given by [vix,vix+len).- Specified by:
 setIndexRangein classSparseBlock- Parameters:
 r- row index starting at 0cl- lower column index starting at 0cu- upper column index starting at 0v- value arrayvix- start index in value arrayvlen- number of relevant values
 
- 
setIndexRangeCol
public void setIndexRangeCol(int c, int rl, int ru, double[] v, int vix, int vlen) 
- 
setIndexRange
public void setIndexRange(int r, int cl, int cu, double[] v, int[] vix, int vpos, int vlen)Description copied from class:SparseBlockSets a sparse array of non-zeros values and indexes into the column range [cl,cu) in row r. The passed value array may be larger.- Specified by:
 setIndexRangein classSparseBlock- Parameters:
 r- row index starting at 0cl- lower column index starting at 0cu- upper column index starting at 0v- value arrayvix- column index arrayvpos- start index in value and index arraysvlen- number of relevant values
 
- 
setIndexRangeCol
public void setIndexRangeCol(int c, int rl, int ru, double[] v, int[] vix, int vpos, int vlen) 
- 
deleteIndexRange
public void deleteIndexRange(int r, int cl, int cu)Description copied from class:SparseBlockDeletes all non-zero values of the given column range [cl,cu) in row r.- Specified by:
 deleteIndexRangein classSparseBlock- Parameters:
 r- row index starting at 0cl- lower column index starting at 0cu- upper column index starting at 0
 
- 
deleteIndexRangeCol
public void deleteIndexRangeCol(int c, int rl, int ru) 
- 
sort
public void sort()
Description copied from class:SparseBlockSort all non-zero value/index pairs of the sparse block by row and column index.- Specified by:
 sortin classSparseBlock
 
- 
sort
public void sort(int r)
Description copied from class:SparseBlockSort all non-zero value/index pairs of row r column index.- Specified by:
 sortin classSparseBlock- Parameters:
 r- row index starting at 0
 
- 
sortCol
public void sortCol(int c)
 
- 
get
public double get(int r, int c)Description copied from class:SparseBlockGet value of matrix cell (r,c). In case of non existing values this call returns 0.- Specified by:
 getin interfaceBlock- Specified by:
 getin classSparseBlock- Parameters:
 r- row index starting at 0c- column index starting at 0- Returns:
 - value of cell at position (r,c)
 
 
- 
get
public SparseRow get(int r)
Description copied from class:SparseBlockGet values of row r in the format of a sparse row.- Specified by:
 getin classSparseBlock- Parameters:
 r- row index starting at 0- Returns:
 - values of row r as a sparse row
 
 
- 
getCol
public SparseRow getCol(int c)
 
- 
posFIndexLTE
public int posFIndexLTE(int r, int c)Description copied from class:SparseBlockGet position of first column index lower than or equal column c in row r. The position is relative to the indexes/values arrays returned by indexes(r) and values(r). If no such value exists, this call returns -1.- Specified by:
 posFIndexLTEin classSparseBlock- Parameters:
 r- row index starting at 0c- column index starting at 0- Returns:
 - position of the first column index lower than or equal to column c in row r
 
 
- 
posFIndexLTECol
public int posFIndexLTECol(int r, int c) 
- 
posFIndexGTE
public int posFIndexGTE(int r, int c)Description copied from class:SparseBlockGet position of first column index greater than or equal column c in row r. The position is relative to the indexes/values arrays returned by indexes(r) and values(r). If no such value exists, this call returns -1. Note if CSR the pos(r) is subtracted from the result.- Specified by:
 posFIndexGTEin classSparseBlock- Parameters:
 r- row index starting at 0c- column index starting at 0- Returns:
 - position of the first column index greater than or equal to column c in row r
 
 
- 
posFIndexGTECol
public int posFIndexGTECol(int r, int c) 
- 
posFIndexGT
public int posFIndexGT(int r, int c)Description copied from class:SparseBlockGet position of first column index greater than column c in row r. The position is relative to the indexes/values arrays returned by indexes(r) and values(r). If no such value exists, this call returns -1.- Specified by:
 posFIndexGTin classSparseBlock- Parameters:
 r- row index starting at 0c- column index starting at 0- Returns:
 - position of the first column index greater than column c in row r
 
 
- 
posFIndexGTCol
public int posFIndexGTCol(int r, int c) 
- 
toString
public String toString()
- Specified by:
 toStringin classSparseBlock
 
- 
getNonEmptyRowsIterator
public Iterator<Integer> getNonEmptyRowsIterator(int rl, int ru)
- Specified by:
 getNonEmptyRowsIteratorin classSparseBlock
 
 - 
 
 -