Class CacheableData<T extends CacheBlock<?>>
- java.lang.Object
-
- org.apache.sysds.runtime.instructions.cp.Data
-
- org.apache.sysds.runtime.controlprogram.caching.CacheableData<T>
-
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
FrameObject
,MatrixObject
,TensorObject
public abstract class CacheableData<T extends CacheBlock<?>> extends Data
Each object of this class is a cache envelope for some large piece of data called "cache block". For example, the body of a matrix can be the cache block. The term cache block refers strictly to the cacheable portion of the data object, often excluding metadata and auxiliary parameters, as defined in the subclasses. Under the protection of the envelope, the data blob may be evicted to the file system; then the subclass must set its reference tonull
to allow Java garbage collection. If other parts of the system continue keep references to the cache block, its eviction will not release any memory.- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
CacheableData.CacheStatus
Defines all possible cache status types for a data blob.
-
Field Summary
Fields Modifier and Type Field Description static String
cacheEvictionLocalFilePath
static String
cacheEvictionLocalFilePrefix
static boolean
CACHING_ASYNC_FILECLEANUP
static boolean
CACHING_ASYNC_SERIALIZE
static boolean
CACHING_BUFFER_PAGECACHE
static LazyWriteBuffer.RPolicy
CACHING_BUFFER_POLICY
static String
CACHING_COUNTER_GROUP_NAME
static String
CACHING_EVICTION_FILEEXTENSION
static long
CACHING_THRESHOLD
static boolean
CACHING_WRITE_CACHE_ON_READ
-
Method Summary
All Methods Static Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description T
acquireModify(T newData)
Acquires the exclusive "write" lock for a thread that wants to throw away the old cache block data and link up with new cache block data.T
acquireRead()
Acquires a shared "read-only" lock, produces the reference to the cache block, restores the cache block to main memory, reads from HDFS if needed.T
acquireReadAndRelease()
static void
addBroadcastSize(long size)
static void
cleanupCacheDir()
static void
cleanupCacheDir(boolean withDir)
Deletes the DML-script-specific caching working dir.void
clearData()
void
clearData(long tid)
Sets the cache block reference tonull
, abandons the old block.static void
disableCaching()
static void
enableCaching()
void
enableCleanup(boolean flag)
Enables or disables the cleanup of the associated data object on clearData().void
exportData()
void
exportData(int replication)
Writes, or flushes, the cache block data to HDFS.void
exportData(String fName, String outputFormat)
void
exportData(String fName, String outputFormat, int replication, FileFormatProperties formatProperties)
Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop).void
exportData(String fName, String outputFormat, FileFormatProperties formatProperties)
void
freeEvictedBlob()
Low-level cache I/O method that deletes the file containing the evicted data blob, without reading it.int
getBlocksize()
BroadcastObject<T>
getBroadcastHandle()
static long
getBroadcastSize()
LineageItem
getCacheLineage()
long
getCompressedSize()
DataCharacteristics
getDataCharacteristics()
long
getDataSize()
String
getDebugName()
long
getDim(int dim)
FederationMap
getFedMapping()
Gets the mapping of indices ranges to federated objects.FileFormatProperties
getFileFormatProperties()
String
getFileName()
GPUObject
getGPUObject(GPUContext gCtx)
MetaData
getMetaData()
long
getNumColumns()
long
getNumRows()
RDDObject
getRDDHandle()
CacheableData.CacheStatus
getStatus()
long
getUniqueID()
boolean
hasBroadcastHandle()
boolean
hasRDDHandle()
boolean
hasValidLineage()
static void
initCaching()
Inits caching with the default uuid of DMLScriptstatic void
initCaching(String uuid)
Creates the DML-script-specific caching working dir.static boolean
isBelowCachingThreshold(CacheBlock<?> data)
boolean
isCached(boolean inclCachedNoWrite)
static boolean
isCachingActive()
boolean
isCleanupEnabled()
Indicates if cleanup of the associated data object is enabled on clearData().boolean
isCompressed()
boolean
isDirty()
true
if the in-memory or evicted matrix may be different from the matrix located at_hdfsFileName
;false
if the two matrices are supposed to be the same.boolean
isFederated()
Check if object is federated.boolean
isFederated(FTypes.FType type)
boolean
isFederatedExcept(FTypes.FType type)
boolean
isHDFSFileExists()
boolean
isPendingRDDOps()
boolean
moveData(String fName, String outputFormat)
abstract void
refreshMetaData()
void
release()
Releases the shared ("read-only") or exclusive ("write") lock.void
removeGPUObject(GPUContext gCtx)
void
removeMetaData()
void
setBroadcastHandle(BroadcastObject bc)
void
setCacheLineage(LineageItem li)
void
setCompressedSize(long size)
void
setDirty(boolean flag)
void
setEmptyStatus()
void
setFedMapping(FederationMap fedMapping)
Sets the mapping of indices ranges to federated objects.void
setFileFormatProperties(FileFormatProperties props)
void
setFileName(String file)
void
setGPUObject(GPUContext gCtx, GPUObject gObj)
void
setHDFSFileExists(boolean flag)
void
setMetaData(MetaData md)
void
setRDDHandle(RDDObject rdd)
String
toString()
-
Methods inherited from class org.apache.sysds.runtime.instructions.cp.Data
getDataType, getPrivacyConstraint, getValueType, setPrivacyConstraints, updateDataCharacteristics
-
-
-
-
Field Detail
-
CACHING_THRESHOLD
public static final long CACHING_THRESHOLD
-
CACHING_BUFFER_POLICY
public static final LazyWriteBuffer.RPolicy CACHING_BUFFER_POLICY
-
CACHING_BUFFER_PAGECACHE
public static final boolean CACHING_BUFFER_PAGECACHE
- See Also:
- Constant Field Values
-
CACHING_WRITE_CACHE_ON_READ
public static final boolean CACHING_WRITE_CACHE_ON_READ
- See Also:
- Constant Field Values
-
CACHING_COUNTER_GROUP_NAME
public static final String CACHING_COUNTER_GROUP_NAME
- See Also:
- Constant Field Values
-
CACHING_EVICTION_FILEEXTENSION
public static final String CACHING_EVICTION_FILEEXTENSION
- See Also:
- Constant Field Values
-
CACHING_ASYNC_FILECLEANUP
public static final boolean CACHING_ASYNC_FILECLEANUP
- See Also:
- Constant Field Values
-
CACHING_ASYNC_SERIALIZE
public static final boolean CACHING_ASYNC_SERIALIZE
- See Also:
- Constant Field Values
-
cacheEvictionLocalFilePath
public static String cacheEvictionLocalFilePath
-
cacheEvictionLocalFilePrefix
public static String cacheEvictionLocalFilePrefix
-
-
Method Detail
-
enableCleanup
public void enableCleanup(boolean flag)
Enables or disables the cleanup of the associated data object on clearData().- Parameters:
flag
- true if cleanup
-
isCleanupEnabled
public boolean isCleanupEnabled()
Indicates if cleanup of the associated data object is enabled on clearData().- Returns:
- true if cleanup enabled
-
getStatus
public CacheableData.CacheStatus getStatus()
-
isHDFSFileExists
public boolean isHDFSFileExists()
-
setHDFSFileExists
public void setHDFSFileExists(boolean flag)
-
getFileName
public String getFileName()
-
getUniqueID
public long getUniqueID()
-
setFileName
public void setFileName(String file)
-
isDirty
public boolean isDirty()
true
if the in-memory or evicted matrix may be different from the matrix located at_hdfsFileName
;false
if the two matrices are supposed to be the same.- Returns:
- true if dirty
-
setDirty
public void setDirty(boolean flag)
-
getFileFormatProperties
public FileFormatProperties getFileFormatProperties()
-
setFileFormatProperties
public void setFileFormatProperties(FileFormatProperties props)
-
setMetaData
public void setMetaData(MetaData md)
- Overrides:
setMetaData
in classData
-
setCompressedSize
public void setCompressedSize(long size)
-
isCompressed
public boolean isCompressed()
-
getCompressedSize
public long getCompressedSize()
-
getMetaData
public MetaData getMetaData()
- Overrides:
getMetaData
in classData
-
removeMetaData
public void removeMetaData()
- Overrides:
removeMetaData
in classData
-
getDataCharacteristics
public DataCharacteristics getDataCharacteristics()
-
getDim
public long getDim(int dim)
-
getNumRows
public long getNumRows()
-
getNumColumns
public long getNumColumns()
-
getBlocksize
public int getBlocksize()
-
refreshMetaData
public abstract void refreshMetaData()
-
getCacheLineage
public LineageItem getCacheLineage()
-
setCacheLineage
public void setCacheLineage(LineageItem li)
-
hasValidLineage
public boolean hasValidLineage()
-
isFederated
public boolean isFederated()
Check if object is federated.- Returns:
- true if federated else false
-
isFederated
public boolean isFederated(FTypes.FType type)
-
isFederatedExcept
public boolean isFederatedExcept(FTypes.FType type)
-
getFedMapping
public FederationMap getFedMapping()
Gets the mapping of indices ranges to federated objects.- Returns:
- fedMapping mapping
-
setFedMapping
public void setFedMapping(FederationMap fedMapping)
Sets the mapping of indices ranges to federated objects.- Parameters:
fedMapping
- mapping
-
getRDDHandle
public RDDObject getRDDHandle()
-
setRDDHandle
public void setRDDHandle(RDDObject rdd)
-
hasRDDHandle
public boolean hasRDDHandle()
-
getBroadcastHandle
public BroadcastObject<T> getBroadcastHandle()
-
hasBroadcastHandle
public boolean hasBroadcastHandle()
-
setBroadcastHandle
public void setBroadcastHandle(BroadcastObject bc)
-
getGPUObject
public GPUObject getGPUObject(GPUContext gCtx)
-
setGPUObject
public void setGPUObject(GPUContext gCtx, GPUObject gObj)
-
removeGPUObject
public void removeGPUObject(GPUContext gCtx)
-
acquireReadAndRelease
public T acquireReadAndRelease()
-
acquireRead
public T acquireRead()
Acquires a shared "read-only" lock, produces the reference to the cache block, restores the cache block to main memory, reads from HDFS if needed. Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). In-Status: EMPTY, EVICTABLE, EVICTED, READ; Out-Status: READ(+1).- Returns:
- cacheable data
-
acquireModify
public T acquireModify(T newData)
Acquires the exclusive "write" lock for a thread that wants to throw away the old cache block data and link up with new cache block data. Abandons the old data without reading it and sets the new data reference. In-Status: EMPTY, EVICTABLE, EVICTED; Out-Status: MODIFY.- Parameters:
newData
- new data- Returns:
- cacheable data
-
release
public void release()
Releases the shared ("read-only") or exclusive ("write") lock. Updates size information, last-access time, metadata, etc. Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). In-Status: READ, MODIFY; Out-Status: READ(-1), EVICTABLE, EMPTY.
-
clearData
public void clearData()
-
clearData
public void clearData(long tid)
Sets the cache block reference tonull
, abandons the old block. Makes the "envelope" empty. Run it to finalize the object (otherwise the evicted cache block file may remain undeleted). In-Status: EMPTY, EVICTABLE, EVICTED; Out-Status: EMPTY.- Parameters:
tid
- thread ID
-
exportData
public void exportData()
-
exportData
public void exportData(int replication)
Writes, or flushes, the cache block data to HDFS. In-Status: EMPTY, EVICTABLE, EVICTED, READ; Out-Status: EMPTY, EVICTABLE, EVICTED, READ.- Parameters:
replication
- ?
-
exportData
public void exportData(String fName, String outputFormat, FileFormatProperties formatProperties)
-
exportData
public void exportData(String fName, String outputFormat, int replication, FileFormatProperties formatProperties)
Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). If all threads export the same data object concurrently it results in errors because they all write to the same file. Efficiency for loops and parallel threads is achieved by checking if the in-memory block is dirty. NOTE: MB: we do not use dfs copy from local (evicted) to HDFS because this would ignore the output format and most importantly would bypass reblocking during write (which effects the potential degree of parallelism). However, we copy files on HDFS if certain criteria are given.- Parameters:
fName
- file nameoutputFormat
- formatreplication
- ?formatProperties
- file format properties
-
freeEvictedBlob
public final void freeEvictedBlob()
Low-level cache I/O method that deletes the file containing the evicted data blob, without reading it. Must be defined by a subclass, never called by users.
-
isBelowCachingThreshold
public static boolean isBelowCachingThreshold(CacheBlock<?> data)
-
getDataSize
public long getDataSize()
-
getDebugName
public String getDebugName()
- Specified by:
getDebugName
in classData
-
isCached
public boolean isCached(boolean inclCachedNoWrite)
-
setEmptyStatus
public void setEmptyStatus()
-
isPendingRDDOps
public boolean isPendingRDDOps()
-
addBroadcastSize
public static void addBroadcastSize(long size)
-
getBroadcastSize
public static long getBroadcastSize()
-
cleanupCacheDir
public static void cleanupCacheDir()
-
cleanupCacheDir
public static void cleanupCacheDir(boolean withDir)
Deletes the DML-script-specific caching working dir.- Parameters:
withDir
- if true, delete directory
-
initCaching
public static void initCaching() throws IOException
Inits caching with the default uuid of DMLScript- Throws:
IOException
- if IOException occurs
-
initCaching
public static void initCaching(String uuid) throws IOException
Creates the DML-script-specific caching working dir. Takes the UUID in order to allow for custom uuid, e.g., for remote parfor caching- Parameters:
uuid
- ID- Throws:
IOException
- if IOException occurs
-
isCachingActive
public static boolean isCachingActive()
-
disableCaching
public static void disableCaching()
-
enableCaching
public static void enableCaching()
-
-