Class CacheableData<T extends CacheBlock<?>>

  • All Implemented Interfaces:
    Serializable
    Direct Known Subclasses:
    FrameObject, MatrixObject, TensorObject

    public abstract class CacheableData<T extends CacheBlock<?>>
    extends Data
    Each object of this class is a cache envelope for some large piece of data called "cache block". For example, the body of a matrix can be the cache block. The term cache block refers strictly to the cacheable portion of the data object, often excluding metadata and auxiliary parameters, as defined in the subclasses. Under the protection of the envelope, the data blob may be evicted to the file system; then the subclass must set its reference to null to allow Java garbage collection. If other parts of the system continue keep references to the cache block, its eviction will not release any memory.
    See Also:
    Serialized Form
    • Field Detail

      • CACHING_THRESHOLD

        public static final long CACHING_THRESHOLD
      • CACHING_BUFFER_PAGECACHE

        public static final boolean CACHING_BUFFER_PAGECACHE
        See Also:
        Constant Field Values
      • CACHING_WRITE_CACHE_ON_READ

        public static final boolean CACHING_WRITE_CACHE_ON_READ
        See Also:
        Constant Field Values
      • CACHING_ASYNC_FILECLEANUP

        public static final boolean CACHING_ASYNC_FILECLEANUP
        See Also:
        Constant Field Values
      • CACHING_ASYNC_SERIALIZE

        public static final boolean CACHING_ASYNC_SERIALIZE
        See Also:
        Constant Field Values
      • cacheEvictionLocalFilePath

        public static String cacheEvictionLocalFilePath
      • cacheEvictionLocalFilePrefix

        public static String cacheEvictionLocalFilePrefix
    • Method Detail

      • enableCleanup

        public void enableCleanup​(boolean flag)
        Enables or disables the cleanup of the associated data object on clearData().
        Parameters:
        flag - true if cleanup
      • isCleanupEnabled

        public boolean isCleanupEnabled()
        Indicates if cleanup of the associated data object is enabled on clearData().
        Returns:
        true if cleanup enabled
      • isHDFSFileExists

        public boolean isHDFSFileExists()
      • setHDFSFileExists

        public void setHDFSFileExists​(boolean flag)
      • getFileName

        public String getFileName()
      • getUniqueID

        public long getUniqueID()
      • setFileName

        public void setFileName​(String file)
      • isDirty

        public boolean isDirty()
        true if the in-memory or evicted matrix may be different from the matrix located at _hdfsFileName; false if the two matrices are supposed to be the same.
        Returns:
        true if dirty
      • setDirty

        public void setDirty​(boolean flag)
      • setCompressedSize

        public void setCompressedSize​(long size)
      • isCompressed

        public boolean isCompressed()
      • getCompressedSize

        public long getCompressedSize()
      • getDim

        public long getDim​(int dim)
      • getNumRows

        public long getNumRows()
      • getNumColumns

        public long getNumColumns()
      • getBlocksize

        public int getBlocksize()
      • refreshMetaData

        public abstract void refreshMetaData()
      • getCacheLineage

        public LineageItem getCacheLineage()
      • setCacheLineage

        public void setCacheLineage​(LineageItem li)
      • hasValidLineage

        public boolean hasValidLineage()
      • isFederated

        public boolean isFederated()
        Check if object is federated.
        Returns:
        true if federated else false
      • isFederated

        public boolean isFederated​(FTypes.FType type)
      • isFederatedExcept

        public boolean isFederatedExcept​(FTypes.FType type)
      • getFedMapping

        public FederationMap getFedMapping()
        Gets the mapping of indices ranges to federated objects.
        Returns:
        fedMapping mapping
      • setFedMapping

        public void setFedMapping​(FederationMap fedMapping)
        Sets the mapping of indices ranges to federated objects.
        Parameters:
        fedMapping - mapping
      • getRDDHandle

        public RDDObject getRDDHandle()
      • setRDDHandle

        public void setRDDHandle​(RDDObject rdd)
      • hasRDDHandle

        public boolean hasRDDHandle()
      • hasBroadcastHandle

        public boolean hasBroadcastHandle()
      • setBroadcastHandle

        public void setBroadcastHandle​(BroadcastObject bc)
      • removeGPUObject

        public void removeGPUObject​(GPUContext gCtx)
      • acquireReadAndRelease

        public T acquireReadAndRelease()
      • acquireRead

        public T acquireRead()
        Acquires a shared "read-only" lock, produces the reference to the cache block, restores the cache block to main memory, reads from HDFS if needed. Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). In-Status: EMPTY, EVICTABLE, EVICTED, READ; Out-Status: READ(+1).
        Returns:
        cacheable data
      • acquireModify

        public T acquireModify​(T newData)
        Acquires the exclusive "write" lock for a thread that wants to throw away the old cache block data and link up with new cache block data. Abandons the old data without reading it and sets the new data reference. In-Status: EMPTY, EVICTABLE, EVICTED; Out-Status: MODIFY.
        Parameters:
        newData - new data
        Returns:
        cacheable data
      • release

        public void release()
        Releases the shared ("read-only") or exclusive ("write") lock. Updates size information, last-access time, metadata, etc. Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). In-Status: READ, MODIFY; Out-Status: READ(-1), EVICTABLE, EMPTY.
      • clearData

        public void clearData()
      • clearData

        public void clearData​(long tid)
        Sets the cache block reference to null, abandons the old block. Makes the "envelope" empty. Run it to finalize the object (otherwise the evicted cache block file may remain undeleted). In-Status: EMPTY, EVICTABLE, EVICTED; Out-Status: EMPTY.
        Parameters:
        tid - thread ID
      • exportData

        public void exportData()
      • exportData

        public void exportData​(int replication)
        Writes, or flushes, the cache block data to HDFS. In-Status: EMPTY, EVICTABLE, EVICTED, READ; Out-Status: EMPTY, EVICTABLE, EVICTED, READ.
        Parameters:
        replication - ?
      • exportData

        public void exportData​(String fName,
                               String outputFormat)
      • exportData

        public void exportData​(String fName,
                               String outputFormat,
                               int replication,
                               FileFormatProperties formatProperties)
        Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). If all threads export the same data object concurrently it results in errors because they all write to the same file. Efficiency for loops and parallel threads is achieved by checking if the in-memory block is dirty. NOTE: MB: we do not use dfs copy from local (evicted) to HDFS because this would ignore the output format and most importantly would bypass reblocking during write (which effects the potential degree of parallelism). However, we copy files on HDFS if certain criteria are given.
        Parameters:
        fName - file name
        outputFormat - format
        replication - ?
        formatProperties - file format properties
      • freeEvictedBlob

        public final void freeEvictedBlob()
        Low-level cache I/O method that deletes the file containing the evicted data blob, without reading it. Must be defined by a subclass, never called by users.
      • isBelowCachingThreshold

        public static boolean isBelowCachingThreshold​(CacheBlock<?> data)
      • getDataSize

        public long getDataSize()
      • isCached

        public boolean isCached​(boolean inclCachedNoWrite)
      • setEmptyStatus

        public void setEmptyStatus()
      • isPendingRDDOps

        public boolean isPendingRDDOps()
      • addBroadcastSize

        public static void addBroadcastSize​(long size)
      • getBroadcastSize

        public static long getBroadcastSize()
      • cleanupCacheDir

        public static void cleanupCacheDir()
      • cleanupCacheDir

        public static void cleanupCacheDir​(boolean withDir)
        Deletes the DML-script-specific caching working dir.
        Parameters:
        withDir - if true, delete directory
      • initCaching

        public static void initCaching()
                                throws IOException
        Inits caching with the default uuid of DMLScript
        Throws:
        IOException - if IOException occurs
      • initCaching

        public static void initCaching​(String uuid)
                                throws IOException
        Creates the DML-script-specific caching working dir. Takes the UUID in order to allow for custom uuid, e.g., for remote parfor caching
        Parameters:
        uuid - ID
        Throws:
        IOException - if IOException occurs
      • isCachingActive

        public static boolean isCachingActive()
      • disableCaching

        public static void disableCaching()
      • enableCaching

        public static void enableCaching()
      • moveData

        public boolean moveData​(String fName,
                                String outputFormat)