public class UnifiedMemoryManager
extends Object
Unified Memory Manager - Initial Design
Motivation:
The Unified Memory Manager, henceforth UMM, will act as a central manager of in-memory
matrix (uncompressed and compressed), frame, and tensor blocks within the SystemDS control
program. So far, operation memory (70%) and buffer pool memory (15%, LazyWriteBuffer)
have been managed independently, which causes unnecessary evictions. New components like
the LineageCache also use and manage statically provisioned memory areas. Ultimately, the
UMM aims to eliminate these shortcomings by providing central, potentially thread-local,
memory management.
Memory Areas:
Initially, the UMM only handles CacheBlock objects (e.g., MatrixBlock, FrameBlock, and
TensorBlock), and manages two memory areas:
(1) operation memory (pinned cache blocks and reserved memory) and
(2) dirty objects (dirty cache blocks that need to be written to local FS before eviction)
The UMM is configured with a capacity (absolute size in bytes). Relative to this capacity,
the operations and buffer pool memory areas each have a min and max amount of memory
they can occupy, meaning that the boundary between the areas can shift dynamically
depending on the current load. Most importantly, though, dirty objects must not be counted
twice when pinning such an object for an operation. The min/max constraints are not
exposed but configured internally. A good starting point is the following set of
constraints (relative to the JVM max heap size):
 _____________________________
| area        | min | max |
| operations  |  0% | 70% |  (pin requests always accepted)
| buffer pool | 15% | 85% |  (eviction on demand)
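The dynamic boundary between the two areas can be sketched as follows. This is a minimal, hypothetical illustration (class and method names are not from the actual implementation): the operation area may grow up to 70% of capacity, while the buffer pool keeps a guaranteed 15% minimum and shrinks as operations claim memory.

```java
// Hypothetical sketch of the internal min/max area constraints, expressed
// relative to a configured UMM capacity in bytes. Names are illustrative.
public class UMMBoundaries {
    // relative constraints (fractions of capacity), configured internally
    private static final double OPS_MIN = 0.00, OPS_MAX = 0.70;
    private static final double BUFF_MIN = 0.15, BUFF_MAX = 0.85;

    private final long _capacity; // absolute capacity in bytes

    public UMMBoundaries(long capacity) {
        _capacity = capacity;
    }

    // maximum bytes the operation area may occupy (pin requests accepted up to here)
    public long getOpsMax() {
        return (long) (OPS_MAX * _capacity);
    }

    // maximum bytes the buffer pool may occupy given the current operation load;
    // the boundary shifts dynamically: the pool shrinks as operations grow,
    // but never below its guaranteed minimum share
    public long getBufferMax(long opsInUse) {
        long avail = _capacity - Math.max(opsInUse, (long) (OPS_MIN * _capacity));
        return Math.min((long) (BUFF_MAX * _capacity),
            Math.max(avail, (long) (BUFF_MIN * _capacity)));
    }
}
```

With a 1000-byte capacity, an idle system allows the buffer pool up to 850 bytes; once operations pin 700 bytes, the pool boundary shrinks to 300 bytes.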
Object Lifecycle:
The UMM will also need to keep track of the current state of individual cache blocks, for
which it will have a few member variables. A queue similar to the current EvictionQueue is
used to add/remove entries with LRU as its eviction policy. In general, there are three
properties of object status to consider:
(1) Non-dirty/dirty: non-dirty objects have a representation on HDFS or can be recomputed
from lineage trace (e.g., rand/seq outputs), while dirty objects need to be preserved.
(2) FS-persisted: on eviction, dirty objects need to be written to the local file system.
As long as the local file representation exists, dirty objects can simply be dropped.
(3) Pinned/unpinned: For operations, objects are pinned into memory to guard against
eviction. All pin requests have to be accepted, and once a non-dirty object is released
(unpinned) it can be dropped without persisting it to local FS.
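The three status properties and the LRU queue can be sketched as follows. This is an illustrative sketch, not the actual EvictionQueue implementation; it uses an access-ordered LinkedHashMap, whose iteration order starts at the least recently used entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of per-block status tracking and an LRU eviction
// queue, analogous to the current EvictionQueue. Names are illustrative.
public class UMMStatusSketch {
    static class BlockStatus {
        boolean dirty;       // needs local-FS persistence before being dropped
        boolean fsPersisted; // a local file copy exists, block can be dropped
        int pinCount;        // >0 means pinned for an operation, not evictable

        boolean isEvictable() {
            return pinCount == 0;
        }
        // a dirty block only needs an eviction write if not yet persisted
        boolean needsEvictionWrite() {
            return dirty && !fsPersisted;
        }
    }

    // access-ordered LinkedHashMap yields LRU iteration order:
    // the eldest (least recently used) entry comes first
    private final Map<String, BlockStatus> _queue =
        new LinkedHashMap<>(16, 0.75f, true);

    public void put(String id, BlockStatus s) { _queue.put(id, s); }
    public BlockStatus get(String id)         { return _queue.get(id); }

    // pick the least recently used, unpinned entry for eviction
    public String nextEvictionCandidate() {
        for (Map.Entry<String, BlockStatus> e : _queue.entrySet())
            if (e.getValue().isEvictable())
                return e.getKey();
        return null; // everything pinned; should not happen with reserved memory
    }
}
```

Note that eviction candidates skip pinned entries entirely, and a candidate that is dirty but already FS-persisted can be dropped without a write.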
Example Scenarios for an Operation:
(1) Inputs are available in the UMM, enough space left for the output.
(2) Some inputs are pre-evicted. Read and pin them into operation memory.
(3) Inputs are available in the UMM, not enough space left for the output.
Evict cached objects to reserve worst-case output memory.
(4) Some inputs are pre-evicted and not enough space left for the inputs
and output. Evict cached objects to make space for the inputs.
Evict cached objects to reserve worst-case output memory.
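The common core of scenarios (3) and (4) can be sketched as a pin/reserve request that evicts cached blocks on demand. This is a simplified, hypothetical accounting sketch (only byte counters, no actual I/O); the real UMM would additionally persist dirty, not-yet-persisted blocks before dropping them.

```java
// Hypothetical sketch of the scenario handling above: pinning inputs and
// reserving worst-case output memory, evicting cached blocks on demand.
public class UMMPinSketch {
    private long _cached;   // bytes held by evictable cached blocks
    private long _pinned;   // bytes pinned/reserved for operations
    private final long _capacity;

    public UMMPinSketch(long capacity, long cached) {
        _capacity = capacity;
        _cached = cached;
    }

    // pin an input or reserve worst-case output memory; pin requests must
    // always be accepted, so cached blocks are evicted as needed
    public void pin(long size) {
        long free = _capacity - _cached - _pinned;
        if (size > free)             // scenarios (3) and (4):
            evict(size - free);      // make room by evicting cached blocks
        _pinned += size;
    }

    private void evict(long bytes) {
        // dirty, not-yet-persisted blocks would be written to local FS here
        _cached -= Math.min(bytes, _cached);
    }

    public long getPinned() { return _pinned; }
    public long getCached() { return _cached; }
}
```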
Thread Safety:
Initially, the UMM will be used in an instance-based manner. For global visibility and
use in parallel for loops, the UMM would need to provide a static, synchronized API, but
this constitutes a source of severe contention. In the future, we will consider a design
with thread-local UMMs for the individual parfor workers.
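The thread-local variant could be sketched with java.lang.ThreadLocal, so each parfor worker transparently obtains its own UMM instance instead of contending on a shared, synchronized one. This is a speculative sketch; class names and the per-worker capacity split are assumptions, not part of the design above.

```java
// Speculative sketch of thread-local UMMs for parfor workers.
// Names and the fixed per-worker capacity are illustrative assumptions.
public class UMMThreadLocalSketch {
    // stand-in for the actual UnifiedMemoryManager
    static class UMM {
        final long capacity;
        UMM(long capacity) { this.capacity = capacity; }
    }

    // one UMM per worker thread, created lazily on first access;
    // each worker would receive a share of the total capacity
    private static final ThreadLocal<UMM> LOCAL =
        ThreadLocal.withInitial(() -> new UMM(256L * 1024 * 1024));

    public static UMM get() {
        return LOCAL.get();
    }
}
```

Within one worker thread, repeated lookups return the same instance, so no synchronization is needed on the hot path.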