Apache SystemDS™ 3.1.0 Release Notes

Release notes for SystemDS 3.1.0

SystemDS 3.1 is a minor release that contains new features and major improvements to existing features.

Features and Improvements

Performance codegen kmeans mnist80m w/ compression
Prefetch instruction
Broadcast instruction
Create apply functions for cleaning primitives
LogicalEnumerator change with transitions concept and cleanups
Flatten the nested loop for parallel pipelines execution
Adding apply_pipeline() builtin for cleaning pipelines API
Release docker images with GitHub actions
Add monitoring tool testing workflows
Asynchronous Execution and Persist Spark Transformations
Future-based asynchronous execution of Spark actions
New operator linearization order to maximize inter-operator parallelism
Lineage-based reuse of Spark actions
Push down rmvar instructions for asynchronous instructions
Lineage-based reuse of asynchronous operators
Persist and reuse Spark RDDs
Refactor to add LOP rewrite step in compilation
Federated Compression Instruction
CLA IO Compressed Matrices
Compressed Max/Min Index support
Federated async compression
Federated Workload-aware Compression
Python 3.9 support
Parallel Compressed Encode
New builtin function auc (area under ROC curve)
Unique() function for performance
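
For readers unfamiliar with the metric behind the new auc builtin listed above, the following is a minimal Python sketch of the area under the ROC curve. It is not the SystemDS implementation and does not reflect the builtin's signature. The function name roc_auc and its arguments are hypothetical, chosen only for illustration. The sketch uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Hypothetical helper for illustration only, not the SystemDS builtin.
    labels: iterable of 0/1 ground-truth classes
    scores: iterable of predicted scores (higher means more likely positive)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")

    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of examples. It is chosen here for clarity rather than speed, and a rank-based variant would be preferable on large inputs.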

Bug

Fix memory configuration in sparkDML.sh
OOM Error On Binary Write
Out of memory error
CLA Improved Run estimation
AttributeError: Function definition not found
applySchema built-in to set the schema of frame from DML
CSR TSMM left with filled rows bug
Sparse TSMM dense row blocks CSR
py4j.Py4JException: Method exceptionString([class org.apache.spark.SparkConf]) does not exist
MatrixBlock size using CSR when allowed
Federated Nan Values
countDistinctApprox() operation in AggregateUnaryCPInstruction is inefficient for row/col aggregations
Correct the release artifact generation date
Log4j incompatible dependencies
ConcurrentModificationException in federated execution
Jackson Core missing for json writing and reading in reduced binary
Fix Java doc warnings
Enqueue output not UTF-8 python
Read CSV directly without mtd python
Python configuration not loading defaults
Matrix Multiplication crash in Spark
Pipelines failing in Hybrid execution
Built-in tests failure in GitHub actions
Cleaning Pipelines failed with No space left on device
IndexOutOfBounds due to int overflow on replace
Cleaning Pipelines: Replace function failure in hybrid execution
Cleaning Pipelines: Block Sizes mismatch
Cleaning Pipelines in hybrid mode: Invalid block dimensions error
Federated Statistics print in non federated scenario
Spark Aggregate Binary operations parse to Fed instruction
FederationUtils.bindResponses causes out of memory because of sparse matrices
Python IDE test Docs fail
MSVM robustness for non-existing classes
CLA ArrayOutOfBounds in sample
CLA Invalid Unique estimate DDC
Federated read cache cannot be disabled
Monitoring Heavy hitters not always correct list
Slow Federated Mlogreg on Criteo (dummy-coded)
Incorrect warning when reading scalars
Spark with default settings
Cleaning Pipelines: Task Parallel Experiments failing in spark mode
Unique() crashes with iterator EOF on vectors with >1K distinct items
Perftest: Mlogreg on 1M_1k_dense w/ unnecessary spark jobs
Perftest: lmDS on 1M_1k_dense with unnecessary spark tsmm
Java doc warnings
