Apache SystemDS 3.1.0 Release Notes

Release notes for SystemDS 3.1.0

SystemDS 3.1 is a minor release. Release 3.1 contains new features and major improvements to existing features.

Features and Improvements

  • Performance codegen kmeans mnist80m w/ compression
  • Prefetch instruction
  • Broadcast instruction
  • Create apply functions for cleaning primitives
  • LogicalEnumerator change with transitions concept and cleanups
  • Flatten the nested loop for parallel pipelines execution
  • Adding apply_pipeline() builtin for cleaning pipelines API
  • Release docker images with GitHub actions
  • Add monitoring tool testing workflows
  • Asynchronous Execution and Persist Spark Transformations
  • Future-based asynchronous execution of Spark actions
  • New operator linearization order to maximize inter-operator parallelism
  • Lineage-based reuse of Spark actions
  • Push down rmvar instructions for asynchronous instructions
  • Lineage-based reuse of asynchronous operators
  • Persist and reuse Spark RDDs
  • Refactor to add LOP rewrite step in compilation
  • Federated Compression Instruction
  • CLA IO Compressed Matrices
  • Compressed Max/Min Index support.
  • Federated async compression
  • Federated Workload-aware Compression
  • Python 3.9 support
  • Parallel Compressed Encode
  • New builtin function auc (area under ROC curve)
  • Unique() function for performance

Bug

  • Fix memory configuration in sparkDML.sh
  • OOM Error On Binary Write
  • Out of memory error
  • CLA Improved Run estimation
  • AttributeError: Function definition not found
  • applySchema built-in to set the schema of frame from DML
  • CSR TSMM left with filled rows bug
  • Sparse TSMM dense row blocks CSR
  • py4j.Py4JException: Method exceptionString([class org.apache.spark.SparkConf]) does not exist
  • MatrixBlock size using CSR when allowed
  • Federated Nan Values
  • countDistinctApprox() operation in AggregateUnaryCPInstruction is inefficient for row/col aggregations
  • Correct the release artifact generation date
  • Log4j incompatible dependencies
  • ConcurrentModificationException in federated execution
  • Jackson Core missing for json writing and reading in reduced binary
  • Fix Java doc warnings
  • Enque output not UTF-8 python
  • Read CSV directly without mtd python
  • Python configuration not loading defaults
  • Matrix Multiplication crash in Spark
  • Pipelines failing in Hybrid execution
  • Built-in tests failure in Git actions
  • Cleaning Pipelines failed with No space left on device
  • IndexOutOfBounds due to int overflow on replace
  • Cleaning Pipelines: Replace function failure in hybrid execution
  • Cleaning Pipelines: Block Sizes mismatch
  • Cleaning Pipelines in hybrid mode: Invalid block dimensions error
  • Federated Statistics print in non federated scenario
  • Spark Aggregate Binary operations parse to Fed instruction
  • FederationUtils.bindResponses causes out of memory because of sparse matrices.
  • Python IDE test Docs fail
  • MSVM robustness for non-existing classes
  • CLA ArrayOutOfBounds in sample
  • CLA Invalid Unique estimate DDC
  • Federated read cache cannot be disabled
  • Monitoring Heavy hitters not always correct list
  • Slow Federated Mlogreg on Criteo (dummy-coded)
  • Incorrect warning when reading scalars
  • Spark with default settings
  • Cleaning Pipelines: Task Parallel Experiments failing in spark mode
  • Unique() crashes with iterator EOF on vectors with >1K distinct items
  • Perftest: Mlogreg on 1M_1k_dense w/ unnecessary spark jobs
  • Perftest: lmDS on 1M_1k_dense with unnecessary spark tsmm
  • Java doc warnings