SystemDSContext

All operations using SystemDS need a running Java instance. The connection to it is managed by a SystemDSContext object, which can be created using

from systemds.context import SystemDSContext
sds = SystemDSContext()

When the calculations are finished, the context has to be closed again

sds.close()

Because closing the context manually is tedious and easy to forget, SystemDSContext implements the Python context management protocol, which supports the following syntax

with SystemDSContext() as sds:
  # do something with sds, which is a SystemDSContext
  pass

This will automatically close the SystemDSContext once the with-block is left.
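The with-block behaviour described above follows from Python's context management protocol. The sketch below uses a hypothetical stand-in class, DemoContext, instead of the real SystemDSContext (which needs a JVM). It only assumes that leaving the block calls close(), and shows that this also happens when the block raises.

```python
class DemoContext:
    """Hypothetical stand-in for SystemDSContext; no JVM is started."""

    def __init__(self):
        self.closed = False

    def close(self):
        # The real context would shut down its Java process here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the block


with DemoContext() as ctx:
    assert not ctx.closed  # still open inside the block

try:
    with DemoContext() as failing:
        raise ValueError("computation failed")
except ValueError:
    pass  # the exception propagated, but the context was closed first

print(ctx.closed, failing.closed)
```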

Note

Creating a context is expensive, because a sub-process running a JVM may have to be started. Try to create a context only once per program, or always keep at least one context open.

class systemds.context.SystemDSContext(port: int = -1, capture_statistics: bool = False, capture_stdout: bool = False, logging_level: int = 20, py4j_logging_level: int = 50)

A context with a connection to a Java instance with which SystemDS operations are executed. The Java process is started and runs using a random TCP port for instruction parsing.

This class is used as the starting point for all SystemDS execution. It gives the ability to create all the different objects and add them to the execution.

__init__(port: int = -1, capture_statistics: bool = False, capture_stdout: bool = False, logging_level: int = 20, py4j_logging_level: int = 50)

Starts a new instance of SystemDSContext, in which the connection to a JVM SystemDS instance is handled. Each new instance of SystemDSContext starts a separate new JVM.

Standard out and standard error from the JVM are also handled in this class. They fill up queues that can be read to get the statements printed by the JVM.

Parameters:
  • port – default -1, giving a random port for communication with JVM

  • capture_statistics – If the statistics of the execution in SystemDS should be captured

  • capture_stdout – If the standard out should be captured in Java SystemDS and maintained in queues

  • logging_level – Specify the logging level used for informative messages, default 20 indicating INFO. The logging levels are as follows: 10 DEBUG, 20 INFO, 30 WARNING, 40 ERROR, 50 CRITICAL.

  • py4j_logging_level – The logging level for Py4J to use. Since all communication with the JVM goes through Py4J, it can be verbose if this is not set high.
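The numeric levels listed for logging_level are the same values as the constants of Python's standard logging module, so the named constants can be used in place of raw integers. The snippet below only checks that correspondence. The dictionary name context_kwargs is illustrative, and the actual call SystemDSContext(**context_kwargs) is left out so that the snippet runs without a JVM.

```python
import logging

# The documented levels and the standard-library constants they equal.
documented_levels = {
    "DEBUG": 10,
    "INFO": 20,
    "WARNING": 30,
    "ERROR": 40,
    "CRITICAL": 50,
}
for name, value in documented_levels.items():
    assert getattr(logging, name) == value

# Keyword arguments mirroring the documented defaults, written with
# named constants instead of bare integers.
context_kwargs = {
    "logging_level": logging.INFO,            # documented default 20
    "py4j_logging_level": logging.CRITICAL,   # documented default 50
}
print(context_kwargs)
```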

array(*args: Sequence[DAGNode | str | int | float | bool]) List

Create a List object containing the given nodes.

Note that only a sequence is allowed, or a dictionary, not both at the same time.

Parameters:
  • args – A sequence that will be inserted into a list

  • kwargs – A dictionary that will return a dictionary (internally handled as a list)

Returns:

A List

capture_stats(enable: bool = True)

Enable (or disable) capturing of execution statistics.

Parameters:

enable – if True enable capturing, else disable it

capture_stats_context()

Context for capturing statistics. Should be used in a with statement. Afterwards capturing will be reset to the state it was before.

Example:

with sds.capture_stats_context():
    a = some_computation.compute()
    b = another_computation.compute()
print(sds.take_stats())

Returns:

a context object to be used in a with statement
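The save-and-restore behaviour of capture_stats_context() can be pictured with a small model built on contextlib. This is a sketch with a hypothetical stand-in class, StatsToggle, that tracks only the capture flag. It is not the SystemDS implementation, and the attribute name capturing is an assumption made for illustration.

```python
from contextlib import contextmanager


class StatsToggle:
    """Hypothetical stand-in; only models the capture flag."""

    def __init__(self, capture_statistics=False):
        self.capturing = capture_statistics

    def capture_stats(self, enable=True):
        self.capturing = enable

    @contextmanager
    def capture_stats_context(self):
        previous = self.capturing         # remember the state on entry
        self.capture_stats(True)
        try:
            yield self
        finally:
            self.capture_stats(previous)  # restore, even after an error


toggle = StatsToggle()
with toggle.capture_stats_context():
    inside = toggle.capturing
after = toggle.capturing
print(inside, after)
```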

clear_stats()

Clears the captured statistics.

close()

Close the connection to the java process and do necessary cleanup.

combine(*args: Sequence[DAGNode | str | int | float | bool]) Combine

Combine nodes so that compute can be called on multiple operations at once.

This is useful when a script contains multiple writes and all of them should be executed in one execution, reusing intermediates.

Note that combine does not return anything to the user, so only use it with nodes that end in writing or printing.

Parameters:

args – A sequence that will be executed with call to compute()

dict(**kwargs: Dict[str, DAGNode | str | int | float | bool]) List

Create a List object containing the given nodes.

Note that only a sequence is allowed, or a dictionary, not both at the same time.

Parameters:
  • args – A sequence that will be inserted into a list

  • kwargs – A dictionary that will return a dictionary (internally handled as a list)

Returns:

A List

exception_and_close(exception, trace_back_limit: int | None = None)

Prints the exception along with the stdout and stderr output, while also closing the context correctly.

Parameters:

exception – the exception thrown

federated(addresses: Iterable[str], ranges: Iterable[Tuple[Iterable[int], Iterable[int]]], *args, **kwargs: Dict[str, DAGNode | str | int | float | bool]) Matrix

Create federated matrix object.

Parameters:
  • addresses – addresses of the federated workers

  • ranges – for each federated worker, a pair of begin and end indices of the part of the matrix it holds

  • args – unnamed params

  • kwargs – named params

Returns:

The Matrix containing the Federated data.
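The shape of the addresses and ranges arguments can be illustrated in plain Python. The helper below, row_partition_ranges, is hypothetical and not part of SystemDS. It assumes that each index is a [row, col] pair, that begin is inclusive and end is exclusive, and that addresses have the form host:port/path. Treat all three as assumptions to verify against your SystemDS version.

```python
from typing import List, Tuple

Range = Tuple[List[int], List[int]]


def row_partition_ranges(rows_per_worker: List[int], cols: int) -> List[Range]:
    """Build one (begin, end) pair per worker for a row-partitioned matrix.

    Assumption: indices are [row, col], begin is inclusive and end is
    exclusive.
    """
    ranges = []
    row_start = 0
    for rows in rows_per_worker:
        row_end = row_start + rows
        ranges.append(([row_start, 0], [row_end, cols]))
        row_start = row_end
    return ranges


# Hypothetical worker addresses; each points at the data held by one worker.
addresses = ["localhost:8001/data/part1.csv", "localhost:8002/data/part2.csv"]
ranges = row_partition_ranges([3, 2], cols=4)

# The two iterables must line up: one range per address.
assert len(addresses) == len(ranges)
print(ranges)
```

Under these assumptions, the two lists would be passed as the first two arguments of federated().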

from_numpy(mat: array, *args: Sequence[DAGNode | str | int | float | bool], **kwargs: Dict[str, DAGNode | str | int | float | bool]) Matrix

Generate a DAGNode representing a matrix with data given by a numpy array, which will be sent to SystemDS when needed.

Parameters:
  • mat – the numpy array

  • args – unnamed parameters

  • kwargs – named parameters

Returns:

A Matrix

from_pandas(df: DataFrame, *args: Sequence[DAGNode | str | int | float | bool], **kwargs: Dict[str, DAGNode | str | int | float | bool]) Frame

Generate a DAGNode representing a frame with data given by a pandas dataframe, which will be sent to SystemDS when needed.

Parameters:
  • df – the pandas dataframe

  • args – unnamed parameters

  • kwargs – named parameters

Returns:

A Frame

full(shape: Tuple[int, int], value: float | int) Matrix

Generates a matrix completely filled with a value

Parameters:
  • shape – shape (rows and cols) of the matrix

  • value – the value to fill all cells with

Returns:

the OperationNode representing this operation

get_stats()

Get the captured statistics. Will not clear the captured statistics.

See take_stats() for an option that also clears the captured statistics.

Returns:

The captured statistics

get_stderr(lines: int = -1)

Getter for the stderr of the Java subprocess. The output is taken from the stderr queue and returned in a new list.

Parameters:

lines – The number of lines to try to read from the stderr queue. The default, -1, returns all lines currently in the queue.

get_stdout(lines: int = -1)

Getter for the stdout of the Java subprocess. The output is taken from the stdout queue and returned in a new list.

Parameters:

lines – The number of lines to try to read from the stdout queue. The default, -1, returns all lines currently in the queue.
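The meaning of the lines parameter of get_stdout() and get_stderr() can be modelled with a standard-library queue. The function drain_lines below is a hypothetical illustration, not the SystemDS code. It assumes that reading removes the lines from the queue and that asking for more lines than are available simply returns what is there.

```python
import queue


def drain_lines(q, lines=-1):
    """Return up to `lines` items from the queue; -1 means all available."""
    out = []
    while lines == -1 or len(out) < lines:
        try:
            out.append(q.get_nowait())
        except queue.Empty:
            break  # fewer lines available than requested
    return out


stdout_queue = queue.Queue()
for text in ["line 1", "line 2", "line 3"]:
    stdout_queue.put(text)

first = drain_lines(stdout_queue, lines=2)  # at most two lines
rest = drain_lines(stdout_queue)            # everything that is left
print(first, rest)
```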

list(*args: Sequence[DAGNode | str | int | float | bool], **kwargs: Dict[str, DAGNode | str | int | float | bool]) List

Create a List object containing the given nodes.

Note that only a sequence is allowed, or a dictionary, not both at the same time.

Parameters:
  • args – A sequence that will be inserted into a list

  • kwargs – A dictionary that will return a dictionary (internally handled as a list)

Returns:

A List

rand(rows: int, cols: int, min: float | int | None = None, max: float | int | None = None, pdf: str = 'uniform', sparsity: float | int | None = None, seed: float | int | None = None, lamb: float | int = 1) Matrix

Generates a matrix filled with random values

Parameters:
  • rows – number of rows

  • cols – number of cols

  • min – min value for cells

  • max – max value for cells

  • pdf – probability distribution function: “uniform”/“normal”/“poisson” distribution

  • sparsity – fraction of non-zero cells

  • seed – random seed

  • lamb – lambda value for “poisson” distribution

Returns:

A Matrix filled with random values

read(path: PathLike, **kwargs: Dict[str, DAGNode | str | int | float | bool]) OperationNode

Read a file from disk. Supported types include: CSV, Matrix Market (coordinate), Text (i,j,v), SystemDS Binary, etc. See http://apache.github.io/systemds/site/dml-language-reference#readwrite-built-in-functions for more details.

Returns:

An OperationNode containing the read data. The returned node can be of type Matrix, Frame or Scalar.

scalar(v: Dict[str, DAGNode | str | int | float | bool]) Scalar

Construct a scalar value. This can contain str, float, double, integers and booleans.

Returns:

A Scalar containing the given value.

seq(start: float | int, stop: float | int | None = None, step: float | int = 1) Matrix

Create a single column vector with values from start to stop and an increment of step. If no stop is defined and only one parameter is given, then start will be 0 and the parameter will be interpreted as stop.

Parameters:
  • start – the starting value

  • stop – the maximum value

  • step – the step size

Returns:

the OperationNode representing this operation
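The argument handling described for seq() can be modelled in plain Python. The function seq_values below is a hypothetical illustration that returns a list instead of a Matrix. It assumes that stop is inclusive, in line with the description of stop as the maximum value; verify this against your SystemDS version.

```python
import math


def seq_values(start, stop=None, step=1):
    """Plain-Python model of the documented argument handling."""
    if stop is None:
        # Single argument: it is the stop value and start becomes 0.
        start, stop = 0, start
    if step == 0:
        raise ValueError("step must be non-zero")
    # Assumption: stop is inclusive, matching "the maximum value".
    count = math.floor((stop - start) / step) + 1
    return [start + i * step for i in range(max(count, 0))]


print(seq_values(3))          # [0, 1, 2, 3]
print(seq_values(1, 2, 0.5))  # [1.0, 1.5, 2.0]
```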

source(path: str, name: str) Source

Import methods from a given dml file.

The import is done through the DML command source, and adds all methods defined in the script to the Source object returned in Python. This gives the flexibility to call the methods directly on the returned object.

In SystemDS, a method called func_01 can then be imported and called using

res = sds.source("PATH_TO_FILE", "UNIQUE_NAME").func_01().compute(verbose = True)

Parameters:
  • path – The absolute or relative path to the file to import

  • name – The name to give the imported file in the script, this name must be unique

take_stats()

Get the captured statistics and clear the captured statistics.

See get_stats() for an option that does not clear the captured statistics.

Returns:

The captured statistics
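The difference between get_stats(), take_stats() and clear_stats() can be summarised with a small model. StatsBuffer and its record() method are hypothetical stand-ins, and storing the statistics as one string is an assumption. The sketch only mirrors the documented behaviour that take_stats() returns the statistics and then clears them.

```python
class StatsBuffer:
    """Hypothetical stand-in that stores captured statistics as text."""

    def __init__(self):
        self._stats = ""

    def record(self, text):
        # Stand-in for statistics arriving from an execution.
        self._stats += text

    def get_stats(self):
        return self._stats   # leaves the buffer untouched

    def clear_stats(self):
        self._stats = ""

    def take_stats(self):
        stats = self.get_stats()
        self.clear_stats()   # get followed by clear
        return stats


buf = StatsBuffer()
buf.record("run 1 statistics\n")
peeked = buf.get_stats()     # statistics are still stored afterwards
taken = buf.take_stats()     # same content, but the buffer is now empty
remaining = buf.get_stats()
print(repr(peeked), repr(taken), repr(remaining))
```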