Algorithms

SystemDS supports a variety of machine learning algorithms out of the box.

As an example, the lm (linear regression) algorithm can be used as follows:

# Import numpy and SystemDS
import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import lm

# Set a seed
np.random.seed(0)
# Generate matrix of feature vectors
features = np.random.rand(10, 15)
# Generate a 1-column matrix of response values
y = np.random.rand(10, 1)

# compute the weights
with SystemDSContext() as sds:
  weights = lm(sds.from_numpy(features), sds.from_numpy(y)).compute()
  print(weights)

The output should be similar to

[[-0.11538199]
[-0.20386541]
[-0.39956035]
[ 1.04078623]
[ 0.4327084 ]
[ 0.18954599]
[ 0.49858968]
[-0.26812763]
[ 0.09961844]
[-0.57000751]
[-0.43386048]
[ 0.55358873]
[-0.54638565]
[ 0.2205885 ]
[ 0.37957689]]
systemds.operator.algorithm.WoE(X: Matrix, Y: Matrix, mask: Matrix)

Computes weight of evidence / information gain.

Parameters:
  • X

  • Y

  • mask

Returns:

Weighted X matrix where the entropy mask is applied

Returns:

A entropy matrix to apply to data

systemds.operator.algorithm.WoEApply(X: Matrix, Y: Matrix, entropyMatrix: Matrix)

Applies weight of evidence / information gain to new data.

Parameters:
  • X

  • Y

  • entropyMatrix

Returns:

Weighted X matrix where the entropy mask is applied

systemds.operator.algorithm.abstain(X: Matrix, Y: Matrix, threshold: float, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function calls the multiLogReg-function, which solves multinomial logistic regression using the trust region method.

Parameters:
  • X – matrix of feature vectors

  • Y – matrix with category labels

  • threshold – threshold to clear otherwise return X and Y unmodified

  • verbose – flag specifying if logging information should be printed

Returns:

abstained output X

Returns:

abstained output Y

systemds.operator.algorithm.als(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script computes an approximate factorization of a low-rank matrix X into two matrices U and V using different implementations of the Alternating-Least-Squares (ALS) algorithm. Matrices U and V are computed by minimizing a loss function (with regularization).

Parameters:
  • X – Location to read the input matrix X to be factorized

  • rank – Rank of the factorization

  • regType – Regularization: “L2” = L2 regularization, f(U, V) = 0.5 * sum(W * (U %*% V - X)^2) + 0.5 * reg * (sum(U^2) + sum(V^2)); “wL2” = weighted L2 regularization, f(U, V) = 0.5 * sum(W * (U %*% V - X)^2) + 0.5 * reg * (sum(U^2 * row_nonzeros) + sum(V^2 * col_nonzeros))

  • reg – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating U and V once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE thr is ignored

  • seed – The seed for the randomized parts of the algorithm

  • verbose – If the algorithm should run verbosely

Returns:

An m x r matrix where r is the factorization rank

Returns:

An m x r matrix where r is the factorization rank
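
For illustration, a minimal sketch of a factorization run, following the lm example above; the 10 x 15 random matrix, rank=2, and maxi=20 are arbitrary demo choices, and it is assumed here that the two factor outputs can be unpacked from the result of compute():

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import als

np.random.seed(0)
# arbitrary dense demo matrix to factorize
X = np.random.rand(10, 15)

with SystemDSContext() as sds:
  # rank and maxi are the kwargs documented above
  U, V = als(sds.from_numpy(X), rank=2, maxi=20).compute()
  print(U.shape, V.shape)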

systemds.operator.algorithm.alsCG(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script computes an approximate factorization of a low-rank matrix X into two matrices U and V using the Alternating-Least-Squares (ALS) algorithm with conjugate gradient. Matrices U and V are computed by minimizing a loss function (with regularization).

Parameters:
  • X – Location to read the input matrix X to be factorized

  • rank – Rank of the factorization

  • regType – Regularization: “L2” = L2 regularization, f(U, V) = 0.5 * sum(W * (U %*% V - X)^2) + 0.5 * reg * (sum(U^2) + sum(V^2)); “wL2” = weighted L2 regularization, f(U, V) = 0.5 * sum(W * (U %*% V - X)^2) + 0.5 * reg * (sum(U^2 * row_nonzeros) + sum(V^2 * col_nonzeros))

  • reg – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating U and V once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE thr is ignored

  • seed – The seed for the randomized parts of the algorithm

  • verbose – If the algorithm should run verbosely

Returns:

An m x r matrix where r is the factorization rank

Returns:

An m x r matrix where r is the factorization rank

systemds.operator.algorithm.alsDS(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Alternating-Least-Squares (ALS) algorithm using a direct solve method for individual least squares problems (reg=”L2”). This script computes an approximate factorization of a low-rank matrix V into two matrices L and R. Matrices L and R are computed by minimizing a loss function (with regularization).

Parameters:
  • X – Location to read the input matrix V to be factorized

  • rank – Rank of the factorization

  • reg – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating L and R once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE thr is ignored

  • seed – The seed for the randomized parts of the algorithm

  • verbose – If the algorithm should run verbosely

Returns:

An m x r matrix where r is the factorization rank

Returns:

An m x r matrix where r is the factorization rank

systemds.operator.algorithm.alsPredict(userIDs: Matrix, I: Matrix, L: Matrix, R: Matrix)

This script computes the ratings/scores for a given list of userIDs using 2 factor matrices L and R. We assume that all users have rated at least once and all items have been rated at least once.

Parameters:
  • userIDs – Column vector of user-ids (n x 1)

  • I – Indicator matrix user-id x user-id to exclude from scoring

  • L – The factor matrix L: user-id x feature-id

  • R – The factor matrix R: feature-id x item-id

Returns:

The output user-id/item-id/score matrix

systemds.operator.algorithm.alsTopkPredict(userIDs: Matrix, I: Matrix, L: Matrix, R: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script computes the top-K ratings/scores for a given list of userIDs using 2 factor matrices L and R. We assume that all users have rated at least once and all items have been rated at least once.

Parameters:
  • userIDs – Column vector of user-ids (n x 1)

  • I – Indicator matrix user-id x user-id to exclude from scoring

  • L – The factor matrix L: user-id x feature-id

  • R – The factor matrix R: feature-id x item-id

  • K – The number of top-K items

Returns:

A matrix containing the top-K item-ids with highest predicted ratings for the specified users (rows)

Returns:

A matrix containing the top-K predicted ratings for the specified users (rows)

systemds.operator.algorithm.apply_pipeline(testData: Frame, pip: Frame, applyFunc: Frame, hp: Matrix, exState: List, iState: List, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script reads the dirty and clean data, applies the best pipeline to the dirty data, classifies both the cleaned and the original dataset, and checks whether the cleaned dataset performs the same as the original dataset in terms of classification accuracy.

Parameters:
  • trainData

  • testData

  • metaData

  • lp

  • pip

  • hp

  • evaluationFunc

  • evalFunHp

  • isLastLabel

  • correctTypos

Returns:

systemds.operator.algorithm.arima(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function that implements ARIMA

Parameters:
  • X – The input Matrix to apply Arima on.

  • max_func_invoc

  • p – non-seasonal AR order

  • d – non-seasonal differencing order

  • q – non-seasonal MA order

  • P – seasonal AR order

  • D – seasonal differencing order

  • Q – seasonal MA order

  • s – period in terms of number of time-steps

  • include_mean – center to mean 0, and include in result

  • solver – solver, is either “cg” or “jacobi”

Returns:

The calculated coefficients

systemds.operator.algorithm.autoencoder_2layer(X: Matrix, num_hidden1: int, num_hidden2: int, max_epochs: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Trains a 2-layer autoencoder with minibatch SGD and step-size decay. If invoked with H1 > H2, it becomes a ‘bowtie’ structured autoencoder. Weights are initialized using Glorot & Bengio (2010) AISTATS initialization. The script standardizes the input before training (can be turned off). Also, it randomly reshuffles rows before training. Currently, tanh is set to be the activation function; by re-implementing the ‘func’ DML-bodied function, one can change the activation.

Parameters:
  • X – Filename where the input is stored

  • num_hidden1 – Number of neurons in the 1st hidden layer

  • num_hidden2 – Number of neurons in the 2nd hidden layer

  • max_epochs – Number of epochs to train for

  • full_obj – If TRUE, computes the objective function value (squared loss) at the end of each epoch. Note that computing the full objective can take a lot of time.

  • batch_size – Mini-batch size (training parameter)

  • step – Initial step size (training parameter)

  • decay – Decays step size after each epoch (training parameter)

  • mu – Momentum parameter (training parameter)

  • W1_rand – Weights might be initialized via input matrices

  • W2_rand

  • W3_rand

  • W4_rand

Returns:

Matrix storing weights between input layer and 1st hidden layer

Returns:

Matrix storing bias between input layer and 1st hidden layer

Returns:

Matrix storing weights between 1st hidden layer and 2nd hidden layer

Returns:

Matrix storing bias between 1st hidden layer and 2nd hidden layer

Returns:

Matrix storing weights between 2nd hidden layer and 3rd hidden layer

Returns:

Matrix storing bias between 2nd hidden layer and 3rd hidden layer

Returns:

Matrix storing weights between 3rd hidden layer and output layer

Returns:

Matrix storing bias between 3rd hidden layer and output layer

Returns:

Matrix storing the hidden (2nd) layer representation if needed

systemds.operator.algorithm.bandit(X_train: Matrix, Y_train: Matrix, X_test: Matrix, Y_test: Matrix, metaList: List, evaluationFunc: str, evalFunHp: Matrix, lp: Frame, lpHp: Matrix, primitives: Frame, param: Frame, baseLineScore: float, cv: bool, **kwargs: Dict[str, DAGNode | str | int | float | bool])

In the bandit function, the objective is to find an arm that optimizes a known functional of the unknown arm-reward distributions.

Parameters:
  • X_train

  • Y_train

  • X_test

  • Y_test

  • metaList

  • evaluationFunc

  • evalFunHp

  • lp

  • primitives

  • params

  • K

  • R

  • baseLineScore

  • cv

  • cvk

  • verbose

  • output

Returns:

systemds.operator.algorithm.bivar(X: Matrix, S1: Matrix, S2: Matrix, T1: Matrix, T2: Matrix, verbose: bool)

For a given pair of attribute sets, compute bivariate statistics between all attribute pairs. Given index1 = {A_11, A_12, …, A_1m} and index2 = {A_21, A_22, …, A_2n}, compute bivariate stats for the m*n pairs (A_1i, A_2j), 1 <= i <= m and 1 <= j <= n.

Parameters:
  • X – Input matrix

  • S1 – First attribute set {A_11, A_12, … A_1m}

  • S2 – Second attribute set {A_21, A_22, … A_2n}

  • T1 – Kind for attributes in S1 (kind=1 for scale, kind=2 for nominal, kind=3 for ordinal)

  • T2 – Kind for attributes in S2 (kind=1 for scale, kind=2 for nominal, kind=3 for ordinal)

  • verbose – Print bivar stats

Returns:

basestats_scale_scale as output with bivar stats

Returns:

basestats_nominal_scale as output with bivar stats

Returns:

basestats_nominal_nominal as output with bivar stats

Returns:

basestats_ordinal_ordinal as output with bivar stats

systemds.operator.algorithm.components(G: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Computes the connected components of a graph and returns a vector indicating the assignment of vertices to components, where each component is identified by the maximum vertex ID (i.e., row/column position of the input graph)

Parameters:
  • G – Adjacency matrix of the graph (row/column positions correspond to vertex IDs)

  • maxi – Maximum number of iterations (0 = until convergence)

  • verbose – Flag specifying if logging information should be printed

Returns:

Vector indicating the assignment of vertices to connected components
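
As a hedged sketch, the builtin could be called on a small adjacency matrix like this; the 4-vertex demo graph is an arbitrary choice:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import components

# adjacency matrix of a 4-vertex graph with two components: {1,2} and {3,4}
G = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=np.float64)

with SystemDSContext() as sds:
  C = components(sds.from_numpy(G)).compute()
  print(C)  # each vertex labeled with the maximum vertex ID of its component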

systemds.operator.algorithm.confusionMatrix(P: Matrix, Y: Matrix)

Accepts a vector of predictions and a one-hot-encoded matrix of labels. It computes the max value of each vector and compares them; afterwards, it calculates and returns the sum of classifications and the average of each true class.

                True Labels
                  1    2
              1   TP | FP
Predictions      ----+----
              2   FN | TN
Parameters:
  • P – vector of Predictions

  • Y – vector of Golden standard One Hot Encoded; the one hot encoded vector of actual labels

Returns:

The Confusion Matrix Sums of classifications

Returns:

The Confusion Matrix averages of each true class
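
A small sketch of how this might be called; the toy predictions and one-hot labels are arbitrary, and it is assumed that the two documented outputs can be unpacked from compute():

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import confusionMatrix

# predicted class ids (1-based) and one-hot encoded true labels
P = np.array([[1], [2], [2], [1]], dtype=np.float64)
Y = np.array([[1, 0], [0, 1], [1, 0], [1, 0]], dtype=np.float64)

with SystemDSContext() as sds:
  sums, avgs = confusionMatrix(sds.from_numpy(P), sds.from_numpy(Y)).compute()
  print(sums)  # counts of (prediction, true label) combinations
  print(avgs)  # averages per true class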

systemds.operator.algorithm.cor(X: Matrix)

This function computes the correlation matrix of the input matrix.

Parameters:

X – A Matrix Input to compute the correlation on

Returns:

Correlation matrix of the input matrix
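
Following the pattern of the lm example above, a minimal sketch (random 50 x 4 demo data):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import cor

np.random.seed(0)
X = np.random.rand(50, 4)  # 50 samples, 4 variables

with SystemDSContext() as sds:
  C = cor(sds.from_numpy(X)).compute()
  print(C)  # 4 x 4 correlation matrix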

systemds.operator.algorithm.correctTypos(strings: Frame, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Corrects corrupted frames of strings. This algorithm operates on the assumption that most strings are correct, and simply swaps strings that do not occur often with similar strings that occur more often.

References:
Fred J. Damerau. 1964. 
  A technique for computer detection and correction of spelling errors. 
  Commun. ACM 7, 3 (March 1964), 171–176. 
  DOI:https://doi.org/10.1145/363958.363994
Parameters:
  • strings – The nx1 input frame of corrupted strings

  • frequency_threshold – Strings that occur above this frequency level will not be corrected

  • distance_threshold – Max distance at which strings are considered similar

  • is_verbose – Print debug information

Returns:

Corrected nx1 output frame

systemds.operator.algorithm.correctTyposApply(strings: Frame, distance_matrix: Matrix, dict: Frame, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Corrects corrupted frames of strings. This algorithm operates on the assumption that most strings are correct, and simply swaps strings that do not occur often with similar strings that occur more often.

References:
Fred J. Damerau. 1964. 
  A technique for computer detection and correction of spelling errors. 
  Commun. ACM 7, 3 (March 1964), 171–176. 
  DOI:https://doi.org/10.1145/363958.363994

TODO: future: add parameter for list of words that are sure to be correct

Parameters:
  • strings – The nx1 input frame of corrupted strings

  • nullMask

  • frequency_threshold – Strings that occur above this frequency level will not be corrected

  • distance_threshold – Max distance at which strings are considered similar

  • distance_matrix

  • dict

Returns:

Corrected nx1 output frame

systemds.operator.algorithm.cox(X: Matrix, TE: Matrix, F: Matrix, R: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script fits a Cox proportional hazard regression model. The Breslow method is used for handling ties, and the regression parameters are computed using a trust region Newton method with conjugate gradient.

Parameters:
  • X – Location to read the input matrix X containing the survival data, with the following information: 1: timestamps; 2: whether an event occurred (1) or data is censored (0); 3: feature vectors

  • TE – Column indices of X as a column vector which contain timestamp (first row) and event information (second row)

  • F – Column indices of X as a column vector which are to be used for fitting the Cox model

  • R – If factors (categorical variables) are available in the input matrix X, location to read matrix R containing the start and end indices of the factors in X (R[,1]: start indices, R[,2]: end indices). Alternatively, the user can specify the indices of the baseline level of each factor which needs to be removed from X; in this case the start and end indices corresponding to the baseline level need to be the same. If R is not provided, by default all variables are considered to be continuous.

  • alpha – Parameter to compute a 100*(1-alpha)% confidence interval for the betas

  • tol – Tolerance (“epsilon”)

  • moi – Max. number of outer (Newton) iterations

  • mii – Max. number of inner (conjugate gradient) iterations, 0 = no max

Returns:

A D x 7 matrix M, where D denotes the number of covariates, with the following schema: M[,1]: betas; M[,2]: exp(betas); M[,3]: standard error of betas; M[,4]: Z; M[,5]: P-value; M[,6]: lower 100*(1-alpha)% confidence interval of betas; M[,7]: upper 100*(1-alpha)% confidence interval of betas

Returns:

Two matrices containing a summary of some statistics of the fitted model: 1) file S with the following format: row 1: no. of observations; row 2: no. of events; row 3: log-likelihood; row 4: AIC; row 5: Rsquare (Cox & Snell); row 6: max possible Rsquare. 2) file T with the following format: row 1: likelihood ratio test statistic, degrees of freedom, P-value; row 2: Wald test statistic, degrees of freedom, P-value; row 3: Score (log-rank) test statistic, degrees of freedom, P-value

Returns:

Additionally, the following matrices are stored (needed for prediction): 1) a column matrix RT that contains the order-preserving recoded timestamps from X; 2) matrix XO, which is matrix X with sorted timestamps; 3) the variance-covariance matrix of the betas COV; 4) a column matrix MF that contains the column indices of X with the baseline factors removed (if available)

systemds.operator.algorithm.cspline(X: Matrix, Y: Matrix, inp_x: float, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Solves Cubic Spline Interpolation

Algorithm: implements https://en.wikipedia.org/wiki/Spline_interpolation#Algorithm_to_find_the_interpolating_cubic_spline. It uses a natural spline with q1’’(x0) == qn’’(xn) == 0.0.

Parameters:
  • X – 1-column matrix of x-value knots. It is assumed that the x values are monotonically increasing and that there are no duplicate points in X

  • Y – 1-column matrix of corresponding y values knots

  • inp_x – the given input x, for which the cspline will find predicted y

  • mode – Specifies the method for cspline (DS - Direct Solve, CG - Conjugate Gradient)

  • tol – Tolerance (epsilon); the conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations, 0 = no maximum

Returns:

Predicted value

Returns:

Matrix of k parameters

systemds.operator.algorithm.csplineCG(X: Matrix, Y: Matrix, inp_x: float, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin that solves cubic spline interpolation using conjugate gradient algorithm

Parameters:
  • X – 1-column matrix of x-value knots. It is assumed that the x values are monotonically increasing and that there are no duplicate points in X

  • Y – 1-column matrix of corresponding y values knots

  • inp_x – the given input x, for which the cspline will find predicted y.

  • tol – Tolerance (epsilon); the conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations, 0 = no maximum

Returns:

Predicted value

Returns:

Matrix of k parameters

systemds.operator.algorithm.csplineDS(X: Matrix, Y: Matrix, inp_x: float)

Builtin that solves cubic spline interpolation using a direct solver.

Parameters:
  • X – 1-column matrix of x-value knots. It is assumed that the x values are monotonically increasing and that there are no duplicate points in X

  • Y – 1-column matrix of corresponding y values knots

  • inp_x – the given input x, for which the cspline will find predicted y.

Returns:

Predicted value

Returns:

Matrix of k parameters
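
A minimal sketch of an interpolation call; the knots below are an arbitrary toy example, and the two documented outputs are assumed to be unpackable from compute():

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import csplineDS

# monotonically increasing x knots (no duplicates) and their y values
X = np.array([[1.0], [2.0], [3.0], [4.0]])
Y = np.array([[1.0], [4.0], [9.0], [16.0]])

with SystemDSContext() as sds:
  pred_y, K = csplineDS(sds.from_numpy(X), sds.from_numpy(Y), inp_x=2.5).compute()
  print(pred_y)  # interpolated y value at x = 2.5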

systemds.operator.algorithm.cvlm(X: Matrix, y: Matrix, k: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The cvlm-function is used for cross-validation of the provided data model. This function follows a non-exhaustive cross validation method. It uses lm and lmPredict functions to solve the linear regression and to predict the class of a feature vector with no intercept, shifting, and rescaling.

Parameters:
  • X – Recorded Data set into matrix

  • y – 1-column matrix of response values.

  • k – Number of subsets (folds); it should always be more than 1 and less than nrow(X)

  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization; set to nonzero for highly dependent, sparse, or numerous features

Returns:

Response values

Returns:

Validated data set
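
A hedged sketch of a 3-fold cross-validation run on random demo data; the two documented outputs are assumed to be unpackable from compute():

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import cvlm

np.random.seed(0)
X = np.random.rand(30, 5)
y = np.random.rand(30, 1)

with SystemDSContext() as sds:
  # k must be more than 1 and less than nrow(X)
  y_pred, betas = cvlm(sds.from_numpy(X), sds.from_numpy(y), k=3).compute()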

systemds.operator.algorithm.dbscan(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Implements the DBSCAN clustering algorithm using a Euclidean distance matrix.

Parameters:
  • X – The input Matrix to do DBSCAN on.

  • eps – Maximum distance between two points for one to be considered reachable from the other.

  • minPts – Number of points in a neighborhood for a point to be considered as a core point (includes the point itself).

Returns:

clustering Matrix
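
A minimal sketch on synthetic data; the two well-separated blobs, eps=1.0, and minPts=5 are arbitrary demo choices:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import dbscan

np.random.seed(0)
# two well-separated 2D blobs
X = np.vstack([np.random.rand(20, 2), np.random.rand(20, 2) + 10])

with SystemDSContext() as sds:
  clusters = dbscan(sds.from_numpy(X), eps=1.0, minPts=5).compute()
  print(clusters)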

systemds.operator.algorithm.dbscanApply(X: Matrix, clusterModel: Matrix, eps: float)

Implements the outlier detection/prediction algorithm using a DBScan model

Parameters:
  • X – The input Matrix to do outlier detection on.

  • clusterModel – Model of clusters to predict outliers against.

  • eps – Maximum distance between two points for one to be considered reachable from the other.

Returns:

Predicted outliers

systemds.operator.algorithm.decisionTree(X: Matrix, Y: Matrix, R: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin script implementing classification trees with scale and categorical features

Parameters:
  • X – Feature matrix X; note that X needs to be both recoded and dummy coded

  • Y – Label matrix Y; note that Y needs to be both recoded and dummy coded

  • R – Matrix R which for each feature in X contains the following information - R[1,]: Row Vector which indicates if feature vector is scalar or categorical. 1 indicates a scalar feature vector, other positive Integers indicate the number of categories If R is not provided by default all variables are assumed to be scale

  • bins – Number of equiheight bins per scale feature to choose thresholds

  • depth – Maximum depth of the learned tree

  • verbose – boolean specifying if the algorithm should print information while executing

Returns:

Matrix M where each column corresponds to a node in the learned tree and each row contains the following information:
M[1,j]: id of node j (in a complete binary tree)
M[2,j]: Offset (no. of columns) to the left child of j if j is an internal node, otherwise 0
M[3,j]: Feature index of the feature (scale feature id if the feature is scale, or categorical feature id if the feature is categorical) that node j looks at if j is an internal node, otherwise 0
M[4,j]: Type of the feature that node j looks at if j is an internal node (holds the same information as the R input vector)
M[5,j]: If j is an internal node: 1 if the feature chosen for j is scale, otherwise the size of the subset of values stored in rows 6,7,… if j is categorical. If j is a leaf node: number of misclassified samples reaching node j
M[6:,j]: If j is an internal node: the threshold the example’s feature value is compared to is stored at M[6,j] if the feature chosen for j is scale; otherwise, if the feature chosen for j is categorical, rows 6,7,… depict the value subset chosen for j. If j is a leaf node: 1 if j is impure and the number of samples at j > threshold, otherwise 0

systemds.operator.algorithm.decisionTreePredict(M: Matrix, X: Matrix, strategy: str)

Builtin script implementing prediction based on classification trees with scale features using prediction methods of the Hummingbird paper (https://www.usenix.org/system/files/osdi20-nakandala.pdf).

Parameters:
  • M – Decision tree matrix M, as generated by scripts/builtin/decisionTree.dml, where each column corresponds to a node in the learned tree and each row contains the following information:
    M[1,j]: id of node j (in a complete binary tree)
    M[2,j]: Offset (no. of columns) to the left child of j if j is an internal node, otherwise 0
    M[3,j]: Feature index of the feature (scale feature id if the feature is scale, or categorical feature id if the feature is categorical) that node j looks at if j is an internal node, otherwise 0
    M[4,j]: Type of the feature that node j looks at if j is an internal node (holds the same information as the R input vector)
    M[5,j]: If j is an internal node: 1 if the feature chosen for j is scale, otherwise the size of the subset of values stored in rows 6,7,… if j is categorical. If j is a leaf node: number of misclassified samples reaching node j
    M[6:,j]: If j is an internal node: the threshold the example’s feature value is compared to is stored at M[6,j] if the feature chosen for j is scale; otherwise, if the feature chosen for j is categorical, rows 6,7,… depict the value subset chosen for j. If j is a leaf node: 1 if j is impure and the number of samples at j > threshold, otherwise 0

  • X – Feature matrix X

  • strategy – Prediction strategy, can be one of [“GEMM”, “TT”, “PTT”], referring to “Generic matrix multiplication”, “Tree traversal”, and “Perfect tree traversal”, respectively

Returns:

Matrix containing the predicted labels for X

systemds.operator.algorithm.deepWalk(Graph: Matrix, w: int, d: int, gamma: int, t: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script performs DeepWalk on a given graph (https://arxiv.org/pdf/1403.6652.pdf)

Parameters:
  • Graph – adjacency matrix of a graph (n x n)

  • w – window size

  • d – embedding size

  • gamma – walks per vertex

  • t – walk length

  • alpha – learning rate

  • beta – factor for decreasing learning rate

Returns:

matrix of vertex/word representation (n x d)

systemds.operator.algorithm.denialConstraints(dataFrame: Frame, constraintsFrame: Frame)

This function considers some constraints indicating statements that can NOT happen in the data (denial constraints).

EXAMPLE:
dataFrame:

     rank       discipline   yrs.since.phd   yrs.service   sex      salary
1    Prof       B            19              18            Male     139750
2    Prof       B            20              16            Male     173200
3    AsstProf   B            3               3             Male     79750.56
4    Prof       B            45              39            Male     115000
5    Prof       B            40              40            Male     141500
6    AssocProf  B            6               6             Male     97000
7    Prof       B            30              23            Male     175000
8    Prof       B            45              45            Male     147765
9    Prof       B            21              20            Male     119250
10   Prof       B            18              18            Female   129000
11   AssocProf  B            12              8             Male     119800
12   AsstProf   B            7               2             Male     79800
13   AsstProf   B            1               1             Male     77700

constraintsFrame:

idx   constraint.type   group.by   group.variable      group.option   variable1      relation   variable2
1     variableCompare   FALSE                                         yrs.since.phd  <          yrs.service
2     instanceCompare   TRUE       rank                Prof           yrs.service    ><         salary
3     valueCompare      FALSE                                         salary         =          78182
4     variableCompare   TRUE       discipline          B              yrs.service    >          yrs.since.phd

Example, explanation of constraint 2: it can’t happen that one professor of rank Prof has more years of service than another, but a lower salary.

Parameters:
  • dataFrame – frame whose columns represent the variables of the data and whose rows correspond to different tuples or instances. It is recommended to have a column indexing the instances from 1 to N (N = number of instances).

  • constraintsFrame – frame with fixed columns and each row representing one constraint:
    1. idx: (double) index of the constraint, from 1 to M (number of constraints)
    2. constraint.type: (string) the constraints can be of 3 different kinds:
       - variableCompare: for each instance, it will compare the values of two variables (with a relation <, > or =)
       - valueCompare: for each instance, it will compare a fixed value and a variable value (with a relation <, > or =)
       - instanceCompare: for every couple of instances, it will compare the relation between two variables, i.e. if the value of variable 1 in instance 1 is lower/higher than the value of variable 1 in instance 2, then the value of variable 2 in instance 1 can’t be lower/higher than the value of variable 2 in instance 2
    3. group.by: (boolean) if TRUE only one group of data (defined by a variable option) will be considered for the constraint
    4. group.variable: (string, only if group.by is TRUE) name of the variable (column in dataFrame) that will divide the data into groups
    5. group.option: (only if group.by is TRUE) option of the group.variable that defines the group to consider
    6. variable1: (string) first variable to compare (name of a column in dataFrame)
    7. relation: (string) can be <, > or = in the case of variableCompare and valueCompare, and <>, <<, >< or >> in the case of instanceCompare
    8. variable2: (string) second variable to compare (name of a column in dataFrame), or a fixed value in the case of valueCompare

Returns:

Matrix of 2 columns: the first column shows the indexes of dataFrame that are wrong; the second column shows the index of the denial constraint that is fulfilled. If there are no wrong instances to show (0 constraints fulfilled), then WrongInstances = matrix(0,1,2).

systemds.operator.algorithm.discoverFD(X: Matrix, Mask: Matrix, threshold: float)

Implements builtin for finding functional dependencies

Parameters:
  • X – Input Matrix X, encoded Matrix if data is categorical

  • Mask – A row vector for interested features i.e. Mask =[1, 0, 1] will exclude the second column from processing

  • threshold – threshold value in interval [0, 1] for robust FDs

Returns:

matrix of functional dependencies

systemds.operator.algorithm.dist(X: Matrix)

Returns Euclidean distance matrix (distances between N n-dimensional points)

Parameters:

X – Matrix to calculate the distance inside

Returns:

Euclidean distance matrix
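
A minimal sketch, following the conventions of the lm example above (random 5 x 3 demo points):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import dist

np.random.seed(0)
X = np.random.rand(5, 3)  # 5 points in 3 dimensions

with SystemDSContext() as sds:
  D = dist(sds.from_numpy(X)).compute()
  print(D)  # 5 x 5 matrix of pairwise Euclidean distances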

systemds.operator.algorithm.dmv(X: Frame, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The dmv-function is used to find disguised missing values utilising syntactical pattern recognition.

Parameters:
  • X – Input Frame

  • threshold – Threshold value in interval [0, 1] for dominant pattern per column (e.g., 0.8 means that 80% of the entries per column must adhere this pattern to be dominant)

  • replace – The string disguised missing values are replaced with

Returns:

Frame X including detected disguised missing values

systemds.operator.algorithm.ema(X: Frame, search_iterations: int, mode: str, freq: int, alpha: float, beta: float, gamma: float)

This function imputes values with exponential moving average (single, double or triple).

Parameters:
  • X – Frame that contains time series data that needs to be imputed

  • search_iterations – Budget iterations for parameter optimization, used if parameters weren’t set

  • mode – Type of EMA method. Either “single”, “double” or “triple”

  • freq – Seasonality when using triple EMA.

  • alpha – alpha- value for EMA

  • beta – beta- value for EMA

  • gamma – gamma- value for EMA

Returns:

Frame with EMA results

systemds.operator.algorithm.executePipeline(pipeline: Frame, Xtrain: Matrix, Ytrain: Matrix, Xtest: Matrix, Ytest: Matrix, metaList: List, hyperParameters: Matrix, flagsCount: int, verbose: bool, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function executes a pipeline.

Parameters:
  • logical

  • pipeline

  • X

  • Y

  • Xtest

  • Ytest

  • metaList

  • hyperParameters

  • hpForPruning

  • changesByOp

  • flagsCount

  • test

  • verbose

Returns:

Returns:

Returns:

Returns:

Returns:

Returns:

Returns:

systemds.operator.algorithm.ffPredict(model: List, X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This builtin function makes predictions given data and a trained feed-forward neural network model.

Parameters:
  • model – Trained feed-forward neural network model

  • X – Data used for making predictions

  • batch_size – Batch size

Returns:

Predicted value

systemds.operator.algorithm.ffTrain(X: Matrix, Y: Matrix, out_activation: str, loss_fcn: str, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This builtin function trains a simple feed-forward neural network. The architecture of the network is: affine1 -> relu -> dropout -> affine2 -> configurable output activation function. The hidden layer has 128 neurons. The dropout rate is 0.35. Input and output sizes are inferred from X and Y.

Parameters:
  • X – Training data

  • Y – Labels/Target values

  • batch_size – Batch size

  • epochs – Number of epochs

  • learning_rate – Learning rate

  • out_activation – User specified output activation function. Possible values: “sigmoid”, “relu”, “lrelu”, “tanh”, “softmax”, “logits” (no activation).

  • loss_fcn – User specified loss function. Possible values: “l1”, “l2”, “log_loss”, “logcosh_loss”, “cel” (cross-entropy loss).

  • shuffle – Flag which indicates if dataset should be shuffled or not

  • validation_split – Fraction of training set used as validation set

  • seed – Seed for model initialization

  • verbose – Flag which indicates if function should print to stdout

Returns:

Trained model which can be used in ffPredict
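
A hedged end-to-end sketch that trains a network with ffTrain and feeds the resulting model node directly into ffPredict (documented above); the random data, sigmoid output, l2 loss, and epoch count are arbitrary demo choices:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import ffTrain, ffPredict

np.random.seed(0)
X = np.random.rand(100, 10)
Y = np.random.rand(100, 1)  # targets in [0, 1] to match the sigmoid output

with SystemDSContext() as sds:
  model = ffTrain(sds.from_numpy(X), sds.from_numpy(Y),
                  out_activation="sigmoid", loss_fcn="l2",
                  epochs=5, verbose=False)
  preds = ffPredict(model, sds.from_numpy(X)).compute()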

systemds.operator.algorithm.fit_pipeline(trainData: Frame, testData: Frame, pip: Frame, applyFunc: Frame, hp: Matrix, evaluationFunc: str, evalFunHp: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script reads the dirty and clean data, applies the best pipeline to the dirty data, classifies both the cleaned and the original dataset, and checks whether the cleaned dataset performs the same as the original dataset in terms of classification accuracy.

Parameters:
  • trainData

  • testData

  • metaData

  • lp

  • pip

  • hp

  • evaluationFunc

  • evalFunHp

  • isLastLabel

  • correctTypos

Returns:

systemds.operator.algorithm.fixInvalidLengths(F1: Frame, mask: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Fix invalid lengths

Parameters:
  • F1

  • mask

  • ql

  • qu

Returns:

Returns:

systemds.operator.algorithm.fixInvalidLengthsApply(X: Frame, mask: Matrix, qLow: Matrix, qUp: Matrix)

Fix invalid lengths

Parameters:
  • X

  • mask

  • ql

  • qu

Returns:

Returns:

systemds.operator.algorithm.frameSort(F: Frame, mask: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Related to [SYSTEMDS-2662] (dependency function for cleaning pipelines): built-in for sorting frames.

Parameters:
  • F – Data frame of string values

  • mask – matrix for identifying string columns

Returns:

sorted dataset by column 1 in decreasing order

systemds.operator.algorithm.frequencyEncode(X: Matrix, mask: Matrix)

Computes a frequency encoding: categorical columns are converted to their frequency counts.

Parameters:
  • X – dataset x

  • mask – mask of the columns for frequency conversion

Returns:

categorical columns are replaced with their frequencies

Returns:

the frequency counts for the different categoricals

systemds.operator.algorithm.frequencyEncodeApply(X: Matrix, freqCount: Matrix)

Applies a frequency encoding to new data, given previously computed frequency counts.

Parameters:
  • X – dataset x

  • freqCount – the frequency counts for the different categoricals

Returns:

categorical columns are replaced with their frequencies given
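
A small sketch of frequencyEncode; the toy data and mask are arbitrary, and the two documented outputs are assumed to be unpackable from compute():

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import frequencyEncode

# first column is categorical (values 1..3), second column is numeric
X = np.array([[1, 0.5], [1, 0.1], [2, 0.7], [3, 0.3]])
mask = np.array([[1, 0]], dtype=np.float64)  # encode only the first column

with SystemDSContext() as sds:
  X_enc, counts = frequencyEncode(sds.from_numpy(X), sds.from_numpy(mask)).compute()
  print(X_enc)  # categorical values replaced by their frequencies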

systemds.operator.algorithm.garch(X: Matrix, kmax: int, momentum: float, start_stepsize: float, end_stepsize: float, start_vicinity: float, end_vicinity: float, sim_seed: int, verbose: bool)

This is a builtin function that implements GARCH(1,1), a statistical model used in analyzing time-series data where the variance error is believed to be serially autocorrelated

COMMENTS: This has some drawbacks, namely slow convergence of the optimization (a sort of simulated annealing / gradient descent). TODO: use BFGS or BHHH if available (these are the go-to methods). TODO: (only then) extend to garch(p,q); otherwise the search space is way too big for the current method.

Parameters:
  • X – The input Matrix to fit the GARCH(1,1) model on.

  • kmax – Number of iterations

  • momentum – Momentum for momentum-gradient descent (set to 0 to deactivate)

  • start_stepsize – Initial gradient-descent stepsize

  • end_stepsize – gradient-descent stepsize at end (linear descent)

  • start_vicinity – proportion of randomness of restart-location for gradient descent at beginning

  • end_vicinity – same at end (linear decay)

  • sim_seed – seed for simulation of process on fitted coefficients

  • verbose – verbosity, comments during fitting

Returns:

simulated garch(1,1) process on fitted coefficients

Returns:

variances of simulated fitted process

Returns:

Constant term of fitted process

Returns:

1-st arch-coefficient of fitted process

Returns:

1-st garch-coefficient of fitted process

systemds.operator.algorithm.gaussianClassifier(D: Matrix, C: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Computes the parameters needed for Gaussian Classification. Thus it computes the following per class: the prior probability, the inverse covariance matrix, the mean per feature and the determinant of the covariance matrix. Furthermore (if not explicitly defined), it adds some small smoothing value along the variances, to prevent numerical errors / instabilities.

Parameters:
  • D – Input matrix (training set)

  • C – Target vector

  • varSmoothing – Smoothing factor for variances

  • verbose – Print accuracy of the training set

Returns:

Vector storing the class prior probabilities

Returns:

Matrix storing the means of the classes

Returns:

List of inverse covariance matrices

Returns:

Vector storing the determinants of the classes

systemds.operator.algorithm.getAccuracy(y: Matrix, yhat: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This builtin function computes the weighted and simple accuracy for the given predictions.

Parameters:
  • y – Ground truth (Actual Labels)

  • yhat – Predictions (Predicted labels)

  • isWeighted – Flag for weighted or non-weighted accuracy calculation

Returns:

accuracy of the predicted labels
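
A minimal sketch with hand-crafted labels (3 of 4 predictions correct):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import getAccuracy

y = np.array([[1], [2], [1], [2]], dtype=np.float64)     # actual labels
yhat = np.array([[1], [2], [2], [2]], dtype=np.float64)  # predicted labels

with SystemDSContext() as sds:
  acc = getAccuracy(sds.from_numpy(y), sds.from_numpy(yhat)).compute()
  print(acc)  # accuracy of the predictions (3 of 4 correct here)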

systemds.operator.algorithm.glm(X: Matrix, Y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script solves GLM regression using Newton/Fisher scoring with trust regions. The glm-function is a flexible generalization of ordinary linear regression that allows for response variables with error distributions other than a normal distribution.

In addition, some GLM statistics are provided as console output by setting verbose=TRUE, one comma-separated name-value pair per each line, as follows:

--------------------------------------------------------------------------------------------
TERMINATION_CODE      A positive integer indicating success/failure as follows:
                      1 = Converged successfully; 2 = Maximum number of iterations reached; 
                      3 = Input (X, Y) out of range; 4 = Distribution/link is not supported
BETA_MIN              Smallest beta value (regression coefficient), excluding the intercept
BETA_MIN_INDEX        Column index for the smallest beta value
BETA_MAX              Largest beta value (regression coefficient), excluding the intercept
BETA_MAX_INDEX        Column index for the largest beta value
INTERCEPT             Intercept value, or NaN if there is no intercept (if icpt=0)
DISPERSION            Dispersion used to scale deviance, provided as "disp" input parameter
                      or estimated (same as DISPERSION_EST) if the "disp" parameter is <= 0
DISPERSION_EST        Dispersion estimated from the dataset
DEVIANCE_UNSCALED     Deviance from the saturated model, assuming dispersion == 1.0
DEVIANCE_SCALED       Deviance from the saturated model, scaled by the DISPERSION value
--------------------------------------------------------------------------------------------

The Log file, when requested, contains the following per-iteration variables in CSV format,
each line containing triple (NAME, ITERATION, VALUE) with ITERATION = 0 for initial values:

--------------------------------------------------------------------------------------------
NUM_CG_ITERS          Number of inner (Conj.Gradient) iterations in this outer iteration
IS_TRUST_REACHED      1 = trust region boundary was reached, 0 = otherwise
POINT_STEP_NORM       L2-norm of iteration step from old point (i.e. "beta") to new point
OBJECTIVE             The loss function we minimize (i.e. negative partial log-likelihood)
OBJ_DROP_REAL         Reduction in the objective during this iteration, actual value
OBJ_DROP_PRED         Reduction in the objective predicted by a quadratic approximation
OBJ_DROP_RATIO        Actual-to-predicted reduction ratio, used to update the trust region
GRADIENT_NORM         L2-norm of the loss function gradient (NOTE: sometimes omitted)
LINEAR_TERM_MIN       The minimum value of X %*% beta, used to check for overflows
LINEAR_TERM_MAX       The maximum value of X %*% beta, used to check for overflows
IS_POINT_UPDATED      1 = new point accepted; 0 = new point rejected, old point restored
TRUST_DELTA           Updated trust region size, the "delta"
--------------------------------------------------------------------------------------------

SOME OF THE SUPPORTED GLM DISTRIBUTION FAMILIES AND LINK FUNCTIONS:

dfam vpow link lpow  Distribution.link   Canonical?
---------------------------------------------------
 1   0.0   1  -1.0   Gaussian.inverse
 1   0.0   1   0.0   Gaussian.log
 1   0.0   1   1.0   Gaussian.id          Yes
 1   1.0   1   0.0   Poisson.log          Yes
 1   1.0   1   0.5   Poisson.sqrt
 1   1.0   1   1.0   Poisson.id
 1   2.0   1  -1.0   Gamma.inverse        Yes
 1   2.0   1   0.0   Gamma.log
 1   2.0   1   1.0   Gamma.id
 1   3.0   1  -2.0   InvGaussian.1/mu^2   Yes
 1   3.0   1  -1.0   InvGaussian.inverse
 1   3.0   1   0.0   InvGaussian.log
 1   3.0   1   1.0   InvGaussian.id
 1    *    1    *    AnyVariance.AnyLink
---------------------------------------------------
 2    *    1   0.0   Binomial.log
 2    *    1   0.5   Binomial.sqrt
 2    *    2    *    Binomial.logit       Yes
 2    *    3    *    Binomial.probit
 2    *    4    *    Binomial.cloglog
 2    *    5    *    Binomial.cauchit
---------------------------------------------------
Parameters:
  • X – matrix X of feature vectors

  • Y – matrix Y with either 1 or 2 columns: if dfam = 2, Y is 1-column Bernoulli or 2-column Binomial (#pos, #neg)

  • dfam – Distribution family code: 1 = Power, 2 = Binomial

  • vpow – Power for Variance defined as (mean)^power (ignored if dfam != 1): 0.0 = Gaussian, 1.0 = Poisson, 2.0 = Gamma, 3.0 = Inverse Gaussian

  • link – Link function code: 0 = canonical (depends on distribution), 1 = Power, 2 = Logit, 3 = Probit, 4 = Cloglog, 5 = Cauchit

  • lpow – Power for Link function defined as (mean)^power (ignored if link != 1): -2.0 = 1/mu^2, -1.0 = reciprocal, 0.0 = log, 0.5 = sqrt, 1.0 = identity

  • yneg – Response value for Bernoulli “No” label, usually 0.0 or -1.0

  • icpt – Intercept presence, X columns shifting and rescaling: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1

  • reg – Regularization parameter (lambda) for L2 regularization

  • tol – Tolerance (epsilon)

  • disp – (Over-)dispersion value, or 0.0 to estimate it from data

  • moi – Maximum number of outer (Newton / Fisher Scoring) iterations

  • mii – Maximum number of inner (Conjugate Gradient) iterations, 0 = no maximum

  • verbose – if the Algorithm should be verbose

Returns:

Matrix beta, whose size depends on icpt: icpt=0: ncol(X) x 1; icpt=1: (ncol(X) + 1) x 1; icpt=2: (ncol(X) + 1) x 2
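
A hedged sketch fitting a Gaussian GLM with identity link, using the codes from the parameter table above; the random demo data is arbitrary:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import glm

np.random.seed(0)
X = np.random.rand(100, 5)
Y = np.random.rand(100, 1)  # continuous response for the Gaussian family

with SystemDSContext() as sds:
  # dfam=1 (Power), vpow=0.0 (Gaussian), link=1 (Power), lpow=1.0 (identity)
  betas = glm(sds.from_numpy(X), sds.from_numpy(Y),
              dfam=1, vpow=0.0, link=1, lpow=1.0).compute()
  print(betas)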

systemds.operator.algorithm.glmPredict(X: Matrix, B: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Applies the estimated parameters of a GLM type regression to a new dataset

Additional statistics are printed one per each line, in the following

CSV format: NAME,[COLUMN],[SCALED],VALUE
---
NAME   is the string identifier for the statistic, see the table below.
COLUMN is an optional integer value that specifies the Y-column for per-column statistics;
       note that a Binomial/Multinomial one-column Y input is converted into multi-column.
SCALED is an optional Boolean value (TRUE or FALSE) that tells us whether or not the input
         dispersion parameter (disp) scaling has been applied to this statistic.
VALUE  is the value of the statistic.
---
NAME                  COLUMN  SCALED  MEANING
---------------------------------------------------------------------------------------------
LOGLHOOD_Z                      +     Log-Likelihood Z-score (in st.dev's from mean)
LOGLHOOD_Z_PVAL                 +     Log-Likelihood Z-score p-value
PEARSON_X2                      +     Pearson residual X^2 statistic
PEARSON_X2_BY_DF                +     Pearson X^2 divided by degrees of freedom
PEARSON_X2_PVAL                 +     Pearson X^2 p-value
DEVIANCE_G2                     +     Deviance from saturated model G^2 statistic
DEVIANCE_G2_BY_DF               +     Deviance G^2 divided by degrees of freedom
DEVIANCE_G2_PVAL                +     Deviance G^2 p-value
AVG_TOT_Y               +             Average of Y column for a single response value
STDEV_TOT_Y             +             St.Dev. of Y column for a single response value
AVG_RES_Y               +             Average of column residual, i.e. of Y - mean(Y|X)
STDEV_RES_Y             +             St.Dev. of column residual, i.e. of Y - mean(Y|X)
PRED_STDEV_RES          +       +     Model-predicted St.Dev. of column residual
R2                      +             R^2 of Y column residual with bias included
ADJUSTED_R2             +             Adjusted R^2 of Y column residual with bias included
R2_NOBIAS               +             R^2 of Y column residual with bias subtracted
ADJUSTED_R2_NOBIAS      +             Adjusted R^2 of Y column residual with bias subtracted
---------------------------------------------------------------------------------------------
Parameters:
  • X – Matrix X of records (feature vectors)

  • B – GLM regression parameters (the betas), with dimensions ncol(X) x k: do not add intercept ncol(X)+1 x k: add intercept as given by the last B-row if k > 1, use only B[, 1] unless it is Multinomial Logit (dfam=3)

  • ytest – Response matrix Y, with the following dimensions: nrow(X) x 1 : for all distributions (dfam=1 or 2 or 3) nrow(X) x 2 : for Binomial (dfam=2) given by (#pos, #neg) counts nrow(X) x k+1: for Multinomial (dfam=3) given by category counts

  • dfam – GLM distribution family: 1 = Power, 2 = Binomial, 3 = Multinomial Logit

  • vpow – Power for Variance defined as (mean)^power (ignored if dfam != 1): 0.0 = Gaussian, 1.0 = Poisson, 2.0 = Gamma, 3.0 = Inverse Gaussian

  • link – Link function code: 0 = canonical (depends on distribution), 1 = Power, 2 = Logit, 3 = Probit, 4 = Cloglog, 5 = Cauchit; ignored if Multinomial

  • lpow – Power for Link function defined as (mean)^power (ignored if link != 1): -2.0 = 1/mu^2, -1.0 = reciprocal, 0.0 = log, 0.5 = sqrt, 1.0 = identity

  • disp – Dispersion value, when available

  • verbose – Print statistics to stdout

Returns:

Matrix M of predicted means/probabilities: nrow(X) x 1 : for Power-type distributions (dfam=1) nrow(X) x 2 : for Binomial distribution (dfam=2), column 2 is “No” nrow(X) x k+1: for Multinomial Logit (dfam=3), col# k+1 is baseline

systemds.operator.algorithm.gmm(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Gaussian Mixture Model training algorithm. There are four different types of covariance matrices i.e., VVV, EEE, VVI, VII and two initialization methods namely “kmeans” and “random”.

Parameters:
  • X – Dataset input to fit the GMM model

  • n_components – Number of components to use in the Gaussian mixture model

  • model – “VVV”: unequal variance (full), each component has its own general covariance matrix; “EEE”: equal variance (tied), all components share the same general covariance matrix; “VVI”: spherical, unequal volume (diag), each component has its own diagonal covariance matrix; “VII”: spherical, equal volume (spherical), each component has its own single variance

  • init_param – Initialization algorithm to use to initialize the gaussian weights, valid inputs are: “kmeans” or “random”

  • iterations – Number of iterations

  • reg_covar – Regularization parameter for covariance matrix

  • tol – Tolerance value for convergence

  • seed – The seed value to initialize the values for fitting the GMM.

Returns:

The predictions made by the gaussian model on the X input dataset

Returns:

Probability of the predictions given the X input dataset

Returns:

Number of estimated parameters

Returns:

Bayesian information criterion for best iteration

Returns:

Fitted clusters mean

Returns:

Fitted precision matrix for each mixture

Returns:

The weight matrix: A matrix whose [i,k]th entry is the probability that observation i in the test data belongs to the kth class

systemds.operator.algorithm.gmmPredict(X: Matrix, weight: Matrix, mu: Matrix, precisions_cholesky: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Prediction function for a Gaussian Mixture Model (gmm). Computes posterior probabilities for new instances given the variance and mean of the fitted data.

Parameters:
  • X – Dataset input to predict the labels from

  • weight – Weight of learned model: A matrix whose [i,k]th entry is the probability that observation i in the test data belongs to the kth class

  • mu – Fitted clusters mean

  • precisions_cholesky – Fitted precision matrix for each mixture

  • model – “VVV”: unequal variance (full), each component has its own general covariance matrix; “EEE”: equal variance (tied), all components share the same general covariance matrix; “VVI”: spherical, unequal volume (diag), each component has its own diagonal covariance matrix; “VII”: spherical, equal volume (spherical), each component has its own single variance

Returns:

The predictions made by the gaussian model on the X input dataset

Returns:

Probability of the predictions given the X input dataset

systemds.operator.algorithm.gnmf(X: Matrix, rnk: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The gnmf-function does Gaussian Non-Negative Matrix Factorization. In this, a matrix X is factorized into two matrices W and H, such that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect.

References: [Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, Yi-Min Wang: Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce. WWW 2010: 681-690]

Parameters:
  • X – Matrix of feature vectors.

  • rnk – Number of components into which matrix X is to be factored

  • eps – Tolerance

  • maxi – Maximum number of conjugate gradient iterations

Returns:

List of pattern matrices, one for each repetition

Returns:

List of amplitude matrices, one for each repetition

systemds.operator.algorithm.gridSearch(X: Matrix, y: Matrix, train: str, predict: str, params: List, paramValues: List, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The gridSearch-function is used to find the optimal hyper-parameters of a model which results in the most accurate predictions. This function takes train and eval functions by name.

Parameters:
  • X – Input feature matrix

  • y – Input Matrix of vectors.

  • train – Name ft of the train function to call via ft(trainArgs)

  • predict – Name fp of the loss function to call via fp((predictArgs,B))

  • numB – Maximum number of parameters in model B (pass the max because the size may vary with parameters like icpt or multi-class classification)

  • params – List of varied hyper-parameter names

  • dataArgs – List of data parameters (to identify data parameters by name i.e. list(“X”, “Y”))

  • paramValues – List of matrices providing the parameter values as column vectors for position-aligned hyper-parameters in ‘params’

  • trainArgs – named List of arguments to pass to the ‘train’ function, where gridSearch replaces enumerated hyper-parameter by name, if not provided or an empty list, the lm parameters are used

  • predictArgs – List of arguments to pass to the ‘predict’ function, where gridSearch appends the trained models at the end, if not provided or an empty list, list(X, y) is used instead

  • cv – flag enabling k-fold cross validation, otherwise training loss

  • cvk – If cv=TRUE, specifies the number of folds, otherwise ignored

  • verbose – flag for verbose debug output

Returns:

Matrix[Double]: the trained model with minimal loss (by the ‘predict’ function); multi-column models are returned as a column-major linearized column vector

Returns:

one-row frame w/ optimal hyper-parameters (by ‘params’ position)

systemds.operator.algorithm.hospitalResidencyMatch(R: Matrix, H: Matrix, capacity: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script computes a solution for the hospital residency match problem.

Residents.mtx:
2.0,1.0,3.0
1.0,2.0,3.0
1.0,2.0,0.0

Since it is an ORDERED matrix, this means that Resident 1 (row 1) likes hospital 2 the most, followed by hospital 1 and hospital 3. If it was UNORDERED, this would mean that resident 1 (row 1) likes hospital 3 the most (since the value at [1,3] is the row max), followed by hospital 1 (2.0 preference value) and hospital 2 (1.0 preference value).

Hospitals.mtx:
2.0,1.0,0.0
0.0,1.0,2.0
1.0,2.0,0.0

Since it is an UNORDERED matrix this means that Hospital 1 (row 1) likes Resident 1 the most (since the value at [1,1] is the row max).

capacity.mtx:
1.0
1.0
1.0

residencyMatch.mtx:
2.0,0.0,0.0
1.0,0.0,0.0
0.0,2.0,0.0

hospitalMatch.mtx:
0.0,1.0,0.0
0.0,0.0,2.0
1.0,0.0,0.0

Resident 1 has matched with Hospital 3 (since [1,3] is non-zero) at a preference level of 2.0. Resident 2 has matched with Hospital 1 (since [2,1] is non-zero) at a preference level of 1.0. Resident 3 has matched with Hospital 2 (since [3,2] is non-zero) at a preference level of 2.0.

Parameters:
  • R – Residents matrix R. It must be an ORDERED matrix.

  • H – Hospitals matrix H. It must be an UNORDERED matrix.

  • capacity – Capacity of hospitals, matrix C. It must be an [n x 1] matrix with non-zero values.

  • verbose – If the operation is verbose

Returns:

Result Matrix If cell [i,j] is non-zero, it means that Resident i has matched with Hospital j. Further, if cell [i,j] is non-zero, it holds the preference value that led to the match.

Returns:

Result Matrix If cell [i,j] is non-zero, it means that Resident i has matched with Hospital j. Further, if cell [i,j] is non-zero, it holds the preference value that led to the match.

systemds.operator.algorithm.hyperband(X_train: Matrix, y_train: Matrix, X_val: Matrix, y_val: Matrix, params: List, paramRanges: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The hyperband-function is used for hyper-parameter optimization and is based on multi-armed bandits and early elimination. Through multiple parallel brackets and consecutive trials, it will return the hyper-parameter combination which performed best on a validation dataset. A set of hyper-parameter combinations is drawn from uniform distributions with given ranges; those make up the candidates for hyperband. Notes: hyperband is hard-coded for lmCG and uses lmPredict for validation; hyperband is hard-coded to use the number of iterations as a resource; hyperband can only optimize continuous hyperparameters.

Parameters:
  • X_train – Input Matrix of training vectors

  • y_train – Labels for training vectors

  • X_val – Input Matrix of validation vectors

  • y_val – Labels for validation vectors

  • params – List of parameters to optimize

  • paramRanges – The min and max values for the uniform distributions to draw from. One row per hyper parameter, first column specifies min, second column max value.

  • R – Controls number of candidates evaluated

  • eta – Determines fraction of candidates to keep after each trial

  • verbose – If TRUE print messages are activated

Returns:

1-column matrix of weights of best performing candidate

Returns:

hyper parameters of best performing candidate

systemds.operator.algorithm.img_brightness(img_in: Matrix, value: float, channel_max: int)

The img_brightness-function is an image data augmentation function. It changes the brightness of the image.

Parameters:
  • img_in – Input matrix/image

  • value – The amount of brightness to be changed for the image

  • channel_max – Maximum value of the brightness of the image

Returns:

Output matrix/image
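
A minimal sketch on a random grayscale image (the image size and value range here are arbitrary assumptions):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import img_brightness

np.random.seed(0)
# a random 32 x 32 grayscale image with pixel values in [0, 255]
img = np.random.rand(32, 32) * 255

with SystemDSContext() as sds:
  # increase brightness by 64, capping pixel values at 255
  brighter = img_brightness(sds.from_numpy(img), 64.0, 255).compute()
  print(brighter.max())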

systemds.operator.algorithm.img_crop(img_in: Matrix, w: int, h: int, x_offset: int, y_offset: int)

The img_crop-function is an image data augmentation function. It cuts out a subregion of an image.

Parameters:
  • img_in – Input matrix/image

  • w – The width of the subregion required

  • h – The height of the subregion required

  • x_offset – The horizontal coordinate in the image to begin the crop operation

  • y_offset – The vertical coordinate in the image to begin the crop operation

Returns:

Cropped matrix/image

systemds.operator.algorithm.img_cutout(img_in: Matrix, x: int, y: int, width: int, height: int, fill_value: float)

Image Cutout function replaces a rectangular section of an image with a constant value.

Parameters:
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • x – Column index of the top left corner of the rectangle (starting at 1)

  • y – Row index of the top left corner of the rectangle (starting at 1)

  • width – Width of the rectangle (must be positive)

  • height – Height of the rectangle (must be positive)

  • fill_value – The value to set for the rectangle

Returns:

Output image as 2D matrix with top left corner at [1, 1]

systemds.operator.algorithm.img_invert(img_in: Matrix, max_value: float)

This is an image data augmentation function. It inverts an image.

Parameters:
  • img_in – Input image

  • max_value – The maximum value pixels can have

Returns:

Output image

systemds.operator.algorithm.img_mirror(img_in: Matrix, horizontal_axis: bool)

This function is an image data augmentation function. It flips an image on the X (horizontal) or Y (vertical) axis.

Parameters:
  • img_in – Input matrix/image

  • horizontal_axis – If TRUE, the image is flipped over the horizontal (X) axis, otherwise over the vertical (Y) axis

Returns:

Flipped matrix/image

systemds.operator.algorithm.img_posterize(img_in: Matrix, bits: int)

The Image Posterize function limits pixel values to 2^bits different values in the range [0, 255]. Assumes the input image can attain values in the range [0, 255].

Parameters:
  • img_in – Input image

  • bits – The number of bits to keep per value: 1 means black and white, 8 means every integer between 0 and 255.

Returns:

Output image

systemds.operator.algorithm.img_rotate(img_in: Matrix, radians: float, fill_value: float)

The Image Rotate function rotates the input image counter-clockwise around the center. Uses nearest neighbor sampling.

Parameters:
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • radians – The angle to rotate by, in radians.

  • fill_value – The background color revealed by the rotation

Returns:

Output image as 2D matrix with top left corner at [1, 1]

systemds.operator.algorithm.img_sample_pairing(img_in1: Matrix, img_in2: Matrix, weight: float)

The image sample pairing function blends two images together.

Parameters:
  • img_in1 – First input image

  • img_in2 – Second input image

  • weight – The weight given to the second image. 0 means only img_in1, 1 means only img_in2 will be visible

Returns:

Output image

systemds.operator.algorithm.img_shear(img_in: Matrix, shear_x: float, shear_y: float, fill_value: float)

This function applies a shearing transformation to an image. Uses nearest neighbor sampling.

Parameters:
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • shear_x – Shearing factor for horizontal shearing

  • shear_y – Shearing factor for vertical shearing

  • fill_value – The background color revealed by the shearing

Returns:

Output image as 2D matrix with top left corner at [1, 1]

systemds.operator.algorithm.img_transform(img_in: Matrix, out_w: int, out_h: int, a: float, b: float, c: float, d: float, e: float, f: float, fill_value: float)

The Image Transform function applies an affine transformation to an image. Optionally resizes the image (without scaling). Uses nearest neighbor sampling.

Parameters:
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • out_w – Width of the output image

  • out_h – Height of the output image

  • a,b,c,d,e,f – The first two rows of the affine matrix in row-major order

  • fill_value – The background of the image

Returns:

Output image as 2D matrix with top left corner at [1, 1]

systemds.operator.algorithm.img_translate(img_in: Matrix, offset_x: float, offset_y: float, out_w: int, out_h: int, fill_value: float)

The Image Translate function translates the image. Optionally resizes the image (without scaling). Uses nearest neighbor sampling.

Parameters:
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • offset_x – The distance to move the image in x direction

  • offset_y – The distance to move the image in y direction

  • out_w – Width of the output image

  • out_h – Height of the output image

  • fill_value – The background of the image

Returns:

Output image as 2D matrix with top left corner at [1, 1]

systemds.operator.algorithm.impurityMeasures(X: Matrix, Y: Matrix, R: Matrix, method: str, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function computes the measure of impurity for the given dataset based on the passed method (gini or entropy). The current version expects the target vector to contain only 0 or 1 values.

Parameters:
  • X – Feature matrix.

  • Y – Target vector containing 0 and 1 values.

  • R – Vector indicating whether a feature is categorical or continuous. 1 denotes a continuous feature, 2 denotes a categorical feature.

  • n_bins – Number of bins for binning in case of scale features.

  • method – String indicating the method to use; either “entropy” or “gini”.

Returns:

(1 x ncol(X)) row vector containing information/gini gain for each feature of the dataset. In case of gini, the values denote the gini gains, i.e. how much impurity was removed with the respective split. The higher the value, the better the split. In case of entropy, the values denote the information gain, i.e. how much entropy was removed. The higher the information gain, the better the split.

systemds.operator.algorithm.imputeByFD(X: Matrix, Y: Matrix, threshold: float, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Implements builtin for imputing missing values from observed values (if they exist) using robust functional dependencies

Parameters:
  • X – Vector X, source attribute of functional dependency

  • Y – Vector Y, target attribute of functional dependency and imputation

  • threshold – threshold value in interval [0, 1] for robust FDs

  • verbose – flag for printing verbose debug output

Returns:

Vector Y, with missing values mapped to a new max value

Returns:

Vector Y, with imputed missing values

systemds.operator.algorithm.imputeByFDApply(X: Matrix, Y_imp: Matrix)

Implements builtin for imputing missing values from observed values (if they exist) using robust functional dependencies

Parameters:
  • X – Matrix X

  • Y_imp – Matrix of imputed values, as computed by imputeByFD for the target attribute

Returns:

Matrix with possible imputations

systemds.operator.algorithm.imputeByMean(X: Matrix, mask: Matrix)

Imputes the data by mean value, and if the feature is categorical, by mode value. Related to [SYSTEMDS-2662], a dependency function for cleaning pipelines.

Parameters:
  • X – Data Matrix (Recoded Matrix for categorical features)

  • mask – A 0/1 row vector for identifying numeric (0) and categorical features (1)

Returns:

imputed dataset
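
A minimal sketch, assuming missing values are encoded as NaN and, per the single documented output, that the imputed dataset is returned directly:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import imputeByMean

# two numeric columns with missing entries (NaN assumed to encode missing)
X = np.array([[1.0, 4.0], [np.nan, 2.0], [3.0, np.nan]])
mask = np.array([[0.0, 0.0]])  # both columns numeric

with SystemDSContext() as sds:
  imputed = imputeByMean(sds.from_numpy(X), sds.from_numpy(mask)).compute()
  print(imputed)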

systemds.operator.algorithm.imputeByMeanApply(X: Matrix, imputedVec: Matrix)

Imputes the data by mean value, and if the feature is categorical, by mode value. Related to [SYSTEMDS-2662], a dependency function for cleaning pipelines.

Parameters:
  • X – Data Matrix (Recoded Matrix for categorical features)

  • imputedVec – column mean vector

Returns:

imputed dataset

systemds.operator.algorithm.imputeByMedian(X: Matrix, mask: Matrix)

Imputes the data by median value, and if the feature is categorical, by mode value. Related to [SYSTEMDS-2662], a dependency function for cleaning pipelines.

Parameters:
  • X – Data Matrix (Recoded Matrix for categorical features)

  • mask – A 0/1 row vector for identifying numeric (0) and categorical features (1)

Returns:

imputed dataset

systemds.operator.algorithm.imputeByMedianApply(X: Matrix, imputedVec: Matrix)

Imputes the data by median value, and if the feature is categorical, by mode value. Related to [SYSTEMDS-2662], a dependency function for cleaning pipelines.

Parameters:
  • X – Data Matrix (Recoded Matrix for categorical features)

  • imputedVec – column median vector

Returns:

imputed dataset

systemds.operator.algorithm.imputeByMode(X: Matrix)

This function imputes the data by mode value. Related to [SYSTEMDS-2902], a dependency function for cleaning pipelines.

Parameters:

X – Data Matrix (Recoded Matrix for categorical features)

Returns:

imputed dataset

systemds.operator.algorithm.imputeByModeApply(X: Matrix, imputedVec: Matrix)

Imputes the data by the most frequent value (recoded data only). Related to [SYSTEMDS-2662], a dependency function for cleaning pipelines.

Parameters:
  • X – Data Matrix (Recoded Matrix for categorical features)

  • imputedVec – column mode vector

Returns:

imputed dataset

systemds.operator.algorithm.intersect(X: Matrix, Y: Matrix)

Implements set intersection for numeric data

Parameters:
  • X – matrix X, set A

  • Y – matrix Y, set B

Returns:

intersection matrix, set of intersecting items
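
A minimal sketch on two small hand-made column vectors:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import intersect

X = np.array([[1.0], [2.0], [3.0], [4.0]])
Y = np.array([[3.0], [4.0], [5.0]])

with SystemDSContext() as sds:
  common = intersect(sds.from_numpy(X), sds.from_numpy(Y)).compute()
  print(common)  # expected: the values 3 and 4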

systemds.operator.algorithm.km(X: Matrix, TE: Matrix, GI: Matrix, SI: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function that implements the analysis of survival data with KAPLAN-MEIER estimates

Parameters:
  • X – Input matrix X containing the survival data: timestamps, whether event occurred (1) or data is censored (0), and a number of factors (categorical features) for grouping and/or stratifying

  • TE – Column indices of X which contain timestamps (first entry) and event information (second entry)

  • GI – Column indices of X corresponding to the factors to be used for grouping

  • SI – Column indices of X corresponding to the factors to be used for stratifying

  • alpha – Parameter to compute 100*(1-alpha)% confidence intervals for the survivor function and its median

  • err_type – Parameter to specify the error type: either “greenwood” (the default) or “peto”

  • conf_type – Parameter to modify the confidence interval; “plain” keeps the lower and upper bound of the confidence interval unmodified, “log” (the default) corresponds to logistic transformation and “log-log” corresponds to the complementary log-log transformation

  • test_type – If survival data for multiple groups is available, specifies which test to perform for comparing survival data across the groups: “none” (the default), “log-rank”, or “wilcoxon”

Returns:

Matrix KM whose dimension depends on the number of groups (denoted by g) and strata (denoted by s) in the data: each collection of 7 consecutive columns in KM corresponds to a unique combination of groups and strata in the data with the following schema:
  1. col: timestamp
  2. col: no. at risk
  3. col: no. of events
  4. col: Kaplan-Meier estimate of the survivor function surv
  5. col: standard error of surv
  6. col: lower 100*(1-alpha)% confidence interval for surv
  7. col: upper 100*(1-alpha)% confidence interval for surv

Returns:

Matrix M whose dimension depends on the number of groups (g) and strata (s) in the data (k denotes the number of factors used for grouping, i.e., ncol(GI), and l denotes the number of factors used for stratifying, i.e., ncol(SI)):
  M[,1:k]: unique combination of values in the k factors used for grouping
  M[,(k+1):(k+l)]: unique combination of values in the l factors used for stratifying
  M[,k+l+1]: total number of records
  M[,k+l+2]: total number of events
  M[,k+l+3]: median of surv
  M[,k+l+4]: lower 100*(1-alpha)% confidence interval of the median of surv
  M[,k+l+5]: upper 100*(1-alpha)% confidence interval of the median of surv
If the number of groups and strata is equal to 1, M will have 4 columns with
  M[,1]: total number of events
  M[,2]: median of surv
  M[,3]: lower 100*(1-alpha)% confidence interval of the median of surv
  M[,4]: upper 100*(1-alpha)% confidence interval of the median of surv

Returns:

If survival data from multiple groups is available and test_type=log-rank or wilcoxon, a 1 x 4 matrix T and a g x 5 matrix T_GROUPS_OE with
  T_GROUPS_OE[,1] = no. of events
  T_GROUPS_OE[,2] = observed value (O)
  T_GROUPS_OE[,3] = expected value (E)
  T_GROUPS_OE[,4] = (O-E)^2/E
  T_GROUPS_OE[,5] = (O-E)^2/V
  T[1,1] = no. of groups
  T[1,2] = degrees of freedom for the Chi-squared distributed test statistic
  T[1,3] = test statistic
  T[1,4] = P-value

systemds.operator.algorithm.kmeans(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function that implements the k-Means clustering algorithm

Parameters:
  • X – The input Matrix to do KMeans on.

  • k – Number of centroids

  • runs – Number of runs (with different initial centroids)

  • max_iter – Maximum number of iterations per run

  • eps – Tolerance (epsilon) for WCSS change ratio

  • is_verbose – Flag to print per-iteration stats

  • avg_sample_size_per_centroid – Average number of records per centroid in data samples

  • seed – The seed used for initial sampling. If set to -1 random seeds are selected.

Returns:

The mapping of records to centroids

Returns:

The output matrix with the centroids

systemds.operator.algorithm.kmeansPredict(X: Matrix, C: Matrix)

Builtin function that does predictions based on a set of centroids provided.

Parameters:
  • X – The input Matrix to do KMeans on.

  • C – The input Centroids to map X onto.

Returns:

The mapping of records to centroids
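
A minimal sketch of clustering and reassignment (random data; assuming the two kmeans outputs compute into a list in the documented order, mapping first and centroids second):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import kmeans, kmeansPredict

np.random.seed(0)
X = np.random.rand(100, 2)

with SystemDSContext() as sds:
  [mapping, centroids] = kmeans(sds.from_numpy(X), k=3, seed=42).compute()
  # map the same records onto the learned centroids
  labels = kmeansPredict(sds.from_numpy(X), sds.from_numpy(centroids)).compute()
  print(labels[:5])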

systemds.operator.algorithm.knn(Train: Matrix, Test: Matrix, CL: Matrix, START_SELECTED: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script implements the KNN (k-nearest neighbor) algorithm.

Parameters:
  • Train – The input matrix as features

  • Test – The input matrix for nearest neighbor search

  • CL – The input matrix as target

  • CL_T – The target type of matrix CL whether columns in CL are continuous ( =1 ) or categorical ( =2 ) or not specified ( =0 )

  • trans_continuous – Option flag for continuous feature transformed to [-1,1]: FALSE = do not transform continuous variable; TRUE = transform continuous variable;

  • k_value – k value for KNN, ignored if select_k is enabled

  • select_k – Use k selection algorithm to estimate k (TRUE means yes)

  • k_min – Minimum k value (available if select_k = 1)

  • k_max – Maximum k value (available if select_k = 1)

  • select_feature – Use feature selection algorithm to select feature (TRUE means yes)

  • feature_max – Max feature selection

  • interval – Interval value for k selection (available if select_k = 1)

  • feature_importance – Use feature importance algorithm to estimate each feature (TRUE means yes)

  • predict_con_tg – Continuous target predict function: mean(=0) or median(=1)

  • START_SELECTED – feature selection initial value

Returns:

Applied clusters to X

Returns:

Cluster matrix

Returns:

Feature importance value

systemds.operator.algorithm.knnGraph(X: Matrix, k: int)

Builtin for k nearest neighbor graph construction

Parameters:
  • X

  • k

Returns:

systemds.operator.algorithm.knnbf(X: Matrix, T: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script implements the KNN (k-nearest neighbor) algorithm.

Parameters:
  • X

  • T

  • k_value

Returns:

systemds.operator.algorithm.l2svm(X: Matrix, Y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This builtin function implements a binary-class Support Vector Machine (SVM) with squared slack variables (L2 regularization).

Parameters:
  • X – Feature matrix X (shape: m x n)

  • Y – Label vector y of class labels (shape: m x 1), assumed binary in -1/+1 or 1/2 encoding.

  • intercept – Indicator if a bias column should be added to X and the model

  • epsilon – Tolerance for early termination if the reduction of objective function is less than epsilon times the initial objective

  • reg – Regularization parameter (lambda) for L2 regularization

  • maxIterations – Maximum number of conjugate gradient (outer) iterations

  • maxii – Maximum number of line search (inner) iterations

  • verbose – Indicator if training details should be printed

  • columnId – An optional class ID used in verbose print output, e.g., when L2SVM is called from MSVM.

Returns:

Trained model/weights (shape: n x 1, w/ intercept: n+1)

systemds.operator.algorithm.l2svmPredict(X: Matrix, W: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function that applies a trained binary-class SVM model (squared slack variables) to classify feature vectors.

Parameters:
  • X – matrix X of feature vectors to classify

  • W – matrix of the trained variables

  • verbose – Set to true if one wants print statements.

Returns:

Raw classification scores, i.e., not yet thresholded into clean labels of 1’s and -1’s

Returns:

Classification labels, thresholded to ones and zeros.
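
A minimal train-and-predict sketch (random data with -1/+1 labels; passing the uncomputed model node into l2svmPredict relies on the API's lazy evaluation, and the two predict outputs are assumed to compute into a list in the documented order):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import l2svm, l2svmPredict

np.random.seed(0)
X = np.random.rand(100, 5)
y = np.random.choice([-1.0, 1.0], size=(100, 1))  # -1/+1 label encoding

with SystemDSContext() as sds:
  # the model node can be passed on lazily without computing it first
  model = l2svm(sds.from_numpy(X), sds.from_numpy(y))
  [raw, labels] = l2svmPredict(sds.from_numpy(X), model).compute()
  print(labels[:5])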

systemds.operator.algorithm.lasso(X: Matrix, y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function for the SpaRSA algorithm to perform lasso regression (SpaRSA: Sparse Reconstruction by Separable Approximation)

Parameters:
  • X – input feature matrix

  • y – matrix Y of response values (columns of the design matrix)

  • tol – target convergence tolerance

  • M – history length

  • tau – regularization component

  • maxi – maximum number of iterations until convergence

  • verbose – if the builtin should be verbose

Returns:

model matrix

systemds.operator.algorithm.lenetPredict(model: List, X: Matrix, C: int, Hin: int, Win: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This builtin function makes prediction given data and trained LeNet model

Parameters:
  • model – Trained LeNet model

  • X – Input data matrix, of shape (N, C*Hin*Win)

  • C – Number of input channels

  • Hin – Input height

  • Win – Input width

  • batch_size – Batch size

Returns:

Predicted values

systemds.operator.algorithm.lenetTrain(X: Matrix, Y: Matrix, X_val: Matrix, Y_val: Matrix, C: int, Hin: int, Win: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This builtin function trains a LeNet CNN. The architecture of the network is: conv1 -> relu1 -> pool1 -> conv2 -> relu2 -> pool2 -> affine3 -> relu3 -> affine4 -> softmax

Parameters:
  • X – Input data matrix, of shape (N, C*Hin*Win)

  • Y – Target matrix, of shape (N, K)

  • X_val – Validation data matrix, of shape (N, C*Hin*Win)

  • Y_val – Validation target matrix, of shape (N, K)

  • C – Number of input channels (dimensionality of input depth)

  • Hin – Input height

  • Win – Input width

  • batch_size – Batch size

  • epochs – Number of epochs

  • lr – Learning rate

  • mu – Momentum value

  • decay – Learning rate decay

  • reg – Regularization strength

  • seed – Seed for model initialization

  • verbose – Flag indicates if function should print to stdout

Returns:

Trained model which can be used in lenetPredict

systemds.operator.algorithm.lm(X: Matrix, y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The lm-function solves linear regression using either the direct solve method or the conjugate gradient algorithm depending on the input size of the matrices (See lmDS-function and lmCG-function respectively).

Parameters:
  • X – Matrix of feature vectors.

  • y – 1-column matrix of response values.

  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization; set to nonzero for highly dependent/sparse/numerous features

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

  • verbose – If TRUE print messages are activated

Returns:

The model fit

systemds.operator.algorithm.lmCG(X: Matrix, y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The lmCG function solves linear regression using the conjugate gradient algorithm

Parameters:
  • X – Matrix of feature vectors.

  • y – 1-column matrix of response values.

  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization; set to nonzero for highly dependent/sparse/numerous features

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

  • verbose – If TRUE print messages are activated

Returns:

The model fit

systemds.operator.algorithm.lmDS(X: Matrix, y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The lmDS function solves linear regression using the direct solve method

Parameters:
  • X – Matrix of feature vectors.

  • y – 1-column matrix of response values.

  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization; set to nonzero for highly dependent/sparse/numerous features

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

  • verbose – If TRUE print messages are activated

Returns:

The model fit

systemds.operator.algorithm.lmPredict(X: Matrix, B: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The lmPredict-function predicts the response values for a matrix of feature vectors, given a trained regression model.

Parameters:
  • X – Matrix of feature vectors

  • B – 1-column matrix of weights.

  • ytest – test labels, used only for verbose output. Can be set to matrix(0,1,1) if verbose output is not wanted

  • icpt – Intercept presence, shifting and rescaling the columns of X

  • verbose – If TRUE print messages are activated

Returns:

1-column matrix of predicted values

systemds.operator.algorithm.logSumExp(M: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Built-in LOGSUMEXP

Parameters:
  • M – matrix to perform Log sum exp on.

  • margin – if the logsumexp of rows is required, set margin = “row”; if the logsumexp of columns is required, set margin = “col”; if set to “none”, a single scalar is returned computing the logsumexp of the whole matrix

Returns:

a 1x1 matrix, row vector, or column vector, depending on the margin value
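
A minimal sketch comparing the row-wise result against a direct numpy computation (random data; the shape of the returned vector may differ from numpy's 1-D result):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import logSumExp

np.random.seed(0)
M = np.random.rand(4, 3)

with SystemDSContext() as sds:
  row_lse = logSumExp(sds.from_numpy(M), margin="row").compute()
  print(row_lse)
  # sanity check against a direct numpy computation
  print(np.log(np.exp(M).sum(axis=1)))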

systemds.operator.algorithm.matrixProfile(ts: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function that computes the MatrixProfile of a time series efficiently using the SCRIMP++ algorithm.

References:
Yan Zhu et al.. 2018.
  Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speeds.
  2018 IEEE International Conference on Data Mining (ICDM), 2018, pp. 837-846.
  DOI: 10.1109/ICDM.2018.00099.
  https://www.cs.ucr.edu/~eamonn/SCRIMP_ICDM_camera_ready_updated.pdf
Parameters:
  • ts – Time series to profile

  • window_size – Sliding window size

  • sample_percent – Degree of approximation between zero and one (1 computes the exact solution)

  • is_verbose – Print debug information

Returns:

The computed matrix profile

Returns:

Indices of least distances

systemds.operator.algorithm.mcc(predictions: Matrix, labels: Matrix)

Built-in function mcc: Matthews’ Correlation Coefficient for binary classification evaluation

Parameters:
  • predictions – Vector of predicted 0/1 values. (requires setting ‘labels’ parameter)

  • labels – Vector of 0/1 labels.

Returns:

Matthews’ Correlation Coefficient
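
A minimal sketch on a small, hand-made set of 0/1 predictions and labels:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import mcc

predictions = np.array([[1.0], [0.0], [1.0], [1.0], [0.0]])
labels = np.array([[1.0], [0.0], [0.0], [1.0], [0.0]])

with SystemDSContext() as sds:
  score = mcc(sds.from_numpy(predictions), sds.from_numpy(labels)).compute()
  print(score)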

systemds.operator.algorithm.mdedup(X: Frame, LHSfeatures: Matrix, LHSthreshold: Matrix, RHSfeatures: Matrix, RHSthreshold: Matrix, verbose: bool)

Implements builtin for deduplication using matching dependencies (e.g. Street 0.95, City 0.90 -> ZIP 1.0) and Jaccard distance.

Parameters:
  • X – Input Frame X

  • LHSfeatures – A matrix 1xd with numbers of columns for MDs (e.g. Street 0.95, City 0.90 -> ZIP 1.0)

  • LHSthreshold – A matrix 1xd with threshold values in interval [0, 1] for MDs

  • RHSfeatures – A matrix 1xd with numbers of columns for MDs

  • RHSthreshold – A matrix 1xd with threshold values in interval [0, 1] for MDs

  • verbose – To print the output

Returns:

Matrix nx1 of duplicates

systemds.operator.algorithm.mice(X: Matrix, cMask: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This Builtin function implements multiple imputation using Chained Equations (MICE)

Assumptions: missing values are represented as empty strings, i.e., “,,” in a CSV file; variables with suffix n store continuous/numeric data and variables with suffix c store categorical data

Parameters:
  • X – Data Matrix (Recoded Matrix for categorical features)

  • cMask – A 0/1 row vector for identifying numeric (0) and categorical features (1)

  • iter – Number of iteration for multiple imputations

  • threshold – confidence value [0, 1] for robust imputation, values will only be imputed if the predicted value has probability greater than threshold, only applicable for categorical data

  • verbose – Boolean value.

Returns:

imputed dataset

systemds.operator.algorithm.miceApply(X: Matrix, meta: Matrix, threshold: float, dM: Frame, betaList: List)

This Builtin function implements multiple imputation using Chained Equations (MICE)

Assumptions: missing values are represented as empty strings, i.e., “,,” in a CSV file; variables with suffix n store continuous/numeric data and variables with suffix c store categorical data

Parameters:
  • X – Data Matrix (Recoded Matrix for categorical features)

  • meta – A meta matrix where each row stores: 1) the mask of the original matrix, 2) information about columns with missing values in the original data (0 for no missing value in the column, 1 otherwise), 3) the distinct values of each column in the original data (1 for continuous columns, colMax for categorical)

  • threshold – confidence value [0, 1] for robust imputation, values will only be imputed if the predicted value has probability greater than threshold, only applicable for categorical data

  • dM – meta frame from OHE on original data

  • betaList – List of machine learning models trained for each column imputation

  • verbose – Boolean value.

Returns:

imputed dataset

systemds.operator.algorithm.msvm(X: Matrix, Y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This builtin function implements a multi-class Support Vector Machine (SVM) with squared slack variables. The trained model comprises #classes one-against-the-rest binary-class l2svm classification models.

Parameters:
  • X – Feature matrix X (shape: m x n)

  • Y – Label vector y of class labels (shape: m x 1), where max(Y) is assumed to be the number of classes

  • intercept – Indicator if a bias column should be added to X and the model

  • epsilon – Tolerance for early termination if the reduction of objective function is less than epsilon times the initial objective

  • reg – Regularization parameter (lambda) for L2 regularization

  • maxIterations – Maximum number of conjugate gradient (outer l2svm) iterations

  • verbose – Indicator if training details should be printed

Returns:

Trained model/weights (shape: n x max(Y), w/ intercept: n+1)

systemds.operator.algorithm.msvmPredict(X: Matrix, W: Matrix)

This script helps in applying a trained MSVM model

Parameters:
  • X – matrix X of feature vectors to classify

  • W – matrix of the trained variables

Returns:

Raw classification scores, i.e., not yet thresholded into clean labels of 1’s and -1’s

Returns:

Classification labels, thresholded to ones and zeros.

systemds.operator.algorithm.multiLogReg(X: Matrix, Y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Solves Multinomial Logistic Regression using Trust Region method. (See: Trust Region Newton Method for Logistic Regression, Lin, Weng and Keerthi, JMLR 9 (2008) 627-650) The largest label represents the baseline category; if label -1 or 0 is present, then it is the baseline label (and it is converted to the largest label).

Parameters:
  • X – Location to read the matrix of feature vectors

  • Y – Location to read the matrix with category labels

  • icpt – Intercept presence, shifting and rescaling X columns: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1

  • tol – tolerance (“epsilon”)

  • reg – regularization parameter (lambda = 1/C); intercept is not regularized

  • maxi – max. number of outer (Newton) iterations

  • maxii – max. number of inner (conjugate gradient) iterations, 0 = no max

  • verbose – flag specifying if logging information should be printed

Returns:

regression betas as output for prediction

systemds.operator.algorithm.multiLogRegPredict(X: Matrix, B: Matrix, Y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script applies the estimated parameters of a multinomial logistic regression to a new (test) dataset, producing a matrix M of predicted means/probabilities and some statistics in CSV format (see below).

Parameters:
  • X – Data Matrix X

  • B – Regression parameters betas

  • Y – Response vector Y

  • verbose – flag specifying if logging information should be printed

Returns:

Matrix M of predicted means/probabilities

Returns:

Predicted response vector

Returns:

scalar value of accuracy
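
A minimal end-to-end sketch (random data with labels 1/2; the three prediction outputs are assumed to compute into a list in the documented order):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import multiLogReg, multiLogRegPredict

np.random.seed(0)
X = np.random.rand(100, 10)
y = np.random.randint(1, 3, size=(100, 1)).astype(float)  # labels 1/2

with SystemDSContext() as sds:
  X_ds, y_ds = sds.from_numpy(X), sds.from_numpy(y)
  betas = multiLogReg(X_ds, y_ds)
  # three outputs: probabilities, predicted responses, accuracy
  [prob, y_pred, acc] = multiLogRegPredict(X_ds, betas, y_ds).compute()
  print(acc)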

systemds.operator.algorithm.na_locf(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function for imputing missing values using forward fill and backward fill techniques

Parameters:
  • X – Matrix X

  • option – String: “locf” (last observation carried forward) to do forward fill; “nocb” (next observation carried backward) to do backward fill

  • verbose – to print output on screen

Returns:

Matrix with no missing values
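
A minimal sketch, assuming missing values are encoded as NaN in the input matrix:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import na_locf

# a small matrix with missing entries (NaN assumed to encode missing)
X = np.array([[1.0, np.nan], [np.nan, 2.0], [3.0, np.nan]])

with SystemDSContext() as sds:
  filled = na_locf(sds.from_numpy(X), option="locf").compute()
  print(filled)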

systemds.operator.algorithm.naiveBayes(D: Matrix, C: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The naiveBayes-function computes the class conditional probabilities and class priors.

Parameters:
  • D – Data matrix of training instances, with N rows.

  • C – One dimensional column matrix with N rows.

  • laplace – Any Double value.

  • verbose – Boolean value.

Returns:

Class priors, a column matrix with one row per class.

Returns:

Class conditional probabilities, a matrix with one row per class.

systemds.operator.algorithm.naiveBayesPredict(X: Matrix, P: Matrix, C: Matrix)

The naiveBayesPredict-function predicts the scoring with a naive Bayes model.

Parameters:
  • X – Matrix of test data with N rows.

  • P – Class priors, a column matrix with one row per class.

  • C – Class conditional probabilities, a matrix with one row per class

Returns:

A matrix containing the raw class scores for each test instance.

Returns:

A matrix containing the predicted class labels.

systemds.operator.algorithm.normalize(X: Matrix)

Min-max normalization (a.k.a. min-max scaling) to range [0,1]. For matrices of positive values, this normalization preserves the input sparsity.

Parameters:

X – Input feature matrix of shape n-by-m

Returns:

Modified output feature matrix of shape n-by-m

Returns:

Column minima of shape 1-by-m

Returns:

Column maxima of shape 1-by-m

systemds.operator.algorithm.normalizeApply(X: Matrix, cmin: Matrix, cmax: Matrix)

Min-max normalization (a.k.a. min-max scaling) to range [0,1], given existing min-max ranges. For matrices of positive values, this normalization preserves the input sparsity. The validity of the provided min-max range and post-processing is under control of the caller.

Parameters:
  • X – Input feature matrix of shape n-by-m

  • cmin – Column minima of shape 1-by-m

  • cmax – Column maxima of shape 1-by-m

Returns:

Modified output feature matrix of shape n-by-m
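
A minimal sketch of the usual fit/apply pattern, assuming the three outputs of normalize compute into a list in the documented order (data, column minima, column maxima):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import normalize, normalizeApply

np.random.seed(0)
X_train = np.random.rand(80, 4) * 10
X_test = np.random.rand(20, 4) * 10

with SystemDSContext() as sds:
  [Xn, cmin, cmax] = normalize(sds.from_numpy(X_train)).compute()
  # re-apply the training ranges to unseen data
  Xt = normalizeApply(sds.from_numpy(X_test),
                      sds.from_numpy(cmin), sds.from_numpy(cmax)).compute()
  print(Xt.min(), Xt.max())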

systemds.operator.algorithm.outlier(X: Matrix, opposite: bool)

This outlier-function takes a matrix dataset as input and determines which point(s) deviate the most from the mean.

Parameters:
  • X – Matrix of Recoded dataset for outlier evaluation

  • opposite – (1)TRUE for evaluating outlier from upper quartile range, (0)FALSE for evaluating outlier from lower quartile range

Returns:

matrix indicating outlier values

systemds.operator.algorithm.outlierByArima(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Built-in function for detecting and repairing outliers in a time series, by training an ARIMA model and classifying values that are more than k standard deviations away from the predicted values as outliers.

Parameters:
  • X – Matrix X

  • k – threshold values 1, 2, 3 for 68%, 95%, 99.7% respectively (3-sigma rule)

  • repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers with zeros, 2 = replace outliers with missing values

  • p – non-seasonal AR order

  • d – non-seasonal differencing order

  • q – non-seasonal MA order

  • P – seasonal AR order

  • D – seasonal differencing order

  • Q – seasonal MA order

  • s – period in terms of number of time-steps

  • include_mean – If the mean should be included

  • solver – solver, is either “cg” or “jacobi”

Returns:

Matrix X with no outliers

systemds.operator.algorithm.outlierByIQR(X: Matrix, k: float, max_iterations: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function for detecting and repairing outliers using the interquartile range (IQR)

Parameters:
  • X – Matrix X

  • k – a constant used to discern outliers k*IQR

  • isIterative – iterative repair or single repair

  • repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers with zeros, 2 = replace outliers with missing values

  • max_iterations – values: 0 = iterate an arbitrary number of times until all outliers are removed, n = any constant defined by the user

  • verbose – flag specifying if logging information should be printed

Returns:

Matrix X with no outliers

systemds.operator.algorithm.outlierByIQRApply(X: Matrix, Q1: Matrix, Q3: Matrix, IQR: Matrix, k: float, repairMethod: int)

Builtin function for repairing outliers by IQR

Parameters:
  • X – Matrix X

  • Q1 – first quartile

  • Q3 – third quartile

  • IQR – Inter-quartile range

  • k – a constant used to discern outliers k*IQR

  • repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers with zeros, 2 = replace outliers with missing values

Returns:

Matrix X with no outliers

systemds.operator.algorithm.outlierBySd(X: Matrix, max_iterations: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function for detecting and repairing outliers using standard deviation

Parameters:
  • X – Matrix X

  • k – threshold values 1, 2, 3 for 68%, 95%, 99.7% respectively (3-sigma rule)

  • repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers with zeros, 2 = replace outliers with missing values

  • max_iterations – values: 0 = iterate an arbitrary number of times until all outliers are removed, n = any constant defined by the user

Returns:

Matrix X with no outliers

systemds.operator.algorithm.outlierBySdApply(X: Matrix, colMean: Matrix, colSD: Matrix, k: float, repairMethod: int)

Builtin function for repairing outliers by standard deviation

Parameters:
  • X – Matrix X

  • colMean – vector of column means

  • colSD – vector of column standard deviations

  • k – a constant used to discern outliers, k * standard deviations

  • isIterative – iterative repair or single repair

  • repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers with zeros, 2 = replace outliers with missing values

  • max_iterations – values: 0 = iterate an arbitrary number of times until all outliers are removed, n = any constant defined by the user

  • verbose – flag specifying if logging information should be printed

Returns:

Matrix X with no outliers

systemds.operator.algorithm.pca(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The function Principal Component Analysis (PCA) is used for dimensionality reduction

Parameters:
  • X – Input feature matrix

  • K – Number of reduced dimensions (i.e., columns)

  • Center – Indicates whether or not to center the feature matrix

  • Scale – Indicates whether or not to scale the feature matrix

Returns:

Output feature matrix with K columns

Returns:

Output dominant eigen vectors (can be used for projections)

Returns:

The column means of the input, subtracted to construct the PCA

Returns:

The Scaling of the values, to make each dimension same size.

systemds.operator.algorithm.pcaInverse(Y: Matrix, Clusters: Matrix, Centering: Matrix, ScaleFactor: Matrix)

Principal Component Analysis (PCA) for reconstructing an approximation of the original data. This method reconstructs an approximation of the original matrix and is useful for calculating how much information is lost by the PCA.

Parameters:
  • Y – Input features that have PCA applied to them

  • Clusters – The previously computed principal components

  • Centering – The column means of the PCA model, subtracted to construct the PCA

  • ScaleFactor – The scaling of each dimension in the PCA model

Returns:

Output feature matrix reconstructing and approximation of the original matrix

systemds.operator.algorithm.pcaTransform(X: Matrix, Clusters: Matrix, Centering: Matrix, ScaleFactor: Matrix)

Principal Component Analysis (PCA) for dimensionality-reduction prediction. This method is used to transform data which the PCA model was not trained on, e.g., to validate how good the PCA is, or to apply it in production.

Parameters:
  • X – Input feature matrix

  • Clusters – The previously computed principal components

  • Centering – The column means of the PCA model, subtracted to construct the PCA

  • ScaleFactor – The scaling of each dimension in the PCA model

Returns:

Output feature matrix dimensionally reduced by PCA
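
A minimal sketch of fitting PCA and projecting unseen data, assuming the four outputs of pca compute into a list in the documented order (reduced data, components, column means, scale factors):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import pca, pcaTransform

np.random.seed(0)
X_train = np.random.rand(100, 6)
X_new = np.random.rand(10, 6)

with SystemDSContext() as sds:
  [Xr, comps, centering, scale_f] = pca(sds.from_numpy(X_train), K=2).compute()
  # project unseen data with the trained model
  Xp = pcaTransform(sds.from_numpy(X_new), sds.from_numpy(comps),
                    sds.from_numpy(centering), sds.from_numpy(scale_f)).compute()
  print(Xp.shape)  # expected: (10, 2)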

systemds.operator.algorithm.pnmf(X: Matrix, rnk: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The pnmf-function implements Poisson Non-negative Matrix Factorization (PNMF). Matrix X is factorized into two non-negative matrices, W and H, based on a Poisson probabilistic assumption. This non-negativity makes the resulting matrices easier to inspect.

[Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, Yi-Min Wang: Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce. WWW 2010: 681-690]

Parameters:
  • X – Matrix of feature vectors.

  • rnk – Number of components into which matrix X is to be factored.

  • eps – Tolerance

  • maxi – Maximum number of conjugate gradient iterations.

  • verbose – If TRUE, ‘iter’ and ‘obj’ are printed.

Returns:

List of pattern matrices, one for each repetition.

Returns:

List of amplitude matrices, one for each repetition.

systemds.operator.algorithm.ppca(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script performs Probabilistic Principal Component Analysis (PPCA) on the given input data. It is based on the paper: sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms, Tarek Elgamal et al.

Parameters:
  • X – n x m input feature matrix

  • k – indicates dimension of the new vector space constructed from eigen vectors

  • maxi – maximum number of iterations until convergence

  • tolobj – objective function tolerance value to stop ppca algorithm

  • tolrecerr – reconstruction error tolerance value to stop the algorithm

  • verbose – verbose debug output

Returns:

Output feature matrix with K columns

Returns:

Output dominant eigen vectors (can be used for projections)

systemds.operator.algorithm.randomForest(X: Matrix, Y: Matrix, R: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script implements a classification random forest with both scale and categorical features.

Parameters:
  • X – Feature matrix X; note that X needs to be both recoded and dummy coded

  • Y – Label matrix Y; note that Y needs to be both recoded and dummy coded

  • R – Matrix which, for each feature in X, contains the following information: R[,1]: column ids; R[,2]: start indices; R[,3]: end indices. If R is not provided, by default all variables are assumed to be scale.

  • bins – Number of equiheight bins per scale feature to choose thresholds

  • depth – Maximum depth of the learned tree

  • num_leaf – Number of samples when splitting stops and a leaf node is added

  • num_samples – Number of samples at which point we switch to in-memory subtree building

  • num_trees – Number of trees to be learned in the random forest model

  • subsamp_rate – Parameter controlling the size of each tree in the forest; samples are selected from a Poisson distribution with parameter subsamp_rate (the default value is 1.0)

  • feature_subset – Parameter that controls the number of feature used as candidates for splitting at each tree node as a power of number of features in the dataset; by default square root of features (i.e., feature_subset = 0.5) are used at each tree node

  • impurity – Impurity measure: entropy or Gini (the default)

Returns:

Matrix M containing the learned trees, where each column corresponds to a node in a learned tree and each row contains the following information:
  M[1,j]: id of node j (in a complete binary tree)
  M[2,j]: tree id to which node j belongs
  M[3,j]: Offset (no. of columns) to the left child of j
  M[4,j]: Feature index of the feature that node j looks at if j is an internal node, otherwise 0
  M[5,j]: Type of the feature that node j looks at if j is an internal node: 1 for scale and 2 for categorical features; otherwise the label that leaf node j is supposed to predict
  M[6,j]: 1 if j is an internal node and the feature chosen for j is scale, otherwise the size of the subset of values stored in rows 7,8,… if j is categorical
  M[7:,j]: Only applicable for internal nodes. The threshold the example’s feature value is compared to is stored at M[7,j] if the feature chosen for j is scale; if the feature chosen for j is categorical, rows 7,8,… depict the value subset chosen for j

Returns:

Matrix C containing the number of times samples are chosen in each tree of the random forest

Returns:

Mappings from scale feature ids to global feature ids

Returns:

Mappings from categorical feature ids to global feature ids

systemds.operator.algorithm.scale(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function centers and scales the individual features in the input matrix (column-wise), using the z-score.

Parameters:
  • X – Input feature matrix

  • center – Indicates whether or not to center the feature matrix

  • scale – Indicates whether or not to scale the feature matrix

Returns:

Output feature matrix of the same shape as the input

Returns:

The column means of the input, subtracted if Center was TRUE

Returns:

The Scaling of the values, to make each dimension have similar value ranges

systemds.operator.algorithm.scaleApply(X: Matrix, Centering: Matrix, ScaleFactor: Matrix)

This function centers and scales the individual features in the input matrix (column-wise), using the provided centering and scaling matrices.

Parameters:
  • X – Input feature matrix

  • Centering – The column means to subtract from X (not done if empty)

  • ScaleFactor – The column scaling to multiply with X (not done if empty)

Returns:

Output feature matrix of the same shape as the input

systemds.operator.algorithm.scaleMinMax(X: Matrix)

This function performs min-max normalization (rescaling to [0,1]).

Parameters:

X – Input feature matrix

Returns:

Scaled output matrix

systemds.operator.algorithm.selectByVarThresh(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function drops features with variance <= thresh (by default dropping constant features).

Parameters:
  • X – Matrix of feature vectors.

  • thresh – The variance threshold at or below which features are dropped

Returns:

Matrix of feature vectors with the low-variance features removed.

systemds.operator.algorithm.setdiff(X: Matrix, Y: Matrix)

Builtin function that implements difference operation on vectors

Parameters:
  • X – input vector

  • Y – input vector

Returns:

vector with all elements that are present in X but not in Y

systemds.operator.algorithm.sherlock(X_train: Matrix, y_train: Matrix)

This function implements training phase of Sherlock: A Deep Learning Approach to Semantic Data Type Detection

[Hulsebos, Madelon, et al. “Sherlock: A deep learning approach to semantic data type detection.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.]

Split the feature matrix into four different feature categories and train neural networks on the respective single features. Then combine all trained features to train the final neural network.

Parameters:
  • X_train – matrix of feature vectors

  • y_train – matrix Y of class labels of semantic data type

Returns:

weights (parameters) matrices for character distributions

Returns:

biases vectors for character distributions

Returns:

weights (parameters) matrices for word embeddings

Returns:

biases vectors for word embeddings

Returns:

weights (parameters) matrices for paragraph vectors

Returns:

biases vectors for paragraph vectors

Returns:

weights (parameters) matrices for global statistics

Returns:

biases vectors for global statistics

Returns:

weights (parameters) matrices for combining all trained features (final)

Returns:

biases vectors for combining all trained features (final)

systemds.operator.algorithm.sherlockPredict(X: Matrix, cW1: Matrix, cb1: Matrix, cW2: Matrix, cb2: Matrix, cW3: Matrix, cb3: Matrix, wW1: Matrix, wb1: Matrix, wW2: Matrix, wb2: Matrix, wW3: Matrix, wb3: Matrix, pW1: Matrix, pb1: Matrix, pW2: Matrix, pb2: Matrix, pW3: Matrix, pb3: Matrix, sW1: Matrix, sb1: Matrix, sW2: Matrix, sb2: Matrix, sW3: Matrix, sb3: Matrix, fW1: Matrix, fb1: Matrix, fW2: Matrix, fb2: Matrix, fW3: Matrix, fb3: Matrix)

This function implements the prediction and evaluation phase of Sherlock, a deep learning approach to semantic data type detection: it splits the feature matrix into four different feature categories, predicts the class probability on the respective features, and then combines all predictions into the final predicted probabilities. [Hulsebos, Madelon, et al. “Sherlock: A deep learning approach to semantic data type detection.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.]

Parameters:
  • X – matrix of values which are to be classified

  • cW – weights (parameters) matrices for character distributions

  • cb – biases vectors for character distributions

  • wW – weights (parameters) matrices for word embeddings

  • wb – biases vectors for word embeddings

  • pW – weights (parameters) matrices for paragraph vectors

  • pb – biases vectors for paragraph vectors

  • sW – weights (parameters) matrices for global statistics

  • sb – biases vectors for global statistics

  • fW – weights (parameters) matrices for combining all trained features (final)

  • fb – biases vectors for combining all trained features (final)

Returns:

class probabilities of shape (N, K)

systemds.operator.algorithm.shortestPath(G: Matrix, sourceNode: int, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Computes the minimum distances (shortest-path) between a single source vertex and every other vertex in the graph.

Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser and Grzegorz Czajkowski: Pregel: A System for Large-Scale Graph Processing, SIGMOD 2010

Parameters:
  • G – adjacency matrix of the labeled graph: Such a graph can be directed (G is not symmetric) or undirected (G is symmetric). The values of G can be 0/1 (just specifying whether the nodes are connected or not) or integer values (representing the weight of the edges or the distances between nodes, 0 if not connected).

  • maxi – Integer max number of iterations accepted (0 for FALSE, i.e. max number of iterations not defined)

  • sourceNode – node index to calculate the shortest paths to all other nodes.

  • verbose – flag for verbose debug output

Returns:

Output matrix (double) of minimum distances (shortest-path) between vertices: The value of the ith row and the jth column of the output matrix is the minimum distance shortest-path from vertex i to vertex j. When the value of the minimum distance is infinity, the two nodes are not connected.

systemds.operator.algorithm.sigmoid(X: Matrix)

The sigmoid function is a type of activation function, also described as a squashing function, which limits the output to a range between 0 and 1; this makes it useful for predicting probabilities.

Parameters:

X – Matrix of feature vectors.

Returns:

Matrix of the same shape as X with the sigmoid applied element-wise.

systemds.operator.algorithm.slicefinder(X: Matrix, e: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This builtin function implements SliceLine, a linear-algebra-based ML model debugging technique for finding the top-k data slices where a trained model performs significantly worse than on the overall dataset. For a detailed description and experimental results, see: Svetlana Sagadeeva, Matthias Boehm: SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging (SIGMOD 2021).

Parameters:
  • X – Recoded dataset into Matrix

  • e – Trained model

  • k – Number of subsets required

  • maxL – maximum level L (conjunctions of L predicates), 0 unlimited

  • minSup – minimum support (min number of rows per slice)

  • alpha – weight [0,1]: 0 only size, 1 only error

  • tpEval – flag for task-parallel slice evaluation, otherwise data-parallel

  • tpBlksz – block size for task-parallel execution (num slices)

  • selFeat – flag for removing one-hot-encoded features that don’t satisfy the initial minimum-support constraint and/or have zero error

  • verbose – flag for verbose debug output

Returns:

top-k slices (k x ncol(X) if successful)

Returns:

score, size, error of slices (k x 3)

Returns:

debug matrix, populated with enumeration stats if verbose

systemds.operator.algorithm.smote(X: Matrix, mask: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

Builtin function for handling class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE), by Nitesh V. Chawla et al., Journal of Artificial Intelligence Research 16 (2002), 321–357

Parameters:
  • X – Matrix of minority class samples

  • mask – 0/1 mask vector where 0 represent numeric value and 1 represent categorical value

  • s – Amount of SMOTE (percentage of oversampling), integral multiple of 100

  • k – Number of nearest neighbors

  • verbose – if the algorithm should be verbose

Returns:

Matrix of (N/100)-1 * nrow(X) synthetic minority class samples

systemds.operator.algorithm.softmax(S: Matrix)

Performs softmax on the given input matrix.

Parameters:

S – Inputs of shape (N, D).

Returns:

Outputs of shape (N, D).

systemds.operator.algorithm.split(X: Matrix, Y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function splits the input data X and Y into contiguous or sampled train/test sets

Parameters:
  • X – Input feature matrix

  • Y – Input Labels

  • f – Train set fraction [0,1]

  • cont – contiguous splits, otherwise sampled

  • seed – The seed to randomly select rows in sampled mode

Returns:

Train split of feature matrix

Returns:

Test split of feature matrix

Returns:

Train split of label matrix

Returns:

Test split of label matrix
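
A minimal sketch, assuming the four outputs compute into a list in the documented order (train/test features, then train/test labels):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import split

np.random.seed(0)
X = np.random.rand(100, 5)
y = np.random.rand(100, 1)

with SystemDSContext() as sds:
  # sampled (non-contiguous) 80/20 split with a fixed seed
  [X_train, X_test, y_train, y_test] = split(
    sds.from_numpy(X), sds.from_numpy(y), f=0.8, cont=False, seed=7).compute()
  print(X_train.shape, X_test.shape)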

systemds.operator.algorithm.splitBalanced(X: Matrix, Y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function splits the input data X and Y into contiguous train/test sets with a balanced class ratio. Related to [SYSTEMDS-2902], a dependency function for cleaning pipelines.

Parameters:
  • X – Input feature matrix

  • Y – Input Labels

  • f – Train set fraction [0,1]

  • verbose – flag for verbose output

Returns:

Train split of feature matrix

Returns:

Test split of feature matrix

Returns:

Train split of label matrix

Returns:

Test split of label matrix

systemds.operator.algorithm.stableMarriage(P: Matrix, A: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This script computes a solution for the stable marriage problem.

result description:

If cell [i,j] is non-zero, it means that acceptor i has matched with proposer j. Further, if cell [i,j] is non-zero, it holds the preference value that led to the match.

Proposers.mtx:
2.0,1.0,3.0
1.0,2.0,3.0
1.0,3.0,2.0

Since ordered=TRUE, this means that proposer 1 (row 1) likes acceptor 2 the most, followed by acceptor 1 and acceptor 3. If ordered=FALSE, this would mean that proposer 1 (row 1) likes acceptor 3 the most (since the value at [1,3] is the row max), followed by acceptor 1 (2.0 preference value) and acceptor 2 (1.0 preference value).

Acceptors.mtx:
3.0,1.0,2.0
2.0,1.0,3.0
3.0,2.0,1.0

Since ordered=TRUE, this means that acceptor 1 (row 1) likes proposer 3 the most, followed by proposer 1 and proposer 2. If ordered=FALSE, this would mean that acceptor 1 (row 1) likes proposer 1 the most (since the value at [1,1] is the row max), followed by proposer 3 (2.0 preference value) and proposer 2 (1.0 preference value).

Output.mtx (assuming ordered=TRUE):
0.0,0.0,3.0
0.0,3.0,0.0
1.0,0.0,0.0

Acceptor 1 has matched with proposer 3 (since [1,3] is non-zero) at a preference level of 3.0. Acceptor 2 has matched with proposer 2 (since [2,2] is non-zero) at a preference level of 3.0. Acceptor 3 has matched with proposer 1 (since [3,1] is non-zero) at a preference level of 1.0.

Parameters:
  • P – proposer matrix P. It must be a square matrix with no zeros.

  • A – acceptor matrix A. It must be a square matrix with no zeros.

  • ordered – If true, P and A are assumed to be ordered, i.e., the leftmost value in a row is the most preferred partner’s index; if unordered, the leftmost value in a row of P is the preference value for the acceptor with index 1, and vice-versa (higher is better).

  • verbose – if the algorithm should print verbosely

Returns:

Result Matrix

systemds.operator.algorithm.statsNA(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The statsNA-function prints summary stats about the distribution of missing values in a univariate time series.

result matrix contains the following:
  1. Length of time series (including NAs)

  2. Number of Missing Values (NAs)

  3. Percentage of Missing Values (#2/#1)

  4. Number of Gaps (consisting of one or more consecutive NAs)

  5. Average Gap Size - Average size of consecutive NAs for the NA gaps

  6. Longest NA gap - Longest series of consecutive missing values

  7. Most frequent gap size - Most frequently occurring gap size

  8. Gap size accounting for most NAs

Parameters:
  • X – Numeric Vector (‘vector’) object containing NAs

  • bins – Split number for bin stats. Number of bins the time series gets divided into. For each bin information about amount/percentage of missing values is printed.

  • verbose – Print detailed information. For print_only = TRUE, the missing value stats are printed with more information (“Stats for Bins” and “overview NA series”).

Returns:

Column vector where each row correspond to described values

systemds.operator.algorithm.steplm(X: Matrix, y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The steplm-function (stepwise linear regression) implements a classical forward feature selection method. This method iteratively runs what-if scenarios and greedily selects the next best feature until the Akaike information criterion (AIC) does not improve anymore. Each configuration trains a regression model via lm, which in turn calls either the closed-form lmDS or the iterative lmCG.

return: Matrix of regression parameters (the betas); its size depends on the icpt input value:
        OUTPUT SIZE:    OUTPUT CONTENTS:                 HOW TO PREDICT Y FROM X AND B:
icpt=0: ncol(X)   x 1   Betas for X only                 Y ~ X %*% B[1:ncol(X), 1], or just X %*% B
icpt=1: ncol(X)+1 x 1   Betas for X and intercept        Y ~ X %*% B[1:ncol(X), 1] + B[ncol(X)+1, 1]
icpt=2: ncol(X)+1 x 2   Col.1: betas for X & intercept   Y ~ X %*% B[1:ncol(X), 1] + B[ncol(X)+1, 1]
                        Col.2: betas for shifted/rescaled X and intercept

In addition, the last run of linear regression provides some statistics in CSV format, one comma-separated name-value pair per line.

Parameters:
  • X – Matrix X of feature vectors

  • Y – 1-column matrix Y of response values

  • icpt – Intercept presence, shifting and rescaling the columns of X: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1

  • reg – Regularization constant (lambda) for the underlying lm calls

  • tol – Tolerance threshold; training continues until it is achieved

  • maxi – Maximum number of iterations; 0 means iterate until the tolerance is reached

  • verbose – If the algorithm should be verbose

Returns:

Matrix of regression parameters (the betas); its size depends on the icpt input value.

Returns:

Matrix of selected features ordered as computed by the algorithm.
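
A minimal sketch on random data; unpacking the two outputs (betas and selected features) from a single compute() call is an assumption about the multi-return API:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import steplm

np.random.seed(0)
X = np.random.rand(100, 10)
y = np.random.rand(100, 1)

with SystemDSContext() as sds:
  # icpt=1: add an intercept without shifting or rescaling X
  betas, selected = steplm(sds.from_numpy(X), sds.from_numpy(y), icpt=1).compute()
  print(betas)
  print(selected)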

systemds.operator.algorithm.stratstats(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The stratstats.dml script computes common bivariate statistics, such as correlation, slope, and their p-value, in parallel for many pairs of input variables in the presence of a confounding categorical variable.

Output contains (1st covariate, 2nd covariate) 40 columns with the following information:

Col 01: 1st covariate X-column number
Col 02: 1st covariate global presence count
Col 03: 1st covariate global mean
Col 04: 1st covariate global standard deviation
Col 05: 1st covariate stratified standard deviation
Col 06: R-squared, 1st covariate vs. strata
Col 07: adjusted R-squared, 1st covariate vs. strata
Col 08: P-value, 1st covariate vs. strata
Col 09-10: Reserved
Col 11: 2nd covariate Y-column number
Col 12: 2nd covariate global presence count
Col 13: 2nd covariate global mean
Col 14: 2nd covariate global standard deviation
Col 15: 2nd covariate stratified standard deviation
Col 16: R-squared, 2nd covariate vs. strata
Col 17: adjusted R-squared, 2nd covariate vs. strata
Col 18: P-value, 2nd covariate vs. strata
Col 19-20: Reserved
Col 21: Global 1st & 2nd covariate presence count
Col 22: Global regression slope (2nd vs. 1st covariate)
Col 23: Global regression slope standard deviation
Col 24: Global correlation = +/- sqrt(R-squared)
Col 25: Global residual standard deviation
Col 26: Global R-squared
Col 27: Global adjusted R-squared
Col 28: Global P-value for hypothesis “slope = 0”
Col 29-30: Reserved
Col 31: Stratified 1st & 2nd covariate presence count
Col 32: Stratified regression slope (2nd vs. 1st covariate)
Col 33: Stratified regression slope standard deviation
Col 34: Stratified correlation = +/- sqrt(R-squared)
Col 35: Stratified residual standard deviation
Col 36: Stratified R-squared
Col 37: Stratified adjusted R-squared
Col 38: Stratified P-value for hypothesis “slope = 0”
Col 39: Number of strata with at least two counted points
Col 40: Reserved

Parameters:
  • X – Matrix X that has all 1st covariates

  • Y – Matrix Y that has all 2nd covariates; the default value (empty) means “use X in place of Y”

  • S – Matrix S that has the stratum column; the default value (empty) means “use X in place of S”

  • Xcid – 1st covariate X-column indices; the default value (empty) means “use columns 1 : ncol(X)”

  • Ycid – 2nd covariate Y-column indices; the default value (empty) means “use columns 1 : ncol(Y)”

  • Scid – Column index of the stratum column in S

Returns:

Output matrix, one row per distinct pair
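
A sketch with the stratum stored as an extra column of X, so that the empty defaults for Y and S apply; that Scid accepts a plain integer column index is an assumption:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import stratstats

np.random.seed(0)
# Two covariate columns plus a stratum column (values 1 or 2) as column 3
data = np.hstack([np.random.rand(50, 2), np.random.randint(1, 3, (50, 1))])

with SystemDSContext() as sds:
  # Y and S default to X, so the stratum is read from column 3 of X
  out = stratstats(sds.from_numpy(data), Scid=3).compute()
  print(out.shape)  # one row per covariate pair, 40 columns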

systemds.operator.algorithm.symmetricDifference(X: Matrix, Y: Matrix)

Builtin function that implements the symmetric-difference set operation on vectors

Parameters:
  • X – input vector

  • Y – input vector

Returns:

vector with all elements that are in X or Y but not in both
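
A small sketch; with the inputs below the expected result contains the values 1 and 4, which appear in only one of the two vectors:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import symmetricDifference

X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[2.0], [3.0], [4.0]])

with SystemDSContext() as sds:
  diff = symmetricDifference(sds.from_numpy(X), sds.from_numpy(Y)).compute()
  print(diff)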

systemds.operator.algorithm.tSNE(X: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function performs dimensionality reduction using the t-SNE algorithm, based on the paper “Visualizing Data using t-SNE” by van der Maaten et al.

Parameters:
  • X – Data Matrix of shape (number of data points, input dimensionality)

  • reduced_dims – Output dimensionality

  • perplexity – Perplexity Parameter

  • lr – Learning rate

  • momentum – Momentum Parameter

  • max_iter – Number of iterations

  • seed – The seed used for initial values. If set to -1 random seeds are selected.

  • is_verbose – Print debug information

Returns:

Data Matrix of shape (number of data points, reduced_dims)
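
A minimal sketch that reduces 10-dimensional points to 2 dimensions; the keyword values are illustrative, not tuned settings:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import tSNE

np.random.seed(0)
X = np.random.rand(100, 10)

with SystemDSContext() as sds:
  embedding = tSNE(sds.from_numpy(X), reduced_dims=2, perplexity=30.0, seed=42).compute()
  print(embedding.shape)  # (100, 2)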

systemds.operator.algorithm.toOneHot(X: Matrix, numClasses: int)

The toOneHot-function encodes an unordered categorical vector into multiple binary vectors (one-hot encoding).

Parameters:
  • X – Vector with N integer entries between 1 and numClasses

  • numClasses – Number of columns; must be greater than or equal to the largest value in X

Returns:

One-hot-encoded matrix with shape (N, numClasses)
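
A small sketch encoding four labels from the range 1..3; numClasses is passed positionally as in the signature:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import toOneHot

# Categorical labels with integer values between 1 and 3
labels = np.array([[1.0], [3.0], [2.0], [3.0]])

with SystemDSContext() as sds:
  one_hot = toOneHot(sds.from_numpy(labels), 3).compute()
  print(one_hot)  # shape (4, 3), one binary column per class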

systemds.operator.algorithm.tomekLink(X: Matrix, y: Matrix)

The tomekLink-function performs undersampling by removing Tomek links for imbalanced multi-class problems. It computes Tomek links and drops them from the data matrix and label vector, dropping only the majority label and the corresponding point of each Tomek link.

Parameters:
  • X – Data Matrix (nxm)

  • y – Label Matrix (nx1), greater than zero

Returns:

Data Matrix without Tomek links

Returns:

Labels corresponding to under sampled data

Returns:

Indices of dropped rows/labels with respect to the input
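
A sketch on an imbalanced toy set; unpacking the three outputs from one compute() call is an assumption about the multi-return API:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import tomekLink

np.random.seed(0)
X = np.random.rand(40, 3)
y = np.vstack([np.ones((30, 1)), 2 * np.ones((10, 1))])  # imbalanced labels > 0

with SystemDSContext() as sds:
  # Cleaned data, cleaned labels, and indices of the dropped rows
  X_clean, y_clean, dropped = tomekLink(sds.from_numpy(X), sds.from_numpy(y)).compute()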

systemds.operator.algorithm.topk_cleaning(dataTrain: Frame, primitives: Frame, parameters: Frame, evaluationFunc: str, evalFunHp: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

This function cleans the top-K items (where K is given as input) for a given list of users. metaData[3, ncol(X)]: metaData[1] stores the mask, metaData[2] stores the schema, metaData[3] stores the FD mask.

systemds.operator.algorithm.underSampling(X: Matrix, Y: Matrix, ratio: float)

Builtin to perform random undersampling on data.

Parameters:
  • X – X data to sample from

  • Y – Y data to sample from; the same rows are sampled from X

  • ratio – The ratio of rows to sample

Returns:

The under-sampled data X

Returns:

The under-sampled data Y
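
A minimal sketch; the ratio is passed positionally as in the signature, and unpacking both outputs from one compute() call is an assumption:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import underSampling

np.random.seed(0)
X = np.random.rand(20, 4)
Y = np.random.randint(1, 3, (20, 1)).astype(float)

with SystemDSContext() as sds:
  # Sample half of the rows; the same rows are kept in X and Y
  X_s, Y_s = underSampling(sds.from_numpy(X), sds.from_numpy(Y), 0.5).compute()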

systemds.operator.algorithm.union(X: Matrix, Y: Matrix)

Builtin function that implements the union operation on vectors

Parameters:
  • X – input vector

  • Y – input vector

Returns:

matrix with all unique rows from X and Y combined
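
A small sketch; with the inputs below the expected result is the unique values 1, 2, 3 and 4:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import union

X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[3.0], [4.0]])

with SystemDSContext() as sds:
  u = union(sds.from_numpy(X), sds.from_numpy(Y)).compute()
  print(u)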

systemds.operator.algorithm.univar(X: Matrix, types: Matrix)

Computes univariate statistics for all attributes in a given data set

Parameters:
  • X – Input matrix of the shape (N, D)

  • types – Matrix of the shape (1, D) with feature types: 1 for scale, 2 for nominal, 3 for ordinal

Returns:

univariate statistics for all attributes
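
A minimal sketch that treats all three columns as scale features (type 1):

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import univar

np.random.seed(0)
X = np.random.rand(50, 3)
types = np.array([[1.0, 1.0, 1.0]])  # 1 = scale, 2 = nominal, 3 = ordinal

with SystemDSContext() as sds:
  stats = univar(sds.from_numpy(X), sds.from_numpy(types)).compute()
  print(stats)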

systemds.operator.algorithm.vectorToCsv(mask: Matrix)

This builtin function converts a vector into a CSV string of its non-zero indexes, e.g. [1 0 0 1 1 0 1] = “1,4,5,7”. Related to [SYSTEMDS-2662]; a dependency function for cleaning pipelines.

Parameters:

mask – Data vector (having 0 for excluded indexes)

Returns:

Indexes of non-zero entries, as a comma-separated string
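
A sketch using the mask from the example above; that the resulting string can be fetched directly via compute() is an assumption:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import vectorToCsv

mask = np.array([[1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]])

with SystemDSContext() as sds:
  # Expected: "1,4,5,7" (the indexes of the non-zero entries)
  csv = vectorToCsv(sds.from_numpy(mask)).compute()
  print(csv)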

systemds.operator.algorithm.winsorize(X: Matrix, verbose: bool, **kwargs: Dict[str, DAGNode | str | int | float | bool])

The winsorize-function removes outliers from the data. It does so by computing the upper and lower quantile bounds of the given data and then replacing any value that falls outside this range (less than the lower bound or greater than the upper bound).

Parameters:
  • X – Input feature matrix

  • verbose – To print output on screen

Returns:

Matrix without outlier values
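
A minimal sketch with one injected outlier; verbose is the required second argument from the signature:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import winsorize

np.random.seed(0)
X = np.random.rand(100, 2)
X[0, 0] = 100.0  # inject an outlier

with SystemDSContext() as sds:
  # Values outside the quantile range are replaced by the bound values
  X_w = winsorize(sds.from_numpy(X), False).compute()
  print(X_w[0, 0])  # should no longer be 100.0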

systemds.operator.algorithm.winsorizeApply(X: Matrix, qLower: Matrix, qUpper: Matrix)

winsorizeApply takes the upper and lower quantile values per column and removes outliers by replacing them with these upper and lower bound values.

Parameters:
  • X – Input feature matrix

  • qLower – row vector of lower bounds per column

  • qUpper – row vector of upper bounds per column

Returns:

Matrix without outlier values
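
A sketch where the per-column bounds are derived with numpy quantiles on training data; the 5%/95% cut-offs are an illustrative choice:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import winsorizeApply

np.random.seed(0)
X_train = np.random.rand(100, 2)
X_new = np.random.rand(10, 2)
# Per-column lower/upper bounds, here the 5% and 95% quantiles of the training data
qLower = np.quantile(X_train, 0.05, axis=0).reshape(1, -1)
qUpper = np.quantile(X_train, 0.95, axis=0).reshape(1, -1)

with SystemDSContext() as sds:
  X_capped = winsorizeApply(sds.from_numpy(X_new), sds.from_numpy(qLower),
                            sds.from_numpy(qUpper)).compute()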

systemds.operator.algorithm.xdummy1(X: Matrix)

This builtin function is here for debugging purposes

Parameters:

X – test input

Returns:

test result

systemds.operator.algorithm.xdummy2(X: Matrix)

This builtin function is here for debugging purposes

Parameters:

X – Debug input

Returns:

Debug output

Returns:

Debug output

systemds.operator.algorithm.xgboost(X: Matrix, y: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses gradient boosting. This xgboost implementation supports classification and regression and is capable of working with categorical and scalar features.

Output explained: the first node is the init prediction, and each row contains the following information:

M[1,j]: id of node j (in a complete binary tree)
M[2,j]: tree id to which node j belongs
M[3,j]: Offset (no. of columns) to the left child of j if j is an internal node, otherwise 0
M[4,j]: Feature index of the feature that node j looks at if j is an internal node (scale feature id if the feature is scale, or categorical feature id if the feature is categorical), otherwise 0
M[5,j]: Type of the feature that node j looks at if j is an internal node: 0 = leaf, 1 = scalar, 2 = categorical
M[6:,j]: If j is an internal node: the threshold the example’s feature value is compared to is stored at M[6,j] if the feature chosen for j is scale; otherwise, if the feature chosen for j is categorical, rows 6,7,… depict the value subset chosen for j. If j is a leaf node: 1 if j is impure and the number of samples at j > threshold, otherwise 0

Parameters:
  • X – Feature matrix X; note that X needs to be both recoded and dummy coded

  • y – Label matrix y; note that y needs to be both recoded and dummy coded

  • R – Matrix R; a 1 x n vector which, for each feature in X, contains its type: 1 (scalar feature) or 2 (categorical feature). For example, R = [1, 2] means feature 1 is scalar and feature 2 is categorical. If R is not provided, all variables are assumed to be scale (1) by default

  • sml_type – Supervised machine learning type: 1 = Regression (default), 2 = Classification

  • num_trees – Number of trees to be created in the xgboost model

  • learning_rate – Alias: eta. After each boosting step, the learning rate controls the weights of the new predictions

  • max_depth – Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit

  • lambda – L2 regularization term on weights. Increasing this value will make the model more conservative and reduce the number of leaves of a tree

Returns:

Matrix M where each column corresponds to a node in the learned tree
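
A minimal training sketch on random scale features (so the default R applies); the keyword values are illustrative:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import xgboost

np.random.seed(0)
X = np.random.rand(100, 5)
y = np.random.rand(100, 1)

with SystemDSContext() as sds:
  # Train a regression model (sml_type=1) with 10 trees
  M = xgboost(sds.from_numpy(X), sds.from_numpy(y),
              sml_type=1, num_trees=10).compute()
  print(M.shape)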

systemds.operator.algorithm.xgboostPredictClassification(X: Matrix, M: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses gradient boosting. This xgboost implementation supports classification and is capable of working with categorical features.

Parameters:
  • X – Matrix of feature vectors we want to predict (X_test)

  • M – The model created at xgboost

  • learning_rate – The learning rate used in the model

Returns:

The predictions of the samples using the given xgboost model. (y_prediction)

systemds.operator.algorithm.xgboostPredictRegression(X: Matrix, M: Matrix, **kwargs: Dict[str, DAGNode | str | int | float | bool])

XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses gradient boosting. This xgboost implementation supports regression.

Parameters:
  • X – Matrix of feature vectors we want to predict (X_test)

  • M – The model created at xgboost

  • learning_rate – The learning rate used in the model

Returns:

The predictions of the samples using the given xgboost model. (y_prediction)
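
A sketch combining training and regression prediction; the classification variant above follows the same pattern. Reusing the un-computed model node M inside the same context is an assumption about the lazy API:

import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import xgboost, xgboostPredictRegression

np.random.seed(0)
X_train = np.random.rand(100, 5)
y_train = np.random.rand(100, 1)
X_test = np.random.rand(10, 5)

with SystemDSContext() as sds:
  # Train the model, then feed the (lazy) model node into the predictor
  M = xgboost(sds.from_numpy(X_train), sds.from_numpy(y_train), sml_type=1)
  y_pred = xgboostPredictRegression(sds.from_numpy(X_test), M).compute()
  print(y_pred)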