Algorithms

SystemDS supports a range of machine learning algorithms out of the box.

As an example, the lm algorithm can be used as follows:

# Import numpy and SystemDS
import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import lm

# Set a seed
np.random.seed(0)
# Generate matrix of feature vectors
features = np.random.rand(10, 15)
# Generate a 1-column matrix of response values
y = np.random.rand(10, 1)

# Compute the weights
with SystemDSContext() as sds:
    weights = lm(sds.from_numpy(features), sds.from_numpy(y)).compute()
    print(weights)

The output should be similar to:

[[-0.11538199]
 [-0.20386541]
 [-0.39956035]
 [ 1.04078623]
 [ 0.4327084 ]
 [ 0.18954599]
 [ 0.49858968]
 [-0.26812763]
 [ 0.09961844]
 [-0.57000751]
 [-0.43386048]
 [ 0.55358873]
 [-0.54638565]
 [ 0.2205885 ]
 [ 0.37957689]]
systemds.operator.algorithm.als(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Location to read the input matrix X to be factorized

  • rank – Rank of the factorization

  • reg – Regularization:

  • lambda – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating U and V once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing x n matrix v

systemds.operator.algorithm.alsCG(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Location to read the input matrix X to be factorized

  • rank – Rank of the factorization

  • reg – Regularization:

  • lambda – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating U and V once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing x n matrix v

systemds.operator.algorithm.alsDS(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • V – Location to read the input matrix V to be factorized

  • L – Location to write the factor matrix L

  • R – Location to write the factor matrix R

  • rank – Rank of the factorization

  • lambda – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating L and R once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing x n matrix r

systemds.operator.algorithm.alsTopkPredict(userIDs: systemds.operator.nodes.matrix.Matrix, I: systemds.operator.nodes.matrix.Matrix, L: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • userIDs – Column vector of user-ids (n x 1)

  • I – Indicator matrix user-id x user-id to exclude from scoring

  • L – The factor matrix L: user-id x feature-id

  • R – The factor matrix R: feature-id x item-id

  • K – The number of top-K items

Returns

‘OperationNode’ containing users (rows) & a matrix containing the top-k predicted ratings for the specified users (rows)

systemds.operator.algorithm.arima(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – The input Matrix to apply Arima on.

  • max_func_invoc

    ?

  • p – non-seasonal AR order

  • d – non-seasonal differencing order

  • q – non-seasonal MA order

  • P – seasonal AR order

  • D – seasonal differencing order

  • Q – seasonal MA order

  • s – period in terms of number of time-steps

  • include_mean – center to mean 0, and include in result

  • solver – solver, is either “cg” or “jacobi”

Returns

‘OperationNode’ containing the calculated coefficients

systemds.operator.algorithm.bivar(X: systemds.operator.nodes.matrix.Matrix, S1: systemds.operator.nodes.matrix.Matrix, S2: systemds.operator.nodes.matrix.Matrix, T1: systemds.operator.nodes.matrix.Matrix, T2: systemds.operator.nodes.matrix.Matrix, verbose: bool)
Parameters

verbose – Print bivar stats

Returns

‘OperationNode’ containing

systemds.operator.algorithm.confusionMatrix(P: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix)
Parameters
  • P – vector of Predictions

  • Y – vector of gold-standard labels, one-hot encoded (the one-hot encoded vector of actual labels)

Returns

‘OperationNode’ containing the confusion matrix sums of classifications & the confusion matrix averages of each true class

systemds.operator.algorithm.cox(X: systemds.operator.nodes.matrix.Matrix, TE: systemds.operator.nodes.matrix.Matrix, F: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Location to read the input matrix X containing the survival data

  • containing – information

  • TE – Column indices of X as a column vector which contain timestamp

  • F – Column indices of X as a column vector which are to be used for fitting the Cox model

  • R – If factors (categorical variables) are available in the input matrix X, […] each factor which needs to be removed from X; in this case the start and end indices corresponding to the baseline level need to be the same; if R is not provided, all variables are considered to be continuous by default

  • alpha – Parameter to compute a 100*(1-alpha)% confidence interval for the betas

  • tol – Tolerance (“epsilon”)

  • moi – Max. number of outer (Newton) iterations

  • mii – Max. number of inner (conjugate gradient) iterations, 0 = no max

Returns

‘OperationNode’ containing a summary of some statistics of the fitted model: & matrix rt that contains the order-preserving recoded timestamps from x & which is matrix x with sorted timestamps & matrix mf that contains the column indices of x with the baseline factors removed (if available)

systemds.operator.algorithm.decisionTree(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, verbose: bool, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • R – […] a scalar feature vector, other positive integers indicate the number of categories; if R is not provided, all variables are assumed to be scale by default

  • bins – Number of equiheight bins per scale feature to choose thresholds

  • depth – Maximum depth of the learned tree

  • verbose – boolean specifying if the algorithm should print information while executing

Returns

‘OperationNode’ containing looks at if j is an internal node, otherwise 0 & 6,7,… if j is categorical & a leaf node: number of misclassified samples reaching at node j & feature chosen for j is categorical rows 6,7,… depict the value subset chosen for j & a leaf node 1 if j is impure and the number of samples at j > threshold, otherwise 0

systemds.operator.algorithm.deepWalk(Graph: systemds.operator.nodes.matrix.Matrix, w: int, d: int, gamma: int, t: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • Graph – adjacency matrix of a graph (n x n)

  • w – window size

  • d – embedding size

  • gamma – walks per vertex

  • t – walk length

  • alpha – learning rate

  • beta – factor for decreasing learning rate

Returns

‘OperationNode’ containing matrix of vertex/word representation (n x d)
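The gamma and t parameters describe the random-walk stage that DeepWalk runs before learning embeddings. The following is a minimal plain-Python sketch of that stage only, not the SystemDS implementation; the function name `random_walks`, the `seed` argument, and the 0-based vertex numbering are assumptions made for illustration.

```python
import random


def random_walks(graph, gamma, t, seed=0):
    """Generate gamma uniform random walks of length t from every vertex.

    graph is an n x n adjacency matrix given as a list of lists
    (non-zero entry = edge). Vertices are numbered from 0 in this sketch.
    """
    rng = random.Random(seed)
    n = len(graph)
    walks = []
    for _ in range(gamma):
        for start in range(n):
            walk = [start]
            while len(walk) < t:
                # Neighbours of the vertex the walk is currently at.
                neighbours = [j for j, w in enumerate(graph[walk[-1]]) if w != 0]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# A 4-cycle: 0-1-2-3-0
cycle = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
walks = random_walks(cycle, gamma=2, t=5)
print(len(walks))  # 8 walks: 2 per vertex
```

Each walk is then treated like a sentence, and the window size w controls which vertices count as context for one another when the d-dimensional embeddings are trained.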

systemds.operator.algorithm.executePipeline(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Returns

‘OperationNode’ containing encoding of categorical features & features & ohe call, to call inside eval as a function & to call inside eval as a function & doing relative over-sampling & count & replace the null with default values & version of pca

systemds.operator.algorithm.ffTrain(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, out_activation: str, loss_fcn: str, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • batch_size – Batch size

  • epochs – Number of epochs

  • learning_rate – Learning rate

  • out_activation – User specified output activation function. Possible values:

  • loss_fcn – User specified loss function. Possible values:

  • shuffle – Flag which indicates if dataset should be shuffled or not

  • validation_split – Fraction of training set used as validation set

  • seed – Seed for model initialization

  • verbose – Flag which indicates if function should print to stdout

  • Supported – by the model

  • Supported – by the model

Returns

‘OperationNode’ containing

systemds.operator.algorithm.garch(X: systemds.operator.nodes.matrix.Matrix, kmax: int, momentum: float, start_stepsize: float, end_stepsize: float, start_vicinity: float, end_vicinity: float, sim_seed: int, verbose: bool)
Parameters
  • X – The input matrix to apply GARCH on.

  • kmax – Number of iterations

  • momentum – Momentum for momentum-gradient descent (set to 0 to deactivate)

  • start_stepsize – Initial gradient-descent stepsize

  • end_stepsize – gradient-descent stepsize at end (linear descent)

  • start_vicinity – proportion of randomness of restart-location for gradient descent at beginning

  • end_vicinity – same at end (linear decay)

  • sim_seed – seed for simulation of process on fitted coefficients

  • verbose – verbosity, comments during fitting

Returns

‘OperationNode’ containing simulated garch(1,1) process on fitted coefficients & variances of simulated fitted process & constant term of fitted process & 1-st arch-coefficient of fitted process & 1-st garch-coefficient of fitted process & drawbacks: slow convergence of optimization (sort of simulated annealing/gradient descent)

systemds.operator.algorithm.gaussianClassifier(D: systemds.operator.nodes.matrix.Matrix, C: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • varSmoothing – Smoothing factor for variances

  • verbose – Print accuracy of the training set

Returns

‘OperationNode’ containing

systemds.operator.algorithm.gmmPredict(X: systemds.operator.nodes.matrix.Matrix, weight: systemds.operator.nodes.matrix.Matrix, mu: systemds.operator.nodes.matrix.Matrix, precisions_cholesky: systemds.operator.nodes.matrix.Matrix, model: str)
Parameters
  • X – Matrix X (instances to be clustered)

  • weight – Weight of learned model

  • mu – fitted clusters mean

  • precisions_cholesky – fitted precision matrix for each mixture

  • model – fitted model

Returns

‘OperationNode’ containing predicted cluster labels & probabilities of belongingness & for new instances given the variance and mean of fitted data

systemds.operator.algorithm.hospitalResidencyMatch(R: systemds.operator.nodes.matrix.Matrix, H: systemds.operator.nodes.matrix.Matrix, capacity: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • R – Residents matrix R. It must be an ORDERED matrix.

  • H – Hospitals matrix H. It must be an UNORDERED matrix.

  • capacity – Capacity of hospitals matrix C. It must be a [n*1] matrix with non-zero values.

  • with – and vice-versa (higher is better).

Returns

‘OperationNode’ containing result matrix & result matrix & an ordered matrix, this means that resident 1 (row 1) likes hospital 2 the most, followed by hospital 1 and hospital 3. & unordered, this would mean that resident 1 (row 1) likes hospital 3 the most (since the value at [1,3] is the row max), & 1 (2.0 preference value) and hospital 2 (1.0 preference value). & an unordered matrix this means that hospital 1 (row 1) likes resident 1 the most (since the value at [1,1] is the row max). & matched with hospital 3 (since [1,3] is non-zero) at a preference level of 2.0. & matched with hospital 1 (since [2,1] is non-zero) at a preference level of 1.0. & matched with hospital 2 (since [3,2] is non-zero) at a preference level of 2.0.

systemds.operator.algorithm.img_cutout(img_in: systemds.operator.nodes.matrix.Matrix, x: int, y: int, width: int, height: int, fill_value: float)
Parameters
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • x – Column index of the top left corner of the rectangle (starting at 1)

  • y – Row index of the top left corner of the rectangle (starting at 1)

  • width – Width of the rectangle (must be positive)

  • height – Height of the rectangle (must be positive)

  • fill_value – The value to set for the rectangle

Returns

‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]
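The 1-based indexing of x and y is easy to get wrong, so here is a small plain-Python sketch of the cutout semantics described above. It is not the SystemDS implementation; in particular, clipping a rectangle that extends past the image border is an assumption of this sketch.

```python
def img_cutout(img, x, y, width, height, fill_value):
    """Return a copy of img with a rectangle set to fill_value.

    x is the 1-based column and y the 1-based row of the rectangle's
    top-left corner. Parts of the rectangle outside the image are
    ignored (an assumption of this sketch).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]  # leave the input untouched
    # Convert the 1-based corner to 0-based indices and clip to the image.
    r0, c0 = max(y - 1, 0), max(x - 1, 0)
    r1, c1 = min(y - 1 + height, rows), min(x - 1 + width, cols)
    for r in range(r0, r1):
        for c in range(c0, c1):
            out[r][c] = fill_value
    return out


image = [[1, 1, 1, 1] for _ in range(3)]
cut = img_cutout(image, x=2, y=1, width=2, height=2, fill_value=0)
print(cut)  # [[1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
```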

systemds.operator.algorithm.img_invert(img_in: systemds.operator.nodes.matrix.Matrix, max_value: float)
Parameters
  • img_in – Input image

  • max_value – The maximum value pixels can have

Returns

‘OperationNode’ containing
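As an illustration of what inversion relative to max_value means, the sketch below subtracts every pixel from max_value. This is a plain-Python sketch under that assumption, not the SystemDS implementation.

```python
def img_invert(img, max_value):
    """Invert an image: each pixel v becomes max_value - v."""
    return [[max_value - v for v in row] for row in img]


inverted = img_invert([[0, 64], [128, 255]], max_value=255)
print(inverted)  # [[255, 191], [127, 0]]
```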

systemds.operator.algorithm.img_posterize(img_in: systemds.operator.nodes.matrix.Matrix, bits: int)
Parameters
  • img_in – Input image

  • bits – The number of bits to keep for the values. 1 means black and white, 8 means every integer between 0 and 255.

Returns

‘OperationNode’ containing
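To make the effect of bits concrete, the following plain-Python sketch keeps only the most significant bits of each pixel. It assumes integer pixel values in the range 0 to 255 and is not the SystemDS implementation.

```python
def img_posterize(img, bits):
    """Keep only the `bits` most significant bits of 8-bit pixel values."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    step = 2 ** (8 - bits)  # size of each remaining intensity bucket
    # Integer division drops the low bits; multiplying restores the scale.
    return [[(int(v) // step) * step for v in row] for row in img]


row = [[0, 127, 128, 255]]
print(img_posterize(row, 1))  # [[0, 0, 128, 128]]
print(img_posterize(row, 2))  # [[0, 64, 128, 192]]
print(img_posterize(row, 8))  # [[0, 127, 128, 255]]
```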

systemds.operator.algorithm.img_rotate(img_in: systemds.operator.nodes.matrix.Matrix, radians: float, fill_value: float)
Parameters
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • radians – The angle by which to rotate, in radians.

  • fill_value – The background color revealed by the rotation

Returns

‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]

systemds.operator.algorithm.img_sample_pairing(img_in1: systemds.operator.nodes.matrix.Matrix, img_in2: systemds.operator.nodes.matrix.Matrix, weight: float)
Parameters
  • img_in1 – First input image

  • img_in2 – Second input image

  • weight – The weight given to the second image. 0 means only img_in1, 1 means only img_in2 will be visible

Returns

‘OperationNode’ containing
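The weight parameter can be read as a blending factor. The sketch below assumes a linear blend of two images of the same size, which matches the behaviour described for weights 0 and 1; it is a plain-Python illustration, not the SystemDS implementation.

```python
def img_sample_pairing(img1, img2, weight):
    """Blend two equally sized images: (1 - weight) * img1 + weight * img2."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [
        [(1.0 - weight) * a + weight * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]


first = [[0.0, 100.0]]
second = [[100.0, 0.0]]
print(img_sample_pairing(first, second, 0.25))  # [[25.0, 75.0]]
```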

systemds.operator.algorithm.img_shear(img_in: systemds.operator.nodes.matrix.Matrix, shear_x: float, shear_y: float, fill_value: float)
Parameters
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • shear_x – Shearing factor for horizontal shearing

  • shear_y – Shearing factor for vertical shearing

  • fill_value – The background color revealed by the shearing

Returns

‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]

systemds.operator.algorithm.img_transform(img_in: systemds.operator.nodes.matrix.Matrix, out_w: int, out_h: int, a: float, b: float, c: float, d: float, e: float, f: float, fill_value: float)
Parameters
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • out_w – Width of the output image

  • out_h – Height of the output image

  • abcdef – The first two rows of the affine matrix in row-major order

  • fill_value – The background of the image

Returns

‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]

systemds.operator.algorithm.img_translate(img_in: systemds.operator.nodes.matrix.Matrix, offset_x: float, offset_y: float, out_w: int, out_h: int, fill_value: float)
Parameters
  • img_in – Input image as 2D matrix with top left corner at [1, 1]

  • offset_x – The distance to move the image in x direction

  • offset_y – The distance to move the image in y direction

  • out_w – Width of the output image

  • out_h – Height of the output image

  • fill_value – The background of the image

Returns

‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]

systemds.operator.algorithm.km(X: systemds.operator.nodes.matrix.Matrix, TE: systemds.operator.nodes.matrix.Matrix, GI: systemds.operator.nodes.matrix.Matrix, SI: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Input matrix X containing the survival data:

  • number – (categorical features) for grouping and/or stratifying

  • TE – Column indices of X which contain timestamps (first entry) and event

  • GI – Column indices of X corresponding to the factors to be used for grouping

  • SI – Column indices of X corresponding to the factors to be used for stratifying

  • alpha – Parameter to compute 100*(1-alpha)% confidence intervals for the survivor function and its median

  • err_type – Parameter to specify the error type according to “greenwood” (the default) or “peto”

  • conf_type – Parameter to modify the confidence interval; “plain” keeps the lower and upper bound of the confidence interval unmodified, “log” (the default) corresponds to […] transformation and “log-log” corresponds to the

  • test_type – If survival data for multiple groups is available, specifies which test to perform for comparing survival data across multiple groups: “none” (the default)

Returns

‘OperationNode’ containing 7 consecutive columns in km corresponds to a unique combination of groups and strata in the data & schema & whose dimension depends on the number of groups (g) and strata (s) in the data (k denotes the number & for grouping ,i.e., ncol(gi) and l denotes the number of factors used for stratifying, i.e., ncol(si)) & of groups and strata is equal to 1, m will have 4 columns with & data from multiple groups available and ttype=log-rank or wilcoxon, a 1 x 4 matrix t and an g x 5 matrix t_groups_oe with

systemds.operator.algorithm.kmeans(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – The input Matrix to do KMeans on.

  • k – Number of centroids

  • runs – Number of runs (with different initial centroids)

  • max_iter – Maximum number of iterations per run

  • eps – Tolerance (epsilon) for WCSS change ratio

  • is_verbose – If TRUE, print per-iteration stats

  • avg_sample_size_per_centroid – Average number of records per centroid in data samples

  • seed – The seed used for initial sampling. If set to -1 random seeds are selected.

Returns

‘OperationNode’ containing the mapping of records to centroids & the output matrix with the centroids

systemds.operator.algorithm.kmeansPredict(X: systemds.operator.nodes.matrix.Matrix, C: systemds.operator.nodes.matrix.Matrix)
Parameters
  • X – The input Matrix to do KMeans on.

  • C – The input Centroids to map X onto.

Returns

‘OperationNode’ containing the mapping of records to centroids
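Mapping records onto centroids can be pictured as a nearest-centroid lookup. The sketch below assumes squared Euclidean distance and 1-based centroid indices; it is a plain-Python illustration of the idea, not the SystemDS implementation.

```python
def kmeans_predict(X, C):
    """Assign each row of X to its nearest centroid in C (1-based index)."""
    labels = []
    for point in X:
        # Squared Euclidean distance from this record to every centroid.
        dists = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in C
        ]
        labels.append(dists.index(min(dists)) + 1)
    return labels


records = [[0, 0], [10, 10], [1, 1], [9, 8]]
centroids = [[0.5, 0.5], [9.5, 9.0]]
print(kmeans_predict(records, centroids))  # [1, 2, 1, 2]
```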

systemds.operator.algorithm.l2svm(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – matrix X of feature vectors

  • Y – matrix Y of class labels; has to be a single column

  • intercept – No intercept by default (if set to TRUE, a constant bias column is added to X)

  • epsilon – Procedure terminates early if the reduction in objective function value is less than epsilon (tolerance) times the initial objective function value.

  • lambda – Regularization parameter (lambda) for L2 regularization

  • maxIterations – Maximum number of conjugate gradient iterations

  • maxii

  • verbose – Set to true if one wants print statements updating on loss.

  • columnId – The column ID to include in the print statement; especially useful when L2SVM is used in MSVM.

Returns

‘OperationNode’ containing model matrix

systemds.operator.algorithm.l2svmPredict(X: systemds.operator.nodes.matrix.Matrix, W: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – matrix X of feature vectors to classify

  • W – matrix of the trained variables

  • verbose – Set to true if one wants print statements.

Returns

‘OperationNode’ containing classification labels maxed to ones and zeros.

systemds.operator.algorithm.lasso(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – input feature matrix

  • y – matrix Y columns of the design matrix

  • tol – target convergence tolerance

  • M – history length

  • tau – regularization component

  • maxi – maximum number of iterations until convergence

Returns

‘OperationNode’ containing

systemds.operator.algorithm.lm(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Matrix of feature vectors.

  • y – 1-column matrix of response values.

  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization. Set to nonzero for highly dependent/sparse/numerous features

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

  • verbose – If TRUE, print messages are activated

Returns

‘OperationNode’ containing the model fit

systemds.operator.algorithm.matrixProfile(ts: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • ts – Time series to profile

  • window_size – Sliding window size

  • sample_percent – Degree of approximation between zero and one (1 computes the exact solution)

  • is_verbose – Print debug information

Returns

‘OperationNode’ containing the computed matrix profile & indices of least distances

systemds.operator.algorithm.msvmPredict(X: systemds.operator.nodes.matrix.Matrix, W: systemds.operator.nodes.matrix.Matrix)
Parameters
  • X – matrix X of feature vectors to classify

  • W – matrix of the trained variables

Returns

‘OperationNode’ containing classification labels maxed to ones and zeros.

systemds.operator.algorithm.multiLogReg(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Location to read the matrix of feature vectors

  • Y – Location to read the matrix with category labels

  • icpt – Intercept presence, shifting and rescaling X columns: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1

  • tol – tolerance (“epsilon”)

  • reg – regularization parameter (lambda = 1/C); intercept is not regularized

  • maxi – max. number of outer (Newton) iterations

  • maxii – max. number of inner (conjugate gradient) iterations, 0 = no max

  • verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing betas as output for prediction

systemds.operator.algorithm.multiLogRegPredict(X: systemds.operator.nodes.matrix.Matrix, B: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Data Matrix X

  • B – Regression parameters betas

  • Y – Response vector Y

  • verbose

    /

Returns

‘OperationNode’ containing matrix m of predicted means/probabilities & predicted response vector & scalar value of accuracy

systemds.operator.algorithm.pca(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Input feature matrix

  • K – Number of reduced dimensions (i.e., columns)

  • Center – Indicates whether or not to center the feature matrix

  • Scale – Indicates whether or not to scale the feature matrix

Returns

‘OperationNode’ containing output dominant eigen vectors (can be used for projections) & the column means of the input, subtracted to construct the pca & the scaling of the values, to make each dimension same size.

systemds.operator.algorithm.ppca(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – n x m input feature matrix

  • k – indicates dimension of the new vector space constructed from eigen vectors

  • maxi – maximum number of iterations until convergence

  • tolobj – objective function tolerance value to stop ppca algorithm

  • tolrecerr – reconstruction error tolerance value to stop the algorithm

  • verbose – verbose debug output

Returns

‘OperationNode’ containing output feature matrix with k columns & output dominant eigen vectors (can be used for projections)

systemds.operator.algorithm.randomForest(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Feature matrix X; note that X needs to be both recoded and dummy coded

  • Y – Label matrix Y; note that Y needs to be both recoded and dummy coded

  • R – Matrix which for each feature in X contains the following information […]; if R is not provided, all variables are assumed to be scale by default

  • bins – Number of equiheight bins per scale feature to choose thresholds

  • depth – Maximum depth of the learned tree

  • num_leaf – Number of samples when splitting stops and a leaf node is added

  • num_samples – Number of samples at which point we switch to in-memory subtree building

  • num_trees – Number of trees to be learned in the random forest model

  • subsamp_rate – Parameter controlling the size of each tree in the forest; samples are selected from a Poisson distribution with parameter subsamp_rate (the default value is 1.0)

  • feature_subset – Parameter that controls the number of features used as candidates for splitting at each tree node, as a power of the number of features in the dataset; by default, the square root of the features (i.e., feature_subset = 0.5) is used at each tree node

  • impurity – Impurity measure: entropy or Gini (the default)

Returns

‘OperationNode’ containing tree and each row contains the following information: & that leaf node j is supposed to predict & 7,8,… if j is categorical & chosen for j is categorical rows 7,8,… depict the value subset chosen for j & c containing the number of times samples are chosen in each tree of the random forest & from scale feature ids to global feature ids & from categorical feature ids to global feature ids

systemds.operator.algorithm.stableMarriage(P: systemds.operator.nodes.matrix.Matrix, A: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • P – proposer matrix P. It must be a square matrix with no zeros.

  • A – acceptor matrix A. It must be a square matrix with no zeros.

  • ordered – If true, P and A are assumed to be ordered,

  • index – vice-versa (higher is better).

Returns

‘OperationNode’ containing result matrix & 1 (2.0 preference value) and acceptor 2 (1.0 preference value). & 3 (2.0 preference value) and proposer 2 (1.0 preference value). & matched with proposer 3 (since [1,3] is non-zero) at a preference level of 3.0. & matched with proposer 2 (since [2,2] is non-zero) at a preference level of 3.0. & matched with proposer 1 (since [3,1] is non-zero) at a preference level of 1.0.
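For readers unfamiliar with the stable marriage problem, the following is a minimal plain-Python sketch of the classic Gale-Shapley procedure. It is not the SystemDS implementation: it takes ordered preference lists of 0-based indices rather than matrices P and A, and the function name `stable_marriage` is hypothetical.

```python
def stable_marriage(proposer_prefs, acceptor_prefs):
    """Gale-Shapley matching on ordered preference lists.

    proposer_prefs[p] lists acceptors from most to least preferred;
    acceptor_prefs[a] does the same for proposers. Returns a dict
    mapping each proposer to the acceptor it ends up matched with.
    """
    n = len(proposer_prefs)
    # rank[a][p]: position of proposer p in acceptor a's list (lower is better).
    rank = [{p: i for i, p in enumerate(prefs)} for prefs in acceptor_prefs]
    next_choice = [0] * n      # next acceptor each proposer will try
    engaged_to = [None] * n    # acceptor -> currently accepted proposer
    free = list(range(n))
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to[a]
        if current is None:
            engaged_to[a] = p
        elif rank[a][p] < rank[a][current]:
            engaged_to[a] = p      # acceptor trades up
            free.append(current)   # the previous partner is free again
        else:
            free.append(p)         # rejected, will try the next acceptor
    return {p: a for a, p in enumerate(engaged_to)}


proposers = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
acceptors = [[1, 0, 2], [0, 2, 1], [0, 1, 2]]
matching = stable_marriage(proposers, acceptors)
print(matching)
```

In this example proposer 1 keeps its first choice, while proposers 0 and 2 are rejected at least once before settling, and no proposer and acceptor would both prefer each other over their assigned partners.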

systemds.operator.algorithm.tSNE(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Data Matrix of shape

  • reduced_dims – Output dimensionality

  • perplexity – Perplexity Parameter

  • lr – Learning rate

  • momentum – Momentum Parameter

  • max_iter – Number of iterations

  • seed – The seed used for initial values. If set to -1, random seeds are selected.

  • is_verbose – Print debug information

Returns

‘OperationNode’ containing data matrix of shape (number of data points, reduced_dims)

systemds.operator.algorithm.toOneHot(X: systemds.operator.nodes.matrix.Matrix, numClasses: int)
Parameters
  • X – vector with N integer entries between 1 and numClasses

  • numClasses – number of columns, must be >= largest value in X

Returns

‘OperationNode’ containing matrix with shape (n, numclasses)
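The relationship between X and numClasses can be shown with a short plain-Python sketch of one-hot encoding. It illustrates the described input and output shapes only and is not the SystemDS implementation; the function name `to_one_hot` is hypothetical.

```python
def to_one_hot(labels, num_classes):
    """One-hot encode 1-based integer labels into an n x num_classes matrix."""
    if any(not 1 <= v <= num_classes for v in labels):
        raise ValueError("labels must lie between 1 and num_classes")
    # Row i has a single 1.0 in column labels[i] - 1 (shifted to 0-based).
    return [
        [1.0 if col == v - 1 else 0.0 for col in range(num_classes)]
        for v in labels
    ]


encoded = to_one_hot([1, 3, 2], num_classes=3)
print(encoded)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```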

Parameters
  • X – Data Matrix (nxm)

  • y – Label Matrix (nx1), greater than zero

Returns

‘OperationNode’ containing

systemds.operator.algorithm.xgboostPredictClassification(X: systemds.operator.nodes.matrix.Matrix, M: systemds.operator.nodes.matrix.Matrix, learning_rate: float)
Parameters
  • X – Matrix of feature vectors we want to predict (X_test)

  • M – The model created by xgboost

  • learning_rate – the learning rate used in the model

Returns

‘OperationNode’ containing the predictions of the samples using the given xgboost model. (y_prediction)

systemds.operator.algorithm.xgboostPredictRegression(X: systemds.operator.nodes.matrix.Matrix, M: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Matrix of feature vectors we want to predict (X_test)

  • M – The model created by xgboost

  • learning_rate – the learning rate used in the model

Returns

‘OperationNode’ containing the predictions of the samples using the given xgboost model. (y_prediction)