Algorithms
SystemDS supports a range of machine learning algorithms out of the box.
As an example, the lm algorithm can be used as follows:
# Import numpy and SystemDS
import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import lm
# Set a seed
np.random.seed(0)
# Generate matrix of feature vectors
features = np.random.rand(10, 15)
# Generate a 1-column matrix of response values
y = np.random.rand(10, 1)
# Compute the weights
with SystemDSContext() as sds:
    weights = lm(sds.from_numpy(features), sds.from_numpy(y)).compute()
    print(weights)
The output should be similar to
[[-0.11538199]
[-0.20386541]
[-0.39956035]
[ 1.04078623]
[ 0.4327084 ]
[ 0.18954599]
[ 0.49858968]
[-0.26812763]
[ 0.09961844]
[-0.57000751]
[-0.43386048]
[ 0.55358873]
[-0.54638565]
[ 0.2205885 ]
[ 0.37957689]]
- systemds.operator.algorithm.als(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Location to read the input matrix X to be factorized
rank – Rank of the factorization
reg – Regularization:
lambda – Regularization parameter, no regularization if 0.0
maxi – Maximum number of iterations
check – Check for convergence after every iteration, i.e., updating U and V once
thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored
- Returns
‘OperationNode’ containing x n matrix v
- systemds.operator.algorithm.alsCG(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Location to read the input matrix X to be factorized
rank – Rank of the factorization
reg – Regularization:
lambda – Regularization parameter, no regularization if 0.0
maxi – Maximum number of iterations
check – Check for convergence after every iteration, i.e., updating U and V once
thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored
- Returns
‘OperationNode’ containing x n matrix v
- systemds.operator.algorithm.alsDS(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
V – Location to read the input matrix V to be factorized
L – Location to write the factor matrix L
R – Location to write the factor matrix R
rank – Rank of the factorization
lambda – Regularization parameter, no regularization if 0.0
maxi – Maximum number of iterations
check – Check for convergence after every iteration, i.e., updating L and R once
thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored
- Returns
‘OperationNode’ containing x n matrix r
- systemds.operator.algorithm.alsTopkPredict(userIDs: systemds.operator.nodes.matrix.Matrix, I: systemds.operator.nodes.matrix.Matrix, L: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
userIDs – Column vector of user-ids (n x 1)
I – Indicator matrix user-id x user-id to exclude from scoring
L – The factor matrix L: user-id x feature-id
R – The factor matrix R: feature-id x item-id
K – The number of top-K items
- Returns
‘OperationNode’ containing users (rows) & a matrix containing the top-k predicted ratings for the specified users (rows)
- systemds.operator.algorithm.arima(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – The input Matrix to apply Arima on.
max_func_invoc –
?
p – non-seasonal AR order
d – non-seasonal differencing order
q – non-seasonal MA order
P – seasonal AR order
D – seasonal differencing order
Q – seasonal MA order
s – period in terms of number of time-steps
include_mean – center to mean 0, and include in result
solver – solver, is either “cg” or “jacobi”
- Returns
‘OperationNode’ containing the calculated coefficients
- systemds.operator.algorithm.bivar(X: systemds.operator.nodes.matrix.Matrix, S1: systemds.operator.nodes.matrix.Matrix, S2: systemds.operator.nodes.matrix.Matrix, T1: systemds.operator.nodes.matrix.Matrix, T2: systemds.operator.nodes.matrix.Matrix, verbose: bool)
- Parameters
verbose – Print bivar stats
- Returns
‘OperationNode’ containing
- systemds.operator.algorithm.confusionMatrix(P: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix)
- Parameters
P – vector of Predictions
Y – vector of Golden standard One Hot Encoded; the one hot encoded vector of actual labels
- Returns
‘OperationNode’ containing the confusion matrix sums of classifications & the confusion matrix averages of each true class
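The two outputs described above (raw counts and per-true-class averages) can be illustrated with a minimal pure-Python sketch. This is not the SystemDS implementation: the helper name `confusion_matrix`, the use of plain 1-based integer labels for both inputs, and the orientation (rows = predicted class, columns = true class) are assumptions made for illustration.

```python
def confusion_matrix(predictions, labels, num_classes):
    """Count (predicted, actual) pairs, then normalise each true-class column.

    Assumption: labels are 1-based integers; rows index the predicted class
    and columns index the true class.
    """
    n = num_classes
    sums = [[0] * n for _ in range(n)]
    for p, a in zip(predictions, labels):
        sums[p - 1][a - 1] += 1

    # Number of samples per true class (column totals).
    col_totals = [sum(sums[r][c] for r in range(n)) for c in range(n)]

    # Divide every column by its total; empty classes stay at 0.0.
    averages = [
        [sums[r][c] / col_totals[c] if col_totals[c] else 0.0 for c in range(n)]
        for r in range(n)
    ]
    return sums, averages


sums, averages = confusion_matrix([1, 2, 2, 1], [1, 2, 1, 1], num_classes=2)
```

Each column of `averages` sums to 1 whenever that true class occurs at least once, which makes the per-class error rates directly readable.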
- systemds.operator.algorithm.cox(X: systemds.operator.nodes.matrix.Matrix, TE: systemds.operator.nodes.matrix.Matrix, F: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Location to read the input matrix X containing the survival data
containing – information
TE – Column indices of X as a column vector which contain timestamp
F – Column indices of X as a column vector which are to be used for fitting the model
R – If factors (categorical variables) are available in the input matrix
the – X
each – needs to be removed from X; in this case the start
and – corresponding to the baseline level need to be the same;
if – not provided by default all variables are considered to be continuous
alpha – Parameter to compute a 100*(1-alpha)% confidence interval for the betas
tol – Tolerance (“epsilon”)
moi – Max. number of outer (Newton) iterations
mii – Max. number of inner (conjugate gradient) iterations, 0 = no max
- Returns
‘OperationNode’ containing a summary of some statistics of the fitted model: & matrix rt that contains the order-preserving recoded timestamps from x & which is matrix x with sorted timestamps & matrix mf that contains the column indices of x with the baseline factors removed (if available)
- systemds.operator.algorithm.decisionTree(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, verbose: bool, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
a – vector, other positive Integers indicate the number of categories
If – not provided by default all variables are assumed to be scale
bins – Number of equiheight bins per scale feature to choose thresholds
depth – Maximum depth of the learned tree
verbose – boolean specifying if the algorithm should print information while executing
- Returns
‘OperationNode’ containing looks at if j is an internal node, otherwise 0 & 6,7,… if j is categorical & a leaf node: number of misclassified samples reaching at node j & feature chosen for j is categorical rows 6,7,… depict the value subset chosen for j & a leaf node 1 if j is impure and the number of samples at j > threshold, otherwise 0
- systemds.operator.algorithm.deepWalk(Graph: systemds.operator.nodes.matrix.Matrix, w: int, d: int, gamma: int, t: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
Graph – adjacency matrix of a graph (n x n)
w – window size
d – embedding size
gamma – walks per vertex
t – walk length
alpha – learning rate
beta – factor for decreasing learning rate
- Returns
‘OperationNode’ containing matrix of vertex/word representation (n x d)
- systemds.operator.algorithm.executePipeline(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Returns
‘OperationNode’ containing encoding of categorical features & features & ohe call, to call inside eval as a function & to call inside eval as a function & doing relative over-sampling & count & replace the null with default values & version of pca
- systemds.operator.algorithm.ffTrain(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, out_activation: str, loss_fcn: str, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
batch_size – Batch size
epochs – Number of epochs
learning_rate – Learning rate
out_activation – User specified output activation function. Possible values:
loss_fcn – User specified loss function. Possible values:
shuffle – Flag which indicates if dataset should be shuffled or not
validation_split – Fraction of training set used as validation set
seed – Seed for model initialization
verbose – Flag which indicates if function should print to stdout
Supported – by the model
Supported – by the model
- Returns
‘OperationNode’ containing
- systemds.operator.algorithm.garch(X: systemds.operator.nodes.matrix.Matrix, kmax: int, momentum: float, start_stepsize: float, end_stepsize: float, start_vicinity: float, end_vicinity: float, sim_seed: int, verbose: bool)
- Parameters
X – The input Matrix to apply GARCH on.
kmax – Number of iterations
momentum – Momentum for momentum-gradient descent (set to 0 to deactivate)
start_stepsize – Initial gradient-descent stepsize
end_stepsize – gradient-descent stepsize at end (linear descent)
start_vicinity – proportion of randomness of restart-location for gradient descent at beginning
end_vicinity – same at end (linear decay)
sim_seed – seed for simulation of process on fitted coefficients
verbose – verbosity, comments during fitting
- Returns
‘OperationNode’ containing simulated garch(1,1) process on fitted coefficients & variances of simulated fitted process & constant term of fitted process & 1-st arch-coefficient of fitted process & 1-st garch-coefficient of fitted process & drawbacks: slow convergence of optimization (sort of simulated annealing/gradient descent)
- systemds.operator.algorithm.gaussianClassifier(D: systemds.operator.nodes.matrix.Matrix, C: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
varSmoothing – Smoothing factor for variances
verbose – Print accuracy of the training set
- Returns
‘OperationNode’ containing
- systemds.operator.algorithm.gmmPredict(X: systemds.operator.nodes.matrix.Matrix, weight: systemds.operator.nodes.matrix.Matrix, mu: systemds.operator.nodes.matrix.Matrix, precisions_cholesky: systemds.operator.nodes.matrix.Matrix, model: str)
- Parameters
X – Matrix X (instances to be clustered)
weight – Weight of learned model
mu – fitted clusters mean
precisions_cholesky – fitted precision matrix for each mixture
model – fitted model
- Returns
‘OperationNode’ containing predicted cluster labels & probabilities of belongingness & for new instances given the variance and mean of fitted data
- systemds.operator.algorithm.hospitalResidencyMatch(R: systemds.operator.nodes.matrix.Matrix, H: systemds.operator.nodes.matrix.Matrix, capacity: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
R – Residents matrix R. It must be an ORDERED matrix.
H – Hospitals matrix H. It must be an UNORDERED matrix.
capacity – capacity of Hospitals matrix C. It must be a [n*1] matrix with non zero values.
with – and vice-versa (higher is better).
- Returns
‘OperationNode’ containing result matrix & result matrix & an ordered matrix, this means that resident 1 (row 1) likes hospital 2 the most, followed by hospital 1 and hospital 3. & unordered, this would mean that resident 1 (row 1) likes hospital 3 the most (since the value at [1,3] is the row max), & 1 (2.0 preference value) and hospital 2 (1.0 preference value). & an unordered matrix this means that hospital 1 (row 1) likes resident 1 the most (since the value at [1,1] is the row max). & matched with hospital 3 (since [1,3] is non-zero) at a preference level of 2.0. & matched with hospital 1 (since [2,1] is non-zero) at a preference level of 1.0. & matched with hospital 2 (since [3,2] is non-zero) at a preference level of 2.0.
- systemds.operator.algorithm.img_cutout(img_in: systemds.operator.nodes.matrix.Matrix, x: int, y: int, width: int, height: int, fill_value: float)
- Parameters
img_in – Input image as 2D matrix with top left corner at [1, 1]
x – Column index of the top left corner of the rectangle (starting at 1)
y – Row index of the top left corner of the rectangle (starting at 1)
width – Width of the rectangle (must be positive)
height – Height of the rectangle (must be positive)
fill_value – The value to set for the rectangle
- Returns
‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]
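To make the 1-based coordinate convention of img_cutout concrete, here is a small pure-Python sketch on nested lists. It is an illustration only, not the SystemDS code; in particular, clipping the rectangle to the image bounds is an assumption of this sketch.

```python
def img_cutout(img, x, y, width, height, fill_value):
    """Fill a rectangle whose top-left corner is at column x, row y (1-based)."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the input image is left untouched

    # Assumption: parts of the rectangle outside the image are ignored.
    for r in range(max(y, 1), min(y + height - 1, rows) + 1):
        for c in range(max(x, 1), min(x + width - 1, cols) + 1):
            out[r - 1][c - 1] = fill_value
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cut = img_cutout(image, x=2, y=1, width=2, height=2, fill_value=0)
```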
- systemds.operator.algorithm.img_invert(img_in: systemds.operator.nodes.matrix.Matrix, max_value: float)
- Parameters
img_in – Input image
max_value – The maximum value pixels can have
- Returns
‘OperationNode’ containing
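Inversion is usually understood as subtracting each pixel from the maximum value. The sketch below shows that reading on a nested list; it is an assumed interpretation of the parameters above, not taken from the SystemDS source.

```python
def img_invert(img, max_value):
    """Map every pixel v to max_value - v."""
    return [[max_value - v for v in row] for row in img]


inverted = img_invert([[0, 100], [255, 30]], max_value=255)
```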
- systemds.operator.algorithm.img_posterize(img_in: systemds.operator.nodes.matrix.Matrix, bits: int)
- Parameters
img_in – Input image
bits – The number of bits to keep for the values. 1 means black and white, 8 means every integer between 0 and 255.
- Returns
‘OperationNode’ containing
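One common way to posterize 8-bit pixels is to keep only the most significant bits of each value. The following sketch assumes integer pixel values in the range 0 to 255 and is meant only to illustrate the role of the bits parameter, not to reproduce the SystemDS implementation.

```python
def img_posterize(img, bits):
    """Keep the `bits` most significant bits of each 8-bit pixel value."""
    shift = 8 - bits  # number of low-order bits to discard
    return [[(v >> shift) << shift for v in row] for row in img]


pixels = [[200, 100, 255]]
one_bit = img_posterize(pixels, bits=1)
three_bits = img_posterize(pixels, bits=3)
```

With bits=8 nothing is discarded and the image is returned unchanged.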
- systemds.operator.algorithm.img_rotate(img_in: systemds.operator.nodes.matrix.Matrix, radians: float, fill_value: float)
- Parameters
img_in – Input image as 2D matrix with top left corner at [1, 1]
radians – The value by which to rotate in radian.
fill_value – The background color revealed by the rotation
- Returns
‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]
- systemds.operator.algorithm.img_sample_pairing(img_in1: systemds.operator.nodes.matrix.Matrix, img_in2: systemds.operator.nodes.matrix.Matrix, weight: float)
- Parameters
img_in1 – First input image
img_in2 – Second input image
weight – The weight given to the second image. 0 means only img_in1, 1 means only img_in2 will be visible
- Returns
‘OperationNode’ containing
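The weight semantics described above correspond to a linear blend of the two images. A minimal sketch, assuming both images have the same shape and that the blend is a plain weighted average:

```python
def img_sample_pairing(img1, img2, weight):
    """Blend two equally sized images: (1 - weight) * img1 + weight * img2."""
    return [
        [(1 - weight) * a + weight * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]


blended = img_sample_pairing([[0, 100]], [[200, 100]], weight=0.25)
```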
- systemds.operator.algorithm.img_shear(img_in: systemds.operator.nodes.matrix.Matrix, shear_x: float, shear_y: float, fill_value: float)
- Parameters
img_in – Input image as 2D matrix with top left corner at [1, 1]
shear_x – Shearing factor for horizontal shearing
shear_y – Shearing factor for vertical shearing
fill_value – The background color revealed by the shearing
- Returns
‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]
- systemds.operator.algorithm.img_transform(img_in: systemds.operator.nodes.matrix.Matrix, out_w: int, out_h: int, a: float, b: float, c: float, d: float, e: float, f: float, fill_value: float)
- Parameters
img_in – Input image as 2D matrix with top left corner at [1, 1]
out_w – Width of the output image
out_h – Height of the output image
abcdef – The first two rows of the affine matrix in row-major order
fill_value – The background of the image
- Returns
‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]
- systemds.operator.algorithm.img_translate(img_in: systemds.operator.nodes.matrix.Matrix, offset_x: float, offset_y: float, out_w: int, out_h: int, fill_value: float)
- Parameters
img_in – Input image as 2D matrix with top left corner at [1, 1]
offset_x – The distance to move the image in x direction
offset_y – The distance to move the image in y direction
out_w – Width of the output image
out_h – Height of the output image
fill_value – The background of the image
- Returns
‘OperationNode’ containing output image as 2d matrix with top left corner at [1, 1]
- systemds.operator.algorithm.km(X: systemds.operator.nodes.matrix.Matrix, TE: systemds.operator.nodes.matrix.Matrix, GI: systemds.operator.nodes.matrix.Matrix, SI: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Input matrix X containing the survival data:
number – (categorical features) for grouping and/or stratifying
TE – Column indices of X which contain timestamps (first entry) and event
GI – Column indices of X corresponding to the factors to be used for grouping
SI – Column indices of X corresponding to the factors to be used for stratifying
alpha – Parameter to compute 100*(1-alpha)% confidence intervals for the survivor function and its median
err_type – Parameter to specify the error type according to “greenwood” (the default) or “peto”
conf_type – Parameter to modify the confidence interval; “plain” keeps the lower and
upper – the confidence interval unmodified, “log” (the default)
corresponds – transformation and “log-log” corresponds to the
test_type – If survival data for multiple groups is available specifies which test to
perform – survival data across multiple groups: “none” (the default)
- Returns
‘OperationNode’ containing 7 consecutive columns in km corresponds to a unique combination of groups and strata in the data & schema & whose dimension depends on the number of groups (g) and strata (s) in the data (k denotes the number & for grouping ,i.e., ncol(gi) and l denotes the number of factors used for stratifying, i.e., ncol(si)) & of groups and strata is equal to 1, m will have 4 columns with & data from multiple groups available and ttype=log-rank or wilcoxon, a 1 x 4 matrix t and an g x 5 matrix t_groups_oe with
- systemds.operator.algorithm.kmeans(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – The input Matrix to do KMeans on.
k – Number of centroids
runs – Number of runs (with different initial centroids)
max_iter – Maximum number of iterations per run
eps – Tolerance (epsilon) for WCSS change ratio
is_verbose – do not print per-iteration stats
avg_sample_size_per_centroid – Average number of records per centroid in data samples
seed – The seed used for initial sampling. If set to -1 random seeds are selected.
- Returns
‘OperationNode’ containing the mapping of records to centroids & the output matrix with the centroids
- systemds.operator.algorithm.kmeansPredict(X: systemds.operator.nodes.matrix.Matrix, C: systemds.operator.nodes.matrix.Matrix)
- Parameters
X – The input Matrix to do KMeans on.
C – The input Centroids to map X onto.
- Returns
‘OperationNode’ containing the mapping of records to centroids
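Mapping records onto given centroids amounts to a nearest-centroid search. The sketch below uses squared Euclidean distance and 1-based centroid indices; both choices, and the helper name `kmeans_predict`, are assumptions for illustration rather than a description of the SystemDS implementation.

```python
def kmeans_predict(X, C):
    """Return, for each row of X, the 1-based index of the closest centroid in C."""
    labels = []
    for x in X:
        # Squared Euclidean distance from this record to every centroid.
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in C]
        labels.append(dists.index(min(dists)) + 1)
    return labels


records = [[0.0, 0.0], [10.0, 10.0], [1.0, 1.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
mapping = kmeans_predict(records, centroids)
```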
- systemds.operator.algorithm.l2svm(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – matrix X of feature vectors
Y – matrix Y of class labels; has to be a single column
intercept – No intercept (if set to TRUE then a constant bias column is added to X)
epsilon – Procedure terminates early if the reduction in objective function value is less than epsilon (tolerance) times the initial objective function value.
lambda – Regularization parameter (lambda) for L2 regularization
maxIterations – Maximum number of conjugate gradient iterations
maxii –
verbose – Set to true to print statements updating on the loss.
columnId – The column ID used when adding an ID to the print statement; specifically useful when L2SVM is used in MSVM.
- Returns
‘OperationNode’ containing model matrix
- systemds.operator.algorithm.l2svmPredict(X: systemds.operator.nodes.matrix.Matrix, W: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – matrix X of feature vectors to classify
W – matrix of the trained variables
verbose – Set to true if one wants print statements.
- Returns
‘OperationNode’ containing classification labels maxed to ones and zeros.
- systemds.operator.algorithm.lasso(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – input feature matrix
y – matrix Y columns of the design matrix
tol – target convergence tolerance
M – history length
tau – regularization component
maxi – maximum number of iterations until convergence
- Returns
‘OperationNode’ containing
- systemds.operator.algorithm.lm(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Matrix of feature vectors.
y – 1-column matrix of response values.
icpt – Intercept presence, shifting and rescaling the columns of X
reg – Regularization constant (lambda) for L2-regularization. Set to nonzero for highly dependent/sparse/numerous features
tol – Tolerance (epsilon); conjugate gradient procedure terminates early if L2 norm of the beta-residual is less than tolerance * its initial norm
maxi – Maximum number of conjugate gradient iterations. 0 = no maximum
verbose – If TRUE, print messages are activated
- Returns
‘OperationNode’ containing the model fit
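To see what the reg parameter does, it can help to look at L2-regularized least squares in closed form: solve (XᵀX + reg·I)·beta = Xᵀy. The NumPy sketch below is only an illustration of that formula. It ignores icpt, does not use the conjugate gradient path that tol and maxi refer to, and the helper name `ridge_fit` is hypothetical.

```python
import numpy as np


def ridge_fit(X, y, reg=0.0):
    """Solve (X^T X + reg * I) beta = X^T y directly (no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + reg * np.eye(n_features)
    b = X.T @ y
    return np.linalg.solve(A, b)


# Responses generated from known weights [2, 3], without noise.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([[2.0], [3.0]])

beta = ridge_fit(X, y)                # recovers the generating weights
beta_reg = ridge_fit(X, y, reg=10.0)  # regularization shrinks the weights
```

With reg=0.0 this reduces to ordinary least squares; larger values pull the weights toward zero, which is why a nonzero setting helps with strongly correlated or very numerous features.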
- systemds.operator.algorithm.matrixProfile(ts: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
ts – Time series to profile
window_size – Sliding window size
sample_percent – Degree of approximation between zero and one (1 computes the exact solution)
is_verbose – Print debug information
- Returns
‘OperationNode’ containing the computed matrix profile & indices of least distances
- systemds.operator.algorithm.msvmPredict(X: systemds.operator.nodes.matrix.Matrix, W: systemds.operator.nodes.matrix.Matrix)
- Parameters
X – matrix X of feature vectors to classify
W – matrix of the trained variables
- Returns
‘OperationNode’ containing classification labels maxed to ones and zeros.
- systemds.operator.algorithm.multiLogReg(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Location to read the matrix of feature vectors
Y – Location to read the matrix with category labels
icpt – Intercept presence, shifting and rescaling X columns: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1
tol – tolerance (“epsilon”)
reg – regularization parameter (lambda = 1/C); intercept is not regularized
maxi – max. number of outer (Newton) iterations
maxii – max. number of inner (conjugate gradient) iterations, 0 = no max
verbose – flag specifying if logging information should be printed
- Returns
‘OperationNode’ containing betas as output for prediction
- systemds.operator.algorithm.multiLogRegPredict(X: systemds.operator.nodes.matrix.Matrix, B: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Data Matrix X
B – Regression parameters betas
Y – Response vector Y
verbose –
/
- Returns
‘OperationNode’ containing matrix m of predicted means/probabilities & predicted response vector & scalar value of accuracy
- systemds.operator.algorithm.pca(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Input feature matrix
K – Number of reduced dimensions (i.e., columns)
Center – Indicates whether or not to center the feature matrix
Scale – Indicates whether or not to scale the feature matrix
- Returns
‘OperationNode’ containing output dominant eigen vectors (can be used for projections) & the column means of the input, subtracted to construct the pca & the scaling of the values, to make each dimension same size.
- systemds.operator.algorithm.ppca(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – n x m input feature matrix
k – indicates dimension of the new vector space constructed from eigen vectors
maxi – maximum number of iterations until convergence
tolobj – objective function tolerance value to stop ppca algorithm
tolrecerr – reconstruction error tolerance value to stop the algorithm
verbose – verbose debug output
- Returns
‘OperationNode’ containing output feature matrix with k columns & output dominant eigen vectors (can be used for projections)
- systemds.operator.algorithm.randomForest(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Feature matrix X; note that X needs to be both recoded and dummy coded
Y – Label matrix Y; note that Y needs to be both recoded and dummy coded
R – Matrix which for each feature in X contains the following information
If – not provided by default all variables are assumed to be scale
bins – Number of equiheight bins per scale feature to choose thresholds
depth – Maximum depth of the learned tree
num_leaf – Number of samples when splitting stops and a leaf node is added
num_samples – Number of samples at which point we switch to in-memory subtree building
num_trees – Number of trees to be learned in the random forest model
subsamp_rate – Parameter controlling the size of each tree in the forest; samples are selected from a Poisson distribution with parameter subsamp_rate (the default value is 1.0)
feature_subset – Parameter that controls the number of features used as candidates for splitting at each tree node, as a power of the number of features in the dataset; by default the square root of the features (i.e., feature_subset = 0.5) is used at each tree node
impurity – Impurity measure: entropy or Gini (the default)
- Returns
‘OperationNode’ containing tree and each row contains the following information: & that leaf node j is supposed to predict & 7,8,… if j is categorical & chosen for j is categorical rows 7,8,… depict the value subset chosen for j & c containing the number of times samples are chosen in each tree of the random forest & from scale feature ids to global feature ids & from categorical feature ids to global feature ids
- systemds.operator.algorithm.stableMarriage(P: systemds.operator.nodes.matrix.Matrix, A: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
P – proposer matrix P. It must be a square matrix with no zeros.
A – acceptor matrix A. It must be a square matrix with no zeros.
ordered – If true, P and A are assumed to be ordered,
index – vice-versa (higher is better).
- Returns
‘OperationNode’ containing result matrix & 1 (2.0 preference value) and acceptor 2 (1.0 preference value). & 3 (2.0 preference value) and proposer 2 (1.0 preference value). & matched with proposer 3 (since [1,3] is non-zero) at a preference level of 3.0. & matched with proposer 2 (since [2,2] is non-zero) at a preference level of 3.0. & matched with proposer 1 (since [3,1] is non-zero) at a preference level of 1.0.
- systemds.operator.algorithm.tSNE(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Data Matrix of shape
reduced_dims – Output dimensionality
perplexity – Perplexity Parameter
lr – Learning rate
momentum – Momentum Parameter
max_iter – Number of iterations
seed – The seed used for initial values. If set to -1 random seeds are selected.
is_verbose – Print debug information
- Returns
‘OperationNode’ containing data matrix of shape (number of data points, reduced_dims)
- systemds.operator.algorithm.toOneHot(X: systemds.operator.nodes.matrix.Matrix, numClasses: int)
- Parameters
X – vector with N integer entries between 1 and numClasses
numClasses – number of columns, must be >= largest value in X
- Returns
‘OperationNode’ containing matrix with shape (n, numclasses)
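The encoding described above can be sketched in a few lines of plain Python. The helper name `to_one_hot` and the use of nested lists are illustrative assumptions; the point is only that label v (1-based) sets column v of its row to 1.

```python
def to_one_hot(x, num_classes):
    """Encode 1-based integer labels as rows with a single 1 in column `label`."""
    return [
        [1 if v == c else 0 for c in range(1, num_classes + 1)]
        for v in x
    ]


encoded = to_one_hot([1, 3, 2], num_classes=3)
```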
- systemds.operator.algorithm.tomeklink(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix)
- Parameters
X – Data Matrix (nxm)
y – Label Matrix (nx1), greater than zero
- Returns
‘OperationNode’ containing
- systemds.operator.algorithm.xgboostPredictClassification(X: systemds.operator.nodes.matrix.Matrix, M: systemds.operator.nodes.matrix.Matrix, learning_rate: float)
- Parameters
X – Matrix of feature vectors we want to predict (X_test)
M – The model created at xgboost
learning_rate – the learning rate used in the model
- Returns
‘OperationNode’ containing the predictions of the samples using the given xgboost model. (y_prediction)
- systemds.operator.algorithm.xgboostPredictRegression(X: systemds.operator.nodes.matrix.Matrix, M: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
- Parameters
X – Matrix of feature vectors we want to predict (X_test)
M – The model created at xgboost
learning_rate – the learning rate used in the model
- Returns
‘OperationNode’ containing the predictions of the samples using the given xgboost model. (y_prediction)