Package org.apache.sysds.runtime.util
Class DataConverter
java.lang.Object
    org.apache.sysds.runtime.util.DataConverter
public class DataConverter extends Object
This class provides methods to read and write matrix blocks from and to HDFS using different data formats. These functions are used mainly for CP read/write and for exporting in-memory matrices to HDFS (before executing MR jobs).
Constructor Summary
Constructor | Description
DataConverter()
Method Summary
Modifier and Type | Method | Description
static org.apache.commons.math3.linear.Array2DRowRealMatrix
convertToArray2DRowRealMatrix(MatrixBlock mb)
Helper method that converts a SystemDS matrix block into the Array2DRowRealMatrix format, which is useful for invoking Apache Commons Math.
static org.apache.commons.math3.linear.BlockRealMatrix
convertToBlockRealMatrix(MatrixBlock mb)
static boolean[]
convertToBooleanVector(MatrixBlock mb)
static DenseBlock
convertToDenseBlock(MatrixBlock mb)
static DenseBlock
convertToDenseBlock(MatrixBlock mb, boolean deep)
static List<Double>
convertToDoubleList(MatrixBlock mb)
static double[][]
convertToDoubleMatrix(MatrixBlock mb)
Creates a two-dimensional double array from the input matrix block.
static double[]
convertToDoubleVector(MatrixBlock mb)
static double[]
convertToDoubleVector(MatrixBlock mb, boolean deep)
static double[]
convertToDoubleVector(MatrixBlock mb, boolean deep, boolean allowNull)
static FrameBlock
convertToFrameBlock(String[][] data)
Converts a two-dimensional string array into a frame block of value type string.
static FrameBlock
convertToFrameBlock(String[][] data, Types.ValueType[] schema)
static FrameBlock
convertToFrameBlock(String[][] data, Types.ValueType[] schema, String[] colnames)
static FrameBlock
convertToFrameBlock(MatrixBlock mb)
Converts a matrix block into a frame block of value type double.
static FrameBlock
convertToFrameBlock(MatrixBlock mb, int k)
Converts a matrix block into a frame block of value type double.
static FrameBlock
convertToFrameBlock(MatrixBlock mb, Types.ValueType vt)
Converts a matrix block into a frame block of the given value type.
static FrameBlock
convertToFrameBlock(MatrixBlock mb, Types.ValueType[] schema)
Converts a matrix block into a frame block with the given schema.
static FrameBlock
convertToFrameBlock(MatrixBlock mb, Types.ValueType[] schema, int k)
Converts a matrix block into a frame block with the given schema.
static FrameBlock
convertToFrameBlock(MatrixBlock mb, Types.ValueType vt, int k)
Converts a matrix block into a frame block of the given value type.
static int[]
convertToIntVector(MatrixBlock mb)
static long[]
convertToLongVector(MatrixBlock mb)
static MatrixBlock
convertToMatrixBlock(double[][] data)
Creates a dense Matrix Block and copies the given double matrix into it.
static MatrixBlock
convertToMatrixBlock(double[] data, boolean columnVector)
Creates a dense Matrix Block and copies the given double vector into it.
static MatrixBlock
convertToMatrixBlock(int[][] data)
Converts an int matrix into a MatrixBlock.
static MatrixBlock
convertToMatrixBlock(HashMap<MatrixIndexes,Double> map)
static MatrixBlock
convertToMatrixBlock(HashMap<MatrixIndexes,Double> map, int rlen, int clen)
NOTE: this method also ensures the specified matrix dimensions.
static MatrixBlock
convertToMatrixBlock(org.apache.commons.math3.linear.RealMatrix rm)
static MatrixBlock
convertToMatrixBlock(FrameBlock frame)
Converts a frame block with arbitrary schema into a matrix block.
static MatrixBlock
convertToMatrixBlock(CTableMap map)
static MatrixBlock
convertToMatrixBlock(CTableMap map, int rlen, int clen)
NOTE: this method also ensures the specified matrix dimensions.
static MatrixBlock[]
convertToMatrixBlockPartitions(MatrixBlock mb, boolean colwise)
static String[][]
convertToStringFrame(FrameBlock frame)
Converts a frame block with arbitrary schema into a two-dimensional string array.
static TensorBlock
convertToTensorBlock(MatrixBlock mb, Types.ValueType vt, boolean toBasicTensor)
static int[]
convertVectorToIndexList(MatrixBlock mb)
static void
copyToDoubleVector(MatrixBlock mb, double[] dest, int destPos)
static int[]
getTensorDimensions(ExecutionContext ec, CPOperand dims)
static MatrixBlock
readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen)
static MatrixBlock
readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, boolean localFS)
static MatrixBlock
readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz)
static MatrixBlock
readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, boolean localFS)
static MatrixBlock
readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, FileFormatProperties formatProperties)
static MatrixBlock
readMatrixFromHDFS(ReadProperties prop)
Core method for reading matrices in format textcell, matrixmarket, binarycell, or binaryblock from HDFS into main memory.
static TensorBlock
readTensorFromHDFS(String dir, Types.FileFormat fmt, long[] dims, int blen, Types.ValueType[] schema)
static BitSet
toBitSet(double[] data)
static double[]
toDouble(float[] data)
static double[]
toDouble(int[] data)
static double[]
toDouble(long[] data)
static double[]
toDouble(String[] data)
static double[]
toDouble(BitSet data, int len)
static float[]
toFloat(double[] data)
static int[]
toInt(double[] data)
static long[]
toLong(double[] data)
static String[]
toString(double[] data)
static String
toString(TensorBlock tb)
static String
toString(TensorBlock tb, boolean sparse, String separator, String lineseparator, String leftBorder, String rightBorder, int rowsToPrint, int colsToPrint, int decimal)
Returns a string representation of a tensor.
static String
toString(FrameBlock fb)
static String
toString(FrameBlock fb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
static String
toString(ListObject list, int rows, int cols, boolean sparse, String separator, String lineSeparator, int rowsToPrint, int colsToPrint, int decimal)
static String
toString(MatrixBlock mb)
static String
toString(MatrixBlock mb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
Returns a string representation of a matrix.
static void
writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc)
static void
writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties)
static void
writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties, boolean diag)
static void
writeTensorToHDFS(TensorBlock tensor, String dir, Types.FileFormat fmt, DataCharacteristics dc)
Method Detail
-
writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc) throws IOException
- Throws:
IOException
-
writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties) throws IOException
- Throws:
IOException
-
writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties, boolean diag) throws IOException
- Throws:
IOException
-
writeTensorToHDFS
public static void writeTensorToHDFS(TensorBlock tensor, String dir, Types.FileFormat fmt, DataCharacteristics dc) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, boolean localFS) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, boolean localFS) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, FileFormatProperties formatProperties) throws IOException
- Throws:
IOException
-
readTensorFromHDFS
public static TensorBlock readTensorFromHDFS(String dir, Types.FileFormat fmt, long[] dims, int blen, Types.ValueType[] schema) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(ReadProperties prop) throws IOException
Core method for reading matrices in format textcell, matrixmarket, binarycell, or binaryblock from HDFS into main memory. For expected dense matrices, we directly copy value- or block-at-a-time into the target matrix. In contrast, for sparse matrices, we append (column, value) pairs and do a final sort if required, in order to prevent large reorg overheads and increased memory consumption in case of unordered inputs.
DENSE MxN input:
* best/average/worst: O(M*N)
SPARSE MxN input:
* best (ordered, or binary block w/ clen<=blen): O(M*N)
* average (unordered): O(M*N*log(N))
* worst (descending order per row): O(M*N^2)
NOTE: providing an exact estimate of the 'expected sparsity' can prevent a full copy of the result matrix block (required for changing sparse->dense, or vice versa).
- Parameters:
prop - read properties
- Returns:
matrix block
- Throws:
IOException - if an I/O error occurs
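The sparse read path above appends (column, value) pairs and sorts only when the input turned out to be unordered. The following is a minimal, self-contained sketch of that idea for a single row. It is not SystemDS code: the class `SparseRowSketch` and its methods are hypothetical. It uses a comparison sort, so an unordered row costs O(nnz log nnz); the O(N^2) worst case quoted above presumably reflects the library's own sorting strategy, which this sketch does not reproduce.

```java
import java.util.Arrays;

/** Hypothetical sketch (not SystemDS code): one sparse row with append-then-sort semantics. */
final class SparseRowSketch {
    private int[] cols = new int[4];
    private double[] vals = new double[4];
    private int size = 0;
    private boolean sorted = true;

    /** Appends a pair in amortized O(1); only records whether column order was broken. */
    void append(int col, double val) {
        if (val == 0)
            return; // sparse rows store non-zeros only
        if (size == cols.length) {
            cols = Arrays.copyOf(cols, size * 2);
            vals = Arrays.copyOf(vals, size * 2);
        }
        if (size > 0 && cols[size - 1] > col)
            sorted = false;
        cols[size] = col;
        vals[size] = val;
        size++;
    }

    /** Final sort by column index; skipped entirely if the input arrived in order. */
    void sortIfRequired() {
        if (sorted)
            return;
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++)
            order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(cols[a], cols[b]));
        int[] c = new int[size];
        double[] v = new double[size];
        for (int i = 0; i < size; i++) {
            c[i] = cols[order[i]];
            v[i] = vals[order[i]];
        }
        cols = c;
        vals = v;
        sorted = true;
    }

    int[] columns() { return Arrays.copyOf(cols, size); }

    double[] values() { return Arrays.copyOf(vals, size); }
}
```

Deferring the sort keeps each append cheap and avoids shifting elements on every out-of-order insert, which is the reorg overhead the description refers to.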
-
convertToDoubleMatrix
public static double[][] convertToDoubleMatrix(MatrixBlock mb)
Creates a two-dimensional double array from the input matrix block.
- Parameters:
mb - matrix block
- Returns:
2d double array
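A dense matrix is commonly stored as one row-major array of length rows*cols. Assuming such a dense layout (a real MatrixBlock may also be sparse), conversion to and from a `double[][]` reduces to one `System.arraycopy` per row. `RowMajor` is a hypothetical helper for illustration, not part of SystemDS.

```java
/** Hypothetical helpers (not SystemDS API) illustrating a dense row-major layout. */
final class RowMajor {
    /** Copies a row-major vector of length rows*cols into a new rows x cols array. */
    static double[][] toMatrix(double[] data, int rows, int cols) {
        if (data.length != rows * cols)
            throw new IllegalArgumentException("expected " + (rows * cols) + " values, got " + data.length);
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            System.arraycopy(data, i * cols, out[i], 0, cols); // row i starts at offset i*cols
        return out;
    }

    /** Inverse direction: flattens a rectangular 2d array into a row-major vector. */
    static double[] toVector(double[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        double[] out = new double[rows * cols];
        for (int i = 0; i < rows; i++)
            System.arraycopy(m[i], 0, out, i * cols, cols);
        return out;
    }
}
```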
-
convertToBooleanVector
public static boolean[] convertToBooleanVector(MatrixBlock mb)
-
convertVectorToIndexList
public static int[] convertVectorToIndexList(MatrixBlock mb)
-
convertToIntVector
public static int[] convertToIntVector(MatrixBlock mb)
-
convertToLongVector
public static long[] convertToLongVector(MatrixBlock mb)
-
convertToDenseBlock
public static DenseBlock convertToDenseBlock(MatrixBlock mb)
-
convertToDenseBlock
public static DenseBlock convertToDenseBlock(MatrixBlock mb, boolean deep)
-
convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb)
-
convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb, boolean deep)
-
convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb, boolean deep, boolean allowNull)
-
convertToDoubleList
public static List<Double> convertToDoubleList(MatrixBlock mb)
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(double[][] data)
Creates a dense Matrix Block and copies the given double matrix into it.
- Parameters:
data - 2d double array
- Returns:
matrix block
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(int[][] data)
Converts an int matrix into a MatrixBlock.
- Parameters:
data - int matrix input that is converted to a double MatrixBlock
- Returns:
the constructed MatrixBlock
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(double[] data, boolean columnVector)
Creates a dense Matrix Block and copies the given double vector into it.
- Parameters:
data - double array
columnVector - if true, create a matrix with a single column; if false, create a matrix with a single row
- Returns:
matrix block
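For the array-based overloads above, the output shapes follow directly from the descriptions: an `int[][]` is widened element by element (every `int` is exactly representable as a `double`), and a vector of length n becomes n x 1 when `columnVector` is true and 1 x n otherwise. The sketch below shows only that shape logic on plain arrays; `DenseInputSketch` is hypothetical and does not build a MatrixBlock.

```java
/** Hypothetical sketch (not SystemDS code) of the shapes produced by the array-based overloads. */
final class DenseInputSketch {
    /** int[][] -> double[][]: widening conversion, no precision loss. */
    static double[][] widen(int[][] data) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            out[i] = new double[data[i].length];
            for (int j = 0; j < data[i].length; j++)
                out[i][j] = data[i][j];
        }
        return out;
    }

    /** A length-n vector becomes an n x 1 matrix if columnVector is true, else 1 x n. */
    static double[][] vectorAsMatrix(double[] data, boolean columnVector) {
        if (columnVector) {
            double[][] out = new double[data.length][1];
            for (int i = 0; i < data.length; i++)
                out[i][0] = data[i];
            return out;
        }
        return new double[][] { data.clone() }; // copy, so the caller's array is not aliased
    }
}
```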
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(HashMap<MatrixIndexes,Double> map)
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(HashMap<MatrixIndexes,Double> map, int rlen, int clen)
NOTE: this method also ensures the specified matrix dimensions.
- Parameters:
map - map of matrix index keys and double values
rlen - number of rows
clen - number of columns
- Returns:
matrix block
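The NOTE above matters because a cell map alone cannot express trailing all-zero rows or columns, so the dimensions must come from `rlen` and `clen`. Below is a minimal sketch on plain arrays, assuming 1-based (row, column) keys and silently dropping out-of-range entries; both are assumptions for illustration, not documented behavior. `CellIndex` is a hypothetical stand-in for `MatrixIndexes`.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for MatrixIndexes: a (row, column) key, assumed 1-based here. */
final class CellIndex {
    final long row;
    final long col;

    CellIndex(long row, long col) {
        this.row = row;
        this.col = col;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CellIndex && ((CellIndex) o).row == row && ((CellIndex) o).col == col;
    }

    @Override
    public int hashCode() {
        return Objects.hash(row, col);
    }
}

final class CellMapSketch {
    /** Builds a dense rlen x clen array; the output always has the requested dimensions. */
    static double[][] toDense(Map<CellIndex, Double> map, int rlen, int clen) {
        double[][] out = new double[rlen][clen];
        for (Map.Entry<CellIndex, Double> e : map.entrySet()) {
            long r = e.getKey().row;
            long c = e.getKey().col;
            if (r >= 1 && r <= rlen && c >= 1 && c <= clen) // out-of-range keys are ignored
                out[(int) r - 1][(int) c - 1] = e.getValue();
        }
        return out;
    }
}
```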
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(CTableMap map)
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(CTableMap map, int rlen, int clen)
NOTE: this method also ensures the specified matrix dimensions.
- Parameters:
map - ?
rlen - number of rows
clen - number of columns
- Returns:
matrix block
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(FrameBlock frame)
Converts a frame block with arbitrary schema into a matrix block. Since a matrix block only supports value type double, non-double types are converted on a best-effort basis, which may fail for non-numeric data.
- Parameters:
frame - frame block
- Returns:
matrix block
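A best-effort conversion can be pictured as a per-cell cast that succeeds for numbers, booleans, and numeric strings and fails otherwise. In the hypothetical sketch below, null maps to 0 and booleans map to 1/0; these mappings are assumptions for illustration, and the exact rules SystemDS applies may differ.

```java
/** Hypothetical sketch (not SystemDS code) of a best-effort cell conversion to double. */
final class BestEffortCast {
    /** null -> 0 and booleans -> 1/0 (assumed); other strings go through Double.parseDouble. */
    static double toDouble(Object cell) {
        if (cell == null)
            return 0;
        if (cell instanceof Number)
            return ((Number) cell).doubleValue();
        if (cell instanceof Boolean)
            return ((Boolean) cell) ? 1 : 0;
        String s = cell.toString().trim();
        if (s.equalsIgnoreCase("true"))
            return 1;
        if (s.equalsIgnoreCase("false"))
            return 0;
        return Double.parseDouble(s); // throws NumberFormatException for e.g. "abc"
    }

    /** Applies the cell conversion to a row-major table of arbitrary objects. */
    static double[][] convert(Object[][] frame) {
        double[][] out = new double[frame.length][];
        for (int i = 0; i < frame.length; i++) {
            out[i] = new double[frame[i].length];
            for (int j = 0; j < frame[i].length; j++)
                out[i][j] = toDouble(frame[i][j]);
        }
        return out;
    }
}
```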
-
convertToStringFrame
public static String[][] convertToStringFrame(FrameBlock frame)
Converts a frame block with arbitrary schema into a two-dimensional string array.
- Parameters:
frame - frame block
- Returns:
2d string array
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data)
Converts a two-dimensional string array into a frame block of value type string. If the given array is null or of length 0, an empty frame block is returned.
- Parameters:
data - 2d string array
- Returns:
frame block
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data, Types.ValueType[] schema)
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data, Types.ValueType[] schema, String[] colnames)
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb)
Converts a matrix block into a frame block of value type double.
- Parameters:
mb - matrix block
- Returns:
frame block of type double
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, int k)
Converts a matrix block into a frame block of value type double.
- Parameters:
mb - matrix block
k - parallelization degree
- Returns:
frame block of type double
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType vt)
Converts a matrix block into a frame block of the given value type.
- Parameters:
mb - matrix block
vt - target value type
- Returns:
frame block of the given value type
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType vt, int k)
Converts a matrix block into a frame block of the given value type.
- Parameters:
mb - matrix block
vt - value type
k - parallelization degree
- Returns:
frame block of the given value type
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType[] schema)
Converts a matrix block into a frame block with the given schema.
- Parameters:
mb - matrix block
schema - schema
- Returns:
frame block with the given schema
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType[] schema, int k)
Converts a matrix block into a frame block with the given schema.
- Parameters:
mb - matrix block
schema - schema
k - parallelization degree
- Returns:
frame block with the given schema
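Several overloads above take a parallelization degree `k`. One plausible way to use it, sketched here under the assumption that work is split by column (the actual SystemDS strategy is not documented on this page), is to give each of up to `k` tasks a disjoint column range while copying a row-major matrix into the column-oriented layout a frame uses. `ParallelColumnCopy` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: row-major matrix -> per-column arrays, using up to k threads. */
final class ParallelColumnCopy {
    static double[][] toColumns(double[][] rows, int k) {
        int nRow = rows.length;
        int nCol = nRow == 0 ? 0 : rows[0].length;
        double[][] cols = new double[nCol][nRow];
        int threads = Math.max(1, Math.min(k, nCol));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            int blk = (nCol + threads - 1) / threads; // columns per task, rounded up
            for (int start = 0; start < nCol; start += blk) {
                final int s = start;
                final int e = Math.min(nCol, start + blk);
                // each task writes a disjoint column range, so no locking is needed
                tasks.add(pool.submit(() -> {
                    for (int j = s; j < e; j++)
                        for (int i = 0; i < nRow; i++)
                            cols[j][i] = rows[i][j];
                }));
            }
            for (Future<?> f : tasks) {
                try {
                    f.get(); // waits for completion and surfaces task failures
                } catch (InterruptedException | ExecutionException ex) {
                    throw new IllegalStateException(ex);
                }
            }
        } finally {
            pool.shutdown();
        }
        return cols;
    }
}
```

Splitting by column keeps tasks independent; for very narrow matrices a row split would balance better, which is one reason the real implementation may choose differently.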
-
convertToTensorBlock
public static TensorBlock convertToTensorBlock(MatrixBlock mb, Types.ValueType vt, boolean toBasicTensor)
-
convertToMatrixBlockPartitions
public static MatrixBlock[] convertToMatrixBlockPartitions(MatrixBlock mb, boolean colwise)
-
convertToArray2DRowRealMatrix
public static org.apache.commons.math3.linear.Array2DRowRealMatrix convertToArray2DRowRealMatrix(MatrixBlock mb)
Helper method that converts a SystemDS matrix block into the Array2DRowRealMatrix format, which is useful for invoking Apache Commons Math.
- Parameters:
mb - matrix block
- Returns:
matrix as a commons-math3 Array2DRowRealMatrix
-
convertToBlockRealMatrix
public static org.apache.commons.math3.linear.BlockRealMatrix convertToBlockRealMatrix(MatrixBlock mb)
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(org.apache.commons.math3.linear.RealMatrix rm)
-
copyToDoubleVector
public static void copyToDoubleVector(MatrixBlock mb, double[] dest, int destPos)
-
toString
public static String toString(MatrixBlock mb)
-
toString
public static String toString(MatrixBlock mb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
Returns a string representation of a matrix.
- Parameters:
mb - matrix block
sparse - if true, the string contains a table with row index, column index, and value for every value != 0.0; otherwise it is a rectangular string with all values of the matrix block
separator - separator string between the elements of a row, or between the columns in sparse format
lineseparator - separator string between rows
rowsToPrint - maximum number of rows to print, -1 for all
colsToPrint - maximum number of columns to print, -1 for all
decimal - number of decimal places to print, -1 for default
- Returns:
matrix as a string
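To make the `sparse` flag and the print limits concrete, here is a hypothetical formatter over a plain `double[][]` that follows the parameter descriptions above. The 1-based indexes in sparse mode and the `Double.toString` fallback for `decimal = -1` are assumptions; the real output of `toString` may differ in detail.

```java
import java.util.Locale;

/** Hypothetical formatter (not SystemDS code) following the documented toString parameters. */
final class MatrixPrinter {
    static String format(double[][] m, boolean sparse, String sep, String lineSep,
                         int rowsToPrint, int colsToPrint, int decimal) {
        int rows = rowsToPrint < 0 ? m.length : Math.min(rowsToPrint, m.length);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            int cols = colsToPrint < 0 ? m[i].length : Math.min(colsToPrint, m[i].length);
            for (int j = 0; j < cols; j++) {
                if (sparse) {
                    if (m[i][j] == 0.0)
                        continue; // sparse output lists non-zero cells only
                    // 1-based indexes are an assumption of this sketch
                    sb.append(i + 1).append(sep).append(j + 1).append(sep)
                      .append(fmt(m[i][j], decimal)).append(lineSep);
                } else {
                    sb.append(fmt(m[i][j], decimal)).append(j < cols - 1 ? sep : lineSep);
                }
            }
        }
        return sb.toString();
    }

    /** decimal < 0 falls back to Double.toString; otherwise a fixed number of decimal places. */
    private static String fmt(double v, int decimal) {
        return decimal < 0 ? Double.toString(v) : String.format(Locale.ROOT, "%." + decimal + "f", v);
    }
}
```

For `{{1, 0}, {0, 2.5}}` with `decimal = 1`, dense mode yields `"1.0 0.0\n0.0 2.5\n"`, while sparse mode yields one line per non-zero, `"1 1 1.0\n2 2 2.5\n"`.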
-
toString
public static String toString(TensorBlock tb)
-
toString
public static String toString(TensorBlock tb, boolean sparse, String separator, String lineseparator, String leftBorder, String rightBorder, int rowsToPrint, int colsToPrint, int decimal)
Returns a string representation of a tensor.
- Parameters:
tb - tensor block
sparse - if true, the string contains a table with row index, column index, and value for every value != 0.0; otherwise it is a rectangular string with all values of the tensor block
separator - separator string between the elements of a row, or between the columns in sparse format
lineseparator - separator string between rows
leftBorder - characters placed at the start of a new dimension level
rightBorder - characters placed at the end of a new dimension level
rowsToPrint - maximum number of rows to print, -1 for all
colsToPrint - maximum number of columns to print, -1 for all
decimal - number of decimal places to print, -1 for default
- Returns:
tensor as a string
-
toString
public static String toString(FrameBlock fb)
-
toString
public static String toString(FrameBlock fb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
-
toString
public static String toString(ListObject list, int rows, int cols, boolean sparse, String separator, String lineSeparator, int rowsToPrint, int colsToPrint, int decimal)
-
getTensorDimensions
public static int[] getTensorDimensions(ExecutionContext ec, CPOperand dims)
-
toDouble
public static double[] toDouble(float[] data)
-
toDouble
public static double[] toDouble(long[] data)
-
toDouble
public static double[] toDouble(int[] data)
-
toDouble
public static double[] toDouble(BitSet data, int len)
-
toDouble
public static double[] toDouble(String[] data)
-
toFloat
public static float[] toFloat(double[] data)
-
toInt
public static int[] toInt(double[] data)
-
toLong
public static long[] toLong(double[] data)
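The element-wise `toFloat`, `toInt`, and `toLong` helpers narrow each `double`, but their exact rounding is not documented here. The hypothetical sketch below uses plain Java casts to show the hazards any such narrowing has to handle: truncation toward zero, NaN becoming 0, saturation at the integer range limits, and float precision loss. The library itself may round rather than truncate.

```java
/** Hypothetical element-wise narrowing via plain Java casts; the library may round differently. */
final class NarrowingSketch {
    static int[] toInt(double[] data) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++)
            out[i] = (int) data[i]; // truncates toward zero; NaN -> 0; clamps to Integer.MIN_VALUE/MAX_VALUE
        return out;
    }

    static long[] toLong(double[] data) {
        long[] out = new long[data.length];
        for (int i = 0; i < data.length; i++)
            out[i] = (long) data[i]; // same rules as above, with the long range
        return out;
    }

    static float[] toFloat(double[] data) {
        float[] out = new float[data.length];
        for (int i = 0; i < data.length; i++)
            out[i] = (float) data[i]; // rounds to the nearest float; may lose precision
        return out;
    }
}
```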
-
toBitSet
public static BitSet toBitSet(double[] data)
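`toBitSet` and `toDouble(BitSet, int)` behave as inverses if non-zero values map to set bits; that mapping is an assumption of the sketch below. The sketch also shows why the reverse direction takes an explicit length: `BitSet.length()` only reaches the highest set bit, so trailing zeros cannot be recovered from the BitSet alone. `BitSetSketch` is hypothetical.

```java
import java.util.BitSet;

/** Hypothetical sketch of a double[] <-> BitSet round trip; assumes "non-zero means set". */
final class BitSetSketch {
    static BitSet toBitSet(double[] data) {
        BitSet bs = new BitSet(data.length);
        for (int i = 0; i < data.length; i++)
            if (data[i] != 0)
                bs.set(i);
        return bs;
    }

    /** len is required because a BitSet does not remember trailing clear bits. */
    static double[] toDouble(BitSet bs, int len) {
        double[] out = new double[len];
        for (int i = bs.nextSetBit(0); i >= 0 && i < len; i = bs.nextSetBit(i + 1))
            out[i] = 1;
        return out;
    }
}
```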
-
toString
public static String[] toString(double[] data)