public class FrameRDDConverterUtils extends Object
Modifier and Type | Class and Description |
---|---|
static class |
FrameRDDConverterUtils.LongFrameToLongWritableFrameFunction |
static class |
FrameRDDConverterUtils.LongWritableFrameToLongFrameFunction |
static class |
FrameRDDConverterUtils.LongWritableTextToLongTextFunction |
static class |
FrameRDDConverterUtils.LongWritableToSerFunction |
Constructor and Description |
---|
FrameRDDConverterUtils() |
Modifier and Type | Method and Description |
---|---|
static org.apache.spark.api.java.JavaRDD<String> |
binaryBlockToCsv(org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> in,
MatrixCharacteristics mcIn,
org.apache.sysml.runtime.io.FileFormatPropertiesCSV props,
boolean strict) |
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
binaryBlockToDataFrame(org.apache.spark.sql.SparkSession sparkSession,
org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> in,
MatrixCharacteristics mc,
org.apache.sysml.parser.Expression.ValueType[] schema) |
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
binaryBlockToDataFrame(org.apache.spark.sql.SQLContext sqlContext,
org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> in,
MatrixCharacteristics mc,
org.apache.sysml.parser.Expression.ValueType[] schema)
Deprecated.
|
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> |
binaryBlockToMatrixBlock(org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> input,
MatrixCharacteristics mcIn,
MatrixCharacteristics mcOut) |
static org.apache.spark.api.java.JavaRDD<String> |
binaryBlockToTextCell(org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> input,
MatrixCharacteristics mcIn) |
static int |
convertDFSchemaToFrameSchema(org.apache.spark.sql.types.StructType dfschema,
String[] colnames,
org.apache.sysml.parser.Expression.ValueType[] fschema,
boolean containsID)
NOTE: regarding the support of vector columns, we make the following
schema restriction: single vector column, which allows inference of
the vector length without data access and covers the common case.
|
static org.apache.spark.sql.types.StructType |
convertFrameSchemaToDFSchema(org.apache.sysml.parser.Expression.ValueType[] fschema,
boolean containsID)
Converts a Frame schema into the corresponding DataFrame schema
|
static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> |
csvToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.api.java.JavaPairRDD<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> input,
MatrixCharacteristics mc,
org.apache.sysml.parser.Expression.ValueType[] schema,
boolean hasHeader,
String delim,
boolean fill,
double fillValue) |
static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> |
csvToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.api.java.JavaRDD<String> input,
MatrixCharacteristics mcOut,
org.apache.sysml.parser.Expression.ValueType[] schema,
boolean hasHeader,
String delim,
boolean fill,
double fillValue) |
static org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> |
csvToRowRDD(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.api.java.JavaRDD<String> dataRdd,
String delim,
org.apache.sysml.parser.Expression.ValueType[] schema) |
static org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> |
csvToRowRDD(org.apache.spark.api.java.JavaSparkContext sc,
String fnameIn,
String delim,
org.apache.sysml.parser.Expression.ValueType[] schema) |
static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> |
dataFrameToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df,
MatrixCharacteristics mc,
boolean containsID) |
static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> |
dataFrameToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df,
MatrixCharacteristics mc,
boolean containsID,
Pair<String[],org.apache.sysml.parser.Expression.ValueType[]> out) |
static org.apache.spark.api.java.JavaPairRDD<org.apache.hadoop.io.LongWritable,FrameBlock> |
matrixBlockToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input,
MatrixCharacteristics mcIn) |
static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> |
matrixBlockToBinaryBlockLongIndex(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input,
MatrixCharacteristics mcIn) |
static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> |
textCellToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.api.java.JavaPairRDD<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> in,
MatrixCharacteristics mcOut,
org.apache.sysml.parser.Expression.ValueType[] schema) |
static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> |
textCellToBinaryBlockLongIndex(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.api.java.JavaPairRDD<Long,org.apache.hadoop.io.Text> input,
MatrixCharacteristics mc,
org.apache.sysml.parser.Expression.ValueType[] schema) |
public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> csvToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.api.java.JavaPairRDD<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> input, MatrixCharacteristics mc, org.apache.sysml.parser.Expression.ValueType[] schema, boolean hasHeader, String delim, boolean fill, double fillValue)
public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> csvToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.api.java.JavaRDD<String> input, MatrixCharacteristics mcOut, org.apache.sysml.parser.Expression.ValueType[] schema, boolean hasHeader, String delim, boolean fill, double fillValue)
public static org.apache.spark.api.java.JavaRDD<String> binaryBlockToCsv(org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> in, MatrixCharacteristics mcIn, org.apache.sysml.runtime.io.FileFormatPropertiesCSV props, boolean strict)
public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> textCellToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.api.java.JavaPairRDD<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> in, MatrixCharacteristics mcOut, org.apache.sysml.parser.Expression.ValueType[] schema)
public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> textCellToBinaryBlockLongIndex(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.api.java.JavaPairRDD<Long,org.apache.hadoop.io.Text> input, MatrixCharacteristics mc, org.apache.sysml.parser.Expression.ValueType[] schema)
public static org.apache.spark.api.java.JavaRDD<String> binaryBlockToTextCell(org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> input, MatrixCharacteristics mcIn)
public static org.apache.spark.api.java.JavaPairRDD<org.apache.hadoop.io.LongWritable,FrameBlock> matrixBlockToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input, MatrixCharacteristics mcIn)
public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> matrixBlockToBinaryBlockLongIndex(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input, MatrixCharacteristics mcIn)
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> binaryBlockToMatrixBlock(org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> input, MatrixCharacteristics mcIn, MatrixCharacteristics mcOut)
public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> dataFrameToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, MatrixCharacteristics mc, boolean containsID)
public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> dataFrameToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, MatrixCharacteristics mc, boolean containsID, Pair<String[],org.apache.sysml.parser.Expression.ValueType[]> out)
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> binaryBlockToDataFrame(org.apache.spark.sql.SparkSession sparkSession, org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> in, MatrixCharacteristics mc, org.apache.sysml.parser.Expression.ValueType[] schema)
@Deprecated public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> binaryBlockToDataFrame(org.apache.spark.sql.SQLContext sqlContext, org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> in, MatrixCharacteristics mc, org.apache.sysml.parser.Expression.ValueType[] schema)
public static org.apache.spark.sql.types.StructType convertFrameSchemaToDFSchema(org.apache.sysml.parser.Expression.ValueType[] fschema, boolean containsID)
Parameters:
fschema - frame schema
containsID - true if contains ID column

public static int convertDFSchemaToFrameSchema(org.apache.spark.sql.types.StructType dfschema, String[] colnames, org.apache.sysml.parser.Expression.ValueType[] fschema, boolean containsID)
Parameters:
dfschema - schema as StructType
colnames - column names
fschema - array of SystemML ValueTypes
containsID - if true, contains ID column

public static org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> csvToRowRDD(org.apache.spark.api.java.JavaSparkContext sc, String fnameIn, String delim, org.apache.sysml.parser.Expression.ValueType[] schema)
Copyright © 2018 The Apache Software Foundation. All rights reserved.