IOUtilFunctions (SystemDS 2.0.0 API)

java.lang.Object
- org.apache.sysds.runtime.io.IOUtilFunctions

public class IOUtilFunctions
extends Object

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`IOUtilFunctions.CountRowsTask`

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`EMPTY_TEXT_LINE`
`static org.apache.hadoop.fs.PathFilter`	`hiddenFileFilter`
`static String`	`LIBSVM_DELIM`
`static String`	`LIBSVM_INDEX_DELIM`

Constructor Summary

Constructors
Constructor and Description
`IOUtilFunctions()`

Method Summary

Modifier and Type	Method and Description
`static int`	`baToInt(byte[] ba, int off)`
`static long`	`baToLong(byte[] ba, int off)`
`static int`	`baToShort(byte[] ba, int off)`
`static void`	`checkAndRaiseErrorCSVEmptyField(String row, boolean fill, boolean emptyFound)`
`static void`	`checkAndRaiseErrorCSVNumColumns(String fname, String line, String[] parts, long ncol)`
`static void`	`closeSilently(Closeable io)`
`static void`	`closeSilently(org.apache.hadoop.mapred.RecordReader<?,?> rr)`
`static int`	`countNnz(String[] cols)` Returns the number of non-zero entries but avoids the expensive string to double parsing.
`static int`	`countNnz(String[] cols, int pos, int len)` Returns the number of non-zero entries but avoids the expensive string to double parsing.
`static int`	`countNumColumnsCSV(org.apache.hadoop.mapred.InputSplit[] splits, org.apache.hadoop.mapred.InputFormat informat, org.apache.hadoop.mapred.JobConf job, String delim)` Counts the number of columns in a given collection of csv file splits.
`static int`	`countTokensCSV(String str, String delim)` Counts the number of tokens defined by the given delimiter, respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
`static void`	`deleteCrcFilesFromLocalFileSystem(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)` Delete the CRC files from the local file system associated with a particular file and its metadata file.
`static <T> T`	`get(Future<T> in)`
`static byte[]`	`getBytes(ByteBuffer buff)`
`static org.apache.hadoop.fs.FileSystem`	`getFileSystem(org.apache.hadoop.conf.Configuration conf)`
`static org.apache.hadoop.fs.FileSystem`	`getFileSystem(org.apache.hadoop.fs.Path fname)`
`static org.apache.hadoop.fs.FileSystem`	`getFileSystem(org.apache.hadoop.fs.Path fname, org.apache.hadoop.conf.Configuration conf)`
`static org.apache.hadoop.fs.FileSystem`	`getFileSystem(String fname)`
`static String`	`getPartFileName(int pos)`
`static org.apache.hadoop.fs.Path[]`	`getSequenceFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file)`
`static int`	`getUTFSize(String value)` Returns the serialized size in bytes of the given string value, following the modified UTF-8 specification as used by Java's DataInput/DataOutput.
`static void`	`intToBa(int val, byte[] ba, int off)`
`static boolean`	`isObjectStoreFileScheme(org.apache.hadoop.fs.Path path)`
`static boolean`	`isSameFileScheme(org.apache.hadoop.fs.Path path1, org.apache.hadoop.fs.Path path2)`
`static void`	`longToBa(long val, byte[] ba, int off)`
`static FileFormatPropertiesMM`	`readAndParseMatrixMarketHeader(String filename)`
`static String[]`	`readMatrixMarketHeader(String filename)`
`static void`	`shortToBa(int val, byte[] ba, int off)`
`static org.apache.hadoop.mapred.InputSplit[]`	`sortInputSplits(org.apache.hadoop.mapred.InputSplit[] splits)`
`static String[]`	`split(String str, String delim)` Splits a string by a specified delimiter into all tokens, including empty.
`static String[]`	`splitByFirst(String str, String delim)`
`static String[]`	`splitCSV(String str, String delim)` Splits a string by a specified delimiter into all tokens, including empty while respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
`static String[]`	`splitCSV(String str, String delim, String[] tokens, Set<String> naStrings)` Splits a string by a specified delimiter into all tokens, including empty while respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
`static InputStream`	`toInputStream(String input)`
`static String`	`toString(InputStream input)`

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

hiddenFileFilter

public static final org.apache.hadoop.fs.PathFilter hiddenFileFilter

EMPTY_TEXT_LINE

public static final String EMPTY_TEXT_LINE

See Also: Constant Field Values

LIBSVM_DELIM

public static final String LIBSVM_DELIM

See Also: Constant Field Values

LIBSVM_INDEX_DELIM

public static final String LIBSVM_INDEX_DELIM

See Also: Constant Field Values

Constructor Detail
IOUtilFunctions
```
public IOUtilFunctions()
```

Method Detail

getFileSystem

public static org.apache.hadoop.fs.FileSystem getFileSystem(String fname)
                                                     throws IOException

Throws: IOException

getFileSystem

public static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.fs.Path fname)
                                                     throws IOException

Throws: IOException

getFileSystem

public static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.conf.Configuration conf)
                                                     throws IOException

Throws: IOException

getFileSystem

public static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.fs.Path fname,
                                                            org.apache.hadoop.conf.Configuration conf)
                                                     throws IOException

Throws: IOException

isSameFileScheme

public static boolean isSameFileScheme(org.apache.hadoop.fs.Path path1,
                                       org.apache.hadoop.fs.Path path2)

isObjectStoreFileScheme

public static boolean isObjectStoreFileScheme(org.apache.hadoop.fs.Path path)

getPartFileName

public static String getPartFileName(int pos)

closeSilently

public static void closeSilently(Closeable io)

closeSilently

public static void closeSilently(org.apache.hadoop.mapred.RecordReader<?,?> rr)

checkAndRaiseErrorCSVEmptyField

public static void checkAndRaiseErrorCSVEmptyField(String row,
                                                   boolean fill,
                                                   boolean emptyFound)
                                            throws IOException

Throws: IOException

checkAndRaiseErrorCSVNumColumns

public static void checkAndRaiseErrorCSVNumColumns(String fname,
                                                   String line,
                                                   String[] parts,
                                                   long ncol)
                                            throws IOException

Throws: IOException

split
```
public static String[] split(String str,
                             String delim)
```
Splits a string by a specified delimiter into all tokens, including empty. NOTE: This method is meant as a faster drop-in replacement of the regular string split.

Parameters:

str - string to split

delim - delimiter

Returns:

string array
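
The note above contrasts this method with the regular string split. As a rough illustration of one way such a split can work without regular expressions, here is a minimal sketch based on `String.indexOf`. The class `SplitSketch` and the method `splitKeepEmpty` are hypothetical names for this example and are not part of SystemDS; the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
    // Hypothetical helper: splits str on every occurrence of delim and keeps
    // all empty tokens, including leading and trailing ones.
    static String[] splitKeepEmpty(String str, String delim) {
        if (delim.isEmpty())
            throw new IllegalArgumentException("delimiter must not be empty");
        List<String> tokens = new ArrayList<>();
        int from = 0;
        int to;
        // Plain substring search, no regular expression involved.
        while ((to = str.indexOf(delim, from)) >= 0) {
            tokens.add(str.substring(from, to));
            from = to + delim.length();
        }
        tokens.add(str.substring(from)); // last token, possibly empty
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", splitKeepEmpty("a,,b,", ",")));
    }
}
```

For comparison, `"a,,b,".split(",")` in plain Java drops the trailing empty token, whereas the sketch keeps it.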

splitCSV
```
public static String[] splitCSV(String str,
                                String delim)
```
Splits a string by a specified delimiter into all tokens, including empty while respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.

Parameters:

str - string to split

delim - delimiter

Returns:

string array of tokens
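
To make the RFC4180 quoting rules concrete, the following sketch splits a line while treating delimiters inside double quotes as literal text and a doubled quote as an escaped quote. `CsvSplitSketch` and `splitQuoted` are hypothetical names, and the sketch strips the enclosing quotes from each token; whether `splitCSV` itself strips or keeps them is not stated in this page, so treat that detail as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvSplitSketch {
    // Hypothetical helper: quote-aware split in the spirit of RFC4180.
    static String[] splitQuoted(String line, String delim) {
        if (delim.isEmpty())
            throw new IllegalArgumentException("delimiter must not be empty");
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // doubled quote is an escaped quote
                    i += 2;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                    i++;
                } else {
                    cur.append(c); // delimiters inside quotes are literal
                    i++;
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
                i++;
            } else if (line.startsWith(delim, i)) {
                tokens.add(cur.toString()); // token boundary, may be empty
                cur.setLength(0);
                i += delim.length();
            } else {
                cur.append(c);
                i++;
            }
        }
        tokens.add(cur.toString());
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String t : splitQuoted("a,\"b,c\",,\"say \"\"hi\"\"\"", ","))
            System.out.println("[" + t + "]");
    }
}
```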

splitCSV
```
public static String[] splitCSV(String str,
                                String delim,
                                String[] tokens,
                                Set<String> naStrings)
```
Splits a string by a specified delimiter into all tokens, including empty while respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.

Parameters:

str - string to split

delim - delimiter

tokens - array for tokens, length needs to match the number of tokens

naStrings - the strings to map to null value.

Returns:

string array of tokens

countTokensCSV
```
public static int countTokensCSV(String str,
                                 String delim)
```
Counts the number of tokens defined by the given delimiter, respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.

Parameters:

str - string to split

delim - delimiter

Returns:

number of tokens split by the given delimiter
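
Counting tokens does not require building the tokens themselves. The sketch below only tracks whether the scan is currently inside a quoted field and counts delimiters found outside quotes. `CsvCountSketch` and `countQuotedTokens` are hypothetical names, and returning 1 for an empty line is a choice made for this example, not documented behavior of `countTokensCSV`.

```java
final class CsvCountSketch {
    // Hypothetical helper: counts delimiter-separated tokens, ignoring
    // delimiters that appear inside double-quoted fields.
    static int countQuotedTokens(String line, String delim) {
        if (delim.isEmpty())
            throw new IllegalArgumentException("delimiter must not be empty");
        int count = 1; // a line without delimiters is a single token
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            if (line.charAt(i) == '"') {
                // A doubled quote toggles the flag twice, so the state after
                // an escaped quote is the same as before it.
                inQuotes = !inQuotes;
                i++;
            } else if (!inQuotes && line.startsWith(delim, i)) {
                count++;
                i += delim.length();
            } else {
                i++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countQuotedTokens("a,\"b,c\",,d", ","));
    }
}
```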

splitByFirst

public static String[] splitByFirst(String str,
                                    String delim)

readAndParseMatrixMarketHeader

public static FileFormatPropertiesMM readAndParseMatrixMarketHeader(String filename)
                                                             throws DMLRuntimeException

Throws: DMLRuntimeException

readMatrixMarketHeader

public static String[] readMatrixMarketHeader(String filename)

countNnz
```
public static int countNnz(String[] cols)
```
Returns the number of non-zero entries but avoids the expensive string to double parsing. This function is guaranteed to never underestimate.

Parameters:

cols - string array

Returns:

number of non-zeros
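
One way to count non-zeros without parsing is to compare each token against a few literal spellings of zero and count everything else. The sketch below assumes the literals `""`, `"0"` and `"0.0"`; that set, and the names `NnzSketch` and `countNonZeroTokens`, are assumptions for illustration rather than the documented behavior of `countNnz`. Any other spelling of zero, such as `"0.00"`, is counted as non-zero, so the result can overestimate but never underestimates.

```java
final class NnzSketch {
    // Hypothetical helper: counts tokens that are not one of the assumed
    // zero literals, without converting any token to a double.
    static int countNonZeroTokens(String[] cols) {
        int nnz = 0;
        for (String col : cols) {
            if (!col.isEmpty() && !col.equals("0") && !col.equals("0.0"))
                nnz++;
        }
        return nnz;
    }

    public static void main(String[] args) {
        String[] row = {"1.5", "0", "", "0.0", "0.00", "-2"};
        // Two values are truly non-zero; "0.00" is counted as well.
        System.out.println(countNonZeroTokens(row));
    }
}
```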

countNnz
```
public static int countNnz(String[] cols,
                           int pos,
                           int len)
```
Returns the number of non-zero entries but avoids the expensive string to double parsing. This function is guaranteed to never underestimate.

Parameters:

cols - string array

pos - starting array index

len - ending array index

Returns:

number of non-zeros

getUTFSize
```
public static int getUTFSize(String value)
```
Returns the serialized size in bytes of the given string value, following the modified UTF-8 specification as used by Java's DataInput/DataOutput. See the Java documentation: docs/api/java/io/DataInput.html#modified-utf-8

Parameters:

value - string value

Returns:

string size for modified UTF-8 specification
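
In modified UTF-8, characters U+0001 to U+007F take one byte, U+0000 and U+0080 to U+07FF take two bytes, and all other `char` values take three bytes; `DataOutput.writeUTF` additionally writes a two-byte length prefix. The sketch below computes that size and cross-checks it against `DataOutputStream.writeUTF`. `UtfSizeSketch` and its methods are hypothetical names, and including the two-byte prefix in the result is an assumption of this sketch, since this page does not say whether `getUTFSize` counts it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class UtfSizeSketch {
    // Hypothetical helper: bytes written by DataOutput.writeUTF for value.
    static int modifiedUtf8Size(String value) {
        int size = 2; // two-byte length prefix written by writeUTF
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c >= 0x0001 && c <= 0x007F)
                size += 1;
            else if (c <= 0x07FF)
                size += 2; // also covers U+0000
            else
                size += 3; // surrogates are encoded individually
        }
        return size;
    }

    // Reference value: actually serialize the string and measure the output.
    static int writtenSize(String value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(modifiedUtf8Size(s) + " == " + writtenSize(s));
    }
}
```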

toInputStream

public static InputStream toInputStream(String input)

toString

public static String toString(InputStream input)
                       throws IOException

Throws: IOException

sortInputSplits

public static org.apache.hadoop.mapred.InputSplit[] sortInputSplits(org.apache.hadoop.mapred.InputSplit[] splits)

countNumColumnsCSV
```
public static int countNumColumnsCSV(org.apache.hadoop.mapred.InputSplit[] splits,
                                     org.apache.hadoop.mapred.InputFormat informat,
                                     org.apache.hadoop.mapred.JobConf job,
                                     String delim)
                              throws IOException
```
Counts the number of columns in a given collection of csv file splits. This primitive stops as soon as it finds a row with more than 0 columns and is hence robust against empty file splits and similar cases.

Parameters:

splits - input splits

informat - input format

job - job configuration

delim - delimiter

Returns:

the number of columns in the collection of csv file splits

Throws:

IOException - if IOException occurs
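
The early-exit idea described above can be illustrated without Hadoop. In the sketch below each split is modeled as a plain list of lines, which is a stand-in for `InputSplit` and a record reader and not the real API; `ColumnCountSketch` and `countColumns` are hypothetical names. The scan skips empty splits and blank rows and returns at the first row that has at least one column.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class ColumnCountSketch {
    // Hypothetical helper: returns the column count of the first non-blank
    // row across all splits, or 0 if every split is empty.
    static int countColumns(List<List<String>> splits, String delim) {
        for (List<String> split : splits) {
            for (String row : split) {
                if (!row.trim().isEmpty()) {
                    // Quote the delimiter so it is matched literally and use
                    // limit -1 so trailing empty columns are counted.
                    return row.split(Pattern.quote(delim), -1).length;
                }
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Collections.<String>emptyList(),      // empty split
            Arrays.asList("", "1,2,3", "4,5,6")); // blank row, then data
        System.out.println(countColumns(splits, ","));
    }
}
```

This sketch uses a plain delimiter split for brevity; a quote-aware count would be needed for rows that contain quoted delimiters.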

getSequenceFilePaths

public static org.apache.hadoop.fs.Path[] getSequenceFilePaths(org.apache.hadoop.fs.FileSystem fs,
                                                               org.apache.hadoop.fs.Path file)
                                                        throws IOException

Throws: IOException

deleteCrcFilesFromLocalFileSystem
```
public static void deleteCrcFilesFromLocalFileSystem(org.apache.hadoop.fs.FileSystem fs,
                                                     org.apache.hadoop.fs.Path path)
                                              throws IOException
```
Delete the CRC files from the local file system associated with a particular file and its metadata file.

Parameters:

fs - the file system

path - the path to a file

Throws:

IOException - thrown if an error occurs while attempting to delete CRC files

baToShort

public static int baToShort(byte[] ba,
                            int off)

baToInt

public static int baToInt(byte[] ba,
                          int off)

baToLong

public static long baToLong(byte[] ba,
                            int off)

shortToBa

public static void shortToBa(int val,
                             byte[] ba,
                             int off)

intToBa

public static void intToBa(int val,
                           byte[] ba,
                           int off)

longToBa

public static void longToBa(long val,
                            byte[] ba,
                            int off)

getBytes

public static byte[] getBytes(ByteBuffer buff)

get

public static <T> T get(Future<T> in)
