Class DataPartitionerLocal
java.lang.Object
  org.apache.sysds.runtime.controlprogram.parfor.DataPartitioner
    org.apache.sysds.runtime.controlprogram.parfor.DataPartitionerLocal
public class DataPartitionerLocal extends DataPartitioner
Partitions a given matrix into row or column partitions using a two-pass approach. In the first pass, the input matrix is read from HDFS and sorted into block partitions in a staging area on the local file system, according to the partition format. To keep partitioning scalable, one block is processed at a time. In the second pass, all blocks of a partition are appended to a sequence file on HDFS. The indirection over the local staging area is required because partitioning is block-wise and sequence files have write-once semantics. To keep this pass scalable as well, one sequence file is processed at a time.

NOTE: In the resulting partitioned matrix, block and cell indexes are stored relative to partition boundaries. The partitioned matrix therefore CANNOT be read as a traditional matrix because, for example, multiple blocks share the same index (the actual index is encoded in the path). To enable a full read of partitioned matrices, the data converter would need to handle negative row/column offsets for partitioned reads. This is currently not done, in order to avoid overhead on normal reads and because partitioning is only applied when the matrix is accessed exclusively via indexing.
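The two-pass idea and the partition-relative indexes described above can be sketched with plain local files. This is a minimal illustration under stated assumptions, not the SystemDS implementation: the class TwoPassPartitionSketch, its methods, the "part-N" file naming, the text cell layout, and the rowsPerPartition parameter are all hypothetical, and a local temporary directory stands in for HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: row partitioning of text cells via a local staging area.
final class TwoPassPartitionSketch {
    // 1-based id of the row partition that contains a 1-based global row.
    static long partitionId(long globalRow, long rowsPerPartition) {
        return (globalRow - 1) / rowsPerPartition + 1;
    }

    // 1-based row index relative to the start of its partition.
    static long relativeRow(long globalRow, long rowsPerPartition) {
        return (globalRow - 1) % rowsPerPartition + 1;
    }

    // Pass 1: route each cell {row, col, value} to its staging file, one cell at a time.
    // The global row is recoverable only from the file name plus the relative row.
    static void stage(List<double[]> cells, long rowsPerPartition, Path stagingDir)
            throws IOException {
        for (double[] c : cells) {
            long row = (long) c[0];
            long col = (long) c[1];
            Path file = stagingDir.resolve("part-" + partitionId(row, rowsPerPartition));
            String line = relativeRow(row, rowsPerPartition) + " " + col + " " + c[2] + "\n";
            Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    // Pass 2: write each partition's output file exactly once, one staged file at a time.
    static void writePartitions(Path stagingDir, Path outDir) throws IOException {
        try (DirectoryStream<Path> staged = Files.newDirectoryStream(stagingDir)) {
            for (Path file : staged) {
                Files.copy(file, outDir.resolve(file.getFileName()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging");
        Path out = Files.createTempDirectory("partitioned");
        List<double[]> cells = Arrays.asList(
                new double[] {1, 1, 0.5}, new double[] {3, 2, 7.0}, new double[] {4, 1, 2.5});
        stage(cells, 2, staging);
        writePartitions(staging, out);
        System.out.println(Files.readAllLines(out.resolve("part-2")));
    }
}
```

With two rows per partition, global rows 3 and 4 are stored in the second partition as relative rows 1 and 2, which is why such output cannot be read back as an ordinary matrix without knowing the partition it came from.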
-
Constructor Summary
Constructor and Description
DataPartitionerLocal(ParForProgramBlock.PartitionFormat dpf, int par)
DataPartitionerLocal constructor.
-
Method Summary
Modifier and Type, Method
void writeBinaryBlockSequenceFileToHDFS(org.apache.hadoop.mapred.JobConf job, String dir, String lpdir, boolean threadsafe)
void writeBinaryCellSequenceFileToHDFS(org.apache.hadoop.mapred.JobConf job, String dir, String lpdir)
void writeTextCellFileToHDFS(org.apache.hadoop.mapred.JobConf job, String dir, String lpdir)
-
Methods inherited from class org.apache.sysds.runtime.controlprogram.parfor.DataPartitioner
createPartitionedMatrixObject, createPartitionedMatrixObject, createPartitionedMatrixObject, createReuseMatrixBlock, disableBinaryCell
-
Constructor Detail
-
DataPartitionerLocal
public DataPartitionerLocal(ParForProgramBlock.PartitionFormat dpf, int par)
DataPartitionerLocal constructor.
- Parameters:
dpf - data partition format
par - -1 for serial, otherwise the number of threads; can be ignored by the implementation
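One way to read the par convention (-1 for serial, otherwise a thread count) is sketched below. This is an assumption-laden illustration only: ParallelismSketch and runAll are hypothetical names, and nothing here is taken from how DataPartitionerLocal actually schedules its work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper showing a "-1 means serial" degree-of-parallelism parameter.
final class ParallelismSketch {
    // Runs all tasks and returns their results in task order.
    static <T> List<T> runAll(List<Callable<T>> tasks, int par) throws Exception {
        List<T> results = new ArrayList<>();
        if (par == -1) {
            // Serial: execute every task on the calling thread.
            for (Callable<T> task : tasks) {
                results.add(task.call());
            }
            return results;
        }
        // Parallel: a fixed pool of par threads (par must be positive here).
        ExecutorService pool = Executors.newFixedThreadPool(par);
        try {
            // invokeAll returns futures in the same order as the task list.
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

Because results are collected in task order in both branches, a caller sees the same output whether it passes -1 or a thread count.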
-
-
Method Detail
-
writeBinaryBlockSequenceFileToHDFS
public void writeBinaryBlockSequenceFileToHDFS(org.apache.hadoop.mapred.JobConf job, String dir, String lpdir, boolean threadsafe) throws IOException
- Throws:
IOException
-
writeBinaryCellSequenceFileToHDFS
public void writeBinaryCellSequenceFileToHDFS(org.apache.hadoop.mapred.JobConf job, String dir, String lpdir) throws IOException
- Throws:
IOException
-
writeTextCellFileToHDFS
public void writeTextCellFileToHDFS(org.apache.hadoop.mapred.JobConf job, String dir, String lpdir) throws IOException
- Throws:
IOException
-