Class ColumnEncoder

- java.lang.Object
  - org.apache.sysds.runtime.transform.encode.ColumnEncoder

- All Implemented Interfaces:
  Externalizable, Serializable, Comparable<ColumnEncoder>, Encoder
- Direct Known Subclasses:
  ColumnEncoderBagOfWords, ColumnEncoderBin, ColumnEncoderComposite, ColumnEncoderDummycode, ColumnEncoderFeatureHash, ColumnEncoderPassThrough, ColumnEncoderRecode, ColumnEncoderUDF, ColumnEncoderWordEmbedding

public abstract class ColumnEncoder extends Object implements Encoder, Comparable<ColumnEncoder>
Base class for all transform encoders, providing both a row and a block interface for encoding frames to matrices.

- See Also:
  Serialized Form
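The typical life cycle of a column encoder is build (collect the per-column metadata over the input frame) followed by apply (write the encoded values into an output matrix). The following is a minimal sketch, not the library's reference usage: the concrete subclass ColumnEncoderRecode, its single int column-ID constructor, the 1-based input column ID, the 0-based output column, the exact signature of the inherited Encoder.build, and the import paths are assumptions that may differ across SystemDS versions.

```java
// Sketch of the build-then-apply life cycle of a single column encoder.
// ColumnEncoderRecode, its constructor, and the import paths are assumptions.
import org.apache.sysds.runtime.frame.data.FrameBlock;
import org.apache.sysds.runtime.matrix.data.MatrixBlock;
import org.apache.sysds.runtime.transform.encode.ColumnEncoder;
import org.apache.sysds.runtime.transform.encode.ColumnEncoderRecode;

public class RecodeSingleColumn {
    /** Encodes frame column colID into column 0 of a new dense matrix. */
    public static MatrixBlock recodeColumn(FrameBlock in, int colID) {
        ColumnEncoder enc = new ColumnEncoderRecode(colID);   // assumed constructor
        enc.build(in);                             // build: collect per-column metadata (inherited from Encoder)
        MatrixBlock out = new MatrixBlock(in.getNumRows(), 1, false);
        out.allocateDenseBlock();                  // apply targets a dense output block
        return enc.apply(in, out, 0);              // apply: write encoded values into output column 0
    }
}
```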
- 
Nested Class Summary

Nested Classes
  static class  ColumnEncoder.EncoderType
- 
Field Summary
  static int  APPLY_ROW_BLOCKS_PER_COLUMN
  static int  BUILD_ROW_BLOCKS_PER_COLUMN
- 
Method Summary
  MatrixBlock apply(CacheBlock<?> in, MatrixBlock out, int outputCol)
      Apply functions are only used in a single-threaded or multi-threaded dense context.
  MatrixBlock apply(CacheBlock<?> in, MatrixBlock out, int outputCol, int rowStart, int blk)
  void build(CacheBlock<?> in, double[] equiHeightMaxs)
  void build(CacheBlock<?> in, Map<Integer,double[]> equiHeightMaxs)
  void buildPartial(FrameBlock in)
      Partial build of internal data structures (e.g., in distributed spark operations).
  int compareTo(ColumnEncoder o)
  void computeMapSizeEstimate(CacheBlock<?> in, int[] sampleIndices)
  List<DependencyTask<?>> getApplyTasks(CacheBlock<?> in, MatrixBlock out, int outputCol, int[] sparseRowPointerOffsets)
  Callable<Object> getBuildTask(CacheBlock<?> in)
  List<DependencyTask<?>> getBuildTasks(CacheBlock<?> in)
  int getColID()
  MatrixBlock getColMapping(FrameBlock meta)
      Obtain the column mapping of encoded frames based on the passed meta data frame.
  int getDomainSize()
  long getEstMetaSize()
  int getEstNumDistincts()
  Callable<Object> getPartialBuildTask(CacheBlock<?> in, int startRow, int blockSize, HashMap<Integer,Object> ret, int p)
  Callable<Object> getPartialMergeBuildTask(HashMap<Integer,?> ret)
  void initEmbeddings(MatrixBlock embeddings)
  boolean isApplicable()
      Indicates if this encoder is applicable, i.e., if there is a column to encode.
  boolean isApplicable(int colID)
      Indicates if this encoder is applicable for the given column ID, i.e., if it is subject to this transformation.
  void mergeAt(ColumnEncoder other)
      Merges another encoder, of a compatible type, in after a certain position.
  void prepareBuildPartial()
      Allocates internal data structures for partial build.
  void readExternal(ObjectInput in)
      Redirects the default java serialization via externalizable to our default hadoop writable serialization for efficient broadcast/rdd deserialization.
  void setColID(int colID)
  void setEstMetaSize(long estSize)
  void setEstNumDistincts(int numDistincts)
  void shiftCol(int columnOffset)
  void updateIndexRanges(long[] beginDims, long[] endDims, int colOffset)
      Update index ranges to their post-encoding values.
  void writeExternal(ObjectOutput os)
      Redirects the default java serialization via externalizable to our default hadoop writable serialization for efficient broadcast/rdd serialization.
- 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
- 
Methods inherited from interface org.apache.sysds.runtime.transform.encode.Encoder
allocateMetaData, build, getMetaData, initMetaData 
- 
Method Detail
- 
initEmbeddings
public void initEmbeddings(MatrixBlock embeddings)
 
- 
apply
public MatrixBlock apply(CacheBlock<?> in, MatrixBlock out, int outputCol)
Apply functions are only used in a single-threaded or multi-threaded dense context, which is why the multi-threaded sparse case is not handled here.
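Because this code path assumes a dense output, the caller allocates the dense block of the output matrix before invoking apply. A minimal sketch follows; the 0-based interpretation of outputCol, the helper name, and the CacheBlock import path are assumptions.

```java
// Sketch: applying an already-built encoder into a dense output matrix.
import org.apache.sysds.runtime.controlprogram.caching.CacheBlock;
import org.apache.sysds.runtime.matrix.data.MatrixBlock;
import org.apache.sysds.runtime.transform.encode.ColumnEncoder;

public class ApplyDenseSketch {
    /** Applies a built encoder into column outputCol of a freshly allocated dense output. */
    public static MatrixBlock applyDense(ColumnEncoder enc, CacheBlock<?> in,
            int nRows, int nCols, int outputCol) {
        MatrixBlock out = new MatrixBlock(nRows, nCols, false); // dense output
        out.allocateDenseBlock();
        // Full-column apply; the overload apply(in, out, outputCol, rowStart, blk)
        // covers a row range and is used by the blocked multi-threaded code path.
        return enc.apply(in, out, outputCol);
    }
}
```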
- 
apply
public MatrixBlock apply(CacheBlock<?> in, MatrixBlock out, int outputCol, int rowStart, int blk)
 
- 
isApplicable
public boolean isApplicable()
Indicates if this encoder is applicable, i.e., if there is a column to encode.
- Returns:
  true if a colID is set
 
 
- 
isApplicable
public boolean isApplicable(int colID)
Indicates if this encoder is applicable for the given column ID, i.e., if it is subject to this transformation.
- Parameters:
  colID - column ID
- Returns:
  true if encoder is applicable for given column
 
 
- 
prepareBuildPartial
public void prepareBuildPartial()
Allocates internal data structures for partial build.
- Specified by:
  prepareBuildPartial in interface Encoder
 
- 
getDomainSize
public int getDomainSize()
 
- 
buildPartial
public void buildPartial(FrameBlock in)
Partial build of internal data structures (e.g., in distributed spark operations).
- Specified by:
  buildPartial in interface Encoder
- Parameters:
  in - input frame block
 
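Together with prepareBuildPartial(), this supports a per-partition build whose partial results are aggregated elsewhere. A rough sketch of the per-partition side is shown below; how partitions are obtained and how the partial results are merged is outside this class and not shown, and the FrameBlock import path is an assumption.

```java
// Sketch: per-partition side of a distributed (partial) build.
import org.apache.sysds.runtime.frame.data.FrameBlock;
import org.apache.sysds.runtime.transform.encode.ColumnEncoder;

public class PartialBuildSketch {
    /** Accumulates partial build state over all frame blocks of one partition. */
    public static void buildPartition(ColumnEncoder enc, Iterable<FrameBlock> partitionBlocks) {
        enc.prepareBuildPartial();      // allocate internal data structures once
        for (FrameBlock blk : partitionBlocks)
            enc.buildPartial(blk);      // accumulate partial state per input block
    }
}
```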
- 
build
public void build(CacheBlock<?> in, double[] equiHeightMaxs)
 
- 
build
public void build(CacheBlock<?> in, Map<Integer,double[]> equiHeightMaxs)
 
- 
mergeAt
public void mergeAt(ColumnEncoder other)
Merges another encoder, of a compatible type, in after a certain position. Resizes as necessary. ColumnEncoders are compatible with themselves, EncoderComposite is compatible with every other ColumnEncoder, and MultiColumnEncoders are compatible with every encoder.
- Parameters:
  other - the encoder that should be merged in
 
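One common use of mergeAt is reducing per-partition encoders of the same column into a single encoder. A minimal sketch under assumptions: all encoders in the list target the same column, are of compatible types, and the left-to-right reduction order is purely illustrative.

```java
// Sketch: folding per-partition encoders (same column, compatible types) into one.
import java.util.List;
import org.apache.sysds.runtime.transform.encode.ColumnEncoder;

public class MergeSketch {
    public static ColumnEncoder mergeAll(List<ColumnEncoder> perPartition) {
        ColumnEncoder merged = perPartition.get(0);
        for (int i = 1; i < perPartition.size(); i++)
            merged.mergeAt(perPartition.get(i)); // resizes internal structures as necessary
        return merged;
    }
}
```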
- 
updateIndexRanges
public void updateIndexRanges(long[] beginDims, long[] endDims, int colOffset)
Update index ranges to their post-encoding values. Note that only Dummycoding changes the ranges.
- Specified by:
  updateIndexRanges in interface Encoder
- Parameters:
  beginDims - begin dimensions of range
  endDims - end dimensions of range
  colOffset - is applied to begin and endDims
 
- 
getColMapping
public MatrixBlock getColMapping(FrameBlock meta)
Obtain the column mapping of encoded frames based on the passed meta data frame.
- Parameters:
  meta - meta data frame block
- Returns:
  matrix with column mapping (one row per attribute)
 
 
- 
writeExternal
public void writeExternal(ObjectOutput os) throws IOException
Redirects the default java serialization via externalizable to our default hadoop writable serialization for efficient broadcast/rdd serialization.
- Specified by:
  writeExternal in interface Externalizable
- Parameters:
  os - object output
- Throws:
  IOException - if IOException occurs
 
- 
readExternal
public void readExternal(ObjectInput in) throws IOException
Redirects the default java serialization via externalizable to our default hadoop writable serialization for efficient broadcast/rdd deserialization.
- Specified by:
  readExternal in interface Externalizable
- Parameters:
  in - object input
- Throws:
  IOException - if IOException occurs
 
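Because the class is Externalizable, a standard Java serialization round trip dispatches to writeExternal/readExternal, and hence to the Hadoop Writable path described above. A minimal sketch follows; note that Externalizable deserialization also requires a public no-arg constructor on the concrete encoder class.

```java
// Sketch: Java serialization round trip that exercises writeExternal/readExternal.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import org.apache.sysds.runtime.transform.encode.ColumnEncoder;

public class EncoderSerializationSketch {
    public static ColumnEncoder roundTrip(ColumnEncoder enc) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(enc); // invokes writeExternal on the concrete encoder
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (ColumnEncoder) ois.readObject(); // invokes readExternal
        }
    }
}
```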
- 
getColID
public int getColID()
 
- 
setColID
public void setColID(int colID)
 
- 
shiftCol
public void shiftCol(int columnOffset)
 
- 
setEstMetaSize
public void setEstMetaSize(long estSize)
 
- 
getEstMetaSize
public long getEstMetaSize()
 
- 
setEstNumDistincts
public void setEstNumDistincts(int numDistincts)
 
- 
getEstNumDistincts
public int getEstNumDistincts()
 
- 
computeMapSizeEstimate
public void computeMapSizeEstimate(CacheBlock<?> in, int[] sampleIndices)
 
- 
compareTo
public int compareTo(ColumnEncoder o)
- Specified by:
compareTo in interface Comparable<ColumnEncoder>
 
- 
getBuildTasks
public List<DependencyTask<?>> getBuildTasks(CacheBlock<?> in)
 
- 
getBuildTask
public Callable<Object> getBuildTask(CacheBlock<?> in)
 
- 
getPartialBuildTask
public Callable<Object> getPartialBuildTask(CacheBlock<?> in, int startRow, int blockSize, HashMap<Integer,Object> ret, int p)
 
- 
getApplyTasks
public List<DependencyTask<?>> getApplyTasks(CacheBlock<?> in, MatrixBlock out, int outputCol, int[] sparseRowPointerOffsets)
 
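The task factory methods package build and apply work as Callables so that a thread pool or SystemDS's internal scheduling can run them; the DependencyTask lists returned by getBuildTasks and getApplyTasks presumably also carry the inter-task dependencies. A minimal sketch using the single build task with a plain ExecutorService, for illustration only (the CacheBlock import path is an assumption).

```java
// Sketch: running an encoder's build task on a worker thread.
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.sysds.runtime.controlprogram.caching.CacheBlock;
import org.apache.sysds.runtime.transform.encode.ColumnEncoder;

public class BuildTaskSketch {
    public static void buildAsync(ColumnEncoder enc, CacheBlock<?> in) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Object> task = enc.getBuildTask(in);
            Future<Object> f = pool.submit(task);
            f.get(); // the build must complete before apply can run
        }
        finally {
            pool.shutdown();
        }
    }
}
```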