Class TokenizerBuilderNgram
java.lang.Object
  org.apache.sysds.runtime.transform.tokenize.builder.TokenizerBuilder
    org.apache.sysds.runtime.transform.tokenize.builder.TokenizerBuilderWhitespaceSplit
      org.apache.sysds.runtime.transform.tokenize.builder.TokenizerBuilderNgram

All Implemented Interfaces:
  Serializable

public class TokenizerBuilderNgram extends TokenizerBuilderWhitespaceSplit

See Also:
  Serialized Form

Field Summary

Modifier and Type                                                                         Field       Description
int                                                                                       maxGram
int                                                                                       minGram
org.apache.sysds.runtime.transform.tokenize.builder.TokenizerBuilderNgram.NgramType       ngramType

Fields inherited from class org.apache.sysds.runtime.transform.tokenize.builder.TokenizerBuilderWhitespaceSplit:
  regex

Constructor Summary

Constructor                                                                                       Description
TokenizerBuilderNgram(int[] idCols, int tokenizeCol, org.apache.wink.json4j.JSONObject params)

Method Summary

Modifier and Type   Method                                                                                                                Description
void                createInternalRepresentation(FrameBlock in, DocumentRepresentation[] internalRepresentation, int rowStart, int blk)
List<Token>         splitIntoNgrams(Token token, int minGram, int maxGram)

Methods inherited from class org.apache.sysds.runtime.transform.tokenize.builder.TokenizerBuilderWhitespaceSplit:
  splitToTokens

Methods inherited from class org.apache.sysds.runtime.transform.tokenize.builder.TokenizerBuilder:
  createInternalRepresentation, getTasks

Method Detail

createInternalRepresentation

public void createInternalRepresentation(FrameBlock in, DocumentRepresentation[] internalRepresentation, int rowStart, int blk)

Overrides:
  createInternalRepresentation in class TokenizerBuilderWhitespaceSplit
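
Example (illustrative sketch)

The method summary lists splitIntoNgrams(Token token, int minGram, int maxGram) without a description. The sketch below shows one plausible reading of it: producing every character n-gram of a token whose length lies between minGram and maxGram. It is not the SystemDS implementation. The class NgramSketch, the use of plain String in place of Token, the character-level granularity, and the output ordering (shorter n-grams first, left to right) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone class; it does not depend on or reproduce SystemDS code.
final class NgramSketch {

    // Returns all character n-grams of text with lengths in [minGram, maxGram].
    // Assumption: a plain String stands in for the Token type used by the real method.
    static List<String> splitIntoNgrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> ngrams = new ArrayList<>();
        // n-grams longer than the text cannot exist, so cap the length at text.length().
        int longest = Math.min(maxGram, text.length());
        for (int n = minGram; n <= longest; n++) {
            // Slide a window of width n across the text, one character at a time.
            for (int start = 0; start + n <= text.length(); start++) {
                ngrams.add(text.substring(start, start + n));
            }
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // prints [da, at, ta, dat, ata]
        System.out.println(splitIntoNgrams("data", 2, 3));
    }
}
```

In this sketch a token shorter than minGram yields an empty list. Whether the actual method behaves the same way, or operates on characters rather than some other unit selected by ngramType, is not stated on this page.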