public class DataAugmentation extends Object
Constructor and Description |
---|
DataAugmentation() |
Modifier and Type | Method and Description |
---|---|
static FrameBlock | dataCorruption(FrameBlock input, double pTypo, double pMiss, double pDrop, double pOut, double pSwap): This function returns a new frame block with errors introduced in the data: typos in string values, null values, outliers in numeric data, and swapped elements. |
static FrameBlock | miss(FrameBlock frame, double pMiss, double pDrop): This function modifies the given, preprocessed frame block to add missing values to some of the rows, marking them with the label missing. |
static FrameBlock | outlier(FrameBlock frame, List<Integer> numerics, double pOut, double pPos, int times): This function modifies the given, preprocessed frame block to add outliers to some of the numeric data of the frame, adding or subtracting several times the standard deviation, and marking them with the label outlier. |
static FrameBlock | preprocessing(FrameBlock frame, List<Integer> numerics, List<Integer> strings, List<Integer> swappable): This function returns a new frame block with a labels column added, and fills the given lists with the column indexes of the different types of data. |
static FrameBlock | swap(FrameBlock frame, List<Integer> swappable, double pSwap): This function modifies the given, preprocessed frame block to swap consecutive fields of the same ValueType, marking them with the label swap. |
static FrameBlock | typos(FrameBlock frame, List<Integer> strings, double pTypo): This function modifies the given, preprocessed frame block to add typos to the string values, marking them with the label typos. |
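
The outlier description above (shift a numeric value by several times the standard deviation, with the parameters pOut, pPos and times) can be illustrated with a small, self-contained sketch. This is not the SystemDS implementation: it works on a plain double[] instead of a FrameBlock, omits the label column, and assumes the population standard deviation of the column. The class OutlierSketch and the methods stdDev and injectOutliers are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only; not the SystemDS DataAugmentation code.
class OutlierSketch {

    // Population standard deviation of a column (an assumption of this sketch).
    static double stdDev(double[] values) {
        if (values.length == 0)
            return 0.0;
        double mean = Arrays.stream(values).average().orElse(0.0);
        double sumSq = 0.0;
        for (double v : values)
            sumSq += (v - mean) * (v - mean);
        return Math.sqrt(sumSq / values.length);
    }

    // With probability pOut per row, shift the value by times * stdDev.
    // The shift is positive with probability pPos, negative otherwise.
    // The input array is left unchanged; a modified copy is returned.
    static double[] injectOutliers(double[] column, double pOut, double pPos,
                                   int times, Random rng) {
        double sd = stdDev(column);
        double[] result = column.clone();
        for (int i = 0; i < result.length; i++) {
            if (rng.nextDouble() < pOut) {
                double sign = rng.nextDouble() < pPos ? 1.0 : -1.0;
                result[i] += sign * times * sd;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] column = {10.0, 12.0, 11.0, 13.0, 9.0};
        // Roughly 20% of the rows become outliers, half of them shifted upwards.
        double[] corrupted = injectOutliers(column, 0.2, 0.5, 3, new Random(7));
        System.out.println(Arrays.toString(corrupted));
    }
}
```

Passing the random generator explicitly keeps the sketch reproducible; the documented outlier method takes no such argument.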
public static FrameBlock dataCorruption(FrameBlock input, double pTypo, double pMiss, double pDrop, double pOut, double pSwap)
input - Original frame block
pTypo - Probability of introducing a typo in a row
pMiss - Probability of introducing missing values in a row
pDrop - Probability of dropping a value inside a row
pOut - Probability of introducing outliers in a row
pSwap - Probability of swapping two elements in a row

public static FrameBlock preprocessing(FrameBlock frame, List<Integer> numerics, List<Integer> strings, List<Integer> swappable)
frame - Original frame block
numerics - Empty list to return the numeric positions
strings - Empty list to return the string positions
swappable - Empty list to return the swappable positions

public static FrameBlock typos(FrameBlock frame, List<Integer> strings, double pTypo)
frame - Original frame block
strings - List with the columns of string type that can be changed, generated during preprocessing or manually selected
pTypo - Probability of adding a typo to a row

public static FrameBlock miss(FrameBlock frame, double pMiss, double pDrop)
frame - Original frame block
pMiss - Probability of adding missing values to a row
pDrop - Probability of dropping a value

public static FrameBlock outlier(FrameBlock frame, List<Integer> numerics, double pOut, double pPos, int times)
frame - Original frame block
numerics - List with the columns of numeric type that can be changed, generated during preprocessing or manually selected
pOut - Probability of introducing an outlier in a row
pPos - Probability of using positive deviation
times - Number of standard deviations added to (or subtracted from) the value

public static FrameBlock swap(FrameBlock frame, List<Integer> swappable, double pSwap)
frame - Original frame block
swappable - List with the columns that are swappable, generated during preprocessing
pSwap - Probability of swapping two fields in a row

Copyright © 2020 The Apache Software Foundation. All rights reserved.