distinctCount
static int distinctCount(int numVals,
int[] frequencies,
int[] freqCounts,
int nRows,
int sampleSize)
Estimates the number of distinct values using the estimator recommended by the authors in Section 5.2 of:
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, and Lynne Stokes. 1995. Sampling-Based Estimation of the Number
of Distinct Values of an Attribute. VLDB'95.
- Parameters:
numVals - The number of unique values in the sample
frequencies - The frequencies of the different unique values
freqCounts - The inverse histogram of frequencies, i.e. the counts extracted from the frequencies
nRows - The original number of rows in the entire input
sampleSize - The number of rows in the sample
- Returns:
An estimate of the number of distinct values.
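The relationship between the sample, frequencies, and freqCounts can be illustrated with a minimal sketch. The class DistinctCountInputs and both helper methods are hypothetical and not part of this API. The layout of freqCounts (index k - 1 holds the number of unique values that occur exactly k times) is an assumption about what "inverse histogram" means here, not something this page confirms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only; not part of the documented API.
final class DistinctCountInputs {

    // Counts how often each unique value occurs in the sample.
    // The order of the returned frequencies is unspecified.
    static int[] frequencies(int[] sample) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : sample)
            counts.merge(v, 1, Integer::sum);
        int[] freq = new int[counts.size()];
        int i = 0;
        for (int c : counts.values())
            freq[i++] = c;
        return freq;
    }

    // Builds the inverse histogram under the ASSUMED layout:
    // result[k - 1] = number of unique values that occur exactly k times.
    static int[] freqCounts(int[] frequencies) {
        int max = 0;
        for (int f : frequencies)
            max = Math.max(max, f);
        int[] result = new int[max];
        for (int f : frequencies)
            result[f - 1]++;
        return result;
    }
}
```

Under these assumptions, numVals would be the length of the frequencies array and sampleSize the length of the sample, while nRows still has to come from the full input.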