| Class | Description |
|---|---|
| NonSparseToSparse | An instance filter that converts all incoming instances into sparse format. |
| Randomize | Randomly shuffles the order of instances passed through it. |
| RemoveDuplicates | Removes all duplicate instances from the first batch of data it receives. |
| RemoveFolds | This filter takes a dataset and outputs a specified fold for cross validation. |
| RemoveFrequentValues | Determines which values (frequent or infrequent ones) of a nominal attribute are retained and filters the instances accordingly. |
| RemoveMisclassified | A filter that removes instances which are incorrectly classified. |
| RemovePercentage | A filter that removes a given percentage of a dataset. |
| RemoveRange | A filter that removes a given range of instances of a dataset. |
| RemoveWithValues | Filters instances according to the value of an attribute. |
| Resample | Produces a random subsample of a dataset using either sampling with replacement or without replacement. |
| ReservoirSample | Produces a random subsample of a dataset using the reservoir sampling Algorithm "R" by Vitter. |
| SparseToNonSparse | An instance filter that converts all incoming sparse instances into non-sparse format. |
| SubsetByExpression | Filters instances according to a user-specified expression; the grammar, notes, and examples are given below the table. |

Grammar for SubsetByExpression:

- `boolexpr_list ::= boolexpr_list boolexpr_part | boolexpr_part;`
- `boolexpr_part ::= boolexpr:e {: parser.setResult(e); :} ;`
- `boolexpr ::= BOOLEAN | true | false | expr < expr | expr <= expr | expr > expr | expr >= expr | expr = expr | ( boolexpr ) | not boolexpr | boolexpr and boolexpr | boolexpr or boolexpr | ATTRIBUTE is STRING | ATTRIBUTE regexp STRING ;`
- `expr ::= NUMBER | ATTRIBUTE | ( expr ) | opexpr | funcexpr ;`
- `opexpr ::= expr + expr | expr - expr | expr * expr | expr / expr ;`
- `funcexpr ::= abs ( expr ) | sqrt ( expr ) | log ( expr ) | exp ( expr ) | sin ( expr ) | cos ( expr ) | tan ( expr ) | rint ( expr ) | floor ( expr ) | pow ( expr for base , expr for exponent ) | ceil ( expr ) ;`

Notes:

- `NUMBER`: any integer or floating point number (but not in scientific notation!)
- `STRING`: any string surrounded by single quotes; the string may not contain a single quote though.
- `ATTRIBUTE`: the following placeholders are recognized for attribute values:
  - `CLASS` for the class value in case a class attribute is set.
  - `ATTxyz`, with xyz a number from 1 to the number of attributes in the dataset, representing the value of the indexed attribute.
- `regexp`: a regular expression for pattern matching, e.g., `'^id.*$'`

Examples:

- extracting only mammals and birds from the 'zoo' UCI dataset: `(CLASS is 'mammal') or (CLASS is 'bird')`
- extracting only animals with at least 2 legs from the 'zoo' UCI dataset: `(ATT14 >= 2)`
- extracting only instances with non-missing 'wage-increase-second-year' from the 'labor' UCI dataset: `not ismissing(ATT3)`
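
The ReservoirSample entry in the table refers to reservoir sampling with Vitter's Algorithm "R". As a rough illustration of that technique, here is a minimal plain-Java sketch. It is not Weka's implementation, and the class name `ReservoirSketch` and method `sample` are hypothetical names chosen for this example. The idea is to keep the first `k` items, then let the n-th item replace a random reservoir slot with probability `k/n`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Illustrative sketch of reservoir sampling (Algorithm R).
// Hypothetical helper; NOT the code used by Weka's ReservoirSample filter.
final class ReservoirSketch {

    // Draws up to k items uniformly at random from the stream in a single pass.
    static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0; // number of items read so far
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: the first k items are always kept.
                reservoir.add(item);
            } else {
                // Replace phase: pick j uniformly from [0, seen);
                // the new item survives only if j falls inside the reservoir.
                int j = rng.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Prints three of the ten values; which ones depends on the seed.
        System.out.println(sample(data.iterator(), 3, new Random(42)));
    }
}
```

Because the stream is read only once and only `k` items are held at a time, this approach suits data whose total size is unknown in advance or too large to keep in memory.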