The function takes the results of the cms_clusters function (the frequency distributions from each run of the 80 percent subsampling of the training data) and calculates general attributes as well as graphical distribution outputs.
cms( raw = NULL, runs = NULL, idvariable = NULL, emptysize = NULL, setsize = NULL, variables = NULL, maxPC = 1:4, clusters = 3, seeding = TRUE, showplot = TRUE, legendpos = "top", verbose = FALSE )
Argument | Description |
---|---|
raw | raw data input (filter by the grouping variables first; each group requires a separate analysis) |
runs | the number of repeats (for feature aggregation) |
idvariable | name of the subject ID variable (e.g., animal_id) |
emptysize | fraction of data that sets the imputation threshold (e.g., with emptysize=0.2, everything below this threshold will be imputed; avoid imputation when possible) |
setsize | fractional size of the subsets (e.g., setsize=0.8 means that in each run 80 percent of the data are randomly chosen for the cms analysis) |
variables | names of the variables to include in the cms analysis; every variable must be listed explicitly |
maxPC | the maximum number of principal components (or a range of them) to evaluate; this number must be less than or equal to the number of analyzed variables. maxPC=2 looks at PC2 only, while maxPC=1:4 covers the range from PC1 to PC4. |
clusters | the number of clusters used in the cms analysis |
seeding | whether to use a constant random seed (TRUE) or not (FALSE) |
showplot | whether to show the distribution plot of the variables after the cms analysis |
legendpos | legend position (shift it to the right when there are many variables) |
verbose | whether to report details on variable handling during the calculation (default = FALSE) |
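The interplay of runs, setsize, seeding, maxPC and clusters can be pictured with a minimal base-R sketch of repeated subsampling. It is not the cms implementation: the helper sketch_runs is hypothetical, and PCA via stats::prcomp on scaled variables, clustering via stats::kmeans, the seed value 42, and treating maxPC as an index vector of components are all assumptions made for illustration.

```r
# Hypothetical sketch of repeated subsampling; NOT the cms implementation.
# Assumptions: PCA via stats::prcomp (scaled), clustering via stats::kmeans,
# seed value 42, and maxPC treated as an index vector of components.
sketch_runs <- function(data, variables, runs = 10, setsize = 0.8,
                        maxPC = 1:4, clusters = 3, seeding = TRUE) {
  x <- as.matrix(data[, variables, drop = FALSE])
  # the highest requested PC cannot exceed the number of analyzed variables
  if (max(maxPC) > ncol(x)) {
    stop("maxPC must not exceed the number of analyzed variables")
  }
  if (seeding) set.seed(42)          # assumed constant seed
  n_sub <- round(setsize * nrow(x))  # e.g. 80 percent of the rows
  lapply(seq_len(runs), function(r) {
    rows <- sample(nrow(x), n_sub)   # random subset, drawn without replacement
    pca <- stats::prcomp(x[rows, , drop = FALSE], scale. = TRUE)
    scores <- pca$x[, maxPC, drop = FALSE]  # keep only the requested PCs
    km <- stats::kmeans(scores, centers = clusters, nstart = 5)
    list(rows = rows, cluster = km$cluster)
  })
}

res <- sketch_runs(iris, variables = names(iris)[1:4], runs = 5)
length(res)            # 5 runs
length(res[[1]]$rows)  # 120 rows, i.e. 80 percent of 150
```

In this sketch, seeding = TRUE makes repeated calls draw identical subsets, so the runs are reproducible; with seeding = FALSE each call draws new subsets.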
A list with feature frequency information and other useful information.