The function takes the results of the cms_clusters function (the frequency distributions from each run of the 80 percent subsampling of the training data) and calculates general attributes as well as graphical distribution outputs.
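The following is a minimal base R sketch of how frequency distributions from repeated subsampling runs can be aggregated. It assumes that each run's result reduces to a character vector of variable names. The object `run_results` and the helper `aggregate_frequencies()` are hypothetical and do not reflect the actual output format of cms_clusters.

```r
# Hypothetical per-run results: names of the variables that stood out in each
# subsampling run (illustrative values, not real cms_clusters output)
run_results <- list(
  c("weight", "activity", "age"),
  c("weight", "age"),
  c("weight", "activity"),
  c("activity", "age", "weight")
)

# Hypothetical helper: count how often each variable appears across all runs
aggregate_frequencies <- function(runs) {
  counts <- table(unlist(runs))
  # Express counts as the fraction of runs in which each variable appeared
  freq <- as.numeric(counts) / length(runs)
  names(freq) <- names(counts)
  sort(freq, decreasing = TRUE)
}

freq <- aggregate_frequencies(run_results)
print(freq)
```

Here "weight" appears in all four runs (frequency 1), while "activity" and "age" each appear in three (frequency 0.75).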

Usage

cms(
  raw = NULL,
  runs = NULL,
  idvariable = NULL,
  emptysize = NULL,
  setsize = NULL,
  variables = NULL,
  maxPC = 1:4,
  clusters = 3,
  seeding = TRUE,
  showplot = TRUE,
  legendpos = "top",
  verbose = FALSE
)

Arguments

raw

raw data input (filter by the grouping variables first; each group requires a separate analysis)

runs

the number of repeated subsampling runs (used for feature aggregation)

idvariable

name of the subject ID variable (e.g. animal_id)

emptysize

fraction of data that sets the imputation threshold (e.g. with emptysize = 0.2, everything below 0.2 will be imputed; avoid imputation when possible)
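One possible reading of the emptysize threshold can be sketched in base R. This sketch assumes that the threshold applies to the share of missing values per column, that columns below the threshold are imputed with the column median, and that columns above it are left untouched. The median rule, the handling of columns above the threshold, and the helper `impute_below_threshold()` are all assumptions and are not taken from the package.

```r
# Hypothetical sketch: impute numeric columns whose share of missing values is
# below `emptysize`; median imputation is an assumed rule, not the package's.
impute_below_threshold <- function(df, emptysize = 0.2) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.numeric(x)) next
    missing_share <- mean(is.na(x))
    # Only impute when some values are missing and the share is below threshold
    if (missing_share > 0 && missing_share < emptysize) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
      df[[col]] <- x
    }
  }
  df
}

toy <- data.frame(
  a = c(1, 2, NA, 4, 5, 6, 7, 8, 9, 10),   # 10 % missing -> imputed
  b = c(NA, NA, NA, 4, 5, 6, 7, 8, 9, 10)  # 30 % missing -> left as is
)
imputed <- impute_below_threshold(toy, emptysize = 0.2)
```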

setsize

fraction of the data used for each subset (e.g. setsize = 0.8 means that in each run 80 percent of the data are randomly chosen for the cms analysis)
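A minimal sketch of how runs and setsize interact, assuming rows are drawn without replacement in each run. The helper `draw_subsets()` is hypothetical, and the built-in mtcars data set stands in for raw.

```r
# Hypothetical helper: draw `runs` random row subsets, each containing
# a fraction `setsize` of the rows of `raw`
draw_subsets <- function(raw, runs, setsize = 0.8) {
  n_rows <- nrow(raw)
  subset_size <- floor(setsize * n_rows)
  lapply(seq_len(runs), function(i) {
    idx <- sample(n_rows, subset_size)  # sampling without replacement
    raw[idx, , drop = FALSE]
  })
}

set.seed(1)
subsets <- draw_subsets(mtcars, runs = 5, setsize = 0.8)
sapply(subsets, nrow)
```

With 32 rows in mtcars and setsize = 0.8, each of the five subsets holds floor(25.6) = 25 rows.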

variables

explicitly names the variables to be included in the cms analysis (all of them must be listed)

maxPC

the maximum number (or range) of principal components to evaluate; this number must be less than or equal to the number of analyzed variables. maxPC = 2 looks at PC2 only, while maxPC = 1:4 covers the range PC1 to PC4.
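A minimal sketch of the maxPC constraint, using stats::prcomp() on the built-in mtcars data. The helper `select_pcs()` is hypothetical and only shows how a single index (PC2) differs from a range (PC1 to PC4). It also shows why the upper bound cannot exceed the number of variables. It does not necessarily match how cms computes its components.

```r
# Hypothetical sketch: validate `maxPC` and keep only the requested
# principal components (prcomp() from the stats package)
select_pcs <- function(raw, variables, maxPC = 1:4) {
  if (max(maxPC) > length(variables)) {
    stop("maxPC must not exceed the number of analyzed variables")
  }
  pca <- prcomp(raw[, variables], scale. = TRUE)
  # Loadings of the requested components only
  pca$rotation[, maxPC, drop = FALSE]
}

vars <- c("mpg", "hp", "wt", "qsec")
loadings_range  <- select_pcs(mtcars, vars, maxPC = 1:4)  # PC1 to PC4
loadings_single <- select_pcs(mtcars, vars, maxPC = 2)    # PC2 only
```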

clusters

the number of clusters to use in the cms analysis

seeding

whether to set a constant random seed (TRUE) or not (FALSE); a constant seed makes the random steps reproducible
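To illustrate the clusters and seeding arguments together, the sketch below uses stats::kmeans() on principal component scores. K-means is only a stand-in; the source does not say which clustering method cms applies. The helper `cluster_scores()` and its `seed` parameter are hypothetical. The point is that with a fixed seed, two runs produce identical cluster labels.

```r
# Hypothetical sketch: k-means stands in for the actual clustering method;
# `clusters` sets the number of centers and `seeding` fixes the seed.
cluster_scores <- function(scores, clusters = 3, seeding = TRUE, seed = 123) {
  if (seeding) set.seed(seed)  # constant seed -> reproducible cluster labels
  kmeans(scores, centers = clusters, nstart = 10)$cluster
}

# Scores on the first two principal components of four mtcars variables
scores <- prcomp(mtcars[, c("mpg", "hp", "wt", "qsec")], scale. = TRUE)$x[, 1:2]
run1 <- cluster_scores(scores, clusters = 3, seeding = TRUE)
run2 <- cluster_scores(scores, clusters = 3, seeding = TRUE)
identical(run1, run2)
```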

showplot

whether to show the distribution plot of the variables after the cms analysis

legendpos

legend position (use "right" when there are many variables)

verbose

whether to print messages about variable handling during the calculation (default = FALSE)

Value

A list containing feature frequency information and the general attributes calculated by the function.