Preference tests are a valuable tool to measure the “wants” of individuals and have been shown to be a valid method for rating different commodities. The number of commodities that can be presented at the same time is, however, limited, and classical test settings usually present only two options. In our paper (add reference), we evaluate the option of combining multiple binary choices to rank preferences among a larger number of commodities. The simsalRbim package offers the necessary tools to test selections of commodities and to estimate the relative position of new or incompletely tested items.
The package contains an artificial data set, ZickeZacke, which we use as an example to show the functionality of the package. In addition, experimental data from six different preference tests can be downloaded from a separate GitHub repository and used directly in this package (e.g., with the
simsalRbim was developed on R (v4.0.3). It depends on the following packages (in no particular order, excluding R base packages), some of which may have to be installed manually. If you have not yet installed the devtools package, remove the hash sign (#) in the first line.
The following function can be used to install single packages - or just the missing ones from CRAN.
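A minimal sketch of such a helper in base R could look as follows. The function name install_missing and its dry_run argument are hypothetical, and the dependency names are taken from the reference list of this document, so they may not match the package's DESCRIPTION file exactly.

```r
# Hypothetical helper (not part of simsalRbim): find packages that are not
# yet installed and, unless dry_run = TRUE, install them from CRAN.
install_missing <- function(pkgs, dry_run = FALSE) {
  installed <- rownames(installed.packages())
  missing   <- setdiff(pkgs, installed)
  if (!dry_run && length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)
}

# Assumed dependency names (from the reference list of this document).
deps <- c("dplyr", "ggplot2", "gnm", "prefmod", "reshape",
          "rlang", "stringr", "viridis", "ggrepel")

# Only report what would be installed; nothing is downloaded here.
to_install <- install_missing(deps, dry_run = TRUE)
print(to_install)
```

Calling install_missing(deps) without dry_run would then install only the packages that are reported as missing.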
The development version of the simsalRbim package can be downloaded from GitHub with the following command.
# install.packages("devtools")
devtools::install_github("mytalbot/simsalRbim@main")
library(simsalRbim)
A CRAN version may be available soon.
Data from a series of binary preference tests may be incomplete or may show ties, or the position of a new item may need to be estimated without testing all possible combinations. The simsalRbim package addresses all three cases by using informed and uninformed simulations to find the optimal item positions in the data. Whether a simulation is informed or uninformed depends on whether knowledge about intransitivity in the ranking is used. For details, see the simsalRbim Examples Vignette.
Preference test data usually come in two formats: quantities (e.g., a subject consumes 200 ml of orange juice but only 17 ml of milk) or binary data (e.g., “I like Star Trek better than Star Wars”). The bimload function in the package can load both formats, including the side at which the item is presented as a test variable. The side information can be examined with the bimbalance function. However, if a preference test shows significant side dependencies (e.g., milk is preferred over orange juice if it is presented in a bottle positioned on the left side but not if it is presented in the right bottle), the simulated outcome will be biased. The current package can highlight such experimental imbalances, which should be addressed using a different experimental design.
In case of ties (A = B), the binary response variable is randomized. By default, a case counts as a tie when both commodities are selected equally (50%). However, the user may specify other thresholds for considering a case a tie; for example, a preference may only be deemed meaningful if one item is chosen at >65%.
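The threshold logic can be illustrated with a short base R sketch. This is not the package's implementation: the function names classify_pair and resolve_tie are hypothetical, and the rule (a preference requires a share above 50% + deviation) is an assumption based on the description above.

```r
# Illustrative only: classify one paired comparison from two quantities.
# A preference requires a share above (50 + deviation) percent;
# everything else is treated as a tie.
classify_pair <- function(quantityA, quantityB, deviation = 0) {
  total <- quantityA + quantityB
  if (total == 0) return("tie")
  shareA    <- 100 * quantityA / total
  threshold <- 50 + deviation
  if (shareA > threshold) {
    "A"
  } else if (100 - shareA > threshold) {
    "B"
  } else {
    "tie"
  }
}

# Ties are resolved at random, with equal probability for both options.
resolve_tie <- function(result) {
  if (result == "tie") sample(c("A", "B"), 1) else result
}

classify_pair(50, 40)                  # share of A is about 55.6%
classify_pair(50, 40, deviation = 15)  # below the 65% threshold
resolve_tie(classify_pair(50, 50))     # randomly "A" or "B"
```

With the default deviation of 0, only an exactly equal split is a tie; raising the deviation widens the band of outcomes that are randomized.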
Missing item tests and combinations are filled as ties and then simulated in the two simulation functions. In the uninformed simulation, data are continuously randomized and analyzed. With each additional simulation, the number of degrees of freedom decreases and the rank positions of the items become more certain. At a given number of degrees of freedom, the 95% confidence intervals on the worth scale stabilize. Thus, item positions can be obtained with reasonable confidence without having to include tests on all possible item combinations. If you assume that the data follow the supposition of transitivity (i.e., if A>B and B>C then A>C), the informed simulation can help to derive a ranking with higher confidence. In the informed simulation, the number of intransitive relationships within the simulated rank order is also calculated. The number of intransitive relationships in the results can be limited to any percentage between 0% and 100%. The item of interest can thereby be placed on the worth scale with much higher confidence.
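To make the notion of intransitivity concrete, the following base R sketch counts intransitive (cyclic) triplets in a matrix of pairwise preferences. It is an illustration under stated assumptions, not the package's algorithm: the function name count_intransitive and the logical matrix layout (pref[i, j] is TRUE if item i is preferred over item j) are hypothetical.

```r
# Illustrative only: count cyclic triplets in a pairwise preference matrix.
# pref[i, j] == TRUE means "item i is preferred over item j".
count_intransitive <- function(pref) {
  items    <- rownames(pref)
  triplets <- combn(items, 3)
  cyclic <- apply(triplets, 2, function(trip) {
    x <- trip[1]; y <- trip[2]; z <- trip[3]
    (pref[x, y] && pref[y, z] && pref[z, x]) ||
      (pref[y, x] && pref[z, y] && pref[x, z])
  })
  c(intransitive = sum(cyclic),
    total        = ncol(triplets),
    Iratio       = mean(cyclic))
}

# A perfectly transitive order A > B > C > D ...
items <- c("A", "B", "C", "D")
pref  <- matrix(FALSE, 4, 4, dimnames = list(items, items))
pref[upper.tri(pref)] <- TRUE

# ... and the same data with one reversed preference (C > A).
pref_cyc <- pref
pref_cyc["A", "C"] <- FALSE
pref_cyc["C", "A"] <- TRUE

count_intransitive(pref)
count_intransitive(pref_cyc)
```

In the second case, the triplet (A, B, C) forms a cycle (A>B, B>C, C>A), while the other three triplets remain transitive.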
The package includes example data (ZickeZacke) that can be used to explore the package's functions and that show how the input data must be formatted. Please note that the item ‘HoiHoiHoi’ deliberately introduces variance in the data as it is (i) incomplete, (ii) introduces a tie, and (iii) introduces an intransitive relationship. Without this item, the data would be perfectly balanced and complete and would not need a simulation.
head(simsalRbim::ZickeZacke)
#>   subjectID optionA optionB quantityA quantityB
#> 1      eins   Zicke   Zacke        50        40
#> 2      zwei   Zicke   Zacke        30        10
#> 3      drei   Zicke   Zacke        80         5
#> 4      vier   Zicke   Zacke        66        15
#> 5      eins Huehner   Kacke        30        20
#> 6      zwei Huehner   Kacke        50        40
The input table must have the following columns:
subjectID - a unique identifier for each subject in the preference test
optionA - item A
optionB - item B
quantityA - quantity of item A (can also be binary 0 or 1)
quantityB - quantity of item B (can also be binary 0 or 1)
side - side of option A (this column is optional)
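A table in this format can be assembled in base R as sketched below. The item names, subject identifiers, and the check_input helper are made up for illustration and are not part of the package.

```r
# Illustrative only: a small table with the required columns plus the
# optional side column. All values are invented.
pref_data <- data.frame(
  subjectID = c("s1", "s2", "s1", "s2"),
  optionA   = c("water", "water", "water", "water"),
  optionB   = c("juice", "juice", "milk", "milk"),
  quantityA = c(20, 35, 1, 0),   # quantities or binary 0/1 values
  quantityB = c(80, 30, 0, 1),
  side      = c("L", "R", "R", "L"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop if a required column is missing.
check_input <- function(df) {
  required <- c("subjectID", "optionA", "optionB", "quantityA", "quantityB")
  missing  <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

check_input(pref_data)
```

Checking the column names before any preprocessing gives an early, readable error if a column is misspelled.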
The bimload function takes a filename (*.txt) with quantitative or binary test data in the format shown in the Data section above.
# bimpre - will take care of the data preprocessing.
bimpre(dat=NULL, GT=NULL, simOpt=NULL, deviation=0, minQuantity=0, verbose=TRUE)
The function takes the data object (dat) as loaded with bimload, a parameter defining the ground truth (GT), as well as the item that requires simulation (simOpt). The deviation and minQuantity arguments can be used to modify the threshold of the tie selection [e.g., 50% + deviation, with deviation = amount of additional deviation in percent (i.e., 50% + 5% deviation = 55%)].
# bimworth is the central function of the package doing the worth calculations.
bimworth(ydata=NULL, GT=NULL, simOpt=NULL, randOP=FALSE, intrans=FALSE,
         showPlot=FALSE, ylim=c(0,0.8), size = 5, verbose=FALSE)
The ydata object takes the output from the preprocessing function (bimpre); the function also requires the ground truth (GT) and the simulated item (simOpt). The randOP argument controls the randomization process. If randOP=TRUE, ties are randomized each time the function is executed; if FALSE (the default shown in the signature above), the random seed is fixed. The intrans argument controls whether the intransitivity of the items is computed.
# bimeval evaluates the consensus error in the worth calculations
bimeval(ydata=NULL, GT=NULL, simOpt=NULL, worth= NULL, coverage=0.8,
        showPlot=FALSE, filtersim=FALSE, title="Consensus Analysis",
        subtitle=NULL, ylim=c(0,1))
The first three input objects are the same as in bimworth. Additionally, the function requires the output from bimworth for the evaluation. Be aware that, depending on its settings, the bimworth output can be a list. The coverage object defines an arbitrary threshold (default=0.8, ratio of tested subjects/total subjects) for data coverage warnings. Two warnings are possible: one about the number of subjects and one about the number of items. The graphical output can be controlled with the plotting arguments shown in the signature above (showPlot, title, subtitle, ylim).
# bimUninformed does an uninformed simulation of item positions to find a
# cutoff.
bimUninformed(ydata=NULL, GT=NULL, simOpt=NULL, limitToRun=5, seed=TRUE,
              showPlot=TRUE, ylim=c(-0.5,1.5) )
The function has the same first three input parameters as bimworth. In addition, the limitToRun object controls the number of randomizations in the worth calculations. The seeding can be set constant (seed). The output indicates an optimal cutoff for the number of required randomizations, e.g., for the separation of items, and, therefore, potential item positioning.
The function also plots the results with 95% confidence intervals of the adjusted p-values, derived from a post-hoc ANOVA with a multiple comparisons (Tukey) test. Item separation is perfect when the plotted value reaches zero. Any remaining variance indicates potential overlaps in the 95% confidence intervals. A less stringent (but less confident) threshold is reached when the upper level of the 95% confidence intervals falls below 0.05 (see green points in the plot). This may be useful when calculation times are long. A warning is issued if the function did not converge to a threshold.
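For readers unfamiliar with Tukey-adjusted p-values, the sketch below shows how they can be obtained in base R with aov and TukeyHSD. It does not reproduce the internals of bimUninformed; the worth values are simulated with made-up means and standard deviations, and only a single post-hoc test is shown.

```r
# Illustrative only: Tukey-adjusted p-values for simulated worth values.
set.seed(1)
worth <- data.frame(
  item  = factor(rep(c("A", "B", "C"), each = 10)),
  value = c(rnorm(10, mean = 0.5, sd = 0.02),   # invented worth values
            rnorm(10, mean = 0.3, sd = 0.02),
            rnorm(10, mean = 0.2, sd = 0.02))
)

fit   <- aov(value ~ item, data = worth)   # one-way ANOVA
tukey <- TukeyHSD(fit)                     # post-hoc multiple comparisons
p_adj <- tukey$item[, "p adj"]             # one adjusted p-value per item pair

print(p_adj)
all(p_adj < 0.05)   # TRUE if every item pair is separated at the 5% level
```

Adjusted p-values close to zero for all pairs correspond to well-separated items on the worth scale.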
# bimpos - calculates item positions at a discrete number of randomizations.
bimpos(ydata=NULL, GT=NULL, simOpt=NULL, limitToRun=5, showPlot=TRUE)
This function is very similar to bimUninformed and uses the same input arguments. Contrary to bimUninformed, however, it does not explore the randomization space but uses a distinct cutoff to show the items’ relative positions. The query item (simOpt) is shown in “red” in the plot. All items are shown on the worth scale, together with 95% confidence intervals. Any overlap indicates potential ambiguity in item positioning.
# bimsim - informed position simulation
bimsim(rawdat=NULL, GT=GT, simOpt=simOpt, filter.crit="CE", limitToRun=5,
       tcut=0.9, deviation=0, minQuantity=0, seed=TRUE, showPlot=TRUE,
       ylim=c(0,0.7))
The bimsim function performs an informed position simulation of variable preference test data. The limitToRun value from the bimUninformed output can be used as an estimate for the required number of randomizations. The tcut argument sets the cutoff for showing either the consensus error (CE, in %) of the simulated results in the plot or the transitivity ratio (1-Iratio, with Iratio indicating the number of intransitive triplets per total number of triplets). The selection depends on the filter.crit parameter (“CE” or “Iratio”); for example, tcut=0.95 results in a 95% cutoff. The more stringent the criterion (i.e., the higher tcut in case of transitivity ratios), the more informed the simulation becomes. The output is a frequency distribution of simOpt's preferred position at the given tcut. The plot shows the worth value of the simOpt item as a function of its position. The bubble sizes encode the consensus error or transitivity ratio.
simsim(data = NULL, simOpt = NULL, GT = NULL, seeding = TRUE, runs = NULL,
       truepos = NULL, verbose = FALSE, path = NULL)
The simsim function simulates an item (simOpt) against any combinations given in the ground truth (GT). A constant seeding ensures that the results remain stable. The number of simulation runs can be set with the runs object. However, this number is data-dependent and, therefore, a heuristic. When the true position of the simulated item is known (or shall be tested), the truepos object is used to indicate that position. The verbose object can be used to silence outputs from the bimpre function. When the path is given, the resulting table from the simulation is stored as a *.txt file.
The bimristics function gives an overview of conditional data characteristics. The simOpt object determines the subgroup for which the conditions are checked (e.g., the number of tested subjects, the number of natural ties, the number of simulated items, etc.).
bimbalance determines how balanced the items were tested in terms of side. Items can be presented left (L) or right (R), or in some other binary logic. If the items were tested in an imbalanced way, this may introduce bias. For each (included!) item combination, the output shows the counts on the right (R) and left (L) side. In addition, a ratio is reported, which is obtained by dividing counts(R) by counts(L).
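The side counts and the R/L ratio described above can be sketched in base R as follows. This is an illustration of the idea, not the bimbalance implementation: the function name side_balance, the output layout, and the example data are hypothetical.

```r
# Illustrative only: count left/right presentations per item combination
# and compute the ratio counts(R) / counts(L).
side_balance <- function(df) {
  pair   <- paste(df$optionA, df$optionB, sep = " vs ")
  counts <- table(pair, factor(df$side, levels = c("L", "R")))
  data.frame(
    pair  = rownames(counts),
    L     = as.vector(counts[, "L"]),
    R     = as.vector(counts[, "R"]),
    ratio = as.vector(counts[, "R"] / counts[, "L"]),
    row.names = NULL
  )
}

# Invented example: "water vs juice" is imbalanced, "water vs milk" is not.
tests <- data.frame(
  optionA = rep("water", 8),
  optionB = rep(c("juice", "milk"), each = 4),
  side    = c("L", "L", "L", "R", "L", "R", "L", "R")
)

side_balance(tests)
```

A ratio of 1 indicates a balanced design for that item combination; values far from 1 point to a possible side bias.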
1. Wickham H, François R, Henry L, Müller K: dplyr: A grammar of data manipulation. 2020.
2. Wickham H: ggplot2: Elegant graphics for data analysis. Springer-Verlag New York; 2016.
3. Turner H, Firth D: Generalized nonlinear models in R: An overview of the gnm package. 2020.
4. Hatzinger R, Dittrich R: prefmod: An R package for modeling preferences based on paired comparisons, rankings, or ratings. Journal of Statistical Software 2012, 48:1–31.
5. Wickham H: Reshaping data with the reshape package. Journal of Statistical Software 2007, 21:1–20.
6. Henry L, Wickham H: rlang: Functions for base types and core R and 'tidyverse' features. 2020.
7. Wickham H: stringr: Simple, consistent wrappers for common string operations. 2019.
8. Garnier S: viridis: Default color maps from 'matplotlib'. 2018.
9. Slowikowski K: ggrepel: Automatically position non-overlapping text labels with 'ggplot2'. 2021.