`vignettes/simsalRbim_vignette.Rmd`


Preference tests are a valuable tool to measure the “wants” of individuals and have been shown to be a valid method for rating different commodities. The number of commodities that can be presented at the same time is, however, limited, and in classical test settings usually only two options are presented. In our paper (**add reference**), we evaluate the option of combining multiple binary choices to rank preferences among a larger number of commodities. The **simsalRbim** package offers the necessary tools to test selections of commodities and to estimate the relative positions of new or incompletely tested items.

The package contains an artificial data set, ZickeZacke, which we use as an example to show the functionality of the package. In addition, experimental data from six different preference tests can be downloaded from a separate GitHub repository and used directly with this package (e.g., with the `bimload` function).

**simsalRbim** was developed under R (v4.0.3). It depends on the following packages (in no particular order, excluding base R packages), some of which may have to be installed manually. (If devtools is not yet installed, remove the `#` at the beginning of the first line of the GitHub installation command below.)

- dplyr [1]
- ggplot2 [2]
- gnm [3]
- prefmod [4]
- reshape2 [5]
- rlang [6]
- stringr [7]
- viridis [8]
- ggrepel [9]

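Missing packages can be installed from CRAN, for example with:

```
# install only those dependencies that are not yet installed
pkgs <- c("dplyr", "ggplot2", "gnm", "prefmod", "reshape2", "rlang", "stringr", "viridis", "ggrepel")
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) install.packages(to_install)
```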

The development version of the simsalRbim package can be installed from GitHub with the following commands.

```
# install.packages("devtools")
devtools::install_github("mytalbot/simsalRbim@main")
library(simsalRbim)
```

A CRAN version may be available soon.

Data from a series of binary preference tests may be incomplete or contain ties, or the positions of new items may need to be determined without testing ALL possible combinations. The simsalRbim package addresses all three cases with informed and uninformed simulations that find the optimal item positions in the data. The terms uninformed and informed refer to whether knowledge about intransitivity in the ranking is used. For details, see the simsalRbim Examples vignette.

Preference test data usually come in two formats: quantities (e.g., a subject consumes 200 ml of orange juice but only 17 ml of milk) or binary data (e.g., “I like Star Trek better than Star Wars”). The `bimload` function in the package can load both formats, including the side on which an item was presented as an additional test variable. The side information can be examined with the `bimbalance` function. If a preference test shows significant side dependencies (e.g., milk is preferred over orange juice when it is presented in a bottle on the left side but not when it is presented in the right bottle), the simulated outcome will be biased. The package can highlight such experimental imbalances, which should then be addressed with a different experimental design.

In the case of ties (A = B), the binary response variable is randomized. By default, a tie is an equal selection of both commodities (50%), and the tied response is then assigned to either item with a 50% probability. However, the user may specify other thresholds for considering a case as a tie; for example, a preference may only be deemed meaningful if one item is chosen in more than 65% of cases.

Missing item tests and combinations are filled in as ties and then simulated by the two simulation functions. In the uninformed simulation, the data are repeatedly randomized and analyzed. With each additional simulation, the number of degrees of freedom decreases and the rank positions of the items become more certain. At a given number of degrees of freedom, the 95% confidence intervals on the worth scale stabilize. Thus, item positions can be obtained with reasonable confidence without including tests of all possible item combinations. If the data can be assumed to be transitive (i.e., if A > B and B > C, then A > C), the informed simulation can help derive a ranking with higher confidence. In the informed simulation, the number of intransitive relationships within the simulated rank order is also calculated, and it can be limited to any percentage between 0% and 100%. Thereby, the item of interest can be placed on the worth scale with much higher confidence.
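The idea behind the uninformed simulation can be illustrated with a small base-R sketch. It is not the package's algorithm: `simulate_ranks()` is a hypothetical helper, ties and untested pairs are assumed to be coded as `NA` in a 0/1 vector `winA` (1 = option A chosen), and simple win shares stand in for the worth values that simsalRbim estimates. The result shows, for each item, how often it ended up at each rank position.

```r
# Hypothetical sketch (not simsalRbim code): repeatedly fill ties and untested
# pairs (NA) with a fair coin flip and record each item's rank per run.
# Win shares are used as a crude stand-in for worth values.
simulate_ranks <- function(optionA, optionB, winA, runs = 100, seed = 1) {
  set.seed(seed)                                   # reproducible runs
  items <- sort(unique(c(optionA, optionB)))
  ranks <- replicate(runs, {
    y <- winA
    tie <- is.na(y)
    y[tie] <- rbinom(sum(tie), size = 1, prob = 0.5)  # randomize ties
    wins  <- sapply(items, function(i) sum(y[optionA == i]) + sum(1 - y[optionB == i]))
    games <- sapply(items, function(i) sum(optionA == i) + sum(optionB == i))
    rank(-wins / games, ties.method = "random")       # rank 1 = most preferred
  })
  # rows: items; columns: rank positions; cells: share of runs
  out <- t(apply(ranks, 1, function(r) tabulate(r, nbins = length(items)) / runs))
  colnames(out) <- paste0("rank", seq_along(items))
  out
}

# Toy data: A is always chosen; both B-vs-C tests are ties (NA)
optionA <- c("A", "A", "B", "A", "A", "B")
optionB <- c("B", "C", "C", "B", "C", "C")
winA    <- c(1, 1, NA, 1, 1, NA)
round(simulate_ranks(optionA, optionB, winA, runs = 200), 2)
```

In this toy example, A is ranked first in every run, while the positions of B and C depend only on the randomized ties.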

The package includes example data (ZickeZacke) that can be used to explore the package's functions and that show how the input data must be formatted. Please note that the item ‘HoiHoiHoi’ deliberately introduces variance into the data, as it (i) is incompletely tested, (ii) introduces a tie, and (iii) introduces an intransitive relationship. Without this item, the data would be perfectly balanced and complete and would not need a simulation.

```
head(simsalRbim::ZickeZacke)
#>   subjectID optionA optionB quantityA quantityB
#> 1      eins   Zicke   Zacke        50        40
#> 2      zwei   Zicke   Zacke        30        10
#> 3      drei   Zicke   Zacke        80         5
#> 4      vier   Zicke   Zacke        66        15
#> 5      eins Huehner   Kacke        30        20
#> 6      zwei Huehner   Kacke        50        40
```
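Which item combinations are missing from such data can be listed with a few lines of base R. `missing_pairs()` is a hypothetical helper, not a package function; it treats `A vs B` and `B vs A` as the same combination.

```r
# Hypothetical helper (not part of simsalRbim): list item combinations that
# never appear in the data, regardless of which item was optionA.
missing_pairs <- function(optionA, optionB) {
  optionA <- as.character(optionA)
  optionB <- as.character(optionB)
  # order-independent key, e.g. "Huehner vs Kacke"
  key   <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = " vs ")
  items <- unique(c(optionA, optionB))
  all_pairs <- combn(items, 2, FUN = function(p) key(p[1], p[2]))
  setdiff(all_pairs, key(optionA, optionB))
}

missing_pairs(optionA = c("Zicke", "Huehner", "Zicke"),
              optionB = c("Zacke", "Kacke", "Huehner"))
```

With simsalRbim installed, `missing_pairs(simsalRbim::ZickeZacke$optionA, simsalRbim::ZickeZacke$optionB)` should list the untested combinations in the example data.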

The input table must have the following columns:

- **subjectID**: a unique identifier for each subject in the preference test
- **optionA**: item A
- **optionB**: item B
- **quantityA**: quantity of item A (can also be binary, 0 or 1)
- **quantityB**: quantity of item B (can also be binary, 0 or 1)
- **side**: side on which option A was presented (this column is optional)
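Before a file is loaded with `bimload`, its layout can be checked quickly in R. `check_bim_input()` is a hypothetical helper that only tests for the required columns listed above and for numeric quantities; it is not part of simsalRbim.

```r
# Hypothetical helper (not part of simsalRbim): check a data frame against
# the column layout described above before saving or analysing it.
check_bim_input <- function(dat) {
  required <- c("subjectID", "optionA", "optionB", "quantityA", "quantityB")
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(dat$quantityA) || !is.numeric(dat$quantityB)) {
    stop("quantityA and quantityB must be numeric (amounts or 0/1)")
  }
  invisible(TRUE)
}

# Minimal example in the required layout (binary coding, optional side column)
dat <- data.frame(
  subjectID = c("eins", "zwei", "eins"),
  optionA   = c("Zicke", "Zicke", "Huehner"),
  optionB   = c("Zacke", "Zacke", "Kacke"),
  quantityA = c(1, 0, 1),
  quantityB = c(0, 1, 0),
  side      = c("L", "R", "L")
)
check_bim_input(dat)
```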

```
# bimload - load function allows easy loading of data.
bimload(filename)
```

The function takes a `filename` (*.txt) with quantitative or binary test data in the format shown in the Data section above.

```
# bimpre - will take care of the data preprocessing.
bimpre(dat=NULL, GT=NULL, simOpt=NULL, deviation=0, minQuantity=0,
verbose=TRUE)
```

The function takes the data object (`dat`) returned by `bimload`, a parameter defining the ground truth (`GT`), and the item that requires simulation (`simOpt`). The `deviation` and `minQuantity` arguments can be used to modify the threshold for tie selection: the threshold is 50% + `deviation`, with `deviation` given in percent (e.g., `deviation = 5` results in a threshold of 55%).
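The tie threshold can be sketched in base R. `classify_pair()` and `resolve_ties()` are hypothetical helpers, not `bimpre` internals; they assume that the share of item A decides the outcome and that shares within 50% ± `deviation` count as ties. `minQuantity` is not modelled here.

```r
# Hypothetical sketch of a tie rule (not the bimpre implementation): the
# share of item A decides, and shares within 50% +/- deviation count as ties.
classify_pair <- function(quantityA, quantityB, deviation = 0) {
  shareA <- quantityA / (quantityA + quantityB)
  out <- rep(NA_integer_, length(shareA))           # NA = tie
  out[which(shareA > 0.5 + deviation / 100)] <- 1L  # A preferred
  out[which(shareA < 0.5 - deviation / 100)] <- 0L  # B preferred
  out
}

# Ties are resolved by a fair coin flip
resolve_ties <- function(choice) {
  tie <- is.na(choice)
  choice[tie] <- rbinom(sum(tie), size = 1, prob = 0.5)
  choice
}

# First two ZickeZacke rows: 50 vs 40 counts as a preference by default ...
classify_pair(c(50, 30), c(40, 10))                  # returns c(1L, 1L)
# ... but as a tie when more than 65% is required (deviation = 15)
classify_pair(c(50, 30), c(40, 10), deviation = 15)  # returns c(NA, 1L)
```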

```
# bimworth is the central function of the package doing the worth calculations.
bimworth(ydata=NULL, GT=NULL, simOpt=NULL, randOP=FALSE, intrans=FALSE,
showPlot=FALSE, ylim=c(0,0.8), size = 5, verbose=FALSE)
```

The function takes the output of the preprocessing function `bimpre` (`ydata`), the ground truth (`GT`), and the simulated item (`simOpt`). The `randOP` argument controls the randomization process: with `randOP = TRUE`, ties are randomized anew each time the function is executed; with `randOP = FALSE` (the default in the signature above), the random seed is fixed. The `intrans` argument controls whether the intransitivity of the items is computed.
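simsalRbim depends on prefmod, a package for paired-comparison (Bradley-Terry-type) models, so its worth values are most likely estimates of such a model. As a rough illustration only, the sketch below fits a plain Bradley-Terry model to binary paired comparisons with base R's `glm()` and scales the worth values to sum to 1. The helper `bt_worth()` is hypothetical and is not how `bimworth` is implemented; it also needs outcomes that are not perfectly separated, otherwise the estimates diverge.

```r
# Hypothetical sketch (not simsalRbim's code): Bradley-Terry worth values
# from binary paired comparisons, fitted with base R's glm().
bt_worth <- function(optionA, optionB, winA) {
  items <- sort(unique(c(optionA, optionB)))
  # Design matrix: +1 for the item in optionA, -1 for the item in optionB,
  # so that logit P(A chosen) = lambda_A - lambda_B
  X <- matrix(0, nrow = length(winA), ncol = length(items),
              dimnames = list(NULL, items))
  X[cbind(seq_along(winA), match(optionA, items))] <- 1
  X[cbind(seq_along(winA), match(optionB, items))] <- -1
  # The first item is the reference; its log-worth is fixed at 0
  Xr  <- X[, -1, drop = FALSE]
  fit <- glm(winA ~ 0 + Xr, family = binomial)
  lambda <- c(0, unname(coef(fit)))
  # Worth values are scaled to sum to 1
  setNames(exp(lambda) / sum(exp(lambda)), items)
}

# Toy data: each pair tested 4 times; the first-named item wins 3 of 4
optionA <- rep(c("A", "B", "A"), each = 4)
optionB <- rep(c("B", "C", "C"), each = 4)
winA    <- rep(c(1, 1, 1, 0), times = 3)
round(bt_worth(optionA, optionB, winA), 3)
```

Here A, which wins most comparisons, receives the highest worth, followed by B and C.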

```
# bimeval evaluates the consensus error in the worth calculations
bimeval(ydata=NULL, GT=NULL, simOpt=NULL, worth= NULL, coverage=0.8,
showPlot=FALSE, filtersim=FALSE, title="Consensus Analysis",subtitle=NULL,
ylim=c(0,1))
```

The first three input arguments are the same as in `bimworth`. Additionally, the function requires the output of `bimworth` (`worth`) for the evaluation. Be aware that the `bimworth` output can be a list when `intrans = TRUE`. The `coverage` argument defines an arbitrary threshold (default = 0.8; ratio of tested subjects to total subjects) for data coverage warnings. Two kinds of warnings can occur: one for the number of subjects and one for the number of items. The graphical output is controlled with the `showPlot` argument.
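The coverage threshold can be illustrated per item. `item_coverage()` is hypothetical and implements only one possible reading of the subject-coverage check (the share of all subjects that were tested with each item); the warnings produced by `bimeval` may be computed differently. The example data frame contains only the columns needed here.

```r
# Hypothetical helper (not the bimeval implementation): for each item, the
# share of all subjects that were tested with it, with a warning below the
# coverage threshold.
item_coverage <- function(dat, coverage = 0.8) {
  n_subjects <- length(unique(dat$subjectID))
  items <- sort(unique(c(as.character(dat$optionA), as.character(dat$optionB))))
  ratio <- sapply(items, function(i) {
    tested <- dat$optionA == i | dat$optionB == i
    length(unique(dat$subjectID[tested])) / n_subjects
  })
  low <- names(ratio)[ratio < coverage]
  if (length(low) > 0) {
    warning("coverage below ", coverage, " for: ", paste(low, collapse = ", "))
  }
  ratio
}

dat <- data.frame(
  subjectID = c("eins", "zwei", "eins"),
  optionA   = c("Zicke", "Zicke", "HoiHoiHoi"),
  optionB   = c("Zacke", "Zacke", "Zicke")
)
item_coverage(dat)  # warns: HoiHoiHoi was tested by 1 of 2 subjects
```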

```
# bimUninformed does an uninformed simulation of item positions to find a
# cutoff.
bimUninformed(ydata=NULL, GT=NULL, simOpt=NULL, limitToRun=5, seed=TRUE,
showPlot=TRUE, ylim=c(-0.5,1.5) )
```

The function has the same first three input arguments as `bimworth`. In addition, the `limitToRun` argument controls the number of randomizations in the worth calculations, and the seed can be held constant (`seed`). The output indicates an optimal cutoff for the number of required randomizations, e.g., for the separation of items and, therefore, for potential item positioning.

The function also plots the results with 95% confidence intervals of the adjusted p-values, derived from an ANOVA followed by a post-hoc multiple comparisons test (Tukey). Items are perfectly separated when the plotted value reaches zero; any remaining variance indicates potential overlaps in the 95% confidence intervals. A less stringent (but less confident) threshold is reached when the upper limit of the 95% confidence interval falls below 0.05 (see the green points in the plot). This may be useful when calculation times are long. A warning is issued if the function does not converge to a threshold.
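The separation check rests on a Tukey post-hoc test. The base-R sketch below shows only that underlying test on made-up worth values; the helper `items_separated()` and the long data layout (`item`, `worth`) are assumptions, and the search of `bimUninformed` for a randomization cutoff is not reproduced.

```r
# Hypothetical sketch: are all items separated, judged by Tukey's HSD on
# worth values collected over repeated randomizations?
items_separated <- function(sim, alpha = 0.05) {
  tk <- TukeyHSD(aov(worth ~ item, data = sim))$item
  all(tk[, "p adj"] < alpha)   # every pairwise comparison significant
}

# Made-up worth values: three items, 10 randomizations each
set.seed(42)
sim <- data.frame(
  item  = factor(rep(c("A", "B", "C"), each = 10)),
  worth = c(rnorm(10, 0.6, 0.02), rnorm(10, 0.3, 0.02), rnorm(10, 0.1, 0.02))
)
items_separated(sim)
```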

```
# bimpos - calculates item positions at a discrete number of randomizations.
bimpos(ydata=NULL, GT=NULL, simOpt=NULL, limitToRun=5, showPlot=TRUE)
```

This function is very similar to `bimUninformed` and uses the same input arguments. Unlike `bimUninformed`, however, it does not explore the randomization space but uses a fixed cutoff to show the items' relative positions. The query item (`simOpt`) is shown in red in the plot. All items are shown on the worth scale together with their 95% confidence intervals. Any overlap indicates potential ambiguity in item positioning.
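Overlapping 95% confidence intervals, as described for `bimpos`, can be detected with a simple interval test. `overlapping_pairs()` is a hypothetical helper that takes item names and the lower and upper interval limits (made-up values in the example); it is not part of simsalRbim.

```r
# Hypothetical helper: item pairs whose 95% confidence intervals overlap,
# i.e., whose relative position on the worth scale is ambiguous.
overlapping_pairs <- function(item, lwr, upr) {
  idx <- combn(seq_along(item), 2)
  hit <- apply(idx, 2, function(p) lwr[p[1]] <= upr[p[2]] && lwr[p[2]] <= upr[p[1]])
  if (!any(hit)) return(character(0))
  apply(idx[, hit, drop = FALSE], 2, function(p) paste(item[p], collapse = " / "))
}

# Made-up intervals: B and C overlap, A is clearly separated
overlapping_pairs(item = c("A", "B", "C"),
                  lwr  = c(0.50, 0.30, 0.28),
                  upr  = c(0.60, 0.40, 0.35))
```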

```
# bimsim - informed position simulation
bimsim(rawdat=NULL, GT=GT, simOpt=simOpt, filter.crit="CE", limitToRun=5,
tcut=0.9, deviation=0, minQuantity=0, seed=TRUE, showPlot=TRUE,ylim=c(0,0.7))
```

The `bimsim` function performs an informed position simulation of variable preference test data. The `limitToRun` value obtained from the `bimUninformed` output can be used as an estimate of the required number of randomizations. The `tcut` argument sets the cutoff for showing either the consensus error (%, CE) of the simulated results in the plot **OR** the transitivity ratio (1 - Iratio, where Iratio is the number of intransitive triplets divided by the total number of triplets). Which criterion is used depends on the `filter.crit` argument (“CE” or “Iratio”); for example, `tcut = 0.95` results in a 95% cutoff. The more stringent the criterion (i.e., the higher `tcut` in the case of transitivity ratios), the more informed the simulation becomes. The output is a frequency distribution of the preferred positions of `simOpt` at the given `tcut`. The plot shows the worth value of the `simOpt` item as a function of its position. The bubble sizes encode the consensus error or transitivity ratio.
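The Iratio can be illustrated with a small base-R sketch. The helper `iratio()` is hypothetical and assumes a complete set of pairwise outcomes without ties, stored as a 0/1 matrix `wins`; how simsalRbim counts triplets in incomplete or simulated data may differ.

```r
# Hypothetical sketch of the Iratio: the share of item triplets that are
# intransitive (cyclic) in a complete set of pairwise preferences.
# wins[i, j] == 1 means item i was preferred over item j.
iratio <- function(wins) {
  triplets <- combn(rownames(wins), 3)
  cyclic <- apply(triplets, 2, function(tri) {
    x <- tri[1]; y <- tri[2]; z <- tri[3]
    (wins[x, y] == 1 && wins[y, z] == 1 && wins[z, x] == 1) ||  # x > y > z > x
      (wins[y, x] == 1 && wins[z, y] == 1 && wins[x, z] == 1)   # reverse cycle
  })
  mean(cyclic)
}

items <- c("A", "B", "C", "D")
wins <- matrix(0, 4, 4, dimnames = list(items, items))
wins[upper.tri(wins)] <- 1                # transitive order A > B > C > D
wins["A", "D"] <- 0; wins["D", "A"] <- 1  # D beats A: creates cycles
iratio(wins)       # 2 of 4 triplets are cyclic
1 - iratio(wins)   # transitivity ratio
```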

```
# simsim - simulates an item against the combinations in the ground truth
simsim(data = NULL, simOpt = NULL, GT = NULL, seeding = TRUE, runs = NULL,
truepos = NULL, verbose = FALSE, path = NULL)
```

The `simsim` function simulates an item (`simOpt`) against the combinations given in the ground truth (`GT`). Constant seeding (`seeding`) ensures that the results remain stable. The number of simulation runs can be set with the `runs` argument; however, this number is data-dependent and, therefore, a heuristic. When the true position of the simulated item is known (or is to be tested), the `truepos` argument indicates that position. The `verbose` argument silences output from the `bimpre` function. When a `path` is given, the resulting table from the simulation is stored as a *.txt file.

```
# bimristics - data characteristics
bimristics(predat=NULL, simOpt=NULL)
```

The `bimristics` function gives an overview of conditional data characteristics. The `simOpt` argument determines the subgroup for which the conditions are checked (e.g., the number of tested subjects, the number of natural ties, and the number of simulated items).

```
# bimbalance - side balance of the item presentation
bimbalance(dat=NULL, sidevar="sideA")
```

The `bimbalance` function determines how balanced the item presentation was in terms of side. Items can be presented on the left (L) or right (R), or in some other binary logic. If the data were tested in an imbalanced way, this may introduce bias. For each (included!) item combination, the output shows the number of counts on the right (R) and left (L) side, as well as a ratio obtained by dividing counts(R) by counts(L).
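The side counts can be approximated with `table()`. This is a hypothetical sketch, not the `bimbalance` implementation: it assumes a column `sideA` holding `"L"` or `"R"` for the side of option A and counts `optionA vs optionB` combinations in the order given, so `Zicke vs Zacke` and `Zacke vs Zicke` would be counted separately.

```r
# Hypothetical re-implementation of the side count (not the bimbalance code):
# presentations per item combination on the left (L) and right (R) side,
# and the ratio counts(R) / counts(L).
side_balance <- function(dat, sidevar = "sideA") {
  pair   <- paste(dat$optionA, dat$optionB, sep = " vs ")
  side   <- factor(dat[[sidevar]], levels = c("L", "R"))
  counts <- table(pair, side)
  data.frame(pair  = rownames(counts),
             L     = as.vector(counts[, "L"]),
             R     = as.vector(counts[, "R"]),
             ratio = as.vector(counts[, "R"] / counts[, "L"]))
}

dat <- data.frame(
  optionA = c("Zicke", "Zicke", "Zicke", "Zicke", "Huehner", "Huehner"),
  optionB = c("Zacke", "Zacke", "Zacke", "Zacke", "Kacke", "Kacke"),
  sideA   = c("L", "R", "L", "R", "L", "L")
)
side_balance(dat)
```

A ratio of 1 indicates balanced presentation; 0 or `Inf` means that a combination was only ever presented with option A on one side (here, Huehner vs Kacke).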

1. Wickham H, François R, Henry L, Müller K: *dplyr: A grammar of data manipulation*. 2020.

2. Wickham H: *ggplot2: Elegant graphics for data analysis*. Springer-Verlag New York; 2016.

3. Turner H, Firth D: *Generalized nonlinear models in R: An overview of the gnm package*. 2020.

4. Hatzinger R, Dittrich R: prefmod: An R package for modeling preferences based on paired comparisons, rankings, or ratings. *Journal of Statistical Software* 2012, 48:1–31.

5. Wickham H: Reshaping data with the reshape package. *Journal of Statistical Software* 2007, 21:1–20.

6. Henry L, Wickham H: *rlang: Functions for base types and core R and 'tidyverse' features*. 2020.

7. Wickham H: *stringr: Simple, consistent wrappers for common string operations*. 2019.

8. Garnier S: *viridis: Default color maps from 'matplotlib'*. 2018.

9. Slowikowski K: *ggrepel: Automatically position non-overlapping text labels with 'ggplot2'*. 2021.