Preference tests

Preference tests are a way of testing the “wants” of individuals and can be used in various experimental setups. Especially in the animal sciences, these methods are used in many applications. Usually, scientists will have to “do” the experiments to determine the “wants” of subjects. Unfortunately, these experiments cost time and money or require intricate gadgets to say that animal N.N. prefers substance “A” over “B”. In addition, experiments that include a lot of test item combinations can quickly become a drain on resources. Therefore, we propose the simsalRbim package in which we present numerous functions to estimate, evaluate and simulate preference tests with complete and missing data. In addition, we make intensive use of the prefmod [1] package that uses a log-linear Bradley-Terry model (LLBT) to calculate “worth values” from the equation coefficients so that items can be ranked.

So, how does this help us save resources?
– we’ll simulate items that we haven’t tested!

But, there is…

A problem with the “unknown”

The simulation of an item of which one has limited knowledge is a good starting point for estimating its putative position on the worth scale. However, this means, e.g., that only limited item combination tests have been experimentally tested, and others not. This leaves us with “open” positions on the item combinations for which no worth value can be calculated. The simsalRbim package takes advantage of how the prefmod package works and fills open positions randomly with the discrete results of the underlying model. These values can assume the numbers -1 (A > B), 0 (tie, A=B), and 1 (A < B) for each open item combination. By repeating the simulations i-times with the items from the Ground Truth, the model eventually runs into saturation due to the limited degrees of freedom that the item combinations provide.

To provide the user with a measure to evaluate the outcome of these simulations, we suggest two criteria: a) the Consensus Error (CE) and b) the intransitivity Ratio (Iratio), which will help identify “good” simulation results. In the case of “sufficiently” long simulation runs and more tested item combinations, the Iratio eventually gets smaller (e.g., Iratio=0 means zero intransitivity). Unfortunately, the number of required simulation runs is a heuristic (highly data-dependent). However, together with the CE, the Iratio provides a decision criterion for accepting the simulated position of a new item. For more information, see the explanation of the two measures.

Ways to simulate

For the simulation, some experimental data are needed (at least tested vs. one existing item from the Ground Truth). Please note that the less information is provided, the higher the uncertainty.

a) confirm a calculated position

Experimental data yields a ranking on the worth scale. However, this scale might include uncertainties. The user can exclude an item and reintroduce it in combinations with the remaining GT items. This can be done by specifying the calculated position of the item on the worth scale (truepos). The simulation yields the best-achieved estimate using the Iratio and the CE on the simulated item’s rank.

b) best guess (with a priori information)

If the item was not tested against all other items, but the user has a good idea about its position, the position can be simulated using the truepos argument as the most likely outcome. The simulation yields the best position estimate based on the Iratio and CE, along with the number of absolute counts that matched the predictions indicated in the truepos argument.

c) no idea (without any prior position information)

When there is no information (or guesstimate) on an item’s position, the simulation can be run without specifying the truepos argument. The resulting table will be without the absolute counts and frequency information on the truepos outcomes in such a case. Nevertheless, the position will still be estimated based using the Iratio and CE as in a) and b).

Example simulation: HoiHoiHoi

The ZickeZacke data within the simsalRbim package contains the “HoiHoiHoi” item, that has only been rated by 3 individuals (“eins”, “zwei”, “vier”) in the “Kacke” item (see). No other item has been tested with “HoiHoiHoi”.

The worth analysis with the “HoiHoiHoi” item shows that it ranks on position No. 2 (see). Let’s test this by simulating “HoiHoiHoi” in 100 runs using the simsim function.

Please note: The user can specify a true position (truepos, here=2) if a specific position is known or should be tested. If truepos=NULL, no such evaluation takes place. This would be the default option when testing a new item. When the path is specified, the results will be saved to that location as a *.txt file. The runs object specifies the number of simulations that are run.

# Does the simulation of HoiHoiHoi at the given Ground Truth (GT)
data       <- ZickeZacke
hoihoihoi  <- simsim(data     = data,
                     simOpt   = "HoiHoiHoi",
                     GT       = c("Zicke","Zacke","Huehner", "Kacke"),
                     runs     = 100,
                     truepos  = 2,
                     seeding  = TRUE,
                     path     = NULL)

Simulation result

item pos worth Iratio CE tp_frq tp total against
HoiHoiHoi 2 0.20 0.05 50.00 0.35 35 100 Zicke
HoiHoiHoi 2 0.22 0.03 12.50 0.40 40 100 Zacke
HoiHoiHoi 2 0.14 0.05 37.50 0.42 42 100 Huehner
HoiHoiHoi 1 0.55 0.03 18.75 0.59 59 100 Kacke
HoiHoiHoi 3 0.16 0.03 62.50 0.45 45 100 Zicke,Zacke
HoiHoiHoi 2 0.29 0.03 25.00 0.35 35 100 Zicke,Huehner
HoiHoiHoi 2 0.18 0.05 25.00 0.50 50 100 Zacke,Huehner
HoiHoiHoi 3 0.09 0.05 43.75 0.46 46 100 Zicke,Kacke
HoiHoiHoi 3 0.11 0.03 68.75 0.58 58 100 Zacke,Kacke
HoiHoiHoi 3 0.10 0.03 43.75 0.40 40 100 Huehner,Kacke
HoiHoiHoi 2 0.18 0.05 25.00 0.51 51 100 Zicke,Zacke,Huehner
HoiHoiHoi 2 0.17 0.03 18.75 0.71 71 100 Zicke,Zacke,Kacke
HoiHoiHoi 2 0.15 0.03 56.25 0.44 44 100 Zicke,Huehner,Kacke
HoiHoiHoi 2 0.36 0.03 43.75 0.64 64 100 Zacke,Huehner,Kacke
HoiHoiHoi 2 0.25 0.03 31.25 0.62 62 100 Zicke,Zacke,Huehner,Kacke

pos: the simulated item position (based on Iratio and CE)
worth: the average worth value obtained from the simulation
Irato: the Iratio (the lower, the better)
CE: the Consensus Error (the lower, the better)
tp_frq: the frequency of true positive results (gets lower with more items)
tp: the absolute tp counts in the simulation
total: the number of simulation runs
against: the items that simOpt was tested against
seeding: the simulation uses constant seeding if TRUE

Note that the simulations did a decent job in finding pos=2 for the “HoiHoiHoi” item - given the fact, that it was only tested with 1 other item and 3 subjects. The CE error remains relatively large and the Iratio is low from the beginning. The fraction of true positives should improve when more actual data/tests are provided.


1. Hatzinger R, Dittrich R: prefmod: An R Package for Modeling Preferences Based on Paired Comparisons, Rankings, or Ratings. Journal of Statistical Software 2012, 48:1–31.