Background
Surveys in the social sciences often include questions about individuals' subjective evaluations of their own situation or their views on a certain aspect of society. For example, respondents are often asked to evaluate their general health on a scale such as "very good", "good", "moderate", "fair", or "poor". The same scale can be used to evaluate their experience with the quality of their health care system. Or they can rate satisfaction with their own life (or aspects of it, such as income or social contacts) on a scale like "very satisfied", "satisfied", "moderately satisfied", "not so satisfied", "not satisfied at all". Researchers often use such questions because they are easy to answer and provide global information on important aspects of people's lives and the society they live in.
Moreover, researchers are often interested in differences between groups characterised by, e.g., culture, country, age, socio-economic background, etc. Comparing subjective evaluations across such groups has the disadvantage that different people may interpret and answer the same questions differently. For example, when asked to evaluate their own health, people in different countries may respond differently, not only because health status differs, but also because they may interpret 'good health' differently. If people in different countries or socio-economic groups interpret or understand the same questions in different ways (also called item bias, or differential item functioning), such differences need to be corrected to ensure that comparisons across groups are useful.
Areas where differential item functioning is a problem include consumer research, health and pain research, cognitive psychology, economics, and sociology. The problem can become particularly severe when comparisons are made across countries. For example, economists may want to compare earnings and income inequality across countries, or psychologists may want to compare the prevalence of mental health problems. Another example is the responsiveness of health care. Given the many differences in health care systems across countries, researchers may want to analyse how people from different countries rate these systems. Thus, survey research among different groups of people needs to take into account that these people apply different mental standards when they evaluate situations or events.
The solution to incomparable answers is to attach the response categories of the survey questions to a common standard, or anchor. Anchoring vignettes provide such a standard. Anchoring vignettes are hypothetical scenarios; respondents' answers to them are used to adjust their self-assessments of a situation or concept, which yields interpersonally comparable measurements.
What are vignettes?
Vignette questions describe hypothetical persons in a particular situation and ask respondents to evaluate that situation. For example, if the issue is work disability, a person with a specific work-related health problem is described. Respondents in different groups (e.g., countries) get the same vignettes, so their evaluations should be the same if there were no differential item functioning. In other words, in the work disability example, the degree of work disability of the vignette person is fixed, and it is the "anchor" that can be used to compare evaluation thresholds: differences in evaluations must then reflect different evaluation criteria. By applying vignettes to different groups, the differences in the evaluations of the anchor can be used to convert the answers of one group (e.g., citizens of The Netherlands) to the scale of another group (e.g., citizens of Belgium) and make them comparable.
Example
Self-report of political efficacy:
How much say do you have in getting the government to address issues that interest you?
(1=no say at all, 2=little say, 3=some say, 4=a lot of say, 5=unlimited say)
Following this question the respondents are presented with vignette questions, asking them to rate the degree of political efficacy for each vignette on the same scale as used for the self-report. Two such vignette questions are given below:
Vignette 1
Alison is bothered by the air pollution caused by a local firm. It is not dangerous but sometimes leads to a bad smell. She and her neighbours are supporting an opposition candidate in the forthcoming local elections that has promised to address the issue. So many people in her area feel the same way that the opposition candidate will probably defeat the incumbent representative.
How much say does Alison have in getting the government to address issues that interest her?
(1=no say at all, 2=little say, 3=some say, 4=a lot of say, 5=unlimited say)
Vignette 2
John is bothered by the air pollution caused by a local firm. It is not dangerous but sometimes leads to a bad smell. There is a group of influential local residents who could do something about the problem, but they have said that industrial development is the most important policy right now instead of clean air.
How much say does John have in getting the government to address issues that interest him?
(1=no say at all, 2=little say, 3=some say, 4=a lot of say, 5=unlimited say)
How do vignettes help to identify response scale differences?
The vignettes are written such that Alison experiences the largest political efficacy and John the smallest. Suppose that two respondents have the following ordering of how they perceive the situation of the hypothetical people (Figure 1).

In the left picture, Respondent 1's assessment of political efficacy in his own situation lies between his assessments of the situations of Alison and John. In the middle picture, Respondent 2 gives a higher assessment score to his own situation than to those of both Alison and John. Simply comparing the two self-assessments would suggest that political efficacy for Respondent 2 is lower than that for Respondent 1. However, comparing the two evaluations of the political efficacy experienced by Alison shows that the respondents have very different response scales. Taking Alison's situation as the benchmark, we see that Respondent 1 experiences less political efficacy than Alison, while Respondent 2 experiences more political efficacy than Alison. Adjusting the self-assessments using the vignette answers therefore reverses the conclusion: Respondent 2 in fact experiences higher political efficacy than Respondent 1. Thus, ignoring item bias in survey questions can lead to misleading conclusions.
The purpose of using vignettes is to re-define the self-assessment answers relative to the vignette answers. To use vignettes in this way, two requirements need to be fulfilled: response consistency and vignette equivalence. Response consistency means that each respondent uses the response categories for a vignette question in the same way as the response categories for the self-assessment question. Vignette equivalence means that all respondents understand the vignette in the same way. This does not mean that they give the same answers, but it does imply that differences in answers are due to response scale differences.
Now that we have rescaled Respondent 2's answers, we need to assign a new value to the self-assessment score relative to the vignette scores. Assume that for Respondents 1 and 2 the ordering of the vignettes is the same. That is, John has the least political say, and Alison has the most political say. The vignette scores of Respondent 1 serve as cut points that divide the response scale into five discrete areas (Figure 2). A value of '1' is given to a self-assessment score lower than John's, '2' if equal to John's, '3' if in between John's and Alison's, '4' if equal to Alison's, and '5' if higher than Alison's. In this example, the recoded self-assessment score of Respondent 1 becomes '3' and that of Respondent 2 becomes '5'. This new variable is free of item bias and can be treated as an ordinal variable.
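The recoding rule described above can be sketched in a few lines of Python. This is only an illustrative sketch for the two-vignette case: the function name `recode_self_assessment` and the raw scores used in the example are hypothetical (the actual scores in the figures are not given here), and the sketch assumes that each respondent rates John strictly lower than Alison. Ties or reversed vignette orderings are not covered by the rule above, so the sketch simply rejects them.

```python
def recode_self_assessment(self_score, john_score, alison_score):
    """Recode a self-assessment relative to a respondent's own vignette ratings.

    Returns a value from 1 to 5:
      1 = below John, 2 = equal to John, 3 = between John and Alison,
      4 = equal to Alison, 5 = above Alison.

    Assumption (not stated in the text): John must be rated strictly
    lower than Alison; other orderings raise an error.
    """
    if john_score >= alison_score:
        raise ValueError("expected John to be rated strictly lower than Alison")
    if self_score < john_score:
        return 1
    if self_score == john_score:
        return 2
    if self_score < alison_score:
        return 3
    if self_score == alison_score:
        return 4
    return 5


# Hypothetical raw answers on the 1-5 scale, chosen to mimic the pattern
# in the example: Respondent 1 has the higher raw self-assessment, but
# places himself between John and Alison; Respondent 2 has the lower raw
# self-assessment, but places himself above both vignette persons.
respondent_1 = {"self": 4, "john": 2, "alison": 5}
respondent_2 = {"self": 3, "john": 1, "alison": 2}

recoded_1 = recode_self_assessment(
    respondent_1["self"], respondent_1["john"], respondent_1["alison"]
)
recoded_2 = recode_self_assessment(
    respondent_2["self"], respondent_2["john"], respondent_2["alison"]
)
print(recoded_1, recoded_2)
```

With these hypothetical inputs the raw self-assessments rank Respondent 1 above Respondent 2, while the recoded values rank Respondent 2 above Respondent 1, which is the reversal described in the example.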

