Why is the pid not a unique identifier in the biol dataset?
In principle, each respondent completes the biography questionnaire only once, in their first year of participation. One would therefore expect each pid to appear only once in the biol dataset. This is not the case: some pids appear more than once, so the combination of pid and syear is needed to identify each record uniquely. There are two main exceptions to the rule above that explain this:
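The check described above can be sketched in a few lines. This is a minimal sketch, assuming the biol records have already been loaded as a list of dictionaries with pid and syear fields; the toy values below are made up for illustration and are not real SOEP data, and the helper `duplicated_keys` is hypothetical. It confirms that pid alone is not a unique key while the pair (pid, syear) is.

```python
from collections import Counter

# Hypothetical toy records; real biol data would be read from the distributed file.
records = [
    {"pid": 101, "syear": 2015},
    {"pid": 102, "syear": 2016},
    {"pid": 102, "syear": 2019},  # same person, biography data collected again
    {"pid": 103, "syear": 2017},
]

def duplicated_keys(rows, key_fields):
    """Return the key values (as tuples) that occur more than once for the given fields."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key for key, n in counts.items() if n > 1}

repeated_pids = duplicated_keys(records, ["pid"])
repeated_pid_syear = duplicated_keys(records, ["pid", "syear"])

print(repeated_pids)       # {(102,)}
print(repeated_pid_syear)  # set()
```

If the data are held in a pandas DataFrame instead, `DataFrame.duplicated(subset=["pid", "syear"])` performs the same kind of check.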
Respondents do not complete the questionnaire in the first year
Because respondents can stop the interview at any time, the biography questionnaire may only be completed in a later year. This does not affect the data directly, except that the information appears under a later syear. It is also possible that the biography questionnaire is never completed; such respondents do not appear in the biol dataset at all.
Respondents complete the questionnaire several times
Like other questionnaires, the biography questionnaire changes over time. As a result, certain information may need to be collected again or for the first time (e.g., as filter information for other questions). In such cases, parts of the questionnaire are shown to respondents again, and the newly collected answers are added to the biol dataset so that they can be used.
In addition, incorrect preloads or incorrectly assigned identifiers can cause a person to answer the entire biography questionnaire a second time by mistake. These data are also stored in the biol dataset so that they are officially available. Note that the repeated answers are not necessarily identical to the original ones. It is therefore advisable to check the affected cases and variables and to handle them appropriately in analyses.
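One way to check affected cases is sketched below. This is a minimal sketch under stated assumptions: the records are toy values, `var_a` and `var_b` are hypothetical placeholders for real biol variables, and both helper functions are hypothetical. `inconsistent_variables` lists, for each repeated pid, the variables whose answers differ across syears. `earliest_record` shows one possible deduplication rule (keep the earliest syear). This is an illustrative assumption, not an official SOEP recommendation. When only parts of the questionnaire were asked again, the later record may hold the only answer to a newly added question, so keeping the earliest record blindly can discard information.

```python
from collections import defaultdict

# Hypothetical toy records: "var_a" and "var_b" stand in for real biol variables.
records = [
    {"pid": 101, "syear": 2015, "var_a": 1, "var_b": 3},
    {"pid": 102, "syear": 2016, "var_a": 2, "var_b": 5},
    {"pid": 102, "syear": 2019, "var_a": 2, "var_b": 4},  # var_b differs from 2016
]

def inconsistent_variables(rows, variables):
    """Map each repeated pid to the sorted list of variables whose answers differ."""
    by_pid = defaultdict(list)
    for row in rows:
        by_pid[row["pid"]].append(row)
    result = {}
    for pid, group in by_pid.items():
        if len(group) < 2:
            continue  # answered only once, nothing to compare
        differing = sorted(v for v in variables if len({r[v] for r in group}) > 1)
        if differing:
            result[pid] = differing
    return result

def earliest_record(rows):
    """One possible rule (an assumption): keep the record with the earliest syear per pid."""
    first = {}
    for row in sorted(rows, key=lambda r: r["syear"]):
        first.setdefault(row["pid"], row)
    return list(first.values())

print(inconsistent_variables(records, ["var_a", "var_b"]))  # {102: ['var_b']}
print(len(earliest_record(records)))  # 2
```

Which record to keep, or whether to combine them, depends on the variable and the analysis; the comparison step is mainly useful for finding the cases that need a closer look.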