Inno Dataset

The “inno” dataset from the Data Distribution File of SOEP-IS includes most of the variables from the innovative modules (inno-modules) that were surveyed in SOEP-IS. It is a long-format dataset with data from survey year 2011 onward, the year in which SOEP-IS was first surveyed as a separate study. It does not contain generated variables, only the survey data from the inno-modules and some flag variables that provide additional information, for example on treatment groups.

Survey Years in the inno dataset

Data from SOEP-IS modules are provided exclusively to the researchers who submitted the respective proposals for an initial 12-month period. After this embargo has ended, the data are released to the entire SOEP user community for secondary analysis. Because of this, the data in the inno dataset are one survey year behind the rest of the datasets. The simplified default procedure for inno modules is as follows:

  • year -> data collection

  • year+1 -> release of the “normal” data

  • year+2 -> release of inno module data

If, for example, the dataset “pl” contains 2020 as the latest syear, the inno dataset covers data only up to 2019.
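The timing rule can also be written as a small calculation. The following Python sketch is only an illustration of the simplified default procedure described above; the function names are hypothetical and not part of any SOEP tooling, and individual modules may deviate from the default.

```python
def release_years(collection_year: int) -> dict:
    """Return the default release schedule for data collected in a given year.

    Encodes the simplified procedure: year -> collection,
    year+1 -> "normal" data, year+2 -> inno module data.
    """
    return {
        "collection": collection_year,
        "regular_release": collection_year + 1,
        "inno_release": collection_year + 2,
    }


def latest_inno_syear(latest_regular_syear: int) -> int:
    """Latest survey year expected in the inno dataset.

    Assumes the inno dataset lags the other datasets (e.g. "pl")
    by exactly one survey year, as in the default procedure.
    """
    return latest_regular_syear - 1


if __name__ == "__main__":
    print(release_years(2019))
    print(latest_inno_syear(2020))
```

With 2020 as the latest syear in “pl”, the second function returns 2019, which matches the example above.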

Identify the variables from an Inno module

There are multiple ways to identify which variables belong to which innovative module. In the Innovative Modules section, you can select a module, and the respective tables usually contain a list of the related published variables. However, not all variables may be listed, especially for older modules, because their variables were named according to a different scheme. We therefore recommend consulting the relevant questionnaires in our documentation. The questions in our metadata-based questionnaires contain the variable names as well as the dataset in which they are stored.

Since survey year 2022, the variables of the Inno modules have been named according to a specific scheme. This is intended to make it easy to assign variables to a module, regardless of the survey year. Each variable name starts with an “i”, which indicates that it originates from the “inno” dataset. Each new Inno module is given a consecutive number, which follows directly after the “i” in each variable name. This number starts at 101 for the first module in 2022. Within each module, the variables are numbered consecutively, and this variable number is appended after an underscore.

Examples:

  • The first variable from the first module in 2022 is called “i101_1”

  • The third variable from the sixth module in 2022 is called “i106_3”
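The naming scheme can be checked programmatically. The Python sketch below builds and parses names of the form shown in the two examples. The regular expression is inferred from those examples only, so it is an assumption that no further suffixes or variants occur; the helper names (`build_name`, `parse_name`, `InnoVar`) are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# First module number assigned in survey year 2022, as stated in the text.
FIRST_MODULE_NUMBER = 101

# Assumed pattern: "i", module number, underscore, variable number.
_NAME_PATTERN = re.compile(r"i(\d+)_(\d+)")


class InnoVar(NamedTuple):
    module_number: int
    variable_number: int


def build_name(module_number: int, variable_number: int) -> str:
    """Compose a variable name from module and variable number."""
    return f"i{module_number}_{variable_number}"


def parse_name(name: str) -> Optional[InnoVar]:
    """Split a name into module and variable number.

    Returns None if the name does not follow the assumed scheme,
    for example flag variables or variables from before 2022.
    """
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    module_number = int(match.group(1))
    if module_number < FIRST_MODULE_NUMBER:
        return None
    return InnoVar(module_number, int(match.group(2)))


if __name__ == "__main__":
    print(build_name(101, 1))
    print(parse_name("i106_3"))
```

Because module numbers are assigned consecutively and kept over the years, the module number identifies the module but does not by itself reveal the survey year of a given observation; that information comes from syear.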

These variable names are intended to remain constant over the years. If the same module is surveyed in several years and a variable does not change, the data from all of those years are stored in that one variable. If new variables are added to the module, they are named using the same module number and the next consecutive variable number.

Besides the survey data, each module has a flag variable that marks all cases that have received this module. These variable names always start with “im_” (for “innovative module”) and since 2022 they also contain the corresponding module number.

Note: As this renaming process is only partially automated, it is possible that a longitudinal question is mistakenly saved in several variables. Therefore, if in doubt, it may be helpful to consider the information from the documented questionnaires. Additionally, there is no standardized nomenclature for the “inno” variables in the survey years before 2022. Only the flag variables always start with “im_”.
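As a rough aid for a first pass over the column names, the Python sketch below sorts names into flag variables, variables following the scheme used since 2022, and everything else. It relies only on the two rules stated above (the “im_” prefix and the “i” plus module number plus underscore pattern). The example names `im_example` and `oldvar` are hypothetical placeholders, not actual SOEP-IS variables, and the exact form of the flag names after “im_” is not assumed.

```python
import re

# Assumed pattern for variables named under the scheme used since 2022.
NEW_SCHEME = re.compile(r"i\d+_\d+")


def classify(names):
    """Group column names by the naming rules described in the text."""
    groups = {"flag": [], "module_since_2022": [], "other": []}
    for name in names:
        if name.startswith("im_"):
            # Flag variables always start with "im_".
            groups["flag"].append(name)
        elif NEW_SCHEME.fullmatch(name):
            groups["module_since_2022"].append(name)
        else:
            # Identifiers and module variables from before 2022 end up
            # here; the latter must be assigned via the questionnaires.
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical column names for illustration only.
    columns = ["pid", "syear", "im_example", "i101_1", "i106_3", "oldvar"]
    for group, members in classify(columns).items():
        print(group, members)
```

The “other” group is deliberately not split further: for survey years before 2022 there is no naming rule to rely on, so those variables can only be assigned to a module with the help of the documented questionnaires.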