Main semantic rules

The raw questionnaire is written in the TOML file format. Basically, all questionnaire items are defined in an ordered tree structure where child nodes inherit the attributes of their parent (and transitively grandparent etc.) nodes.

The following main semantic rules apply:

  1. The top table level (blockname) defines a questionnaire block. The (alphabetical) order of the top-level table names determines the order of the blocks in the questionnaire:

    [blockname]
  2. Lower table level names ideally start with a 3-digit number (###) since table levels (and hence the questionnaire items below them) are always ordered according to the (alphabetical) order of their names:

    [blockname.00X_2nd_lvl.00X_3rd_lvl]
  3. The deepest table level, named item, defines an “atomic” questionnaire item (= single row in Markdown questionnaires) or a template for multiple similar “atomic” questionnaire items. It can be defined multiple times (once for each item) and must be wrapped in double brackets:

    [[blockname.00X_2nd_lvl.00X_3rd_lvl.item]]

    Items appear in the final questionnaire in the same order they are defined here. Technically, the level item is an array of tables which is represented as a list of unnamed lists in R.

  4. Table levels in between top and deepest level item can be named anything except variable_name and are intended to arrange and/or group items and set keys that hold for multiple questionnaire items hierarchically in order to avoid redundancies.

    NOTE: It is still strongly discouraged to name table levels the same as any of the keys listed below.
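
The four rules above can be sketched as a small questionnaire fragment. This is only an illustration: the block name, the intermediate level names, the variable names and the question wordings are made up, and the who value is borrowed from the examples further down in this document.

```toml
# Hypothetical block; every table and variable name here is a placeholder.
[voting]
# Set on the top level, hence inherited by every item of the block.
who = "alle"

# Intermediate level, prefixed with a 3-digit number to fix its position.
[voting.010_participation]
# Inherited by all items below this level.
topic = "Participation"

# First item (array of tables, hence the double brackets).
[[voting.010_participation.010_general.item]]
variable_name = "participation"
question = "Did you take part in the ballot?"

# Second item; it follows the first one in the final questionnaire
# and overrides the inherited `who` key.
[[voting.010_participation.010_general.item]]
variable_name = "participation_channel"
question = "How did you cast your vote?"
who = "Online-Respondenten"
```

Keys that hold for several items are set as high up in the tree as possible, and only the keys that differ are repeated on the item level.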

Supported keys

Depending on the table level, the set of possible keys includes:

  • Only regarded on top level (“block”):

    | Key | Type | Optional | Remarks |
    |---|---|---|---|
    | title | character scalar | | block title |
    | intro | character scalar | | text paragraph introducing the questionnaire items of the block |
    | prefix | integer scalar | | block numbering prefix for the first questionnaire column #; if not provided, items will be auto-numbered across all blocks without a prefix |

  • Regarded on all levels incl. individual item-level nodes:

    | Key | Type | Default Value | Optional | Iterator | Excluder | Remarks |
    |---|---|---|---|---|---|---|
    | lvl | character vector | "?" | | | | political-level loop iterator |
    | i | integer vector | NA_integer_ | | | | 2nd-level loop iterator |
    | j | integer vector | NA_integer_ | | | | 3rd-level loop iterator, i.e. “for each lvl, iterate over each i, and in turn iterate over every j” |
    | block | character scalar | *top-level block name* | | | | holds the name of the respective top-level block; filled automatically during parsing |
    | variable_name | character scalar | "???" | | | | aka “Variablenname”; mustn’t have any subkeys (see below); usually not sensible to be inherited since it has to be unique |
    | who | character scalar | "alle" | | | | aka “Wer” |
    | topic | character scalar | NA | | | | aka “Thema” |
    | question_intro_i | character scalar | NA | | | | 1st-priority part of “Frage”, only included if i == 1 and j %in% c(1, NA) |
    | question_intro_j | character scalar | NA | | | | 2nd-priority part of “Frage”, only included if j == 1 |
    | question | character scalar | NA | | | | last-priority part of “Frage”, always included |
    | question_full | character scalar | NA | | | | alternative fully formulated version of “Frage” that can refer to question |
    | question_common | character scalar | NA | | | | version of question that stays the same across ballot dates |
    | variable_label | character scalar | "???" | | | | aka “Variablenlabel” |
    | variable_label_common | character scalar | NA | | | | version of variable_label that stays the same across ballot dates |
    | response_options | character vector | NA | | | | aka “Antwortoptionen” |
    | variable_values | integer vector | NA | | | | aka “Variablenausprägungen” |
    | value_labels | character vector | NA | | | | aka “Ausprägungslabels” |
    | value_scale | character scalar | "nominal" | | | | item’s scale of measure (aka “level of measurement”); possible values include "binary", "nominal", "ordinal_ascending", "ordinal_descending", "interval" and "ratio" |
    | allow_multiple_answers | logical scalar | FALSE | | | | aka “Mehrfachnennungen” |
    | randomize_response_options | logical scalar | FALSE | | | | whether or not the response_options are displayed in randomized order to online respondents (response_options with codes 80, 90 and 99 are still excluded from randomization) |
    | is_mandatory | logical scalar | FALSE | | | | aka “Antwort obligatorisch” |
    | ballot_types | character vector | c("referendum", "election") | | | | ballot-type-specific in- and exclusion of the respective item(s) |
    | include | logical scalar | TRUE | | | | ballot-date-wide in- and exclusion of the respective item(s) |
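
To show how the block-level and item-level keys listed above fit together, here is a minimal sketch. The block, its texts, the prefix number, the variable name and the response codes are all invented for illustration; only the key names and their types are taken from the lists above.

```toml
# Hypothetical block using the three block-level keys.
[demographics]
title = "Demographics"
intro = "Finally, a few questions about yourself."
prefix = 9

# Hypothetical item using a selection of the item-level keys.
[[demographics.010_basics.010_work.item]]
variable_name = "employment_status"
topic = "Employment"
question = "What is your current employment status?"
variable_label = "Employment status"
# One entry per answer, in the same order in all three arrays.
response_options = ["employed", "not employed", "retired"]
variable_values = [1, 2, 3]
value_labels = ["employed", "not employed", "retired"]
value_scale = "nominal"
is_mandatory = true
```

Keys that are left out (here for example who, allow_multiple_answers and include) take the default values given in the list above.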

The keys regarded on item-level nodes are resolved in the following order:

  1. lvl
  2. i
  3. j
  4. All the other item-level keys.
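
Because the iterators are resolved first, all other keys of an item can refer to them. The following sketch assumes that "cantonal" and "federal" are valid lvl values and uses made-up table names and wordings; the variable_name pattern extends the voting_decision_{lvl} example used elsewhere in this document.

```toml
# Hypothetical item template: one definition, several generated items.
[[voting.020_decision.010_proposals.item]]
# Iterators (resolved first, in the order lvl, i, j).
lvl = ["cantonal", "federal"]
i = [1, 2]
# All remaining keys may interpolate the iterators.
variable_name = "voting_decision_{lvl}_{i}"
question_intro_i = "Now to the {lvl} proposals."
question = "How did you vote on {lvl} proposal no. {i}?"
variable_label = "Voting decision on {lvl} proposal {i}"
```

Under these assumptions the template would be expected to expand to one item per combination of lvl and i, with question_intro_i only included for i == 1. Interpolating the iterators into variable_name keeps the generated names unique.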

Further notes

  • All keys support glue’s string interpolation, meaning that non-character-type keys like include or variable_values can also be specified as strings (or arrays of strings), which are automatically converted to their proper type during questionnaire generation.

    More specifically, keys that hold arrays like response_options are fed to glue::glue(), which makes its powerful string interpolation syntax available, while the other keys holding scalars like include are fed to cli::pluralize(), which additionally supports a handy pluralization syntax. The default behavior of trimming all surrounding whitespace is disabled in both cases.

  • To vary the values of non-binary keys like topic or question for different cantons and/or ballot dates directly, without relying on embedded R code, you can define

    • single-canton subkeys named by the lowercase English canton name (e.g. aargau) applying to a specific canton,
    • single-date subkeys named YYYYMMDD (the respective ballot date without any -) applying to a specific ballot date,
    • begin-end interval subkeys named YYYYMMDD_YYYYMMDD applying to all ballot dates that fall into the specified interval, or
    • default applying to any canton or ballot date for which no more specific subkey exists.

    Interval subkeys mustn’t overlap. Single-date subkeys have precedence over interval subkeys and canton subkeys have precedence over any date subkeys.

    To vary binary keys like allow_multiple_answers or include directly, you can define

    • begin-end interval subkeys as described above, or alternatively
    • subkeys named true (meaning inclusion) or false (meaning exclusion; has priority over true in case of ambiguity) containing an array of cantons or dates (YYYY-MM-DD), as well as
    • default (applying to any cantons or ballot dates not included in the true or false subkeys).

    The true/false subkeys have precedence over begin-end interval subkeys. Subkeys can be combined by nesting them.

    To vary keys for different ballot types, just define subkeys named referendum or election.

    Examples:

    topic.default = "bla"
    topic.aargau = "oops"
    topic.zurich = "upsala"
    who.20180923 = "alle"
    who.20181125_20201018 = "Online-Respondenten"
    who.20181125_20201018.zurich = "Print-Respondenten"
    include.default = true
    include.false = [ 2018-09-23 ]
    question.default.election = "lalala"
    question.default.referendum = "nenene"

    NOTES:

    • variable_name is the only key that mustn’t have any subkeys.
    • If variable_label_common is not explicitly defined, it falls back to variable_label.default.
    • If question_common is not explicitly defined, it falls back to a) question_full.default, b) question.default, or c) question_full.
  • The keys lvl, i and j are interpreted as iterators which you can refer to via string interpolation (e.g. voting_decision_{lvl}), so you have to define similar questionnaire items only once. If any of them evaluates to NULL or an empty vector, the respective item is automatically excluded, i.e. it’s not necessary to explicitly set include (sub)keys in such a case.

  • You can even cross-reference keys from non-parent nodes using qstnr_item_val(). But be careful not to create infinite loops via circular references.

  • To explicitly unset a key, just assign a string containing NA of the correct type wrapped in curly braces ("{NA_character_}" for values of type string, "{NA_integer_}" for values of type integer etc.).

  • A who constraint notice is automatically added to the end of variable_label and variable_label_common if possible (e.g. "blabla (only non-voters)"). Because variable_label_common by definition mustn’t vary over time, it must be explicitly specified including such a who constraint notice if who varies over time.

  • This questionnaire is complemented by a supplemental TOML file YYYY-MM-DD.toml for each ballot date that holds additional date-specific metadata.
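
Two of the notes above, varying binary keys and explicitly unsetting a key, can be sketched as follows. The table names, the variable name and the topic are placeholders, and combining default with an interval subkey on a binary key is an assumption based on the description above.

```toml
# Hypothetical parent level that sets a topic for all its items.
[media.010_usage]
topic = "Media usage"

[[media.010_usage.010_sources.item]]
variable_name = "information_sources"
# Unset the inherited topic for this item only.
topic = "{NA_character_}"
# Include the item only in the listed canton.
include.default = false
include.true = ["aargau"]
# Allow multiple answers only for ballot dates within the interval.
allow_multiple_answers.default = false
allow_multiple_answers.20181125_20201018 = true
```

The true subkey lists the cantons the item is included for, and default covers everything not listed.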

Supplemental assumptions to resolve glue/cli string interpolation

  • The following R packages are loaded and attached.

  • The following R objects are present in the evaluation environment:

    • ballot_date (character scalar): The ballot date the questionnaire is to be generated for (in the format YYYY-MM-DD).

    • canton (character scalar): The canton the questionnaire is to be generated for (all lowercase).
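
As a sketch of how these two objects might be used inside interpolated keys, consider the following item. All names and wordings are invented, and only base R functions are used inside the braces, since the set of attached packages is not listed here.

```toml
# Hypothetical item referring to the evaluation environment.
[[intro.010_welcome.010_text.item]]
variable_name = "welcome_text"
# `ballot_date` is a character scalar in the format YYYY-MM-DD, so it is
# converted to a date before being reformatted.
question = "This survey concerns the ballot of {format(as.Date(ballot_date), '%d.%m.%Y')}."
# A logical key given as a string; it is converted to its proper type
# during questionnaire generation.
include = "{canton == 'aargau'}"
```

Single quotes are used inside the R expressions so that they don't clash with the double quotes delimiting the TOML strings.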

Debugging

The R function fokus:::gen_qstnr_tibble(), which is designed to turn the raw questionnaire into a tibble containing the questionnaire data for a specific ballot date and canton, prints helpful progress information when its verbose argument is set to TRUE (default is FALSE).

For the questionnaire @ 2018-09-23 in Aargau, this looks as follows:

If the raw questionnaire contains a mistake that breaks questionnaire generation, for example a syntax error in embedded R code, the progress output stops immediately, which makes it easy to locate the exact source position of the mistake.