FOKUS questionnaire schema

Main semantic rules

The raw questionnaire is written in the TOML file format. All questionnaire items are defined in an ordered tree structure in which child nodes inherit the attributes of their parent nodes (and, transitively, those of their grandparent nodes etc.).

The following main semantic rules apply:

The top table level (blockname) defines a questionnaire block¹. The (alphabetical) order of the top-level tables determines the order of the blocks in the questionnaire:
```
[blockname]
```
Lower table level names ideally start with a 3-digit number (###) since table levels (and hence the questionnaire items below them) are always ordered according to the (alphabetical) order of their names:
```
[blockname.00X_2nd_lvl.00X_3rd_lvl]
```
The deepest² table level, named item, defines an “atomic” questionnaire item (= single row in Markdown questionnaires) or a template for multiple similar “atomic” questionnaire items. It can be defined multiple times (once for each item) and must be wrapped in double brackets:
```
[[blockname.00X_2nd_lvl.00X_3rd_lvl.item]]
```
Items appear in the final questionnaire in the same order they are defined here. Technically, the level item is an array of tables which is represented as a list of unnamed lists in R.
Table levels between the top level and the deepest level item can be named anything except variable_name. They are intended to arrange and/or group items and to set keys that hold for multiple questionnaire items hierarchically, which avoids redundancies.

NOTE: It is still strongly discouraged to name table levels the same as any of the keys listed below.
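
The rules above can be put together in a minimal sketch of a single block. All table names, key values and texts below are hypothetical and only serve to show the nesting and the inheritance; the keys themselves are described under “Supported keys”.

```toml
# Hypothetical questionnaire block; all names and texts are made up.
[political_interest]
title = "Political interest"
intro = "The following questions are about your interest in politics."

[political_interest.010_general]
who = "alle"                        # inherited by everything below this level

[political_interest.010_general.010_core]
topic = "Interest"                  # inherited by both items below

# First item of the array of tables; appears first in the questionnaire.
[[political_interest.010_general.010_core.item]]
variable_name = "interest_general"
variable_label = "General political interest"
question = "How interested are you in politics in general?"

# Second item; inherits `who` and `topic` from its parent tables.
[[political_interest.010_general.010_core.item]]
variable_name = "interest_cantonal"
variable_label = "Interest in cantonal politics"
question = "How interested are you in cantonal politics?"
```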

Supported keys

Depending on the table level, the set of possible keys includes:

Only regarded on top level (“block”):

Key	Type	Optional	Remarks
`title`	character scalar	❌	block title
`intro`	character scalar	✅	text paragraph introducing the questionnaire items of the block
`prefix`	integer scalar	✅	block numbering prefix for the first questionnaire column `#`; if not provided, items will be auto-numbered across all blocks without a `prefix`

Regarded on all levels incl. individual item-level nodes:

Key	Type	Default Value	Optional	Iterator	Excluder	Remarks
`lvl`	character vector	`"?"`	❌	✅	❌	political-level loop iterator
`i`	integer vector	`NA_integer_`	❌	✅	❌	2nd-level loop iterator
`j`	integer vector	`NA_integer_`	❌	✅	❌	3rd-level loop iterator, i.e. “for each `lvl`, iterate over each `i`, and in turn iterate over every `j`”
`block`	character scalar	`top-level block name`	❌	❌	❌	holds the name of the respective top-level block; filled automatically during parsing
`variable_name`	character scalar	`"???"`	❌	❌	❌	aka “Variablenname”; mustn’t have any subkeys (see below); usually not sensible to be inherited since it has to be unique
`who`	character scalar	`"alle"`	❌	❌	❌	aka “Wer”
`topic`	character scalar	`NA`	✅	❌	❌	aka “Thema”
`question_intro_i`	character scalar	`NA`	✅	❌	❌	1st-priority part of “Frage”, only included if `i == 1` and `j %in% c(1, NA)`
`question_intro_j`	character scalar	`NA`	✅	❌	❌	2nd-priority part of “Frage”, only included if `j == 1`
`question`	character scalar	`NA`	✅	❌	❌	last-priority part of “Frage”, always included
`question_full`	character scalar	`NA`	✅	❌	❌	alternative fully formulated version of “Frage” that can refer to `question`
`question_common`	character scalar	`NA`	✅	❌	❌	version of `question` that stays the same across ballot dates
`variable_label`	character scalar	`"???"`	❌	❌	❌	aka “Variablenlabel”
`variable_label_common`	character scalar	`NA`	✅	❌	❌	version of `variable_label` that stays the same across ballot dates
`response_options`	character vector	`NA`	✅	❌	❌	aka “Antwortoptionen”
`variable_values`	integer vector	`NA`	✅	❌	❌	aka “Variablenausprägungen”
`value_labels`	character vector	`NA`	✅	❌	❌	aka “Ausprägungslabels”
`value_scale`	character scalar	`"nominal"`	❌	❌	❌	item’s scale of measure (aka “level of measurement”); possible values include `"binary"`, `"nominal"`, `"ordinal_ascending"`, `"ordinal_descending"`, `"interval"` and `"ratio"`
`allow_multiple_answers`	logical scalar	`FALSE`	❌	❌	❌	aka “Mehrfachnennungen”
`randomize_response_options`	logical scalar	`FALSE`	❌	❌	❌	whether or not the `response_options` are displayed in randomized order to online respondents (`response_options` with code `80`, `90` and `99` are still excluded from randomization)
`is_mandatory`	logical scalar	`FALSE`	❌	❌	❌	aka “Antwort obligatorisch”
`ballot_types`	character vector	`c("referendum", "election")`	❌	❌	✅	ballot-type-specific in- and exclusion of the respective item(s)
`include`	logical scalar	`TRUE`	❌	❌	✅	ballot-date-wide in- and exclusion of the respective item(s)
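
As a rough illustration of how several of these keys could be combined on a single item, consider the following sketch. The item, its texts and its codes are hypothetical. It also assumes that response_options, variable_values and value_labels are parallel vectors of equal length, which the table above does not state explicitly.

```toml
# Hypothetical item; names, texts and codes are made up.
[[example_block.010_group.010_subgroup.item]]
variable_name = "trust_government"
variable_label = "Trust in the cantonal government"
question = "How much do you trust the cantonal government?"
# Assumption: the three vectors below correspond element by element.
response_options = [ "no trust at all", "little trust", "some trust", "full trust" ]
variable_values = [ 1, 2, 3, 4 ]
value_labels = [ "none", "little", "some", "full" ]
value_scale = "ordinal_ascending"
is_mandatory = true
ballot_types = [ "referendum" ]     # presumably restricts the item to referendum ballots
```

Keys that are left out (here e.g. who, allow_multiple_answers and include) take their inherited or default values.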

The keys regarded on item-level nodes are resolved in the following order:

1. lvl
2. i
3. j
4. All the other item-level keys³.

Further notes

All keys support glue’s string interpolation, meaning that non-character-type keys like include or variable_values can also be specified as string (arrays) which will automatically be converted to their proper type during questionnaire generation.

More specifically, keys that hold arrays like response_options are fed to glue::glue(), which allows you to use its powerful string interpolation syntax, while the other keys holding scalars like include are fed to cli::pluralize(), which additionally supports a handy pluralization syntax. The default behavior of trimming all surrounding whitespace is disabled in both cases.
To vary the values of non-binary keys like topic, question etc. for different cantons and/or ballot dates directly, without relying on embedded R code, you can define
- single-canton subkeys named by the lowercase English canton name (e.g. aargau) applying to a specific canton,
- single-date subkeys named YYYYMMDD (the respective ballot date without any -) applying to a specific ballot date,
- begin-end interval subkeys named YYYYMMDD_YYYYMMDD applying to all ballot dates that fall into the specified interval, or
- default applying to any canton or ballot date for which no more specific subkey exists.
Interval subkeys mustn’t overlap. Single-date subkeys have precedence over interval subkeys and canton subkeys have precedence over any date subkeys.
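
The precedence rules just stated can be traced with a small sketch. The values are hypothetical placeholders, and the comments merely apply the rules as described above; they are not output of the actual resolution code.

```toml
# Hypothetical values, for illustration only.
topic.default           = "general wording"
topic.20180923          = "wording for 2018-09-23 only"
topic.20180101_20181231 = "wording for all of 2018"
topic.aargau            = "wording for Aargau"

# Expected resolution according to the rules above:
#   Aargau, any ballot date -> topic.aargau             (canton beats any date subkey)
#   Zurich @ 2018-09-23     -> topic.20180923           (single date beats interval)
#   Zurich @ 2018-11-25     -> topic.20180101_20181231  (date falls into the interval)
#   Zurich @ 2019-02-10     -> topic.default            (no more specific subkey exists)
```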

To vary binary keys like allow_multiple_answers or include directly, you can define
- begin-end interval subkeys as described above, or alternatively
- subkeys named true (meaning inclusion) or false (meaning exclusion; has priority over true in case of ambiguity) containing an array of cantons or dates (YYYY-MM-DD), as well as
- default (applying to any cantons or ballot dates not included in the true or false subkeys).
The true/false subkeys have precedence over begin-end interval subkeys. Subkeys can be combined by nesting them.
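
A minimal sketch of the binary variants, using hypothetical dates and cantons, might look like this. The comments restate the precedence described above.

```toml
# Hypothetical values, for illustration only.
include.default           = true
include.20190101_20191231 = false            # excluded on every ballot date in 2019 ...
include.true              = [ 2019-05-19 ]   # ... except this one (true/false beat intervals)

allow_multiple_answers.default = false
allow_multiple_answers.true    = [ "aargau" ]   # multiple answers only in Aargau
```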

To vary keys for different ballot types, just define subkeys named referendum or election.

Examples:
```
topic.default = "bla"
topic.aargau = "oops"
topic.zurich = "upsala"
who.20180923 = "alle"
who.20181125_20201018 = "Online-Respondenten"
who.20181125_20201018.zurich = "Print-Respondenten"
include.default = true
include.false = [ 2018-09-23 ]
question.default.election = "lalala"
question.default.referendum = "nenene"
```
NOTES:
- variable_name is the only key that mustn’t have any subkeys.
- If variable_label_common is not explicitly defined, it falls back to variable_label.default.
- If question_common is not explicitly defined, it falls back to a) question_full.default, b) question.default, or c) question_full.
The keys lvl, i and j are interpreted as iterators which you can refer to via string interpolation (e.g. voting_decision_{lvl}), so you have to define similar questionnaire items only once. If any of them evaluates to NULL or an empty vector, the respective item is automatically excluded, i.e. it’s not necessary to explicitly set include (sub)keys in such cases.
You can even cross-reference keys from non-parent nodes using qstnr_item_val(), but be careful not to create infinite loops via circular references.
To explicitly unset a key, just assign a string containing NA of the correct type wrapped in curly braces ("{NA_character_}" for values of type string, "{NA_integer_}" for values of type integer etc.).
A who constraint notice is automatically added to the end of variable_label and variable_label_common if possible (e.g. "blabla (only non-voters)"). Because variable_label_common by definition mustn’t vary over time, it must be explicitly specified including such a who constraint notice if who varies over time.
This questionnaire is complemented by a supplemental TOML file YYYY-MM-DD.toml for each ballot date that holds additional date-specific metadata.
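
The iterator and unsetting notes above can be sketched with one template item. The item, its texts and the lvl values "cantonal" and "federal" are assumptions made for illustration; the set of valid lvl values is not specified here.

```toml
# Hypothetical item; names, texts and iterator values are made up.
[[example_block.010_group.010_subgroup.item]]
lvl = [ "cantonal", "federal" ]                 # template is expanded once per element
variable_name = "voting_decision_{lvl}"         # e.g. voting_decision_cantonal
variable_label = "Voting decision ({lvl} level)"
question = "How did you vote at the {lvl} level?"
topic = "{NA_character_}"                       # unsets a topic inherited from a parent table
```

Because variable_name has to be unique, it is the key that most obviously needs to refer to the iterator when a template expands to several items.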

Supplemental assumptions to resolve glue/cli string interpolation

The following R packages are loaded and attached:
- fokus
- magrittr
The following R objects are present in the evaluation environment:
- ballot_date (character scalar): The ballot date the questionnaire is to be generated for (in the format YYYY-MM-DD).
- canton (character scalar): The canton the questionnaire is to be generated for (all lowercase).
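
As a hedged sketch of how these two objects could be used in embedded R code, consider the fragment below. The texts are hypothetical, and the include line relies on the automatic type conversion of strings described under “Further notes”.

```toml
# Hypothetical keys; only `canton` and `ballot_date` come from the list above.
question = "Did you take part in the vote of {ballot_date}?"   # inserts e.g. 2018-09-23
include = "{canton == 'aargau'}"    # string that is converted to a logical after interpolation
```

For a simple per-canton switch like this one, the canton subkeys or true/false subkeys are the more direct option; embedded R code is mainly useful for conditions that subkeys cannot express.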

Debugging

The R function fokus:::gen_qstnr_tibble(), which is designed to turn the raw questionnaire into a tibble containing the questionnaire data for a specific ballot date and canton, prints helpful progress information when its verbose argument is set to TRUE (default is FALSE).

For the questionnaire @ 2018-09-23 in Aargau, this looks as follows:

[Screen recording: Verbose FOKUS questionnaire generation (Aargau @ 2018-09-23)]

If the raw questionnaire contains a mistake that breaks questionnaire generation, such as a syntax error in embedded R code, the progress output stops immediately, which makes it easy to locate the exact source position of the mistake.

Structure and semantic rules of the raw TOML questionnaire

2024-03-01
