FOKUS questionnaire schema
Structure and semantic rules of the raw TOML questionnaire
2024-03-01
Source: vignettes/raw_qstnr_schema.Rmd

Main semantic rules
The raw questionnaire is written in the TOML file format. All questionnaire items are defined in an ordered tree structure in which child nodes inherit the attributes of their parent (and, transitively, grandparent etc.) nodes.
The following main semantic rules apply:
- The top table level (`[blockname]`) defines a questionnaire block[1]. The (alphabetical) order of the table levels determines the order of the blocks in the questionnaire.
- Lower table level names should ideally start with a 3-digit number (`###`). Table levels (and hence the questionnaire items below them) are always ordered alphabetically by name.
- The deepest[2] table level, named `item`, defines an "atomic" questionnaire item (= a single row in Markdown questionnaires) or a template for multiple similar "atomic" questionnaire items. It can be defined multiple times (once per item) and must be wrapped in double brackets (`[[<parent levels>.item]]`). Items appear in the final questionnaire in the same order in which they are defined. Technically, the level `item` is an array of tables, which is represented as a list of unnamed lists in R.
- Table levels between the top level and the deepest level `item` can be named anything except `variable_name`. They are intended to arrange and/or group items. They also set keys that apply to multiple questionnaire items hierarchically, in order to avoid redundancy. NOTE: It is still strongly discouraged to name table levels the same as any of the keys listed below.
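The following TOML sketch illustrates these structural rules. All block, level and item names and all key values are hypothetical. Only the nesting pattern follows the rules above: top-level block tables, numbered intermediate levels and double-bracketed `item` tables.

```toml
# Hypothetical blocks; their order follows the alphabetical order of these names
[a_intro]
title = "Introduction"

[b_participation]
title = "Participation"

# Numbered lower level; `topic` set here is inherited by both items below
[b_participation.001_general]
topic = "Participation in general"

[[b_participation.001_general.item]]
variable_name = "participation"

[[b_participation.001_general.item]]
variable_name = "participation_reason"

# Sorts after `001_general`, so its item follows the ones above
[b_participation.002_channel]
topic = "Participation channel"

[[b_participation.002_channel.item]]
variable_name = "participation_channel"
```

Each `[[...item]]` header appends one more table to the `item` array of its parent level. That is why the two `participation*` items keep their definition order.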
Supported keys
Depending on the table level, the set of possible keys includes:
- Only regarded on the top level ("block"):

  | Key | Type | Optional | Remarks |
  |---|---|---|---|
  | `title` | character scalar | ❌ | block title |
  | `intro` | character scalar | ✅ | text paragraph introducing the questionnaire items of the block |
  | `prefix` | integer scalar | ✅ | block numbering prefix for the first questionnaire column `#`; if not provided, items will be auto-numbered across all blocks without a prefix |

- Regarded on all levels, incl. individual `item`-level nodes:

  | Key | Type | Default value | Optional | Iterator | Excluder | Remarks |
  |---|---|---|---|---|---|---|
  | `lvl` | character vector | `"?"` | ❌ | ✅ | ❌ | political-level loop iterator |
  | `i` | integer vector | `NA_integer_` | ❌ | ✅ | ❌ | 2nd-level loop iterator |
  | `j` | integer vector | `NA_integer_` | ❌ | ✅ | ❌ | 3rd-level loop iterator, i.e. "for each `lvl`, iterate over each `i`, and in turn iterate over every `j`" |
  | `block` | character scalar | *top-level block name* | ❌ | ❌ | ❌ | holds the name of the respective top-level block; filled automatically during parsing |
  | `variable_name` | character scalar | `"???"` | ❌ | ❌ | ❌ | aka "Variablenname"; mustn't have any subkeys (see below); usually not sensible to be inherited since it has to be unique |
  | `who` | character scalar | `"alle"` | ❌ | ❌ | ❌ | aka "Wer" |
  | `topic` | character scalar | `NA` | ✅ | ❌ | ❌ | aka "Thema" |
  | `question_intro_i` | character scalar | `NA` | ✅ | ❌ | ❌ | 1st-priority part of "Frage", only included if `i == 1` and `j %in% c(1, NA)` |
  | `question_intro_j` | character scalar | `NA` | ✅ | ❌ | ❌ | 2nd-priority part of "Frage", only included if `j == 1` |
  | `question` | character scalar | `NA` | ✅ | ❌ | ❌ | last-priority part of "Frage", always included |
  | `question_full` | character scalar | `NA` | ✅ | ❌ | ❌ | alternative fully formulated version of "Frage" that can refer to `question` |
  | `question_common` | character scalar | `NA` | ✅ | ❌ | ❌ | version of `question` that stays the same across ballot dates |
  | `variable_label` | character scalar | `"???"` | ❌ | ❌ | ❌ | aka "Variablenlabel" |
  | `variable_label_common` | character scalar | `NA` | ✅ | ❌ | ❌ | version of `variable_label` that stays the same across ballot dates |
  | `response_options` | character vector | `NA` | ✅ | ❌ | ❌ | aka "Antwortoptionen" |
  | `variable_values` | integer vector | `NA` | ✅ | ❌ | ❌ | aka "Variablenausprägungen" |
  | `value_labels` | character vector | `NA` | ✅ | ❌ | ❌ | aka "Ausprägungslabels" |
  | `value_scale` | character scalar | `"nominal"` | ❌ | ❌ | ❌ | item's scale of measure (aka "level of measurement"); possible values include `"binary"`, `"nominal"`, `"ordinal_ascending"`, `"ordinal_descending"`, `"interval"` and `"ratio"` |
  | `allow_multiple_answers` | logical scalar | `FALSE` | ❌ | ❌ | ❌ | aka "Mehrfachnennungen" |
  | `randomize_response_options` | logical scalar | `FALSE` | ❌ | ❌ | ❌ | whether or not the `response_options` are displayed in randomized order to online respondents (response options with code `80`, `90` and `99` are still excluded from randomization) |
  | `is_mandatory` | logical scalar | `FALSE` | ❌ | ❌ | ❌ | aka "Antwort obligatorisch" |
  | `ballot_types` | character vector | `c("referendum", "election")` | ❌ | ❌ | ✅ | ballot-type-specific in- and exclusion of the respective item(s) |
  | `include` | logical scalar | `TRUE` | ❌ | ❌ | ✅ | ballot-date-wide in- and exclusion of the respective item(s) |
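The sketch below shows how some of the keys listed above might be set. The block name, wordings and codes are made up for illustration. Keys set on the intermediate level are inherited by every item below it, as described in the main semantic rules.

```toml
# Hypothetical block using the block-only keys
[c_decision]
title = "Voting decision"
intro = "The following questions concern your voting decision."
prefix = 3

# Keys set on this level are inherited by all items below it
[c_decision.001_reasons]
ballot_types = ["referendum"]
is_mandatory = true

[[c_decision.001_reasons.item]]
variable_name = "decision_reasons"
variable_label = "reasons for voting decision"
question = "Which of the following reasons influenced your decision?"
response_options = ["Reason A", "Reason B", "Other"]
variable_values = [1, 2, 3]
value_labels = ["reason a", "reason b", "other"]
value_scale = "nominal"
allow_multiple_answers = true
randomize_response_options = true
```

`ballot_types` is an excluder key. This item would therefore presumably only appear in questionnaires for referendums.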
The keys regarded on item-level nodes are resolved in the following order:

- `lvl`
- `i`
- `j`
- All the other item-level keys[3].
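The iterators `lvl`, `i` and `j` are resolved before all other item-level keys, so the remaining keys can interpolate them. The political-level values `federal`/`cantonal`, the `i` values and all names below are assumptions for illustration only.

```toml
# Hypothetical item template; block and level names are made up
[[c_decision.002_decisions.item]]
# Assumed political-level values, for illustration only
lvl = ["federal", "cantonal"]
i = [1, 2]
# Resolved after the iterators, so these keys can interpolate `lvl` and `i`
variable_name = "voting_decision_{lvl}_{i}"
variable_label = "voting decision on {lvl} proposal {i}"
question = "How did you vote on {lvl} proposal {i}?"
```

The key table describes the iteration scheme as "for each `lvl`, iterate over each `i`". Under that scheme, this template would presumably expand into four items: `voting_decision_federal_1`, `voting_decision_federal_2`, `voting_decision_cantonal_1` and `voting_decision_cantonal_2`.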
Further notes
- All keys support glue's string interpolation. Non-character-type keys like `include` or `variable_values` can therefore also be specified as strings (or string arrays), which are automatically converted to their proper type during questionnaire generation.

  More specifically, keys that hold arrays like `response_options` are fed to `glue::glue()`, which provides its powerful string interpolation syntax. The other keys, which hold scalars like `include`, are fed to `cli::pluralize()`, which additionally supports a handy pluralization syntax. In both cases, the default behavior of trimming all surrounding whitespace is disabled.

- To vary the wording of non-binary keys like `topic`, `question` etc. for different cantons and/or ballot dates directly, without relying on embedded R code, you can define
  - single-canton subkeys named by the lowercase English canton name (e.g. `aargau`), applying to a specific canton,
  - single-date subkeys named `YYYYMMDD` (the respective ballot date without any `-`), applying to a specific ballot date,
  - begin-end interval subkeys named `YYYYMMDD_YYYYMMDD`, applying to all ballot dates that fall into the specified interval, or
  - `default`, applying to any canton or ballot date for which no more specific subkey exists.

  Interval subkeys mustn't overlap. Single-date subkeys take precedence over interval subkeys, and canton subkeys take precedence over any date subkeys.

  To vary binary keys like `allow_multiple_answers` or `include` directly, you can define
  - begin-end interval subkeys as described above, or alternatively
  - subkeys named `true` (meaning inclusion) or `false` (meaning exclusion; takes priority over `true` in case of ambiguity), each containing an array of cantons or dates (`YYYY-MM-DD`), as well as
  - `default` (applying to any cantons or ballot dates not included in the `true` or `false` subkeys).

  The `true`/`false` subkeys take precedence over begin-end interval subkeys. Subkeys can be combined by nesting them.

  To vary keys for different ballot types, just define subkeys named `referendum` or `election`.

  Examples:

      topic.default = "bla"
      topic.aargau = "oops"
      topic.zurich = "upsala"

      who.20180923 = "alle"
      who.20181125_20201018.default = "Online-Respondenten"
      who.20181125_20201018.zurich = "Print-Respondenten"

      include.default = true
      include.false = [ 2018-09-23 ]

      question.default.election = "lalala"
      question.default.referendum = "nenene"

  (In TOML, a key that holds a plain value cannot also hold subkeys. The interval subkey therefore uses a nested `default` next to the nested canton subkey.)

  NOTES:
  - `variable_name` is the only key that mustn't have any subkeys.
  - If `variable_label_common` is not explicitly defined, it falls back to `variable_label.default`.
  - If `question_common` is not explicitly defined, it falls back to a) `question_full.default`, b) `question.default`, or c) `question_full`.
- The keys `lvl`, `i` and `j` are interpreted as iterators, which you can refer to via string interpolation (e.g. `voting_decision_{lvl}`). Similar questionnaire items therefore only have to be defined once. If any of them evaluates to `NULL` or an empty vector, the respective item is automatically excluded, so you don't need to set `include` (sub)keys explicitly in that case.

- You can even cross-reference keys from non-parent nodes using `qstnr_item_val()`. Be careful not to create infinite loops via circular references.

- To explicitly unset a key, assign a string containing `NA` of the correct type wrapped in curly braces (`"{NA_character_}"` for values of type string, `"{NA_integer_}"` for values of type integer etc.).

- A `who` constraint notice is automatically appended to `variable_label` and `variable_label_common` if possible (e.g. `"blabla (only non-voters)"`). By definition, `variable_label_common` mustn't vary over time. If `who` varies over time, `variable_label_common` must therefore be specified explicitly, including such a `who` constraint notice.

- This questionnaire is complemented by a supplemental TOML file `YYYY-MM-DD.toml` for each ballot date, which holds additional date-specific metadata.
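This hypothetical fragment illustrates two of the notes above. First, an integer key is given as a string array and converted to integers during questionnaire generation. Second, an inherited key is explicitly unset for a single item. The table and variable names are made up.

```toml
# Hypothetical intermediate level; `topic` is inherited by all items below it
[c_decision.003_misc]
topic = "Miscellaneous"

[[c_decision.003_misc.item]]
variable_name = "misc_rating"
response_options = ["Low", "High"]
# Integer key specified as a string array; converted to its proper type later
variable_values = ["1", "2"]
value_labels = ["low", "high"]
# Unset the inherited `topic` for this item only (NA of type character)
topic = "{NA_character_}"
```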
Supplemental assumptions to resolve glue/cli string interpolation
- The following R packages are loaded and attached.
- The following R objects are present in the evaluation environment:
  - `ballot_date` (character scalar): The ballot date the questionnaire is to be generated for (in the format `YYYY-MM-DD`).
  - `canton` (character scalar): The canton the questionnaire is to be generated for (all lowercase).
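For example, the objects `ballot_date` and `canton` can be referenced in interpolated key values. In this hypothetical fragment, `include` is a scalar key and is therefore processed by `cli::pluralize()`. It evaluates to the string `TRUE` or `FALSE`, which, per the rules above, is then converted to a logical. The canton selection and the wording are assumptions.

```toml
# Fragment; assumes the block `c_decision` is defined elsewhere
[[c_decision.004_local.item]]
variable_name = "local_information"
# `canton` is the all-lowercase canton name, e.g. "aargau"
include = "{canton %in% c('aargau', 'zurich')}"
# `ballot_date` is a character scalar formatted as YYYY-MM-DD
question = "How well informed were you about the ballot of {ballot_date}?"
```

For a plain canton or date restriction like this one, the `true`/`false`/`default` subkeys described under the further notes achieve the same result without embedded R code.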
Debugging
The R function `fokus:::gen_qstnr_tibble()` turns the raw questionnaire into a tibble containing the questionnaire data for a specific ballot date and canton. It prints helpful progress information when its `verbose` argument is set to `TRUE` (default is `FALSE`).
For the questionnaire of 2018-09-23 in Aargau, this looks as follows:
If the raw questionnaire contains a mistake that breaks questionnaire generation, for example a syntax error in embedded R code, the progress output stops immediately. This makes it easy to locate the exact source position of the mistake.