FOKUS questionnaire schema
Structure and semantic rules of the raw TOML questionnaire
2024-03-01
Source: vignettes/raw_qstnr_schema.Rmd
Main semantic rules
The raw questionnaire is written in the TOML file format. All questionnaire items are defined in an ordered tree structure in which child nodes inherit the attributes of their parent nodes (and, transitively, of their grandparent nodes etc.).
The following main semantic rules apply:
- The top table level (`blockname`) defines a questionnaire block. The (alphabetical) order of the top-level tables corresponds to the order of the blocks in the questionnaire.
- Lower table level names ideally start with a 3-digit number (`###`), since table levels (and hence the questionnaire items below them) are always ordered according to the (alphabetical) order of their names.
- The deepest table level, named `item`, defines an “atomic” questionnaire item (= a single row in Markdown questionnaires) or a template for multiple similar “atomic” questionnaire items. It can be defined multiple times (once for each item) and must be wrapped in double brackets. Items appear in the final questionnaire in the same order they are defined here. Technically, the level `item` is an array of tables, which is represented as a list of unnamed lists in R.
- Table levels in between the top level and the deepest level `item` can be named anything except `variable_name`. They are intended to arrange and/or group items and to set keys that hold for multiple questionnaire items hierarchically in order to avoid redundancies.

  NOTE: It is still strongly discouraged to name table levels the same as any of the keys listed below.
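As a rough illustration of these rules, a block could be laid out as in the following minimal sketch. All table names, values and wordings are hypothetical and not taken from the actual FOKUS questionnaire; only the overall shape follows the rules above.

```toml
# Top table level: defines a questionnaire block (hypothetical name)
[political_interest]
title = "Political interest"

# Intermediate level: the 3-digit prefix fixes the ordering; keys set here
# are inherited by all items below
[political_interest.010_general]
topic = "Interest in politics"
who = "alle"

# Deepest level: one array-of-tables entry per "atomic" item, in display order
[[political_interest.010_general.item]]
variable_name = "interest_general"
variable_label = "General interest in politics"
question = "How interested are you in politics in general?"

[[political_interest.010_general.item]]
variable_name = "interest_cantonal"
variable_label = "Interest in cantonal politics"
question = "How interested are you in cantonal politics?"

# A second group within the same block, sorted after 010_general
[political_interest.020_media]
topic = "Media use"

[[political_interest.020_media.item]]
variable_name = "media_newspaper"
variable_label = "Newspaper use"
question = "How often do you read a newspaper?"
```

In this sketch, both items under `010_general` inherit `topic` and `who` from their parent table, so only the keys that differ per item have to be repeated.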
Supported keys
Depending on the table level, the set of possible keys includes:
- Only regarded on top level (“block”):

  | Key | Type | Optional | Remarks |
  |---|---|---|---|
  | `title` | character scalar | ❌ | block title |
  | `intro` | character scalar | ✅ | text paragraph introducing the questionnaire items of the block |
  | `prefix` | integer scalar | ✅ | block numbering prefix for the first questionnaire column `#`; if not provided, items will be auto-numbered across all blocks without a prefix |

- Regarded on all levels incl. individual `item`-level nodes:

  | Key | Type | Default Value | Optional | Iterator | Excluder | Remarks |
  |---|---|---|---|---|---|---|
  | `lvl` | character vector | `"?"` | ❌ | ✅ | ❌ | political-level loop iterator |
  | `i` | integer vector | `NA_integer_` | ❌ | ✅ | ❌ | 2nd-level loop iterator |
  | `j` | integer vector | `NA_integer_` | ❌ | ✅ | ❌ | 3rd-level loop iterator, i.e. “for each `lvl`, iterate over each `i`, and in turn iterate over every `j`” |
  | `block` | character scalar | *top-level block name* | ❌ | ❌ | ❌ | holds the name of the respective top-level block; filled automatically during parsing |
  | `variable_name` | character scalar | `"???"` | ❌ | ❌ | ❌ | aka “Variablenname”; mustn’t have any subkeys (see below); usually not sensible to be inherited since it has to be unique |
  | `who` | character scalar | `"alle"` | ❌ | ❌ | ❌ | aka “Wer” |
  | `topic` | character scalar | `NA` | ✅ | ❌ | ❌ | aka “Thema” |
  | `question_intro_i` | character scalar | `NA` | ✅ | ❌ | ❌ | 1st-priority part of “Frage”, only included if `i == 1` and `j %in% c(1, NA)` |
  | `question_intro_j` | character scalar | `NA` | ✅ | ❌ | ❌ | 2nd-priority part of “Frage”, only included if `j == 1` |
  | `question` | character scalar | `NA` | ✅ | ❌ | ❌ | last-priority part of “Frage”, always included |
  | `question_full` | character scalar | `NA` | ✅ | ❌ | ❌ | alternative fully formulated version of “Frage” that can refer to `question` |
  | `question_common` | character scalar | `NA` | ✅ | ❌ | ❌ | version of `question` that stays the same across ballot dates |
  | `variable_label` | character scalar | `"???"` | ❌ | ❌ | ❌ | aka “Variablenlabel” |
  | `variable_label_common` | character scalar | `NA` | ✅ | ❌ | ❌ | version of `variable_label` that stays the same across ballot dates |
  | `response_options` | character vector | `NA` | ✅ | ❌ | ❌ | aka “Antwortoptionen” |
  | `variable_values` | integer vector | `NA` | ✅ | ❌ | ❌ | aka “Variablenausprägungen” |
  | `value_labels` | character vector | `NA` | ✅ | ❌ | ❌ | aka “Ausprägungslabels” |
  | `value_scale` | character scalar | `"nominal"` | ❌ | ❌ | ❌ | item’s scale of measure (aka “level of measurement”); possible values include `"binary"`, `"nominal"`, `"ordinal_ascending"`, `"ordinal_descending"`, `"interval"` and `"ratio"` |
  | `allow_multiple_answers` | logical scalar | `FALSE` | ❌ | ❌ | ❌ | aka “Mehrfachnennungen” |
  | `randomize_response_options` | logical scalar | `FALSE` | ❌ | ❌ | ❌ | whether or not the `response_options` are displayed in randomized order to online respondents (`response_options` with code `80`, `90` and `99` are still excluded from randomization) |
  | `is_mandatory` | logical scalar | `FALSE` | ❌ | ❌ | ❌ | aka “Antwort obligatorisch” |
  | `ballot_types` | character vector | `c("referendum", "election")` | ❌ | ❌ | ✅ | ballot-type-specific in- and exclusion of the respective item(s) |
  | `include` | logical scalar | `TRUE` | ❌ | ❌ | ✅ | ballot-date-wide in- and exclusion of the respective item(s) |
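To make the key tables more concrete, the following sketch combines the block-level keys with a handful of item-level keys. The block name, codes and wordings are invented for illustration; only the key names and their types come from the tables above.

```toml
# Block-level keys (only regarded on the top level)
[participation]
title = "Participation"
intro = "The following questions are about your participation in the ballot."
prefix = 2                      # hypothetical numbering prefix

# Item-level keys; anything not set here falls back to the documented defaults
[[participation.010_turnout.item]]
variable_name = "participated"
variable_label = "Participation in the ballot"
question = "Did you take part in the ballot?"
response_options = ["Yes", "No", "Don't know"]
variable_values = [1, 0, 99]    # hypothetical codes, one per response option
value_labels = ["yes", "no", "don't know"]
value_scale = "nominal"
is_mandatory = true
ballot_types = ["referendum"]   # excluder: item is left out for elections
```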
The keys regarded on `item`-level nodes are resolved in the following order:

1. `lvl`
2. `i`
3. `j`
4. All the other item-level keys.
Further notes
- All keys support glue’s string interpolation, meaning that non-character-type keys like `include` or `variable_values` can also be specified as strings (or string arrays), which are automatically converted to their proper type during questionnaire generation.

  More specifically, keys that hold arrays, like `response_options`, are fed to `glue::glue()`, which allows using its powerful string interpolation syntax, while the other keys holding scalars, like `include`, are fed to `cli::pluralize()`, which additionally supports a handy pluralization syntax. The default behavior of trimming all surrounding whitespace is disabled in both cases.

- To vary the wording of non-binary keys like `topic`, `question` etc. for different cantons and/or ballot dates directly without relying on embedded R code, you can define

  - single-canton subkeys named by the lowercase English canton name (e.g. `aargau`) applying to a specific canton,
  - single-date subkeys named `YYYYMMDD` (the respective ballot date without any `-`) applying to a specific ballot date,
  - begin-end interval subkeys named `YYYYMMDD_YYYYMMDD` applying to all ballot dates that fall into the specified interval, or
  - `default` applying to any canton or ballot date for which no more specific subkey exists.

  Interval subkeys mustn’t overlap. Single-date subkeys have precedence over interval subkeys, and canton subkeys have precedence over any date subkeys.

  To vary binary keys like `allow_multiple_answers` or `include` directly, you can define

  - begin-end interval subkeys as described above, or alternatively
  - subkeys named `true` (meaning inclusion) or `false` (meaning exclusion; has priority over `true` in case of ambiguity) containing an array of cantons or dates (`YYYY-MM-DD`), as well as
  - `default` (applying to any cantons or ballot dates not included in the `true` or `false` subkeys).

  The `true`/`false` subkeys have precedence over begin-end interval subkeys. Subkeys can be combined by nesting them.

  To vary keys for different ballot types, just define subkeys named `referendum` or `election`.

  Examples:

  ```toml
  topic.default = "bla"
  topic.aargau = "oops"
  topic.zurich = "upsala"

  who.20180923 = "alle"
  # nested subkeys: within the interval, Zurich gets its own value
  # (TOML does not allow a key to hold both a value and subkeys, hence `default`)
  who.20181125_20201018.default = "Online-Respondenten"
  who.20181125_20201018.zurich = "Print-Respondenten"

  include.default = true
  include.false = [ 2018-09-23 ]

  question.default.election = "lalala"
  question.default.referendum = "nenene"
  ```

  NOTES:

  - `variable_name` is the only key that mustn’t have any subkeys.
  - If `variable_label_common` is not explicitly defined, it falls back to `variable_label.default`.
  - If `question_common` is not explicitly defined, it falls back to a) `question_full.default`, b) `question.default`, or c) `question_full`.

- The keys `lvl`, `i` and `j` are interpreted as iterators which you can refer to via string interpolation (e.g. `voting_decision_{lvl}`), so you have to define similar questionnaire items only once. If any of them evaluates to `NULL` or an empty vector, the respective item is automatically excluded, i.e. it’s not necessary to explicitly set `include` (sub)keys in that case.

- You can even cross-reference keys from non-parent nodes using `qstnr_item_val()`. But be careful not to create infinite loops via circular references.

- To explicitly unset a key, just assign a string containing `NA` of the correct type wrapped in curly braces (`"{NA_character_}"` for values of type string, `"{NA_integer_}"` for values of type integer etc.).

- A `who` constraint notice is automatically added to the end of `variable_label` and `variable_label_common` if possible (e.g. `"blabla (only non-voters)"`). Because `variable_label_common` by definition mustn’t vary over time, it must be explicitly specified including such a `who` constraint notice if `who` varies over time.

- This questionnaire is complemented by a supplemental TOML file `YYYY-MM-DD.toml` for each ballot date that holds additional date-specific metadata.
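The iterator and unsetting rules from these notes can be sketched as follows. The table names, level names and wordings are hypothetical; only the `{lvl}` interpolation pattern and the `"{NA_character_}"` convention are taken from the notes above.

```toml
# One template item that expands into one questionnaire item per political level
[[voting.010_decision.item]]
lvl = ["cantonal", "federal"]            # iterator values (hypothetical)
variable_name = "voting_decision_{lvl}"  # interpolated once per iteration
variable_label = "Voting decision ({lvl} level)"
question = "How did you vote at the {lvl} level?"
topic = "{NA_character_}"                # explicitly unsets an inherited topic
```

Under these assumptions the template would yield the items `voting_decision_cantonal` and `voting_decision_federal`, without either of them having to be written out separately.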
Supplemental assumptions to resolve glue/cli string interpolation
- The following R packages are loaded and attached.
- The following R objects are present in the evaluation environment:
  - `ballot_date` (character scalar): The ballot date the questionnaire is to be generated for (in the format `YYYY-MM-DD`).
  - `canton` (character scalar): The canton the questionnaire is to be generated for (all lowercase).
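As a sketch of how these objects might be used, the following hypothetical item refers to `ballot_date` and `canton` inside interpolated strings. The item itself is made up, and the embedded R expression for `include` assumes that the interpolated result (`TRUE` or `FALSE`) is converted to a logical value as described in the notes on string interpolation.

```toml
[[turnout.010_participation.item]]
variable_name = "participation"
variable_label = "Participation in the ballot"
# plain interpolation of an object from the evaluation environment
question = "Did you take part in the ballot of {ballot_date}?"
# embedded R expression (assumption: evaluated and converted to logical)
include = "{canton == 'aargau'}"
```

For simple per-canton or per-date variation, the subkey mechanism (e.g. `include.true`) is likely the more readable choice; embedded R code is mainly useful for conditions that subkeys cannot express.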
Debugging
The R function `fokus:::gen_qstnr_tibble()`, which is designed to turn the raw questionnaire into a tibble containing the questionnaire data for a specific ballot date and canton, prints helpful progress information when its `verbose` argument is set to `TRUE` (default is `FALSE`).
For the questionnaire @ 2018-09-23 in Aargau, this looks as follows:
If the raw questionnaire contains a mistake that breaks questionnaire generation, such as a syntax error in embedded R code, the progress output stops immediately, which makes it easy to locate the exact source position of the mistake.