Summarise either a numeric or a categorical variable, possibly segmented by another categorical variable. This is a wrapper for stem_summarise_cat() and stem_summarise_num().

Usage

stem_summarise(
  data,
  item,
  group = NULL,
  weight = NULL,
  long = FALSE,
  collapse_item = NULL,
  collapse_group = NULL,
  return_n = FALSE
)

Arguments

data

Dataframe including variables to be analyzed

item

Variable to be summarised

group

Optional segmenting variable

weight

Optional survey weights

long

If TRUE, returns data in long format. Useful if multiple data frames are to be merged.

collapse_item

Named list. Optionally collapses (or renames) categories of the item variable (only if item is categorical).

collapse_group

Named list. Optionally collapses (or renames) categories of the group variable.

return_n

If TRUE, returns absolute group sizes.

Value

An aggregated data frame with point estimates (either proportions or means) and 95% confidence intervals.

Details

In addition to point estimates (proportions or means), the function returns 95% confidence interval bounds. If unweighted, the intervals are computed using the basic sqrt((p * (1 - p)) / n) formula. If the estimated proportions are very high or very low, this may lead to interval estimates outside the (0; 1) bounds. If weights are used, the confidence intervals are based on weighted logistic regression. Neither approach works if the proportions are exactly zero or one.
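The unweighted case can be illustrated with a minimal base R sketch. It assumes the bounds are the point estimate plus or minus a normal quantile times the standard error given above; the values of p and n are hypothetical, and how the function itself defines n is not stated here.

```r
# Hypothetical inputs, not taken from the package's data
p <- 0.02  # estimated proportion
n <- 50    # number of observations behind the estimate (assumed meaning of n)

# Standard error from the formula quoted in the text
se <- sqrt((p * (1 - p)) / n)

# 95% interval under a normal approximation (an assumption of this sketch)
z <- qnorm(0.975)
ci <- c(low = p - z * se, upp = p + z * se)

print(round(ci, 4))
```

With a proportion this close to zero, the lower bound comes out negative, which is the out-of-bounds behaviour described above.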

If long = TRUE, a new column is added holding the name of the item (and group) variable. This is useful if you need to loop through multiple variables and bind the results into a single data frame. See the online vignettes for details.

collapse_item and collapse_group can be used to collapse categories of the item and group variables. Each element of the named list should be a vector of old category names, named with the new category name, e.g. list(Agree = c("Definitely Agree", "Rather Agree")). You can also use the arguments to rename categories (list(Yes = "Agree")) or pass other arguments from the forcats::fct_collapse() function.
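To show what such a specification does to a factor, here is a small sketch that calls forcats::fct_collapse() directly on a hypothetical vector named answers. How stem_summarise() passes the named list on internally is not shown here, so treat this only as an illustration of the collapsing and renaming behaviour.

```r
library(forcats)

# Hypothetical factor with the same categories as the examples on this page
lvls <- c("Definitely Agree", "Rather Agree", "Neither Agree nor Disagree",
          "Rather Disagree", "Definitely Disagree")
answers <- factor(c("Rather Agree", "Definitely Agree", "Rather Disagree"),
                  levels = lvls)

# Collapse two categories into one: new name = vector of old names
collapsed <- fct_collapse(answers,
                          Agree = c("Definitely Agree", "Rather Agree"))

# Rename a single category: new name = old name
renamed <- fct_collapse(collapsed, Yes = "Agree")

print(levels(collapsed))
print(levels(renamed))
```

The collapsed category takes the position of the first of the old categories, so the remaining level order is preserved.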

If return_n = TRUE, columns holding the absolute frequencies will be added. Column n is the number of observations for the specific combination of item and grouping variable, n_item is the frequency of the item categories and n_group is the frequency of the group categories.

Examples

stem_summarise(data = trust, item = government)
#> # A tibble: 5 × 4
#>   government                  freq freq_low freq_upp
#>   <fct>                      <dbl>    <dbl>    <dbl>
#> 1 Definitely Agree           0.008  -0.0537   0.0697
#> 2 Rather Agree               0.205   0.150    0.260 
#> 3 Neither Agree nor Disagree 0.39    0.342    0.438 
#> 4 Rather Disagree            0.201   0.146    0.256 
#> 5 Definitely Disagree        0.196   0.140    0.252 

stem_summarise(data = trust, item = age, group = eu_index)
#> # A tibble: 4 × 6
#>   eu_index     group_n item_n  mean mean_low mean_upp
#>   <fct>          <int>  <int> <dbl>    <dbl>    <dbl>
#> 1 Likes EU         365    365  39.2     38.8     39.5
#> 2 Neutral          319    319  38.9     38.6     39.3
#> 3 Dislikes EU      212    212  38.7     38.3     39.1
#> 4 Doesn't Know     104    104  39.7     39.1     40.4

stem_summarise(data = trust, item = government,
               collapse_item = list(Agree = c("Definitely Agree", "Rather Agree")))
#> # A tibble: 4 × 4
#>   government                  freq freq_low freq_upp
#>   <fct>                      <dbl>    <dbl>    <dbl>
#> 1 Agree                      0.213    0.158    0.268
#> 2 Neither Agree nor Disagree 0.39     0.342    0.438
#> 3 Rather Disagree            0.201    0.146    0.256
#> 4 Definitely Disagree        0.196    0.140    0.252