Summarise numeric or categorical variable (possibly segmented)

Summarises either a numeric or a categorical variable, optionally segmented by another categorical variable. This is a wrapper for stem_summarise_cat() and stem_summarise_num().

Usage

stem_summarise(
  data,
  item,
  group = NULL,
  weight = NULL,
  long = FALSE,
  collapse_item = NULL,
  collapse_group = NULL,
  return_n = FALSE
)

Arguments

data: Dataframe including variables to be analyzed
item: Variable to be summarised
group: Optional segmenting variable
weight: Optional survey weights
long: If TRUE, returns data in long format. Useful if multiple data frames are to be merged.
collapse_item: Named list. Optionally collapses (or renames) categories of the item variable (only if item is categorical).
collapse_group: Named list. Optionally collapses (or renames) categories of the group variable.
return_n: If TRUE, returns absolute group sizes.

Value

An aggregated data frame with point estimates (either proportions or means) and 95% confidence intervals.

Details

In addition to point estimates (proportions or means), the function returns the bounds of 95% confidence intervals. For unweighted data, the intervals are based on the simple standard error formula sqrt((p * (1-p)) / n). If the estimated proportions are very high or very low, this may produce interval bounds outside the (0;1) range. If weights are used, the confidence intervals are based on weighted logistic regression. Neither approach works if a proportion is exactly zero or one.
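
The unweighted interval can be sketched in base R as follows. This is a minimal illustration, assuming the usual normal approximation (estimate plus or minus about 1.96 standard errors). wald_ci() is a hypothetical helper, not part of the package, and the package's internal computation may differ in detail.

```r
# Hypothetical helper, not part of the package: normal-approximation
# interval for a proportion p estimated from n observations.
wald_ci <- function(p, n, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  se <- sqrt((p * (1 - p)) / n)      # standard error of the proportion
  data.frame(freq = p, freq_low = p - z * se, freq_upp = p + z * se)
}

# A proportion close to zero combined with a small n gives a negative
# lower bound, i.e. a bound outside the (0;1) range
small <- wald_ci(p = 0.02, n = 20)
small$freq_low < 0

# A proportion of exactly zero has a zero standard error, so the
# interval collapses to a single point
zero <- wald_ci(p = 0, n = 20)
zero$freq_low == zero$freq_upp
```

The second call shows why a proportion of exactly zero or one is problematic: the formula returns an interval of zero width, which carries no information about uncertainty.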

If long = TRUE, a new column holding the name of the item (and group) variable is added. This is useful if you need to loop through multiple variables and bind the results into a single data frame. See online vignettes for details.

collapse_item and collapse_group can be used to collapse categories of the item and group variables. Each element of the named list should be a vector of old category names, named after the new category they are merged into, e.g. list(Agree = c("Definitely Agree", "Rather Agree")). You can also use the arguments to rename categories (list(Yes = "Agree")) or pass other arguments of the forcats::fct_collapse() function.
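
To see what such a named list does to a factor, the sketch below applies one directly with forcats::fct_collapse() on a toy factor. It assumes that the list elements are handed to fct_collapse() as named arguments; how the package passes them internally is not shown here, and the objects answers and collapse_spec are made up for illustration.

```r
library(forcats)

# Toy factor with the same response categories as in the examples
answers <- factor(
  c("Definitely Agree", "Rather Agree", "Rather Disagree", "Rather Agree"),
  levels = c("Definitely Agree", "Rather Agree", "Neither Agree nor Disagree",
             "Rather Disagree", "Definitely Disagree")
)

# Named list in the shape described for collapse_item / collapse_group
collapse_spec <- list(Agree = c("Definitely Agree", "Rather Agree"))

# Assumption: each list element becomes a named argument of fct_collapse()
collapsed <- do.call(fct_collapse, c(list(answers), collapse_spec))
levels(collapsed)

# Renaming is simply a collapse of a single category
renamed <- fct_collapse(collapsed, Yes = "Agree")
levels(renamed)
```

The merged category takes the position of the first old category it replaces, which is why "Agree" appears first in the collapsed output.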

If return_n = TRUE, columns holding absolute frequencies are added. Column n is the number of observations for each combination of item and group category, n_item is the frequency of each item category, and n_group is the frequency of each group category.
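
The relationship between the three counts can be illustrated with base R on a toy data frame. This is only a sketch of the description above, not the package's internal code; the data and the object names toy and counts are made up.

```r
# Toy data; values are illustrative only
toy <- data.frame(
  item  = c("Agree", "Agree", "Disagree", "Agree", "Disagree", "Disagree"),
  group = c("A", "A", "A", "B", "B", "B")
)

# n: number of observations in each item-by-group combination
counts <- as.data.frame(table(item = toy$item, group = toy$group),
                        responseName = "n")

# n_item and n_group: cell counts summed over the other variable
counts$n_item  <- ave(counts$n, counts$item,  FUN = sum)
counts$n_group <- ave(counts$n, counts$group, FUN = sum)
counts
```

Each row keeps its own cell count n, while n_item and n_group repeat the marginal totals for the row's item and group category.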

Examples

stem_summarise(data = trust, item = government)
#> # A tibble: 5 × 4
#>   government                  freq freq_low freq_upp
#>   <fct>                      <dbl>    <dbl>    <dbl>
#> 1 Definitely Agree           0.008  -0.0537   0.0697
#> 2 Rather Agree               0.205   0.150    0.260 
#> 3 Neither Agree nor Disagree 0.39    0.342    0.438 
#> 4 Rather Disagree            0.201   0.146    0.256 
#> 5 Definitely Disagree        0.196   0.140    0.252 

stem_summarise(data = trust, item = age, group = eu_index)
#> # A tibble: 4 × 6
#>   eu_index     group_n item_n  mean mean_low mean_upp
#>   <fct>          <int>  <int> <dbl>    <dbl>    <dbl>
#> 1 Likes EU         365    365  39.2     38.8     39.5
#> 2 Neutral          319    319  38.9     38.6     39.3
#> 3 Dislikes EU      212    212  38.7     38.3     39.1
#> 4 Doesn't Know     104    104  39.7     39.1     40.4

stem_summarise(data = trust, item = government,
               collapse_item = list(Agree = c("Definitely Agree", "Rather Agree")))
#> # A tibble: 4 × 4
#>   government                  freq freq_low freq_upp
#>   <fct>                      <dbl>    <dbl>    <dbl>
#> 1 Agree                      0.213    0.158    0.268
#> 2 Neither Agree nor Disagree 0.39     0.342    0.438
#> 3 Rather Disagree            0.201    0.146    0.256
#> 4 Definitely Disagree        0.196    0.140    0.252