Plotting with stemtools • stemtools

This vignette shows how to use stemtools for creating a variety of standardized plots. We’ll use a simuated dataset trust, included with the package itself.

library(stemtools)

data(trust, package = "stemtools")
head(trust)
#>                       police                 government
#> 1 Neither Agree nor Disagree            Rather Disagree
#> 2           Definitely Agree        Definitely Disagree
#> 3           Definitely Agree        Definitely Disagree
#> 4 Neither Agree nor Disagree            Rather Disagree
#> 5 Neither Agree nor Disagree Neither Agree nor Disagree
#> 6           Definitely Agree Neither Agree nor Disagree
#>                           eu                army                 scientists
#> 1 Neither Agree nor Disagree    Definitely Agree Neither Agree nor Disagree
#> 2 Neither Agree nor Disagree        Rather Agree            Rather Disagree
#> 3               Rather Agree    Definitely Agree            Rather Disagree
#> 4            Rather Disagree    Definitely Agree Neither Agree nor Disagree
#> 5 Neither Agree nor Disagree    Definitely Agree               Rather Agree
#> 6 Neither Agree nor Disagree Definitely Disagree Neither Agree nor Disagree
#>      eu_index   nat_index age         W biggest_concern1 biggest_concern2
#> 1    Likes EU     Neutral  41 0.6452540     Unemployment     Unemployment
#> 2     Neutral   Satisfied  38 0.1800867      Immigration       Corruption
#> 3    Likes EU Disatisfied  39 0.2939668      Immigration       Healthcare
#> 4     Neutral     Neutral  34 0.8033685      Immigration       Healthcare
#> 5    Likes EU   Satisfied  41 0.3471653       Healthcare     Unemployment
#> 6 Dislikes EU Disatisfied  35 1.2082489      Immigration       Healthcare
#>   biggest_concern3
#> 1      Immigration
#> 2      Immigration
#> 3       Corruption
#> 4       Corruption
#> 5      Immigration
#> 6       Corruption

The core function is stem_plot(), a convinient wrapper around stem_summarise(), followed by a flexible ggplot2 template. The aim of the stem_plot() is to visualize aggregated data, namely counts and means.

stem_plot(data = trust, item = police)
#> No weights used.

stem_plot() automatically recognizes whether the item variable is numerical or categorical. For numerical variables, grouping variable should also be included using group argument. Survey weights can be used by calling the weight argument.

stem_plot(data = trust, item = age, group = eu_index, weight = W)

Customizing labels

By default, stem_plot() shows labels for each geom (set labels = FALSE to hide them). Looks of the labels can be customized using several arguments. label_scale multiples labels by specified amount (e.g. 100). label_accuracy rounds labels - 1 for no decimal places, 0.1 for one decimal place, 0.01 for two, and so on. label_suffix and label_prefix allow to add suffix and prefix respectively. label_big controls thousands separator and label_decimal control which character is used to separate decimal digits. You can also hide labels lower than a certain threshold with label_hide (with 0.05 being the default). Additionally, you can pass a list into label_args to specify additional arguments like vjust or position:

stem_plot(data = trust,
          item = government,
          label_scale = 100,
          label_suffix = "%",
          label_hide = 0)
#> No weights used.

Customizing title, caption and axes

By default, plots include title. If the item has an attribute label (such as one created using the labelled package) and title_label = TRUE, the label will be used instead of variable name. You can use title_wrap to set how many characters before the title is wrapped on the next line. If you don’t want to use title, set title = FALSE. If you want to use your own title, use stem_plot(..., title = FALSE) + ggplot::labs(title = "My title").

By default, the plot also includes a caption with sample size. To get rid off the caption, set caption = FALSE.

stem_plot(data = trust,
          item = government,
          title_label = TRUE,
          caption = TRUE)
#> No weights used.

You can pass ggplot2’s scale functions into arguments scale_x, scale_y, scale_fill and scale_color. Note that you may need to set limits by hand when using grouping variable and setting your own scale. Similarly, you can control legend using guides.

Lastly, you can reverse the order of item and group variable using item_reverse and group_reverse.

stem_plot(data = trust,
          item = police,
          group = eu_index,
          label = FALSE,
          scale_y = ggplot2::scale_y_continuous(labels = scales::percent_format(),
                                                limits = c(0, NA)),
          scale_fill = ggplot2::scale_fill_brewer(palette = "Set1"),
          scale_color = ggplot2::scale_color_brewer(palette = "Set1"),
          guides = ggplot2::guides(fill = ggplot2::guide_legend(reverse = TRUE),
                                   color = ggplot2::guide_legend(reverse = TRUE)),
          item_reverse = TRUE)
#> No weights used.

Collapsing categories

You can use collapse_item and collapse_group to collapse variable categories by passing a named list. The same arguments can also be used to rename categories.

stem_plot(data = trust,
          item = police,
          group = eu_index,
          label = FALSE,
          collapse_item = list(`Ano` = c("Definitely Agree",
                                         "Rather Agree"),
                               `Ani ano, ani ne` = "Neither Agree nor Disagree",
                               `Ne` = c("Rather Disagree",
                                        "Definitely Disagree")),
          collapse_group = list(`Pozitivní vztah` = "Likes EU",
                                `Neutrální vztah` = "Neutral",
                                `Negativní vztah` = "Dislikes EU",
                                `Neví` = "Doesn't Know"))
#> No weights used.

Weights

If weights variable is specified, weights are treated as survey weights passed into {svryr} (which in turn passes it into survey).

stem_plot(data = trust,
          item = government,
          label_scale = 100,
          weight = W)

Choosing geoms

You can choose other geoms to represent the data. For all estimates, 95% confidence intervals are also computed and can be plotted using an appropriate geom. You can also set additional arguments for your geom by passing a named list into geom_args.

Note that with unweighted data, the confidence intervals are computed using the basic $\sqrt{\frac{p \cdot (1-p)}{n}}$ formula. With very small proportions and small sample sizes, this can lead to the confidence intervals’ bounds being outside of realistic values (i.e. proportions lower than 0 or higher than 1). If the data are weighted, the confidence intervals are based on logistic regression and will always be properly bounded.

stem_plot(data = trust,
          item = police,
          group = eu_index,
          weight = W,
          label = FALSE,
          geom = ggplot2::geom_pointrange)