@davebraze
Last active July 30, 2023 04:59
R Tables

Intro

Getting the right tables for a project can be fiddly in R, both for content and for format. What I really prefer is a clean separation between the functions that generate a table's content and those that handle its formatting. I am often, but not always, working in an rmarkdown-based workflow. There, fine control over format details usually requires tools specific to the document's output type (PDF, HTML, etc.), which can complicate things a bit.

Table Content

These methods and tools are primarily about getting table content right.

Summary tables from dataframes

dplyr::

Building a summary table yourself with dplyr lets you choose exactly which summary statistics to include. Other tools can then handle the formatting.

Rolling your own summary table with dplyr involves several steps.

  1. Use select() to choose the variables to summarize from your data frame, along with any grouping variables.
  2. Use group_by() to get summaries by group, defined by a categorical variable or by cutpoints on a continuous variable.
  3. Use summarize_all() to set up the desired summary stats and their labels. (As of dplyr 1.0, summarize_all() is superseded by summarize(across(...)).)
  4. Use gather(), mutate(), select(), and spread() to convert the resulting wide data frame into a proper table, with rows and columns ordered as desired. (gather() and spread() come from tidyr, where they are superseded by pivot_longer() and pivot_wider().)
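The four steps above can be sketched as a single pipeline. This is a minimal sketch, assuming dplyr >= 1.0 and tidyr >= 1.0, so it uses summarize(across()) in place of summarize_all() and pivot_longer()/pivot_wider() in place of gather()/spread(). The built-in mtcars data and the particular statistics are only for illustration.

```r
library(dplyr)
library(tidyr)

summary_tbl <- mtcars %>%
  # Step 1: keep the variables to summarize plus the grouping variable
  select(cyl, mpg, hp) %>%
  # Step 2: one set of summaries per level of cyl
  group_by(cyl) %>%
  # Step 3: the same labelled stats for every non-grouping column;
  # names come out as mpg_mean, mpg_sd, mpg_n, hp_mean, ...
  summarize(across(everything(), list(mean = mean, sd = sd, n = length))) %>%
  # Step 4: reshape the one wide row per group into one row per
  # group x variable, with one column per statistic
  pivot_longer(-cyl, names_to = c("variable", "stat"), names_sep = "_") %>%
  pivot_wider(names_from = stat, values_from = value) %>%
  arrange(variable, cyl)

summary_tbl
```

Because the statistic columns stay numeric, the result can go straight into kable(), gt(), or another formatter that handles rounding itself.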

gtsummary::

https://www.rdocumentation.org/packages/gtsummary/versions/1.6.1

"The gtsummary package was created to streamline these everyday analysis tasks by allowing users to easily create reproducible summaries of data sets, regression models, survey data, and survival data with a simple interface and very little code. The package follows a tidy framework, making it easy to integrate with standard data workflows, and offers many table customization features through function arguments, helper functions, and custom themes."
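A minimal gtsummary sketch, assuming a recent gtsummary release; the mtcars columns and the mean (SD) display are illustrative choices, not defaults.

```r
library(gtsummary)

# Summarize mpg and hp by number of cylinders; the statistic argument
# replaces the default median (IQR) display for continuous variables
tbl <- tbl_summary(
  mtcars,
  by = cyl,
  include = c(mpg, hp),
  statistic = list(all_continuous() ~ "{mean} ({sd})")
)

tbl
```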

skimr::

https://cran.r-project.org/web/packages/skimr/

"A simple to use function to summarize dataframes. It can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs."

skim() and the other data-building functions in this package return the table with all columns already converted to type character. skimr applies its own formatting to numeric values (significant digits and so on), not well in my experience. The result is easy to pipe into kable or other formatting packages, but those generally work best when table columns arrive in their native types.

tableone::

https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html

"eases the construction of “Table 1”, i.e., patient baseline characteristics table commonly found in biomedical research papers. The packages can summarize both continuous and categorical variables mixed within one table. Categorical variables can be summarized as counts and/or percentages. Continuous variables can be summarized in the “normal” way (means and standard deviations) or “nonnormal” way (medians and interquartile ranges)."

The idiom summary(CreateTableOne(df)) builds reasonable content. But the result does not play well with either pipes or kable.
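One possible workaround for the kable problem, sketched here assuming tableone's print method and its printToggle and noSpaces arguments: have print() return the formatted table as a character matrix, which kable() accepts. It still doesn't pipe in a tidy way, and every column is already character. The variables and strata are illustrative.

```r
library(tableone)
library(knitr)

# Build the TableOne object, stratified by number of cylinders;
# am is declared categorical so it is summarized as counts (%)
t1 <- CreateTableOne(vars = c("mpg", "hp", "am"), strata = "cyl",
                     data = mtcars, factorVars = "am")

# printToggle = FALSE returns the formatted table as a character matrix
# instead of only printing it; noSpaces drops the console padding
t1_mat <- print(t1, printToggle = FALSE, noSpaces = TRUE)

kable(t1_mat)
```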

table1::

https://cran.r-project.org/web/packages/table1/

"Create HTML tables of descriptive statistics, as one would expect to see as the first table (i.e. "Table 1") in a medical/epidemiological journal article."
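A minimal table1 sketch using its formula interface, assuming the stratifying variable should be a factor; the recoding of cyl and am is purely illustrative.

```r
library(table1)

# Convert the grouping and categorical variables to labelled factors
mt <- transform(mtcars,
                cyl = factor(cyl),
                am  = factor(am, labels = c("automatic", "manual")))

# Variables to summarize go left of "|", the stratifying variable right of it
t1 <- table1(~ mpg + hp + am | cyl, data = mt)
t1
```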

Gmisc::

https://cran.r-project.org/web/packages/Gmisc/

"Tools for making the descriptive "Table 1" used in medical articles, a transition plot for showing changes between categories (also known as a Sankey diagram), flow charts by extending the grid package, a method for variable selection based on the SVD, Bézier lines with arrows complementing the ones in the 'grid' package"

tangram::

https://cran.r-project.org/web/packages/tangram/

Tangram aims to create production-quality summary tables using a formula-based table specification, with output in HTML5, Markdown, or LaTeX.

"The steps of the process are formula parser, statistical content generation from data, to rendering. Each step of the process is separate and user definable thus creating a set of building blocks for highly extensible table generation. A user is not limited by any of the choices of the package creator other than the formula grammar. For example, one could chose to add a different S3 rendering function and output a format not provided in the default package. Or possibly one would rather have Gini coefficients for their statistical content. Routines to achieve New England Journal of Medicine style, Lancet style and Hmisc::summaryM() statistics are provided. The package contains rendering for HTML5, Rmarkdown and an indexing format for use in tracing and tracking are provided."

Deducer::descriptive.table()

library("Deducer")
# d() names the variables to describe, by bare column name
descriptive.table(
  vars = d(mpg, hp),
  data = mtcars,
  func.names = c("Mean", "Median",
                 "St. Deviation", "Valid N",
                 "25th Percentile", "75th Percentile"))

RcmdrMisc::numSummary()

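library("RcmdrMisc")
# Mean, SD, and quartiles for each selected column; subsetting mtcars
# keeps the original column names (mpg, hp) in the output
numSummary(
  mtcars[, c("mpg", "hp")],
  statistics = c("mean", "sd", "quantiles"),
  quantiles = c(.25, .50, .75))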

Model summaries from model objects

modelsummary::

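This package "produces beautiful, customizable, publication-ready tables to summarize statistical models." It can build its tables with gt::, and its output argument also supports other backends, including kableExtra::, flextable::, and huxtable::.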

https://github.com/vincentarelbundock/modelsummary
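A minimal modelsummary sketch, assuming two illustrative linear models on mtcars; the output values in the comment are examples, not an exhaustive list.

```r
library(modelsummary)

mods <- list(
  "Model 1" = lm(mpg ~ hp, data = mtcars),
  "Model 2" = lm(mpg ~ hp + wt, data = mtcars)
)

# One column per model; stars = TRUE adds significance markers
modelsummary(mods, stars = TRUE)

# output picks the backend or file type, e.g. "gt", "flextable",
# "huxtable", "kableExtra", a file name such as "table.docx",
# or "data.frame" to get the content back as plain data
ms_df <- modelsummary(mods, output = "data.frame")
```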

sjPlot::

sjPlot has various functions for generating summary tables from model objects and data. Table output appears to be HTML only. Some of these functions (e.g., those for PCA and factor analysis) run the analysis themselves as well as generating the summary, while tab_model() takes already-fitted models. Available functions include:

  • tab_df(): print data frames as HTML tables
  • tab_xtab() (formerly sjt.xtab()): contingency tables
  • tab_corr() (formerly sjt.corr()): correlation tables
  • tab_model(): regression tables
  • tab_pca() (formerly sjt.pca()): principal component analysis
  • tab_fa() (formerly sjt.fa()): factor analysis
  • tab_itemscale() (formerly sjt.itemanalysis()): item analysis based on classical test theory

https://cran.r-project.org/web/packages/sjPlot/
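A minimal tab_model() sketch, assuming two illustrative linear models fitted beforehand on mtcars.

```r
library(sjPlot)

fit1 <- lm(mpg ~ hp, data = mtcars)
fit2 <- lm(mpg ~ hp + wt, data = mtcars)

# tab_model() takes fitted models and builds an HTML table,
# one block of columns per model; printing tm shows it in the
# RStudio Viewer or a browser
tm <- tab_model(fit1, fit2)
```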

pander::
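pander() also has methods for common model classes such as lm, so it can format a model summary directly. A minimal sketch, assuming an illustrative linear model on mtcars:

```r
library(pander)

fit <- lm(mpg ~ hp + wt, data = mtcars)

# pander() dispatches on the object's class; for an lm fit it prints
# the coefficient table as Pandoc markdown
pander(fit)
```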

Table Formatting

huxtable::

https://hughjonesd.github.io/huxtable/

"Huxtable is an R package to create LaTeX and HTML tables, with a friendly, modern interface. Features include control over text styling, number format, background color, borders, padding and alignment. Cells can span multiple rows and/or columns. Tables can be manipulated with standard R subsetting or dplyr functions."

Documentation includes a vignette with an incomplete but still useful list of other table formatting packages in R with a feature comparison matrix.
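A minimal huxtable sketch using its replacement-function interface; the data subset and styling choices are illustrative.

```r
library(huxtable)

# as_hux() converts a data frame; column names become the first row
ht <- as_hux(head(mtcars[, 1:4]))

bold(ht)[1, ] <- TRUE          # bold the header row
bottom_border(ht)[1, ] <- 1    # rule under the header
number_format(ht) <- 2         # two decimal places for numbers
align(ht) <- "right"           # right-align every cell

ht
```

The same object can be rendered with to_html() or to_latex(), which fits the content/format split described in the intro.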

knitr::kable and kableExtra::

kable and kableExtra are probably the usual entry point for formatting tables in rmarkdown files, with reasonable support for PDF and HTML output.

https://cran.r-project.org/web/packages/knitr/

https://cran.r-project.org/web/packages/kableExtra/

https://community.rstudio.com/t/nice-tables-when-knitting-to-word/3840

https://community.rstudio.com/t/creating-a-complex-table-from-a-data-frame/8742/5

"Build complex HTML or 'LaTeX' tables using 'kable()' from 'knitr' and the piping syntax from 'magrittr'. Function 'kable()' is a light weight table generator coming from 'knitr'. kableExtra simplifies the way to manipulate the HTML or 'LaTeX' codes generated by 'kable()' and allows users to construct complex tables and customize styles using a readable syntax."
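A minimal kable/kableExtra sketch for HTML output; the data, caption, and styling options are illustrative.

```r
library(knitr)
library(kableExtra)

tbl <- head(mtcars[, 1:4])

# kable() builds the basic table; digits and caption are kable arguments
k <- kable(tbl, format = "html", digits = 1, caption = "First rows of mtcars")

# kableExtra functions then pipe onto the kable output
k %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE)
```

For PDF output, format = "latex" works the same way, with kable_styling() taking latex_options such as "striped" or "hold_position" instead of bootstrap_options.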

pander::

https://cran.r-project.org/package=pander

https://rdrr.io/cran/pander/f/vignettes/pandoc_table.Rmd

Pander is an interface to the pandoc utility. Its core table-formatting functionality is in pandoc.table(), which is oriented toward a markdown-based workflow and supports a wide variety of formatting options (highlighting, styles, etc.).

Pander affords control over number formatting (e.g., rounding, significant digits, decimal and thousands-separator symbols), table and cell width, cell highlighting, etc.

Pander purports to work well with a knitr-based rmarkdown workflow. PDF, docx, and HTML outputs are all supported.
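A minimal pandoc.table() sketch; the highlighting rule (bold mpg values above 21) is an arbitrary example.

```r
library(pander)

tbl <- head(mtcars[, 1:4])

# Two-column (row, column) matrix of cells to render in bold:
# rows where mpg exceeds 21, in the mpg column
big_mpg <- cbind(which(tbl$mpg > 21), 1)

pandoc.table(
  tbl,
  style = "rmarkdown",               # pipe-table markdown
  round = 1,                         # round to one decimal place
  big.mark = ",",                    # thousands separator
  emphasize.strong.cells = big_mpg
)
```

pandoc.table.return() takes the same arguments but returns the markdown as a string instead of printing it.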

formattable::

https://cran.r-project.org/web/packages/formattable/

https://www.littlemissdata.com/blog/prettytables

"functions to create formattable vectors and data frames. 'Formattable' vectors are printed with text formatting, and formattable data frames are printed with multiple types of formatting in HTML to improve the readability of data presented in tabular form rendered in web pages."
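A minimal formattable sketch; which columns get which formatter is an illustrative choice.

```r
library(formattable)

tbl <- head(mtcars[, c("mpg", "hp", "wt")])

# Each list entry maps a column to a formatter:
# color_tile() shades cells on a gradient, color_bar() draws an in-cell bar
ft <- formattable(tbl, list(
  mpg = color_tile("white", "orange"),
  hp  = color_bar("lightblue")
))

ft
```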

flextable::

https://cran.r-project.org/web/packages/flextable/

"Create pretty tables for 'HTML', 'Microsoft Word' and 'Microsoft PowerPoint' documents. Functions are provided to let users create tables, modify and format their content. It extends rstats package officer::"
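A minimal flextable sketch aimed at Word output; the theme, digits, and file name are illustrative.

```r
library(flextable)

ft <- flextable(head(mtcars[, 1:4]))
ft <- colformat_double(ft, digits = 1)   # one decimal for double columns
ft <- theme_vanilla(ft)                  # simple built-in theme
ft <- autofit(ft)                        # size columns to their content

# Write the table to a Word document (file name is hypothetical):
# save_as_docx(ft, path = "table.docx")
ft
```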

gt::

https://gt.rstudio.com/

https://github.com/rstudio/gt

"make wonderful-looking tables using the R programming language. The gt philosophy: we can construct a wide variety of useful tables with a cohesive set of table parts. These include the table header, the stub, the stub head, the column labels, the table body, and the table footer."

"generate information-rich, publication-quality tables from R"
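A minimal gt sketch touching each of the table parts named in the quote above, assuming a recent gt release (for the tidyselect columns syntax); titles and labels are illustrative.

```r
library(gt)

tbl <- head(mtcars[, 1:4])
tbl$car <- rownames(tbl)

g <- gt(tbl, rowname_col = "car") %>%                        # stub from car names
  tab_header(title = "Motor Trend cars",
             subtitle = "First six rows") %>%                # table header
  tab_stubhead(label = "Car") %>%                            # stub head
  cols_label(cyl = "Cylinders") %>%                          # column labels
  fmt_number(columns = c(mpg, disp), decimals = 1) %>%       # table body
  tab_source_note("Source: mtcars")                          # table footer

g
```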

Uncategorized

compareGroups::

https://cran.r-project.org/web/packages/compareGroups/vignettes/compareGroups_vignette.html

This package seems to address both content and formatting.

"The compareGroups package lets users to create tables displaying results of univariate analyses, stratified or not by categorical variable groupings. Tables can easily be exported to CSV, LaTeX, HTML, PDF, Word or Excel, or inserted in R-markdown files to generate reports automatically. This package can be used from the R prompt or from a user-friendly graphical user interface for non-R familiarized users."
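A minimal compareGroups sketch showing its own content/format split; the grouping and variables are illustrative.

```r
library(compareGroups)

mt <- transform(mtcars, cyl = factor(cyl))

# Content: descriptive statistics and tests of mpg and hp across cyl groups
res <- compareGroups(cyl ~ mpg + hp, data = mt)

# Formatting: createTable() builds the display table, which the
# export2*() functions write out, e.g. export2md() or export2latex()
tab <- createTable(res)
tab
```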

janitor::tabyl

https://cran.r-project.org/web/packages/janitor/

https://cran.r-project.org/web/packages/janitor/vignettes/tabyls.html

"tabyl() produces frequency tables using 1, 2, or 3 variables. Under the hood, tabyl() also attaches a copy of these counts as an attribute of the resulting data.frame. The result looks like a basic data.frame of counts, but because it's also a tabyl containing this metadata, you can use adorn_ functions to add additional information and pretty formatting."
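A minimal tabyl sketch following the adorn_ pattern described in the quote; the variables and percentage direction are illustrative.

```r
library(janitor)

# Two-way counts of cylinders by gears, then adorn_ functions add
# a totals row, row percentages, and the underlying counts in parentheses
tab <- mtcars %>%
  tabyl(cyl, gear) %>%
  adorn_totals("row") %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns()

tab
```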

arsenal::

https://cran.r-project.org/web/packages/arsenal/

"Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models."
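A minimal arsenal::tableby() sketch; the grouping and variables are illustrative.

```r
library(arsenal)

mt <- transform(mtcars, cyl = factor(cyl))

# Grouping variable on the left of ~, variables to summarize on the right
tb <- tableby(cyl ~ mpg + hp, data = mt)

# summary() formats the table; text = TRUE gives plain-text output
summary(tb, text = TRUE)
```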

qwraps2::summary_table

https://cran.r-project.org/web/packages/qwraps2/

"A collection of (wrapper) functions the creator found useful for quickly placing data summaries and formatted regression results into '.Rnw' or '.Rmd' files."

Amisc::describeBy

(not on CRAN)

stargazer::

Mostly focused on regression tables, for lm() and many other model classes; it can also produce summary-statistics tables from data frames.

https://cran.r-project.org/web/packages/stargazer/
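A minimal stargazer sketch with two illustrative models; type = "text" is used here so the result is readable in the console.

```r
library(stargazer)

fit1 <- lm(mpg ~ hp, data = mtcars)
fit2 <- lm(mpg ~ hp + wt, data = mtcars)

# Regression table; type can be "latex" (the default), "html", or "text"
stargazer(fit1, fit2, type = "text")

# Given a data frame instead, stargazer() produces summary statistics
stargazer(mtcars[, c("mpg", "hp")], type = "text")
```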

english::

https://cran.r-project.org/web/packages/english/index.html

"Convert numbers to an English language version, one, two, three, ... Ordinals are also available, first, second, third, ..."

Especially useful for notes, captions, etc.
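A minimal sketch of using english in caption text; the caption and note strings are hypothetical.

```r
library(english)

n_groups <- 3

# english() spells out cardinal numbers, ordinal() spells out ordinals
caption <- paste("Means for", as.character(english(n_groups)), "groups")
note <- paste("The", as.character(ordinal(2)), "column shows SDs.")

caption
note
```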
