

@davebraze
Last active March 26, 2024 21:49
On using Emacs for data work with R

I use GNU Emacs on MS Windows 11, specifically the pre-packaged, pre-compiled distributions for Windows provided by Vince Goulet (https://vigou3.gitlab.io/emacs-modified-windows/). He also provides a bundle for macOS (https://vigou3.gitlab.io/emacs-modified-macos/). I have used, and occasionally still use, Emacs on a variety of different unixen. I believe most of what follows will apply to any GNU Emacs distribution or derivative on any platform, but of course, YMMV.

By way of background, I've been using Emacs since the late 80s as an IDE for various programming languages (e.g., Pascal, C, Lisp, MATLAB, Python), and as a general text editor. I've also gotten a lot of mileage out of its features for calendaring, scheduling, note-taking, and agenda making. So, when I started using R around 2001, it was natural to do my R scripting and programming in Emacs (using its ESS package, which I'd already been using with SAS since the early 90s). When RStudio came out in about 2011, I did give it a look, but it was pretty bare-bones at that point, and it certainly didn't offer anything that motivated me to move away from Emacs/ESS. I've revisited RStudio periodically in the intervening years, and it has matured quite a bit. But I still find Emacs to be the better tool for me, overall.

You should be aware that using Emacs really requires a tinkerer's mindset. There are many, many options, and the out-of-the-box settings are probably not ideal for anyone. You will need to do some work to arrive at a configuration that works well for you. In the end, you're the only one who can decide if Emacs is right for you.

These notes are geared toward helping a potential new user of Emacs/ESS with R to navigate some of those difficulties. I find that using Goulet's pre-compiled version of GNU Emacs simplifies things greatly when working on Windows. His distributions are fairly unopinionated, being minimal modifications to the stock GNU Emacs distribution. Because Goulet's distributions are built on garden-variety GNU Emacs, you get all the benefits of the most widely used Emacs distribution. (If you want an opinionated Emacs distribution, you might look into Doom Emacs: https://github.com/doomemacs/doomemacs.) Everything below assumes you are using the Goulet distribution of GNU Emacs, but most of it will apply more generally.

Unix-like Command line tools on Windows

Much Emacs functionality depends on the availability of Unix/Linux-like command line tools. If you're reading this, you are likely an R user, but even if not, an easy way to get a set of these tools working on Windows is to install Rtools (https://cran.r-project.org/bin/windows/Rtools/) and add its tools folder (e.g., 'c:/rtools40/usr/bin'; the exact folder name depends on the Rtools version) to your computer's PATH environment variable. Rtools is a toolchain designed to let you compile R from source (you don't want to do that) or install R packages from source rather than from pre-compiled binaries; the latter is handy sometimes because not all R packages are available in pre-compiled form. But those are side issues for present purposes. For now, it's enough to know that Rtools contains Windows versions of most of the Unix/Linux command line tools you will need to complete your Emacs experience.

There are a few other command line tools, not included with Rtools, that you may want to install, eventually. These include things like git, ripgrep, hunspell, & pandoc. When you do install them, you'll need to make sure they're in your PATH, so that Emacs can find them. But don't worry about them just yet.
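
As an alternative (or complement) to editing the system PATH, you can tell Emacs about these folders from your init file. This is a minimal sketch; the two folder paths are assumptions (typical default install locations for Rtools 4.0 and Git for Windows) and should be edited to match your machine.

```elisp
;; Assumed install locations: edit these to match your own machine.
(dolist (dir '("c:/rtools40/usr/bin"
               "C:/Program Files/Git/cmd"))
  (when (file-directory-p dir)
    ;; Emacs searches `exec-path' when it launches external programs.
    (add-to-list 'exec-path dir)
    ;; Shells and other subprocesses started from Emacs inherit PATH.
    (setenv "PATH" (concat dir path-separator (getenv "PATH")))))

;; Quick check: evaluates to the full path of grep if Emacs can find it, else nil.
(executable-find "grep")
```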

Emacs Terminology

  • Buffers in Emacs correspond roughly to the contents of tabs in a web browser. In a browser, different tabs may display read-only content, like a web page or PDF, or an app, like a Google Doc or Wordle, or just about anything that can be accessed by way of a URL. Another similarity is that a browser can have different tabs open to different parts of the same content (e.g., web page). Emacs buffers can hold the contents of a file, an app, a web page, or just about anything else. Older versions of Emacs did not expose buffers by way of a 'tab' widget in the user interface, but newer versions do offer that option. A buffer may (or may not) be displayed in a window.
  • Frames, in Emacs-speak, correspond to what are called "windows" in the vernacular of modern operating systems like MS Windows or macOS. A frame may contain one or more windows.
  • Windows, in Emacs-speak, correspond to a pane or subframe within an Emacs frame. A window displays the contents of a single buffer. Each window includes a mode line that summarizes its status. It is possible to have more than one window displaying the same buffer (just as you can have more than one browser tab open to the same page).
  • A hook is a variable holding a list of elisp functions that Emacs runs when a specific event occurs. The triggering event might be something like saving a file, or activating or deactivating a mode. Hooks are one of the tools available to help customize your Emacs experience.
  • Mode line: A mode line appears at the bottom of every Emacs window. It displays useful summary information about the buffer shown in the window, including its name and current modes. Users can configure the mode line to display the information they find most useful.
  • Major Mode; Minor Mode; Package. See the following section for info on these important concepts.

EmacsWiki.org includes a helpful glossary of other Emacs terms.
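
Two of the terms above, hooks and buffer tabs, are easiest to grasp with a few lines of configuration. This is a sketch, not a recommendation: the particular functions placed on the hooks are illustrative choices, `ess-r-mode-hook` assumes ESS is installed, and `global-tab-line-mode` assumes Emacs 27 or later.

```elisp
;; `add-hook' adds a function to a hook variable.  Here, trailing
;; whitespace is deleted every time any buffer is about to be saved.
(add-hook 'before-save-hook #'delete-trailing-whitespace)

;; Run a function each time a buffer enters ESS's R mode:
;; here, show line numbers in R source buffers.
(add-hook 'ess-r-mode-hook #'display-line-numbers-mode)

;; Show a row of buffer tabs at the top of each window.
(global-tab-line-mode 1)
```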

Useful Emacs extensions (AKA 'Modes' which are installed/invoked via packages)

Emacs modes are divided into major modes and minor modes. An Emacs major mode is essentially a set of capabilities centered on a specific task (e.g., editing R code). Each Emacs buffer is associated with exactly one major mode. For buffers associated with files, the major mode will typically depend on the type of file. A minor mode changes the behavior of a major mode in subtle ways; it can add functionality, adjust key mappings or display features, or change just about any other Emacs capability. An Emacs buffer can have more than one minor mode active at a time.

Note that the terms 'package' and 'mode' are not interchangeable. A package may provide more than one mode. For example, the yasnippet package includes a major mode for creating and editing snippets, and a minor mode for invoking them. Emacs comes bundled with a rich set of packages and associated modes of operation. But often, a mode will need to be enabled (even if it comes pre-installed), or installed and then enabled. You'll want to use the Emacs package manager for installing and updating packages, via the menu "Options>>Manage Emacs Packages". You can enable and configure your installed packages via the menu "Options>>Customize Emacs". That said, it is really best to do your configuration by editing your Emacs initialization file (either .emacs or init.el, depending) directly, but I recognize this may be a heavy lift for beginners.
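
Package installation can also be scripted from the initialization file, so that a fresh machine installs what it needs on first start. A minimal sketch, assuming Emacs 27 or later (which activates installed packages before init.el runs, so no call to `package-initialize` is needed) and using projectile purely as an example package:

```elisp
(require 'package)
;; Add MELPA, the largest community package archive, to the archives
;; Emacs checks (GNU ELPA is configured by default).
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/") t)

;; Install projectile only if it isn't installed yet.
(unless (package-installed-p 'projectile)
  (package-refresh-contents)   ; download the current package lists
  (package-install 'projectile))
```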

One stumbling block for new users of Emacs is that, since it has been around so long, there are often a number of packages/modes that provide similar features. So, it can be hard for the Emacs novice to know which one is 'best'. For example, there are a number of packages that provide auto completion features (e.g., selectrum, ido, vertico, company-mode, corfu, auto-complete).

Emacs packages are written in a dialect of the Lisp programming language called 'elisp' (guess what the 'e' stands for). If you do opt in to Emacs as your programming/coding tool of choice, you will eventually want to learn enough elisp to customize your work environment. For the most part, this just involves setting variables to particular values to control the behavior of major and minor modes (if you want something other than the default behavior). It's really not bad, and I'll eventually add a few examples of how it works to this document. In the meantime, you might look at my Emacs configuration file, init.el, here.
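
For a taste of what "setting variables" looks like, here is a short sketch of init.el lines. The particular variables are illustrative choices, not taken from the author's configuration; the last one assumes ESS is installed.

```elisp
;; `setq' gives a variable a new value.
(setq inhibit-startup-screen t)       ; skip the splash screen

;; `setq-default' sets the default for variables that can differ per buffer.
(setq-default indent-tabs-mode nil)   ; indent with spaces, not tabs

;; ESS: start R in the current buffer's directory without prompting for one.
(setq ess-ask-for-ess-directory nil)
```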

Note that, like R, ELisp is case-sensitive. Pay close attention to case in the names of packages, functions, and variables. By convention, function and variable names are typically lower case, with hyphen used as a separator (e.g., global-set-key). Package names are generally lower case as well, but there are exceptions. The package providing R interaction is 'ESS'; the command within ESS to invoke R is 'R'.

Recommended Packages

Each Emacs buffer is associated with a single major mode, which determines the behavior of Emacs while in that buffer. Usually, the major mode is automatically set by Emacs, when you first visit a file or create a buffer. But, some pre-installed packages/modes are not enabled by default, and there are many useful packages that are not part of the core bundle and so must be installed before you can use them.
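
Emacs chooses the major mode for a file you visit by matching the file name against the regular expressions in the variable `auto-mode-alist`. A sketch, assuming the poly-R package (which supplies polymode's R Markdown support) is installed; poly-R normally registers this entry itself, so the line below only matters if your setup lacks it.

```elisp
;; Visiting a file whose name ends in .Rmd or .rmd turns on poly-R's
;; R Markdown mode.
(add-to-list 'auto-mode-alist '("\\.[rR]md\\'" . poly-markdown+r-mode))

;; To see which major mode a buffer ended up in, type
;;   M-: major-mode <return>
```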

Here are some of the packages, and attendant major modes, that I find essential. Some are part of Emacs core; some are not and will need to be installed.

  • ESS: ESS (Emacs Speaks Statistics) is a mode for interacting with R and other statistical tools (S-Plus, SAS, Stata, Julia, and OpenBUGS/JAGS). ESS is essential for working with R. ESS is pre-packaged with the Goulet Emacs distros, but if you don't start with one of those, you may need to install it.
  • magit: Magit is the MAGical GIT interaction mode. There are two pronunciations in use. Most people say it with a soft 'g' (like magic with a 't'). Less commonly, you hear it pronounced with a hard 'g', like "maggot". No matter how you pronounce it, magit is, by far, the best git porcelain I have ever used. It is much more powerful and flexible than RStudio's git capabilities. Note that if you want to use magit for git interaction, you will have to install git itself and make sure that it is on your PATH so that Emacs can find it.
  • polymode: Polymode allows more than one major mode to be active within a single buffer. This is important for working with Rmarkdown files, where you want different major modes active in the YAML header vs. code chunks vs. text portions of the file. Polymode enables that. Be warned, polymode is a bit of a work in progress. I find that it sometimes fails to do proper syntax highlighting for R code chunks. That doesn't bother me too much. YMMV.
  • dired: Dired is a "directory editor" used to navigate your directories/folders, operate on files, etc. This is another essential mode. It is built into Emacs. It is well worth spending the time learning to use it. Several packages extending its functionality are available.
  • projectile: Projectile is a project management tool for Emacs. You'll want to get this, especially if you use RStudio's project management features. (I never have, so can't compare features.) Projectile is not part of Emacs' standard installation (core bundle). You'll have to install it.
  • yasnippet: If you use RStudio's code snippet capabilities, you will appreciate this package. It provides Emacs with similar templating capabilities. In fact, the syntax for defining snippets in RStudio and Emacs is essentially the same (both derive from TextMate). It is not part of core GNU Emacs, though it is available from the GNU ELPA package archive through the Emacs package manager, and some distributions may bundle it. Either way, you'll have to enable it and set it up with your own library of snippets. You can see my snippets here.
  • ispell: Ispell is a spell checker. Very handy. I use it in conjunction with the external program "hunspell" because the spell-check backend that is most commonly used with Emacs (aspell) does not work on MS Windows. The Flyspell package is another useful adjunct to ispell.
  • occur: Occur is a regular-expression-based "multiple search" mode. It gives a clickable menu of all regular expression matches that 'occur' in the current file. It is a very powerful method for within-file navigation. It is built into Emacs. Several packages extending its functionality are available.
  • calc: Calc is a feature-rich RPN calculator. It is built into Emacs.
  • package: This is the Emacs package manager, useful for installing and updating packages to extend and modify Emacs' functionality.
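
Here is a sketch of switching on two of the packages above from init.el. Assumptions: yasnippet is installed; the snippet folder `~/emacs-snippets` is hypothetical; hunspell and an en_US dictionary are installed and on your PATH. The "fun" snippet is only an illustration of yasnippet's TextMate-style syntax.

```elisp
;; yasnippet: load it, point it at your own snippet folder, turn it on everywhere.
(require 'yasnippet)
(add-to-list 'yas-snippet-dirs "~/emacs-snippets")   ; hypothetical folder
(yas-global-mode 1)

;; A snippet can also be defined directly in Lisp.  Typing "fun" then TAB
;; in an R buffer expands to a function skeleton: ${1:name} and ${2:args}
;; are tab stops with default text, and $0 is where the cursor ends up.
(yas-define-snippets 'ess-r-mode
                     '(("fun" "${1:name} <- function(${2:args}) {\n  $0\n}" "R function")))

;; Spell checking through ispell, using the external hunspell program.
(setq ispell-program-name "hunspell")
(setq ispell-dictionary "en_US")

;; flyspell checks as you type: all text in prose buffers,
;; only comments and strings in programming buffers.
(add-hook 'text-mode-hook #'flyspell-mode)
(add-hook 'prog-mode-hook #'flyspell-prog-mode)
```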

Minor Modes

I've divided minor modes into "Essential" and "Nice to Have". Those listed as essential are minor modes that I use every time I use Emacs. They are central to my experience in the editor. Nice-to-have minor modes are those that I find handy from time to time.

Essential

  • cua-mode: emulates Common User Access style editing, giving you something closer to the familiar keystrokes for copy/cut/paste/undo et cetera. I don't use cua-mode myself because the standard Emacs keystrokes are firmly ingrained in my fingers. It is built into Emacs, but not enabled by default. If you are giving Emacs a trial run as a new user, you'll probably have a better experience if you enable cua-mode by adding the line (cua-mode 1) to your Emacs configuration file (which will be called either .emacs or init.el, depending).
  • hl-line-mode: highlight the current line. I mostly use this in dired mode where I keep it on by default, but it's also useful when looking at code with others, whether in person or sharing a screen on Zoom or whatnot.
  • company-mode: Company is an in-buffer auto-completion mode for Emacs. It contrasts with and complements minibuffer completion modes like selectrum or vertico, which serve a different purpose. There are a number of other in-buffer completion modes (e.g., corfu) and I can't claim to have tried them all.
  • which-key: This minor mode provides a continuation menu for the multi-key sequences that are used to invoke many commands in Emacs. Nobody remembers any but their own most frequently used key sequences. Which-key helps you navigate the rest. It is essential.
  • electric-pair-mode: When you type the left-hand member of a delimiter pair (e.g., (, [, {), this mode automatically inserts the matching right-hand delimiter. If you select a block of text and type the left-hand delimiter, the block will be bracketed with the appropriate left- and right-hand pair. It also works with double and single quotes. This mode is built into Emacs, but not globally enabled by default. For example, it is on for R source buffers, but not for R shell buffers. So, you may want to adjust that.
  • ace-window: is a package that improves your ability to navigate from one window to another within an Emacs session. You may also want to consider the avy package, which combines between-window and within-file (buffer) navigation; I tend to use occur for within-file navigation.
  • selectrum: A lightweight mode to modernize Emacs' stock minibuffer completion. If you go with selectrum, you'll also want to get prescient (see below). Selectrum is not part of the standard distribution. You'll have to install it. NB, there are lots of other completion modes available (find a list at the selectrum link). I haven't tried most of them. Selectrum is a good one to start with, though. (Be aware that selectrum's own documentation now points users to vertico as its successor.) Once you're settled in to Emacs, you can experiment with others.
  • prescient: Used in conjunction with selectrum, prescient provides intelligent filtering and sorting of minibuffer completions. The actual package to install is selectrum-prescient, which will install prescient itself as a dependency.
  • marginalia: Adds annotations to minibuffer completion prompts. This will be very useful for new Emacs users, and is pretty nice even for older hands. Works in conjunction with selectrum.
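
A sketch of how the essential minor modes above might be switched on in init.el. Assumptions: ESS is installed (for the R console hook); company, which-key (built in only from Emacs 30), ace-window, selectrum, selectrum-prescient, and marginalia are already installed; the M-o binding for ace-window is the one suggested in its README, not a default.

```elisp
;; Built-in modes.
(add-hook 'dired-mode-hook #'hl-line-mode)   ; highlight the current line in dired
;; Turn on delimiter pairing in R console (inferior ESS) buffers too.
(add-hook 'inferior-ess-r-mode-hook #'electric-pair-local-mode)

;; Installed packages.
(global-company-mode 1)                     ; in-buffer completion in all buffers
(which-key-mode 1)                          ; pop-up menu of key continuations
(global-set-key (kbd "M-o") #'ace-window)   ; jump between windows
(selectrum-mode 1)                          ; minibuffer completion
(selectrum-prescient-mode 1)                ; prescient filtering/sorting in selectrum
(prescient-persist-mode 1)                  ; remember usage across sessions
(marginalia-mode 1)                         ; annotations in minibuffer completions
```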

Nice to Have

  • subword-mode: Tune cursor movement to stop at word components of "camelCase" names. It can be toggled on and off by buffer, or globally.
  • superword-mode: Tune cursor movement to treat "snake_case" names as single words. It can be toggled on and off by buffer, or globally.
  • multiple-cursors: Multiple-cursor editing for Emacs.
  • string-inflection: convert among different object name styles: camelCase, snake_case, kebab-case, etc.
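
A sketch for the nice-to-have modes. Assumptions: ESS is installed; the multiple-cursors and string-inflection packages are installed; C-> and C-< are the bindings suggested in the multiple-cursors README; C-c i is a hypothetical choice of key.

```elisp
;; Stop at the parts of camelCase names, in R source buffers only.
(add-hook 'ess-r-mode-hook #'subword-mode)

;; multiple-cursors: add a cursor at the next/previous match of the selection.
(global-set-key (kbd "C->") #'mc/mark-next-like-this)
(global-set-key (kbd "C-<") #'mc/mark-previous-like-this)

;; string-inflection: cycle the name at point through naming styles.
(require 'string-inflection)
(global-set-key (kbd "C-c i") #'string-inflection-all-cycle)
```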

Some Essential Keystrokes

You will quickly realize that Emacs is built around a keystroke-driven interface to its functions. This is in contrast to the mouse- and menu-driven approach of much other software. The bottom line is that a keyboard-driven approach to invoking commands is more efficient than a pointy-clicky interface. This is similar to the argument, familiar to many R users, for using a script-based approach to statistical analysis versus a menu-driven approach.

The challenge is that there are a tremendous number of functions that you might want to access, and the more packages you add to extend Emacs, the more functions there will be. My own modest Emacs configuration offers more than 7000 functions. To be sure, I only use a small fraction of those at all, and even fewer on a regular basis. Not all of those functions are available by way of keyboard shortcuts, and that's fine because Emacs offers a way to access these commands by typing the full name, again without leaving the keyboard.

Many commands are accessible by way of keyboard shortcuts out of the box. One of the great advantages of Emacs is that the user can customize, in any way they please, which commands are invoked by which keys. Regardless, standard Emacs has so many commands bound to keyboard shortcuts that there are not enough single key shortcuts to go around. Many Emacs shortcuts are 2, or 3, or even 4 keystrokes long; see 'open file,' for example.

Another issue is that Emacs' standard set of keystrokes for invoking common text editing functions (e.g., cut, copy, paste, save file, etc) are different than what you will be used to. This may be one of the biggest hurdles you'll confront in adapting to Emacs. But there are ways to fix that: see cua-mode above.

Emacs' idiosyncrasies around keystrokes extend even to terminology. A case in point is that the 'alt' key is standardly called the 'meta' key in Emacs documentation. Why isn't important, but you should be aware of it. I will include the more conventional key notation, like 'ctl-w' and 'alt-x' for 'control w' and 'alt x', respectively, but also note the standard Emacs shorthand notation for each key: 'C-w' (control w) and 'M-x' (meta x).

  • cut: ctl-w (C-w)
  • copy: alt-w (M-w)
  • paste: ctl-y (C-y)
  • open a file: ctl-x ctl-f (C-x C-f), then type in the file name. You can, instead, hit TAB to see all files in the current directory/folder, or to see completions for a partially typed file name. The same command/keystroke will create a new file if you give it a file name that does not yet exist.
  • save current file: ctl-x ctl-s (C-x C-s)
  • save all modified files: ctl-x s (C-x s) prompts you, one file at a time, to save each buffer with unsaved changes.
  • close current file/buffer: ctl-x k (C-x k) closes ('kills') the current buffer, prompting you to save it first if there are unsaved changes.
  • go to list of all open files/buffers: ctl-x ctl-b (C-x C-b) opens a list of all available buffers in the current window. To jump to a specific buffer you can click on it in the list, or arrow down to the line you want and hit return.
  • switch to a different file/buffer: ctl-x b (C-x b)
  • close Emacs: ctl-x ctl-c (C-x C-c) closes Emacs, prompting you to first save any files with unsaved changes.

Other useful keystrokes

  • alt-<space> (M-<space>): delete multiple spaces around cursor, leaving a single space.
  • ctl-h k (C-h k): followed by another keystroke will provide "help" on that following keystroke; it will say what command it is 'bound' to.
  • alt-x (M-x): followed by the name of a command, will invoke that command. Auto-completion works here.
  • ctl-x d (C-x d): start the directory editor
  • ctl-x 2 (C-x 2): split the current window in two, one above the other. Different windows can be used to view different parts of the same file/buffer, or different files altogether. C-x 3 works the same way but places the two windows side by side. If more than one window is open, C-x 1 closes all windows but the active one, while C-x 0 closes the active window, leaving all others. By default, C-x o advances to the 'next' window, but you really want to install the ace-window package and use that to navigate windows.

Customizing Keystrokes

In general, it is possible to map any keystroke to any available function within Emacs. The examples I've given above are, to the best of my memory, standard out-of-the-box mappings for GNU Emacs.
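
A sketch of both kinds of key binding, global and mode-specific. The particular keys are hypothetical choices; by Emacs convention, C-c followed by a plain letter is reserved for users, so such bindings shouldn't clash with Emacs itself or with well-behaved packages. After evaluating them, C-h k followed by the key confirms what it is bound to.

```elisp
;; Global binding: C-c s opens a shell from any buffer.
(global-set-key (kbd "C-c s") #'shell)

;; Swap the default buffer list on C-x C-b for the built-in ibuffer.
(global-set-key (kbd "C-x C-b") #'ibuffer)

;; Mode-specific binding: only active in dired buffers, set once dired loads.
(with-eval-after-load 'dired
  (define-key dired-mode-map (kbd "C-c h") #'hl-line-mode))
```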

Useful commands

Here are a handful of useful commands, most of which are not, by default, bound to any keystroke (occur is the exception: it is also bound to M-s o). Invoke them by typing alt-x (M-x), then the name of the command, followed by the return/enter key (notated as <return>).

  • occur (M-x occur <return>): Find all matches to a regex in the current file/buffer and provide a menu for jumping between them.
  • shell (M-x shell <return>): Open a command line shell within emacs. It's possible to configure which shell is used.
  • R (M-x R <return>): Start an R session within Emacs, assuming ESS mode is installed and properly configured. Note the command name is an upper case 'R'.
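
Both the shell and R commands can be pointed at specific programs. A sketch, assuming Git for Windows (for its bash) and R are installed; both paths are hypothetical and the R version in particular will differ on your machine.

```elisp
;; Program that M-x shell runs (here, the bash bundled with Git for Windows).
(setq explicit-shell-file-name "C:/Program Files/Git/bin/bash.exe")

;; R executable that ESS starts for M-x R.
(setq inferior-ess-r-program "C:/Program Files/R/R-4.3.2/bin/x64/Rterm.exe")
```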

Unicode

Emacs does a pretty good job of handling Unicode out of the box. That doesn't matter much for programming, but it's very handy if you are writing prose, as you would when composing an Rmarkdown document. To insert Unicode characters, start with the keystroke C-x 8. This is a prefix for further keys that insert specific Unicode characters; with which-key enabled, a menu of those options pops up, and without it you can type C-x 8 C-h to list them. Here are some examples:

  • C-x 8 * E gives € (Euro)
  • C-x 8 L gives £ (Pound)
  • C-x 8 o gives ° (Degree)
  • C-x 8 C gives © (Copyright)
  • C-x 8 S gives § (Section)
  • C-x 8 P gives ¶ (Paragraph)
  • C-x 8 ' e gives é. Choose a different base character to get its accented version, upper or lower case.
  • C-x 8 ` e gives è. Extends to other base characters.
  • C-x 8 " e gives ë. Extends to other base characters.
  • C-x 8 ^ e gives ê. Extends to other base characters.

You can also hit C-x 8 <return> and then type the name of a Unicode character, and Emacs will insert it. Auto-completion works here, which is pretty handy. If you have selectrum and prescient installed, then Emacs will do fuzzy matching on the names of characters. For example, if you type C-x 8 <return> bullet, Emacs will show you a pick-list of all Unicode character names that include the string 'bullet'.

Of course, in order to display a particular unicode character, it must be included in the font you're using in Emacs. You can configure Emacs to use the font of your choice, from among those installed on your system.
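
If you insert one character often, you can give it a key of its own, and the default font is set with a single line. A sketch: the C-c u binding is a hypothetical choice, and Consolas (which ships with Windows) stands in for whatever installed font you prefer.

```elisp
;; Insert a character by its Unicode name, as C-x 8 <return> does interactively.
(global-set-key (kbd "C-c u")
                (lambda ()
                  (interactive)
                  (insert (char-from-name "BULLET"))))

;; Default font for all frames; :height is in tenths of a point (110 = 11pt).
(set-face-attribute 'default nil :family "Consolas" :height 110)
```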

Emacs Configuration

You can find my Emacs configuration file, init.el, here. It includes examples of how I've set up most of the packages mentioned here, BUT PROCEED WITH CAUTION. One way to make use of it would be to copy/paste parts of it that would seem to be useful to you. You should assume the Goulet distribution of GNU Emacs, and be aware that not all of the packages referenced in the configuration file are installed by default. Becoming familiar with the 'package' package referenced above should probably be your first move.

There is a decent YouTube video on the basics of Emacs configuration available here. It gets off to a bit of a slow start, but it's worth watching.

Further Reading
