@Varriount
Last active March 31, 2024 01:12
Nim-By-Example: Strings

String Literals

String literals in Nim are written by surrounding text either with a single double-quote character on each side or with three double-quote characters on each side. Single-quoted string literals may only span one line, while triple-quoted string literals may span multiple lines.

echo "Hello World"
echo """
  Hello
  world!
"""

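A triple-quoted literal keeps the line breaks and indentation between its delimiters. The sketch below assumes the rule from the Nim manual that a line break immediately after the opening triple quote is not part of the string; the name `poem` is only illustrative.

```nim
# The line break right after the opening """ is dropped, but the
# indentation and the final line break before the closing """ are kept.
let poem = """
  Hello
  world!
"""
assert(poem == "  Hello\n  world!\n")
assert(poem.len == 17)
```
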
Single-quoted string literals (but not triple-quoted string literals!) may contain escape sequences, introduced by a backslash, which specify special characters. The full list of available escape sequences is in the Nim manual's section on string literals.

echo "The code example said \"Hello world!\" in bold lettering"

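As a quick check that each escape sequence stands for a single character, the sketch below compares the lengths of a few literals. It uses only common escapes (`\t`, `\\`, `\"` and `\xHH`), and the variable names are illustrative.

```nim
let tab = "a\tb"         # \t is a horizontal tab
let backslash = "C:\\"   # \\ is one literal backslash
let quoted = "\"hi\""    # \" is one double-quote character
let hex = "\x41"         # \xHH is the character with hex code HH, here 'A'

assert(tab.len == 3)
assert(backslash.len == 3)
assert(quoted.len == 4)
assert(hex == "A")
```
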
To create a raw single-quoted string literal (a string literal that doesn't interpret escape characters), precede the first quote with an 'r'. To specify a double-quote character in a raw single-quoted string literal, put two double-quote characters side-by-side.

echo r"The code example said ""Hello world! \nI am here!"" in bold lettering"
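
The sketch below shows both raw-literal rules side by side; the Windows-style path is just a convenient example of text full of backslashes.

```nim
# In a raw literal, a backslash is an ordinary character...
let path = r"C:\temp\new"
assert(path.len == 11)
assert(path[2] == '\\')

# ...and a doubled double-quote stands for one double-quote character.
let said = r"He said ""hi"""
assert(said == "He said \"hi\"")
```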

Note: Triple-quoted string literals don't have a raw form, since they don't interpret escape sequences anyway.
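
A minimal check of this note: the `\t` below is not turned into a tab, so it stays two separate characters.

```nim
# Backslashes inside a triple-quoted literal are kept as-is.
let s = """a\tb"""
assert(s.len == 4)
assert(s == r"a\tb")
```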

Besides 'r', Nim also allows a procedure, template, or macro name to directly precede the first quote of a single- or triple-quoted string literal (a so-called generalized raw string literal). This is another way of calling the named procedure, template, or macro with the string literal as its argument. Note: the string passed to the procedure, template, or macro is always in raw form.

proc removeCommas(s: string): string =
  result = newStringOfCap(s.len)
  for character in s:
    if character == ',':
      continue
    else:
      result.add(character)

let s = removeCommas"I, have,, too, ,man,y comm,,as"
assert(s == "I have too many commas")
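
To see that a generalized raw string literal really receives the raw text, the sketch below calls a hypothetical helper, `countBackslashes`, both ways.

```nim
proc countBackslashes(s: string): int =
  ## Hypothetical helper: counts the backslash characters in `s`.
  for c in s:
    if c == '\\':
      inc result

# Called with a generalized raw string literal, the argument is raw,
# so "\n" and "\t" arrive as two characters each.
let rawCount = countBackslashes"a\nb\tc"
# Called normally, the escape sequences are interpreted first.
let cookedCount = countBackslashes("a\nb\tc")

assert(rawCount == 2)
assert(cookedCount == 0)
```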

It's important to remember that although Nim's string literals are encoded in UTF-8, a string itself is just a sequence of bytes: data added to a string can be in any encoding, and nothing checks that it is valid UTF-8.
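
A small sketch of what this means in practice, using `runeLen` and `validateUtf8` from the standard `std/unicode` module: `len` counts bytes, and a string will happily hold a byte that isn't valid UTF-8. It assumes the source file itself is saved as UTF-8.

```nim
import std/unicode

# len counts bytes, not characters: "é" takes two bytes in UTF-8.
let word = "café"
assert(word.len == 5)
assert(word.runeLen == 4)

# Nothing stops arbitrary bytes from being added to a string.
var raw = "abc"
raw.add('\xff')
assert(raw.len == 4)
assert(validateUtf8(word) == -1)  # -1 means the data is valid UTF-8
assert(validateUtf8(raw) == 3)    # index of the first invalid byte
```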

String Data Type

The string data type in Nim is represented as a reference to a mutable, dynamically resized array of 8-bit characters. Aside from creating a string via a string literal, a string instance may also be created using the newString and newStringOfCap procedures.

The newString procedure creates a new string of a given length, with every character initialized to 0 (the null character). The newStringOfCap procedure creates a new string with a length of 0, but with a given amount of memory pre-allocated for string data to be added later.

var x = newString(3) # Creates a new string containing 3 null characters.
assert(x.len == 3)

var y = newStringOfCap(10) # Creates a new string with no characters, but with
                           # 10 characters worth of memory allocated for later
                           # use.
# y[0] = 'a' # Error! Though the string has an initial capacity of 10 characters,
#            # it has a length of 0, so index 0 is out of bounds.
y.add('a') # Uses the reserved memory.
assert(y == "a")
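
Because a string from newString already has a length, its characters can be assigned by index, whereas a string from newStringOfCap has to be grown with add. The sketch below also uses the standard setLen procedure to change a string's length in place; the variable names are illustrative.

```nim
# newString(3) already has 3 valid (null) slots, so they can be assigned.
var buf = newString(3)
assert(buf == "\0\0\0")
buf[0] = 'a'
buf[1] = 'b'
buf[2] = 'c'
assert(buf == "abc")

# setLen shrinks (or grows) the length in place.
buf.setLen(2)
assert(buf == "ab")

# A string from newStringOfCap starts empty and is filled with add.
var grown = newStringOfCap(10)
for c in "xyz":
  grown.add(c)
assert(grown == "xyz")
```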

Common String Operations

Reading String Data

To get the length of a string (the number of 8-bit characters it contains), use the len procedure:

assert(len("Hello world!") == 12)

To retrieve the n'th 8-bit character of a string, use the subscript operator with the integer n-1 (string indexing starts at 0):

var a = "abc"
echo a[0] # 'a'
echo a[1] # 'b'
echo a[2] # 'c'
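
Strings are mutable, so the subscript operator also works on the left-hand side of an assignment. The sketch below also relies on the standard `pairs` iterator, which a two-variable `for` loop over a string uses to yield each index together with its character.

```nim
var s = "cat"
s[0] = 'b'          # replace the first character in place
assert(s == "bat")

# Each iteration yields an index and the character at that index.
var positions = ""
for i, c in s:
  positions.add($i & ":" & c & " ")
assert(positions == "0:b 1:a 2:t ")
```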

You can also prefix the index with '^' to start indexing from the end of the string (reverse indexing starts at 1):

var a = "abc"
echo a[^3] # Is transformed to `a[len(a)-3]` and returns 'a'
echo a[^2] # Is transformed to `a[len(a)-2]` and returns 'b'
echo a[^1] # Is transformed to `a[len(a)-1]` and returns 'c'
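
The `^` prefix also works inside slices, which makes it handy for dropping characters from the end. A minimal sketch:

```nim
let word = "Nim"
assert(word[^1] == 'm')
assert(word[^1] == word[word.len - 1])

# Everything except the last character.
assert(word[0 .. ^2] == "Ni")
```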

To get the highest valid index in a string, use the high procedure. To get the lowest valid index, use the low procedure.

let s = "Hello!"
assert(high(s) == 5)
assert(low(s) == 0)
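
One place high and low come in handy is walking a string by index. The sketch below (variable names are illustrative) reverses a string with `countdown`, and also shows that an empty string has no valid index, so `high` returns -1.

```nim
let s = "Hello!"
var backwards = ""
# Walk from the highest valid index down to the lowest.
for i in countdown(high(s), low(s)):
  backwards.add(s[i])
assert(backwards == "!olleH")

let empty = ""
assert(high(empty) == -1)
```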

The double-equals operator (==) tests whether the contents of two strings are the same, returning true if they are equal and false otherwise. The cmp procedure performs a lexicographic comparison: it returns a negative number when the first string is less than the second string, 0 when both strings are equal, and a positive number when the first string is greater than the second string. The exact nonzero values are not guaranteed to be -1 and 1, so compare the result against 0.

let
  a = "a"
  b = "b"
  c = "c"
  d = "a"

assert(a == d)
assert(cmp(a, b) < 0)
assert(cmp(a, d) == 0)
assert(cmp(c, b) > 0)
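
The ordinary comparison operators are also lexicographic, comparing strings byte by byte. The sketch below also uses `cmpIgnoreCase` from the standard `strutils` module for an ASCII case-insensitive comparison.

```nim
import std/strutils

assert("apple" < "banana")
assert("abc" < "abd")
assert("ab" < "abc")        # a prefix sorts before the longer string
assert("Zebra" < "apple")   # uppercase ASCII letters have smaller codes

# cmpIgnoreCase ignores differences in ASCII letter case.
assert(cmpIgnoreCase("Hello", "hELLO") == 0)
assert(cmpIgnoreCase("apple", "Banana") < 0)
```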


A Note on String Assignment

Though a string is represented as a reference, it has copy-on-assignment behavior, like object types. This means that whenever a string is assigned from one variable or location to another, a new string is created at the destination and filled with the old string's data.

var
  a = "I'm a string"
  b = a

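# Compare the addresses of the first characters; the addresses of the
# variables themselves would differ even without a copy.
assert(addr(a[0]) != addr(b[0]))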
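
The copy is also easy to observe without addresses: after the assignment, changing one variable leaves the other alone. A minimal sketch:

```nim
var
  a = "I'm a string"
  b = a          # b receives its own copy of a's characters

b.add("!")                     # changing b...
assert(a == "I'm a string")    # ...leaves a untouched
assert(b == "I'm a string!")
```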

This behavior is a consequence of Nim's strings being both mutable and represented as references to dynamically allocated and resized character arrays. When a string needs to grow beyond its allocated space, it must be reallocated, possibly to a new area of memory that can hold its data (plus some extra space for future additions), and every reference to the string must then be updated to point to the new location. With copy-on-assignment, a string usually has only one reference that needs to be updated. Supporting more than one reference would require either tracking and updating multiple areas of memory or adding an extra level of indirection, approaches that are error-prone or inefficient.

If your program seems to be slow, or is taking up large amounts of memory, check to make sure there isn't excessive string copying somewhere. To suggest to the compiler that a string shouldn't be copied for a particular assignment operation, you can either mark the string as shallow with the shallow procedure, or use the shallowCopy procedure to assign the string to a variable without copying:

proc main =
  var a = "Hello"

  # Note that the addresses of the strings are compared by the address of their
  # first characters.
  # If addr(a) were to be used, then the address of the 'a' variable on the 
  # stack would be retrieved.

  # Assigning 'a' to 'b' causes a copy: 'a' and 'b' now hold separate strings.
  var b = a
  assert(addr(a[0]) != addr(b[0]))

  # Using `shallowCopy` causes the string *reference* to be assigned.
  # Both 'a' and 'c' now contain references pointing to the same string.
  var c: string
  shallowCopy(c, a)
  assert(addr(a[0]) == addr(c[0]))

  # Using 'shallow' on the variable 'a' causes all following string assignments
  # to be shallow.
  shallow(a)
  var d = a
  assert(addr(a[0]) == addr(d[0]))

  # Note that adding to the string might still change the underlying
  # reference and cause string copying!
  for i in 0..3:
    c.add("abcdefghijklmnopqrstuvwxyz")
  assert(addr(a[0]) != addr(c[0]))

main()

There are some caveats with using the shallow and shallowCopy procedures. First, the compiler is free to ignore them under certain circumstances, notably when working with global variables. Second, shallow assignments do not remove the need for a string to reallocate its memory during length-changing operations.

If you need strings that don't copy on assignment but still support length-changing operations, use string references (ref string):

proc newStringRef(s = ""): ref string =
  new(result)
  result[] = s

proc main =
  var 
    a = newStringRef("Hello") # Non-empty, so that a[0] below is a valid index.
    b = a

  assert(addr(a[0]) == addr(b[0]))
  for i in 0..3:
    # Unless the {.experimental.} pragma is enabled, string references must be
    # manually dereferenced before making most kinds of string procedure calls.
    a[].add("abcdefghijklmnopqrstuvwxyz")
  assert(addr(a[0]) == addr(b[0]))

  
main()
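
For contrast with copy-on-assignment, here is a minimal sketch of the sharing that a `ref string` gives you: the reference is copied, so a change made through one variable is visible through the other. It repeats the newStringRef helper so that it runs on its own.

```nim
proc newStringRef(s = ""): ref string =
  # Same helper as in the example above: allocate a string on the heap.
  new(result)
  result[] = s

let
  a = newStringRef("Hi")
  b = a                      # copies the reference, not the string

a[].add(" there")            # mutate through one reference...
assert(b[] == "Hi there")    # ...and the change is visible through the other
```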