@dsyme
Last active July 30, 2020 19:51
Topics
1. tokenization
2. parsing
3. checking and elaboration (i.e. producing TypedTree)
   - $"..." as a plain string
   - $"..." as FormattableString
   - $"..." as PrintfFormat
4. FSharp.Core support (printf.fs)
Code examples:
printf "abc %d def" 3
$"abc {1+1} def"
@$"abc {1+1} def"
$@"abc {1+1} def"
"""abc {1+1} def"""
## Tokenization
token, get " -> string (args with NormalString)
token, get $" -> string (args with InterpolatedString)
token, get {, and args.Stack~~InterpolatedString -> token (args.PushABrace())
token, get }, and args.Stack~~InterpolatedString and Braces=1 -> string/vstring/tqstring (args with NormalString)
token, get }, and args.Stack~~InterpolatedString and Braces=N -> token (args.PopABrace())
token, get @" -> verbatimString
token, get @$", $@" -> verbatimString
token, get """ -> tripleQuoteString
token, get $""" -> tripleQuoteString
string, get "{", and args.InterpolatedString --> produce INTERPOLATED_STRING_FRAGMENT, then go to token state (args.Push "string")
vstring, get "{", and args.InterpolatedString --> produce INTERPOLATED_STRING_FRAGMENT, then go to token state (args.Push "vstring")
New tokens:
INTERP_STRING_BEGIN_END --> $"cvkjvrkjhrve" $"""vrkjhrvhrewhervj"""
INTERP_STRING_BEGIN_PART --> $"vrwhwver { $"""vrwjrlvjwe {
INTERP_STRING_PART --> } vrewhvrehkjervh {
INTERP_STRING_END --> } vrwkwjervh"
To dump the token stream for a test file:
fsc --tokenize test.fs
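
The state transitions above can be approximated in a small standalone scanner. This is a sketch under simplifying assumptions, not the compiler's lexer: it handles only single-line `$"..."` literals, ignores escapes (including `{{` and `}}`), and represents each fill as one hypothetical `CODE` token holding raw text, where the real lexer returns to the ordinary `token` state. The four `INTERP_STRING_*` names come from the list above; `CODE`, `scanFill`, `scanString` and `tokenize` are invented for illustration.

```fsharp
open System.Text

type Token =
    | INTERP_STRING_BEGIN_END of string   // $"text"
    | INTERP_STRING_BEGIN_PART of string  // $"text {
    | INTERP_STRING_PART of string        // } text {
    | INTERP_STRING_END of string         // } text"
    | CODE of string                      // hypothetical: raw text of one fill

let tokenize (src: string) : Token list =
    // "token" state: consume a fill, counting braces so that only the brace
    // that brings the depth back to zero ends the fill.
    let rec scanFill (i: int) (depth: int) (acc: StringBuilder) : string * int =
        if i >= src.Length then failwith "unterminated fill"
        match src.[i] with
        | '{' -> scanFill (i + 1) (depth + 1) (acc.Append('{'))
        | '}' when depth = 1 -> acc.ToString(), i + 1
        | '}' -> scanFill (i + 1) (depth - 1) (acc.Append('}'))
        | c -> scanFill (i + 1) depth (acc.Append(c))

    // "string" state: consume literal text up to the next '{' or closing quote.
    let rec scanString (i: int) (first: bool) (acc: StringBuilder) (tokens: Token list) : Token list =
        if i >= src.Length then failwith "unterminated string"
        match src.[i] with
        | '"' ->
            let text = acc.ToString()
            let tok = if first then INTERP_STRING_BEGIN_END text else INTERP_STRING_END text
            List.rev (tok :: tokens)
        | '{' ->
            let text = acc.ToString()
            let tok = if first then INTERP_STRING_BEGIN_PART text else INTERP_STRING_PART text
            let code, next = scanFill (i + 1) 1 (StringBuilder())
            scanString next false (StringBuilder()) (CODE code :: tok :: tokens)
        | c -> scanString (i + 1) first (acc.Append(c)) tokens

    if not (src.StartsWith("$\"")) then invalidArg "src" "expected a $\"...\" literal"
    scanString 2 true (StringBuilder()) []

let simple = tokenize "$\"abc {1+1} def\""
let nested = tokenize "$\"a {f { X = 1 }} b\""
```

Here `simple` holds a begin-part token, one fill, and an end token, while `nested` shows that the inner `}` does not end the fill because the brace count is still 2 at that point.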
## Parsing
```
atomicExprAfterType: // Q: WHY THIS ONE
  | interpolatedString

interpolatedStringFill:
  | declExpr
  | declExpr COLON ident %prec interpolation_fill

interpolatedStringParts:
  | INTERP_STRING_END
  | INTERP_STRING_PART interpolatedStringFill interpolatedStringParts

interpolatedString:
  | INTERP_STRING_BEGIN_PART interpolatedStringFill interpolatedStringParts
  | INTERP_STRING_BEGIN_END
```
Giving these SyntaxTree extensions:
```fsharp
type SynExpr =
    ...
    | InterpolatedString of
        contents: SynInterpolatedStringPart list *
        range: range

type SynInterpolatedStringPart =
    | String of string * range
    | FillExpr of SynExpr * Ident option
```
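
To make the grammar rules concrete, here is a hand-written recursive-descent sketch that follows them rule for rule. It is an illustration only: the real parser is generated from the grammar. The `Part` type is a simplified stand-in for `SynInterpolatedStringPart` (no ranges, expressions kept as text), and `EXPR`, `COLON` and `IDENT` are hypothetical tokens standing in for whatever a `declExpr` and `ident` would produce.

```fsharp
type Token =
    | INTERP_STRING_BEGIN_END of string
    | INTERP_STRING_BEGIN_PART of string
    | INTERP_STRING_PART of string
    | INTERP_STRING_END of string
    | EXPR of string      // hypothetical: an already-recognised declExpr
    | COLON
    | IDENT of string

[<RequireQualifiedAccess>]
type Part =
    | String of string
    | FillExpr of string * string option   // expression text, optional format ident

// interpolatedStringFill: declExpr | declExpr COLON ident
let parseFill (tokens: Token list) : Part * Token list =
    match tokens with
    | EXPR e :: COLON :: IDENT fmt :: rest -> Part.FillExpr (e, Some fmt), rest
    | EXPR e :: rest -> Part.FillExpr (e, None), rest
    | _ -> failwith "expected a fill expression"

// interpolatedStringParts:
//   INTERP_STRING_END | INTERP_STRING_PART interpolatedStringFill interpolatedStringParts
let rec parseParts (tokens: Token list) : Part list * Token list =
    match tokens with
    | INTERP_STRING_END s :: rest -> [ Part.String s ], rest
    | INTERP_STRING_PART s :: rest ->
        let fill, rest = parseFill rest
        let parts, rest = parseParts rest
        Part.String s :: fill :: parts, rest
    | _ -> failwith "expected INTERP_STRING_PART or INTERP_STRING_END"

// interpolatedString:
//   INTERP_STRING_BEGIN_PART interpolatedStringFill interpolatedStringParts | INTERP_STRING_BEGIN_END
let parseInterpolatedString (tokens: Token list) : Part list * Token list =
    match tokens with
    | INTERP_STRING_BEGIN_END s :: rest -> [ Part.String s ], rest
    | INTERP_STRING_BEGIN_PART s :: rest ->
        let fill, rest = parseFill rest
        let parts, rest = parseParts rest
        Part.String s :: fill :: parts, rest
    | _ -> failwith "expected an interpolated string"

// Tokens as they might look for $"abc {x} {y:N}"
let parsed, remaining =
    parseInterpolatedString
        [ INTERP_STRING_BEGIN_PART "abc "; EXPR "x"
          INTERP_STRING_PART " "; EXPR "y"; COLON; IDENT "N"
          INTERP_STRING_END "" ]
```

The result alternates string parts and fill parts, which is the shape the `contents` list of `SynExpr.InterpolatedString` carries into checking.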
## Checking and Elaboration
1. $"..." : overallTy --> Check if overallTy unifies with 'string' etc. as per spec
2. Then put together the fragments into one format string using `%P()` or `%alignmentP(format)` as holes as per spec
3. Do normal format string checking of the overall format string, with %P(..) allowed
--> Extract type information about the format string
4. In the case where $"..." is being used as a string or a PrintfFormat:
   - Make a call to PrintfFormat<...>(format).
   - Fill in Captures and CaptureTypes in the PrintfFormat object.
   - If $"..." is being used as a string, call `sprintf`, passing the PrintfFormat as its argument.
   e.g.
   $"abc{x,5}" --> Printf.sprintf (new PrintfFormat("abc%5P()", [| box x |], null))
   $"abc{1+1}def" --> Printf.sprintf (new PrintfFormat("abc%P()def", [| box (1+1) |], null))
   $"abc%d{1+1}def" --> Printf.sprintf (new PrintfFormat("abc%d%P()def", [| box (1+1) |], null))
In the case where $"..." is being used as a .NET FormattableString, different codegen is needed and
more restrictions apply (e.g. no % patterns are allowed), as per spec. Codegen becomes a
call to FormattableStringFactory.Create, e.g.
($"abc {x} {y:N}" : FormattableString)
--> FormattableStringFactory.Create("abc {0} {1:N}", [| box x; box y |])
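
The two lowerings described in this section can be sketched side by side. This is a simplified model, not the type checker's code: `Part`, `toPrintfFormat` and `toCompositeFormat` are hypothetical names, fills carry already-boxed values instead of expressions, and text is copied through unchanged (so a `%d` in the text stays a format specifier, and literal braces are not escaped). Only `FormattableStringFactory.Create` is a real .NET API here.

```fsharp
open System.Text
open System.Runtime.CompilerServices

type Part =
    | Text of string
    | Fill of value: obj * alignment: int option * format: string option

// Printf route: each fill becomes a %P() hole, written %<alignment>P(<format>),
// and its value is appended to the captures array.
let toPrintfFormat (parts: Part list) : string * obj[] =
    let sb = StringBuilder()
    let captures = ResizeArray<obj>()
    for part in parts do
        match part with
        | Text s -> sb.Append(s) |> ignore
        | Fill (v, align, fmt) ->
            let a = match align with Some n -> string n | None -> ""
            let f = defaultArg fmt ""
            sb.Append("%" + a + "P(" + f + ")") |> ignore
            captures.Add(v)
    sb.ToString(), captures.ToArray()

// FormattableString route: each fill becomes a numbered {index,alignment:format} hole.
let toCompositeFormat (parts: Part list) : string * obj[] =
    let sb = StringBuilder()
    let args = ResizeArray<obj>()
    for part in parts do
        match part with
        | Text s -> sb.Append(s) |> ignore
        | Fill (v, align, fmt) ->
            let a = match align with Some n -> "," + string n | None -> ""
            let f = match fmt with Some f -> ":" + f | None -> ""
            sb.Append("{" + string args.Count + a + f + "}") |> ignore
            args.Add(v)
    sb.ToString(), args.ToArray()

// Parts as they might look for $"abc {x} {y:N}" with x = 1 and y = 2.5
let parts = [ Text "abc "; Fill (box 1, None, None); Text " "; Fill (box 2.5, None, Some "N") ]
let printfFormat, captures = toPrintfFormat parts
let compositeFormat, args = toCompositeFormat parts
let formattable = FormattableStringFactory.Create(compositeFormat, args)
```

The same part list yields `"abc %P() %P(N)"` for the printf route and `"abc {0} {1:N}"` for the `FormattableString` route, with identical value arrays.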
## printf at runtime
- Given a format string object containing
  .FormatString (.Value) --> the string, e.g. "abc%d%P()def"
  .Captures --> null for a normal old-style printf, non-null for a capturing interpolation
  .CaptureTypes --> null for a normal old-style printf, non-null if there are %A patterns
- The aim of `sprintf` is EITHER
  1. to produce a string (for interpolated printf formatting), OR
  2. to produce a curried function of the right type (for old-style printf formatting)
- Two-phase approach
  1. crack the format string into an array of "steps"
  1b. if producing a curried function, generate the curried function now `(fun arg1 -> (fun arg2 -> .... <phase2>))`
  2. iterate over the steps, writing the output fragments
There is a two-level, type-directed cache table:
    type Cache<'Printer, 'Residue, '...> =
        static let mutable recent = ...
        static let mutable dict = ConcurrentDictionary....
The Cache holds the results of phase 1
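
One way to read "two-level, type-directed" is sketched below, as an assumption, not a copy of printf.fs. A generic class gets separate static fields per instantiation, so the type arguments select the table; within a table, a most-recent entry is checked before the dictionary. The single `'Printer` parameter, the `Get` member and `countSpecifiers` are hypothetical, and the unsynchronised `recent` field is a simplification.

```fsharp
open System
open System.Collections.Concurrent

/// Each instantiation (Cache<int>, Cache<string>, ...) has its own statics,
/// which is what makes the lookup type-directed.
type Cache<'Printer>() =
    // Level 1: the most recently used format string and its phase 1 result.
    static let mutable recent : (string * 'Printer) option = None
    // Level 2: every format string seen so far for this 'Printer type.
    static let dict = ConcurrentDictionary<string, 'Printer>()

    static member Get(format: string, parse: string -> 'Printer) : 'Printer =
        match recent with
        | Some (f, p) when f = format -> p
        | _ ->
            let p = dict.GetOrAdd(format, Func<string, 'Printer>(parse))
            recent <- Some (format, p)
            p

// Hypothetical stand-in for phase 1: counts '%' characters instead of parsing.
let countSpecifiers (format: string) : int =
    format |> Seq.filter (fun c -> c = '%') |> Seq.length

let n1 = Cache<int>.Get("abc%d%P()def", countSpecifiers)   // parsed, then cached
let n2 = Cache<int>.Get("abc%d%P()def", countSpecifiers)   // served from `recent`
```

Repeated calls with the same format string hit the first level, so a `sprintf` inside a loop would skip both parsing and the dictionary lookup.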
This is how printf has always worked since 2012 or so. The main addition here is that "phase 2" can fill in the arguments
and relevant %A types from Captures/CaptureTypes rather than the arguments of the curried function chain.
The basic runtime action of sprintf is to:
1. look up the cache, populating it with the phase 1 results if needed
2. run phase 2 and return the string.
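
The interpolation path of this two-phase scheme can be sketched in a few lines. This is a toy model under stated assumptions, not FSharp.Core's implementation: it recognises only literal text and `%P()` holes with an optional alignment, formats captures with the `string` function, and omits typed specifiers such as `%d`, `CaptureTypes`, and the old-style curried-function path. `Step`, `crack`, `run` and `sprintfInterpolated` are hypothetical names.

```fsharp
open System
open System.Text
open System.Text.RegularExpressions
open System.Collections.Concurrent

type Step =
    | Literal of string
    | Hole of alignment: int   // 0 means no alignment was given

let holePattern = Regex(@"%(-?\d+)?P\(\)")

// Phase 1: crack the format string into an array of steps.
let crack (format: string) : Step[] =
    let steps = ResizeArray<Step>()
    let mutable pos = 0
    for m in holePattern.Matches(format) |> Seq.cast<Match> do
        if m.Index > pos then steps.Add(Literal (format.Substring(pos, m.Index - pos)))
        let alignment = if m.Groups.[1].Success then int (m.Groups.[1].Value) else 0
        steps.Add(Hole alignment)
        pos <- m.Index + m.Length
    if pos < format.Length then steps.Add(Literal (format.Substring(pos)))
    steps.ToArray()

// Phase 1 results are cached per format string.
let stepCache = ConcurrentDictionary<string, Step[]>()
let crackCached (format: string) : Step[] =
    stepCache.GetOrAdd(format, Func<string, Step[]>(crack))

// Phase 2: iterate over the steps, taking the value for each hole from the captures.
let run (steps: Step[]) (captures: obj[]) : string =
    let sb = StringBuilder()
    let mutable next = 0
    for step in steps do
        match step with
        | Literal s -> sb.Append(s) |> ignore
        | Hole alignment ->
            let text = string (captures.[next])
            next <- next + 1
            let padded =
                if alignment > 0 then text.PadLeft(alignment)
                elif alignment < 0 then text.PadRight(-alignment)
                else text
            sb.Append(padded) |> ignore
    sb.ToString()

let sprintfInterpolated (format: string) (captures: obj[]) : string =
    run (crackCached format) captures

let s1 = sprintfInterpolated "abc%P()def" [| box (1 + 1) |]
let s2 = sprintfInterpolated "abc%5P()" [| box 7 |]
```

Because the captures arrive with the format object, phase 2 can produce the string directly; in the old-style path the same steps would instead be fed by the arguments of the curried function chain.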
## Tooling
1. Extra complication for reporting locations of %d etc. in interpolated strings.
2. Extra complication for making sure we can take a correct continuation from tokenization.
$""" vwhvwerkhj vwekh wvekjh vwe { <--- take continuation at the end of each line
Test cases related to tokenization:
```
$""" vwhvwerkhj vwekh wvekjh vwe {
#if GOO
vwevw
}
#else
fwewe
}
#endif
vwekhwevvew"""
```
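
The continuation requirement can be illustrated with a line-by-line scanner that threads its state from one line to the next. This is a rough sketch under heavy assumptions: it tracks only `$"""` strings and their fills, ignores ordinary strings, comments and escapes, and does not model `#if` directives at all. `Context`, `LexState` and `scanLine` are hypothetical names, not compiler APIs.

```fsharp
type Context =
    | InTripleQuote          // inside the text of a $""" ... """ string
    | InFill of depth: int   // inside a { ... } fill, with the current brace depth

/// Innermost context first; the empty list means ordinary code.
type LexState = Context list

/// Scan one line starting from `state` and return the state at the end of the line,
/// i.e. the continuation an editor would store for the next line.
let scanLine (state: LexState) (line: string) : LexState =
    let startsAt (i: int) (s: string) =
        i + s.Length <= line.Length && line.Substring(i, s.Length) = s
    let rec go (i: int) (st: LexState) : LexState =
        if i >= line.Length then st
        else
            match st with
            | InTripleQuote :: outer ->
                if startsAt i "\"\"\"" then go (i + 3) outer
                elif line.[i] = '{' then go (i + 1) (InFill 1 :: st)
                else go (i + 1) st
            | InFill depth :: outer ->
                if startsAt i "$\"\"\"" then go (i + 4) (InTripleQuote :: st)
                elif line.[i] = '{' then go (i + 1) (InFill (depth + 1) :: outer)
                elif line.[i] = '}' then
                    go (i + 1) (if depth = 1 then outer else InFill (depth - 1) :: outer)
                else go (i + 1) st
            | [] ->
                if startsAt i "$\"\"\"" then go (i + 4) [ InTripleQuote ]
                else go (i + 1) st
    go 0 state

let lines = [ "let s = $\"\"\" abc {"; "    1 + 1"; "} def\"\"\"" ]

// The state before the first line, followed by the state at the end of each line.
let states = List.scan scanLine [] lines
```

Each element of `states` is what would need to be saved at a line boundary so that tokenization can resume mid-string or mid-fill without rescanning the file.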