mikeschinkel/StringLiteralsAreAPITA.md

## StringLiteralsAreAPITA.md

      
Managing String Literals is a PITA, Not Just in GoLang but in (All?) Other Languages

When working in Go, Bash, PHP, or any other language for that matter, I find that one of the most tedious and error-prone tasks is the management of literal strings.
AFAIK no programming language has addressed this head on, ever. If I am right about that, it would be great if Go was the first.
Literal String Management

Management of literal strings occurs especially with URLs and local paths, but also with many other values that end up as literals used in source code.  And there is not one good way to handle them, only several different ways all with major cons, and so developers are all over the map in how they handle string literals.
One approach is for a developer to assign a string literal to a named constant, and then use that named constant wherever the value is needed, e.g.:
const MyPath1 = "path1"
func func1() {
   print(MyPath1)  
}

The Benefits of String Constants

The upside of using a constant, as any professional developer should already know, is that it is defined in one place and can be changed in one place, assuming that embedded assumptions about how the constant is used are not changed.
The Reason Not to Use String Constants

A downside of using constants instead of literals is that the developer using the constant often cannot see the literal value in context, making it hard to reason about the code. This is especially true for testing, where developers want to look at test code and not have to go searching for references to be able to understand the tests.
Another downside is that the developer must go to the effort of defining said constant, including deciding which package to place it in and selecting a good name for the constant.
The Sad Truth of Developer Behavior

The upshot is I find that many (most?) developers simply hardcode and duplicate literals wherever they are needed in their code, which makes a codebase fragile and hard to refactor. When refactoring, the best we can do is hope that a search-and-replace finds every copy of a literal defined elsewhere, and a textual match does not necessarily mean that the match is the right one.
Representing Many Related Values as String Constants

Further, I find that one value can be composed of other values, especially with URLs and paths: a domain or root path, and then the path segment. To represent these as constants requires all values to be constant and/or mixing constants and vars with a lot of anonymous function and fmt.Sprintf() boilerplate:
const SiteDomain = "www.example.com"
const RootURL = "https://" + SiteDomain
const AdminProfileURL = RootURL + "/admin"
var ProfileURL = func(username string) string {
   return fmt.Sprintf("%s/%s", RootURL, username)
}

PROPOSAL: Consider Literal String Types and Literal Template Types

What would be helpful is a way to declare a named constant as a literal string type with a prefixing capital L, and to declare a named literal template type with a prefixing capital T:
const SiteDomain = L"www.example.com"
const SiteURLTemplate = T"https://%s"
const RootURL = SiteURLTemplate(SiteDomain)
const ProfileURLTemplate = T"%s/%s"(RootURL)
const AdminProfileURL = ProfileURLTemplate("admin")

In the above, the templates use fmt.Sprintf() positional placeholders, but an alternate approach to literal named template variables, one that could co-exist with the one above, could be to leverage Go templates, in part or in whole:
const SiteDomain = L"www.example.com"
const SiteURLTemplate = T"https://{{.Domain}}"
const RootURL = SiteURLTemplate(SiteDomain)
const ProfileURLTemplate = T"%s/{{.Username}}"(RootURL)
const AdminProfileURL = ProfileURLTemplate("admin")

Note how in this envisioned approach a template can have its values substituted positionally within a const declaration?  This is an important aspect of the envisioned approach as it makes for a consistent way to define these literals.
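For comparison, here is a rough sketch of how close today's Go can get to the two template flavors above. The type name Tmpl and its Format and Execute methods are hypothetical, invented only for this illustration. Note the gap the proposal is aiming at: because Go cannot call functions in a const declaration, the derived values have to be declared with var.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Tmpl is a hypothetical stand-in for the proposed T"..." literal template type.
type Tmpl string

// Format fills fmt.Sprintf-style positional placeholders.
func (t Tmpl) Format(args ...any) string {
	return fmt.Sprintf(string(t), args...)
}

// Execute fills Go template placeholders such as {{.Domain}} from data.
func (t Tmpl) Execute(data any) (string, error) {
	parsed, err := template.New("tmpl").Parse(string(t))
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := parsed.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

const SiteDomain = "www.example.com"
const SiteURLTemplate Tmpl = "https://%s"
const ProfileURLTemplate Tmpl = "%s/%s"

// These must be vars: Go does not allow function calls in const declarations.
var RootURL = SiteURLTemplate.Format(SiteDomain)
var AdminProfileURL = ProfileURLTemplate.Format(RootURL, "admin")

func main() {
	fmt.Println(RootURL)         // https://www.example.com
	fmt.Println(AdminProfileURL) // https://www.example.com/admin

	// The named-placeholder flavor, resolved at runtime via text/template.
	byName, err := Tmpl("https://{{.Domain}}").Execute(map[string]string{"Domain": SiteDomain})
	if err != nil {
		panic(err)
	}
	fmt.Println(byName) // https://www.example.com
}
```

Everything here is resolved at runtime, and nothing ties a duplicated literal back to its declaration, which is the part only compiler support could provide.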
Use of Literal Types in the Large

However, beyond defining these constants together, the larger value of the literal types would be the ability to define them and then use them across a large codebase and still maintain the association between a declared constant literal and a used literal.
Consider the following, which should be flagged as a compiler error per the concept as presented:
const SiteDomain = L"www.example.com"
func showDomain() {
   print(L"www.example.com") // Not declared in showDomain()
}

However, the following would not be flagged as an error, since the same literal string is declared with var in local scope under the same name that is declared as a const in package scope:
const SiteDomain = L"www.example.com"
func showDomain() {
   var SiteDomain = L"www.example.com"
   print(L"www.example.com") 
}

But the following would also work, of course:
const SiteDomain = L"www.example.com"
func showDomain() {
   var SiteDomain = L"www.example.com"
   print(SiteDomain) 
}

And the following here would work as well:
const SiteDomain = L"www.example.com"
func showDomain() {
   print(SiteDomain) 
}

But the following would not work because it redeclares SiteDomain to have a different literal string value compared to the value declared for the const. Note it would be the L prefix in the var declaration that would trigger this behavior:
const SiteDomain = L"www.example.com"
func showDomain() {
   var SiteDomain = L"profile.example.com"
   print(SiteDomain) 
}

Literal Type Use Across Packages

If the SiteDomain constant were declared in a package called other then it would need to look like the following:
func showDomain() {
   var SiteDomain = other.L"www.example.com"
   print(L"www.example.com") 
}

Or simply the following, which of course already works in GoLang:
func showDomain() {
   print(other.SiteDomain) 
}

Literal Types in All Its Permutations

And here is a larger example showing the various permutations that would work and not work (ignoring the straightforward case of referencing external literal strings from other packages, to keep the number of permutations reasonable):
const MyPath1 = L"path1"
const MyPath2 = L"path2"
const MyPath3 = T"/{{.Seg1}}/{{.Seg2}}"
const MyPath4 = MyPath3(MyPath1,MyPath2)
literal "literal/path"

func func1() {
   var MyPath1 = L"path1"    // works as expected
   print(MyPath1)            // works as expected
   print(L"path1")           // works as expected
   print(MyPath4)            // works as expected
   print(L"literal/path")    // works as expected
}
func func2() {
   var MyPath1 = L"path2"    // Error: mismatch w/MyPath2
}
func func3() {
   print(L"path1")           // Error: MyPath1 not declared
}
func func4() {
   const MyPath1 = L"path1"  // works as expected
   print(MyPath1)            // prints 'path1'
   print(L"path1")           // prints 'path1'
   print(MyPath2)            // prints 'path2'
   print(L"path3")           // Error: no literal declared as "path3"
}
func func5() {
   print(MyPath3("x","y"))   // prints '/x/y'
   print(T"/{{.Seg1}}/{{.Seg2}}"("x","y"))      
                             // Error: MyPath3 not declared
}
func func6() {
   var MyPath3 = T"/{{.Seg1}}/{{.Seg2}}"
   print(T"/{{.Seg1}}/{{.Seg2}}"("x","y"))      
                             // prints '/x/y'
}
var s = struct {
   Seg1 string
   Seg2 string
}{
   Seg1: "a",
   Seg2: "b",
}
func func7() {
   print(MyPath3(s))         // prints '/a/b'
}
func func8() {
   var MyPath3 = T"/{{.Seg1}}/{{.Seg2}}"
   print(MyPath3(s))         // prints '/a/b'
   print(T"/{{.Seg1}}/{{.Seg2}}"(s))                                          
                             // prints '/a/b'
}
func func9() {
   print(T"/{{.Seg1}}/{{.Seg2}}"(s))
                             // Error: MyPath3 not declared
}

Why the Duplication?

This is likely to be the first thing developers think of, especially when they have been immersed in the "Don't Repeat Yourself" (D.R.Y.) mantra embraced by many developers. However, there is bad repetition and then there is good repetition.
In the case of Literal Types, the syntactical constructs are designed to allow literal strings to be duplicated so that the string values can be seen in the context in which they are used, while still being associated with names that the compiler checks to ensure the same named values are always in sync.
If an errant search-and-replace accidentally changes a literal value in one place but not all places, the compiler will catch it when using literal types. Not so with plain literal strings.
Benefits of Literal Types

The benefits of adding literal string and literal template types to Go would be to:

Allow developers to use string literals in context while still getting the benefit of using lexically-scoped constants,
Enable the creation of linting tools that could enforce approaches to using string literals,
Establish a single recognized best-practice approach for handling string literals,
Make declaration of literal string and literal template types consistent via the use of const to declare them,
Allow the Go compiler to recognize the literal values as literals and thus be able to optimize their usage where possible,
Empower IDEs and good editors to treat these literal strings as lexically recognized units rather than simple strings of characters whose equivalence may or may not be coincidental.
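To make the linting point a bit more concrete, below is a minimal sketch of a checker that could be written today with the standard go/parser and go/ast packages. Since current Go has no L"..." syntax, it assumes a marker function named L() instead; that name, the undeclaredLiterals helper, and the sample source are all assumptions made for this illustration. It only looks at a single file and only at package-level constants, so it ignores the scoping rules described earlier.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// undeclaredLiterals parses one Go source file and returns the values passed
// to a marker function L("...") that match no package-level string constant.
// The marker name L is an assumption standing in for the proposed L prefix.
func undeclaredLiterals(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}

	// Pass 1: collect the values of package-level string constants.
	declared := map[string]bool{}
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.CONST {
			continue
		}
		for _, spec := range gen.Specs {
			vs, ok := spec.(*ast.ValueSpec)
			if !ok {
				continue
			}
			for _, v := range vs.Values {
				if s, ok := stringValue(v); ok {
					declared[s] = true
				}
			}
		}
	}

	// Pass 2: find L("...") calls whose argument was never declared.
	var missing []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) != 1 {
			return true
		}
		if id, ok := call.Fun.(*ast.Ident); !ok || id.Name != "L" {
			return true
		}
		if s, ok := stringValue(call.Args[0]); ok && !declared[s] {
			missing = append(missing, s)
		}
		return true
	})
	return missing, nil
}

// stringValue unquotes expr if it is a plain string literal.
func stringValue(expr ast.Expr) (string, bool) {
	lit, ok := expr.(*ast.BasicLit)
	if !ok || lit.Kind != token.STRING {
		return "", false
	}
	s, err := strconv.Unquote(lit.Value)
	return s, err == nil
}

// sample is a made-up source file used only to exercise the checker.
const sample = `package demo

const SiteDomain = "www.example.com"

func L(s string) string { return s }

func ok() string  { return L("www.example.com") }
func bad() string { return L("profile.example.com") }
`

func main() {
	missing, err := undeclaredLiterals(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing) // [profile.example.com]
}
```

A real tool would need type information and cross-package lookups, which is part of why having the compiler recognize these literals directly would be preferable.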

What gave rise to this idea?

I once worked for a consulting team that views itself as a leader in JavaScript-related testing. They had developed an approach to testing they felt should be a best practice, and as such working for them required following this best practice, lest their clients see people they hired not using their own advice.
So I was asked to write test code for a Go program, and at first I wrote it using the out-of-the-box testing approach built into Go, but the CTO of this company was very unhappy with me and wanted me to use his spin on behavioral testing, which meant adopting Ginkgo. As I did not view myself as a thought leader in testing, I said sure and went along with his direction.
However, soon he was unhappy with my Ginkgo-based testing because I was trying to factor out repetition and thus was writing code that abstracted the test process. But he wanted to be able to look at a test and immediately "see" what was being tested, which was a preference that was very hard to argue with, ignoring the repetition concern.
Still, I find it hard to manage lots of duplication because I am a divide-and-conquer developer, so if I am littering my code with the same information across many test cases (information that will often be duplicated) it is pretty much impossible for me to divide up the code to conquer it. Developing that way is very stressful for me because I can no longer finish a section of code and then assume it is handled; if I make changes to any of the URLs, paths, or other string literals anywhere else in the code, I might break that which I'd already completed, at least in my mind. It puts me in what feels like a never-ending cycle of making changes to new code and then having to go back and fix all the old code.
So I racked my brain and had an epiphany: the duplication itself was not the problem per se; keeping the duplication in sync was the actual problem. So I wrote a package that included a function called Literal() where you "declare" string literals, and then I added a function called L() where you can use those literals:
import (
   . "github.com/mikeschinkel/go-literals"
)
func init() {
   Literal( "site-domain", "www.example.com" )
}
func showDomain() {
   print(L("www.example.com")) 
}

In addition, at the end of a Test I would call CheckLiteralUsage() which would tell me:

If any literals had been used by L() but never declared with Literal(), and
If any literals had been declared with Literal() but never used by L().

Using that check allowed me to put closure on the tests I had completed but still know if I modified one of the string usages on purpose but accidentally did not modify them all. It was not perfect, but it was much better than coding duplicate string literals in many tests without it.
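For readers who want a feel for the mechanics, here is a minimal sketch of how such a declare/use/check registry could work. It is a reconstruction of the idea for illustration only, not the actual go-literals source: the Registry type, NewRegistry(), and the return values of CheckLiteralUsage() are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Registry tracks which literal values were declared and which were used.
// Hypothetical illustration; not the real go-literals implementation.
type Registry struct {
	declared map[string]string // literal value -> declared name
	used     map[string]bool   // literal values seen by L()
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{declared: map[string]string{}, used: map[string]bool{}}
}

// Literal "declares" a string literal under a name.
func (r *Registry) Literal(name, value string) {
	r.declared[value] = name
}

// L records that a literal was used and returns it unchanged, so the value
// stays visible at the call site.
func (r *Registry) L(value string) string {
	r.used[value] = true
	return value
}

// CheckLiteralUsage reports literals that were used but never declared, and
// literals that were declared but never used. Both slices are sorted.
func (r *Registry) CheckLiteralUsage() (undeclared, unused []string) {
	for v := range r.used {
		if _, ok := r.declared[v]; !ok {
			undeclared = append(undeclared, v)
		}
	}
	for v := range r.declared {
		if !r.used[v] {
			unused = append(unused, v)
		}
	}
	sort.Strings(undeclared)
	sort.Strings(unused)
	return undeclared, unused
}

func main() {
	r := NewRegistry()
	r.Literal("site-domain", "www.example.com")
	r.Literal("admin-path", "/admin")

	_ = r.L("www.example.com")     // declared and used
	_ = r.L("profile.example.com") // used, but drifted from the declaration

	undeclared, unused := r.CheckLiteralUsage()
	fmt.Println("undeclared:", undeclared) // undeclared: [profile.example.com]
	fmt.Println("unused:", unused)         // unused: [/admin]
}
```

In a test suite, a check like this would run once at the end so that a literal edited in one test but not the others shows up as an undeclared use.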
One thing my go-literals package did not require was to associate the L() usage of the literal with the name defined when calling Literal() because to do so would add not-insignificant runtime overhead and would also require way too much boilerplate thus making the code obtrusive and reducing the value of the technique.
Of course, if the Go compiler were to check the associations at compile time instead of at runtime, there should be no runtime overhead. As a matter of fact, doing it at compile time might allow for optimizations that Go cannot perform with mere strings today.