@david-fong
Last active July 21, 2021 05:26
lang-design-thoughts

Terse, Punctuation-Based Scope & Mutability Modifiers for Java-like Languages

🚧 This document is a work in progress. The spec, examples, and wording are not complete or finalized. Let me know if you find mistakes or anything confusing. Discussion is welcome. Feel free to skip the background section if you really want to cut to the chase.

Background

OOP Languages I've Used

Here's where I'm coming from.

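| Language | Time Used | Last Used | Used For |
|----------|-----------|-----------|-----------------|
| JS/TS | ~3 yr | ongoing | personal & work |
| Java | ~2.5 yr | 2021 | work |
| C++ | ~1 yr | 2021 | personal |
| C# | ~0.67 yr | 2021 | work |
| Python | ~0.3 yr | 2019? | personal |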

I have read a little about Rust and Kotlin, but have not used them yet.

I have not studied programming language design, and I don't know anything about other OOP languages that I haven't listed above. So perhaps there are better existing solutions to the problems I've listed that I haven't heard of. Some languages have partial solutions to the problems I want to solve, which I will mention later.

My High-Level Goals and Key Values

Here are the things that I find satisfaction in when reading and writing code:

  • Controlling scoping and mutability
  • Brevity
  • Visual Aesthetic

I am not trying to come up with an "optimal" solution with the freedom to radically depart from the current status quo. I am trying to come up with a terser way to express scoping and mutability constraints in the languages that I have worked with, specifically Java, TypeScript, and C#. I want to minimize non-syntactic changes.

I am not thinking of scripting languages such as Python or JavaScript. I don't feel comfortable enough with C++ to believe that the idea I will propose will be compatible with it.

Motivating Pain Points & Existing Solutions

Here are things I personally feel bothered by concerning scoping, reassignment, and mutability control in the OOP languages I have used.

Some people have linked me to cool Raku things like its is built and is rw traits, which I still need time to digest and internalize.

Choice and Placement of Modifiers

  • Chained keywords such as "public final" take up a fair bit of space. For classes with many fields, there end up being big blocks of keywords.

    • The C++ label style takes up much less space, but it creates another problem: I can't tell what a field's scoping is just by looking at the line it's on. I need to scan upward to find out.
  • The meaning of the word "protected" is not intuitive. Private and public are not as bad, but in a sense, I think they are all "symbolic" in that the word's English meaning does not directly indicate its meaning in the programming language. Thinking of the conventional keywords as symbols opens up discussion for choosing better symbols.

  • The modifiers don't align column-wise at their right edge. This is really minor, but I feel really unsatisfied whenever I see it happen.

Grammar for Scope-Variable Variable Scoping and Reassignment

  • In Java and C++, I sometimes need to make trivial getters and setters just to enforce different read/write permissions at different scopes.
    • I guess in Java-land and maybe C++, you'd be writing all those accessors anyway, but I don't like the heavy method syntax.
    • Kotlin and C#'s properties and TypeScript accessors have cleaner syntax. They allow for specifying different access levels for the getter and setter.
    • I like the above property / accessor solutions. They are simple and powerful, but I still think they could be terser.
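
For reference, here is a minimal sketch of the accessor boilerplate this pain point is about: a field that anyone can read but only the class can reassign. The `Counter` class and its members are hypothetical names chosen for illustration.

```typescript
// Conventional TypeScript spelling of "publicly readable, privately
// reassignable". All names here are hypothetical.
class Counter {
    private _count = 0;        // reassignable only inside the class

    get count(): number {      // read access for every scope
        return this._count;
    }

    increment(): void {
        this._count += 1;
    }
}

const counter = new Counter();
counter.increment();
console.log(counter.count);    // 1
// counter.count = 5;          // compile error: getter-only property
```

Three declarations (backing field, getter, naming convention) are needed to state one read/write policy, which is the verbosity being questioned here.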

Lack of Grammar for Scope-Variable Object Mutability

  • C++ has the const keyword for getting an immutable view or copy of an object (and C# has readonly structs and "in" parameters), but I'm not aware of a similar mechanism in Java-like languages. And sometimes I want to make a field's value publicly readable, but only allow mutations privately, not for any possible performance gains, but for the sake of interface design. Again, the current pervasive OOP languages require some acrobatics to achieve this.
    • In C++, you could use a const reference, but in languages that don't allow marking methods as non-mutating, you would need to define an interface for an immutable view of an object and make up-casting getters. This includes standard containers and custom containers and classes.
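
The "immutable view interface plus up-casting getter" workaround might look roughly like this in TypeScript. This is only a sketch under assumed names (`ReadonlyPoint`, `Point`, `Sprite` are all hypothetical), and the protection is compile-time only, since a cast can bypass it.

```typescript
// Read-only view of a point: no setters, no mutating methods.
interface ReadonlyPoint {
    readonly x: number;
    readonly y: number;
}

// The full, mutable class also satisfies the read-only view.
class Point implements ReadonlyPoint {
    constructor(public x: number, public y: number) {}

    moveBy(dx: number, dy: number): void {
        this.x += dx;
        this.y += dy;
    }
}

class Sprite {
    private readonly _position = new Point(0, 0);

    // Up-casting getter: callers only ever see the read-only interface.
    get position(): ReadonlyPoint {
        return this._position;
    }

    nudge(): void {
        this._position.moveBy(1, 0);   // mutation stays private to Sprite
    }
}

const sprite = new Sprite();
sprite.nudge();
console.log(sprite.position.x);        // 1
// sprite.position.x = 10;             // compile error: read-only property
// sprite.position.moveBy(1, 1);       // compile error: not on ReadonlyPoint
```

Every class that needs this treatment requires its own hand-written view interface, which is the acrobatics referred to above.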

I'm making an exception here to my rule of minimizing non-syntactic changes because controlling mutability is one of my key goals.

Proposal

Punctuation Key

| Punctuation | General Meaning | How to Remember |
|-------------|-----------------|-----------------|
| _ | Hidden | Some naming conventions use underscores to indicate privacy or internalness |
| . | Readonly | Periods are widely used for object member access |
| = | Read-Write | The equals sign is widely used to assign values to variables and members |

I also like that there's somewhat of a vertical gradient where the shortest punctuation mark has the most strict meaning, and the tallest has the most permissive meaning.

There are alternative choices of characters I thought of, including using "r" and "w" for readonly and read-write respectively. I think using letters is equally valid and may help with readability. I like the visual appearance of the punctuation, and that it has zero ties to non-programming languages, which may be more friendly for non-English speakers. On the other hand, on QWERTY keyboards, the period is far away from the underscore and equal sign, which I don't like.

One thing I don't like is that "_" and "." are kind of hard to tell apart. A solution to this could be to use the "$" sign for read-only, which harks back to Bash scripting.

Inheritance of Hidden-ness for Composition

Any contained thing is hidden at the same scopes for which its container is hidden. This applies to objects, variables, functions, and namespaces. This should either be enforced by tooling such as the compiler, interpreter, or IDE, or by requiring inherited hidden marks to be omitted.

// Good
_.. namespace NamespaceA {

}

Contextual Meanings

The punctuation marks are arranged in tuples, where each entry corresponds to a scope. The outermost scope is the first entry, and the innermost scope is the last entry.

Non-Meanings

Here are the kinds of modifiers I have chosen to exclude from this proposal because their meanings cannot be associated with, or varied across, multiple scopes. It is better to leave the choice of keyword for these to each language.

  • Function purity/const-ness (restrictions on the ability for a function to change the state of anything declared at the top level).

  • Abstractness.

  • Overriding / implementing.

  • Whether a method is allowed to mutate the associated instance.

Scoping of Top Level Declarations

When applied to top-level variables, top-level functions, namespaces, and classes, they are a 3-tuple of {package-exported, package-internal, file/"module"-internal}.

For functions and namespaces, "=" is not allowed.

For classes, "=" is used to control whether the class is extensible at each level.

Language-Specific Notes
  • TypeScript project references will be considered as packages in this spec.

Reassignment of Local Variables and Function Parameters

When applied to local variables, they are a 1-tuple of {local/closure}. By "closure scope", I mean when a variable is captured by a nested function (like JavaScript functions or C++ lambdas).

  • (Or maybe it would be useful to separate local and closure modifiers? I'm not sure right now. I haven't thought much about interactions with closures yet. Consider my thoughts here half-half-baked).
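
As a point of comparison, this is how existing TypeScript handles the same question. The correspondence to "." and "=" in the comments is my reading of the punctuation key, not something the spec states for locals, and the variable names are hypothetical.

```typescript
const limit = 3;   // roughly ".": the binding cannot be reassigned
let total = 0;     // roughly "=": the binding can be reassigned

// The closure captures `total` and may reassign it. TypeScript has no way to
// make a binding writable locally but read-only inside a capturing closure,
// which is the distinction a separate closure entry would have to express.
const addOne = (): void => {
    total += 1;
};

for (let i = 0; i < limit; i++) {
    addOne();
}
console.log(total);   // 3
```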

Scoping and Reassignment of Object Fields

When applied to declarations of object fields, they are a 3-tuple of {public, protected, private} for extensible classes, or a 2-tuple of {public, private} for sealed classes. (I haven't decided on whether to include an entry for internal, which would make a 4-tuple.) Read-write specifies whether the variable can be reassigned (mutability is a separate matter).

For fields of sealed classes
  • _. private final
  • _= private
  • .. public final
  • .= private with public getter
  • == public
For fields of extensible classes
  • __. private final
  • __= private
  • _.. protected final
  • _.= private with protected getter
  • _== protected
  • ... public final
  • ..= private with public getter
  • .== protected with public getter
  • === public
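
To make the lists above easier to check, here is a rough sketch of a decoder that expands a field tuple into one permission per scope. The function name `decodeFieldTuple` and the scope labels are made up for illustration; the only assumptions taken from the text are the three punctuation meanings and the outermost-first ordering.

```typescript
type Access = "hidden" | "readonly" | "read-write";

// Meanings taken from the punctuation key.
const ACCESS_BY_MARK: Record<string, Access> = {
    "_": "hidden",
    ".": "readonly",
    "=": "read-write",
};

// Expands a 2-tuple (sealed class) or 3-tuple (extensible class) into a
// per-scope permission map, outermost scope first.
function decodeFieldTuple(tuple: string): Record<string, Access> {
    let scopes: string[];
    if (tuple.length === 2) {
        scopes = ["public", "private"];
    } else if (tuple.length === 3) {
        scopes = ["public", "protected", "private"];
    } else {
        throw new Error(`expected 2 or 3 marks, got "${tuple}"`);
    }

    const result: Record<string, Access> = {};
    scopes.forEach((scope, i) => {
        const access = ACCESS_BY_MARK[tuple.charAt(i)];
        if (access === undefined) {
            throw new Error(`unknown mark "${tuple.charAt(i)}" in "${tuple}"`);
        }
        result[scope] = access;
    });
    return result;
}

// "..=" is the "private with public getter" row for extensible classes:
// public and protected code may read, only private code may reassign.
console.log(decodeFieldTuple("..="));
```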

Scoping of Object Methods

When applied to object methods, the tuple has similar meanings to object fields, except "=" is not allowed.

Mutation of Object Instances

When applied to type specifiers for instance objects (not functions or primitives), it is an N-tuple where N is the same as the length of the tuple of whatever holds the object (variable, parameter, field, another object).

  • These can further restrict the scoping and mutability of an object's publicly accessible fields and methods.
  • The "_" character will probably not have many use cases here. I would recommend disallowing it.
  • If "." is used, that means that at the given scope, no fields of the object can be reassigned or mutated, and no mutating methods can be called.
Examples
// TODO
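
As a point of comparison only (this is not the proposed syntax), TypeScript's built-in Readonly<T> and ReadonlyArray<T> types give a compile-time approximation of what a "." entry asks for. The `Account` type and the variable names below are hypothetical.

```typescript
interface Account {
    owner: string;
    tags: string[];
}

const mutableAccount: Account = { owner: "ada", tags: ["admin"] };

// The same object seen through a read-only type.
const frozenView: Readonly<Account> = mutableAccount;

mutableAccount.owner = "grace";    // allowed through the mutable reference
// frozenView.owner = "linus";     // compile error: read-only property

// Readonly<T> is shallow: the nested array can still be mutated through the
// view, so it is weaker than the "no mutation at this scope" rule above.
frozenView.tags.push("staff");

// ReadonlyArray<T> removes the mutating methods from an array's type.
const tagsView: ReadonlyArray<string> = mutableAccount.tags;
// tagsView.push("x");             // compile error: no 'push' on ReadonlyArray

console.log(frozenView.owner, tagsView.length);   // grace 2
```

Unlike the tuple in this section, these types apply to one reference at a time rather than to a whole scope.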

Syntactic Sugar for Collections and Wrapper Classes

🚧 This section is likely to change or be removed.

Here are some random examples showing what it looks like:

  • Some basic usages:
    • ... f0 [..=] num
    • _.= f0 [...] str
    • ... f0 [.==] ..= obj
  • How nesting works:
    • ... f0 [.==] [...] ..= obj
      (I'm just going to stop writing the Java parallels. You get the picture. They are very long.)
    • ..= f0 [..=] [===] [.==] str
    • ... f0 [.==] [.==] [...] num
  • When the field is hidden from public or protected scopes, the corresponding leading entries of the array-access tuple are omitted:
    • _.. f0 [.=] num
    • __. f0 [=] .= obj
    • ... f0 [.==] [_..] [.=] [_.] [=] = obj

Defaults

It will probably be useful to code authors to default object mutability tuples to the tuple of whatever holds the object (variable, field, another object such as a collection or promise).

I won't impose defaults for other parts of this spec. I'd rather leave it up to the language. Languages that emphasize convenience make things public and writeable by default, and languages that emphasize safety make things private and readonly by default.

In the case that a language chooses to provide no defaults, a pragma to "#pub_mut_all_the_things" would probably be useful for scripting purposes.

Criticisms of this Design

  • The terseness could possibly hinder readability. There's a lot of meaning packed into just a few characters. It may be hard to unpack all that meaning when skimming code, or when tired.

    • My response: Yep. Maybe it's just something that takes time to get used to, and then afterward it would be easy to read? Perhaps IDEs could help with some kind of color coding. It may also help if the tuples are optionally allowed to have spaces between the punctuation marks.
  • With all the options that it exposes, it would be easy to get overwhelmed.

    • My response: Yep. It might help to just start off by making things super strict, and then opening things up as needed (as long as you think critically as you open things up).
  • It enables use cases / making guarantees that may not be really useful/meaningful/helpful. Ex. Enforcing that a variable holding an array cannot be reassigned, but allowing the array contents to be mutated.

    • My response: That's for sure. Perhaps that's something that's best covered by linting rules or best-practice guides? I want to come up with a list of poor-practice cases before I move forward with any decision making. Help here would be appreciated.
      • To be fair, the mechanism I'm proposing doesn't add any new poor-practice cases that couldn't be achieved in other common OOP languages. It just shortens the grammar for doing it... which could be a bad thing for a code author who doesn't know what they want... Hm...

Feel free to share your thoughts in the discussion section of this gist, or in this Reddit thread.

Undeveloped Thoughts

I haven't thought about how this syntax could possibly be used to improve method scoping and overriding permissions, but I intend to try. It certainly should not be usable on the array-entry-access operator or the iterator operator, since that should be left for the instance declaration variable to decide.

How could this be integrated into existing codebases? Would that even be possible? I'm guessing it would be similar to a Kotlin situation: Create a language that compiles to Java bytecode.

I need to settle on what to call the entries of the tuples. Ideas: "knob", "scome", "modifier", "permission". I think permission is good, but there are other "permissions" that this spec doesn't cover such as function purity.

Syntax-Tree-Based Editing

I'm going to propose some crazy stuff. Buckle up.

Warning

The changes I will later propose will:

  • Possibly make source code extremely difficult to read with existing tooling.
  • Include changes to source code viewing and editing, and version control software.
  • Not be super specced out. I'm just exploring an idea that I have.
  • Maybe have already been thought of by other people. I haven't looked into that yet.

Motivation

These problems are seemingly very separate, but I'm going to tie them all together with a radical solution. The changes will require tons of support from tooling, and so here I'll also emphasize the ways the current state of the art also relies heavily on tooling, so later when I propose big changes to the capabilities of tooling, it will make more sense.

Transient Syntax Errors

Editing character-by-character results in intermediate syntax/grammar errors. If you've been on r/ProgrammerHumor, you've probably seen several memes about this.

Opening and closing delimiters can explode the AST when temporarily imbalanced, e.g. strings, comments, and any enclosing braces. The problem can be mitigated by debouncing, but that results in delayed highlighting, syntax error reporting, etc.
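
Here is a toy illustration of the imbalance problem: a bracket matcher that reports the first delimiter without a partner. It is a deliberately simplified sketch (the name `firstUnbalanced` is hypothetical, and it ignores strings and comments), not how any real parser recovers from errors.

```typescript
const CLOSER_FOR: Record<string, string> = { "(": ")", "[": "]", "{": "}" };

// Returns the index of the first delimiter that has no partner, or -1 if
// every bracket is matched.
function firstUnbalanced(source: string): number {
    const stack: { closer: string; index: number }[] = [];
    for (let i = 0; i < source.length; i++) {
        const ch = source.charAt(i);
        const closer = CLOSER_FOR[ch];
        if (closer !== undefined) {
            stack.push({ closer, index: i });      // remember the opener
        } else if (ch === ")" || ch === "]" || ch === "}") {
            const top = stack.pop();
            if (top === undefined || top.closer !== ch) {
                return i;                          // stray or mismatched closer
            }
        }
    }
    const leftover = stack[0];                     // outermost unclosed opener
    return leftover === undefined ? -1 : leftover.index;
}

// Mid-edit: the closing brace has not been typed yet, so everything from the
// opening brace onward is structurally suspect.
console.log(firstUnbalanced("f(a) { g(b);"));      // 5
console.log(firstUnbalanced("f(a) { g(b); }"));    // -1
```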

Disagreements Over Formatting Style

People don't always agree on formatting styles.

  • Space vs Tab wars and disagreement over indentation width.
  • Disagreement over lint/formatter rules, or the conventions differing between programming languages.

When a repo's formatting style config changes, it can result in huge diffs in the VCS, and create a huge merge-conflict-resolution problem for everyone else.

We expect our toolchain to have the ability to reformat code according to configured rules. It's fairly standard to have this action hook into the save event of editors.

Proposal

Create an IDE experience that edits by syntax-tree tokens instead of by characters. Make it impossible to have transient syntax errors. Ex. Make it impossible to delete opening and closing delimiters on their own; instead, enable deleting the whole node.
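
A minimal sketch of what "delete the whole node" could mean, assuming a tiny made-up tree with three node kinds. Delimiters are produced by `render` from the node kind and are never stored as editable characters, so no edit can leave them imbalanced. All type and function names are hypothetical.

```typescript
type SyntaxNode =
    | { kind: "ident"; name: string }
    | { kind: "call"; callee: string; args: SyntaxNode[] }
    | { kind: "block"; statements: SyntaxNode[] };

// Delimiters exist only here, derived from the node kind.
function render(node: SyntaxNode): string {
    switch (node.kind) {
        case "ident":
            return node.name;
        case "call":
            return `${node.callee}(${node.args.map(render).join(", ")})`;
        case "block":
            return `{ ${node.statements.map(render).join("; ")} }`;
    }
}

function childrenOf(node: SyntaxNode): SyntaxNode[] {
    switch (node.kind) {
        case "ident":
            return [];
        case "call":
            return node.args;
        case "block":
            return node.statements;
    }
}

// The only delete operation the editor offers: remove a whole node from its
// parent. Returns true if the target was found and removed.
function deleteNode(root: SyntaxNode, target: SyntaxNode): boolean {
    const children = childrenOf(root);
    const index = children.indexOf(target);
    if (index >= 0) {
        children.splice(index, 1);
        return true;
    }
    return children.some(child => deleteNode(child, target));
}

const inner: SyntaxNode = {
    kind: "call",
    callee: "g",
    args: [{ kind: "ident", name: "b" }],
};
const program: SyntaxNode = {
    kind: "block",
    statements: [
        { kind: "call", callee: "f", args: [{ kind: "ident", name: "a" }] },
        inner,
    ],
};

console.log(render(program));   // { f(a); g(b) }
deleteNode(program, inner);
console.log(render(program));   // { f(a) }
```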

Separate formatting style from the source code. Make the IDE automatically visually arrange the tokens according to the user's formatting style config. Spaces in the source code can be collapsed into a single space where a space is necessary. This will result in basically unreadable source code, but for me, 80% of the time I'm looking at code, I'm looking at it through an IDE. The rest of the time is through VCS-related tooling. Note that in the web world, separation of content and presentation is a foundational principle; why not here?
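
The idea can be sketched as one stored structure rendered under two different user configs. This is only an illustration under assumed names (`StyleConfig`, `Stmt`, and `format` are hypothetical), with indentation as the single style knob.

```typescript
// Per-user presentation settings; nothing here is stored with the code.
interface StyleConfig {
    indentUnit: string;
}

// What would actually be stored: structure only, no layout whitespace.
type Stmt = string | { header: string; body: Stmt[] };

function format(stmts: Stmt[], style: StyleConfig, depth: number = 0): string {
    // Build the indentation for this depth from the user's chosen unit.
    const pad = new Array(depth + 1).join(style.indentUnit);
    return stmts
        .map(stmt =>
            typeof stmt === "string"
                ? `${pad}${stmt};`
                : `${pad}${stmt.header} {\n${format(stmt.body, style, depth + 1)}\n${pad}}`
        )
        .join("\n");
}

const stored: Stmt[] = [{ header: "if (ready)", body: ["start()", "log(1)"] }];

// Two people view the same stored code with their own preferences.
const withTabs = format(stored, { indentUnit: "\t" });
const withSpaces = format(stored, { indentUnit: "  " });
console.log(withSpaces);
```

Because the stored form never changes when a style preference changes, a reformat would produce no diff in version control.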

Todo

  • Come up with some good default keyboard navigation bindings.
  • Realize all the other things I will need to do.