ekingery/configlet-analysis.md

Configlet Analysis for Exercism V3

A language track's config.json file provides a way to pass data between the track's git repo and the V3 website's ActiveRecord data model. configlet is a tool for validation and sanitization of the shared config data as it is transferred between the decentralized track repos and the centralized data models.
configlet is integrated (to varying degrees) into the CI / build process of the language track repos. It is also downloaded and used locally by track maintainers to help them understand, validate, and format the track config.json files. As part of Exercism V3, we will be evolving the config.json specification. This provides a good opportunity to analyze the open issues with configlet and evaluate the best path forward.
Based on the analysis below, I would like to consider replacing configlet with an API (either built into the v3-website or as a standalone service used by the v3-website). As needed, we could build and distribute a thin CLI client around this API and/or a v3-website interface for track maintainers to view and modify the canonical config data. I am almost certainly missing some key parts of the bigger picture here, as well as some of the inevitable complications of building and calling the proposed API. With that said, I am optimistic this analysis can be the start of a productive conversation around the future of configlet and, more broadly, the role and shape of track configuration and the related tooling.
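To make the "thin CLI client" idea a little more concrete, here is a minimal sketch in Go. Everything specific in it is an assumption for illustration: the response shape (`validationResult`), the function names, and the placeholder rule in the fake server are hypothetical, and no such endpoint exists today. The point is only that the client ships the raw config.json to the server and prints the verdict, so no validation rules live in the client.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// validationResult is a hypothetical response body; the real API's shape is undecided.
type validationResult struct {
	Valid  bool     `json:"valid"`
	Errors []string `json:"errors"`
}

// validateRemote posts the raw config.json bytes to the validation endpoint
// and decodes the verdict. The client itself contains no validation rules.
func validateRemote(client *http.Client, endpoint string, config []byte) (validationResult, error) {
	var result validationResult
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(config))
	if err != nil {
		return result, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return result, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&result)
	return result, err
}

// newFakeServer stands in for the website so the sketch runs offline.
// Its single rule (a "language" key must be present) is only a placeholder
// for the canonical validation that would live on the server.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var cfg map[string]interface{}
		res := validationResult{Valid: true}
		if err := json.NewDecoder(r.Body).Decode(&cfg); err != nil {
			res = validationResult{Valid: false, Errors: []string{"config is not a JSON object"}}
		} else if _, ok := cfg["language"]; !ok {
			res = validationResult{Valid: false, Errors: []string{"missing key: language"}}
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(res)
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	res, err := validateRemote(srv.Client(), srv.URL, []byte(`{"slug": "go"}`))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("valid:", res.Valid, "errors:", res.Errors)
}
```

With this split, adding or changing a validation rule would only require a server-side change; the client distributed to the track repos would stay the same.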
Current Issues With Configlet

About a year ago, I created a triage of open configlet issues.
For the sake of this analysis, I will divide the challenges with configlet into four categories, focusing on the first three.

1. Imperfect duplication of validation logic
2. Decentralization of the track repos
3. Modifying configlet
4. Everything else

Imperfect Duplication of Validation Logic

The canonical validation logic for the config.json data resides in the v3-website Ruby code. configlet duplicates some of this logic in an attempt to alert track maintainers to invalid configuration data. A large portion of the issues in the configlet repo result from disconnects between the Ruby code and the duplicated versions in configlet's Go codebase. A recent example of this is UUID validation and deduplication, which still presents an issue despite configlet's usage of the https://exercism.io/api/v1/uuids API added in the UUID validation PR.
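As an illustration of the part of this check that can be done locally, here is a minimal Go sketch that finds UUIDs shared by more than one exercise within a single config.json. The `trackConfig` struct is an assumed, simplified shape (a flat `exercises` array with `slug` and `uuid`), the function name is hypothetical, and the UUIDs are shortened placeholders. This is not configlet's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// trackConfig models only the fields needed here. The flat "exercises"
// array with "slug" and "uuid" keys is an assumption for this sketch.
type trackConfig struct {
	Exercises []struct {
		Slug string `json:"slug"`
		UUID string `json:"uuid"`
	} `json:"exercises"`
}

// duplicateUUIDs returns, for each UUID used more than once in the file,
// the slugs of the exercises that share it.
func duplicateUUIDs(raw []byte) (map[string][]string, error) {
	var cfg trackConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	// Group exercise slugs by UUID.
	byUUID := map[string][]string{}
	for _, ex := range cfg.Exercises {
		byUUID[ex.UUID] = append(byUUID[ex.UUID], ex.Slug)
	}
	// Keep only the UUIDs that more than one exercise uses.
	dupes := map[string][]string{}
	for uuid, slugs := range byUUID {
		if len(slugs) > 1 {
			dupes[uuid] = slugs
		}
	}
	return dupes, nil
}

func main() {
	raw := []byte(`{"exercises": [
		{"slug": "exercise-a", "uuid": "uuid-1"},
		{"slug": "exercise-b", "uuid": "uuid-1"},
		{"slug": "exercise-c", "uuid": "uuid-2"}
	]}`)
	dupes, err := duplicateUUIDs(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for uuid, slugs := range dupes {
		fmt.Printf("uuid %s is shared by %v\n", uuid, slugs)
	}
}
```

A check like this can only see one file. Uniqueness across all tracks depends on the centralized data, which is why configlet has to call out to the website and why the two sides can drift apart.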
Compiling all regex patterns in track config and syncing regex validation between Go and Ruby are also related to this duplication. Additionally, there are multiple open issues related to the validation of deprecated exercises. Adding a schema and related enforcement would be helpful in syncing the validation, but ultimately comes up short in solving the core issue here. For example, JSON Schema does not support foreign keys / references / dependent keys, so the membership/cycle checks for the concepts and prerequisite arrays would still require duplicated logic within configlet and the v3-website.
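To show the kind of logic a schema cannot express, here is a minimal Go sketch of the two checks mentioned above: membership (every concept and prerequisite must be a declared concept) and cycle detection across prerequisites. The `Exercise` struct, the function names, and the sample slugs are all hypothetical simplifications, not the actual V3 config structure or the website's Ruby logic.

```go
package main

import "fmt"

// Exercise is a hypothetical, simplified view of a concept exercise entry.
type Exercise struct {
	Slug          string
	Concepts      []string // concepts the exercise teaches
	Prerequisites []string // concepts that must be learned first
}

// checkMembership returns one message for every concept or prerequisite
// that does not appear in the track's declared concept list.
func checkMembership(declared []string, exercises []Exercise) []string {
	known := make(map[string]bool, len(declared))
	for _, c := range declared {
		known[c] = true
	}
	var problems []string
	for _, ex := range exercises {
		for _, c := range ex.Concepts {
			if !known[c] {
				problems = append(problems, fmt.Sprintf("%s: unknown concept %q", ex.Slug, c))
			}
		}
		for _, p := range ex.Prerequisites {
			if !known[p] {
				problems = append(problems, fmt.Sprintf("%s: unknown prerequisite %q", ex.Slug, p))
			}
		}
	}
	return problems
}

// hasCycle reports whether the concept dependency graph contains a cycle.
// Each taught concept depends on every prerequisite of the exercise that
// teaches it. The search is a depth-first traversal with three states.
func hasCycle(exercises []Exercise) bool {
	deps := map[string][]string{}
	for _, ex := range exercises {
		for _, c := range ex.Concepts {
			deps[c] = append(deps[c], ex.Prerequisites...)
		}
	}
	const (
		unvisited = iota // zero value for concepts not seen yet
		inProgress
		done
	)
	state := map[string]int{}
	var visit func(concept string) bool
	visit = func(concept string) bool {
		switch state[concept] {
		case inProgress:
			return true // reached a concept that is still on the current path
		case done:
			return false
		}
		state[concept] = inProgress
		for _, d := range deps[concept] {
			if visit(d) {
				return true
			}
		}
		state[concept] = done
		return false
	}
	for concept := range deps {
		if visit(concept) {
			return true
		}
	}
	return false
}

func main() {
	declared := []string{"concept-x", "concept-y"}
	exercises := []Exercise{
		{Slug: "exercise-a", Concepts: []string{"concept-x"}, Prerequisites: []string{"concept-y"}},
		{Slug: "exercise-b", Concepts: []string{"concept-y"}, Prerequisites: []string{"concept-x", "concept-z"}},
	}
	for _, problem := range checkMembership(declared, exercises) {
		fmt.Println(problem)
	}
	fmt.Println("cycle:", hasCycle(exercises))
}
```

Both checks need to look across entries in the file, which is exactly where a per-key schema stops helping. Whatever form they take, they would have to be written once in Go and once in Ruby as long as configlet and the website validate independently.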
Decentralization of the Track Repos

There are many benefits to having individual track repositories for each language. One challenge is distributing configlet as a binary that is used in a semi-autonomous way across all of the tracks. Changes to the way configlet is distributed and used in the build process must be rolled out manually and individually across all of the track repos, taking into account the tracks that have customized their usage (Python, for example, in addition to others that have integrated fmt and other related commands).
I created issue #181 as a proposal for improving this process; it is a worthwhile reference for some additional reasoning. Switching to a centralized GitHub Actions config is helpful, but still comes up short in solving the core issue. For example, new CI / check functionality would still need to be coordinated and released across all of the track repos.
Modifying Configlet

In starting the required V3 updates, I discovered that the tests are tightly bound to the litany of config file fixtures, which are (in some cases) tied to the exact output of the configlet commands. The trivial config additions of the version and editor settings consisted of ~25 lines of new code, coupled with ~55 lines of changes to tests and test fixtures. This was just to get the build to pass; I didn't add any new test coverage for the new config keys.
As I began to implement the changes required for adding the concepts array and restructuring the practice exercise keys in PR #187, it became clear that a large part of the configlet code (~4K lines split roughly 50/50 between application code and test code) is validating a schema (albeit one that is not actually defined outside of documentation). In my estimation, the configlet code is essentially a custom, hardcoded schema validator, with many fixtures and test data tied to the current structure and functionality. This, combined with the imperfect duplication of validation logic discussed above, leads to a particularly high-effort, low-payoff scenario for retrofitting configlet with the V3 updates.
A relevant factor in this is that I am a novice Go programmer, with roughly a year's worth of professional experience with it. An advanced Go programmer could probably knock out the changes to the linter in a couple of hours, with an additional couple of days required to fix the tests. The rest of the tools (fmt, generate, tree) would be relatively quick follow-ons (maybe 1-3 days of work). Independent of the experience of the programmer, replacing the hardcoded validation logic with json-schema would make sense and possibly save time. However, given the other factors above, I am reluctant to jump into implementing json-schema if we could skip to a better architecture that would eliminate the need for configlet (and potentially config.json) altogether.
Everything Else


https://github.com/exercism/configlet/issues
https://github.com/exercism/configlet/pulls