@shiftkey
Last active Aug 29, 2015
a thought for a better gitignore project

So I've been looking after some of the PRs in the github/gitignore repository recently and found myself lamenting how things currently work.

One of the areas I'd love to improve is how metadata related to changes is tracked. Currently it's ad-hoc, and requires maintainers to understand details about the impact of changes - and often others want to understand a change after it's been merged.

Here's a (good) example of a rule within a gitignore file:

> less Node.gitignore

# Dependency directory
# https://docs.npmjs.com/misc/faq#should-i-check-my-node-modules-folder-into-git
node_modules

And here's a different one:

> less LemonStand.gitignore

# add content_*.php if you don't want erase client changes to content

I'd really like to understand the reason why people add this rule, and what benefits it gives me. But that's missing - merged with the classic "No description provided" several years ago. Sigh.

Anyway, it'd be nice to capture some details about a rule change - to help everyone review and understand things better. What if we had a meta-format for these rules?

This is currently at the "back-of-the-napkin" stage of thinking, but here's what a potential spec might look like:

> less framework/Node.gitignoretemplate

Dependencies:
  notes: node_modules contains other dependencies related to your package, and
         there's other infrastructure that takes care of restoring these and
         locking specific versions 
  link: https://docs.npmjs.com/misc/faq#should-i-check-my-node-modules-folder-into-git
  rules:
  - node_modules

There are some extra details here which are invaluable to the process:

  • better organization of templates (we could do ide/ and project/ to namespace away more specialized templates, rather than a dumping ground of things)
  • without needing to refer back to the PR, details about the change are present
  • external links encourage projects to provide supporting material
  • support for making localizable documentation

This could be compiled to:

# Dependencies
node_modules

Or you could go for a more verbose format:

# node_modules contains other dependencies related to your package, and
# there's other infrastructure that takes care of restoring these and
# locking specific versions 
# more info: https://docs.npmjs.com/misc/faq#should-i-check-my-node-modules-folder-into-git
node_modules
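
A minimal sketch of that compile step, in Python. The template is held as a plain dict mirroring the spec above (parsing the template file itself is left out), and `compile_template` and its `verbose` flag are hypothetical names for illustration, not part of any existing tool.

```python
import textwrap

# Hypothetical in-memory form of framework/Node.gitignoretemplate.
NODE_TEMPLATE = {
    "Dependencies": {
        "notes": (
            "node_modules contains other dependencies related to your package, "
            "and there's other infrastructure that takes care of restoring "
            "these and locking specific versions"
        ),
        "link": "https://docs.npmjs.com/misc/faq#should-i-check-my-node-modules-folder-into-git",
        "rules": ["node_modules"],
    },
}


def compile_template(template, verbose=False):
    """Render a template dict as .gitignore text.

    Terse mode emits the group name as a one-line comment; verbose mode
    emits the notes (wrapped) and the link instead.
    """
    lines = []
    for name, group in template.items():
        if lines:
            lines.append("")  # blank line between groups
        if verbose and group.get("notes"):
            for wrapped in textwrap.wrap(group["notes"], width=76):
                lines.append("# " + wrapped)
        else:
            lines.append("# " + name)
        if verbose and group.get("link"):
            lines.append("# more info: " + group["link"])
        lines.extend(group["rules"])
    return "\n".join(lines) + "\n"
```

Calling `compile_template(NODE_TEMPLATE)` gives the short form, and `compile_template(NODE_TEMPLATE, verbose=True)` gives the commented form with the notes and link.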

Essentially, this format might seem like more work for all involved, but having a consistent format for data and details means that knowledge is centralized, discoverable and easier to disseminate.

What's next?

  • your feedback, give it to me
  • review more templates and sketch out a proper spec
  • talk to some contacts about the github/gitignore repo to gauge interest
  • poke around the gitignore.io repo to see if they have a need for this

@MichaelMarner MichaelMarner commented Jun 15, 2015

It feels to me as though this is more about enforcing a styleguide on contributors. These rule descriptions or justifications should be in the commit message or the pull request. Having a look around the PRs, I can see where you're coming from.

Having a standard for meta information about ignore rules is probably a good idea. And if you're going to enforce a standard it may as well be easy to parse ;)

I don't like the idea of an additional build step. I do a lot of stuff with git outside GitHub, and I really like just being able to piece together .gitignore files based on the GitHub/gitignore templates. Having to compile a .gitignore from source doesn't really bring any benefits, unless a lot of the rules are optional. The spirit of the repo seems to be about sane defaults though, and a compilation step would get in the way of this.

But yeah, some of those PRs are pretty sparse on justification. Requiring all rule modifications to have corresponding justification would help.

@shiftkey shiftkey commented Jun 15, 2015

@MichaelMarner

    Having a look around the PRs, I can see where you're coming from.

Yep. But I've seen a lot of "why did this change occur" threads recently, which makes me think it's more than just about styleguide - which led me down this path.

    I don't like the idea of an additional build step.

That's understandable. Are you familiar with gitignore.io at all? I see some duplication of work between both projects, and would love to see more things being shared between the two to make our collective lives easier.

    I do a lot of stuff with git outside GitHub, and I really like just being able to piece together .gitignore files based on the GitHub/gitignore templates.

Yep, which is where I think the way the gitignore.io CLI works is really neat - and it complements things like GitHub's gitignore selector, for those who want more control.

I guess my ultimate goal with this is to see the things we already encourage (grouping rules into logical groups, providing supporting links from project authors about best practices, etc) be done more consistently. Using an intermediate representation definitely has its downsides, but I think there are some big benefits on offer...

    Having to compile a .gitignore from source doesn't really bring any benefits, unless a lot of the rules are optional.

Some templates suffer from this more than others - look for the files with high churn. And particular scenarios (think multi-technology repositories) would benefit from being able to incorporate multiple gitignore templates.

But composition of templates isn't something I've given much thought to - I think getting some structure behind things will help me understand these scenarios better before I can propose something better.

PS: I'm in Adelaide for the next couple of weeks. 🍻?

@shiftkey shiftkey commented Jun 15, 2015

    These rule descriptions or justifications should be in the commit message or the pull request.

These just don't seem to be happening. I'm not sure how to encourage this knowledge out of people's heads so we can capture it, and I know other maintainers are grappling with the same headaches.

@MichaelMarner MichaelMarner commented Jun 15, 2015

OK, until this point in time I had never seen gitignore.io. That is really clever, and alleviates some of my concerns about a build step. Perhaps the templates get compiled into a GitHub Pages site or something for general consumption, built from a set of rules. If the ready-for-use ignore files live somewhere then it doesn't matter if they get compiled from a set of rules.

Like I said, parseable metadata is clearly a good idea. My only real concern was ease of access to the final files.

(I've got lots of Three D meetings and tings, but 🍻 is certainly a possibility)

@bkeepers bkeepers commented Jun 15, 2015

I love the thinking behind this and don't have any feels one way or the other, but a few questions came to mind while reading it:

Could you get the same benefit by requiring a format in the existing templates that doesn't require a build step? For example, it would be relatively easy to write a test that ensures each rule has this format:

# one or more lines of comments
# more info: /something that looks like a url/
rules
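
A rough sketch of such a test, in Python. It assumes rule groups are separated by blank lines and that the last comment line of each group is the `# more info:` line; `check_template` and the exact messages are hypothetical, not an existing tool.

```python
import re

# Assumed shape of the link line: "# more info: <http or https url>".
URL_LINE = re.compile(r"^# more info: https?://\S+$")


def check_template(text):
    """Return a list of (line_number, message) problems; empty means OK."""
    # Split the file into blocks of consecutive non-blank lines.
    blocks, current = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            current.append((number, line.rstrip()))
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    problems = []
    for block in blocks:
        first = block[0][0]
        # Leading comment lines form the documentation for the block.
        comments = []
        for _, line in block:
            if not line.startswith("#"):
                break
            comments.append(line)
        rules = block[len(comments):]
        if len(comments) < 2:
            problems.append((first, "expected a description and a 'more info' line"))
        elif not URL_LINE.match(comments[-1]):
            problems.append((first, "last comment line must be '# more info: <url>'"))
        if not rules:
            problems.append((first, "comment block has no rules"))
    return problems
```

Run against the Node example it reports nothing, while a lone comment such as the LemonStand one is flagged twice: once for the missing link line and once for having no rules.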

Also, a strict format solves the problem for future contributions, but it doesn't really help alleviate the current pain of answering why an existing rule was added. Would you try to backfill the notes and links, or how will this alleviate that pain?

Again, I don't have any feels either way, just trying to help think through the ramifications of this.

@joeblau joeblau commented Jun 15, 2015

I like the idea of adding more documentation around changes to files, but that depends on who the stakeholder is. The way I look at the .gitignore ecosystem, there are two main stakeholders.

  1. Developer - When I first started working on gitignore.io, it was because all I cared about was getting my ignore file with the right presets. I didn't really care about the file contents; all I wanted was a way to ensure that the code I wanted was being checked in and no Linux, Apple or other metadata was in my repository.

  2. Template Maintainer - Once I started trying to maintain gitignore.io, I realized that a lot of people don't document their PRs, for whatever reason. But this statement is very true:

    it's ad-hoc, and requires maintainers to understand details about the impact of changes - and often others want to understand a change after it's been merged.

From the gitignore template maintainer side, I think the process can definitely be refined. The question is going to be about enforcing process. GitHub already has policy around documenting changes, but keeping track of so many new languages, IDEs, and project files had their GitHub repo backed up with hundreds of PRs at one point. Honestly, when I see a PR on gitignore.io I usually just accept it. I've had some template PRs come in, get changed immediately and then get changed again a few days later.

I think this is a good proposal for addressing the template maintainer, but enforcement will be tricky if you want to move quickly. From the developer side, I agree with @MichaelMarner about keeping final .gitignore file metadata to a minimum.

@M-Zuber M-Zuber commented Jun 15, 2015

My two cents:
Automating the compilation should be relatively easy and therefore doesn't bother me. Of course you have the aforementioned github.io which can host precompiled files.
Another thing I would hope this could help with is understanding/writing exceptions.

@shiftkey shiftkey commented Jun 15, 2015

@bkeepers

    Could you get the same benefit by requiring a format in the existing templates that doesn't require a build step?

I'm not overly attached to the format above, so if there's enough resistance to introducing an additional build step with this process I think an alternative that's closer to the .gitignore format is fine with me, especially for interop with existing platforms.

    Would you try to backfill the notes and links, or how will this alleviate that pain?

I think that would be helpful - grabbing data from the PR history to annotate existing files. It'd have to be a manual process, but it can be introduced gradually - especially if we're not introducing a new format.

@MikeMcQuaid MikeMcQuaid commented Jun 16, 2015

Nice work thinking this out @shiftkey

    Could you get the same benefit by requiring a format in the existing templates that doesn't require a build step?

I really like this idea. I think the key thing is having this clearly explained by a failing CI job so that it's not a human but a bot doing the enforcement. While you're doing this you could also add e.g. basic spellchecking and a git test to ensure the rule is valid.

@bkeepers bkeepers commented Jun 16, 2015

    Honestly, when I see a PR on gitignore.io I usually just accept it. I've had some template PRs come in, get changed immediately and then get changed again a few days later.

@joeblau this is really interesting. It basically sounds like the Wikipedia philosophy where you're more open to accepting changes and assume that enough eyeballs will ensure the content is accurate.

@arcresu arcresu commented Jun 17, 2015

I agree that we need some more organised metadata on the rules, but we should consider exactly why we would use it. Here are some thoughts:

  • Compilation: a compilation step would let us do things like mixing files which are just impossible with the existing setup. I strongly feel that it's a necessary step, and I think the reorganisation we would do with this additional power would make a custom syntax which lives in gitignore comments not terribly useful. I'd say stick with YAML or JSON or some easy existing format like that.
  • Compilers: I think we should build in the assumption that the collection is used by more than one consumer (i.e. GitHub). We already know about gitignore.io and I believe there is at least one IDE that pulls information from here to make suggestions. I think if everyone contributes to these rules, everyone should be allowed to use the data easily. We should aim to make information machine-readable and declarative where possible so that it can be consumed by a variety of tools solving slightly different use-cases.
  • Fields: here's the kind of information I envisage.
    • Structure: often, rules can be put into logical blocks ("documentation", "dependency management") which are easier to eyeball than big slabs of rules. We need to capture this so that a compiler could group rules under headings.
    • Freeform comments and documentation URLs should be aimed at maintainers. This level of verbosity is usually just clutter in a compiled gitignore file, but a compiler might choose to output this as comments as an option. We should encourage PRs to add these fields, but leave them as optional in general.
    • Aliases which are currently handled in an ad-hoc way as symlinks. For example some people don't find the TeX template because they search for LaTeX. Compilers could use the canonical name for populating a menu and use the suggested aliases when performing a search.
    • Inclusion of groups of rules within other rules, for example when Objective-C wants to include Xcode, but Xcode also makes sense in other contexts without Objective-C.
    • Partials - things like compiled objects (.so, .o and friends) are actually fragments that would never be useful on their own as a suggested gitignore for a new project, but they appear in many places. It would make sense to call it a partial, so that it wouldn't appear in menus, but was available for including.
    • Reasons for including a rule are most commonly from a small set. Rather than have similar but arbitrary freeform comments explaining these, we could define a list of common reasons that can be used. That way compilers can make decisions or present the output differently depending on the reason, and by forcing the presence of at least either a notes or reason field we could deal with the problem of under-explained PRs without imposing much of a burden on the submitters. I think this set would cover the majority of cases, with the rest being freeform notes:
      • reason: temporary for files which are supposed to be temporary or cache
      • reason: output for build products, documentation etc. which is deterministically regenerated from source files
      • reason: dev-specific for user-specific configuration, including local filesystem paths etc.
      • reason: exposed-secret for passwords and API keys etc. A compiler could ask the user about adding this rule or give a warning, or add the rule with a TODO comment, for example.
    • Actions which should be done before using a gitignore file are currently handled in an ad-hoc way. As well as the exposed-secret reason, I would suggest something like the following. These correspond to default-disabled and default-enabled rules that might need to be changed under specific circumstances - and hence they are comments that would appear in the compiled output by default.
      • enable-if: using version X.Y.Z of TechnologyA (might compile to # TODO: Uncomment the following rule(s) if <reason>\n<rules>)
      • remove-if: you want to check in external dependencies (might compile to: # TODO: Remove the following rule(s) if <reason>\n<rules>)

I also think there are some opportunities to improve our guidelines:

  • Global rules: These are fundamentally different to project-level rules. They are supposed to go in a user's global git configuration and not in the project; we're encouraging best practices by default. We should keep in mind that they are used differently - in their current layout people find it too fiddly to manually collate and update several Global/*.gitignore rules, or they just don't see the Global/ directory and so we keep getting PR noise to add the rules to project-level templates. We should formalise the definition too:
    • *.swp and these sorts of generic editor or OS-specific files should be ignored by each person globally depending on what they use. There's no point in checking in emacs and vim rules into every project by default - it just makes the rules harder to audit.
    • What about the MATLAB IDE? It's an IDE but it's only used for MATLAB, so essentially everyone working on a project that uses it would benefit from the rule. What about VS? I don't think our current rule of IDE = global really makes sense.
    • I think the test should be: if another developer is working on the same project, is it very unlikely that they would use a different toolset? If so, then it's okay to include in the project-level template. That way VS which is a language environment as much as an IDE would be project-level, as would the MATLAB IDE.
  • PHP frameworks: templates which merely list the set of files installed by some PHP framework are not very useful. They're very brittle since they depend on the specific version, and they could be generated automatically with ls if someone wants a similar set of rules. We should explicitly discourage these and remove the existing ones.
  • Removals: it would be nice to have a way to remove stagnant templates in a way that gives people a chance to object. It's hard to know when things are still relevant.
  • Relevance: we should be more explicit about what is considered too niche to include, and remove templates which are too niche and so don't get the attention they need.
  • Licensing: we should open up the license as per the long-standing PR, if the issues with that change can be resolved.