thecata/feature_tags

## feature_tags
I'm thinking of developing a new feature for Git. Before I get into the explanations, I would like to point out that this is
something I haven't encountered in any other existing SCM, so it may sound a bit weird to most of you out there, but I do think
it may be useful to have, especially for larger projects and for new guys, as you will see below.


SHORT DESCRIPTION:

For those who don't want to read this whole document to get an idea of what I had in mind, the short version goes something
like this: I want to be able to add a bunch of "feature tags" (or "flags" if you will) to each individual file I encounter. The
idea is that by adding some information like "file X implements the Y technique/pattern", you can easily incorporate extra
knowledge about the project, or highlight common team practices, knowledge which usually resides only with a few experienced
programmers on the team. With this information recorded where anybody can access it, it becomes easier to bring new people onto
a project. These file tags can also be used to mark specific states for a file, which can be useful, for example, when doing code
reviews (or any other process that is file-state oriented).


BACKGROUND STORY:

The idea first occurred to me while I was trying to organize my collection of source files which I gathered in programming
contests. I was trying to work out a system where I could quickly find any source file to answer any of the following questions:
	- which problems were solved correctly? (accepted by the judge, or which rejected status it received: wrong answer, time
	  limit exceeded, or just unfinished - not submitted yet/still in progress)
	- which problems employed a certain technique? (dynamic programming, a particular graph algorithm, backtracking, greedy)

The first question was easily answered by putting the files into folders with the obvious meaning (same as project files are
organized into modules). For a long time, this entire collection of mine was unversioned. Putting it under an SCM was the next
logical step, but it would have to be something that suited my needs, so I would favor an SCM that already offered an answer to
the second question. That one was a little trickier. One way of doing this would be to maintain a folder of symlinks for each
technique. In other words, I would maintain a folder for the greedy technique, for example. Every time I found a file which
implements a greedy algorithm, I would put a symlink in that 'greedy' folder. Unfortunately, this entire symlink structure would
have to be under the SCM as well, and adding symlinks would just increase the commit tree.

After a more careful analysis, I realized that a solution to my problem would solve another pretty common issue I noticed in
about every programming job I ever heard of (including the job I currently have and the jobs I had in the past). Every time a new
guy entered an existing project, he would start on small tasks, usually related to certain common practices that he would
eventually encounter while working on that project. For example, when I was working on a Java web platform and I had to train
someone new, I would make sure he/she worked on a task that involved load/store operations with the database, some basic
transformations on the objects and finally displaying the data in a web browser, using an HTML/JavaScript interface.

Whenever the new guy encountered a problem, he would go to a senior programmer (or his trainer) to find an answer (preferably,
the go-to guy is what I usually call the "project guru"... there always seems to be at least one guy who has a great overall view
of the project, who always knows where in the project to look to find a solution that was implemented when this problem occurred
3 years ago, or at least knows the guy who encountered something similar). In some cases, the answer to a question like "how
should I display the person information dialog" would be something like "just use our CustomWindow component... you will find
similar usages in the project". The new guy would just search for usages of that particular class and figure out the answer to
his problem from there. That's the happy case, where every usage of 'CustomWindow' is a correct usage (this almost never happens;
in every large enough project where development time is a concern, there will always be a corner cut somewhere, so the answer for
the new guy would usually turn out to be "use the CustomWindow component... you will find correct usages in files X and Y").


TECHNICAL PROPOSAL:

I would suggest adding a new command to git. My first suggestion would have been "tag", but that already exists, with totally
different semantics. A second suggestion would be "feature" (or "feat" or even "ftag" for short). Some of the usages would
include:
	git feat --add    DB-search file1.cpp --since=dbf64e1 --until=58b66f8 ; (1)
	git feat          DB-store file2.cpp --since=dbf64e1                  ; (2)
	git feat --remove DB-load file3.cpp --since=58b66f8                   ; (3)
	git feat DB-search file4.cpp                                          ; (4)
	git feat DB-search file5.cpp --comment='a random comment'             ; (5)
	git feat          DB/store file2.cpp --since=563bdac                  ; (6)
	git feat --create DB/load/DTO                                         ; (7)
	git feat          DB/load/DTO  file3.cpp                              ; (8)
	git feat --autocreate DB/load/DTO   file3.cpp                         ; (9)
The idea is that we should easily be able to add a certain "feature tag" to a particular file, starting from a particular
version of that file. Normally, the "feature tag" would be preserved for a particular file for as long as possible. Take the
examples above for starters. In (4) and (5), we just add a feature tag called 'DB-search' to the latest versions of 'file4.cpp'
and 'file5.cpp'. This tag would be preserved for all subsequent versions of the file (even if it was merged with some other
versions of the file, on the assumption that the file still has that particular feature implemented). The "--add" is the default
if nothing else is specified. Of course, the opposite of "--add" is also possible, so a file can lose a "feature tag" starting
with the current version, or since a particular version back in time (see example (3) above - as you've probably guessed, the
default value for "--since" is "HEAD" wherever supported). When both "--since" and "--until" are given, as in example (1), the
operation will be split into two commands, namely an add and a remove (or a remove followed by an add if "--remove" is also
present). Last but not least, examples (6), (7) and (8) show the use of multiple levels of tags (example (9) is just the same as
(7) followed by (8)). Tags will be stored in a folder-like structure, so nesting tags would be natural and straightforward. I
think it may be useful to have a tag autocomplete feature, but not auto-creation (same as with branches, which currently support
bash autocompletion for branch names).
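
To make the "--since"/"--until" behaviour a bit more concrete, here is a minimal Python sketch of how I picture it. This is not
git code: the class FeatureTagLog, its method names and the made-up commit ids 0000001..0000003 are all assumptions for
illustration only. It also assumes a linear history and that the tag is already gone at the "--until" commit itself, neither of
which is decided above.

```python
# Hypothetical model of the proposed semantics; nothing here is a real git API.
class FeatureTagLog:
    def __init__(self, history):
        # history: commit ids, oldest first (assumption: linear history)
        self.position = {commit: i for i, commit in enumerate(history)}
        self.events = []  # (position, insertion order, action, tag, path)

    def _record(self, action, tag, path, commit):
        self.events.append(
            (self.position[commit], len(self.events), action, tag, path))

    def add(self, tag, path, since, until=None):
        # "--since" plus "--until" is split into an add and a remove
        self._record("add", tag, path, since)
        if until is not None:
            self._record("remove", tag, path, until)

    def remove(self, tag, path, since):
        self._record("remove", tag, path, since)

    def has_tag(self, tag, path, commit):
        # Replay events up to the given commit; the most recent one wins.
        at = self.position[commit]
        state = False
        for pos, _, action, t, p in sorted(self.events):
            if pos <= at and t == tag and p == path:
                state = (action == "add")
        return state


# Made-up history around the two commit ids used in example (1).
history = ["0000001", "dbf64e1", "0000002", "58b66f8", "0000003"]
log = FeatureTagLog(history)
log.add("DB-search", "file1.cpp", since="dbf64e1", until="58b66f8")
print([c for c in history if log.has_tag("DB-search", "file1.cpp", c)])
# prints ['dbf64e1', '0000002']
```

A real implementation would have to deal with branches and merges, which this sketch ignores completely.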

Of course, adding markers is useless unless a search and list mechanism is also provided:
	git feat --list DB-store                     ; (1)
	git feat --list DB  --deep                   ; (2)
	git feat --file=file1.cpp                    ; (3)
	git feat --user=john.doe --list DB-store     ; (4)
The first command here, (1), would list all files having the 'DB-store' feature tag. The second, (2), would list all the files
having the feature tag 'DB' or any of its children. The third one, (3), would just print the tags that are set for the file
'file1.cpp'. I'm also considering adding some support to highlight tags a particular file had in the past (in order to find
features that were once implemented in the project even if they are no longer included), but couldn't figure out a syntax that is
easy to use (not to mention the mechanics involved there). Last but not least, I'll just mention that all the searching can be
per user, as in (4) (just show the files user X has marked with a certain feature tag, or just show the tags user X has entered
for a given file). I'm currently leaning in favor of supporting multiple users as well, so the result would be the union of the
sets returned for users X, Y and Z if requested (in the end, it's simple enough to implement, so why not have it from the start).
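
The queries above could be sketched roughly like this in Python. Again, this is only an illustration under my own assumptions:
the flat RECORDS list, the function names, the extra files file6.cpp and file7.cpp and the user jane.roe are all made up, and I
assume that a child tag is anything below "<tag>/" (so 'DB-store' is not a child of 'DB').

```python
# Hypothetical in-memory records (tag, path, user); not a git data structure.
RECORDS = [
    ("DB-store", "file2.cpp", "john.doe"),
    ("DB-store", "file6.cpp", "jane.roe"),
    ("DB", "file7.cpp", "john.doe"),
    ("DB/store", "file2.cpp", "jane.roe"),
    ("DB/load/DTO", "file3.cpp", "john.doe"),
    ("DB-search", "file1.cpp", "john.doe"),
]


def list_files(records, tag, deep=False, users=None):
    """Files carrying `tag`; with deep=True also files carrying a child tag.

    `users` is an optional set of user names; the result is the union of
    what those users have tagged.
    """
    def matches(t):
        return t == tag or (deep and t.startswith(tag + "/"))

    return sorted({path for t, path, user in records
                   if matches(t) and (users is None or user in users)})


def list_tags(records, path, users=None):
    """Tags set on `path`, optionally restricted to the given users."""
    return sorted({t for t, p, user in records
                   if p == path and (users is None or user in users)})


print(list_files(RECORDS, "DB-store"))                      # like (1)
print(list_files(RECORDS, "DB", deep=True))                 # like (2)
print(list_tags(RECORDS, "file1.cpp"))                      # like (3)
print(list_files(RECORDS, "DB-store", users={"john.doe"}))  # like (4)
```

Passing several names in `users` gives the union mentioned above without any extra code, which is why I would rather have it from
the start.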


A FEW IMPLEMENTATION DETAILS:

I noted earlier that one way to do this would be to maintain a folder of symlinks for every feature tag, but as I said, this
would just overcrowd the commit tree unnecessarily. The other (and possibly only viable) option is to add these tags as internal
refs (same as the tags now available in git). This means no extra data is committed, users can still commit their work
completely unaware of feature tags, and so on. In other words, a tree structure of feature tags would be maintained in parallel
with the file tree. I say tree structure because, unlike the current "version tags" (or labels, however you want to call them),
these feature tags would be a lot more useful as a tree structure (the version tags are basically flat, i.e., all tags exist on
the same level).
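
As a rough picture of the tag tree, here is a small Python sketch using nested dictionaries. It only mirrors the folder-like
structure; it says nothing about how git would actually store the refs. The TagTree class is hypothetical, and it makes two
assumptions the proposal does not settle: "--create" also creates any missing parent levels, and tagging a file with an unknown
tag is an error unless "--autocreate" is given.

```python
# Hypothetical sketch of a folder-like feature tag tree; not git internals.
class TagTree:
    def __init__(self):
        # name -> {"children": {...}, "files": set()}
        self.root = {}

    def _walk(self, name, create):
        node = None
        level = self.root
        for part in name.split("/"):
            if part not in level:
                if not create:
                    raise KeyError(f"unknown feature tag: {name}")
                # assumption: missing parent levels are created as well
                level[part] = {"children": {}, "files": set()}
            node = level[part]
            level = node["children"]
        return node

    def create(self, name):
        # like: git feat --create DB/load/DTO
        self._walk(name, create=True)

    def tag(self, name, path, autocreate=False):
        # like: git feat DB/load/DTO file3.cpp (optionally with --autocreate)
        self._walk(name, create=autocreate)["files"].add(path)


# Example (9) behaves the same as (7) followed by (8).
two_steps = TagTree()
two_steps.create("DB/load/DTO")
two_steps.tag("DB/load/DTO", "file3.cpp")

one_step = TagTree()
one_step.tag("DB/load/DTO", "file3.cpp", autocreate=True)

print(two_steps.root == one_step.root)  # prints True
```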


FINAL NOTES:

Now you're all probably thinking: "Ok, this sounds nice/awful... when do we get to see it in action?". To answer that question,
this is basically all in the planning phase right now, so just documentation for now. I've read through the git documentation, I
have a decent understanding of the git project structure itself and I know pretty much where everything I want should go.
Unfortunately, this would be my first open-source contribution (everybody has to start somewhere, right?). To top it all off, my
C programming is a bit rusty these days. I've been a full-time Java programmer for the past 4-5 years, including now, so my C
days are quite far in the past... but it should be like riding a bicycle, so it will eventually come back to me. And of course,
there is the spare time issue, which I guess many open-source contributors have.

The main reason I'm putting this idea forward is to have others suggest other possible usages for this thing (so far, one guy has
suggested using these feature tags to handle code reviews: mark files that need to be reviewed with a particular tag and un-tag
them afterwards - or re-tag them with "ready for production" or something similar). These other possible usages might even
suggest a better name for this (my first idea might not be so well suited if the primary usage turns out to be something
completely different). For implementation ideas, any suggestions are welcome (I would like to implement this myself, as it is my
first open-source contribution as I said before, but I won't turn down help if any is offered).