Created Jun 23, 2022

robots.json - machine learning access control proposal


Similar to the way the file robots.txt controls the scraping of pages on a web server, the file robots.json in a source code repository should control whether machine learning algorithms are allowed to process and exploit the source code.


For example:

{ "disallow": "*" }

would not permit any source code in the repository to be used.

More granular access could be granted or denied based on directories:

{ "disallow": { "paths": ["src/models"] } }

should be interpreted as not allowing files matching src/models/** to be used.
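One way a consumer might apply the path rule is sketched below. The function name `is_path_disallowed` is hypothetical, and treating each entry in `paths` as a directory prefix (so `src/models` covers everything under `src/models/**` but not `src/models_old`) is an assumption drawn from the sentence above, not a rule the proposal spells out in detail.

```python
from pathlib import PurePosixPath


def is_path_disallowed(policy: dict, file_path: str) -> bool:
    """Return True if file_path falls under a disallowed directory.

    Hypothetical helper: assumes repository-relative POSIX paths and
    treats each "paths" entry as a directory prefix (entry/**).
    """
    rule = policy.get("disallow")
    if rule == "*":
        # Blanket rule: nothing in the repository may be used.
        return True
    if not isinstance(rule, dict):
        return False
    target = PurePosixPath(file_path)
    for entry in rule.get("paths", []):
        # A file matches entry/** when entry is one of its ancestor directories.
        if PurePosixPath(entry) in target.parents:
            return True
    return False
```

Comparing whole path components rather than raw string prefixes keeps a sibling directory such as `src/models_old` from being caught by the `src/models` entry.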

Further control could be granted based on contributor name:

{ "disallow": { "contributors": ["borisj"] } }

should be interpreted as not allowing the use of any files that have been modified by the borisj user.
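A minimal sketch of the contributor check follows. The name `is_contributor_disallowed` is hypothetical, and the function assumes the caller has already collected the names of everyone who modified a file (for instance from version control history). How a name such as `borisj` maps onto commit authors is not defined by the proposal and is left to the caller.

```python
from typing import Iterable


def is_contributor_disallowed(policy: dict, file_contributors: Iterable[str]) -> bool:
    """Return True if any contributor to the file is listed under "contributors".

    Hypothetical helper: file_contributors is assumed to hold the names of
    all users who modified the file, gathered elsewhere.
    """
    rule = policy.get("disallow")
    if rule == "*":
        # Blanket rule: nothing in the repository may be used.
        return True
    if not isinstance(rule, dict):
        return False
    blocked = set(rule.get("contributors", []))
    # One listed contributor having touched the file is enough to exclude it.
    return not blocked.isdisjoint(file_contributors)
```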

Access control could also be based on commit date:

{ "disallow": { "before": "2006-08-14T02:34:56-06:00" } }
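The proposal does not say exactly how `before` should be evaluated, so the sketch below rests on an assumption: content whose commit timestamp is earlier than the given ISO 8601 value is disallowed. The function name `is_commit_disallowed` is hypothetical, and which commit timestamp to use for a file (first, last, or every commit) is left open.

```python
from datetime import datetime


def is_commit_disallowed(policy: dict, commit_time: datetime) -> bool:
    """Return True if commit_time is earlier than the "before" cutoff.

    Hypothetical helper; assumes "before" means "committed earlier than".
    commit_time must be timezone-aware, because the cutoff in the example
    carries a UTC offset and Python refuses to compare naive and aware values.
    """
    rule = policy.get("disallow")
    if rule == "*":
        # Blanket rule: nothing in the repository may be used.
        return True
    if not isinstance(rule, dict) or "before" not in rule:
        return False
    cutoff = datetime.fromisoformat(rule["before"])
    return commit_time < cutoff
```

Because both values carry an offset, the comparison is made on absolute time, so a commit recorded in UTC is compared correctly against a cutoff written with `-06:00`.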

A robots.json file that cannot be parsed as JSON should be interpreted as disallowing any use.

For example, a robots.json file containing:

epstein didn't kill himself

should be interpreted as { "disallow": "*" }
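The fail-closed parsing rule could be sketched as follows. The name `load_policy` is hypothetical. Treating valid JSON that is not an object (a bare string or number, say) as disallow-all is an extra assumption beyond what the proposal states.

```python
import json

DISALLOW_ALL = {"disallow": "*"}


def load_policy(text: str) -> dict:
    """Parse the contents of robots.json, falling back to disallow-all.

    Hypothetical helper: anything that is not parseable JSON is treated
    as {"disallow": "*"}.
    """
    try:
        policy = json.loads(text)
    except json.JSONDecodeError:
        # Unparseable content means no use is permitted.
        return dict(DISALLOW_ALL)
    if not isinstance(policy, dict):
        # Assumption: non-object JSON is also treated as disallow-all.
        return dict(DISALLOW_ALL)
    return policy
```

Failing closed means a malformed or deliberately non-JSON file can never be read as granting permission.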
