@dvolk
Created June 23, 2022 20:11

robots.json - machine learning access control proposal

Introduction

Similar to the way the file robots.txt controls the scraping of pages on a web server, the file robots.json in a source code repository should control whether machine learning algorithms are allowed to process and exploit the source code.

Examples

For example:

{ "disallow": "*" }

would not permit any of the repository's source code to be used.

More granular access could be granted or denied based on directories:

{ "disallow": { "paths": ["src/models"] } }

should be interpreted as not allowing files matching src/models/** to be used.
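A minimal sketch of how a consumer might apply the paths rule, assuming that each entry names a directory relative to the repository root and that paths use forward slashes. The function name `path_is_disallowed` is hypothetical and not part of the proposal.

```python
from pathlib import PurePosixPath


def path_is_disallowed(file_path: str, disallowed_dirs: list[str]) -> bool:
    """Return True if file_path lies inside any of the disallowed directories."""
    path = PurePosixPath(file_path)
    for directory in disallowed_dirs:
        base = PurePosixPath(directory)
        # A file matches "<dir>/**" when <dir> is one of its ancestor directories.
        if base in path.parents:
            return True
    return False


rules = {"disallow": {"paths": ["src/models"]}}
print(path_is_disallowed("src/models/net.py", rules["disallow"]["paths"]))
print(path_is_disallowed("src/utils/io.py", rules["disallow"]["paths"]))
```

Comparing whole path components rather than string prefixes keeps a sibling directory such as src/models_old from matching by accident.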

Further control could be granted based on contributor name:

{ "disallow": { "contributors": ["borisj"] } }

should be interpreted as not allowing the use of any files that have been modified by the borisj user.
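One possible way to evaluate the contributors rule is sketched below. The proposal does not say how contributor names are resolved, so this assumes the caller has already collected the names of everyone who modified a file (for instance from the repository history). The function name `touched_by_disallowed` is hypothetical.

```python
def touched_by_disallowed(file_contributors: list[str], disallowed: list[str]) -> bool:
    """Return True if any contributor to the file appears in the disallowed list."""
    # The file is off limits as soon as the two sets share at least one name.
    return not set(file_contributors).isdisjoint(disallowed)


rules = {"disallow": {"contributors": ["borisj"]}}
print(touched_by_disallowed(["alice", "borisj"], rules["disallow"]["contributors"]))
print(touched_by_disallowed(["alice"], rules["disallow"]["contributors"]))
```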

Access control could also be based on commit date:

{ "disallow": { "before": "2006-08-14T02:34:56-06:00" } }
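The proposal does not spell out how the before rule is applied. The sketch below assumes it disallows content whose commit timestamp is earlier than the given date, and that both values are ISO 8601 strings with a UTC offset. The function name `commit_is_disallowed` is hypothetical.

```python
from datetime import datetime


def commit_is_disallowed(commit_time: str, before: str) -> bool:
    """Return True if the commit timestamp is earlier than the cutoff."""
    # fromisoformat keeps the UTC offset, so timestamps from different
    # time zones are compared as the same absolute instants.
    return datetime.fromisoformat(commit_time) < datetime.fromisoformat(before)


cutoff = "2006-08-14T02:34:56-06:00"
print(commit_is_disallowed("2006-08-14T08:00:00+00:00", cutoff))
print(commit_is_disallowed("2010-01-01T00:00:00+00:00", cutoff))
```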

A robots.json file that cannot be parsed as JSON should be interpreted as disallowing any use.

For example, a robots.json file containing:

epstein didn't kill himself

should be interpreted as { "disallow": "*" }
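The fail-closed behaviour could be sketched as follows. The function name `load_policy` is hypothetical, and treating valid JSON that is not an object (such as a bare number) as disallow-all is an additional assumption that the proposal does not state.

```python
import json

DISALLOW_ALL = {"disallow": "*"}


def load_policy(text: str) -> dict:
    """Parse robots.json content, falling back to disallow-all on bad input."""
    try:
        policy = json.loads(text)
    except json.JSONDecodeError:
        # Unparseable content is interpreted as { "disallow": "*" }.
        return dict(DISALLOW_ALL)
    # Assumption: a non-object JSON value is also treated as disallow-all.
    if not isinstance(policy, dict):
        return dict(DISALLOW_ALL)
    return policy


print(load_policy("epstein didn't kill himself"))
print(load_policy('{ "disallow": { "paths": ["src/models"] } }'))
```

Failing closed means a typo in the file can only make the policy stricter, never more permissive.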
