@internetimagery
Last active April 17, 2023 08:46
Git Annex Notes

Git Annex Usage Notes

A simple reminder for myself of some often used commands. Plenty of useful docs over on https://git-annex.branchable.com/

This is a work in progress notebook...

Setup

Initializing a new repo / location

The easiest way to get git up and running is to clone an existing git repo. But it's cleaner to start completely fresh.

# Build a place for the repo to exist and initialize git within
mkdir my_repo
cd my_repo
git init
# Set up remote access if desired. Doing it this way instead of cloning directly lets a custom name be added for the remote
git remote add my_remote_name https://path/to/repo.git

Then we want to turn this into a git annex repository. Giving it a name is optional, but your future self will thank you for it.

git annex init my_repo_name

Setting up special remotes

If files are being stored (or need to be stored) in locations that are not git repos (and we do want to do that, let's face it), then we want "special remotes", for example to store data in the cloud: https://git-annex.branchable.com/special_remotes/ These remotes can be encrypted to keep out prying eyes. We can also use several cloud providers at once for redundancy, and to make the most of the offers different companies provide.

Some common example setups, starting with S3. MY_KEY_ID refers to an identifier for gpg to locate the key that will be used for encryption.

export AWS_ACCESS_KEY_ID=my_access_key
export AWS_SECRET_ACCESS_KEY=my_secret_key
# Essential options. Additional options, including region, are on the docs website
git annex initremote aws type=S3 encryption=hybrid keyid=MY_KEY_ID bucket=my_bucket datacenter=my_location

Or using a different, non-Amazon S3-compatible service:

export AWS_ACCESS_KEY_ID=my_access_key
export AWS_SECRET_ACCESS_KEY=my_secret_key
git annex initremote storj type=S3 encryption=hybrid keyid=MY_KEY_ID host=gateway.storjshare.io protocol=https bucket=my_bucket
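
Both variants read the credentials from the environment, so it can help to fail early when one is not exported. A minimal sketch, assuming a hypothetical helper called require_env (not part of git annex) and the same placeholder values as above:

```shell
# Hypothetical helper: fail early if a credential is not exported.
require_env() {
    for name in "$@"; do
        if ! printenv "$name" >/dev/null; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
}

# Only attempt the initremote when both credentials are present.
# MY_KEY_ID, my_bucket and my_location are placeholders, as in the notes above.
if require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; then
    git annex initremote aws type=S3 encryption=hybrid keyid=MY_KEY_ID bucket=my_bucket datacenter=my_location
fi
```

printenv only sees exported variables, which matches what a child process such as git annex would actually receive.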

There is support for the almighty rclone, through a third-party provider. This opens up git annex to most cloud providers one can find, with more being added to rclone as people need them.

  • First rclone needs to be installed
  • Secondly the provider (link above) needs to be installed and in PATH
  • Thirdly, set up the provider in rclone using rclone's configuration

# The basics. The rest of the flags are typically service dependent
git annex initremote my_remote type=external externaltype=rclone encryption=hybrid ....... 
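
The prerequisites listed above can be checked before running initremote. A rough sketch, assuming the third-party provider installs an executable named git-annex-remote-rclone (an assumption, check the provider's own docs), and using a hypothetical helper called missing_tools:

```shell
# Hypothetical helper: print each given program that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# git-annex-remote-rclone is the assumed name of the provider's executable.
missing=$(missing_tools git-annex rclone git-annex-remote-rclone)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "not found on PATH:" $missing >&2
else
    # Show the remotes already configured in rclone.
    rclone listremotes
fi
```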

Often you want a large managed / tracked store of files: movies, photos, music, documents. But you might also wish to interoperate with another device that knows nothing of git annex, for example importing images from a camera, or exporting music to a player.

Git annex supports exporting to and importing from special remotes as "raw" file trees for this purpose. Many special remote types can be used this way. In this example a directory special remote is used.

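# A directory remote also needs directory= and encryption=. /path/to/export is a placeholder. Exported trees are stored unencrypted, hence encryption=none.
git annex initremote my_remote type=directory directory=/path/to/export encryption=none exporttree=yes

# We can choose to export a branch and also a particular folder if we like
git annex export master:path/to/files --to my_remote

# Import tree is set up the same way, here on a second remote (the name and path are placeholders). Note that import and export can both be enabled on a single remote. But have a think about which direction you want changes to actually travel in.
git annex initremote my_other_remote type=directory directory=/path/to/import encryption=none importtree=yes

# We can choose which branch, and which folder within it, the files are imported into
git annex import master:path/to/files --from my_other_remote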

# Additionally exported/imported remotes can be set up for auto export/import via sync, so they stay up to date.
git config remote.my_remote.annex-tracking-branch master
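
With a tracking branch configured, a content sync is what actually carries out the export/import. A small sketch wrapping the two steps, assuming a hypothetical helper name (track_and_sync) and the remote and branch names used above:

```shell
# Hypothetical helper: point an export/import remote at a branch,
# then run a content sync so the tree is brought up to date.
track_and_sync() {
    remote=$1
    branch=$2
    git config "remote.$remote.annex-tracking-branch" "$branch" &&
    git annex sync --content
}

# Example call (commented out so nothing runs on paste):
# track_and_sync my_remote master
```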

Marking repo preferences

Each repository / location can have its own preferences for files, which are automatically propagated when a sync is performed. This is incredibly powerful, as it both takes the hassle out of deciding which files go where and helps document what a location is intended for.

Options available: https://git-annex.branchable.com/git-annex-preferred-content/

Some examples (the . means this repo here, replaceable with another repo name if setting that up):

# To have the repo take every single file it can. Useful for a single cloud backup that wants to hoover all files into itself.
git annex wanted . 'anything'

# To have the repo always want images, and otherwise keep manually added files
git annex wanted . 'present or include=*.jpg'
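
Before trusting an expression, it can be worth previewing what it matches. A sketch using the --want-get and --want-drop matching options described in the preferred content docs; the helper name preview_wanted is made up for this example:

```shell
# Hypothetical helper: show this repo's preferred content expression
# and preview what get --auto / drop --auto would act on.
preview_wanted() {
    echo "expression:"
    git annex wanted .
    echo "would get:"
    git annex find --want-get --not --in .
    echo "would drop:"
    git annex find --want-drop --in .
}

# Example call (commented out so nothing runs on paste):
# preview_wanted
```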

Repos can also be added to groups, to utilize in expressions. For example:

# Add aws remote to a new group named "cloud". We can put other cloud providers here later if we desire
git annex group aws cloud

# Now create an expression that says aws wants a file, but only if it is not already in the cloud with at least one copy (share the load)
git annex wanted aws 'not copies=cloud:1'

Another feature is automatic flagging of files that should be in annex vs regular git. For example if you wanted to use git for regular files, but ensure jpg files are always stored in git annex (even when using regular git commands!).

Create a new .gitattributes file (or append to an existing one). Each line starts with the regular glob that git uses, followed by a content expression.

*.jpg annex.largefiles=anything

# Or perhaps any file over a certain size
* annex.largefiles=largerthan=100kb

Git configuration can also be used for this purpose.
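
A minimal sketch of the git config route, run in a throwaway repository so it does not touch real settings. The expression is only an illustration. Unlike .gitattributes, which is committed and travels with the repo, a plain git config value stays local to the clone it is set in.

```shell
# Work in a throwaway repository so the example leaves real config alone.
demo_repo=$(mktemp -d)
git -C "$demo_repo" init --quiet

# Same idea as the .gitattributes lines, but stored in this clone's git config.
git -C "$demo_repo" config annex.largefiles 'largerthan=100kb or include=*.jpg'

# Read the setting back to confirm what was stored.
git -C "$demo_repo" config --get annex.largefiles
```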

Working with content

Adding, Editing, Removing files

TODO: get, drop, sync, copy, (un)lock

Queries

TODO: whereis, info

Automating

TODO: watch, assistant, webapp

Metadata

TODO: tagging, views
