How to download Git LFS files

How to retrieve Git LFS files from GitHub

Retrieving non-LFS files

Through the GitHub API, individual files in a Git repository can be retrieved, e.g. with curl. First, retrieve the content information for the relevant file (or folder):

curl https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}

For private repositories, authenticate with your username and a personal access token:

curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}
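For scripting, the same contents request can be prepared in Python with only the standard library. This is a minimal sketch, not part of the original how-to: `build_contents_request` is a hypothetical helper name, and the `Accept: application/vnd.github+json` header is GitHub's recommended media type, which the curl calls above do not send. Credentials are optional, so the same function covers public and private repositories; it only prepares the request, and `urllib.request.urlopen` sends it.

```python
import base64
import urllib.parse
import urllib.request


def build_contents_request(organisation, repository, path, username=None, token=None):
    """Prepare a GET request for the GitHub contents API (hypothetical helper)."""
    # Percent-encode the path but keep the '/' separators between folders
    quoted_path = urllib.parse.quote(path, safe="/")
    url = f"https://api.github.com/repos/{organisation}/{repository}/contents/{quoted_path}"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if username and token:
        # Equivalent of curl -u {username}:{token}: HTTP basic authentication
        credentials = base64.b64encode(f"{username}:{token}".encode()).decode()
        request.add_header("Authorization", f"Basic {credentials}")
    return request


# Usage (performs a network call):
# import json
# with urllib.request.urlopen(build_contents_request("org", "repo", "README.md")) as resp:
#     info = json.load(resp)
```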

This will return a JSON response:

{
  "name": "README.md",
  "path": "README.md",
  "sha": "41553899f901843f5339794256s2444ed351708a",
  "size": 815,
  "url": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
  "html_url": "https://github.com/{organisation}/{repository}/blob/main/README.md",
  "git_url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
  "download_url": "https://raw.githubusercontent.com/{organisation}/{repository}/main/README.md?token=AAL57UOYWVQ56ZZGDGWYUAK76WFNO",
  "type": "file",
  "_links": {
    "self": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
    "git": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
    "html": "https://github.com/{organisation}/{repository}/blob/main/README.md"
  }
}

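The file content can then be retrieved via the blob sha (the git_url in the response above). For public repositories, the -u option can be omitted:

curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a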

This gives another JSON response with the file contents in base64 encoding:

{
  "sha": "41553899f901843f5339794256s2444ed351708a",
  "node_id": "{node id}",
  "size": 815,
  "url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
  "content": "{base64 encoded content}",
  "encoding": "base64"
}

Note that for smaller files (up to 1 MB), the contents API already includes the base64-encoded content in the content field of the first response, so the second call is only needed for larger files.
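Whichever call returns the content, decoding it works the same way. A minimal Python sketch, assuming the JSON response has already been parsed into a dict; `decode_github_content` is a hypothetical helper name. It returns None when the content is missing, which lets a caller fall back to the blob request.

```python
import base64


def decode_github_content(response):
    """Return the decoded bytes of a contents/blob API response, or None if absent."""
    content = response.get("content")
    if not content or response.get("encoding") != "base64":
        # e.g. the contents API leaves the content empty for larger files
        return None
    # GitHub wraps the base64 text with newlines; b64decode discards them by default
    return base64.b64decode(content)
```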

Retrieving LFS files

Retrieving an LFS file requires a few extra steps. For LFS files, decoding the base64 string does not return the file's content but a pointer file in the following format:

version https://git-lfs.github.com/spec/v1
oid sha256:{sha}
size {filesize}
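Reading the sha and size out of the decoded pointer can be sketched in Python as follows. This is an illustration, not the author's code: `parse_lfs_pointer` is a hypothetical name, and it assumes the pointer's key/value lines separated by a single space as shown above, accepting only sha256 object ids.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into (oid, size). Hypothetical helper."""
    fields = {}
    for line in text.splitlines():
        if line:
            # each line has the form "<key> <value>"
            key, _, value = line.partition(" ")
            fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algorithm, _, oid = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return oid, int(fields["size"])
```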

Using this information, create a JSON object as follows, filling in the sha and file size from the previous step (the size is a number, not a string):

{
    "operation": "download",
    "transfer": ["basic"],
    "objects": [
        {"oid": "{sha}", "size": {size}}
    ]
}

Pass this object as the request body (-d) of a POST request to the LFS batch API. For private repositories, add -u {username}:{personal access token} as in the earlier requests:

curl -X POST \
-H "Accept: application/vnd.git-lfs+json" \
-H "Content-Type: application/vnd.git-lfs+json" \
-d '{"operation": "download", "transfer": ["basic"], "objects": [{"oid": "{sha}", "size": {size}}]}' \
https://github.com/{organisation}/{repository}.git/info/lfs/objects/batch
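The batch call can also be prepared in Python with the standard library. A minimal sketch, assuming the sha and size parsed from the pointer: `build_batch_request` is a hypothetical name, the body is serialised with `json.dumps` so brackets always match and size stays numeric, and both headers use the media type from the Git LFS batch API specification. For private repositories, pass a basic-auth header value built as in the contents request.

```python
import json
import urllib.request


def build_batch_request(organisation, repository, oid, size, auth_header=None):
    """Prepare the POST to the LFS batch endpoint (hypothetical helper)."""
    url = f"https://github.com/{organisation}/{repository}.git/info/lfs/objects/batch"
    body = {
        "operation": "download",
        "transfer": ["basic"],
        # size must be a JSON number, not a string
        "objects": [{"oid": oid, "size": int(size)}],
    }
    headers = {
        "Accept": "application/vnd.git-lfs+json",
        "Content-Type": "application/vnd.git-lfs+json",
    }
    if auth_header:
        # e.g. "Basic <base64 of username:token>" for private repositories
        headers["Authorization"] = auth_header
    return urllib.request.Request(url, data=json.dumps(body).encode(),
                                  headers=headers, method="POST")
```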

Almost there! This should return a JSON object that tells you where the file is stored:

{
  "objects": [
    {
      "oid": "{sha}",
      "size": {size},
      "actions": {
        "download": {
          "href": "https://github-cloud.s3.amazonaws.com/alambic/media/278163869/a2/42/{sha}?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXX%2F20210106%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210106T104409Z&X-Amz-Expires=3600&X-Amz-Signature=XXX&X-Amz-SignedHeaders=host&actor_id=XXX&key_id=0&repo_id=XXX&token=1",
          "expires_at": "2021-01-06T11:44:09Z",
          "expires_in": 3600
        }
      }
    }
  ]
}
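Picking the download location out of this response can be sketched as below. This is a hedged illustration with a hypothetical function name, assuming the response has been parsed into a dict. It covers two cases the Git LFS batch API allows besides the happy path shown above: a per-object error entry instead of actions, and an optional header object on the download action.

```python
def download_action(batch_response):
    """Return (href, headers, expires_in) for the first object's download action."""
    obj = batch_response["objects"][0]
    if "error" in obj:
        # the batch API reports per-object errors, e.g. a missing object
        error = obj["error"]
        raise RuntimeError(f"LFS error {error.get('code')}: {error.get('message')}")
    download = obj["actions"]["download"]
    # "header" is optional; when present, send those headers with the download
    return download["href"], download.get("header", {}), download.get("expires_in")
```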

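Download the file from the URL in the href attribute, e.g. with curl -o {filename} "{href}". The URL is pre-signed and only valid for expires_in seconds (until expires_at); if the download action also contains a header object, send those headers with the request.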
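Because the pointer provides both the SHA-256 oid and the size, the download can be checked before it is saved. A minimal sketch with hypothetical function names: it reads the whole object into memory, so very large files would need a streamed variant, and since the href expires, a delayed download needs a fresh batch request.

```python
import hashlib
import urllib.request


def verify_lfs_object(data, oid, size):
    """Check downloaded bytes against the pointer's sha256 oid and size."""
    return len(data) == size and hashlib.sha256(data).hexdigest() == oid


def download_lfs_object(href, headers, oid, size, destination):
    """Fetch the object from the pre-signed href and write it only if it verifies."""
    request = urllib.request.Request(href, headers=headers)
    with urllib.request.urlopen(request) as response:
        data = response.read()
    if not verify_lfs_object(data, oid, size):
        raise ValueError("downloaded data does not match the LFS pointer")
    with open(destination, "wb") as f:
        f.write(data)
```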


tljstewart commented Feb 19, 2022

Wow, thank you for putting this info together. GitHub needs a better solution. I'm just trying to download a repo as a zip, but I can't make any progress unless I do what you've outlined to actually retrieve the datasets :(


bauergeorg commented Apr 7, 2022

Florian, thanks a lot for your example!

fkraeutli (author) commented Apr 7, 2022

Glad it's useful!


bauergeorg commented Apr 8, 2022

@fkraeutli, do you know how to add or replace a Git LFS file on GitHub?


MrCsabaToth commented Jun 14, 2022

I'm trying to download a file from an LFS-enabled repo to fix a quota overflow. When I run the curl command, it returns "Cookies must be enabled to use GitHub." When I send the same POST request from ARC, it returns 422 Unprocessable Entity with the message "Your browser did something unexpected. Please try again. If the error continues, try disabling all browser extensions." I need help.


bauergeorg commented Aug 2, 2022

@fkraeutli and @MrCsabaToth
In the meantime I got it working: you have to verify your upload, that's all.

Here is an extract of my Python code using the PyGithub and requests modules.

import hashlib
import json
import os
from pprint import pprint

import requests

# Extract from a class method: self.org, self.token, repo (a PyGithub Repository)
# and source_file_path come from the surrounding code.

# The two helpers below were not part of the extract; these are assumed implementations.
def get_file_size(path):
    return os.path.getsize(path)

def get_file_hash(path):
    # Git LFS object ids are the SHA-256 hex digest of the file content
    sha = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            sha.update(chunk)
    return sha.hexdigest()

def check_status(res):
    if res.status_code != 200:
        print('status code: ' + str(res.status_code))
        print('text: ' + res.text)
        raise Exception("Status code error")

lfs_headers = {'Content-Type': 'application/vnd.git-lfs+json', 'Accept': 'application/vnd.git-lfs+json'}

# see: https://www.mattmoriarity.com/2019-04-25-uploading-media-with-git-lfs/#initiating-the-transfer

# check for locks
# see: https://github.com/git-lfs/git-lfs/blob/main/docs/api/locking.md
url = 'https://lfs.github.com/{}/{}/locks/verify'.format(self.org, repo.name)
ans0 = requests.post(url, headers=lfs_headers, auth=(self.token, ''))
check_status(ans0)
# lock handling is not implemented, so we expect no locked files at all
locks = ans0.json()
if locks.get('ours') or locks.get('theirs'):
    raise Exception("Lock handle error. Please contact your favorite developer to add lfs lock handling!")

# calculate the content of the new lfs pointer
new_size = get_file_size(source_file_path)
new_sha = get_file_hash(source_file_path)
# build the pointer from scratch: calling str.replace() on the old pointer can
# corrupt it when the old size digits also occur inside the old sha
new_lfs_pointer_content = 'version https://git-lfs.github.com/spec/v1\noid sha256:{}\nsize {}\n'.format(new_sha, new_size)

# get url to upload file to lfs
batch_request = {'operation': 'upload', 'transfer': ['basic'],
                 'objects': [{'oid': new_sha, 'size': new_size}]}
url = 'https://github.com/{}/{}.git/info/lfs/objects/batch'.format(self.org, repo.name)
res = requests.post(url, headers=lfs_headers, data=json.dumps(batch_request), auth=(self.token, ''))
check_status(res)
obj = res.json()['objects'][0]
pprint(obj)
if 'error' in obj:
    raise Exception("LFS error: " + str(obj['error']))
# no actions are returned if the server already has this object
actions = obj.get('actions', {})

if 'upload' in actions:
    # 'header' is optional in the batch response
    upload_header = dict(actions['upload'].get('header', {}))
    upload_header['Content-Type'] = 'application/octet-stream'
    # send/upload the raw (not encoded) file to git lfs
    with open(source_file_path, 'rb') as f:
        ans2 = requests.put(url=actions['upload']['href'], headers=upload_header, data=f.read())
    check_status(ans2)

if 'verify' in actions:
    # see: https://github.com/git-lfs/git-lfs/blob/main/docs/api/basic-transfers.md#verification
    verify_header = dict(actions['verify'].get('header', {}))
    verify_header['Content-Type'] = 'application/vnd.git-lfs+json'
    verify_data = json.dumps({'oid': new_sha, 'size': new_size})
    ans3 = requests.post(url=actions['verify']['href'], headers=verify_header, data=verify_data, auth=(self.token, ''))
    check_status(ans3)

# create blob with new content (lfs pointer)
blob = repo.create_git_blob(content=new_lfs_pointer_content, encoding='utf-8')

Greetings, and thanks a lot for your posts.
