@shubham4443
Last active August 28, 2023 14:12
GSoC 2023 at CNCF & TUF - Final Report by Shubham Nazare

Google Summer of Code 2023 at CNCF & TUF
by Shubham Nazare

1. Introduction

1.1. About The Update Framework (TUF)

The Update Framework (TUF) is an open-source framework for securely distributing software updates. It provides a set of tools and protocols for delivering updates to end users, and aims to be a secure and robust solution to the update distribution problem, which makes it valuable to developers and organizations that need to ship software updates securely and efficiently.

1.2. Problem Statement

Issue: theupdateframework/python-tuf#2325

The TUF specification gives explicit, unambiguous requirements for how artifacts are hashed and later verified to guarantee their integrity. However, content-addressable systems such as Git, IPFS (InterPlanetary File System) and OSTree have their own mechanisms for ensuring artifact integrity. When TUF is used with these systems, it is redundant for TUF to verify artifact integrity as well.

1.3. Proposed Solution

One solution is to delegate artifact integrity verification to the content-addressable system itself, while still using TUF to manage the metadata and provide its other security measures. This avoids the redundancy and reduces the overhead and complexity of the update process. It also lets organizations leverage the existing mechanisms of content-addressable systems, which are often optimized for specific use cases and can provide better performance and scalability than generic solutions.

IPFS is one such content-addressable system: its artifact integrity capabilities can be complemented by all of TUF's other features. In IPFS, each file is given a content identifier (CID) derived from the contents of the file. A CID contains a hash of the content along with other information, such as the codec, so the hash acts as a fingerprint of the file. A user requests a file by its CID, and integrity is checked by comparing the requested hash with the hash of the received data. If the hashes match, the file was received exactly as published and has not been modified.
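The comparison described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how IPFS computes identifiers: it uses a bare SHA-256 hex digest as the content address, whereas a real CID also encodes a version, a codec and a multihash prefix, and is derived from IPFS's own block structure rather than from the raw file bytes. The function names are hypothetical.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a simplified content address: the SHA-256 hex digest of the data.

    Assumption: a real CID carries more than a bare digest (version, codec,
    multihash prefix); this stand-in keeps only the hash part.
    """
    return hashlib.sha256(data).hexdigest()


def matches_address(requested: str, received: bytes) -> bool:
    """Check that the received bytes hash to the address that was requested."""
    return content_address(received) == requested


original = b"release-1.0 contents"
address = content_address(original)

print(matches_address(address, original))              # True
print(matches_address(address, b"tampered contents"))  # False
```

Because the address is derived from the content, any change to the bytes produces a different address, so a mismatch is detectable without consulting any other source.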

2. Implementation

Final Project: https://github.com/theupdateframework/tap19-ipfs-poc

2.1. Client Workflow

The prototype uses IPFS as the storage backend for TUF target files. Artifacts are stored in IPFS content-addressable storage (CAS) instead of a traditional file system, which requires modifying the TUF client to interact with IPFS APIs when retrieving artifacts. The client workflow in the specification stays the same except for the step where the client verifies the downloaded target's hash against the target's metadata. That step is delegated to IPFS: the client uses the CID, which contains the hash, to locate and retrieve the file, and IPFS only returns a file that matches this hash. Integrity is verified as part of retrieval, so the TUF client does not repeat the verification at download time, which saves time and computation.
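The difference between the two download paths can be sketched as follows. This is a toy model under stated assumptions, not the prototype's code: `ToyContentStore` is a hypothetical in-memory stand-in for a content-addressable system, a SHA-256 hex digest stands in for a CID, and `download_traditional` and `download_content_addressed` are illustrative names.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when bytes do not match the expected hash."""


def download_traditional(fetch, path: str, expected_sha256: str) -> bytes:
    """Traditional flow: fetch by path, then verify against the hash from metadata."""
    data = fetch(path)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise IntegrityError(f"hash mismatch for {path}")
    return data


class ToyContentStore:
    """Hypothetical stand-in for a content-addressable store such as IPFS."""

    def __init__(self) -> None:
        self._blocks = {}  # maps address (hex digest) to stored bytes

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # The store itself checks the bytes against the requested address.
        if hashlib.sha256(data).hexdigest() != address:
            raise IntegrityError(f"corrupted block {address}")
        return data


def download_content_addressed(store: ToyContentStore, address: str) -> bytes:
    """Content-addressed flow: the client performs no separate hash check."""
    return store.get(address)


store = ToyContentStore()
address = store.put(b"artifact bytes")

# Both flows return the same bytes; only the place where hashing happens differs.
via_path = download_traditional(lambda path: b"artifact bytes", "targets/artifact.bin", address)
via_address = download_content_addressed(store, address)
```

In the second flow the hash comparison still happens, but inside the storage layer, which is why repeating it in the client adds no protection against a modified file.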

The flowchart below compares how target files are downloaded from a traditional file system and from a content-addressable system.

2.2. IpfsUpdater API

python-tuf's Updater class implements the TUF client workflow for traditional target files. The prototype provides an API called IpfsUpdater, built on top of python-tuf's Updater, for IPFS target files. IpfsUpdater differs from Updater in how targets are downloaded: when TUF is used with IPFS, verifying artifact integrity in TUF is redundant because it happens implicitly when the file is retrieved from IPFS. IpfsUpdater has an IPFS gateway property, which it uses to download files over HTTP(S). It can be initialized as follows:

updater = IpfsUpdater(
    metadata_dir='./metadatas',
    metadata_base_url='https://example.com/metadatas/',
    gateway='http://localhost:8080/', # private gateway
    target_base_url='https://example.com/targets/',
    target_dir='./targets',
)

An example usage can be found in examples/client.py.
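To make the role of the gateway property more concrete, here is a minimal sketch of fetching content through a path-style IPFS gateway, where content is addressed as `/ipfs/<CID>` under the gateway's base URL. How IpfsUpdater builds its requests internally is not shown in this report, so the helper names `gateway_url` and `fetch_from_gateway` and the timeout value are assumptions made for illustration.

```python
import urllib.request


def gateway_url(gateway: str, cid: str) -> str:
    """Build the HTTP URL for a CID on a path-style IPFS gateway."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}"


def fetch_from_gateway(gateway: str, cid: str, timeout: float = 30.0) -> bytes:
    """Download the content behind a CID from the given gateway."""
    with urllib.request.urlopen(gateway_url(gateway, cid), timeout=timeout) as response:
        return response.read()


# "<cid>" is a placeholder, not a real identifier.
print(gateway_url("http://localhost:8080/", "<cid>"))
# http://localhost:8080/ipfs/<cid>
```

Calling `fetch_from_gateway` requires a reachable gateway, such as the local one used in the initialization example above.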

3. Project Timeline

3.1. Rejected Adapter Proposal

At the beginning of the development phase, we had a different implementation proposal based on the concept of adapters. The work on it can be found in the unmerged PR theupdateframework/python-tuf#2415. However, the community felt that this should not live within the python-tuf codebase (see the reasons in theupdateframework/python-tuf#2415 (comment)), and a separate standalone application for IPFS support in TUF seemed like a much better solution.

3.2. IpfsUpdater and example client

Work on the standalone application began immediately after the adapter proposal was rejected. As suggested by the python-tuf maintainer in theupdateframework/python-tuf#2415 (comment), we developed the IpfsUpdater class, a subclass of python-tuf's Updater with its own implementations of download_target and find_cached_target for IPFS-based target files. An example client was then created as a reference usage of IpfsUpdater.

3.3. Writing tests

Tests for IpfsUpdater can be found in theupdateframework/tuf-ipfs/tests. They simulate a basic repository with metadata and keys. A temporary directory is created to store the generated metadata of the simulated repository. All files in the test_files directory are uploaded to IPFS, and the generated CID of each file is used to create target entries in the simulated repository. The same files are then downloaded using IpfsUpdater, and the result of download_target is asserted against the expected destination path. After the tests finish, the code cleans up all generated metadata and downloaded targets.
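The overall shape of such a test (temporary directory, path assertion, cleanup) can be sketched with the standard library alone. This is not the project's test code: `fake_download_target` is a hypothetical stand-in that writes a file locally instead of contacting IPFS, and the file name is made up for the example.

```python
import tempfile
import unittest
from pathlib import Path


def fake_download_target(name: str, content: bytes, target_dir: str) -> str:
    """Hypothetical stand-in for a download call: write the file, return its path."""
    destination = Path(target_dir) / name
    destination.write_bytes(content)
    return str(destination)


class DownloadPathTest(unittest.TestCase):
    def setUp(self) -> None:
        # Temporary directory that holds everything the test generates.
        self.temp_dir = tempfile.TemporaryDirectory()
        # Remove the directory and its contents once the test finishes.
        self.addCleanup(self.temp_dir.cleanup)

    def test_download_returns_expected_path(self) -> None:
        expected = str(Path(self.temp_dir.name) / "file1.txt")
        result = fake_download_target("file1.txt", b"hello", self.temp_dir.name)
        # The returned path is compared with the expected destination path.
        self.assertEqual(result, expected)
        self.assertTrue(Path(result).is_file())
```

Saved as a module, this can be run with `python -m unittest`.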

3.4. Documentation

I spent the final days of the program on documentation and polishing the work. The README.md of the project repository serves as the documentation of the entire work.

4. Summary

It has been an amazing journey, and I learned a lot throughout the program, especially from my mentors Aditya Sirish A Yelgundhalli and John Ericson. The entire TUF community has been supportive, giving constant feedback and suggestions. A special shoutout to Google and CNCF for giving me this wonderful opportunity. I will continue contributing to this project and fine-tuning my work to make it more robust.
