Google Summer of Code 2023 at CNCF & TUF
by Shubham Nazare
The Update Framework (TUF) is an open-source software framework for secure software update distribution. TUF provides a set of tools and protocols for securely distributing software updates to end users. It aims to provide a secure and robust solution to the problem of software update distribution, making it a valuable tool for developers and organizations that need to distribute software updates securely and efficiently.
Issue: theupdateframework/python-tuf#2325
The TUF specification provides explicit, unambiguous guidelines for how artifacts should be hashed and later verified to guarantee their integrity. However, content-addressable systems such as Git, IPFS (InterPlanetary File System) and OSTree have their own mechanisms for ensuring artifact integrity. When TUF is used with these systems, it is redundant for TUF to also verify artifact integrity.
One solution to this problem is to delegate artifact integrity verification to the content-addressable system itself, while still using TUF to manage the metadata and provide its other security guarantees. This avoids the redundant check and reduces the overhead and complexity of the update process. It also lets organizations leverage the existing mechanisms of content-addressable systems, which are often optimized for specific use cases and can provide better performance and scalability than generic solutions.
IPFS is one such content-addressable system whose artifact integrity capabilities can be complemented by TUF's other features. In IPFS, each file is given a unique content identifier (CID), which is derived from the contents of the file. A CID contains a hash of the content along with other information such as the codec. The hash therefore acts as a fingerprint of the file. A user requests a file using its CID, so the integrity of the received file can be checked by comparing the hash in the requested CID with the hash of the received content. If the hashes match, the file was received exactly as published and has not been modified.
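The hash comparison described above can be sketched with the Python standard library alone. This is a minimal illustration, assuming the content fits in a single block stored with the "raw" codec and hashed with sha2-256; files added to IPFS in the usual way are chunked into a DAG, so their CID is not simply the hash of the file bytes. The function names are hypothetical and not part of any IPFS library.

```python
import base64
import hashlib

# Multiformats constants: CID version 1, "raw" codec, sha2-256 multihash.
CID_V1 = 0x01
RAW_CODEC = 0x55
SHA2_256 = 0x12


def raw_block_cid(data: bytes) -> str:
    """Build the base32 CIDv1 of a single raw block holding `data`."""
    digest = hashlib.sha256(data).digest()
    multihash = bytes([SHA2_256, len(digest)]) + digest
    cid_bytes = bytes([CID_V1, RAW_CODEC]) + multihash
    # Multibase prefix "b" marks lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def matches_cid(data: bytes, requested_cid: str) -> bool:
    """Return True if `data` hashes to the CID that was requested."""
    return raw_block_cid(data) == requested_cid


original = b"hello TUF"
cid = raw_block_cid(original)
print(cid)
print(matches_cid(original, cid))       # True: content is intact
print(matches_cid(b"hello TUF!", cid))  # False: content was modified
```

Because the identifier is derived from the content, any change to the bytes produces a different CID, which is why a separate hash check against TUF metadata adds nothing in this setting.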
Final Project: https://github.com/theupdateframework/tap19-ipfs-poc
The prototype uses IPFS as the storage backend for TUF target files. This means TUF artifacts are stored in IPFS content-addressable storage (CAS) instead of a traditional file system, which requires modifying the TUF client to interact with IPFS APIs and retrieve artifacts from IPFS. The client workflow in the specification remains the same except for the step where the client verifies the downloaded target's hash against the target's metadata. This task is delegated to IPFS. The client uses the CID, which contains the hash, to locate and retrieve the file, and IPFS only returns a file that matches this hash, so integrity is verified as part of retrieval. The TUF client therefore does not perform the verification at download time, saving time and computation.
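For comparison, the verification step that the prototype skips looks roughly like the sketch below. It is a standard-library illustration of checking a downloaded target against the length and hashes recorded in targets metadata, not python-tuf's actual code; the function name `check_target` and the sample `target_info` entry are hypothetical.

```python
import hashlib


def check_target(data: bytes, length: int, hashes: dict) -> None:
    """Raise ValueError unless `data` matches the recorded length and hashes."""
    if len(data) != length:
        raise ValueError(f"expected {length} bytes, got {len(data)}")
    for algorithm, expected in hashes.items():
        observed = hashlib.new(algorithm, data).hexdigest()
        if observed != expected:
            raise ValueError(f"{algorithm} hash mismatch")


# Hypothetical target entry, shaped like the "length" and "hashes"
# fields of TUF targets metadata.
artifact = b"example artifact"
target_info = {
    "length": len(artifact),
    "hashes": {"sha256": hashlib.sha256(artifact).hexdigest()},
}

# Passes silently for an unmodified artifact.
check_target(artifact, target_info["length"], target_info["hashes"])
```

With IPFS as the backend, the equivalent guarantee comes from the CID itself, so the client can omit this pass over the downloaded bytes.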
The flowchart below compares how target files are downloaded from a traditional file system and from a content-addressable system:
python-tuf's Updater class provides an implementation of the TUF client workflow for traditional target files. The prototype provides an API called IpfsUpdater, built on top of python-tuf's Updater, for IPFS target files. IpfsUpdater differs from Updater mainly in how it downloads targets. When TUF is used with IPFS, it is redundant for TUF to verify artifact integrity, because this is done implicitly while downloading files from IPFS. IpfsUpdater has an IPFS gateway property, which it uses to download files over HTTP(S). It can be initialized as follows:
updater = IpfsUpdater(
    metadata_dir='./metadatas',
    metadata_base_url='https://example.com/metadatas/',
    gateway='http://localhost:8080/',  # private gateway
    target_base_url='https://example.com/targets/',
    target_dir='./targets',
)
An example usage can be found in examples/client.py.
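To make the role of the gateway concrete, here is a minimal sketch of fetching content by CID over HTTP using only the standard library. It assumes a path-style gateway that serves content at `<gateway>/ipfs/<cid>`; the helper names are hypothetical and this is not the code IpfsUpdater uses internally.

```python
import urllib.request


def gateway_url(gateway: str, cid: str) -> str:
    """Build a path-style gateway URL of the form <gateway>/ipfs/<cid>."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}"


def fetch_from_gateway(gateway: str, cid: str, timeout: float = 30.0) -> bytes:
    """Download the content behind `cid` through an HTTP(S) IPFS gateway."""
    with urllib.request.urlopen(gateway_url(gateway, cid), timeout=timeout) as response:
        return response.read()


# "<cid>" is a placeholder; a real CID would go here.
print(gateway_url("http://localhost:8080/", "<cid>"))
```

In this arrangement the client relies on the gateway to have matched the content against the CID, which is one reason a private or local gateway, as in the initialization above, is a natural fit.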
At the beginning of the project's development phase, we had another implementation proposal based on the concept of adapters. The work on this implementation can be found in this unmerged PR: theupdateframework/python-tuf#2415. However, the community felt this should not live within the python-tuf codebase (see the reasons here: theupdateframework/python-tuf#2415 (comment)), and a separate standalone application for IPFS support in TUF seemed like a much better solution.
Work on a separate standalone application began immediately after the adapter proposal was rejected. As suggested by the python-tuf maintainer here: theupdateframework/python-tuf#2415 (comment), we developed the IpfsUpdater class, a subclass of python-tuf's Updater with its own implementation of download_target and find_cached_target for IPFS-based target files. An example client was then created as a reference usage of IpfsUpdater.
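The subclassing pattern can be illustrated with stand-in classes. The sketch below is not the project's IpfsUpdater and does not use python-tuf: `BaseUpdater`, `GatewayUpdater`, the simplified method signatures, and the choice to treat an existing file named after the CID as a cache hit are all assumptions made for illustration.

```python
import os
import tempfile
from typing import Callable, Optional


class BaseUpdater:
    """Hypothetical stand-in for a TUF client; not python-tuf's Updater."""

    def __init__(self, target_dir: str):
        self.target_dir = target_dir

    def find_cached_target(self, target_path: str) -> Optional[str]:
        raise NotImplementedError

    def download_target(self, target_path: str) -> str:
        raise NotImplementedError


class GatewayUpdater(BaseUpdater):
    """Overrides only the target handling; treats the target path as a CID."""

    def __init__(self, target_dir: str, fetch: Callable[[str], bytes]):
        super().__init__(target_dir)
        self._fetch = fetch  # callable that returns the bytes behind a CID

    def _local_path(self, cid: str) -> str:
        return os.path.join(self.target_dir, cid)

    def find_cached_target(self, cid: str) -> Optional[str]:
        # Assumption: a file already stored under the CID counts as cached.
        path = self._local_path(cid)
        return path if os.path.exists(path) else None

    def download_target(self, cid: str) -> str:
        # No hash comparison here: that check is left to the IPFS side.
        data = self._fetch(cid)
        path = self._local_path(cid)
        with open(path, "wb") as f:
            f.write(data)
        return path


with tempfile.TemporaryDirectory() as target_dir:
    fake_store = {"example-cid": b"artifact bytes"}  # stands in for IPFS
    updater = GatewayUpdater(target_dir, fetch=fake_store.__getitem__)
    print(updater.find_cached_target("example-cid"))  # nothing cached yet
    path = updater.download_target("example-cid")
    print(updater.find_cached_target("example-cid") == path)
```

Keeping the override limited to these two methods means the metadata refresh and verification logic of the parent class stays untouched, which matches the intent of delegating only artifact integrity to IPFS.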
Tests of IpfsUpdater can be found in theupdateframework/tuf-ipfs/tests. They simulate a basic repository with metadata and keys. A temporary directory is created to store the generated metadata of the simulated repository. All the files in the test_files directory are uploaded to IPFS, and the generated CID of each file is used to create target entries in the simulated repository. The same files are then downloaded using IpfsUpdater, and the result of download_target is asserted against the expected destination path. The tests clean up all generated metadata and downloaded targets when they finish.
I spent the final days of the program on documentation and polishing the work. The README.md of the project repository serves as the documentation of the entire work.
It has been an amazing journey, and I learned a lot throughout the program, especially from my mentors Aditya Sirish A Yelgundhalli and John Ericson. The entire TUF community has been supportive, giving constant feedback and suggestions. A special shoutout to Google and CNCF for giving me this wonderful opportunity. I will continue contributing to this project and fine-tuning my work to make it more robust.