@pgressa
Last active August 29, 2023 10:51

Cloud Object Storage Abstraction Layer

Q: Shouldn't we name it something like Cloud Agnostic Object Storage Interface?

Object storage is one of the fundamental services provided by cloud providers. Objects are generally stored in a tree-like directory structure, similar to a file system. Objects are consumed either via an API or, in most cases, via their unique HTTP URL or a cloud-specific internal URI (see Appendix 1).

Motivation

This initiative addresses a growing need to support hybrid cloud applications by allowing developers to transparently access and manipulate objects from within applications running in various cloud environments.

Goal

The goal of the object storage abstraction layer is to create an object storage interface that is independent of the object storage provider. The abstraction layer hides the provider's object storage specifics and lets the developer interact with the object storage in the same way, regardless of whether the subject of interaction is any of the supported object storage services (AWS S3, Google Cloud Storage, Azure Blob Storage, Oracle Cloud Object Storage) or local storage.

Use cases

The general use cases for manipulating objects on an object storage.

Put object

Upload objects to the object storage.

Example:

# https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 cp micronaut-buffer-netty-3.1.1.pom s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom

# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_upload
az storage blob upload --container-name micronaut-container --file micronaut-buffer-netty-3.1.1.pom --sas ... 

# https://cloud.google.com/storage/docs/uploading-objects
gsutil cp OBJECT_LOCATION gs://DESTINATION_BUCKET_NAME/

# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/put.html
oci os object put --bucket-name $bucket_name --file $file

Get object

Retrieve an object from the object storage.

Example:

# https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 cp s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom micronaut-buffer-netty-3.1.1.pom

# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_download
az storage blob download --container-name <The container name> --file <Path of file to write out to> --name <The blob name>

# https://cloud.google.com/storage/docs/downloading-objects#gsutil
gsutil cp gs://BUCKET_NAME/OBJECT_NAME SAVE_TO_LOCATION

# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/get.html
oci os object get --bucket-name $bucket_name --file $file --name $name

Update object

Update an already existing object on the object storage. Depending on the object storage policy, this operation may fail.

Example:

# https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 cp micronaut-buffer-netty-3.1.1.pom s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom

# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_upload
az storage blob upload --container-name micronaut-container --file micronaut-buffer-netty-3.1.1.pom --sas $sas

# https://cloud.google.com/storage/docs/uploading-objects
gsutil cp OBJECT_LOCATION gs://DESTINATION_BUCKET_NAME/

# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/put.html
oci os object put --bucket-name $bucket_name --file $file

Copy objects

Copy objects within one object storage or between object storages.

Example:

# https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 cp s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom s3://micronaut-object-storage/io/micronaut-buffer-netty-3.1.1.pom

# https://docs.microsoft.com/en-us/cli/azure/storage/blob/copy?view=azure-cli-latest#az_storage_blob_copy_start
az storage blob copy start --account-name MyAccount --destination-blob MyDestinationBlob --destination-container MyDestinationContainer --sas-token $sas --source-uri https://storage.blob.core.windows.net/photos

# https://cloud.google.com/storage/docs/copying-renaming-moving-objects
gsutil cp gs://SOURCE_BUCKET_NAME/SOURCE_OBJECT_NAME gs://DESTINATION_BUCKET_NAME/NAME_OF_COPY

# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/copy.html
oci os object copy --bucket-name $bucket_name --destination-bucket $destination_bucket --source-object-name $source_object_name

Delete object

Delete an object from the object storage. Depending on the object storage policy, this operation may fail.

Example:

# https://docs.aws.amazon.com/cli/latest/reference/s3/rm.html
aws s3 rm s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom

# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_delete
az storage blob delete -c micronaut-container -n MyBlob --account-name mystorageaccount

# https://cloud.google.com/storage/docs/deleting-objects
gsutil rm gs://BUCKET_NAME/OBJECT_NAME

# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/delete.html
oci os object delete --bucket-name $bucket_name --object-name $object_name

List objects

List the objects on the object storage.

Example:

# https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
aws s3 ls s3://micronaut-object-storage

# https://docs.microsoft.com/en-us/cli/azure/storage/blob/directory?view=azure-cli-latest#az_storage_blob_directory_list
az storage blob directory list -c MyContainer -d DestinationDirectoryPath --account-name MyStorageAccount

# https://cloud.google.com/storage/docs/listing-objects
gsutil ls -r gs://BUCKET_NAME/** 

# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/list.html
oci os object list --bucket-name $bucket_name

Sync objects

Sync directories between object storages or with a local directory.

Example:

# https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
aws s3 sync local-dir s3://micronaut-object-storage

# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_sync
az storage blob sync -c mycontainer --account-name mystorageccount --account-key 00000000 -s "path/to/directory"

# https://cloud.google.com/storage/docs/gsutil/commands/rsync
gsutil rsync data gs://mybucket/data

# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/sync.html
oci os object sync --bn backup --src-dir .

Design

The design consists of three parts:

  1. The low level API - the core of the abstraction layer
  2. The injection - how to work with the abstraction layer
  3. The extensions - various extensions that ease interaction with the object storage abstraction layer

Low Level API

This section describes the main building blocks of the Object Storage abstraction layer.

Object

Every object is uniquely identified by its path within the bucket. Additionally, objects have metadata associated with them. The metadata represents various object properties (like content type, tags, cache control, ...) and is provided when creating or updating the object on the object storage. See Appendix 3 for the object metadata generally provided by cloud providers.

For the sake of the object-oriented approach, an object on the object storage has its own representation:

interface ObjectStorageObject {
    /**
     * The object name. For example {@code picture.jpg}
     * @return object name
     */
    String getName();

    /**
     * The object path on object storage. For example {@code /path/to}
     *
     * @return object path or empty string if the object is placed at the root of bucket
     */
    String getPath();

    /**
     * The object absolute path. For example {@code /path/to/picture.jpg}
     * @return absolute path
     */
    String getAbsolutePath();

    /**
     * The object metadata.
     *
     * @return map of object metadata
     */
    Map<ObjectMeta, String> getMetadata();

    /**
     * The object content.
     *
     * @return object content.
     */
    InputStream getInputStream();
}

Object storage

Note: In general, the ObjectStorage is understood as the service rather than the storage itself. Every object storage service has some internal architecture. The best-known and most widely used name for the place where objects are stored is the bucket. The bucket is used by AWS, Google Cloud and Oracle Cloud. However, how the bucket fits into the internal architecture differs. For AWS and Google Cloud the bucket name is globally unique. For Oracle Cloud the uniqueness is scoped to the namespace (tenant). Azure, on the other hand, uses a container instead of a bucket, and the uniqueness is scoped to the storage account. Because of that, the noun ObjectStorage is IMHO the best representation of the main interaction with the service itself, as it abstracts all the cloud providers' specifics and yet is not tied to any cloud provider's nomenclature.

The ObjectStorage interface provides the common operations on objects of the logical storage. Such an abstraction allows specific ObjectStorage configurations (like authentication, ACL policies, tags or hooks) with respect to the implemented adapter for the given provider. For example, if the application has two object storages configured, then two ObjectStorage beans are created.

The location of an object on the logical storage is represented as a file path in a directory structure, without any object storage provider specifics. For example, the object on AWS S3 identified by the S3 URI s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom is represented as micronaut-buffer-netty-3.1.1.pom. This approach lets the developer work with objects independently, leaving the object storage specifics to be handled by the respective implementation of the ObjectStorage interface.
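As an illustration, the mapping from a provider-specific URI to the agnostic path could look like the following sketch. The class and method names here are assumptions for demonstration only, not part of the proposal:

```java
// Hypothetical helper: strips the provider scheme and bucket from a
// provider-specific URI, leaving only the agnostic object path.
public class AgnosticPath {

    /** E.g. s3://bucket/a/b -> a/b; a plain path is returned unchanged. */
    public static String fromProviderUri(String uri) {
        int schemeEnd = uri.indexOf("://");
        if (schemeEnd < 0) {
            return uri; // already an agnostic path
        }
        // skip the bucket/container name that follows the scheme
        int pathStart = uri.indexOf('/', schemeEnd + 3);
        return pathStart < 0 ? "" : uri.substring(pathStart + 1);
    }
}
```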

The ObjectStorage API mixes Java native type methods with the object-oriented approach. The Java native type methods are meant for quick and easy interaction.

interface ObjectStorage {

    /**
     * Upload the object to the object storage.
     *
     * @param objectPath the object path
     * @param inputStream the object content
     * @param metadata the object metadata
     * @throws ObjectStorageException if there was a failure to store the object
     */
    void put(String objectPath, InputStream inputStream, Map<ObjectMeta, String> metadata) throws ObjectStorageException;

    /**
     * Upload the object to the object storage.
     *
     * @param object the object
     * @throws ObjectStorageException if there was a failure to store the object
     */
    void put(ObjectStorageObject object) throws ObjectStorageException;

    /**
     * Get the object content from the object storage.
     *
     * @param objectPath the object path
     * @return the object content as an input stream, or null if the object does not exist
     * @throws ObjectStorageException if there was a failure to retrieve the object
     */
    InputStream getInputStream(String objectPath) throws ObjectStorageException;

    /**
     * Get the object from the object storage.
     *
     * @param objectPath the object path
     * @return the object, or null if the object does not exist
     * @throws ObjectStorageException if there was a failure to retrieve the object
     */
    ObjectStorageObject get(String objectPath) throws ObjectStorageException;

    /**
     * Copy the object within the scope of the object storage.
     *
     * @param objectSourcePath the object source path
     * @param objectTargetPath the object target path
     * @throws ObjectStorageException if there was a failure to copy the object
     */
    void copy(String objectSourcePath, String objectTargetPath) throws ObjectStorageException;

    /**
     * Delete the object.
     *
     * @param objectName the object name in format {@code /foo/bar/file}
     * @throws ObjectStorageException if there was a failure to delete the object
     */
    void delete(String objectName) throws ObjectStorageException;

    /**
     * List the objects filtered by path.
     *
     * @implNote the implementation uses paging if possible
     * @apiNote this call may lead to 1 + N requests
     */
    Iterable<ObjectStorageObject> list(String path) throws ObjectStorageException;

    /**
     * List the object absolute paths filtered by {@code path}. The list contains the files in format {@code /foo/bar/file}.
     *
     * @implNote the implementation uses paging if possible
     */
    Iterable<String> listPaths(String path) throws ObjectStorageException;

    /**
     * Sync the objects from {@code sourcePath} to {@code targetPath}. Each path can be a local path or a path on the object storage. A local path string carries the prefix {@code file://}.
     *
     * @param sourcePath the source path
     * @param targetPath the target path
     * @param recursive whether to recursively iterate over subdirectories in sourcePath
     */
    void sync(String sourcePath, String targetPath, boolean recursive) throws ObjectStorageException;
}
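To make the contract concrete, here is a minimal in-memory sketch of a few of the blocking operations. The class name, the reduced method set and the unchecked exception are illustrative assumptions; a real adapter would delegate to a provider SDK:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Map-backed sketch of the proposed blocking contract (metadata omitted).
public class InMemoryObjectStorage {

    private final Map<String, byte[]> objects = new HashMap<>();

    public void put(String objectPath, InputStream inputStream) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            inputStream.transferTo(out);
            objects.put(objectPath, out.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stands in for ObjectStorageException
        }
    }

    /** Returns the object content, or null if the object does not exist. */
    public InputStream get(String objectPath) {
        byte[] content = objects.get(objectPath);
        return content == null ? null : new ByteArrayInputStream(content);
    }

    public void copy(String sourcePath, String targetPath) {
        byte[] content = objects.get(sourcePath);
        if (content != null) {
            objects.put(targetPath, content.clone());
        }
    }

    public void delete(String objectPath) {
        objects.remove(objectPath);
    }

    /** Lists absolute paths under the given path prefix. */
    public List<String> list(String path) {
        List<String> result = new ArrayList<>();
        for (String key : objects.keySet()) {
            if (key.startsWith(path)) {
                result.add(key);
            }
        }
        return result;
    }
}
```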

Since Micronaut is primarily an asynchronously oriented framework, here is the reactive version:

interface ObjectStorageReactive {

    /**
     * Upload the object to the object storage.
     *
     * @param objectPath the object path
     * @param inputStream the object content
     * @param metadata the object metadata
     * @return a publisher that emits {@code true} on success and signals ObjectStorageException on failure
     */
    Publisher<Boolean> put(String objectPath, InputStream inputStream, Map<ObjectMeta, String> metadata);

    /**
     * Upload the object to the object storage.
     *
     * @param object the object
     * @return a publisher that emits {@code true} on success and signals ObjectStorageException on failure
     */
    Publisher<Boolean> put(ObjectStorageObject object);

    /**
     * Get the object content from the object storage.
     *
     * @param objectPath the object path
     * @return a publisher that emits the object content, completes empty if the object does not exist, and signals ObjectStorageException on failure
     */
    Publisher<InputStream> getInputStream(String objectPath);

    /**
     * Get the object from the object storage.
     *
     * @param objectPath the object path
     * @return a publisher that emits the object, completes empty if the object does not exist, and signals ObjectStorageException on failure
     */
    Publisher<ObjectStorageObject> get(String objectPath);

    /**
     * Copy the object within the scope of the object storage.
     *
     * @param objectSourcePath the object source path
     * @param objectTargetPath the object target path
     */
    Publisher<Boolean> copy(String objectSourcePath, String objectTargetPath);

    /**
     * Delete the object.
     *
     * @param objectName the object name in format {@code /foo/bar/file}
     */
    Publisher<Boolean> delete(String objectName);

    /**
     * List the objects filtered by path.
     *
     * @implNote the implementation uses paging if possible
     * @apiNote this call may lead to 1 + N requests
     */
    Publisher<ObjectStorageObject> list(String path);

    /**
     * List the object absolute paths filtered by {@code path}. The list contains the files in format {@code /foo/bar/file}.
     *
     * @implNote the implementation uses paging if possible
     */
    Publisher<String> listPaths(String path);

    /**
     * Sync the objects from {@code sourcePath} to {@code targetPath}. Each path can be a local path or a path on the object storage. A local path string carries the prefix {@code file://}.
     *
     * @param sourcePath the source path
     * @param targetPath the target path
     * @param recursive whether to recursively iterate over subdirectories in sourcePath
     */
    Publisher<Boolean> sync(String sourcePath, String targetPath, boolean recursive);
}

Q: Since the object locator is a path without object storage specifics, using String as the locator was the first choice. However, java.nio.file.Path could be a better suited option. This would also mean that ObjectStorageObject#getPath and ObjectStorageObject#getAbsolutePath would reflect that. See the extensions, where this is discussed further.

Example 1: Interaction using Java native types

ObjectStorage objectStorage = ...;
// upload the icon if it is not present yet
boolean iconExists = false;
for (ObjectStorageObject object : objectStorage.list("/public/www")) {
    if ("/public/www/icon.png".equals(object.getAbsolutePath())) {
        iconExists = true;
    }
}
if (!iconExists) {
    InputStream is = new FileInputStream("src/main/resources/icon.png");
    objectStorage.put("/public/www/icon.png", is, Map.of());
}

Configuration

The common configuration interface contains only the name of the object storage, e.g. for s3://micronaut-object-storage/ it is micronaut-object-storage. The rest of the properties are cloud provider specific:

public interface ObjectStorageConfiguration {

    /**
     * The name of the object storage.
     * @return object storage name
     */
    String getName();
} 

The "DSL" of configuration:

object-storage:
  <provider-name>:
    <object-storage-bean-name/object-storage-name>:
      <implementation details configuration>

Example for a hybrid cloud application:

Note the same object storage name in each environment.

application-ec2.yml

object-storage:
  aws:
    public-images:
      access-key-id: xxx
      secret-access-key: xxx

application-oraclecloud.yml:

object-storage:
  oracle-cloud:
    public-images:

application-azure.yml:

object-storage:
  azure:
    public-images:
      storage-account: xxx

application-gcp.yml:

object-storage:
  gcp:
    public-images:

Example of using multiple cloud providers:

application.yml:

object-storage:
  aws:
    public-images-on-aws:
      bucket-name: public-images
      access-key-id: xxx
      secret-access-key: xxx
  oracle-cloud:
    public-images-on-oracle-cloud:
      bucket-name: public-images
  azure:
    public-images-on-azure:
      container-name: public-images
      storage-account: xxx
  gcp:
    public-images-on-gcp:
      bucket-name: public-images

Configuration Alternative

The configuration can be merged into a flat structure, with the object storage implementation driven by a mandatory field provider.

The "DSL" of configuration:

object-storage:
  <object-storage-name>:
    provider: [azure|gcp|aws|oracle-cloud]
    <implementation details configuration>

Example of using multiple cloud providers:

object-storage:
  public-images-on-aws:
    bucket-name: public-images
    access-key-id: xxx
    secret-access-key: xxx
    provider: aws

  public-images-on-oracle-cloud:
    bucket-name: public-images
    provider: oracle-cloud

  public-images-on-azure:
    container-name: public-images
    storage-account: xxx
    provider: azure

  public-images-on-gcp:
    bucket-name: public-images
    provider: gcp

ObjectStorage Qualifier

The ObjectStorage qualifier is evaluated from the ObjectStorageConfiguration#getName property. The name is derived from the configuration in this order:

  • the explicit name property:
    object-storage:
      gcp:
        public-images-on-gcp:
          name: micronaut-object-storage
  • the bean qualifier, here micronaut-object-storage:
    object-storage:
      gcp:
        micronaut-object-storage:

The reason for this way of qualifier evaluation lies in the hybrid cloud use case, where it is not possible to have a unified bucket name across cloud providers. This is the case for AWS S3 and Google Cloud Storage, where buckets are globally uniquely identified.

Injection

For the configuration:

object-storage:
  gcp:
    micronaut-object-storage:

the injection using the @Named annotation looks like this:

import jakarta.inject.Named;

public class ImageService {

    public ImageService(@Named("micronaut-object-storage") ObjectStorage objectStorage){
        //..
    }

}

Extensions

Extension 1: ObjectStorage beans based on configured SDK authentication using @Named

Allows creating ObjectStorage beans using qualifiers if either:

  1. the cloud provider SDK is configured, or
  2. the configuration can be automatically deduced (~/.aws/, ~/.oci/).

Then, for example, if the OCI SDK is present and the credentials were automatically deduced:

import jakarta.inject.Named;

public class ImageService {

    public ImageService(@Named("micronaut-object-storage") ObjectStorage objectStorage){
        //..
    }

}

This will cause the ObjectStorage for Oracle Cloud Object Storage to be created for the bucket micronaut-object-storage, using the namespace and region evaluated from the OCI SDK.

The advantage is that there's no need to configure the object storage in application.yml.

Note that this is possible only when a single cloud provider ObjectStorage library is present on the classpath. If there were two supported ObjectStorage implementations, the internals would have no way to find out which cloud provider to use.

Extension 2: ObjectStorage beans based on configured SDK authentication using specialised bean qualifiers

This allows leveraging the automatic SDK evaluation even when there are multiple ObjectStorage implementations, by using specialised qualifiers like @AwsObjectStorage or @OciObjectStorage.

Then:

public class ImageService {

    public ImageService(
            @AwsObjectStorage("micronaut-object-storage") ObjectStorage awsObjectStorage,
            @OciObjectStorage("micronaut-object-storage") ObjectStorage ociObjectStorage
    ){
        //...
    }

}

This will cause the ObjectStorage annotated with @OciObjectStorage to be created for Oracle Cloud Object Storage for the bucket micronaut-object-storage, using the namespace and region evaluated from the OCI SDK. Similarly for @AwsObjectStorage.

Note that using cloud provider specific annotations breaks the cloud agnostic approach.

Extension 3: The ResourceLoader for quick access to objects using an agnostic URI

Implements the ResourceLoader in order to get objects using a shorter URI in a common format:

<object-storage-name>://path/to/file
  • <object-storage-name> is the ObjectStorage#getName

Then, for the configuration:

object-storage:
  aws:
    public-images:
      access-key-id: xxx
      secret-access-key: xxx

The locator would be: public-images://path/to/file
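A sketch of how such an agnostic locator could be split into the object storage name and the object path, here using java.net.URI. The helper class and method names are assumptions, not part of the proposal:

```java
import java.net.URI;

// Hypothetical helper: parses an agnostic locator such as
// public-images://path/to/file into its two parts.
public class AgnosticLocator {

    /** The <object-storage-name> part, parsed as the URI scheme. */
    public static String storageName(String locator) {
        return URI.create(locator).getScheme();
    }

    /** The path/to/file part: the URI host plus its path. */
    public static String objectPath(String locator) {
        URI uri = URI.create(locator);
        return uri.getHost() + uri.getPath();
    }
}
```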

Extension 4: The ResourceLoader for quick access to objects using a cloud specific URI

Implements the ResourceLoader in order to get objects using a shorter URI in a common format:

[s3|os|gs|azb]:([cloud-provider-specifics]:)*//<storage-name>/path/to/file

Where for the cloud providers:

  • AWS
    • format: s3://<bucket-name>/path/to/file
    • example: s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
  • Azure
    • format: azb:<storage-account-name>://<container>/path/to/file
    • example: azb:micronautpgressatest://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
  • Google Cloud
    • format: gs://<bucket-name>/path/to/file
    • example: gs://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
  • Oracle Cloud
    • format: os:<region>:<namespace>://<bucket-name>/path/to/file
    • example: os:us-ashburn-1:cloudnative-devrel://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
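A sketch of splitting the cloud specific locator format above into its prefix parts and the storage path. The helper names are assumptions; a real ResourceLoader would also validate the provider id against the supported set:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a cloud specific locator such as
// os:us-ashburn-1:cloudnative-devrel://bucket/path into its parts.
public class CloudLocator {

    /** Everything before "://" split on ':' (provider id plus specifics). */
    public static List<String> prefixParts(String locator) {
        String prefix = locator.substring(0, locator.indexOf("://"));
        return Arrays.asList(prefix.split(":"));
    }

    /** The <storage-name>/path/to/file part after "://". */
    public static String storageAndPath(String locator) {
        return locator.substring(locator.indexOf("://") + 3);
    }
}
```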

Extension 5: StreamingFileUpload using the agnostic API

The idea is to implement a cloud agnostic StreamingFileUpload where the transferTo(String) method would be used with a cloud agnostic locator:

<object-storage-name>://path/to/file

    public Publisher<HttpResponse<String>> upload(ObjectStorageStreamingFileUpload upload) {
        Publisher<Boolean> uploadPublisher =
                upload.transferTo("public-images://www/uploads/" + upload.getFilename());
        return Mono.from(uploadPublisher)
            .map(success -> {
                if (success) {
                    return HttpResponse.ok("Uploaded");
                } else {
                    return HttpResponse.<String>status(CONFLICT)
                                       .body("Upload Failed");
                }
            });
    }

Extension 6: StreamingFileUpload using a cloud specific URI

The same as above, but instead of the agnostic object storage locator, the cloud specific URI would be used.

Extension 7: java.nio extension

The idea is to implement java.nio.file.FileSystem (https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html) for the given cloud providers, leveraging the already existing beans etc.

There are projects that implement java.nio to some extent.
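A sketch of what client code could look like once such a FileSystemProvider implementation exists. For the sake of a runnable example it uses the default (local) file system; an object storage provider would make the same java.nio calls work against a bucket. The class name and the commented-out cloud URI are assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NioSketch {

    /** Writes, reads and deletes an "object" through the java.nio API. */
    public static String roundTrip() {
        try {
            Path dir = Files.createTempDirectory("object-storage-sketch");
            // With a cloud provider the path could instead come from e.g.
            // Path.of(URI.create("s3://micronaut-object-storage/greeting.txt"))
            Path object = dir.resolve("greeting.txt");
            Files.writeString(object, "hello");        // ~ ObjectStorage#put
            String content = Files.readString(object); // ~ ObjectStorage#get
            Files.delete(object);                      // ~ ObjectStorage#delete
            Files.delete(dir);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stands in for ObjectStorageException
        }
    }
}
```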

Appendix

Appendix 1: Object URL access

AWS

| Locator | Format | Example |
| --- | --- | --- |
| Object URL | https://<bucket-name>.s3.<region>.amazonaws.com/<object-name> | https://micronaut-object-storage.s3.eu-west-1.amazonaws.com/micronaut-buffer-netty-3.1.1.pom |
| S3 URI | s3://<bucket-name>/<object-name> | s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom |
| ARN | arn:aws:s3:::<bucket-name>/<object-name> | arn:aws:s3:::micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom |

Azure

| Locator | Format | Example |
| --- | --- | --- |
| Object URL | https://<storage-account-name>.blob.core.windows.net/<container>/<blob-name> | https://micronautpgressatest.blob.core.windows.net/micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom |

Google Cloud

| Locator | Format | Example |
| --- | --- | --- |
| Object URL | https://storage.cloud.google.com/<bucket-name>/<object-name> | https://storage.cloud.google.com/micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom |
| gsutil URI | gs://<bucket-name>/<object-name> | gs://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom |

Oracle Cloud

| Locator | Format | Example |
| --- | --- | --- |
| Object URL | https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket-name>/o/<object-name> | https://objectstorage.us-ashburn-1.oraclecloud.com/n/cloudnative-devrel/b/micronaut-object-storage/o/micronaut-buffer-netty-3.1.1.pom |

Appendix 2: Internal structure of object storage per provider

Even though the object storage use case is file manipulation, the internal complexity varies among cloud providers. For example, the logical nesting of an object in the service:

Azure

Storage account -> Container -> <Object>

Amazon Web Services

Bucket -> <Object>

Google Cloud

Bucket -> <Object>

Oracle Cloud

Namespace (Tenant) -> Bucket -> <Object>

Appendix 3: Object common properties

| Name | Description |
| --- | --- |
| cache-control | Specifies caching behavior along the request/reply chain. |
| content-type | Specifies an explicit content type for the object. This value overrides any guessed MIME types. |
| content-language | The language the content is in. |
| content-encoding | Specifies what content encodings have been applied to the object, and thus what decoding mechanisms must be applied to obtain the media type referenced by the Content-Type header field. |
| meta | A map of metadata to store with the object. |
| expires | The date and time at which the object is no longer cacheable. |