Title: Implement an Azure blobstore filesystem for Python SDK
JIRA Issue: BEAM-6807
Product: Apache Beam
Organization: The Apache Software Foundation
Student: Aldair Coronel Ruiz (aldaircr@ciencias.unam.mx)
Mentors: Pablo Estrada (pabloem@google.com) and Ismaël Mejía (iemejia@gmail.com)
Before this project, the Apache Beam Python SDK already supported Google Cloud Storage and Amazon Web Services S3. This project aimed to add support for the Azure Blob Storage filesystem to the Python SDK.
BEAM-6807 | PR #12492
In this pull request I managed to implement the following:
blobstoragefilesystem.py
Filesystem interface methods.
blobstoragefilesystem_test.py
Filesystem tests using mocks.
blobstorageio.py
Using the Python SDK provided by Azure Blob Storage, I implemented a client wrapper. The wrapper provides the operations that the client did not support directly but that were necessary to implement the methods of the filesystem interface.
blobstorageio_test.py
Tests for the parse_azfs_path method. This method gets the storage account, container and blob from a path of the form azfs://storage-account/container/blob.
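As an illustration of the path parsing described above, here is a minimal sketch of how an azfs:// path could be split into its three parts. It is not the code from the pull request: the regular expression, the error message and the decision to require a blob name are assumptions made for this example.

```python
import re

# Assumed layout: azfs://<storage-account>/<container>/<blob>
# The blob part may itself contain slashes (for example "dir/file.txt").
_AZFS_PATH = re.compile(r"^azfs://([^/]+)/([^/]+)/(.+)$")


def parse_azfs_path(path):
    """Return (storage_account, container, blob) for an azfs:// path.

    Raises ValueError when the path does not follow the assumed layout.
    """
    match = _AZFS_PATH.match(path)
    if match is None:
        raise ValueError(
            "Path must have the form azfs://<storage-account>/<container>/<blob>")
    return match.group(1), match.group(2), match.group(3)
```

For example, parse_azfs_path("azfs://storage-account/container/blob") returns the tuple ("storage-account", "container", "blob"), while a path without a blob name raises ValueError.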
PR #12734
In this pull request I managed to implement the following:
blobstoragefilesystem.py
The filesystem now receives pipeline options:
--azfs_connection_string
Connection string used for Azure/Azurite authentication.
--use_local_azurite
If set, Azurite will be used.
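To make the two options above concrete, here is a rough sketch of how such flags can be declared and read. It uses plain argparse so that it runs without Apache Beam installed. In Beam itself, options classes register their flags through an argparse-based hook, and the actual AzureFileSystemOptions code may differ. The helper names, the defaults and the choice of a boolean switch for the Azurite flag are assumptions made for this example.

```python
import argparse


def add_azure_filesystem_args(parser):
    """Register the two Azure filesystem flags on an argparse parser."""
    parser.add_argument(
        "--azfs_connection_string",
        default=None,
        help="Connection string used for Azure/Azurite authentication.")
    # Assumption: a boolean switch, since the option is described as "if set".
    parser.add_argument(
        "--use_local_azurite",
        action="store_true",
        default=False,
        help="If set, Azurite will be used.")


def parse_azure_options(argv):
    """Parse only the Azure flags and ignore any other pipeline arguments."""
    parser = argparse.ArgumentParser()
    add_azure_filesystem_args(parser)
    known, _unknown = parser.parse_known_args(argv)
    return known
```

Calling parse_azure_options(["--use_local_azurite"]) yields an object whose use_local_azurite attribute is True and whose azfs_connection_string attribute is None.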
pipeline_options.py
AzureFileSystemOptions class to obtain pipeline options.
sdks/java/io/azure/build.gradle
Added 4 tasks:
createAzuriteContainer
StartAzuriteContainer
StopAzuriteContainer
AzureLocalIT
Runs and stops Azurite.
blobstorageio.py
Now supports Azurite.
blobstorageio_test.py
A wide variety of unit tests that run against Azurite.
BEAM-10836 Implement a gradle task that runs Azure Blobstorage tests against Azurite.
BEAM-10815 Improve authentication story.
I am very happy with the experience and the results. These last 3 months have been full of new knowledge and have given me a better idea of what open source is. I thank my mentor Pablo for helping me with everything I needed.
Regarding the status of the project, I hope it will be useful for people who want to use Apache Beam with Azure Blob Storage. I will stay close by to make any needed improvements.
Thank you all :)