File transfer between the Google Colab VM and Google Drive
{
"cells": [
{
"metadata": {
"id": "sGK4tJRBlE1A",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "## Setting Up\n\n* Install the PyDrive module\n* Clone the colab_util gist and place `colab_util.py` in the working directory\n* Clone another git repository to use as example data"
},
{
"metadata": {
"id": "jI85-B3WiMvo",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 2
}
],
"base_uri": "https://localhost:8080/",
"height": 102
},
"outputId": "6030ecc3-8ded-4d64-b3c3-447a158850d3",
"executionInfo": {
"status": "ok",
"timestamp": 1522896019248,
"user_tz": 240,
"elapsed": 3977,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "# Uncomment the next two lines if PyDrive or the example repository is not yet available\n# !pip install -U -q PyDrive\n# !git clone https://github.com/Joshua1989/python_scientific_computing.git\n!git clone https://gist.github.com/dc7e60aa487430ea704a8cb3f2c5d6a6.git /tmp/colab_util_repo\n!mv /tmp/colab_util_repo/colab_util.py colab_util.py\n!rm -r /tmp/colab_util_repo",
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": "Cloning into '/tmp/colab_util_repo'...\nremote: Counting objects: 15, done.\u001b[K\nremote: Compressing objects: 100% (10/10), done.\u001b[K\nremote: Total 15 (delta 4), reused 0 (delta 0), pack-reused 0\u001b[K\nUnpacking objects: 100% (15/15), done.\n",
"name": "stdout"
}
]
},
{
"metadata": {
"id": "Rzr98u6xmJbL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "## Import colab_util and create a Google Drive handler\nYou need to copy the authorization code into the prompted input widget."
},
{
"metadata": {
"id": "mH24bepIwxl9",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"trusted": false
},
"cell_type": "code",
"source": "from colab_util import *\ndrive_handler = GoogleDriveHandler()",
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "yg9-uZIBmgwK",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "## Compress the target files into a `.tar.gz` archive"
},
{
"metadata": {
"id": "ZJMfvT3gnzt_",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Make a list of files to archive.\nFor this example, I will compress all notebooks with an odd lecture index."
},
{
"metadata": {
"id": "GDK9l-qcY_FJ",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 85
},
"outputId": "83839afe-c216-4848-e818-488c89650203",
"executionInfo": {
"status": "ok",
"timestamp": 1522896034569,
"user_tz": 240,
"elapsed": 332,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "import glob\nipynb_files = sorted(glob.glob('python_scientific_computing/notebooks/*.ipynb'))\n# Keep the notebooks whose two-digit lecture index is odd\narchived_files = [f for f in ipynb_files if int(f.split('/')[-1][:2]) % 2]\nfor f in archived_files:\n    print(f)",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "python_scientific_computing/notebooks/01_Customize_Your_Jupyter.ipynb\npython_scientific_computing/notebooks/03_Quick_Introduction_on_Python3.ipynb\npython_scientific_computing/notebooks/05_Object_Oriented_Programming.ipynb\npython_scientific_computing/notebooks/07_Random_Numbers.ipynb\n",
"name": "stdout"
}
]
},
{
"metadata": {
"id": "F3YMpQDSn4Pi",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "The paths of these 4 Jupyter notebooks share a long common prefix.\nIf you archived them directly, the archive would contain the nested folders `python_scientific_computing/notebooks/...`.\n\nI simply drop the longest common prefix; in this case, the 4 notebooks appear at the root of the archive.\n\n`create_archive` returns the path of the archive file.\nBy default, the archive is placed in `/tmp`; you can change this directory with the `temp_folder` argument.\n\nTo see the shell command and its output, set `verbose=True`; otherwise the function runs silently."
},
{
"metadata": {
"id": "7oWs3fYul4l7",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
},
{
"item_id": 2
}
],
"base_uri": "https://localhost:8080/",
"height": 190
},
"outputId": "618c619b-3b57-407a-df6a-8b2febd9395e",
"executionInfo": {
"status": "ok",
"timestamp": 1522896039570,
"user_tz": 240,
"elapsed": 573,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "tar_file_path = create_archive('sample_archive', local_file_paths=archived_files, verbose=True)\ntar_file_path",
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": "ignore the common prefix python_scientific_computing/notebooks/\nrunning shell command: \ntar -czvf /tmp/sample_archive.tar.gz -C python_scientific_computing/notebooks/ 01_Customize_Your_Jupyter.ipynb 03_Quick_Introduction_on_Python3.ipynb 05_Object_Oriented_Programming.ipynb 07_Random_Numbers.ipynb\n01_Customize_Your_Jupyter.ipynb\n03_Quick_Introduction_on_Python3.ipynb\n05_Object_Oriented_Programming.ipynb\n07_Random_Numbers.ipynb\n\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": "'/tmp/sample_archive.tar.gz'"
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"metadata": {
"id": "krZVgZd8pEAC",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "## Managing your Google Drive\n\nGoogle Drive treats every file and folder as a \"file\". If file A is inside folder B, the Drive API records this in the `parents` field of A.\nEvery file and folder can also be retrieved uniquely by its id.\nBecause ids are inconvenient to work with, my wrapper lets you access files with familiar directory-path syntax instead."
},
{
"metadata": {
"id": "I8lFlBgHp7zZ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Create the folder `test_folder` in your Google Drive root directory. The return value is the id of the created folder."
},
{
"metadata": {
"id": "PLU-L3IBo-zk",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "5837d4fd-d582-455b-d57e-cfa658625414",
"executionInfo": {
"status": "ok",
"timestamp": 1522896955968,
"user_tz": 240,
"elapsed": 1179,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "test_folder_id = drive_handler.create_folder('test_folder')\ntest_folder_id",
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "'1mUzJJ4upwVFyjN_EIeDHm_Lga1CPBnzX'"
},
"metadata": {
"tags": []
},
"execution_count": 23
}
]
},
{
"metadata": {
"id": "ZUNanfYjthco",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Create the sub-folder `test_sub_folder` under `test_folder`. The return value is the id of the created sub-folder."
},
{
"metadata": {
"id": "JESInwkLsbB6",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "3a65907f-0d82-494c-abcd-0b08d53697e7",
"executionInfo": {
"status": "ok",
"timestamp": 1522896959002,
"user_tz": 240,
"elapsed": 2079,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "test_subfolder_id = drive_handler.create_folder('test_sub_folder', parent_path='test_folder')\ntest_subfolder_id",
"execution_count": 24,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "'1R66NAmnA6OJMIqMMfXU7_hU2VrXNwaQ3'"
},
"metadata": {
"tags": []
},
"execution_count": 24
}
]
},
{
"metadata": {
"id": "FJS-NHzXt2j0",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "If the folder already exists, the function reports this and returns the id of the existing folder."
},
{
"metadata": {
"id": "gBw01V6csklC",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
},
{
"item_id": 2
}
],
"base_uri": "https://localhost:8080/",
"height": 51
},
"outputId": "3861f5fc-ddba-49f7-e9a1-ce0f0c33ee9a",
"executionInfo": {
"status": "ok",
"timestamp": 1522896960514,
"user_tz": 240,
"elapsed": 918,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "same_subfolder_id = drive_handler.create_folder('test_sub_folder', parent_path='test_folder')\nsame_subfolder_id",
"execution_count": 25,
"outputs": [
{
"output_type": "stream",
"text": "test_sub_folder already exists\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": "'1R66NAmnA6OJMIqMMfXU7_hU2VrXNwaQ3'"
},
"metadata": {
"tags": []
},
"execution_count": 25
}
]
},
{
"metadata": {
"id": "c4ml-BVWuju7",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Let's create more sub-folders."
},
{
"metadata": {
"id": "9tFLfBuNuR8W",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"trusted": false
},
"cell_type": "code",
"source": "test_subfolder2_id = drive_handler.create_folder('test_sub_folder2', parent_path='test_folder')\ntest_subsubfolder_id = drive_handler.create_folder('test_sub_sub_folder', parent_path='test_folder/test_sub_folder2')",
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "BzeFSaJ_u6Pc",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Get the id of a folder from its path"
},
{
"metadata": {
"id": "FJVOO6Goumc7",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "a436cbd5-4081-4c6f-838d-449316297d7e",
"executionInfo": {
"status": "ok",
"timestamp": 1522897053646,
"user_tz": 240,
"elapsed": 1125,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "ID = drive_handler.path_to_id('test_folder/test_sub_folder2/test_sub_sub_folder')\nID, test_subsubfolder_id",
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "('1Z6-g09N27bmsQ06RpbkoAWzm1-D4Gkkd', '1Z6-g09N27bmsQ06RpbkoAWzm1-D4Gkkd')"
},
"metadata": {
"tags": []
},
"execution_count": 27
}
]
},
{
"metadata": {
"id": "o7J680ZcvRE8",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "List all files under a folder given its id. By default the listing is not recursive; set `max_depth` (default 0) to descend into sub-folders."
},
{
"metadata": {
"id": "7EfVNEAIvOM2",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 153
},
"outputId": "4859b296-bc3c-40f9-b1d4-1d0e8de9c295",
"executionInfo": {
"status": "ok",
"timestamp": 1522897168272,
"user_tz": 240,
"elapsed": 445,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "drive_handler.list_folder(test_folder_id)",
"execution_count": 28,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "[{'id': '1DZuDCFMt3aqrU5r29GJ8p77z5f2YqxRk',\n 'link': 'https://drive.google.com/drive/folders/1DZuDCFMt3aqrU5r29GJ8p77z5f2YqxRk',\n 'mimeType': 'application/vnd.google-apps.folder',\n 'title': 'test_sub_folder2'},\n {'id': '1R66NAmnA6OJMIqMMfXU7_hU2VrXNwaQ3',\n 'link': 'https://drive.google.com/drive/folders/1R66NAmnA6OJMIqMMfXU7_hU2VrXNwaQ3',\n 'mimeType': 'application/vnd.google-apps.folder',\n 'title': 'test_sub_folder'}]"
},
"metadata": {
"tags": []
},
"execution_count": 28
}
]
},
{
"metadata": {
"id": "EOFjbqhLv4HA",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "With `max_depth=1`, `test_sub_folder2` gets an extra field `children`, which contains `test_sub_sub_folder`."
},
{
"metadata": {
"id": "5Bdq9iUAvqWj",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 238
},
"outputId": "e2f6ac2c-a5f3-4e2e-8637-234c97342ca9",
"executionInfo": {
"status": "ok",
"timestamp": 1522897212563,
"user_tz": 240,
"elapsed": 1083,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "drive_handler.list_folder(test_folder_id, max_depth=1)",
"execution_count": 30,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "[{'children': [{'id': '1Z6-g09N27bmsQ06RpbkoAWzm1-D4Gkkd',\n 'link': 'https://drive.google.com/drive/folders/1Z6-g09N27bmsQ06RpbkoAWzm1-D4Gkkd',\n 'mimeType': 'application/vnd.google-apps.folder',\n 'title': 'test_sub_sub_folder'}],\n 'id': '1DZuDCFMt3aqrU5r29GJ8p77z5f2YqxRk',\n 'link': 'https://drive.google.com/drive/folders/1DZuDCFMt3aqrU5r29GJ8p77z5f2YqxRk',\n 'mimeType': 'application/vnd.google-apps.folder',\n 'title': 'test_sub_folder2'},\n {'children': [],\n 'id': '1R66NAmnA6OJMIqMMfXU7_hU2VrXNwaQ3',\n 'link': 'https://drive.google.com/drive/folders/1R66NAmnA6OJMIqMMfXU7_hU2VrXNwaQ3',\n 'mimeType': 'application/vnd.google-apps.folder',\n 'title': 'test_sub_folder'}]"
},
"metadata": {
"tags": []
},
"execution_count": 30
}
]
},
{
"metadata": {
"id": "MA3QNvZAwFfe",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "## Upload files from the non-persistent Colab VM to Google Drive"
},
{
"metadata": {
"id": "9gSjqXhOwatk",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Let's upload the archive created above to `test_folder/test_sub_folder2`. Listing the folder afterwards confirms that it is in the correct place."
},
{
"metadata": {
"id": "7zXXpeiuvtxL",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 306
},
"outputId": "0ad1bc5d-fa96-4bc8-8e3d-f3cca07dbf63",
"executionInfo": {
"status": "ok",
"timestamp": 1522897359329,
"user_tz": 240,
"elapsed": 2373,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "drive_handler.upload(tar_file_path, parent_path='test_folder/test_sub_folder2')\ndrive_handler.list_folder(test_folder_id, max_depth=1)",
"execution_count": 31,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "[{'children': [{'id': '1BRZpCMQHhTDP2qOExHPFwYKNQCpB2sQ8',\n 'link': 'https://drive.google.com/file/d/1BRZpCMQHhTDP2qOExHPFwYKNQCpB2sQ8/view?usp=drivesdk',\n 'mimeType': 'application/x-tar',\n 'title': 'sample_archive.tar.gz'},\n {'id': '1Z6-g09N27bmsQ06RpbkoAWzm1-D4Gkkd',\n 'link': 'https://drive.google.com/drive/folders/1Z6-g09N27bmsQ06RpbkoAWzm1-D4Gkkd',\n 'mimeType': 'application/vnd.google-apps.folder',\n 'title': 'test_sub_sub_folder'}],\n 'id': '1DZuDCFMt3aqrU5r29GJ8p77z5f2YqxRk',\n 'link': 'https://drive.google.com/drive/folders/1DZuDCFMt3aqrU5r29GJ8p77z5f2YqxRk',\n 'mimeType': 'application/vnd.google-apps.folder',\n 'title': 'test_sub_folder2'},\n {'children': [],\n 'id': '1R66NAmnA6OJMIqMMfXU7_hU2VrXNwaQ3',\n 'link': 'https://drive.google.com/drive/folders/1R66NAmnA6OJMIqMMfXU7_hU2VrXNwaQ3',\n 'mimeType': 'application/vnd.google-apps.folder',\n 'title': 'test_sub_folder'}]"
},
"metadata": {
"tags": []
},
"execution_count": 31
}
]
},
{
"metadata": {
"id": "KQxwqvf2wqj9",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "## Download files from Google Drive to the non-persistent Colab VM"
},
{
"metadata": {
"id": "lNYVi1myxAn6",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Now let's download the archive to the Colab VM as `downloaded_archive.tar.gz`, i.e. into the current working directory under that name."
},
{
"metadata": {
"id": "gh9nzIrGwYho",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"trusted": false
},
"cell_type": "code",
"source": "drive_handler.download('downloaded_archive.tar.gz', target_path='test_folder/test_sub_folder2/sample_archive.tar.gz')",
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "mW8thm_oyUb_",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Now you can see the file `downloaded_archive.tar.gz` in the current working directory.\n\nWe can make a new directory to store the extracted archive."
},
{
"metadata": {
"id": "qQRZQf_Bx2r9",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 51
},
"outputId": "7b3ac387-fd45-4d76-d4ca-f09eca4582ad",
"executionInfo": {
"status": "ok",
"timestamp": 1522897958029,
"user_tz": 240,
"elapsed": 1364,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "ls",
"execution_count": 43,
"outputs": [
{
"output_type": "stream",
"text": "colab_util.py downloaded_archive.tar.gz \u001b[0m\u001b[01;34mpython_scientific_computing\u001b[0m/\r\n\u001b[01;34mdatalab\u001b[0m/ \u001b[01;34m__pycache__\u001b[0m/\r\n",
"name": "stdout"
}
]
},
{
"metadata": {
"id": "k3j3gTZ8ysFw",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"trusted": false
},
"cell_type": "code",
"source": "!mkdir downloeded_files",
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "T2QYs_LFyxPM",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 51
},
"outputId": "245a99ed-7495-4839-f473-4a64c8d4731a",
"executionInfo": {
"status": "ok",
"timestamp": 1522897985477,
"user_tz": 240,
"elapsed": 1444,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "ls",
"execution_count": 45,
"outputs": [
{
"output_type": "stream",
"text": "colab_util.py downloaded_archive.tar.gz \u001b[0m\u001b[01;34m__pycache__\u001b[0m/\r\n\u001b[01;34mdatalab\u001b[0m/ \u001b[01;34mdownloeded_files\u001b[0m/ \u001b[01;34mpython_scientific_computing\u001b[0m/\r\n",
"name": "stdout"
}
]
},
{
"metadata": {
"id": "Ixyb2-UeyzKA",
"colab_type": "text"
},
"cell_type": "markdown",
"source": "Finally, extract the archive into the folder `downloeded_files`."
},
{
"metadata": {
"id": "NssYJIcbyxnv",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"trusted": false
},
"cell_type": "code",
"source": "extract_archive('downloaded_archive.tar.gz', target_folder='downloeded_files')",
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "v5Euhw9ZzEJd",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"output_extras": [
{
"item_id": 1
}
],
"base_uri": "https://localhost:8080/",
"height": 51
},
"outputId": "36217e7e-f2f6-47ff-b94f-684618db2072",
"executionInfo": {
"status": "ok",
"timestamp": 1522898066813,
"user_tz": 240,
"elapsed": 1589,
"user": {
"displayName": "Joshua Lian",
"photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
"userId": "112640645742688411146"
}
},
"trusted": false
},
"cell_type": "code",
"source": "ls downloeded_files/",
"execution_count": 47,
"outputs": [
{
"output_type": "stream",
"text": "01_Customize_Your_Jupyter.ipynb 05_Object_Oriented_Programming.ipynb\r\n03_Quick_Introduction_on_Python3.ipynb 07_Random_Numbers.ipynb\r\n",
"name": "stdout"
}
]
},
{
"metadata": {
"id": "THUVqFPJzFcV",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"trusted": false
},
"cell_type": "code",
"source": "",
"execution_count": 0,
"outputs": []
}
],
"metadata": {
"colab": {
"name": "Example Notebook of colab_util.ipynb",
"version": "0.3.2",
"views": {},
"default_view": {},
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"varInspector": {
"window_display": false,
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"library": "var_list.py",
"delete_cmd_prefix": "del ",
"delete_cmd_postfix": "",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"library": "var_list.r",
"delete_cmd_prefix": "rm(",
"delete_cmd_postfix": ") ",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
]
},
"gist": {
"id": "",
"data": {
"description": "Colab Notebooks/Example Notebook of colab_util.ipynb",
"public": true
}
},
"language_info": {
"name": "python",
"version": "3.6.4",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
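The notebook above delegates archiving to `tar` through a shell command. For comparison, here is a minimal sketch of the same round trip using only Python's standard `tarfile` module, assuming that dropping the longest common directory is the desired behavior. `create_archive_sketch` and `extract_archive_sketch` are hypothetical names, not part of colab_util.py. One difference: `os.path.commonpath` compares whole path components, whereas colab_util compares characters and then trims back to the last `/`.

```python
import os
import tarfile


def create_archive_sketch(archive_path, file_paths):
    """Pack existing files into a .tar.gz, dropping their longest common directory."""
    existing = sorted({p for p in file_paths if os.path.exists(p)})
    if not existing:
        raise ValueError("no existing files to archive")
    # commonpath compares whole path components rather than characters
    if len(existing) == 1:
        base = os.path.dirname(existing[0])
    else:
        base = os.path.commonpath(existing)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in existing:
            # arcname is the name stored inside the archive, relative to base
            tar.add(path, arcname=os.path.relpath(path, base or "."))
    return archive_path


def extract_archive_sketch(archive_path, target_folder="."):
    """Extract a .tar.gz into target_folder and return the member names."""
    os.makedirs(target_folder, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Recent Python versions can reject unsafe members such as absolute paths
            tar.extractall(target_folder, filter="data")
        else:
            tar.extractall(target_folder)
        return tar.getnames()


# Hypothetical usage with the file list from the notebook:
# create_archive_sketch("/tmp/sample_archive.tar.gz", archived_files)
# extract_archive_sketch("/tmp/sample_archive.tar.gz", "downloeded_files")
```

Missing paths are skipped silently here, while colab_util prints a notice for each one.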
# !pip install -U -q PyDrive
import os
import subprocess
from pathlib import Path

from google.colab import auth
from oauth2client.client import GoogleCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

__all__ = [
    'create_archive',
    'extract_archive',
    'GoogleDriveHandler'
]


def create_archive(zip_name, local_file_paths, temp_folder='/tmp', verbose=False):
    # Place the archive in temp_folder, appending '.tar.gz' if it is missing
    if not zip_name.endswith('.tar.gz'):
        zip_name += '.tar.gz'
    zip_name = os.path.join(temp_folder, zip_name)
    # Filter out non-existing files and directories
    zipped_files = []
    for f in local_file_paths:
        if not Path(f).exists():
            print('file {0} does not exist, ignore it'.format(f))
        else:
            zipped_files.append(f)
    if not zipped_files:
        raise ValueError('no existing files to archive')
    # Find the common directory prefix to avoid deeply nested folders in the archive
    common_prefix = os.path.commonprefix(zipped_files)
    common_prefix = common_prefix[:common_prefix.rfind('/') + 1]
    # Execute the tar.gz compression; an argument list avoids shell-quoting problems
    L = len(common_prefix)
    cmd = ['tar', '-czvf', zip_name, '-C', common_prefix or '.'] + [f[L:] for f in zipped_files]
    if verbose:
        print('ignore the common prefix {0}'.format(common_prefix))
        print('running shell command:', '\n' + ' '.join(cmd))
    result = subprocess.check_output(cmd).decode('utf-8')
    if verbose:
        print(result)
    # Return the path of the tar.gz file
    return zip_name


def extract_archive(zip_path, target_folder='./', verbose=False):
    cmd = ['tar', '-xf', zip_path, '-C', target_folder]
    if verbose:
        print('running shell command:', '\n' + ' '.join(cmd))
    result = subprocess.check_output(cmd).decode('utf-8')
    if verbose:
        print(result)


class GoogleDriveHandler:
    def __init__(self):
        # Authenticate the Colab user and hand the credentials to PyDrive
        auth.authenticate_user()
        gauth = GoogleAuth()
        gauth.credentials = GoogleCredentials.get_application_default()
        self.drive = GoogleDrive(gauth)

    def path_to_id(self, rel_path, parent_folder_id='root'):
        # Resolve 'a/b/c' one component at a time, starting from parent_folder_id
        parts = [p for p in rel_path.split('/') if p]
        if not parts:
            return parent_folder_id
        first, *rest = parts
        file_dict = {f['title']: f for f in self.list_folder(parent_folder_id)}
        if first not in file_dict:
            raise Exception('{0} does not exist'.format(first))
        return self.path_to_id('/'.join(rest), file_dict[first]['id'])

    def list_folder(self, root_folder_id='root', max_depth=0):
        # List the direct children of a folder; recurse into sub-folders while max_depth > 0
        query = "'{0}' in parents and trashed=false".format(root_folder_id)
        file_list, folder_type = [], 'application/vnd.google-apps.folder'
        for f in self.drive.ListFile({'q': query}).GetList():
            entry = {
                'title': f['title'],
                'id': f['id'],
                'link': f['alternateLink'],
                'mimeType': f['mimeType']
            }
            if f['mimeType'] == folder_type and max_depth > 0:
                entry['children'] = self.list_folder(f['id'], max_depth - 1)
            file_list.append(entry)
        return file_list

    def create_folder(self, folder_name, parent_path=''):
        # Create folder_name under parent_path, or return its id if it already exists
        parent_folder_id = self.path_to_id(parent_path)
        folder_type = 'application/vnd.google-apps.folder'
        file_dict = {f['title']: f for f in self.list_folder(parent_folder_id)}
        if folder_name not in file_dict:
            folder_metadata = {
                'title': folder_name,
                'mimeType': folder_type,
                'parents': [{'kind': 'drive#fileLink', 'id': parent_folder_id}]
            }
            folder = self.drive.CreateFile(folder_metadata)
            folder.Upload()
            return folder['id']
        if file_dict[folder_name]['mimeType'] != folder_type:
            raise Exception('{0} already exists as a file'.format(folder_name))
        print('{0} already exists'.format(folder_name))
        return file_dict[folder_name]['id']

    def upload(self, local_file_path, parent_path='', overwrite=True):
        parent_folder_id = self.path_to_id(parent_path)
        file_dict = {f['title']: f for f in self.list_folder(parent_folder_id)}
        file_name = os.path.basename(local_file_path)
        if file_name in file_dict and overwrite:
            # list_folder returns plain dicts, so fetch a PyDrive file object to delete the old copy
            self.drive.CreateFile({'id': file_dict[file_name]['id']}).Delete()
        file = self.drive.CreateFile({
            'title': file_name,
            'parents': [{'kind': 'drive#fileLink', 'id': parent_folder_id}]
        })
        file.SetContentFile(local_file_path)
        file.Upload()
        return file['id']

    def download(self, local_file_path, target_path):
        # Resolve the Drive path to an id and save the file's content locally
        target_id = self.path_to_id(target_path)
        file = self.drive.CreateFile({'id': target_id})
        file.GetContentFile(local_file_path)
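`path_to_id` and `list_folder` carry the core logic of the wrapper: a path is resolved one component at a time, and folders are expanded only while `max_depth` remains positive. The sketch below reproduces that logic against a hypothetical in-memory `FakeDrive` (a dict from folder id to child records) instead of the real PyDrive client, so it can run outside Colab. The ids `F1` to `F4` are made up for illustration.

```python
FOLDER = "application/vnd.google-apps.folder"


class FakeDrive:
    """Hypothetical stand-in: maps a folder id to its child records."""

    def __init__(self, children):
        self.children = children

    def list_children(self, folder_id):
        # Plays the role of the "'<id>' in parents and trashed=false" query
        return self.children.get(folder_id, [])


def path_to_id(drive, rel_path, parent_id="root"):
    """Resolve 'a/b/c' by looking up one path component per level."""
    for part in [p for p in rel_path.split("/") if p]:
        matches = [f for f in drive.list_children(parent_id) if f["title"] == part]
        if not matches:
            raise FileNotFoundError("{0} does not exist".format(part))
        # Drive allows duplicate titles; this sketch simply takes the first match
        parent_id = matches[0]["id"]
    return parent_id


def list_tree(drive, folder_id="root", max_depth=0):
    """Depth-limited listing: folders get 'children' only while depth remains."""
    result = []
    for f in drive.list_children(folder_id):
        entry = {"title": f["title"], "id": f["id"], "mimeType": f["mimeType"]}
        if f["mimeType"] == FOLDER and max_depth > 0:
            entry["children"] = list_tree(drive, f["id"], max_depth - 1)
        result.append(entry)
    return result


drive = FakeDrive({
    "root": [{"title": "test_folder", "id": "F1", "mimeType": FOLDER}],
    "F1": [{"title": "test_sub_folder2", "id": "F2", "mimeType": FOLDER},
           {"title": "test_sub_folder", "id": "F3", "mimeType": FOLDER}],
    "F2": [{"title": "test_sub_sub_folder", "id": "F4", "mimeType": FOLDER}],
})
print(path_to_id(drive, "test_folder/test_sub_folder2/test_sub_sub_folder"))  # prints F4
```

Each path component costs one listing call, so deep paths mean several round trips to the real Drive API.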

Run the following code first to install the necessary libraries and perform authorization.

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Click the link, copy the verification code, and paste it into the text box. After the authorization process completes, mount your Google Drive:

!mkdir -p drive
!google-drive-ocamlfuse drive
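Once `google-drive-ocamlfuse` has mounted Drive at `drive`, ordinary file operations work on it. A minimal sketch, assuming the mount point from the commands above; `copy_to_mounted_drive` is a hypothetical helper, and its `require_mount` flag exists only so the function can be tried on a plain directory.

```python
import os
import shutil


def copy_to_mounted_drive(local_path, mount_point="drive", subfolder="", require_mount=True):
    """Copy a file from the Colab VM into the FUSE-mounted Google Drive folder."""
    if require_mount and not os.path.ismount(mount_point):
        raise RuntimeError("{0} is not a mount point; mount Google Drive first".format(mount_point))
    dest_dir = os.path.join(mount_point, subfolder)
    os.makedirs(dest_dir, exist_ok=True)
    # copy2 also tries to preserve timestamps and returns the destination file path
    return shutil.copy2(local_path, dest_dir)


# Hypothetical usage after mounting:
# copy_to_mounted_drive("/tmp/sample_archive.tar.gz", subfolder="test_folder")
```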

There are several approaches to transferring files in and out of the Colab VM:

  • Mount Google Drive in the Colab VM
  • Upload and download via the browser
  • Use colab_util.py in a Python script
from google.colab import files
# Upload local files to the Colab VM (returns a dict mapping file names to contents)
uploaded = files.upload()
# Download a file from the Colab VM to your local machine
files.download('target_file_name')
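`files.download` takes a single file name, so one way to fetch many output files at once is to bundle them first. A minimal sketch using `shutil.make_archive`; `bundle_for_download`, the `results` folder, and the `/tmp/outputs` base name are hypothetical.

```python
import shutil

try:
    from google.colab import files  # available only inside Colab
except ImportError:
    files = None


def bundle_for_download(folder, archive_base="/tmp/outputs"):
    """Zip everything under `folder` so one files.download() call fetches it all."""
    # make_archive appends '.zip' to archive_base and returns the archive path
    return shutil.make_archive(archive_base, "zip", root_dir=folder)


if files is not None:
    # 'results' is a hypothetical folder of outputs on the Colab VM
    files.download(bundle_for_download("results"))
```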