Created
January 24, 2020 20:35
Output of `gsutil help cp`
NAME | |
cp - Copy files and objects | |
SYNOPSIS | |
gsutil cp [OPTION]... src_url dst_url | |
gsutil cp [OPTION]... src_url... dst_url | |
gsutil cp [OPTION]... -I dst_url | |
DESCRIPTION | |
The gsutil cp command allows you to copy data between your local file | |
system and the cloud, copy data within the cloud, and copy data between | |
cloud storage providers. For example, to upload all text files from the | |
local directory to a bucket you could do: | |
gsutil cp *.txt gs://my-bucket | |
Similarly, you can download text files from a bucket by doing: | |
gsutil cp gs://my-bucket/*.txt . | |
If you want to copy an entire directory tree you need to use the -r option. | |
For example, to upload the directory tree "dir": | |
gsutil cp -r dir gs://my-bucket | |
If you have a large number of files to transfer you might want to use the | |
top-level gsutil -m option (see "gsutil help options"), to perform a | |
parallel (multi-threaded/multi-processing) copy: | |
gsutil -m cp -r dir gs://my-bucket | |
You can pass a list of URLs (one per line) to copy on stdin instead of as | |
command line arguments by using the -I option. This allows you to use gsutil | |
in a pipeline to upload or download files / objects as generated by a program, | |
such as: | |
some_program | gsutil -m cp -I gs://my-bucket | |
or: | |
some_program | gsutil -m cp -I ./download_dir | |
The contents of stdin can name files, cloud URLs, and wildcards of files | |
and cloud URLs. | |
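As a minimal sketch of a program that could feed the -I option, the function
below prints one local path per line for recently modified text files. The
name list_recent_txt and its arguments are assumptions made for illustration;
they are not part of gsutil.

```shell
# Hypothetical helper (not part of gsutil): print one path per line for
# *.txt files under DIR that were modified less than MINUTES minutes ago.
list_recent_txt() {
  local dir="$1" minutes="$2"
  find "$dir" -type f -name '*.txt' -mmin "-$minutes" | sort
}
```

The output could then be piped into a copy, for example:
list_recent_txt ./logs 60 | gsutil -m cp -I gs://my-bucket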
Note: Shells (like bash, zsh) sometimes attempt to expand wildcards in ways | |
that can be surprising. Also, attempting to copy files whose names contain | |
wildcard characters can result in problems. For more details about these | |
issues see the section "POTENTIALLY SURPRISING BEHAVIOR WHEN USING WILDCARDS" | |
under "gsutil help wildcards". | |
HOW NAMES ARE CONSTRUCTED | |
The gsutil cp command strives to name objects in a way consistent with how | |
Linux cp works, which causes names to be constructed in varying ways depending | |
on whether you're performing a recursive directory copy or copying | |
individually named objects; and whether you're copying to an existing or | |
non-existent directory. | |
When performing recursive directory copies, object names are constructed that | |
mirror the source directory structure starting at the point of recursive | |
processing. For example, if dir1/dir2 contains the file a/b/c then the | |
command: | |
gsutil cp -r dir1/dir2 gs://my-bucket | |
will create the object gs://my-bucket/dir2/a/b/c. | |
In contrast, copying individually named files will result in objects named by | |
the final path component of the source files. For example, again assuming | |
dir1/dir2 contains a/b/c, the command: | |
gsutil cp dir1/dir2/** gs://my-bucket | |
will create the object gs://my-bucket/c. | |
The same rules apply for downloads: recursive copies of buckets and | |
bucket subdirectories produce a mirrored filename structure, while copying | |
individually (or wildcard) named objects produces flatly named files.
Note that in the above example the '**' wildcard matches all names
anywhere under dir1/dir2. The wildcard '*' will match names just one level deep. For
more details see "gsutil help wildcards". | |
There's an additional wrinkle when working with subdirectories: the resulting | |
names depend on whether the destination subdirectory exists. For example, | |
if gs://my-bucket/subdir exists as a subdirectory, the command: | |
gsutil cp -r dir1/dir2 gs://my-bucket/subdir | |
will create the object gs://my-bucket/subdir/dir2/a/b/c. In contrast, if | |
gs://my-bucket/subdir does not exist, this same gsutil cp command will create | |
the object gs://my-bucket/subdir/a/b/c. | |
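The recursive-copy naming rules above can be restated as a small shell
function. This is only a model of the documented behavior, useful for
checking expectations before a large copy; it is not gsutil's own logic, and
the name predict_object_name and its arguments are hypothetical.

```shell
# Model of the naming rules for "gsutil cp -r SRC_DIR DST_URL".
# Arguments:
#   $1  source directory (e.g. dir1/dir2)
#   $2  file path relative to the source directory (e.g. a/b/c)
#   $3  destination URL (e.g. gs://my-bucket/subdir)
#   $4  "yes" if the destination bucket or subdirectory already exists
predict_object_name() {
  local src_dir="${1%/}" rel_path="$2" dst_url="${3%/}" dst_exists="$4"
  if [ "$dst_exists" = "yes" ]; then
    # Existing destination: the final source path component is kept.
    printf '%s/%s/%s\n' "$dst_url" "$(basename "$src_dir")" "$rel_path"
  else
    # Non-existent destination: the source directory is copied "as" it.
    printf '%s/%s\n' "$dst_url" "$rel_path"
  fi
}
```

For example, predict_object_name dir1/dir2 a/b/c gs://my-bucket/subdir no
prints gs://my-bucket/subdir/a/b/c, matching the case described above.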
Note: If you use the | |
`Google Cloud Platform Console <https://console.cloud.google.com>`_ | |
to create folders, it does so by creating a "placeholder" object that ends | |
with a "/" character. gsutil skips these objects when downloading from the | |
cloud to the local file system, because attempting to create a file that | |
ends with a "/" is not allowed on Linux and macOS. Because of this, it is | |
recommended that you not create objects that end with "/" (unless you don't | |
need to be able to download such objects using gsutil). | |
COPYING TO/FROM SUBDIRECTORIES; DISTRIBUTING TRANSFERS ACROSS MACHINES | |
You can use gsutil to copy to and from subdirectories by using a command | |
like: | |
gsutil cp -r dir gs://my-bucket/data | |
This will cause dir and all of its files and nested subdirectories to be | |
copied under the specified destination, resulting in objects with names like | |
gs://my-bucket/data/dir/a/b/c. Similarly you can download from bucket | |
subdirectories by using a command like: | |
gsutil cp -r gs://my-bucket/data dir | |
This will cause everything nested under gs://my-bucket/data to be downloaded | |
into dir, resulting in files with names like dir/data/a/b/c. | |
Copying subdirectories is useful if you want to add data to an existing | |
bucket directory structure over time. It's also useful if you want | |
to parallelize uploads and downloads across multiple machines (potentially | |
reducing overall transfer time compared with simply running gsutil -m | |
cp on one machine). For example, if your bucket contains this structure: | |
gs://my-bucket/data/result_set_01/ | |
gs://my-bucket/data/result_set_02/ | |
... | |
gs://my-bucket/data/result_set_99/ | |
you could perform concurrent downloads across 3 machines by running these | |
commands on each machine, respectively: | |
gsutil -m cp -r gs://my-bucket/data/result_set_[0-3]* dir | |
gsutil -m cp -r gs://my-bucket/data/result_set_[4-6]* dir | |
gsutil -m cp -r gs://my-bucket/data/result_set_[7-9]* dir | |
Note that dir could be a local directory on each machine, or it could be a | |
directory mounted off of a shared file server; whether the latter performs | |
acceptably will depend on a number of factors, so we recommend experimenting | |
to find out what works best for your computing environment. | |
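The three wildcard patterns above split the result sets by their first digit.
As a sketch, the same split can be computed for any number of machines from 1
to 10. The function name shard_pattern is hypothetical, and it assumes object
names of the form result_set_NN as in the example.

```shell
# Print the wildcard pattern for machine INDEX (0-based) out of TOTAL
# machines, dividing the leading digits 0-9 into contiguous ranges.
shard_pattern() {
  local index="$1" total="$2"
  if [ "$total" -lt 1 ] || [ "$total" -gt 10 ] ||
     [ "$index" -lt 0 ] || [ "$index" -ge "$total" ]; then
    echo "usage: shard_pattern INDEX TOTAL (0 <= INDEX < TOTAL <= 10)" >&2
    return 1
  fi
  local base=$(( 10 / $total ))
  local extra=$(( 10 % $total ))
  local offset width
  # The first "extra" machines each take one additional digit.
  if [ "$index" -lt "$extra" ]; then
    offset=$index
    width=$(( $base + 1 ))
  else
    offset=$extra
    width=$base
  fi
  local start=$(( $index * $base + $offset ))
  local end=$(( $start + $width - 1 ))
  printf 'result_set_[%d-%d]*\n' "$start" "$end"
}
```

On the first of three machines, the download could then be written as:
gsutil -m cp -r "gs://my-bucket/data/$(shard_pattern 0 3)" dir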
COPYING IN THE CLOUD AND METADATA PRESERVATION | |
If both the source and destination URL are cloud URLs from the same | |
provider, gsutil copies data "in the cloud" (i.e., without downloading | |
to and uploading from the machine where you run gsutil). In addition to | |
the performance and cost advantages of doing this, copying in the cloud | |
preserves metadata (like Content-Type and Cache-Control). In contrast, | |
when you download data from the cloud it ends up in a file, which has | |
no associated metadata. Thus, unless you have some way to hold on to | |
or re-create that metadata, downloading to a file will not retain the | |
metadata. | |
Copies spanning locations and/or storage classes cause data to be rewritten | |
in the cloud, which may take some time (but still will be faster than | |
downloading and re-uploading). Such operations can be resumed with the same | |
command if they are interrupted, so long as the command parameters are | |
identical. | |
Note that by default, the gsutil cp command does not copy the object | |
ACL to the new object, and instead will use the default bucket ACL (see | |
"gsutil help defacl"). You can override this behavior with the -p | |
option (see OPTIONS below). | |
One additional note about copying in the cloud: If the destination bucket has | |
versioning enabled, by default gsutil cp will copy only live versions of the | |
source object(s). For example: | |
gsutil cp gs://bucket1/obj gs://bucket2 | |
will cause only the single live version of gs://bucket1/obj to be copied to | |
gs://bucket2, even if there are noncurrent versions of gs://bucket1/obj. To | |
also copy noncurrent versions, use the -A flag: | |
gsutil cp -A gs://bucket1/obj gs://bucket2 | |
The top-level gsutil -m flag is disallowed when using the cp -A flag, to | |
ensure that version ordering is preserved. | |
CHECKSUM VALIDATION | |
At the end of every upload or download the gsutil cp command validates that | |
the checksum it computes for the source file/object matches the checksum | |
the service computes. If the checksums do not match, gsutil will delete the | |
corrupted object and print a warning message. This very rarely happens, but | |
if it does, please contact gs-team@google.com. | |
If you know the MD5 of a file before uploading you can specify it in the | |
Content-MD5 header, which will cause the cloud storage service to reject the | |
upload if the MD5 doesn't match the value computed by the service. For | |
example: | |
% gsutil hash obj | |
Hashing obj: | |
Hashes [base64] for obj: | |
Hash (crc32c): lIMoIw== | |
Hash (md5): VgyllJgiiaRAbyUUIqDMmw== | |
% gsutil -h Content-MD5:VgyllJgiiaRAbyUUIqDMmw== cp obj gs://your-bucket/obj | |
Copying file://obj [Content-Type=text/plain]... | |
Uploading gs://your-bucket/obj: 182 b/182 B | |
If the checksum didn't match the service would instead reject the upload and | |
gsutil would print a message like: | |
BadRequestException: 400 Provided MD5 hash "VgyllJgiiaRAbyUUIqDMmw==" | |
doesn't match calculated MD5 hash "7gyllJgiiaRAbyUUIqDMmw==". | |
Even if you don't do this, gsutil will delete the object if the computed
checksum mismatches, but specifying the Content-MD5 header has several | |
advantages: | |
1. It prevents the corrupted object from becoming visible at all, whereas | |
otherwise it would be visible for 1-3 seconds before gsutil deletes it. | |
2. If an object already exists with the given name, specifying the | |
Content-MD5 header will cause the existing object never to be replaced, | |
whereas otherwise it would be replaced by the corrupted object and then | |
deleted a few seconds later. | |
3. It will definitively prevent the corrupted object from being left in | |
the cloud, whereas the gsutil approach of deleting after the upload | |
completes could fail if (for example) the gsutil process gets ^C'd | |
between upload and deletion request. | |
4. It supports a customer-to-service integrity check handoff. For example, | |
if you have a content production pipeline that generates data to be | |
uploaded to the cloud along with checksums of that data, specifying the | |
MD5 computed by your content pipeline when you run gsutil cp will ensure | |
that the checksums match all the way through the process (e.g., detecting | |
if data gets corrupted on your local disk between the time it was written | |
by your content pipeline and the time it was uploaded to Google Cloud | |
Storage). | |
Note: The Content-MD5 header is ignored for composite objects, because such | |
objects only have a CRC32C checksum. | |
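If the MD5 comes from your own pipeline rather than from "gsutil hash", it
has to be in the same form: the base64 encoding of the binary digest, not the
hexadecimal string that tools like md5sum print. The sketch below assumes the
openssl command line tool is installed; the function name md5_base64 is
hypothetical.

```shell
# Print the base64-encoded MD5 digest of FILE, in the form expected by
# the Content-MD5 header. Assumes the openssl CLI is available.
md5_base64() {
  openssl dgst -md5 -binary "$1" | openssl base64
}
```

The result could then be passed on the command line, for example:
gsutil -h "Content-MD5:$(md5_base64 obj)" cp obj gs://your-bucket/obj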
RETRY HANDLING | |
The cp command will retry when failures occur, but if enough failures happen | |
during a particular copy or delete operation the cp command will skip that | |
object and move on. At the end of the copy run if any failures were not | |
successfully retried, the cp command will report the count of failures, and | |
exit with non-zero status. | |
Note that there are cases where retrying will never succeed, such as if you | |
don't have write permission to the destination bucket or if the destination | |
path for some objects is longer than the maximum allowed length. | |
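Because some failures can never succeed on retry, a script that re-runs a
failed copy should bound the number of attempts. The wrapper below is a
minimal sketch of that idea, based only on the command's exit status; the
function name retry_command and the RETRY_DELAY variable are assumptions for
illustration and are not gsutil features.

```shell
# Run a command up to MAX_ATTEMPTS times, sleeping between attempts.
# RETRY_DELAY (seconds, default 5) is a hypothetical knob for this sketch.
retry_command() {
  local max_attempts="$1"
  shift
  local attempt=1
  while :; do
    # Stop as soon as the command exits with status zero.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "${RETRY_DELAY:-5}"
  done
}
```

For example, to attempt a recursive upload at most three times:
retry_command 3 gsutil cp -r ./dir gs://my-bucket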
For more details about gsutil's retry handling, please see | |
"gsutil help retries". | |
RESUMABLE TRANSFERS | |
gsutil automatically performs a resumable upload whenever you use the cp | |
command to upload an object that is larger than 8 MiB. You do not need to | |
specify any special command line options to make this happen. If your upload | |
is interrupted you can restart the upload by running the same cp command that | |
you ran to start the upload. Until the upload has completed successfully, it | |
will not be visible at the destination object and will not replace any | |
existing object the upload is intended to overwrite. However, see the section | |
on parallel composite uploads, which may leave temporary component objects in | |
place during the upload process. | |
Similarly, gsutil automatically performs resumable downloads (using standard | |
HTTP Range GET operations) whenever you use the cp command, unless the | |
destination is a stream. During a resumable download, a partially downloaded
temporary file will be visible in the destination directory. Upon completion,
any existing file at the destination is replaced with the downloaded contents.
Resumable uploads and downloads store state information in files under | |
~/.gsutil, named by the destination object or file. If you attempt to resume a
transfer from a machine that does not have that state directory (for example,
a different machine or home directory), the transfer will start over from
scratch.
See also "gsutil help prod" for details on using resumable transfers | |
in production. | |
STREAMING TRANSFERS | |
Use '-' in place of src_url or dst_url to perform a streaming | |
transfer. For example: | |
long_running_computation | gsutil cp - gs://my-bucket/obj | |
Streaming uploads using the JSON API (see "gsutil help apis") are buffered in | |
memory part-way back into the file and can thus retry in the event of network | |
or service problems. | |
Streaming transfers using the XML API do not support resumable | |
uploads/downloads. If you have a large amount of data to upload (say, more | |
than 100 MiB) it is recommended that you write the data to a local file and | |
then copy that file to the cloud rather than streaming it (and similarly for | |
large downloads). | |
CAUTION: When performing a streaming transfer to or from Cloud Storage, | |
neither Cloud Storage nor gsutil computes a checksum. If you require data
validation, use a non-streaming transfer, which performs integrity checking | |
automatically. | |
Note: Streaming transfers are not allowed when the top-level gsutil -m flag | |
is used. | |
SLICED OBJECT DOWNLOADS | |
gsutil uses HTTP Range GET requests to perform "sliced" downloads in parallel | |
when downloading large objects from Google Cloud Storage. This means that disk | |
space for the temporary download destination file will be pre-allocated and | |
byte ranges (slices) within the file will be downloaded in parallel. Once all | |
slices have completed downloading, the temporary file will be renamed to the | |
destination file. No additional local disk space is required for this | |
operation. | |
This feature is only available for Google Cloud Storage objects because it | |
requires a fast composable checksum (CRC32C) that can be used to verify the | |
data integrity of the slices. And because it depends on CRC32C, using sliced | |
object downloads also requires a compiled crcmod (see "gsutil help crcmod") on | |
the machine performing the download. If compiled crcmod is not available, | |
a non-sliced object download will instead be performed. | |
Note: since sliced object downloads cause multiple writes to occur at various | |
locations on disk, this mechanism can degrade performance for disks with slow | |
seek times, especially for large numbers of slices. While the default number | |
of slices is set small to avoid this problem, you can disable sliced object | |
download if necessary by setting the "sliced_object_download_threshold" | |
variable in the .boto config file to 0. | |
PARALLEL COMPOSITE UPLOADS | |
gsutil can automatically use | |
`object composition <https://cloud.google.com/storage/docs/composite-objects>`_ | |
to perform uploads in parallel for large, local files being uploaded to Google | |
Cloud Storage. If enabled (see below), a large file will be split into | |
component pieces that are uploaded in parallel and then composed in the cloud | |
(and the temporary components finally deleted). A file can be broken into as | |
many as 32 component pieces; until this piece limit is reached, the maximum | |
size of each component piece is determined by the variable
"parallel_composite_upload_component_size", specified in the [GSUtil] section
of your .boto configuration file (for files that are otherwise too big, | |
components are as large as needed to fit into 32 pieces). No additional local | |
disk space is required for this operation. | |
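The sizing rule above can be worked through with shell arithmetic. This is a
sketch of the rule as described in this section, not gsutil's actual code;
the function name plan_components is hypothetical, and both arguments are
plain byte counts.

```shell
# Arguments: file size and configured component size, both in bytes.
# Prints "<component count> <upper bound on bytes per component>".
plan_components() {
  local file_size="$1" component_size="$2" max_components=32
  # Round up so that a partial final piece still counts as a component.
  local count=$(( ($file_size + $component_size - 1) / $component_size ))
  local size=$component_size
  if [ "$count" -gt "$max_components" ]; then
    # Too many pieces: cap at 32 and grow each piece to fit.
    count=$max_components
    size=$(( ($file_size + $max_components - 1) / $max_components ))
  fi
  echo "$count $size"
}
```

For example, a 150 MiB file (157286400 bytes) with a 50 MiB component size
(52428800 bytes) yields 3 components, while a 3300-byte file with a 100-byte
component size would need 33 pieces and is instead split into 32 components
of at most 104 bytes each.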
Using parallel composite uploads presents a tradeoff between upload | |
performance and download configuration: If you enable parallel composite | |
uploads your uploads will run faster, but someone will need to install a | |
compiled crcmod (see "gsutil help crcmod") on every machine where objects are | |
downloaded by gsutil or other Python applications. Note that for such uploads, | |
crcmod is required for downloading regardless of whether the parallel | |
composite upload option is on or not. For some distributions this is easy | |
(e.g., it comes pre-installed on macOS), but in other cases some users have | |
found it difficult. Because of this, at present parallel composite uploads are | |
disabled by default. Google is actively working with a number of the Linux | |
distributions to get crcmod included with the stock distribution. Once that is | |
done we will re-enable parallel composite uploads by default in gsutil. | |
Warning: Parallel composite uploads should not be used with NEARLINE, | |
COLDLINE, or ARCHIVE storage class buckets, because doing so incurs an early | |
deletion charge for each component object. | |
Warning: Parallel composite uploads should not be used in buckets that have a | |
`retention policy <https://cloud.google.com/storage/docs/bucket-lock>`_, | |
because the component pieces cannot be deleted until each has met the | |
bucket's minimum retention period. | |
To try parallel composite uploads you can run the command: | |
gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket | |
where bigfile is larger than 150 MiB. When you do this, notice that the upload | |
progress indicator continuously updates for the file, until all parts of the | |
upload complete. If after trying this you want to enable parallel composite | |
uploads for all of your future uploads (notwithstanding the caveats mentioned | |
earlier), you can uncomment and set the "parallel_composite_upload_threshold" | |
config value in your .boto configuration file to this value. | |
Note that the crcmod problem only impacts downloads via Python applications | |
(such as gsutil). If all users who need to download the data using gsutil or | |
other Python applications can install crcmod, or if no Python users will | |
need to download your objects, it makes sense to enable parallel composite | |
uploads (see above). For example, if you use gsutil to upload video assets, | |
and those assets will only ever be served via a Java application, it would | |
make sense to enable parallel composite uploads on your machine (there are | |
efficient CRC32C implementations available in Java). | |
If a parallel composite upload fails prior to composition, re-running the | |
gsutil command will take advantage of resumable uploads for the components | |
that failed, and the component objects will be deleted after the first | |
successful attempt. Any temporary objects that were uploaded successfully | |
before gsutil failed will still exist until the upload is completed | |
successfully. The temporary objects will be named in the following fashion: | |
<random ID>/gsutil/tmp/parallel_composite_uploads/for_details_see/gsutil_help_cp/<hash> | |
where <random ID> is a numerical value, and <hash> is an MD5 hash (not related | |
to the hash of the contents of the file or object). | |
To avoid leaving temporary objects around, you should make sure to check the | |
exit status from the gsutil command. This can be done in a bash script, for | |
example, by doing: | |
if ! gsutil cp ./local-file gs://your-bucket/your-object; then
  # Replace this line with code that handles the failure.
  echo "gsutil cp failed" >&2
fi
Or, for copying a directory, use this instead:
if ! gsutil cp -c -L cp.log -r ./dir gs://bucket; then
  # Replace this line with code that handles the failure.
  echo "gsutil cp failed; see cp.log for details" >&2
fi
Note that an object uploaded using parallel composite uploads will have a | |
CRC32C hash, but it will not have an MD5 hash (and because of that, users who | |
download the object must have crcmod installed, as noted earlier). For details | |
see "gsutil help crc32c". | |
Parallel composite uploads can be disabled by setting the | |
"parallel_composite_upload_threshold" variable in the .boto config file to 0. | |
CHANGING TEMP DIRECTORIES | |
gsutil writes data to a temporary directory in several cases: | |
- when compressing data to be uploaded (see the -z and -Z options) | |
- when decompressing data being downloaded (when the data has | |
Content-Encoding:gzip, e.g., as happens when uploaded using gsutil cp -z | |
or gsutil cp -Z) | |
- when running integration tests (using the gsutil test command) | |
In these cases it's possible the temp file location on your system that | |
gsutil selects by default may not have enough space. If gsutil runs out of | |
space during one of these operations (e.g., raising | |
"CommandException: Inadequate temp space available to compress <your file>" | |
during a gsutil cp -z operation), you can change where it writes these | |
temp files by setting the TMPDIR environment variable. On Linux and macOS | |
you can do this either by running gsutil this way: | |
TMPDIR=/some/directory gsutil cp ... | |
or by adding this line to your ~/.bashrc file and then restarting the shell | |
before running gsutil: | |
export TMPDIR=/some/directory | |
On Windows 7 you can change the TMPDIR environment variable from Start -> | |
Computer -> System -> Advanced System Settings -> Environment Variables. | |
You need to reboot after making this change for it to take effect. (Rebooting | |
is not necessary after running the export command on Linux and macOS.) | |
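Before starting a large compressed upload, it can help to check whether the
temp location has room for the file. The sketch below uses the uncompressed
file size as a conservative bound. The function names free_kib and
fits_in_temp are hypothetical, and falling back to /tmp when TMPDIR is unset
is an assumption of this sketch rather than a statement about how gsutil
picks its default.

```shell
# Print the free space, in KiB, on the file system that holds DIR.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if FILE would fit in DIR (default: $TMPDIR, else /tmp).
fits_in_temp() {
  local file="$1"
  local dir="${2:-${TMPDIR:-/tmp}}"
  # File size in KiB, rounded up.
  local need_kib=$(( ($(wc -c < "$file") + 1023) / 1024 ))
  [ "$need_kib" -le "$(free_kib "$dir")" ]
}
```

A script could then choose a larger directory when the check fails, e.g.:
fits_in_temp bigfile || export TMPDIR=/some/directory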
SYNCHRONIZING OVER OS-SPECIFIC FILE TYPES (SYMLINKS, DEVICES, ETC.) | |
Please see the section about OS-specific file types in "gsutil help rsync". | |
While that section was written specifically about the rsync command, analogous | |
points apply to the cp command. | |
OPTIONS | |
-a canned_acl Applies the named canned_acl to uploaded objects when they are created. See
"gsutil help acls" for further details. | |
-A Copy all source versions from source buckets/folders.
If not set, only the live version of each source object is | |
copied. Note: this option is only useful when the destination | |
bucket has versioning enabled. | |
-c If an error occurs, continue to attempt to copy the remaining | |
files. If any copies were unsuccessful, gsutil's exit status | |
will be non-zero even if this flag is set. This option is | |
implicitly set when running "gsutil -m cp...". Note: -c only | |
applies to the actual copying operation. If an error occurs | |
while iterating over the files in the local directory (e.g., | |
invalid Unicode file name) gsutil will print an error message | |
and abort. | |
-D Copy in "daisy chain" mode, i.e., copying between two buckets | |
by hooking a download to an upload, via the machine where | |
gsutil is run. This stands in contrast to the default, where | |
data are copied between two buckets "in the cloud", i.e., | |
without needing to copy via the machine where gsutil runs. | |
By default, a "copy in the cloud" when the source is a | |
composite object will retain the composite nature of the | |
object. However, daisy chain mode can be used to change a
composite object into a non-composite object. For example: | |
gsutil cp -D -p gs://bucket/obj gs://bucket/obj_tmp | |
gsutil mv -p gs://bucket/obj_tmp gs://bucket/obj | |
Note: Daisy chain mode is automatically used when copying | |
between providers (e.g., to copy data from Google Cloud Storage | |
to another provider). | |
-e Exclude symlinks. When specified, symbolic links will not be | |
copied. | |
-I Causes gsutil to read the list of files or objects to copy from | |
stdin. This allows you to run a program that generates the list | |
of files to upload/download. | |
-j <ext,...> Applies gzip transport encoding to any file upload whose | |
extension matches the -j extension list. This is useful when | |
uploading files with compressible content (such as .js, .css, | |
or .html files) because it saves network bandwidth while | |
also leaving the data uncompressed in Google Cloud Storage. | |
When you specify the -j option, files being uploaded are | |
compressed in-memory and on-the-wire only. Both the local | |
files and Cloud Storage objects remain uncompressed. The | |
uploaded objects retain the Content-Type and name of the | |
original files. | |
Note that if you want to use the top-level -m option to | |
parallelize copies along with the -j/-J options, you should | |
prefer using multiple processes instead of multiple threads; | |
when using -j/-J, multiple threads in the same process are | |
bottlenecked by Python's GIL. Thread and process count can be | |
set using the "parallel_thread_count" and | |
"parallel_process_count" boto config options, e.g.: | |
gsutil -o "GSUtil:parallel_process_count=8" \ | |
-o "GSUtil:parallel_thread_count=1" \ | |
-m cp -j html -r /local/source/dir gs://bucket/path | |
-J Applies gzip transport encoding to file uploads. This option | |
works like the -j option described above, but it applies to | |
all uploaded files, regardless of extension. | |
Warning: If you use this option and some of the source files | |
don't compress well (e.g., that's often true of binary data), | |
this option may result in longer uploads. | |
-L <file> Outputs a manifest log file with detailed information about | |
each item that was copied. This manifest contains the following | |
information for each item: | |
- Source path. | |
- Destination path. | |
- Source size. | |
- Bytes transferred. | |
- MD5 hash. | |
- UTC date and time transfer was started in ISO 8601 format. | |
- UTC date and time transfer was completed in ISO 8601 format. | |
- Upload id, if a resumable upload was performed. | |
- Final result of the attempted transfer, success or failure. | |
- Failure details, if any. | |
If the log file already exists, gsutil will use the file as an | |
input to the copy process, and will also append log items to | |
the existing file. Files/objects that are marked in the | |
existing log file as having been successfully copied (or | |
skipped) will be ignored. Files/objects without entries will be | |
copied and ones previously marked as unsuccessful will be | |
retried. This can be used in conjunction with the -c option to | |
build a script that copies a large number of objects reliably, | |
using a bash script like the following: | |
until gsutil cp -c -L cp.log -r ./dir gs://bucket; do | |
sleep 1 | |
done | |
The -c option will cause copying to continue after failures | |
occur, and the -L option will allow gsutil to pick up where it | |
left off without duplicating work. The loop will continue | |
running as long as gsutil exits with a non-zero status (such a | |
status indicates there was at least one failure during the | |
gsutil run). | |
Note: If you're trying to synchronize the contents of a | |
directory and a bucket (or two buckets), see | |
"gsutil help rsync". | |
-n No-clobber. When specified, existing files or objects at the | |
destination will not be overwritten. Any items that are skipped | |
by this option will be reported as being skipped. This option | |
will perform an additional GET request to check if an item | |
exists before attempting to upload the data. This will save | |
retransmitting data, but the additional HTTP requests may make | |
small object transfers slower and more expensive. | |
-p Causes ACLs to be preserved when copying in the cloud. Note | |
that this option has performance and cost implications when | |
using the XML API, as it requires separate HTTP calls for | |
interacting with ACLs. (There are no such performance or cost | |
implications when using the -p option with the JSON API.) The | |
performance issue can be mitigated to some degree by using | |
gsutil -m cp to cause parallel copying. Note that this option | |
only works if you have OWNER access to all of the objects that | |
are copied. | |
You can avoid the additional performance and cost of using | |
cp -p if you want all objects in the destination bucket to end | |
up with the same ACL by setting a default object ACL on that | |
bucket instead of using cp -p. See "gsutil help defacl". | |
Note that it's not valid to specify both the -a and -p options | |
together. | |
-P Causes POSIX attributes to be preserved when objects are | |
copied. With this feature enabled, gsutil cp will copy fields | |
provided by stat. These are the user ID of the owner, the group | |
ID of the owning group, the mode (permissions) of the file, and | |
the access/modification time of the file. For downloads, these | |
attributes will only be set if the source objects were uploaded | |
with this flag enabled. | |
On Windows, this flag will only set and restore access time and | |
modification time. This is because Windows doesn't have a | |
notion of POSIX uid/gid/mode. | |
-R, -r The -R and -r options are synonymous. Causes directories, | |
buckets, and bucket subdirectories to be copied recursively. | |
If you neglect to use this option for an upload, gsutil will | |
copy any files it finds and skip any directories. Similarly, | |
neglecting to specify this option for a download will cause | |
gsutil to copy any objects at the current bucket directory | |
level, and skip any subdirectories. | |
-s <class> The storage class of the destination object(s). If not | |
specified, the default storage class of the destination bucket | |
is used. Not valid for copying to non-cloud destinations. | |
-U Skip objects with unsupported object types instead of failing. | |
Unsupported object types are Amazon S3 Objects in the GLACIER | |
storage class. | |
-v Requests that the version-specific URL for each uploaded object | |
be printed. Given this URL you can make future upload requests | |
that are safe in the face of concurrent updates, because Google | |
Cloud Storage will refuse to perform the update if the current | |
object version doesn't match the version-specific URL. See | |
"gsutil help versions" for more details. | |
-z <ext,...> Applies gzip content-encoding to any file upload whose | |
extension matches the -z extension list. This is useful when | |
uploading files with compressible content (such as .js, .css, | |
or .html files) because it saves network bandwidth and space | |
in Google Cloud Storage, which in turn reduces storage costs. | |
When you specify the -z option, the data from your files is | |
compressed before it is uploaded, but your actual files are | |
left uncompressed on the local disk. The uploaded objects | |
retain the Content-Type and name of the original files but are | |
given a Content-Encoding header with the value "gzip" to | |
indicate that the object data stored are compressed on the | |
Google Cloud Storage servers. | |
For example, the following command: | |
gsutil cp -z html -a public-read \ | |
cattypes.html tabby.jpeg gs://mycats | |
will do all of the following: | |
- Upload the files cattypes.html and tabby.jpeg to the bucket | |
gs://mycats (cp command) | |
- Set the Content-Type of cattypes.html to text/html and | |
tabby.jpeg to image/jpeg (based on file extensions) | |
- Compress the data in the file cattypes.html (-z option) | |
- Set the Content-Encoding for cattypes.html to gzip | |
(-z option) | |
- Set the ACL for both files to public-read (-a option) | |
- If a user tries to view cattypes.html in a browser, the | |
browser will know to uncompress the data based on the | |
Content-Encoding header and to render it as HTML based on | |
the Content-Type header. | |
Note that if you download an object with Content-Encoding:gzip | |
gsutil will decompress the content before writing the local | |
file. | |
-Z Applies gzip content-encoding to file uploads. This option | |
works like the -z option described above, but it applies to | |
all uploaded files, regardless of extension. | |
Warning: If you use this option and some of the source files | |
don't compress well (e.g., that's often true of binary data), | |
this option may result in files taking up more space in the | |
cloud than they would if left uncompressed. | |