jjjake/abouts3.md

## abouts3.md

      
            
          
    GENERAL TODO:

The examples are all over the place. They need to be more consistent.
Check that x-archive-queue-derive header. I just skimmed it and it doesn't seem right.
Investigate getting an "ias3support@archive.org" address for support requests
Some of the standard metadata fields are repeatable, some are not. State this in the descriptions.
Excellent Hank idea: Quick Start (TL;DR) section to avoid all the gory details
Dang, but this damn thing is hard to read. Will that get better when it gets converted to the PHP wrapper? I have my doubts. May need some quick George love to give tips for better readability.
All the other 'foo' (read: green) bits below

Internet Archive's S3-like Server API

Last Updated: $Date: 2011-10-06 +0000 (Thu, 06 Oct 2011) $

NOTA BENE
Introduction
What the IAS3 API Allows You To Do
How IAS3 Differs From Amazon S3
System Requirements
Using S3 Clients to Access IAS3
Passing Authorization Credentials to IAS3
Commonly Used Amazon S3 Headers

x-amz-auto-make-bucket


Internet Archive-specific IAS3 Headers

x-archive-cascade-delete
x-archive-ignore-preexisting-bucket
x-archive-keep-old-version
x-archive-meta-*
x-archive-queue-derive
x-archive-size-hint


IAS3 Identifiers
Setting Metadata Values via Headers

Standard Internet Archive Metadata Fields

hidden
identifier
title
creator
mediatype
collection
description
date
subject
licenseurl
pick
noindex
publicdate
addeddate
adder
uploader
updater
updatedate
notes
rights
contributor
publisher
language
coverage
credits


Custom Metadata Fields
Repeating Metadata Fields


Setting Metadata Values via Files

IDENTIFIER_marc.xml
IDENTIFIER_meta.mrc
How These Metadata Files Are Processed


Special Files

IDENTIFIER_meta.xml
IDENTIFIER_files.xml
IDENTIFIER_rules.conf

Specific File Formats
Only 'lossy' File Formats
All Derivatives


Troubleshooting

Viewing a log of your IAS3 object
My file isn't appearing in the item.
Is there a sandbox I can use for testing IAS3?
What happens to my item/file after uploading?
Is there any way to control how files derive?


Downloading via IAS3
Code Examples

curl

Text item (a PDF will be OCR'd)
Movie item (Will get video player on details page)
Uploading a file to an existing item
Destroy and respecify the metadata for an item
A Movie example with subject keywords, and creative commons license


Perl

An extract of a script for uploading multiple files via IAS3 using LWP


Other Languages


Support
Appendices

Terminology
Internet Archive's Item Structure (in brief)
IAS3 HTTP Return Codes
Error Messages
Default Metadata Values
Example IDENTIFIER_marc.xml file
Example IDENTIFIER_meta.mrc file


NOTA BENE

This document is very, very much a work in progress. It's so in-progress it's
not even a first draft. Please do not assume this document is definitive until
it gets committed and pushed live to archive.org (wrapped in appropriate
lookfeel, etc.). Until then, feel free to reference this but the official IAS3
documentation can still be found at
http://archive.org/help/abouts3.txt.
Introduction

This document covers the technical details of using Internet Archive's S3-like
server API, aka "IAS3." The intended audience is a technical user, ideally one
who is comfortable in the Linux/UNIX command line environment.
IAS3 is an API based upon Amazon's Simple Storage Service (aka S3). Whereas
Amazon's S3 API allows you to store items in the Amazon S3 cloud storage
service, the IAS3 API allows you to create items on and upload data to
Internet Archive.
Because of its similarities to Amazon's S3, please familiarize yourself with
the Amazon S3 documentation before using Internet Archive's IAS3.
What the IAS3 API Allows You To Do

foo: Check with Sam re: the examples marked below; they aren't working as
expected. Also: don't like the section title.
In Internet Archive terminology, an item maps directly onto the Amazon S3
concept of a [bucket](http://docs.amazonwebservices.com/AmazonS3/latest/dev/Introduction.html).
IAS3 allows you to create items (née buckets), populate them with files and
maintain the metadata for the item. You can also use IAS3 to control certain
elements of file processing behavior. Internet Archive currently does not
support file-level metadata.
Because Internet Archive items are analogous to Amazon S3 buckets they can be
accessed using similar URL addresses. Items are typically accessed on Internet
Archive using the IA-specific details/IDENTIFIER format. For instance:
http://www.archive.org/details/Sita_Sings_the_Blues
The link above will present the details page for the item on Internet
Archive.
This same item is also available in an S3-like format of:
http://s3.us.archive.org/Sita_Sings_the_Blues
Or:
http://Sita_Sings_the_Blues.s3.us.archive.org/
These URLs will return XML containing information about the item.
Each file contained in an item can similarly be used as an S3-like key in a
URL:
http://Sita_Sings_the_Blues.s3.us.archive.org/Sita_Sings_the_Blues_small.mp4
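The two URL forms above can be generated from an identifier and an optional filename. The following is a minimal sketch only: the function names ias3_path_url and ias3_vhost_url and the IAS3_HOST variable are invented for this example and are not part of IAS3.

```shell
# Hypothetical helpers: build the S3-style URLs described above.
IAS3_HOST="s3.us.archive.org"

# Path style: http://s3.us.archive.org/IDENTIFIER[/FILENAME]
ias3_path_url() {
    if [ -n "$2" ]; then
        printf 'http://%s/%s/%s\n' "$IAS3_HOST" "$1" "$2"
    else
        printf 'http://%s/%s\n' "$IAS3_HOST" "$1"
    fi
}

# Virtual-host style: http://IDENTIFIER.s3.us.archive.org/[FILENAME]
ias3_vhost_url() {
    printf 'http://%s.%s/%s\n' "$1" "$IAS3_HOST" "$2"
}

ias3_path_url Sita_Sings_the_Blues
ias3_vhost_url Sita_Sings_the_Blues
ias3_vhost_url Sita_Sings_the_Blues Sita_Sings_the_Blues_small.mp4
```

The helpers only print URLs. Pass the result to curl (or any HTTP client) to fetch the XML listing or the file itself.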
Performing a PUT on the Internet Archive equivalent to an S3 endpoint will
result in the creation of a new item in Internet Archive. Files may be added
to the item in the same manner. Both of these operations may be combined in a
single PUT command. For example, using curl:
curl --location --header 'x-amz-auto-make-bucket:1' \
--header 'x-archive-meta01-collection:opensource' \
--header 'x-archive-meta-mediatype:texts' \
--header 'x-archive-meta-sponsor:Andrew W. Mellon Foundation' \
--header 'x-archive-meta-language:eng' \
--header "authorization: LOW $accesskey:$secret" \
--upload-file /home/samuel/public_html/intro-to-k.pdf \
http://s3.us.archive.org/sam-s3-test-08/demo-intro-to-k.pdf
How IAS3 Differs From Amazon S3

IAS3 differs from Amazon's S3 API in several significant ways:

IAS3 does not allow DELETE for buckets, only for files. Attempting to DELETE a bucket will result in a Not Authorized error.
IAS3 supports the HTTP 1.1 REST interface for S3 but not the SOAP interface.
IAS3 is much more likely to issue HTTP 307 Location redirects than Amazon S3, therefore it is advised that you use an S3-compatible client with good HTTP 100 Continue support (for example, curl version 7.19 and higher).
Amazon S3 allows users to set ACLs for buckets and objects. IAS3 does not. Instead, all items are created with ACLs of world readable and item uploader writable.
Amazon S3's POST and COPY are not implemented in IAS3.
IAS3 ignores HTTP 1.1 Range headers.

IAS3 also supports several of its own headers. These are discussed in more
detail below.
System Requirements

In order to use IAS3 to upload to Internet Archive, you must have:

An internet connection
An Internet Archive patron account
API keys for IAS3
Client code which supports the Amazon S3 API. Most examples in this document use curl due to its simplicity. If you choose to use curl or libcurl to interface with IAS3, please be sure you are using version 7.19 or higher. These versions have excellent HTTP 100 Continue support.

Using S3 Clients to Access IAS3

Internet Archive strives to make IAS3 compatible with current Amazon S3 client
code. Ideally running the following command--replacing amazonaws.com with
us.archive.org--on your S3 client code would allow you to use IAS3 with no
further changes necessary:
perl -pi -e 's/amazonaws.com/us.archive.org/g' *
Some Amazon S3 clients obey configuration files, many of which will allow you
to define the preferred S3 hostname. Setting this hostname to
s3.us.archive.org in the configuration file should allow the client code to
upload to Internet Archive with no further changes.
For instance, adding the following to your ~/.s3cfg configuration file for
s3cmd, a popular Amazon S3 client, will allow you
to connect to IAS3:
[default]
access_key = YOUR-ACCESS-KEY
secret_key = YOUR-SECRET-KEY
host_base = s3.us.archive.org
host_bucket = %(bucket)s.s3.us.archive.org
Passing Authorization Credentials to IAS3

Authorization credentials may be
passed to IAS3 by your Amazon S3-compatible client via configuration file (see
above). In addition there is a clear text password mode. To use this mode,
pass your access and secret keys as values to the Authorization header:
Authorization: LOW $accesskey:$secret
This is the authorization method shown in most of the examples in this
document.
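If you would rather not type your keys into every command, the header value can be assembled from environment variables. This is a sketch under assumptions: the variable names IAS3_ACCESS_KEY and IAS3_SECRET_KEY and the function name are made up for this example. Only the header format itself comes from the text above.

```shell
# Build the clear text Authorization header from two environment variables.
# IAS3_ACCESS_KEY and IAS3_SECRET_KEY are hypothetical names chosen for this sketch.
ias3_auth_header() {
    if [ -z "$IAS3_ACCESS_KEY" ] || [ -z "$IAS3_SECRET_KEY" ]; then
        echo "IAS3_ACCESS_KEY and IAS3_SECRET_KEY must both be set" >&2
        return 1
    fi
    printf 'Authorization: LOW %s:%s\n' "$IAS3_ACCESS_KEY" "$IAS3_SECRET_KEY"
}

# Placeholder credentials, not real keys.
IAS3_ACCESS_KEY=EXAMPLEKEY
IAS3_SECRET_KEY=EXAMPLESECRET
ias3_auth_header
```

In a curl command this could be used as --header "$(ias3_auth_header)". Because the keys travel in clear text, treat any log or terminal output containing this header as sensitive.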
Commonly Used Amazon S3 Headers

foo: are there any more of these? only the one ever appears in the examples
Most Amazon S3 headers can also be used with IAS3. This section briefly
discusses the most commonly used Amazon S3 headers.
x-amz-auto-make-bucket

The x-amz-auto-make-bucket header allows you to both create an item and
upload directly to it with a single command.
To enable this option, pass the x-amz-auto-make-bucket header with a value
of 1. If you do not specify this value you must create an item before you
attempt to upload to it. The default value for this header is 0.
This header only works when PUTting to IAS3.
Internet Archive-specific IAS3 Headers

foo: I really don't like the formatting here. Maybe add a standard table to
each header, listing where it can be used (PUT/GET/DELETE, etc.), valid
values, default value?
Internet Archive has implemented specialized headers for controlling certain
operations upon objects and files via IAS3.
x-archive-cascade-delete

Normal DELETE operation is to remove only the specified file. The
x-archive-cascade-delete header allows you to delete not only a file but also
all derivative and original files associated with it. The Internet Archive
derivatives help page provides additional information about the files which
may be deleted in this operation.
To enable this option, pass the x-archive-cascade-delete header with a value
of 1. The default value for this header is 0.
This header only works when DELETING a file within an item. Nota bene:
DELETE is not allowed for items (buckets) in IAS3. You may only DELETE a file
and its derivatives.
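A cascading DELETE might be issued with curl as sketched below. This is an illustration, not a tested recipe: the function name, the DRY_RUN switch and the IAS3_ACCESS_KEY / IAS3_SECRET_KEY variables are assumptions made for this example. With DRY_RUN set, the function prints the curl arguments instead of contacting the server.

```shell
# Hypothetical helper: DELETE one file and its derivatives from an item.
# $1 = item identifier, $2 = filename
ias3_delete_file() {
    # Replace the positional parameters with the full curl command line.
    set -- curl --location --request DELETE \
        --header 'x-archive-cascade-delete:1' \
        --header "authorization: LOW ${IAS3_ACCESS_KEY}:${IAS3_SECRET_KEY}" \
        "http://s3.us.archive.org/$1/$2"
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$@"   # print one argument per line, do not run anything
    else
        "$@"
    fi
}

# Placeholder credentials and a dry run, so nothing is actually deleted.
IAS3_ACCESS_KEY=EXAMPLEKEY
IAS3_SECRET_KEY=EXAMPLESECRET
DRY_RUN=1
ias3_delete_file my_item old_video.mp4
```

Keeping the dry run as the default while developing a script is a cheap safeguard, since a cascading delete removes more than the one file named in the URL. Note that the dry run output includes the authorization header.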
x-archive-ignore-preexisting-bucket

A normal PUT operation including x-archive-meta-* headers will not overwrite an
existing IDENTIFIER_meta.xml file. The x-archive-ignore-preexisting-bucket
header will instead overwrite the existing IDENTIFIER_meta.xml file with the
x-archive-meta-* header values passed in the same PUT command.
To enable this option, pass the x-archive-ignore-preexisting-bucket header
with a value of 1. The default value for this header is 0.
This header only works when PUTting to IAS3.
x-archive-keep-old-version

A normal PUT operation will overwrite a file when it is used to upload a file
of the same name. A normal DELETE operation will remove the specified file. The
x-archive-keep-old-version header will rename the specified file, prepending
the filename with .~~ before proceeding with the PUT or DELETE operation.
To enable this option, pass the x-archive-keep-old-version header with a value
of 1. The default value for this header is 0.
Caution! This header is experimental. Its use could result in unexpected results if interleaved with PUTs which do not use this header.
This header works for both PUT and DELETE for IAS3.
x-archive-meta-*

The x-archive-meta-* header is used for setting metadata values for an item.
This header is discussed in detail below.
x-archive-queue-derive

Normal operation after a file has been PUT into an item is to queue it for
derivation to other file formats. PUTting either a very large file or a large
number of files can bog down the derivation process and slow system
performance. In these instances it is preferable to disable automatic derive
queueing.
Please note: Files may be queued for derivation following upload. To queue an
individual file, navigate to the item detail page on Internet Archive and
click the Edit Item! link at the top. If you have several files which need
to be queued, [contact Internet Archive](mailto:info@archive.org?subject=[Queue for Derive]) for assistance.
To disable automated creation of derivative files, pass the
x-archive-queue-derive header with a value of 0. The default value for this
header is 1.
This header works only when PUTting to IAS3.
x-archive-size-hint

If the total size of files in your item will exceed 10 gigabytes, Internet
Archive recommends you declare the size at the time of bucket creation. This
allows the Internet Archive catalog to more easily place the item for storage,
facilitating a potential speed boost to the upload.
To enable this option, pass the x-archive-size-hint header with a value of the
file size in bytes. If this header is not defined IAS3 will attempt to
default to the value in the content-length header.
This header works only when PUTting to IAS3.
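One way to arrive at a byte count for the header is to add up the sizes of the files you plan to upload. The sketch below assumes the hint should be the combined size of all files destined for the item, which is how the first paragraph of this section reads. The function name ias3_size_hint is invented for this example.

```shell
# Hypothetical helper: print the combined size, in bytes, of the files given.
ias3_size_hint() {
    total=0
    for f in "$@"; do
        # wc -c reads the file on stdin and prints only the byte count.
        total=$((total + $(wc -c < "$f")))
    done
    printf '%s\n' "$total"
}

# Demonstration with two small temporary files.
tmp=$(mktemp -d)
printf 'hello' > "$tmp/a.txt"
printf 'world!' > "$tmp/b.txt"
echo "x-archive-size-hint:$(ias3_size_hint "$tmp"/*.txt)"
rm -r "$tmp"
```

In a real upload the echoed string would be passed to curl with --header on the PUT that creates the item.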
IAS3 Identifiers

Each item at Internet Archive has an identifier. An identifier is composed of
any unique combination of alphanumeric characters, underscore (_) and dash
(-). While there are no official limits it is strongly suggested that they be
between 5 and 80 characters in length.
Identifiers must be unique across the entirety of Internet Archive, not simply
unique within a single collection.
Once defined, an identifier cannot be changed. It will travel with the
item or object and is involved in every manner of accessing or referring to
the item.
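Because an identifier cannot be changed later, it may be worth checking a candidate before the first PUT. The sketch below tests only the character set and the suggested length described above. It treats the 5 to 80 character suggestion as a hard rule, which is stricter than IAS3 itself, and it cannot tell you whether the identifier is already taken. The function name is hypothetical.

```shell
# Use byte-wise character ranges so A-Z and a-z mean ASCII letters only.
LC_ALL=C
export LC_ALL

# Hypothetical helper: succeed if $1 looks like a reasonable identifier.
ias3_valid_identifier() {
    case $1 in
        *[!A-Za-z0-9_-]*) return 1 ;;   # contains a character outside the allowed set
    esac
    len=${#1}
    [ "$len" -ge 5 ] && [ "$len" -le 80 ]
}

for id in vmb_tosw_trial_upload_03 'bad id' abc; do
    if ias3_valid_identifier "$id"; then
        echo "ok: $id"
    else
        echo "rejected: $id"
    fi
done
```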
In IAS3, identifiers are defined implicitly in the target URL. For example:
curl --location --header 'x-amz-auto-make-bucket:1' \
--header "Authorization: LOW $accesskey:$secret" \
--header "x-archive-meta-collection:test_collection" \
--upload-file /Users/archive/Desktop/The_Open_Source_Way_03.pdf \
http://s3.us.archive.org/vmb_tosw_trial_upload_03/The_Open_Source_Way_03.pdf
The identifier in this command is vmb_tosw_trial_upload_03. The item may be
viewed at its details page. The details page for any item is simply
http://archive.org/details/ followed by the identifier. The details page for
this example is:
http://archive.org/details/vmb_tosw_trial_upload_03
Setting Metadata Values via Headers

The x-archive-meta-* header is used to set metadata values for items. At this
time Internet Archive does not support file-level metadata. Metadata may only
be defined at an item level.
All metadata fields are defined as key-value pairs passed via headers. The
header format is:
x-archive-meta-FIELDNAME:FIELDVALUE
For instance, if you are using curl you may set a value for the title metadata
field using this header:
--header "x-archive-meta-title:John Muir on Hetch Hetchy" \
Alternatively, you may use the Amazon S3 standard
x-amz-meta-FIELDNAME:FIELDVALUE header for setting metadata.
Metadata headers are sorted prior to processing. This sorting includes the
x-amz- or x-archive- header prefixes, therefore if you use both of these
prefixes when setting metadata values the fields set with x-amz- will be
processed first and may cause unexpected behavior. To avoid potential problems
it is advised that you use either the x-archive- or the x-amz- header prefix
when setting metadata, not both.
All metadata header values are interpreted as UTF-8 encoded characters.
Standard Internet Archive Metadata Fields

There are several standard metadata fields recognized for Internet Archive
items. All metadata fields except identifier are optional.
foo: alphabetize these
foo: standardize wording; it's all over the place
foo: field or tag? Pick a term and stick with it
hidden

foo: what's this do? It's admin/owner-only and doesn't appear on editxml.php
identifier

Each item at Internet Archive has an identifier. An identifier is composed of
any unique combination of alphanumeric characters, underscore (_) and dash
(-). While there are no official limits it is strongly suggested that they be
between 5 and 80 characters in length.
An identifier cannot be defined via metadata header. Instead, identifiers
are defined implicitly in the target URL. Please see IAS3 Identifiers above for
additional information.
title

The title for the item. This appears in the header of the item's detail page
on Internet Archive.
If a value is not specified for this field it will default to the identifier
for the item.
creator

An entity primarily responsible for creating the files contained in the item.
mediatype

The primary type of media contained in the item. While an item can contain
files of diverse mediatypes the value in this field defines the appearance and
functionality of the item's detail page on Internet Archive. In particular,
the mediatype of an item defines what sort of online viewer is available for
the files contained in the item.
The mediatype metadata field recognizes a limited set of values:


audio

The majority of audio items should receive this mediatype value. Items for the
Live Music Archive should instead use
the etree value.


data

This is the default value for mediatype. Items with a mediatype of data will
be available in Internet Archive but you will not be able to browse to them.
In addition there will be no online reader/player for the files.


etree

Items which contain files for the Live Music Archive should have a mediatype
value of etree. The Live Music Archive has very specific upload requirements.
Please consult the documentation for the Live Music Archive prior to creating
items for it.


image

Items which predominantly consist of image files should receive a mediatype
value of image. Currently these items will not be available for browsing or
online viewing in Internet Archive but they will require no additional changes
when this mediatype receives additional support in the Archive.


movies

All videos (television, features, shorts, etc.) should receive a mediatype
value of movies. These items will be displayed with an online video player.


software

Items with a mediatype of software are accessible to browse via Internet
Archive's software collection.
There is no online viewer for software but all files are available for
download.


texts

Items with a mediatype of texts will appear with the online bookreader.
Internet Archive will also attempt to OCR files in these items.


web

The web mediatype value is reserved for items which contain web archive
WARC files.


If the mediatype value you set is not in the list above it will be saved but
ignored by the system.
This field may be modified only by an administrator or the owner of the item.
If a value is not specified for this field it will default to data.
collection

A collection is a specialized item used for curation and aggregation of other
items. Assigning an item to a collection defines where the item may be located
by a user browsing Internet Archive. To assign an item to a collection, pass
the collection's identifier as the value of an x-archive-meta-collection
header. For example, if you are using curl you can assign an item to the
Community Texts collection (identifier: opensource) with the following header:
--header 'x-archive-meta-collection:opensource' \
A collection must exist prior to assigning any items to it. Currently
collections can only be created by Internet Archive staff members. Please
[contact Internet Archive](mailto:info@archive.org?subject=[Collection Creation Request]) if you need a collection created.
description

A description of the item.
The value of this metadata field may contain HTML. <script> tags and CSS are
not allowed.
date

The publication, production or other similar date of this item. Please use an
ISO 8601 compatible format for this
date. For instance, these are all valid date formats:

YYYY
YYYY-MM-DD
YYYY-MM-DD HH:MM:SS
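A script can check a date string against the three shapes listed above before sending it. This sketch tests the shape only: it does not verify that the month or day is in range, and ISO 8601 permits other forms that this pattern rejects. The function name is made up for this example.

```shell
# Hypothetical helper: succeed if $1 matches YYYY, YYYY-MM-DD
# or YYYY-MM-DD HH:MM:SS (digits only, no range checks).
ias3_valid_date() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}(-[0-9]{2}-[0-9]{2}( [0-9]{2}:[0-9]{2}:[0-9]{2})?)?$'
}

for d in 2011 2011-10-06 '2011-10-06 12:30:00' 10/06/2011; do
    if ias3_valid_date "$d"; then
        echo "accepted: $d"
    else
        echo "rejected: $d"
    fi
done
```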

subject

Keyword(s) or phrase(s) that may be searched for to find your item. Separate
each keyword or phrase with a semicolon (";") character. It is helpful but
not necessary for you to use Library of Congress Subject
Headings for the value of this
metadata header.
licenseurl

A URL to the license which covers the works contained in the item.
Internet Archive recommends (but does not require) Creative Commons licensing.
Creative Commons provides a
[license selector](http://creativecommons.org/choose/?partner=ia&exit_url=http%3A%2F%2Fwww.archive.org%2Fservices%2Flicense-chooser.php%3Flicense_url%3D%5Blicense_url%5D%26license_name%3D%5Blicense_name%5D%26license_image%3D%5Blicense_button%5D%26deed_url%3D%5Bdeed_url%5D&jurisdiction_choose=1)
for finding the correct license for your needs.
pick

Each collection page on Internet Archive may include a "Staff Picks" section.
This section will highlight a single item in the collection. This item will be
selected at random from the items with a pick metadata value of 1. If there
are no items with this pick metadata value the "Staff Picks" section will not
appear on the collection page.
This field may be modified only by an administrator or the owner of the item.
By default all new items have no pick metadata value.
noindex

All items will have their metadata included in the Internet Archive search
engine. To disable indexing in the search engine, include a noindex metadata
tag. The value of the tag does not matter. Its presence is enough to exclude
the metadata from the search engine.
If an item's metadata has already been indexed in the search engine, setting
noindex will remove it from the index.
Items whose metadata is not included in the search engine index are not
considered "public" per se and therefore will not have a value in the
publicdate metadata field (see below).
publicdate

foo: date format accepted?
Items which have had their metadata included in the Internet Archive search
engine index are considered to be public. The date the metadata is added to
the index is the public date for the item.
This field may be modified only by an administrator or the owner of the item.
While it is possible to set the publicdate metadata value it is not
recommended. This value is typically set by automated processes.
addeddate

foo: date format accepted?
The addeddate metadata tag contains the date the item was added to Internet
Archive.
While it is possible to set the addeddate metadata value it is not
recommended. This value is typically set by automated processes.
adder

foo: pretty sure this value is the username, not the screen name. Screen name
is only in the display.
The screen name of the account which added the item to the Internet Archive.
While it is possible to set the adder metadata value it is not recommended.
This value is typically set by automated processes.
uploader

The Internet Archive username of the account which uploaded the file(s) to the
item.
While it is possible to set the uploader metadata value it is not recommended.
This value is typically set by automated processes.
updater

The Internet Archive username of the account which updated the item. This
field is repeatable.
This field may be modified only by an administrator or the owner of the item.
While it is possible to set the updater metadata value it is not recommended.
This value is typically set by automated processes.
updatedate

foo: date format?
The date on which an update was made to the item. This field is repeatable.
This field may be modified only by an administrator or the owner of the item.
While it is possible to set the updatedate metadata value it is not
recommended. This value is typically set by automated processes.
notes

The notes metadata field can contain any information about the item.
The value of this metadata field may contain HTML. <script> tags and CSS are
not allowed.
rights

The value of the rights metadata field should be a statement of the rights
held in and over the item.
The value of this metadata field may contain HTML. <script> tags and CSS are
not allowed.
contributor

The value of the contributor metadata field is information about the entity
responsible for making contributions to the content of the item. This is often
the library, organization or individual making the item available on Internet
Archive.
The value of this metadata field may contain HTML. <script> tags and CSS are
not allowed.
publisher

The publisher of the material available in the item.
language

The primary language of the material available in the item.
While the value of the language metadata field can be any value, Internet
Archive prefers they be MARC21 Language Codes.
coverage

The extent or scope of the content of the material available in the item. The
value of the coverage metadata field may include geographic place, temporal
period, jurisdiction, etc. For items which contain multi-volume or serial
content, place the statement of holdings in this metadata field.
credits

If known, enter the participants in the production of the materials contained
in the item in the credits metadata field.
The value of this metadata field may contain HTML. <script> tags and CSS are
not allowed.
Custom Metadata Fields

Internet Archive strives to be metadata agnostic, enabling users to define the
metadata format which best suits the needs of their material. In addition to
the standard metadata fields listed above you may also define as many custom
metadata fields as you require. These metadata fields can be defined ad hoc at
item creation or metadata editing time and do not have to be defined in
advance. For instance, if your organization uses the
PBCORE metadata schema you
can include the appropriate metadata fields in your Internet Archive item:
x-archive-meta-pbcoreGenre:Educational
x-archive-meta-pbcoreCoverage:Long Beach, CA
x-archive-meta-pbcoreCoverageType:Spatial
etc.
PLEASE NOTE! RFC 822 disallows the underscore character (_) in HTTP header names. Therefore to use an underscore in the name of a custom metadata field you must replace the underscore (_) with two hyphens (--). These will be translated into an underscore character when the metadata is processed by the server. For example:
x-archive-meta-isbn--10:080652510X
This example will generate a metadata field named isbn_10.
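The underscore substitution is easy to forget when headers are built by a script. A small helper can apply it for you. This is a sketch only, and the function name ias3_meta_header is invented for this example.

```shell
# Hypothetical helper: print an x-archive-meta header for a field name and value,
# replacing each underscore in the field name with two hyphens.
ias3_meta_header() {
    name=$(printf '%s' "$1" | sed 's/_/--/g')
    printf 'x-archive-meta-%s:%s\n' "$name" "$2"
}

ias3_meta_header isbn_10 080652510X
ias3_meta_header title 'John Muir on Hetch Hetchy'
```

Only the field name is rewritten. Underscores in the value are left as they are.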
Repeating Metadata Fields

Certain metadata fields such as collection and subject can be repeated. To
repeat a metadata header you must sequentially number each instance of the
header in your command:
x-archive-meta01-$meta_name:$meta_value_a
x-archive-meta02-$meta_name:$meta_value_b
foo: Need a better example? vaguely recall collections don't need the number
but go in order of appearance in the command...?
For example, if an item contains both PDF and mp3 files you may assign it to
both the texts and opensource_audio collections by including the following two
lines in a curl command:
--header 'x-archive-meta01-collection:texts' \
--header 'x-archive-meta02-collection:opensource_audio' \
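When the number of values varies, the numbered headers can be generated in a loop. The following sketch assumes the two-digit numbering shown above. The function name is hypothetical.

```shell
# Hypothetical helper: print one numbered header per value.
# $1 = field name, remaining arguments = values in the order they should appear
ias3_repeat_headers() {
    field=$1
    shift
    n=1
    for value in "$@"; do
        printf 'x-archive-meta%02d-%s:%s\n' "$n" "$field" "$value"
        n=$((n + 1))
    done
}

ias3_repeat_headers collection texts opensource_audio
```

Each printed line can then be passed to curl with its own --header option.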
Setting Metadata Values via Files

While the preferred and recommended method for setting Internet Archive item
metadata is via headers, it is possible to provide files containing metadata.
If you choose to provide a metadata file instead of using headers it is
strongly recommended that the metadata file be the first uploaded during item
creation.
When providing a metadata file, please provide only one file per item. It is
not necessary to provide a metadata file in each format. Additional files will
be generated automatically from the one which you provide.
The valid metadata file formats:
IDENTIFIER_marc.xml

This file must contain metadata in well-formed MARCXML format. It must be
named appropriately or it will not be recognized. The proper naming scheme is
the item's identifier followed by _marc.xml.
An example IDENTIFIER_marc.xml file can be found in the Appendix.
IDENTIFIER_meta.mrc

This file must contain metadata in binary MARC format according to the
[ISO 2709](http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=41319)
standard. It must be named appropriately or it will not be recognized. The
proper naming scheme is the item's identifier followed by _meta.mrc.
An example IDENTIFIER_meta.mrc file can be found in the Appendix.
How These Metadata Files Are Processed

If an IDENTIFIER_meta.mrc file is located it is used to generate an
IDENTIFIER_marc.xml file. Any existing IDENTIFIER_marc.xml file will be
overwritten by this operation.
The IDENTIFIER_marc.xml file is used to generate an IDENTIFIER_dc.xml file of
Dublin Core metadata. Any existing IDENTIFIER_dc.xml file will be overwritten
by this operation. The Dublin Core fields are extracted and populated
according to the Library of Congress MARC21 to Dublin Core XSL stylesheet.
In addition, Internet Archive will extract information from the following
MARCXML fields:

041 subfield a and 130 subfield l are searched for language codes. If multiple language codes are located they are all added to the item's metadata.
260 subfield c is searched for an item date.
010 subfield a is searched for the LCCN.

The item's definitive metadata file, IDENTIFIER_meta.xml is generated from the
IDENTIFIER_dc.xml Dublin Core file.
Special Files

Each Internet Archive item is comprised of several files. Many of these files
are automatically generated and should not be either removed or modified.
IDENTIFIER_meta.xml

IDENTIFIER_meta.xml is the definitive metadata file for the item. It is
automatically generated at item creation time using the metadata provided
either via headers or via files.
Please do not delete or modify this file. If you must modify the item's
metadata, please either use the "Edit Item!" link at the top of its detail
page or submit updated metadata via the IAS3 API. See the
x-archive-ignore-preexisting-bucket header for additional information about
updating an item's metadata via IAS3.
IDENTIFIER_files.xml

IDENTIFIER_files.xml is an auto-generated file cataloging all of the files
contained in the item. In addition to the filenames the IDENTIFIER_files.xml
file will also list the file format and various hashes for each file. If the
file is a derivative
IDENTIFIER_files.xml will list the original file from which it was derived.
For example, here is an extract from the japanesefairytal00ozak_files.xml file
for Japanese Fairy
Tales:

  <format>EPUB</format>
  <original>japanesefairytal00ozak_abbyy.gz</original>
  <mtime>1294020233</mtime>
  <size>1230045</size>
  <md5>1d87b3e04ca0b617e041bbcb0cd7f1a5</md5>
  <crc32>7326f3ce</crc32>
  <sha1>5434df04b1b811b03e7d9a32bde3119d3ca924c8</sha1>

Please do not delete or modify this file.
IDENTIFIER_rules.conf

Whereas the x-archive-queue-derive header enables or disables queuing files
for deriving for the entire item, it is possible to disable the creation of
certain derive files using the IDENTIFIER_rules.conf file. There are three
options for selecting which derivative formats to disable via
IDENTIFIER_rules.conf:
Specific File Formats

You may disable creation of specific derivative formats by listing them (one
format per line) in the IDENTIFIER_rules.conf file. The valid values for file
formats can be found in the header rows of the derivatives chart.
For example, to disable creation of h.264 and Ogg Video derivative files, your
IDENTIFIER_rules.conf file should contain the following:
h.264
Ogg Video
Only 'lossy' File Formats

Some derivative file formats (mp3, ogg, ogv, mp4, webm) are considered "lossy"
because they use a compression algorithm which produces a file which is not
identical to the original. To prevent the creation of lossy derivatives,
upload an IDENTIFIER_rules.conf file containing this line:
CAT.lossy
All Derivatives

It is possible to disable creation of all derivative files using the
IDENTIFIER_rules.conf file. This is equivalent to setting the
x-archive-queue-derive header to a value of 0.
To disable all derivatives, upload an IDENTIFIER_rules.conf file containing
this line:
CAT.all
A new or modified IDENTIFIER_rules.conf file will not be recognized until a
new derive process is initiated for the item. It is currently not possible to
initiate this process via the IAS3 API. To initiate the derive process in the
Internet Archive interface:

Log in to Internet Archive
Navigate to the item's page on Internet Archive
Find the "Edit Item!" link in the upper right of the item
Click the "change the information" link
Click the "Item Manager" link near the top of the page
Click the "derive" button

If now-excluded formats had previously been derived, initiating a derive
process will remove the files from the item.
Troubleshooting

Viewing a log of your IAS3 object

Each file uploaded to Internet Archive via IAS3 will have a log file. To view
the log, append ?log to the URL of the endpoint. For example:
http://s3.us.archive.org/sam-s3-test-08/demo-intro-to-k.pdf?log
Please note: The log format may change at any time.
My file isn't appearing in the item.

When a file is added to an item it is staged in temporary storage and ingested
via the Archive's content management system. While this usually happens very
quickly, during periods of heavy system load this process can take a few
minutes.
It is also possible that you are viewing a cached version of the item's detail
page. Please either clear your web browser's cache or append this parameter to
the item's URL:
reCache=1
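
As a rough illustration, the parameter can be appended without disturbing any query string the URL already carries. The helper name `with_recache` is hypothetical; only the `reCache=1` parameter comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_recache(url):
    """Return url with reCache=1 added to its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier reCache value.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "reCache"]
    query.append(("reCache", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_recache("http://archive.org/details/stats"))
```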
Is there a sandbox I can use for testing IAS3?

Internet Archive provides a collection where you can test your item creation
and uploads. Items assigned to this collection are removed from the Archive
once every thirty days or so. To use this collection, assign your test items
to it using this header:
x-archive-meta-collection:test_collection
Please remember that item identifiers must be unique across the entire
Archive, including for items in the test collection. Your test scripts may
need to be modified to avoid identifier collision once you start creating and
uploading to non-test items.
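
One way a test script might reduce the chance of identifier collisions is to add a timestamp and a random suffix to a fixed prefix. This is a sketch under that assumption: the helper names are hypothetical, and a generated name can still collide with an existing item, so the script should be prepared for the upload to be refused.

```python
import time
import uuid


def make_test_identifier(prefix):
    """Build an identifier that is unlikely to exist already.

    Uses only letters, digits, '_' and '-', the characters allowed
    in identifiers.
    """
    suffix = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
    return f"{prefix}-{suffix}"


def test_collection_header():
    """Header that assigns a new item to the test collection."""
    return {"x-archive-meta-collection": "test_collection"}


print(make_test_identifier("sam-s3-test"))
```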
What happens to my item/file after uploading?

Several processes will operate on your item after it has been created and
after each file is added to it. These processes include archiving the content,
deriving new files from your originals and backing up the item and its
contents. You may view the progress of any of these processes on the item's
catalog page:
http://www.archive.org/catalog.php?history=1&identifier=IDENTIFIER
Clicking the task_id for a process will display a detailed log for it.
You may also reach this page by clicking the 'Item History' link on the item's
detail page on Internet Archive.
Is there any way to control how files derive?

You may use either the IDENTIFIER_rules.conf file or the
x-archive-queue-derive header to control the creation of derivative files from
your originals.
Downloading via IAS3

While the IAS3 API supports both GET and HEAD methods for retrieving files,
higher performance can be achieved via the Internet Archive web architecture.
Each file in an Internet Archive item can be retrieved via a /download/ link:
http://archive.org/download/IDENTIFIER/FILENAME.EXT
This is the recommended method for downloading files from Internet Archive.
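
A minimal sketch of building such a link, assuming identifiers and filenames should be percent-encoded when they contain characters such as spaces. The function names `download_url` and `fetch` are hypothetical; `fetch` is shown only to illustrate one way of saving the file and is not called here.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def download_url(identifier, filename):
    """Return the /download/ link for a file in an item."""
    return f"http://archive.org/download/{quote(identifier)}/{quote(filename)}"


def fetch(identifier, filename, destination):
    """Download the file to a local path (performs a network request)."""
    urlretrieve(download_url(identifier, filename), destination)


print(download_url("japanesefairytal00ozak",
                   "japanesefairytal00ozak_marc.xml"))
```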
Code Examples

curl

Text item (a PDF will be OCR'd):

curl --location --header 'x-amz-auto-make-bucket:1' \
     --header 'x-archive-meta01-collection:opensource' \
     --header 'x-archive-meta-mediatype:texts' \
     --header 'x-archive-meta-sponsor:Andrew W. Mellon Foundation' \
     --header 'x-archive-meta-language:eng' \
     --header "authorization: LOW $accesskey:$secret" \
     --upload-file /home/samuel/public_html/intro-to-k.pdf \
     http://s3.us.archive.org/sam-s3-test-08/demo-intro-to-k.pdf
Movie item (Will get video player on details page):

curl --location --header 'x-amz-auto-make-bucket:1' \
     --header 'x-archive-meta01-collection:opensource_movies' \
     --header 'x-archive-meta-mediatype:movies' \
     --header 'x-archive-meta-title:Ben plays piano.' \
     --header "authorization: LOW $accesskey:$secret" \
     --upload-file ben-2009-05-09.avi \
     http://s3.us.archive.org/ben-plays-piano/ben-plays-piano.avi
Uploading a file to an existing item:

curl --location \
     --header "authorization: LOW $accesskey:$secret" \
     --upload-file /home/samuel/public_html/intro-to-k.pdf \
     http://s3.us.archive.org/sam-s3-test-08/demo-intro-to-k.pdf
Destroy and respecify the metadata for an item:

curl --location \
     --header 'x-archive-ignore-preexisting-bucket:1' \
     --header 'x-archive-meta01-collection:opensource' \
     --header 'x-archive-meta-mediatype:texts' \
     --header 'x-archive-meta-title:Fancy new title' \
     --header "authorization: LOW $accesskey:$secret" \
     --upload-file /dev/null \
     http://s3.us.archive.org/sam-s3-test-08
A movie example with subject keywords and a Creative Commons license:

curl --location --header 'x-archive-ignore-preexisting-bucket:1' \
     --header "authorization: LOW $accesskey:$secret" \
     --header 'x-archive-meta-mediatype:movies' \
     --header 'x-archive-meta-collection:opensource_movies' \
     --header 'x-archive-meta-title:electricsheep-flock-244' \
     --header 'x-archive-meta-creator:Scott Draves and the Electric Sheep' \
     --header 'x-archive-meta-description:Archive of flock 244 of the Electric Sheep, see http://electricsheep.org and http://scottdraves.com' \
     --header 'x-archive-meta-date:2009' \
     --header 'x-archive-meta-year:2009' \
     --header 'x-archive-meta-subject:electricsheep,alife,art,draves,spotworks,evolution,algorithm' \
     --header 'x-archive-meta-licenseurl:http://creativecommons.org/licenses/by-nc/3.0/us/' \
     --upload-file /dev/null \
     http://s3.us.archive.org/electricsheep-flock-244
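
Python

The curl uploads above could be approximated with Python's standard library roughly as follows. This is a sketch, not an official client: the function name `build_upload_request` and its parameters are hypothetical, and it only constructs the PUT request without sending it. The endpoint, the `LOW accesskey:secret` authorization form and the header names are taken from the curl examples.

```python
import urllib.request

IAS3_ENDPOINT = "http://s3.us.archive.org"


def build_upload_request(identifier, filename, body, access_key, secret,
                         metadata=None, make_bucket=False):
    """Build (but do not send) a PUT request for one file.

    metadata maps field names to single values, e.g. {"mediatype": "texts"};
    each becomes an x-archive-meta-NAME header.
    """
    headers = {"authorization": f"LOW {access_key}:{secret}"}
    if make_bucket:
        # Create the item if it does not exist yet.
        headers["x-amz-auto-make-bucket"] = "1"
    for name, value in (metadata or {}).items():
        headers[f"x-archive-meta-{name}"] = value
    url = f"{IAS3_ENDPOINT}/{identifier}/{filename}"
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="PUT")


req = build_upload_request("sam-s3-test-08", "demo-intro-to-k.pdf",
                           b"example bytes", "ACCESSKEY", "SECRET",
                           metadata={"mediatype": "texts"}, make_bucket=True)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would perform the actual upload.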
Perl

An extract of a script for uploading multiple files via IAS3 using LWP

use LWP::UserAgent;
# str2time is assumed to come from HTTP::Date; the extract does not show
# its imports.
use HTTP::Date qw(str2time);

my $ua = LWP::UserAgent->new();
$ua->agent('upload_via_IAS3/' . VERSION);
$ua->timeout(20);
$ua->env_proxy;
$ua->default_headers->push_header('authorization'=>"LOW $ias3keys");
# start actual upload tasks, doing some optimization.
# - items with no file to upload are not created
# - item creation is always combined with the first file upload
my @uploadQueue = @{$task->{files}};
while (@uploadQueue) {
    my $file = shift @uploadQueue;
    my $uripath = "/" . $file->{item}{name} . "/" . $file->{filename};
    warn "File: ", $file->{file}, " -> ", $uripath, "\n";
    if (!$forceupload && $file->{uploaded}) {
        my $last = $file->{uploaded};
        # this file was uploaded in previous run. re-upload it only when
        # something has changed.
        if ($file->{mtime} <= $last->{mtime} &&
            $file->{item}{name} eq $last->{itemName} &&
            $file->{filename} eq $last->{filename}) {
            warn "skipping - no change since last upload\n";
            next;
        }
    }
    if ($checkstore) {
        my $dlurl = IADLURLBASE . $uripath;
        print STDERR "checking ", $dlurl, "...\n" if $verbose;
        my $res = $ua->head($dlurl);
        if ($res->is_success) {
            # file exists - check date (of last upload) against file's mtime
            my $m = $res->header('Date');
            if ($m && str2time($m) >= $file->{mtime}) {
                warn "skipping - upload date later than file's mtime\n";
                next;
            }
        } else {
            # 404 or other failure - upload the file
            print $res->status_line, "\n";
        }
    }

    my $waitUntil = $file->{waitUntil};
    if (defined $waitUntil) {
        my $sec = $waitUntil - time();
        while ($sec > 0) {
            print STDERR "holding off $sec second", ($sec > 1 ? 's' : ''), "... ";
            sleep(1);
            $sec--;
        } continue { print STDERR "\r"; }
        print STDERR "\n";
        delete $file->{waitUntil};
    }
    # ok, ready to go
    my $item = $file->{item};
    my @headers = ();
    # prepare item metadata if the item hasn't been created yet (in this
    # session) - it might exist on the server.
    unless ($item->{created}) {
        my $metadata = $item->{metadata};
        # prepare actual HTTP headers for metadata
        push(@headers, 'x-amz-auto-make-bucket', 1);
        # As metadata (most often 'collection' and 'subject') may have
        # multiple values, %metadata has an array for each metadata name
        # (in some cases, notably 'title', it may be a scalar). If there are
        # in fact multiple values, we use the metadata header in indexed
        # form. If there's only one value (either in an array or as a
        # scalar), we use the basic form. Special metadata 'collection' is
        # also handled by this same logic.
        while (my ($h, $v) = each %$metadata) {
            push(@headers, metadataHeaders($h, $v));
        }
        # add metadata headers for collections item gets associated with
        my @collectionNames = map($_->{name}, @{$item->{collections}});
        push(@headers, metadataHeaders('collection', @collectionNames));
        # overwrite existing bucket unless user explicitly told not to.
        unless ($keepExistingMetadata) {
            push(@headers, 'x-archive-ignore-preexisting-bucket', '1');
        }
        # size-hint
        if ($item->{size}) {
            push(@headers, 'x-archive-size-hint', $item->{size});
        }
    }
    # no-derive flag should go with all files
    if ($noDerive) {
        push(@headers, 'x-archive-queue-derive', '0');
    }
    # Expect header
    push(@headers, 'expect', '100-continue');
    my $uri = IAS3URLBASE . $uripath;
    my $content = $file->{path};

    if ($verbose) {
        print STDERR "PUT $uri\n";
        for (my $i = 0; $i < $#headers; $i += 2) {
            print STDERR $headers[$i], ":", $headers[$i + 1], "\n";
        }
    }
    if ($dryrun) {
        print STDERR "## dry-run; not making actual request\n";
    } else {
        # use of custom PUT_FILE is for efficient handling of large files.
        # see comment on PUT_FILE above.
        my $req = PUT_FILE $uri, $content, @headers;
        #print STDERR $req->as_string;
        my $res = $ua->request($req);
        print STDERR "\n";
        if ($res->is_success) {
            print $res->status_line, "\n";
            $res->headers->scan(sub { print "$_[0]: $_[1]\n"; }) if $verbose;
            print $res->content, "\n" if $verbose;
            print "\n";
        } else {
            print $res->status_line, "\n", $res->content, "\n\n";
            if ($res->code == 503) {
                # Service Unavailable - asking to slow down
                $file->{waitUntil} = time() + 120;
                # put it at the head so that it blocks transfer
                unshift(@uploadQueue, $file);
            } elsif (++$file->{failCount} < 5) {
                $file->{waitUntil} = time() + 120;
                push(@uploadQueue, $file);
            } else {
                # give up
            }
            next;
        }
    }

    $item->{created} = 1;
}
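
The Perl extract calls a `metadataHeaders` helper that it does not show. A rough Python equivalent of the behaviour its comments describe (basic form for a single value, indexed form for several) might look like the sketch below. The name `metadata_headers` is hypothetical, and numbering the indexed headers from 01 is an assumption based on the `x-archive-meta01-collection` header used in the curl examples.

```python
def metadata_headers(name, values):
    """Return (header, value) pairs for one metadata field.

    A single value uses the basic form  x-archive-meta-NAME.
    Several values use the indexed form x-archive-metaNN-NAME.
    """
    if isinstance(values, str):
        values = [values]
    values = list(values)
    if len(values) == 1:
        return [(f"x-archive-meta-{name}", values[0])]
    return [(f"x-archive-meta{index:02d}-{name}", value)
            for index, value in enumerate(values, start=1)]


for header, value in metadata_headers("subject", ["alife", "art"]):
    print(f"{header}:{value}")
```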
Other Languages

Examples for additional languages are still pending. If you have any you would
like to provide, please [contact Internet
Archive](mailto:info@archive.org?subject=[IAS3 Code Example]).
Support

For assistance with the IAS3 API, please [contact Internet
Archive](mailto:info@archive.org?subject=[IAS3 Help]). Please include the
string "IAS3 Help" somewhere in the subject line.
Appendices

Terminology

Bucket: 'Bucket' is the Amazon S3 term for a container for your files. For
the IAS3 API, a bucket is equivalent to an Internet Archive item.

Collection: A collection is a specialized Internet Archive item used for
aggregating related collections and items. An item or collection is assigned
to a collection via the x-archive-meta-collection metadata header.

Derivative: A derivative is a file which Internet Archive will automatically
generate from the original file which you provide. Derivatives enable as many
people as possible to access the file while also protecting against file
format obsolescence. Please refer to the derivatives chart to see which
derivative formats Internet Archive will produce.

Identifier: Each item at Internet Archive has an identifier. An identifier is
composed of any unique combination of alphanumeric characters, underscore (_)
and dash (-). While there are no official limits, it is strongly suggested
that identifiers be between 5 and 80 characters in length. An identifier must
be unique across the entirety of Internet Archive.

Item: An item is the primary entity of the Internet Archive. All of the files
you upload will be contained in items. Each item has its own Internet Archive
page, also known as its details page. The details page can be accessed using
the following URL pattern:
http://archive.org/details/IDENTIFIER
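
The identifier rules in the terminology above can be checked locally before an upload is attempted. The sketch below assumes "alphanumeric" means ASCII letters and digits, and treats the 5 to 80 character range as a recommendation rather than a hard limit; the function name `check_identifier` is hypothetical. Uniqueness across the Archive cannot be checked this way.

```python
import re

_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def check_identifier(identifier):
    """Return a list of problems found; an empty list means none."""
    problems = []
    if not _ALLOWED.fullmatch(identifier):
        problems.append("contains characters other than letters, digits, "
                        "'_' and '-'")
    if not 5 <= len(identifier) <= 80:
        problems.append("length is outside the suggested 5 to 80 characters")
    return problems


print(check_identifier("japanesefairytal00ozak"))
print(check_identifier("bad id!"))
```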
Internet Archive's Item Structure (in brief)

foo: Fill in this section
Items == have 'detail' pages ; metadata
Can be several files per items (http link)
For info on IA's item structure:
http://www.archive.org/about/faqs.php
(sorry!)
You can also look at an item's structure directly by clicking the HTTP link
shown on a details page. ex: http://archive.org/details/stats
IAS3 HTTP Return Codes

The IAS3 API may return the following HTTP Return Codes:
| HTTP Return Code | Code Meaning |
| --- | --- |
| 102 | Processing |
| 200 | OK |
| 201 | Created |
| 204 | No Content |
| 207 | Multi-Status |
| 400 | Bad Request |
| 403 | Forbidden |
| 404 | Not Found |
| 405 | Method Not Allowed |
| 409 | Conflict |
| 412 | Precondition Failed |
| 415 | Unsupported Media Type |
| 422 | Unprocessable Entity |
| 423 | Locked |
| 424 | Failed Dependency |
| 502 | Bad Gateway |
| 507 | Insufficient Storage |
Error Messages

IAS3 may return the following error messages:
| Error Code | Error Message | HTTP Code Returned |
| --- | --- | --- |
| AccessDenied | Access Denied | 403 Forbidden |
| AccountProblem | There is a problem with your AWS account that prevents the operation from completing successfully. Please contact customer service at webservices@amazon.com. | 403 Forbidden |
| AmbiguousGrantByEmailAddress | The e-mail address you provided is associated with more than one account. | 400 Bad Request |
| BadDigest | The Content-MD5 you specified did not match what we received. | 400 Bad Request |
| BucketAlreadyExists | The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again. | 409 Conflict |
| BucketAlreadyOwnedByYou | Your previous request to create the named bucket succeeded and you already own it. | 409 Conflict |
| BucketNotEmpty | The bucket you tried to delete is not empty. | 409 Conflict |
| CredentialsNotSupported | This request does not support credentials. | 400 Bad Request |
| CrossLocationLoggingProhibited | Cross location logging not allowed. Buckets in one geographic location cannot log information to a bucket in another location. | 403 Forbidden |
| EntityTooSmall | Your proposed upload is smaller than the minimum allowed object size. | 400 Bad Request |
| EntityTooLarge | Your proposed upload exceeds the maximum allowed object size. | 400 Bad Request |
| ExpiredToken | The provided token has expired. | 400 Bad Request |
| IncompleteBody | You did not provide the number of bytes specified by the Content-Length HTTP header. | 400 Bad Request |
| IncorrectNumberOfFilesInPostRequest | POST requires exactly one file upload per request. | 400 Bad Request |
| InlineDataTooLarge | Inline data exceeds the maximum allowed size. | 400 Bad Request |
| InternalError | We encountered an internal error. Please try again. | 500 Internal Server Error |
| InvalidAccessKeyId | The AWS Access Key Id you provided does not exist in our records. | 403 Forbidden |
| InvalidAddressingHeader | You must specify the Anonymous role. | N/A |
| InvalidArgument | Invalid Argument | 400 Bad Request |
| InvalidBucketName | The specified bucket is not valid. | 400 Bad Request |
| InvalidDigest | The Content-MD5 you specified was invalid. | 400 Bad Request |
| InvalidLocationConstraint | The specified location constraint is not valid. | 400 Bad Request |
| InvalidPayer | All access to this object has been disabled. | 403 Forbidden |
| InvalidPolicyDocument | The content of the form does not meet the conditions specified in the policy document. | 400 Bad Request |
| InvalidRange | The requested range cannot be satisfied. | 416 Requested Range Not Satisfiable |
| InvalidSecurity | The provided security credentials are not valid. | 403 Forbidden |
| InvalidSOAPRequest | The SOAP request body is invalid. | 400 Bad Request |
| InvalidStorageClass | The storage class you specified is not valid. | 400 Bad Request |
| InvalidTargetBucketForLogging | The target bucket for logging does not exist, is not owned by you, or does not have the appropriate grants for the log-delivery group. | 400 Bad Request |
| InvalidToken | The provided token is malformed or otherwise invalid. | 400 Bad Request |
| InvalidURI | Couldn't parse the specified URI. | 400 Bad Request |
| KeyTooLong | Your key is too long. | 400 Bad Request |
| MalformedACLError | The XML you provided was not well-formed or did not validate against our published schema. | 400 Bad Request |
| MalformedPOSTRequest | The body of your POST request is not well-formed multipart/form-data. | 400 Bad Request |
| MaxMessageLengthExceeded | Your request was too big. | 400 Bad Request |
| MaxPostPreDataLengthExceededError | Your POST request fields preceding the upload file were too large. | 400 Bad Request |
| MetadataTooLarge | Your metadata headers exceed the maximum allowed metadata size. | 400 Bad Request |
| MethodNotAllowed | The specified method is not allowed against this resource. | 405 Method Not Allowed |
| MissingAttachment | A SOAP attachment was expected, but none were found. | N/A |
| MissingContentLength | You must provide the Content-Length HTTP header. | 411 Length Required |
| MissingSecurityElement | The SOAP 1.1 request is missing a security element. | 400 Bad Request |
| MissingSecurityHeader | Your request was missing a required header. | 400 Bad Request |
| NoLoggingStatusForKey | There is no such thing as a logging status sub-resource for a key. | 400 Bad Request |
| NoSuchBucket | The specified bucket does not exist. | 404 Not Found |
| NoSuchKey | The specified key does not exist. | 404 Not Found |
| NotImplemented | A header you provided implies functionality that is not implemented. | 501 Not Implemented |
| NotSignedUp | Your account is not signed up for the Amazon S3 service. You must sign up before you can use Amazon S3. You can sign up at the following URL: http://aws.amazon.com/s3 | 403 Forbidden |
| OperationAborted | A conflicting conditional operation is currently in progress against this resource. Please try again. | 409 Conflict |
| PermanentRedirect | The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. | 301 Moved Permanently |
| PreconditionFailed | At least one of the pre-conditions you specified did not hold. | 412 Precondition Failed |
| Redirect | Temporary redirect. | 307 Moved Temporarily |
| RequestIsNotMultiPartContent | Bucket POST must be of the enclosure-type multipart/form-data. | 400 Bad Request |
| RequestTimeout | Your socket connection to the server was not read from or written to within the timeout period. | 400 Bad Request |
| RequestTimeTooSkewed | The difference between the request time and the server's time is too large. | 403 Forbidden |
| RequestTorrentOfBucketError | Requesting the torrent file of a bucket is not permitted. | 400 Bad Request |
| SignatureDoesNotMatch | The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. For more information, see Authenticating REST Requests and Authenticating SOAP Requests for details. | 403 Forbidden |
| SlowDown | Please reduce your request rate. | 503 Service Unavailable |
| TemporaryRedirect | You are being redirected to the bucket while DNS updates. | 307 Moved Temporarily |
| TokenRefreshRequired | The provided token must be refreshed. | 400 Bad Request |
| TooManyBuckets | You have attempted to create more buckets than allowed. | 400 Bad Request |
| UnexpectedContent | This request does not support content. | 400 Bad Request |
| UnresolvableGrantByEmailAddress | The e-mail address you provided does not match any account on record. | 400 Bad Request |
| UserKeyMustBeSpecified | The bucket POST must contain the specified field name. If it is specified, please check the order of the fields. | 400 Bad Request |
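
A client has to decide what to do when an upload fails, in particular on SlowDown (503 Service Unavailable). The sketch below mirrors the policy used in the Perl extract earlier in this document: on 503, wait and retry the same file before any other; on other failures, retry later up to a limit. The function name `next_step`, the action labels, and the defaults of 5 attempts and 120 seconds are taken from or modeled on that extract and are not requirements of the API.

```python
def next_step(status, fail_count, max_failures=5, wait_seconds=120):
    """Decide what to do after an upload attempt.

    status:     HTTP status code of the response
    fail_count: failures recorded for this file before this attempt
    Returns (action, delay_in_seconds).
    """
    if 200 <= status < 300:
        return ("done", 0)
    if status == 503:
        # Server asks us to slow down: hold the whole queue, then retry.
        return ("retry_first", wait_seconds)
    if fail_count + 1 < max_failures:
        # Other failure: move the file to the back of the queue.
        return ("retry_later", wait_seconds)
    return ("give_up", 0)


print(next_step(503, 0))
print(next_step(500, 4))
```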
Default Metadata Values

Several metadata fields will receive default values if none is specified at
item creation time:
| Field | Default Value |
| --- | --- |
| uploader | The username of the Internet Archive patron used to create the item. |
| mediatype | data |
| collection | opensource |
| title | The identifier specified for the item. |
| addeddate | The current date and time formatted as YYYY-mm-dd hh:mm:ss |
| publicdate | The current date and time formatted as YYYY-mm-dd hh:mm:ss |

Other metadata fields will not be added to the item unless explicitly
specified.
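
To preview what an item's metadata would look like after these defaults are applied, the table can be modeled on the client side. This is purely illustrative: the defaults are applied by Internet Archive, not by the client, the function name `apply_defaults` is hypothetical, and the use of UTC for the timestamps is an assumption since the table does not name a time zone.

```python
from datetime import datetime, timezone


def apply_defaults(identifier, username, metadata, now=None):
    """Return metadata with the documented defaults filled in."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    defaults = {
        "uploader": username,
        "mediatype": "data",
        "collection": "opensource",
        "title": identifier,
        "addeddate": stamp,
        "publicdate": stamp,
    }
    # Values supplied by the uploader take precedence over defaults.
    return {**defaults, **metadata}


print(apply_defaults("sam-s3-test-08", "samuel", {"mediatype": "texts"}))
```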
Example IDENTIFIER_marc.xml file

The japanesefairytal00ozak_marc.xml file for Japanese Fairy Tales:

  00871nam a2200253 4500
  ocm15627400
  20060625113632.0
  900121s1903 nyua j 000 0 eng d

    902182803

    CLO
    CLO
    m/c
    BNY
    UtOrBLW

    engeng

    a-ja---

    J
    398
    O

    Japanese fairy tales /
    compiled by Yei Theodora Ozaki ; profusely illustrated by Japanese artists.

    New York :
    Grosset & Dunlap,
    [preface 1903]

    vii, 305 p., [1] leaf of plates :
    ill. ;
    22 cm.

    NY3

    Fairy tales
    Japan.

    Folklore
    Japan.

    Ozaki, Yei Theodora.

    j

    59521

    Donnell Library Center
    J 398 O
    checked Out
    Children's Room Stacks
    A
    DLC

    96th Street Branch
    J 398 O
    checked In
    CR Reading Room Collection
    A
    NSR

The original file may be [viewed here](http://archive.org/download/japanesefairytal00ozak/japanesefairytal00ozak_marc.xml).

Example IDENTIFIER_meta.mrc file

The japanesefairytal00ozak_meta.mrc file for Japanese Fairy Tales (line breaks added and control characters converted to ASCII representations for readability):

    00871nam  2200253   450000100130000000500170001300800410003003500140007104000320008504
    1001100117043001200128091001400140245010400154260005000258300005400308590000800362650
    002400370650002100394700002500415923000600440995001000446920008100456920008000537^^oc
    m15627400 ^^20060625113632.0^^900121s1903    nyua   j      000 0 eng d^^  ^_a90218280
    3^^  ^_aCLO^_cCLO^_dm/c^_dBNY^_dUtOrBLW^^ ^_aengeng^^  ^_aa-ja---^^  ^_pJ^_a398^_cO^^
    0^_aJapanese fairy tales /^_ccompiled by Yei Theodora Ozaki ; profusely illustrated b
    y Japanese artists.^^  ^_aNew York :^_bGrosset & Dunlap,^_c[preface 1903]^^  ^_avii,
     305 p., [1] leaf of plates :^_bill. ;^_c22 cm.^^  ^_aNY3^^ 0^_aFairy tales^_zJapan.^^
     0^_aFolklore^_zJapan.^^1 ^_aOzaki, Yei Theodora.^^  ^_aj^^  ^_a59521^^  ^_aDonnell L
    ibrary Center^_bJ 398 O^_cchecked Out^_dChildren's Room Stacks^_rA^_zDLC^^  ^_a96th S
    treet Branch^_bJ 398 O^_cchecked In^_dCR Reading Room Collection^_rA^_zNSR^^^]

The original file may be [viewed here](http://ia600308.us.archive.org/27/items/japanesefairytal00ozak/japanesefairytal00ozak_meta.mrc).
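
To make the layout of the listing above easier to follow, here is a minimal Python sketch of how a binary MARC record of this shape can be split into fields. It is an illustration only, not the code IAS3 uses. It relies on the standard MARC 21 record structure: a 24-character leader (here `00871nam  2200253   4500`, where `00253` is the base address of the data), then 12-character directory entries (tag, length, starting offset), then the field data. The `^^`, `^_` and `^]` sequences in the listing stand for the bytes 0x1E, 0x1F and 0x1D. The function names `parse_marc_record` and `build_marc_record` are hypothetical, and decoding field data as UTF-8 is an assumption (real records may use MARC-8).

```python
FIELD_TERMINATOR = b"\x1e"    # shown as ^^ in the listing
SUBFIELD_DELIMITER = b"\x1f"  # shown as ^_
RECORD_TERMINATOR = b"\x1d"   # shown as ^]


def parse_marc_record(record: bytes):
    """Split one binary MARC record into its leader and a list of fields."""
    leader = record[:24].decode("ascii")
    base = int(leader[12:17])          # base address of the field data
    directory = record[24:base - 1]    # the directory ends with a field terminator
    fields = []
    for pos in range(0, len(directory), 12):
        entry = directory[pos:pos + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # length includes the trailing field terminator, so drop one byte
        data = record[base + start:base + start + length - 1]
        if tag < "010":
            # control fields (001-009) carry no indicators or subfields
            fields.append((tag, data.decode("utf-8", "replace")))
        else:
            indicators = data[:2].decode("ascii", "replace")
            chunks = data[2:].split(SUBFIELD_DELIMITER)[1:]
            subfields = [(c[:1].decode("ascii", "replace"),
                          c[1:].decode("utf-8", "replace")) for c in chunks]
            fields.append((tag, indicators, subfields))
    return leader, fields


def build_marc_record(fields):
    """Assemble a tiny record for the demo; fields are (tag, body bytes) pairs."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1
    total = base + len(data) + 1
    leader = f"{total:05d}nam  22{base:05d}   4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


# Two fields copied from the example record above
sample = build_marc_record([
    ("001", b"ocm15627400 "),
    ("650", b" 0" + SUBFIELD_DELIMITER + b"aFairy tales"
            + SUBFIELD_DELIMITER + b"zJapan."),
])
leader, fields = parse_marc_record(sample)
for field in fields:
    print(field)
```

The two demo fields come out with directory lengths 0013 and 0024, which match the `001` and first `650` entries in the directory of the example record.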