@nking
Created September 1, 2011 03:47
appengine custom transforms
Notes on exporting and importing data to appengine.
Schema refactoring or complex relationships between entities may
require custom transforms to be written and added to the config yaml
file that the bulkloader generates automatically.
A few helpful resources on writing them:
http://wereword-gae.googlecode.com/hg/backup/helpers.py
http://bulkloadersample.appspot.com/
http://longsystemit.com/javablog/?p=23
and Google I/O 2011:
http://bulkloadersample.appspot.com/showfile/bulkloader-presentation.pdf
and, for pointers on using the bulkloader's default config.yaml:
http://ikaisays.com/2010/06/10/using-the-bulkloader-with-java-app-engine/
(1) Collections of items that are stored in the entity (implicitly, as embedded classes),
such as a list of strings, need custom transforms:
In the config yaml file:
- property: parameterNames
  external_name: parameterNames
  import_transform: transform.split_string('|')
  export_transform: additionaltransformers.export_parameternames_to_string
And in the imported additionaltransformers.py:
def export_parameternames_to_string(value, bulkload_state):
    parameters = bulkload_state.current_instance['parameterNames']
    if parameters is None:
        return ''
    return "|".join(["%s" % (k) for k in parameters])
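The round trip can be checked outside App Engine with a minimal sketch. Both helpers below are hypothetical stand-ins and not part of the bulkloader API: `join_parameter_names` mirrors the export transform above, and `split_parameter_names` is an assumed equivalent of what the import side does with the '|' delimiter.

```python
def join_parameter_names(parameters, delimiter="|"):
    """Flatten a list property into one delimited string (export side)."""
    if parameters is None:
        return ""
    return delimiter.join("%s" % (name,) for name in parameters)


def split_parameter_names(value, delimiter="|"):
    """Rebuild the list from the delimited string (import side)."""
    if not value:
        return []
    return value.split(delimiter)


names = ["alpha", "beta", "gamma"]
exported = join_parameter_names(names)
restored = split_parameter_names(exported)
```

In this sketch a missing list and an empty list both export as an empty string, and a name that itself contains '|' would not survive the round trip, so the delimiter should be one that cannot occur in the data.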
(2) A change of ancestor keys to refactor membership in entity groups or
for migration may need the key components separated on export and
reconstructed differently on an import.
Note that the changes must still respect appengine's Bigtable
rules for entity groups and ancestors.
In this case, I've added a new column to the export, to be used during a
subsequent import. The new column is parentKey, and it is generated by the
export transform named in the config below.
In the config yaml file:
- kind: Event
  connector: csv
  connector_options:
    encoding: utf-8
    columns: from_header
    import_options:
      dialect: excel-tab
    export_options:
      dialect: excel-tab
  property_map:
    - property: __key__
      external_name: key
      export:
        - external_name: parentKey
          export_transform: additionaltransformers.create_pseduo_parent_event_key_string
        - external_name: key
          export_transform: transform.key_id_or_name_as_string_n(0)
      import_transform: additionaltransformers.create_event_key
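To make the file layout concrete, here is a small sketch of what a tab-separated file with those two columns might look like, using Python's csv module and its built-in excel-tab dialect. The row values are invented for illustration, and the sketch only mimics the file format; it does not call the bulkloader.

```python
import csv
import io

# Invented sample rows; a real export would also carry the entity's other columns.
rows = [
    {"parentKey": "conference-2011", "key": "keynote"},
    {"parentKey": "conference-2011", "key": "workshop"},
]

# Write a header row followed by tab-separated values.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["parentKey", "key"], dialect="excel-tab")
writer.writeheader()
writer.writerows(rows)

# Read it back: the header row supplies the column names, which is
# the role "columns: from_header" is understood to play in the config.
buffer.seek(0)
parsed = list(csv.DictReader(buffer, dialect="excel-tab"))
```

Each parsed row is a dictionary keyed by column name, which is the same shape the import transform reads from `bulkload_state.current_dictionary`.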
And in the imported additionaltransformers.py:
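from google.appengine.api import datastore

def create_event_key(value, bulkload_state):
    # Parent and child names come from the columns of the row being imported.
    pn = bulkload_state.current_dictionary['parentKey']
    kn = bulkload_state.current_dictionary['key']
    # Flat path: the parent Event first, then the child Event.
    kpath = ['Event', pn] + ['Event', kn]
    return datastore.Key.from_path(*kpath)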
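The path handed to `Key.from_path` is a flat list of alternating kind and name (or id) values, ancestors first. A minimal sketch of that construction, runnable without the App Engine SDK, is below. `build_event_key_path` is a hypothetical helper, the fallback for a row without a parent is an assumption that the original transform does not make, and the final `from_path` call is left as a comment.

```python
def build_event_key_path(parent_name, key_name, kind="Event"):
    """Return the flat [kind, name, kind, name] path for a child Event."""
    if not parent_name:
        # Assumption: an empty parentKey column means a root entity.
        return [kind, key_name]
    return [kind, parent_name, kind, key_name]


# Invented sample row with the two columns used by the import transform.
row = {"parentKey": "conference-2011", "key": "keynote"}
kpath = build_event_key_path(row["parentKey"], row["key"])
# With the SDK available this would become:
#   datastore.Key.from_path(*kpath)
```

Values read from the file are strings, so they become key names rather than numeric ids; a column holding numeric ids would need converting to integers before the path is built.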
(3) Exploring the creation of DAGs? Because we can create keys with common
ancestors (and hence co-membership in an entity group), and because we can create
the one-to-many IDX fields in the "many" entity, we should be able to create
entities that belong to more than one collection of another entity in our tsv files
and then in the datastore via upload.
DAG relationships are already possible as unowned relationships in appengine,
that is, as collections of keys.
But if the convenient auto-fetching within a transaction can be used
to make the full entities available from a fetch (forcing fetches via touches
within the transaction to get as much of the DAG as needed), this might be useful.
The caveat is that even if the import of a DAG with "owned" relationships
succeeds, there may be trouble during transactional updates. I haven't
tried this.