Skip to content

Instantly share code, notes, and snippets.

@bjeanes
Created June 13, 2016 12:27
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bjeanes/808926b096b38e30f026bff40e8cd355 to your computer and use it in GitHub Desktop.
Save bjeanes/808926b096b38e30f026bff40e8cd355 to your computer and use it in GitHub Desktop.
Incomplete script to generate schema hints for 18F's rdbms-subsetter tool. However, when running their tool, I kept getting Python exceptions that implied the dependent libraries couldn't deal with constraints in our DB schema, so I tabled it for now.
#!/usr/bin/env ruby
###############################################################################
#
# Uses https://github.com/18F/rdbms-subsetter to pull a subset of data from the
# production database. Because Rails databases rarely define foreign keys, this
# inspects the models for in-Ruby declared associations and generates a config
# file for `rdbms-subsetter` to use for understanding the schema.
#
###############################################################################
abort "Only use this in development mode" if ENV.fetch('RAILS_ENV', 'development') != 'development'
abort "`rdbms-subsetter` required; install with `pip install rdbms-subsetter`" unless system("which rdbms-subsetter >/dev/null")
require_relative "../config/environment"
# Hash of table names to column hashes
#
SCHEMA = Hash.new do |tables, table|
# Each column hash is of a column name (foreign key) to a target table+column
# names tuple.
tables[table] = {}
end
# Iterate over all known models configuration and define association column
# relationships based on it.
Rails.application.eager_load! # make sure all models are loaded
ActiveRecord::Base.descendants.each do |model_class|
model_class.reflections.each do |name, reflection|
next unless ActiveRecord::Reflection::AssociationReflection === reflection
next unless reflection.macro == :belongs_to
next if reflection.options[:polymorphic]
target = [reflection.table_name, reflection.association_primary_key]
SCHEMA[model_class.table_name][reflection.foreign_key] = target
end
end
subsetter_config = {}
subsetter_config[:constraints] = Hash[SCHEMA.map do |table, columns|
[table, columns.flat_map do |col, target|
{
referred_schema: nil,
referred_table: target[0],
referred_columns: [target[1]],
constrained_columns: [col]
}
end]
end]
config_file = Rails.root.join("config/rdbms-subsetter.json")
File.open(config_file, "w") do |f|
f.write subsetter_config.to_json
end
source = `heroku config:get DATABASE_URL`.chomp
destination = "postgresql:///prod"
exec(["rdbms-subsetter", "-l", "--config", config_file, source, destination, "0.4"])
@bjeanes
Copy link
Author

bjeanes commented Jun 14, 2016

Exception I'm getting when rdbms-subsetter actually runs:

/usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:2503: SAWarning: Did not recognize type 'citext' of column 'email'
  (attype, name))
/usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:2773: SAWarning: Predicate of partial index review_studies_on_review_id_and_category_name ignored during reflection
  % idx_name)
/usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:2766: SAWarning: Skipped unsupported reflection of expression-based index index_references_on_md5_title
  % idx_name)
/usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:2766: SAWarning: Skipped unsupported reflection of expression-based index reference_search_attrs
  % idx_name)
Traceback (most recent call last):
  File "/usr/local/bin/rdbms-subsetter", line 11, in <module>
    sys.exit(generate())
  File "/usr/local/lib/python2.7/site-packages/subsetter.py", line 548, in generate
    source = Db(args.source, args, schemas)
  File "/usr/local/lib/python2.7/site-packages/subsetter.py", line 224, in __init__
    self.fix_postgres_array_of_enum(tbl)
  File "/usr/local/lib/python2.7/site-packages/subsetter.py", line 257, in fix_postgres_array_of_enum
    col_str = str(col.type)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/type_api.py", line 481, in __str__
    return unicode(self.compile()).\
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/type_api.py", line 468, in compile
    return dialect.type_compiler.process(self)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 261, in process
    return type_._compiler_dispatch(self, **kw)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 81, in _compiler_dispatch
    return meth(self, **kw)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 2606, in visit_null
    "type on this Column?" % type_)
sqlalchemy.exc.CompileError: Can't generate DDL for NullType(); did you forget to specify a type on this Column?

I haven't looked into what table/index/column is tripping it up yet, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment