Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save danieliser/0bb0f78a0c032419bf0c to your computer and use it in GitHub Desktop.
Save danieliser/0bb0f78a0c032419bf0c to your computer and use it in GitHub Desktop.

Towards a data model for scalable queries against rich WordPress post attribute data

This document is in draft status, and is being made available for peer review. If you have feedback, feel free to comment directly on this gist or email me directly.

Say you have an object with a complex, idiosyncratic attribute structure. Let's say you're a web developer, and a client of yours is a gemstone dealer in Manhattan. They have an inventory of gemestones that needs to be tracked.

Gemstones have a rich plethora of attributes. Gemstone type (e.g. sapphire or emerald), price, weight, width, height, depth, color, shape, country of origin, treatment (if it's heat treated), who certified these details, whether the stone is a single stone or a pair.

You've been given the job to create this gemstone dealer's website. Oh, and this is the kicker: the client wants to offer users a search interface, so a user can say "I want to see sapphires that are no heavier than 1.33 karats that are oval, between $750 and $1250, between 1.11mm and 1.20mm that weren't heat treated, because heat treatment is tawdry." Should you use WordPress for this site?

Let's take a look.

WordPress comes with a public API to define customized content types - these are called "custom post types," a nod to WordPress' history as a blogging platform.

So, we can create a "custom post type" for Gemstones. Cool.

WordPress has a few mechanisms for storing attribute data.

We can store data against the gemstone post object, i.e. any of the columns in the WordPress post table which translate into a WP_Post object. This could be post_title, post_content, post_excerpt, post_date. None of the attributes for a gemstone fit nicely into the conceptual utility of these fields, so that's a no-go.

WordPress has two methods of storing and querying secondary attributes: taxonomies and post meta.

Taxonomies are intended for grouping similar data. Some of our attributes fall into this boat - gemstone type, color, shape, country of origin, certifier, treatment, and single vs. pair. Each of these no more than 15 possible values, so I guess we could make a taxonomy for each of these, and create taxonomy terms like "orange" for color, "pear" for shape, etc.

Post meta is intended for unique data, which the rest of our attributes are. Weight, price, width, height, and depth. Here we see finite float values like 1.32 and 1275.

Thing is, WordPress' database structure isn't built for this. The WordPress post meta and taxonomy tables are designed for generic-use as "entity-attribute-value" tables, and that's sort of an anti-pattern. Querying against both post meta and taxonomy in complex ways as we're describing will not work because of this. And other nuances: the post meta table's meta_value column can't be indexed because WordPress casts the meta_value column when querying, and MySQL doesn't support function-based indexes.

So, we can't use WordPress' internal attribute APIs. Should we even use WordPress at all for this job?

Sure, why not. We get a lot of creating a custom post type for our data structure: application routing, user editability of base data attributes(title, content, excerpt), revision history, etc. It's just attribute storage and querying we need to roll our own.

We can store the data we need to query against in a separate data table, and use a few WordPress filter hooks to modify the database querying.

This attribute data table will employ a strict relational design. Just because WordPress is designed using entity-attribute-value design doesn't mean it's good or that there isn't anything else out there. We'll put some thought into the type definition for each column so SQL queries can use indexes and scale like crazy.

/**
 * Update the gemstone table.
 */
function update_database_gemstone_data_table() {
	global $wpdb;
	$db_version = 3;
	$installed_version = get_option( 'site_db_version', 0 );
	if ( $installed_version == $db_version ) {
		return;
	}
	$table_name = $wpdb->prefix . 'gemstone_attribute_data';

	$charset_collate = $wpdb->get_charset_collate();
	$sql = "CREATE TABLE $table_name (
		post_id bigint(20),
		weight float(11,4),
		color int(11),
		shape int(11),
		origin int(11),
		treatment int(11),
		price int(11),
		width float(11,4),
		height float(11,4),
		certifying_lab int(11),
		single_or_pair int(11),
		PRIMARY KEY (post_id)
	) $charset_collate;";

	require_once( ABSPATH . 'wp-admin/includes/upgrade.php' );
	dbDelta( $sql );

	update_option( 'site_db_version', $db_version );
}
add_action( 'init', 'update_database_gemstone_data_table' );

You'll notice I made int columns for taxonomy-leaning attributes (e.g. color). green is stored in the database as 0, orange as 1. This is application of our strict relational database design. If we want to define more data for these fields - like a human-readable title, a slug, etc., we could make a separate table for color attributes, etc., or perhaps employ those as custom post types in themselves.

WP_Query is the muscle behind WordPress' SQL-querying. It takes as an input query vars like "category_name" or "author_name" from a front-end page's interpreted URL, constructs the proper MySQL, and gives you back posts with a looping interface. If we want to query against this data in a WordPress-y way, let's register our attributes as query vars.

function gemstone_query_vars() {
	return array( 'gemstone_type', 'gemstone_min_price', 'gemstone_max_price', 'gemstone_weight', 
		'gemstone_color', 'gemstone_shape', 'gemstone_origin', 'gemstone_treatment',
		'gemstone_min_width', 'gemstone_max_width', 'gemstone_min_height', 'gemstone_max_height',
		'gemstone_certifying_lab', 'gemstone_single_or_pair' );
}

function add_gemstone_query_vars( $query_vars ) {
	$gemstone_query_vars = gemstone_query_vars();
	foreach ( $gemstone_query_vars as $attribute ) {
		$query_vars[] = $attribute;
	}
	return $query_vars;
}
add_filter( 'query_vars', array( $this, 'add_gemstone_query_vars' ) );

Registering the attributes as query vars allows us to opt-in and out of modifying WP_Query's lower-level where and join clauses for the SQL it's going to run when querying posts.

function get_gemstone_query_vars_from_wp_query( $query ) {
	$gemstone_query_vars = gemstone_query_vars();
	$gemstone_query_vars_found = array();
	foreach ( $gemstone_query_vars as $gemstone_query_var ) {
		$query_var_value = $query->get( $gemstone_query_var );
		if ( ! empty( $query_var_value ) ) {
			$gemstone_query_vars_found[$gemstone_query_var] = $query_var_value;
		}
	}
	return $gemstone_query_vars_found;
}

/**
 * Join the WordPress post table to the gemstone attributes table.
 */
function posts_join_additions_for_gemstones( $join, $query ) {
		$gemstone_query_vars = get_gemstone_query_vars_from_wp_query();
		// Bail if the query doesn't have any gemstone-specific query vars.
		if ( count( $gemstone_query_vars ) == 0 ) {
			return $join;
		}
		$gemstone_attrs_table = $wpdb->prefix . 'gemstone_attribute_data';
		$join .= sprintf( " INNER JOIN $table_name ON %s.ID = $table_name.post_id", $this->wpdb->posts );
		return $join;
	}
add_filter( 'posts_join', array( $this, 'posts_where_additions_for_gemstones' ), 10, 2 );

/**
 * Add a where clause for each gemstone attribute found in the query variables.
 */
function posts_where_additions_for_gemstones( $where, $query ) {
	global $wpdb;
	
	$gemstone_query_vars = get_gemstone_query_vars_from_wp_query();
	// Bail if the query doesn't have any gemstone-specific query vars.
	if ( count( $gemstone_query_vars ) == 0 ) {
		return $join;
	}
	$gemstone_attrs_table = $wpdb->prefix . 'gemstone_attribute_data';
	foreach ( $gemstone_query_vars as $var_name => $value ) {
		$where .= sprintf( " AND %s.%s=%d", $gemstone_attrs_table, $var_name, $value );
	}
	return $where;
}
add_filter( 'posts_where', 'posts_where_additions_for_gemstones', 10, 2 );

There's of course further to go here. You might want to offer your users UI to edit this attribute data. You might want to support multi-select search (i.e. either orange or fuschia gemstones). You might want some actual validation of these query vars before opening up your site to MySQL injection. I've left it sparse be a proof of concept rather than multi-purpose boilerplate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment