Skip to content

Instantly share code, notes, and snippets.

@ericandrewlewis
Last active January 19, 2019 00:02
Show Gist options
  • Star 8 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save ericandrewlewis/a1b127aefbd58c210666 to your computer and use it in GitHub Desktop.
Save ericandrewlewis/a1b127aefbd58c210666 to your computer and use it in GitHub Desktop.
Towards a data model for scalable queries against rich WordPress post attribute data

Towards a data model for scalable queries against rich WordPress post attribute data

This document is in draft status, and is being made available for peer review. If you have feedback, comment on this gist or email me.

Let's say you're a web developer. A client of yours is a gemstone dealer in Manhattan. This dealer has an inventory of gemstones that they'd like to put on their website.

Gemstones have a plethora of attributes: type of gemstone (e.g. sapphire or emerald), price, weight, width, height, depth, color, shape, country of origin, treatment (if it's heat treated), who certified these details, whether the stone is a single stone or a pair.

You're offered the task to make a website to display the gemstone inventory, which will include a search interface. So a user can say "I want to see sapphires that are no heavier than 1.33 karats that are oval, between $750 and $1250, between 1.11mm and 1.20mm that weren't heat treated, because heat treatment is cheap."

Should you use WordPress for this site? Let's take a look.

WordPress offers an API to define custom content types - these are called "custom post types," a nod to WordPress' history as a blogging platform.

So, we can create a "custom post type" for Gemstones. Cool.

WordPress has a few mechanisms for storing attribute data.

We can store data against the gemstone post object, i.e. any of the columns in the WordPress post table which translate into a WP_Post object. This could be post_title, post_content, post_excerpt, post_date. None of the attributes for a gemstone fit nicely into the conceptual utility of these fields, so that's a no-go.

WordPress has two methods of storing and querying secondary attributes: taxonomies and post meta.

Taxonomies are intended for grouping similar data. Some of our attributes fall into this boat - gemstone type, color, shape, country of origin, certifier, treatment, and single vs. pair. Each of these no more than 15 possible values, so I guess we could make a taxonomy for each of these, and create taxonomy terms like "orange" for color, "pear" for shape, etc.

Post meta is intended for unique data, which the rest of our attributes are. Weight, price, width, height, and depth. Here we see finite float values like 1.32 and 1275.

Thing is, WordPress' database structure isn't built for this. The WordPress post meta and taxonomy tables are designed for generic-use as "entity-attribute-value" tables, and that's sort of an anti-pattern. Querying against both post meta and taxonomy in complex ways as we're describing will not work because of this. And other nuances: the post meta table's meta_value column can't be indexed because WordPress casts the meta_value column when querying, and MySQL doesn't support function-based indexes.

So, we can't use WordPress' internal attribute APIs. Should we even use WordPress at all for this job?

Sure, why not. We get a lot by using a custom post type: routing, users can edit base data (title, content, excerpt), revision history, etc. It's attribute storage and querying we need to roll our own.

We can store the data we need to query against in a separate data table, and use a few WordPress filter hooks to modify the database querying.

This attribute data table will employ a strict relational design. Just because WordPress is designed using entity-attribute-value design doesn't mean it's good or that there isn't anything else out there. We'll put some thought into the type definition for each column so SQL queries can use indexes and scale like crazy.

/**
 * Update the gemstone table.
 */
function update_database_gemstone_data_table() {
	global $wpdb;
	$db_version = 3;
	$installed_version = get_option( 'site_db_version', 0 );
	if ( $installed_version == $db_version ) {
		return;
	}
	$table_name = $wpdb->prefix . 'gemstone_attribute_data';

	$charset_collate = $wpdb->get_charset_collate();
	$sql = "CREATE TABLE $table_name (
		post_id bigint(20),
		weight float(11,4),
		color int(11),
		shape int(11),
		origin int(11),
		treatment int(11),
		price int(11),
		width float(11,4),
		height float(11,4),
		certifying_lab int(11),
		single_or_pair int(11),
		PRIMARY KEY (post_id)
	) $charset_collate;";

	require_once( ABSPATH . 'wp-admin/includes/upgrade.php' );
	dbDelta( $sql );

	update_option( 'site_db_version', $db_version );
}
add_action( 'init', 'update_database_gemstone_data_table' );

You'll notice I made int columns for taxonomy-leaning attributes (e.g. color). green is stored in the database as 0, orange as 1. This is application of our strict relational database design. If we want to define more data for these fields - like a human-readable title, a slug, etc., we could make a separate table for color attributes, etc., or perhaps employ those as custom post types in themselves.

WP_Query is the muscle behind WordPress' SQL-querying. It takes as an input query vars like "category_name" or "author_name" from a front-end page's interpreted URL, constructs the proper MySQL, and gives you back posts with a looping interface. If we want to query against this data in a WordPress-y way, let's register our attributes as query vars.

function gemstone_query_vars() {
	return array( 'gemstone_type', 'gemstone_min_price', 'gemstone_max_price', 'gemstone_weight', 
		'gemstone_color', 'gemstone_shape', 'gemstone_origin', 'gemstone_treatment',
		'gemstone_min_width', 'gemstone_max_width', 'gemstone_min_height', 'gemstone_max_height',
		'gemstone_certifying_lab', 'gemstone_single_or_pair' );
}

function add_gemstone_query_vars( $query_vars ) {
	$gemstone_query_vars = gemstone_query_vars();
	foreach ( $gemstone_query_vars as $attribute ) {
		$query_vars[] = $attribute;
	}
	return $query_vars;
}
add_filter( 'query_vars', array( $this, 'add_gemstone_query_vars' ) );

Registering the attributes as query vars allows us to opt-in and out of modifying WP_Query's lower-level where and join clauses for the SQL it's going to run when querying posts.

function get_gemstone_query_vars_from_wp_query( $query ) {
	$gemstone_query_vars = gemstone_query_vars();
	$gemstone_query_vars_found = array();
	foreach ( $gemstone_query_vars as $gemstone_query_var ) {
		$query_var_value = $query->get( $gemstone_query_var );
		if ( ! empty( $query_var_value ) ) {
			$gemstone_query_vars_found[$gemstone_query_var] = $query_var_value;
		}
	}
	return $gemstone_query_vars_found;
}

/**
 * Join the WordPress post table to the gemstone attributes table.
 */
function posts_join_additions_for_gemstones( $join, $query ) {
		$gemstone_query_vars = get_gemstone_query_vars_from_wp_query();
		// Bail if the query doesn't have any gemstone-specific query vars.
		if ( count( $gemstone_query_vars ) == 0 ) {
			return $join;
		}
		$gemstone_attrs_table = $wpdb->prefix . 'gemstone_attribute_data';
		$join .= sprintf( " INNER JOIN $table_name ON %s.ID = $table_name.post_id", $this->wpdb->posts );
		return $join;
	}
add_filter( 'posts_join', array( $this, 'posts_where_additions_for_gemstones' ), 10, 2 );

/**
 * Add a where clause for each gemstone attribute found in the query variables.
 */
function posts_where_additions_for_gemstones( $where, $query ) {
	global $wpdb;
	
	$gemstone_query_vars = get_gemstone_query_vars_from_wp_query();
	// Bail if the query doesn't have any gemstone-specific query vars.
	if ( count( $gemstone_query_vars ) == 0 ) {
		return $join;
	}
	$gemstone_attrs_table = $wpdb->prefix . 'gemstone_attribute_data';
	foreach ( $gemstone_query_vars as $var_name => $value ) {
		$where .= sprintf( " AND %s.%s=%d", $gemstone_attrs_table, $var_name, $value );
	}
	return $where;
}
add_filter( 'posts_where', 'posts_where_additions_for_gemstones', 10, 2 );

There's of course further to go here. You might want to offer your users UI to edit this attribute data. You might want to support multi-select search (i.e. either orange or fuschia gemstones). You might want some actual validation of these query vars before opening up your site to MySQL injection. I've left it sparse be a proof of concept rather than multi-purpose boilerplate.

@JoeCrash
Copy link

Are you stalking me? JK. I'm a web developer with a gemstone dealer client in Manhattan. The only difference is that we want to upload our clients inventory, or allow them an easy way to do so from their own website/system. I will see if I can build on this and of course share the work that comes from it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment