@Nodirbek-Sharipov — Created June 27, 2022 09:43
Recent SQL experiences

1. From O(N) to O(1)

In a recent project, we had a REST API with a GET /profile route that returned profile data. The data had to be assembled from the users table as well as several related tables: universities, identities, sectors, companies, question_answers, and can_help. All of these tables reference the user via a foreign key, and initially the endpoint made a separate network request for each table's query, so execution time grew linearly, O(N), in the number of tables. After optimisation this dropped to a constant O(1) number of round trips: the queries are sent in bulk (multiple SELECT statements in a single network request) and each result set is extracted from the DB response afterwards, like so:

```js
const [ query_tables ] = await db.read.raw(`
	SELECT
		un.id AS id,
		un.name AS name
	FROM universities un
	INNER JOIN user_universities us
	ON un.id = us.university_id
	WHERE us.user_id = ?
	ORDER BY us.id DESC;

	SELECT
		i.id AS id,
		i.name AS name,
		i.emoji AS emoji,
		i.is_active AS is_active
	FROM user_identities ui
	INNER JOIN identities i
	ON i.id = ui.identity_id
	WHERE ui.user_id = ?
	ORDER BY ui.id DESC;

	SELECT
		s.id AS id,
		s.name AS name,
		s.emoji AS emoji
	FROM user_sectors us
	INNER JOIN sectors s
	ON s.id = us.sector_id
	WHERE us.user_id = ?
	ORDER BY us.id DESC;

	SELECT
		c.id AS id,
		c.company AS company,
		c.is_active AS is_active
	FROM user_companies uc
	INNER JOIN companies c
	ON c.id = uc.company_id
	WHERE uc.user_id = ?
	ORDER BY uc.id DESC;

	SELECT
		aq.id AS question_id,
		aq.question AS question,
		aq.placeholder AS placeholder,
		aq.is_required AS is_required,
		aqa.id AS answer_id,
		aqa.answer AS answer,
		aqa.hashtags AS hashtags
	FROM ama_question_answer aqa
	INNER JOIN ama_questions aq
	ON aq.id = aqa.question_id
	WHERE aqa.user_id = ?
	ORDER BY aq.is_required DESC, aq.id ASC;

	SELECT
		ch.id AS id,
		ch.name AS name,
		ch.group_title AS group_title,
		ch.created_at AS created_at
	FROM can_help_user chu
	INNER JOIN can_help ch
	ON ch.id = chu.can_help_id
	WHERE chu.user_id = ?
`, [id, id, id, id, id, id])

const [ universities, identities, sectors, companies, question_answers, can_help ] = query_tables

user['universities']		= universities
user['identities']		= identities
user['sectors']			= sectors
user['companies']		= companies
user['question_answers']	= question_answers
user['can_help']		= can_help
```

knex.js was used as the query builder.
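One caveat worth noting: sending several SELECT statements in a single raw() call only works when the underlying MySQL driver permits multi-statement queries. A minimal sketch of how the connection might be configured, assuming the mysql2 driver; host, credentials, and database names here are placeholders, not from the original project:

```js
// Hypothetical knex setup — multipleStatements must be enabled on the
// connection for the bulk multi-SELECT query above to execute.
const knex = require('knex')({
	client: 'mysql2',
	connection: {
		host: '127.0.0.1',
		user: 'app_user',          // placeholder credentials
		password: 'app_password',
		database: 'app_db',
		multipleStatements: true,  // allow several statements per raw() call
	},
})
```

Enabling this option widens the blast radius of SQL injection, so it pairs with strict use of `?` parameter bindings as shown above.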

2. Raw bulk insert with ON DUPLICATE KEY UPDATE

Another SQL query optimisation I remember involved a hashtag system, where there was a hashtags table with the following structure:

| column     | type                |
| ---------- | ------------------- |
| id         | pk                  |
| hashtag    | varchar(50), unique |
| times_used | int                 |
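A plausible DDL sketch of that table (column defaults and engine details are assumptions, not from the original):

```sql
CREATE TABLE hashtags (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    hashtag    VARCHAR(50) NOT NULL UNIQUE,
    times_used INT NOT NULL DEFAULT 1
);
```

The UNIQUE constraint on hashtag is what triggers the ON DUPLICATE KEY path on repeated insertions.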

Each time a post is tagged, all of its hashtags are extracted and inserted into this table. The catch is that the unique hashtag column raises an error on duplicate insertions, and on those ON DUPLICATE KEY conflicts the times_used counter can be incremented instead of inserting a new row. The original code issued a separate request per hashtag, making redundant network calls to the database, so I came up with the following optimisation: the hashtags are validated, deduplicated, and combined into a single bulk INSERT, like so:

```js
const data_table = answers.map(a => ({
	user_id: user.id,
	question_id: a.question_id,
	answer: a.answer.trim(),
	hashtags: a.hashtags || null,
}))

const flattened_hashtags = data_table
  .filter(x => !!x.hashtags && typeof x.hashtags === 'string')
  .map(({ hashtags }) => hashtags)
  // ['founders', 'fundraising', 'nofilter', 'tag', 'tag']

const unique_hashtags = [...new Set(flattened_hashtags)]
  // ['founders', 'fundraising', 'nofilter', 'tag']

if(unique_hashtags.length > 0){
	// Build one (?) placeholder per hashtag and pass the values as bindings,
	// so tag contents cannot break out of the SQL string
	const placeholders = unique_hashtags.map(() => '(?)').join(', ')
	// (?), (?), (?), (?)

	await db.write.raw(
		`INSERT INTO hashtags (hashtag) VALUES ${placeholders}
		 ON DUPLICATE KEY UPDATE times_used = times_used + 1`,
		unique_hashtags
	)
}
```

data_table comes from a well-validated request.body.
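The dedup-and-placeholder step above can be isolated into a pure function, which makes it easy to test without a database. buildHashtagInsert is a hypothetical helper sketched for illustration, not part of the original code; it assumes, as the filter above does, that each answer carries its hashtags as a single string:

```js
// Deduplicate hashtags and build a single parameterised bulk INSERT.
// Returns null when there is nothing to insert.
function buildHashtagInsert(hashtags) {
	const unique = [...new Set(
		hashtags.filter(h => typeof h === 'string' && h.length > 0)
	)]
	if (unique.length === 0) return null

	// One (?) placeholder per unique tag; values travel as bindings
	const placeholders = unique.map(() => '(?)').join(', ')
	return {
		sql: `INSERT INTO hashtags (hashtag) VALUES ${placeholders} ` +
			'ON DUPLICATE KEY UPDATE times_used = times_used + 1',
		bindings: unique,
	}
}
```

The caller then passes `q.sql` and `q.bindings` to `db.write.raw(...)` as in the snippet above.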
