Skip to content

Instantly share code, notes, and snippets.

@SirNeuman
Last active February 28, 2017 04:37
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save SirNeuman/c3f8c3f6d8f38dfd81fde59862761b8a to your computer and use it in GitHub Desktop.
Save SirNeuman/c3f8c3f6d8f38dfd81fde59862761b8a to your computer and use it in GitHub Desktop.
Bastille Agency Test

All of the following code is written in Python.

Problem 1:

Data is just a dictionary, where the keys are user tokens. The value for each key is another dictionary, where the keys are the URLs. I could use a set instead of a dictionary for the URLs, but finding a matching value in a dictionary is O(1) as opposed to a set, which is O(n). This also would allow us in the future to add more data to each URL if we wanted to. E.G. data = {'john': {'google.com': {'potential_future_bool': True}, 'cnn.com': {}}, 'aaron': {'reddit.com': {}}}

Of course in a real world application you could not store all of this data in memory, because it's not scalable or persistent. Every time you'd restart your server, which would have to happen if you wanted to make any code changes, or if your server crashed, you would lose all of your data. As your data would grow the memory on the server would also get eaten up, and your server would likely crash. You could write the data to a file object on the server but I chose not to do that here since we aren't utilizing a realistic way to store data anyway, and I wanted to make the code as clear as possible.

data = {}

# Save the URL to the user in the data. If the UserToken is not already in data,
# then add it to the data.
def save_url(user_token, url):
	saved = False

	if user_token not in data:
		data[user_token] = {}

	if url not in data[user_token]:
		data[user_token][url] = {}
		saved = True

	return saved

# This is one place where the efficiency is actually slower by having the urls stored as
# keys in a dict rather than a flat list or set, but the performance in most other
# functions should outweigh this. O(n) to do dict.keys() as opposed to O(1) for just returning the list. 
# However we could just return the dictionary as is without converting it to a list first, if that was okay
# with whatever services are using this function.
def get_urls(user_token):
	# Return None for case where the user token is not in data.
	val = None

	if user_token in data:
		val = data[user_token].keys()

	return val

# Remove the url from the user token in the data. 
def remove_url(user_token, url):
	deleted = False

	# Since python looks at 'and' statemenets from left to right this will not cause a KeyError,
	# since it will exit the statement if the first half of the statement is not True.
	if user_token in data and url in data[user_token]:
		del data[user_token][url]
		deleted = True

	return deleted

Problem 2:

One potentially better way to have achieved this would have been to save the domains to each user in the data. E.G. data = {'john': {'http://xyz.com/hello': {'domain': 'xyz.com'}}}. If we had done this then we would only need to call the extract domain function on save rather than on every url every time we wanted to call this function. Another option would be to have it set up as {'john': {'xyz.com': {'http://xyz.com/hello': {}, 'http://xyz.com/goodbye': {}}}}, which would improve efficiency from O(num_users * num_urls_per_user) to just O(num_users). But because I'm not sure I can go back and have access to the extract_domain function in the previous problem I will continue with the less optimal method. I would have to go back and change the logic of the functions in the previous step to work with a different data structure. I'm also unsure of how the app will be used in general as it scales and what features would need to be added in the future, so it may not be optimal in the general performance of the app to set it with {'user_token': {'domain': {'url_1': {}, 'url_2': {}}}}.

def get_users_by_domain(url):
	domain = extract_domain(url)
	# iterate through all of the users
	users = []
	for user, urls in data.items():
		for url in urls.keys():
			if domain == extract_domain(url):
				users.append(user)
				# Break out of the loop of urls since we already found it and do not need to check any more urls
				break
	return users

Problem 3:

# Return list of urls based on the recommended url passed in for the specific user
def get_recommended_urls(user_token, url):
	recommendations = []
	url_node = get_node(url)
	user_urls = []
	if user_token in data:
		user_urls = data[user_token]
	traverse_nodes(0, url_node, recommendations, user_urls)
	return recommendations

def traverse_nodes(num_hops, url_node, recommendations, urls_to_ignore, hop_limit=3):
	num_hops += 1

	# This is set to 3 by default, but we could change it to something else if we override the default param
	if num_hops == hop_limit:
		# Case where we have reached the 3rd hop
		# Get a, b, and c nodes of current node
		nodes_list = get_a_b_c_nodes(url_node)
		for node in nodes_list:
			# Check to see if the value is not None
			if node:
				url_to_add = node.get_url()
				# Only add it to the list of recommendations if it is not in the users list of urls arleady
				if url_to_add not in urls_to_ignore:
					recommendations.append(url_to_add)
	else:
		nodes_list = get_a_b_c_nodes(url_node)
		for node in nodes_list:
			if node:
				# Recursive call to function, with child nodes of current node.
				traverse_nodes(num_hops, node, recommendations, urls_to_ignore)

# Return a list of 3 objects everytime. Returns None in that item of the list if there is no node.
def get_a_b_c_nodes(url_node):
	nodes = []
	nodes.append(url_node.A())
	nodes.append(url_node.B())
	nodes.append(url_node.C())
	return nodes

Tests:

Use unittest as our Python testing framework.

import unittest
import main

class TestNode(object):

	def __init__(self, url='test', node_a=None, node_b=None, node_c=None):
		self.node_a = node_a
		self.node_b = node_b
		self.node_c = node_c
		self.url = url

	def A(self):
		return self.node_a

	def B(self):
		return self.node_b

	def C(self):
		return self.node_c

	def get_url(self):
		return self.url

class Tests(unittest.TestCase):

	def setUp(self):
		# Start the data with some mock data
		main.data = {
			'jonny': {
				'a.com': {},
				'b.net': {},
				'aaa.com': {},
				'f.cool': {}
			},
			'xx': {
				'sam.com': {}
			},
			'zzz': {},
			'bob': {
				'b.com': {},
				'a.com': {},
				'sam.com': {}
			}
		}

	def tearDown(self):
		pass

	def test__save_url(self):
		# Case for adding new person and url
		saved = main.save_url('bill', 'arena.net')
		self.assertTrue(saved)
		self.assertTrue('bill' in main.data)
		self.assertTrue('arena.net' in main.data['bill'])
		# Case for trying to save url that arlready exists
		saved = main.save_url('jonny', 'b.net')
		self.assertFalse(saved)
		# Case for saving new url to token that already exists
		saved = main.save_url('jonny', 'xyz.gov')
		self.assertTrue(saved)
		self.assertTrue('xyz.gov' in main.data['jonny'])

	def test__get_urls(self):
		# Case for getting valid user with 3 saved urls
		urls = main.get_urls('bob')
		self.assertEqual(len(urls), 3)
		self.assertTrue('sam.com' in urls)
		self.assertTrue('b.com' in urls)
		self.assertTrue('a.com' in urls)
		# Case for getting valid user with no saved urls
		urls = main.get_urls('zzz')
		self.assertEqual(urls, [])
		# Case for trying to get invalid user
		urls = main.get_urls('notindata')
		self.assertIsNone(urls)

	def test__remove_url(self):
		# Case for valid removal of urls
		saved = main.remove_url('jonny', 'aaa.com')
		self.assertTrue(saved)
		self.assertTrue('aaa.com' not in main.data['jonny'])
		saved = main.remove_url('jonny', 'asdf.com')
		self.assertFalse(saved)

	def test__get_users_by_domain(self):
		users = main.get_users_by_domain('a.com')
		self.assertEqual(len(users), 2)
		self.assertTrue('jonny' in users)
		self.assertTrue('bob' in users)

	def test_traverse_nodes(self):
		A_A_A = TestNode('blah.com')
		A_A_B = TestNode('arg.com')
		A_A = TestNode('a.com', node_a=A_A_A, node_b=A_A_B)
		A_B_C = TestNode('b.com')
		A_B = TestNode('c.com', node_c=A_B_C)
		A_C_A = TestNode('d.com')
		A_C_B = TestNode('e.com')
		A_C_C = TestNode('f.com')
		A_C = TestNode('g.com', A_C_A, A_C_B, A_C_C)
		A = TestNode('h.com', A_A, A_B, A_C)
		B_A = TestNode('i.com')
		B_B_A = TestNode('j.com')
		B_B_C = TestNode('k.com')
		B_B = TestNode('l.com', node_a=B_B_A, node_c=B_B_C)
		B = TestNode('m.com', B_A, B_B)
		C = TestNode('n.com')
		head_node = TestNode('google.com', A, B, C)
		recommendations = []
		main.traverse_nodes(0, head_node, recommendations, {'arg.com': {}, 'e.com': {}, 'notinurls.com': {}})
		urls_that_should_be_returned = ['blah.com', 'b.com', 'd.com', 'f.com', 'j.com', 'k.com']
		for url in urls_that_should_be_returned:
			self.assertTrue(url in recommendations)
@karnie6
Copy link

karnie6 commented Feb 28, 2017

@SirNeuman - thanks so much for taking the time in completing our assessment. I really enjoyed your work and your thoughtful + very detailed explanation of your approach! Here are my comments:

  • Your solution to #1 looks good. To your point, we could debate on what the right datastructure is - I personally would have gone with something simpler (userToken mapping to a list of URLs), since there are no current requirements to optimize for adding extra data to each URL or finding a specific URL. However, your approach is totally legitimate and I enjoyed your explanation of the different approaches you thought through.
  • Your solution to #2 would certainly work, however I think you were on the right train of thought of either adding domains to be extra data to the URL, or creating a brand new datastructure that perhaps maps domains -> a set of users and updating saveUrl and removeUrl to add/remove from that new datastructure. Otherwise, your existing solution to #2 isn't optimal from a run-time perspective since you're essentially grepping the whole datastructure and calling extractDomain on every URL (to your point)
  • Your solution to #3 looked great!

Overall, I thought you did an excellent job and really enjoyed your reasonings on all 3 problems as well as the tests! Please email me directly (karthik@bastilleagency.com) if you have any questions/comments (as I don't get notifications on Gists).

Looking forward to working together, Karthik

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment