SirNeuman/BastilleAgencyTest.md

## BastilleAgencyTest.md

      
    Raw
  

              BastilleAgencyTest.md
            
          
    All of the following code is written in Python.
Problem 1:

Data is just a dictionary, where the keys are user tokens.
The value for each key is another dictionary, where the keys are the URLs.
I could use a set instead of a dictionary for the URLs, but finding a matching
value in a dictionary is O(1) as opposed to a set, which is O(n). This also
would allow us in the future to add more data to each URL if we wanted to.
E.G. data = {'john': {'google.com': {'potential_future_bool': True}, 'cnn.com': {}}, 'aaron': {'reddit.com': {}}}
Of course in a real world application you could not store all of this data in memory,
because it's not scalable or persistent. Every time you'd restart your server, which would have
to happen if you wanted to make any code changes, or if your server crashed, you would lose all of your
data. As your data would grow the memory on the server would also get eaten up, and your server would likely
crash. You could write the data to a file object on the server but I chose not to do that here since
we aren't utilizing a realistic way to store data anyway, and I wanted to make the code as clear as possible.
data = {}

# Save the URL to the user in the data. If the UserToken is not already in data,
# then add it to the data.
def save_url(user_token, url):
	saved = False

	if user_token not in data:
		data[user_token] = {}

	if url not in data[user_token]:
		data[user_token][url] = {}
		saved = True

	return saved

# This is one place where the efficiency is actually slower by having the urls stored as
# keys in a dict rather than a flat list or set, but the performance in most other
# functions should outweigh this. O(n) to do dict.keys() as opposed to O(1) for just returning the list. 
# However we could just return the dictionary as is without converting it to a list first, if that was okay
# with whatever services are using this function.
def get_urls(user_token):
	# Return None for case where the user token is not in data.
	val = None

	if user_token in data:
		val = data[user_token].keys()

	return val

# Remove the url from the user token in the data. 
def remove_url(user_token, url):
	deleted = False

	# Since python looks at 'and' statemenets from left to right this will not cause a KeyError,
	# since it will exit the statement if the first half of the statement is not True.
	if user_token in data and url in data[user_token]:
		del data[user_token][url]
		deleted = True

	return deleted

Problem 2:

One potentially better way to have achieved this would have been to save the domains
to each user in the data. E.G. data = {'john': {'http://xyz.com/hello': {'domain': 'xyz.com'}}}.
If we had done this then we would only need to call the extract domain function on save rather
than on every url every time we wanted to call this function. Another option would be to have it set
up as {'john': {'xyz.com': {'http://xyz.com/hello': {}, 'http://xyz.com/goodbye': {}}}}, which would improve efficiency
from O(num_users * num_urls_per_user) to just O(num_users). But because I'm not sure I can go back
and have access to the extract_domain function in the previous problem I will continue with the less optimal method. I would
have to go back and change the logic of the functions in the previous step to work with a different data structure.
I'm also unsure of how the app will be used in general as it scales and what features would need to be added in the future,
so it may not be optimal in the general performance of the app to set it with {'user_token': {'domain': {'url_1': {}, 'url_2': {}}}}.
def get_users_by_domain(url):
	domain = extract_domain(url)
	# iterate through all of the users
	users = []
	for user, urls in data.items():
		for url in urls.keys():
			if domain == extract_domain(url):
				users.append(user)
				# Break out of the loop of urls since we already found it and do not need to check any more urls
				break
	return users

Problem 3:

# Return list of urls based on the recommended url passed in for the specific user
def get_recommended_urls(user_token, url):
	recommendations = []
	url_node = get_node(url)
	user_urls = []
	if user_token in data:
		user_urls = data[user_token]
	traverse_nodes(0, url_node, recommendations, user_urls)
	return recommendations

def traverse_nodes(num_hops, url_node, recommendations, urls_to_ignore, hop_limit=3):
	num_hops += 1

	# This is set to 3 by default, but we could change it to something else if we override the default param
	if num_hops == hop_limit:
		# Case where we have reached the 3rd hop
		# Get a, b, and c nodes of current node
		nodes_list = get_a_b_c_nodes(url_node)
		for node in nodes_list:
			# Check to see if the value is not None
			if node:
				url_to_add = node.get_url()
				# Only add it to the list of recommendations if it is not in the users list of urls arleady
				if url_to_add not in urls_to_ignore:
					recommendations.append(url_to_add)
	else:
		nodes_list = get_a_b_c_nodes(url_node)
		for node in nodes_list:
			if node:
				# Recursive call to function, with child nodes of current node.
				traverse_nodes(num_hops, node, recommendations, urls_to_ignore)

# Return a list of 3 objects everytime. Returns None in that item of the list if there is no node.
def get_a_b_c_nodes(url_node):
	nodes = []
	nodes.append(url_node.A())
	nodes.append(url_node.B())
	nodes.append(url_node.C())
	return nodes

Tests:

Use unittest as our Python testing framework.
import unittest
import main

class TestNode(object):

	def __init__(self, url='test', node_a=None, node_b=None, node_c=None):
		self.node_a = node_a
		self.node_b = node_b
		self.node_c = node_c
		self.url = url

	def A(self):
		return self.node_a

	def B(self):
		return self.node_b

	def C(self):
		return self.node_c

	def get_url(self):
		return self.url

class Tests(unittest.TestCase):

	def setUp(self):
		# Start the data with some mock data
		main.data = {
			'jonny': {
				'a.com': {},
				'b.net': {},
				'aaa.com': {},
				'f.cool': {}
			},
			'xx': {
				'sam.com': {}
			},
			'zzz': {},
			'bob': {
				'b.com': {},
				'a.com': {},
				'sam.com': {}
			}
		}

	def tearDown(self):
		pass

	def test__save_url(self):
		# Case for adding new person and url
		saved = main.save_url('bill', 'arena.net')
		self.assertTrue(saved)
		self.assertTrue('bill' in main.data)
		self.assertTrue('arena.net' in main.data['bill'])
		# Case for trying to save url that arlready exists
		saved = main.save_url('jonny', 'b.net')
		self.assertFalse(saved)
		# Case for saving new url to token that already exists
		saved = main.save_url('jonny', 'xyz.gov')
		self.assertTrue(saved)
		self.assertTrue('xyz.gov' in main.data['jonny'])

	def test__get_urls(self):
		# Case for getting valid user with 3 saved urls
		urls = main.get_urls('bob')
		self.assertEqual(len(urls), 3)
		self.assertTrue('sam.com' in urls)
		self.assertTrue('b.com' in urls)
		self.assertTrue('a.com' in urls)
		# Case for getting valid user with no saved urls
		urls = main.get_urls('zzz')
		self.assertEqual(urls, [])
		# Case for trying to get invalid user
		urls = main.get_urls('notindata')
		self.assertIsNone(urls)

	def test__remove_url(self):
		# Case for valid removal of urls
		saved = main.remove_url('jonny', 'aaa.com')
		self.assertTrue(saved)
		self.assertTrue('aaa.com' not in main.data['jonny'])
		saved = main.remove_url('jonny', 'asdf.com')
		self.assertFalse(saved)

	def test__get_users_by_domain(self):
		users = main.get_users_by_domain('a.com')
		self.assertEqual(len(users), 2)
		self.assertTrue('jonny' in users)
		self.assertTrue('bob' in users)

	def test_traverse_nodes(self):
		A_A_A = TestNode('blah.com')
		A_A_B = TestNode('arg.com')
		A_A = TestNode('a.com', node_a=A_A_A, node_b=A_A_B)
		A_B_C = TestNode('b.com')
		A_B = TestNode('c.com', node_c=A_B_C)
		A_C_A = TestNode('d.com')
		A_C_B = TestNode('e.com')
		A_C_C = TestNode('f.com')
		A_C = TestNode('g.com', A_C_A, A_C_B, A_C_C)
		A = TestNode('h.com', A_A, A_B, A_C)
		B_A = TestNode('i.com')
		B_B_A = TestNode('j.com')
		B_B_C = TestNode('k.com')
		B_B = TestNode('l.com', node_a=B_B_A, node_c=B_B_C)
		B = TestNode('m.com', B_A, B_B)
		C = TestNode('n.com')
		head_node = TestNode('google.com', A, B, C)
		recommendations = []
		main.traverse_nodes(0, head_node, recommendations, {'arg.com': {}, 'e.com': {}, 'notinurls.com': {}})
		urls_that_should_be_returned = ['blah.com', 'b.com', 'd.com', 'f.com', 'j.com', 'k.com']
		for url in urls_that_should_be_returned:
			self.assertTrue(url in recommendations)