shreyasgokhale/Writeup.ipynb

## Writeup.ipynb

      
## Writeup.py
#!/usr/bin/env python
# coding: utf-8

# To begin, I created a simple search for flights going from Berlin (```BERL-sky```) to London (```LOND-sky```) on 22 Jan using the ```requests``` library in Python. (Note that ```headers``` contains a custom API key.)

# In[1]:


import requests, json
headers = {
    'x-rapidapi-host': "skyscanner-skyscanner-flight-search-v1.p.rapidapi.com",
    'x-rapidapi-key': "ae922034c6mshbd47a2c270cbe96p127c54jsnfec4819a7799"
    }


# In[2]:


origin = "BERL-sky"
destination = "LOND-sky"
currancy = "EUR"
originCountry = "DE"
locale = "en-US"

# The origin comes before the destination in the path, so this searches Berlin -> London
myurl = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/" + originCountry + "/" + currancy + "/" + locale + "/" + origin + "/" + destination + "/" + "2020-01-22"
response = requests.request("GET", myurl, headers=headers)
print(response.text)


# Cool! We got our first result. But there is a lot to unpack to understand what just happened! First, let's have a look at what we requested. See the last field in ```myurl```? That is the date we intend to fly. Before it come the origin and the destination, which are specified in Skyscanner's location format. Berlin has 2 airports, Tegel and Schoenefeld, but in the API we can just use ```BERL-sky```, which gives results from both! (Similarly, London has 4!)
#
# ```originCountry``` is the country we are searching from. A little side note: changing this can drastically change the prices!! I found the German prices to be the cheapest. However, changing the currency didn't affect the prices.
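
# The URL layout described above can be sketched as a small helper. This is only an illustration: ```build_browse_url``` and ```BROWSE_ROOT``` are hypothetical names, not part of the API or of the notebook, and the segment order (country, currency, locale, origin, destination, date) is the one used by the later requests in this write-up.

```python
BROWSE_ROOT = (
    "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com"
    "/apiservices/browsequotes/v1.0"
)


def build_browse_url(country, currency, locale, origin, destination, date):
    """Join the path segments of a browse-quotes request (hypothetical helper)."""
    segments = [country, currency, locale, origin, destination, date]
    return BROWSE_ROOT + "/" + "/".join(segments)


example_url = build_browse_url("DE", "EUR", "en-US", "BERL-sky", "LOND-sky", "2020-01-22")
print(example_url)
```

# Keeping the segments in one list makes it harder to swap origin and destination by accident than with a long chain of string concatenations.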

# Let's parse and pretty-print the response JSON so we can clearly see what information we are getting back.

# In[3]:


j = json.loads(response.text)
print(json.dumps(j, indent=2))


# That is so much information!! First we have ```"Quotes"```. This gives us the cheapest quote for that day and combination. The airlines, origin and destination are given as IDs, which have to be resolved by looking them up in the subsequent ```Carriers``` and ```Places``` fields. For example, ```"Places"``` is the JSON list of places that the quoted flights for that day refer to.
#
# But did you notice that we do not receive any kind of time? Look at the following:
#
# >        "DepartureDate": "2019-12-22T00:00:00"

# Yes! This is what they call _browsing_ for flights. Once we are more sure about the search, we have to get more information by making a different request. That will give us more details and also the exact URL where we can book this selection.
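
# To make the ID lookup concrete, here is a minimal sketch on a hand-written response. The ```sample``` dictionary is made up for illustration (the IDs, names and price are not real API output); only the field names ```Quotes```, ```Places```, ```PlaceId```, ```Name```, ```OutboundLeg```, ```OriginId```, ```DestinationId``` and ```MinPrice``` are the ones this write-up works with.

```python
# Hand-written stand-in for a browse-quotes response; all values are invented.
sample = {
    "Quotes": [
        {
            "MinPrice": 35.0,
            "OutboundLeg": {"OriginId": 1, "DestinationId": 2},
        }
    ],
    "Places": [
        {"PlaceId": 1, "Name": "Berlin Tegel"},
        {"PlaceId": 2, "Name": "London Stansted"},
    ],
}

# Map every place ID to its readable name.
place_names = {place["PlaceId"]: place["Name"] for place in sample["Places"]}

# Replace the IDs in each quote by the names they stand for.
resolved = []
for quote in sample["Quotes"]:
    leg = quote["OutboundLeg"]
    resolved.append(
        (place_names[leg["OriginId"]], place_names[leg["DestinationId"]], quote["MinPrice"])
    )

for origin_name, destination_name, price in resolved:
    print("%s --> %s: %s EUR" % (origin_name, destination_name, price))
```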

# For now, let's focus on collecting the data. The simplest way I could imagine is:
# - We select which airports are suitable for us to fly out from, and the target airports in the destination country.
# - The prices of all the options are compared and the best 3 are presented.
#
# Let's continue and create 2 arrays for origin and destination. The best thing about the API is that you can have whole countries as a place! So we can use ```IT-sky``` for all the airports in Italy, ```ES-sky``` for all the airports in Spain, and so on!!! But if we do that, we include _all_ the possible airports in the country. For Spain, that means it also selects the island of Palma, where we don't want to go (at least for now!). Let's instead list the specific airports we are interested in.
#
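
# The second step of the plan, comparing the prices and keeping the best 3, could look like the sketch below. The ```candidate_quotes``` list is invented sample data, and ```heapq.nsmallest``` is just one convenient way to do it (an assumption of this sketch), not something the API provides.

```python
import heapq

# Invented (destination, price in EUR) pairs standing in for collected quotes.
candidate_quotes = [
    ("Madrid", 28.0),
    ("Barcelona", 41.0),
    ("Seville", 55.0),
    ("Valencia", 33.0),
    ("Madrid", 47.0),
]

# Keep the three cheapest options, cheapest first.
best_three = heapq.nsmallest(3, candidate_quotes, key=lambda quote: quote[1])

for destination_name, price in best_three:
    print("%s: %s EUR" % (destination_name, price))
```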

# In[4]:


# Airports where we can fly from: Berlin
source_array = {"BERL-sky"}

# Our destination airports: Madrid, Barcelona, Seville, Valencia
destination_array = {"MAD-sky", "BCN-sky", "SVQ-sky", "VLC-sky"}

# (Note that technically these are Python sets, not arrays.)

# And to make our life easier
rootURL = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/"


# Now we loop through all possible options in the arrays and request the results for each pair. We are still looking for one-way results for the 22nd of January.
#
# But we can already start making our algorithm smarter. Every response contains airport IDs, which we then have to cross-reference with the ```Places``` field. Let's create a simple Python dictionary that stores all the airports by their IDs. As Python dictionaries are hash maps, each lookup takes ```O(1)``` time in the average case. To top it off, let's add print statements that print the exact airports and the price instead of the whole response JSON.

# In[5]:


airports = { }
for destination in destination_array:
    for source in source_array:
        myurl = rootURL + originCountry + "/" + currancy + "/" + locale + "/" + source + "/"  + destination + "/" + "2020-01-22"
        response = requests.request("GET", myurl, headers=headers)
        temp = json.loads(response.text)

        # This checks if we have a quote or there were no flights
        if("Quotes" in temp):
            for Places in temp["Places"]:
                # Add the airport in the dictionary.
                airports[Places["PlaceId"]] = Places["Name"]
            for Quotes in temp["Quotes"]:
                print("************")
                # print("%s --> to  -->%s" %(origin,destination))
                ori = Quotes["OutboundLeg"]["OriginId"]
                dest = Quotes["OutboundLeg"]["DestinationId"]
                # Look for Airports in the dictionary
                print("Journey:  %s  --> %s"%(airports[ori],airports[dest]))
                print("Price: %s EUR" %Quotes["MinPrice"])


# Interesting! We have 5 flight options and we already see that flying to Madrid will be the cheapest! Now, let's say we want to fly on some date between the 18th and the 24th of January. We will have to add another ```for``` loop which loops through all possible dates. And to ignore expensive flights, let's add a ```maxbudget``` variable which sets our one-way budget to 40 €.
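
# Before touching the API again, the date loop and the budget filter can be tried on invented numbers. Everything in this sketch is an assumption made for illustration: ```fake_prices``` is made-up data, ```maxbudget_example``` is a stand-in name for the budget, and the standard-library ```datetime``` module is used here only to keep the example dependency-free.

```python
import datetime

maxbudget_example = 40  # one-way budget in EUR, as in the text

# Invented cheapest price per day; a real run would fill this from the API.
fake_prices = {
    datetime.date(2020, 1, 18): 62.0,
    datetime.date(2020, 1, 19): 39.0,
    datetime.date(2020, 1, 20): 25.0,
    datetime.date(2020, 1, 21): 44.0,
    datetime.date(2020, 1, 22): 40.0,
    datetime.date(2020, 1, 23): 31.0,
    datetime.date(2020, 1, 24): 58.0,
}

first_day = datetime.date(2020, 1, 18)
last_day = datetime.date(2020, 1, 24)

affordable = []
day = first_day
while day <= last_day:  # the end date is included
    price = fake_prices[day]
    if price < maxbudget_example:  # keep only prices strictly below the budget
        affordable.append((day, price))
    day += datetime.timedelta(days=1)

for cheap_day, cheap_price in affordable:
    print(cheap_day.strftime("%d-%b %a"), cheap_price, "EUR")
```

# Note that a price of exactly 40 EUR is dropped, because the comparison is strict.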

# In[6]:


import time, datetime, dateutil
import pandas as pd

source_begin_date = "2020-01-18"
source_end_date =  "2020-01-24"
daterange = pd.date_range(source_begin_date, source_end_date)
airports = { }
maxbudget = 40


# I want to create a class so we have a neat system. I know this might be overkill! But bear with me, it might be useful later!

# In[7]:


class findingCheapestFlights:

    def __init__(self, originCountry = "DE", currency = "EUR", locale = "en-US", rootURL="https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com"):
        self.currency = currency
        self.locale =  locale
        self.rootURL = rootURL
        self.originCountry = originCountry
        # Start with empty headers so browseQuotes does not fail with an AttributeError before setHeaders is called
        self.headers = {}

    def setHeaders(self, headers):
        self.headers =  headers

    def browseQuotes(self, source, destination, date):
        quoteRequestPath = "/apiservices/browsequotes/v1.0/"
        browseQuotesURL = self.rootURL + quoteRequestPath + self.originCountry + "/" + self.currency + "/" + self.locale + "/" + source + "/" + destination + "/" + date.strftime("%Y-%m-%d")
        response = requests.request("GET", url = browseQuotesURL, headers = self.headers)
        resultJSON = json.loads(response.text)
        return resultJSON


# To analyze the performance of the code, I want to see in which parts the program spends most of its time.

# In[11]:


import time
cheapest_flight_finder = findingCheapestFlights()
cheapest_flight_finder.setHeaders(headers)

total_compute_time = 0.0
total_request_time = 0.0

function_start = time.time()
for single_date in daterange:
    for destination in destination_array:
        for source in source_array:
            request_start = time.time()
            resultJSON = cheapest_flight_finder.browseQuotes(source, destination,single_date)
            request_end = time.time()
            if("Quotes" in resultJSON):
                for Places in resultJSON["Places"]:
                    # Add the airport in the dictionary.
                    airports[Places["PlaceId"]] = Places["Name"]
                for Quotes in resultJSON["Quotes"]:
                    if(Quotes["MinPrice"]<maxbudget):
                        print("************")
                        print(single_date.strftime("%d-%b %a"))
                        # print("%s --> to  -->%s" %(origin,destination))
                        # Use a separate name so the loop variable source is not overwritten
                        ori = Quotes["OutboundLeg"]["OriginId"]
                        dest = Quotes["OutboundLeg"]["DestinationId"]
                        # Look for Airports in the dictionary
                        print("Journey:  %s  --> %s"%(airports[ori],airports[dest]))
                        print("Price: %s EUR" %Quotes["MinPrice"])
            calculation_end = time.time()
            total_compute_time += calculation_end - request_end
            total_request_time += request_end - request_start


print("\nBenchmark Stats :")
print("Time spent in computing: %f seconds"%total_compute_time )
print("Time spent in requesting: %f seconds"%total_request_time )
print("Time spent in program: %f seconds"%(time.time()-function_start))
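
# The same split between request time and compute time can be tried offline. In this sketch ```fake_request``` is a made-up stand-in that only sleeps and returns invented quotes, and ```time.perf_counter``` is used instead of ```time.time``` because it is a monotonic clock meant for measuring intervals. The numbers it prints say nothing about the real API.

```python
import time


def fake_request():
    """Made-up stand-in for a network call: it only sleeps briefly."""
    time.sleep(0.01)
    return {"Quotes": [{"MinPrice": 30}, {"MinPrice": 50}]}


request_seconds = 0.0
compute_seconds = 0.0
cheap_count = 0

run_start = time.perf_counter()
for _ in range(3):
    request_start = time.perf_counter()
    result = fake_request()
    request_end = time.perf_counter()

    # "Compute" part: count the quotes that are under a 40 EUR budget.
    cheap_count += sum(1 for quote in result["Quotes"] if quote["MinPrice"] < 40)
    compute_end = time.perf_counter()

    request_seconds += request_end - request_start
    compute_seconds += compute_end - request_end
total_seconds = time.perf_counter() - run_start

print("requesting: %f s, computing: %f s, total: %f s"
      % (request_seconds, compute_seconds, total_seconds))
```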


# In[ ]: