EuroTrip-Planner-Part-1
#!/usr/bin/env python
# coding: utf-8
# To begin, I created a simple search for flights going from Berlin (```BERL-sky```) to London (```LOND-sky```) on 22 Jan using the ```requests``` library in Python. (Note that ```headers``` contains a custom API key.)
# In[1]:
import requests, json
headers = {
'x-rapidapi-host': "skyscanner-skyscanner-flight-search-v1.p.rapidapi.com",
'x-rapidapi-key': "ae922034c6mshbd47a2c270cbe96p127c54jsnfec4819a7799"
}
# In[2]:
origin = "BERL-sky"
destination = "LOND-sky"
currancy = "EUR"
originCountry = "DE"
locale = "en-US"
# Path order: country / currency / locale / origin / destination / outbound date
myurl = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/" + originCountry + "/" + currancy + "/" + locale + "/" + origin + "/" + destination + "/" + "2020-01-22"
response = requests.request("GET", myurl, headers=headers)
print(response.text)
# Cool! We got our first result. But there is a lot to unpack to understand what just happened! First, let's have a look at what we requested. See the last field in ```myurl```? That is the date we intend to fly. Before it come the origin and destination, which are specified in Skyscanner's location format. Berlin has 2 airports, Tegel and Schoenefeld, but in the API we can just use ```BERL-sky```, which gives results from both! (Similarly, London has 4!)
#
# ```originCountry``` is the country we are searching from. A little side note: changing this can drastically change the prices!! I found the German prices to be the cheapest. However, changing the currency didn't affect the prices.
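# As a minimal sketch of the request path described above, the segments can be joined by a small helper. The function name ```build_browse_quotes_url``` is hypothetical (it is not part of the API or of ```requests```), and the segment order (country, currency, locale, origin, destination, date) is assumed from how the URLs in the later loops of this notebook are assembled.

```python
ROOT_URL = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/"

def build_browse_quotes_url(country, currency, locale, origin_place, destination_place, date, root=ROOT_URL):
    """Join the path segments in the assumed order: country/currency/locale/origin/destination/date."""
    segments = [country, currency, locale, origin_place, destination_place, date]
    return root + "/".join(segments)

# Hypothetical usage with the values from the first search
example_url = build_browse_quotes_url("DE", "EUR", "en-US", "BERL-sky", "LOND-sky", "2020-01-22")
print(example_url)
```

# Keeping the order in one place makes it harder to swap origin and destination by accident when the URL is rebuilt in several cells.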
# Let's parse and pretty-print the response JSON so we can clearly understand what information we are getting back.
# In[3]:
j = json.loads(response.text)
print(json.dumps(j, indent=2))
# That is so much information!! First we have ```"Quotes"```. This gives us the cheapest quotes for that day and route. The exact airlines, origin and destination are given as IDs, which have to be resolved by looking them up in the subsequent ```Carriers``` and ```Places``` fields. For example, ```"Places"``` is the JSON list of places between which flights are possible on that day.
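# A minimal sketch of that lookup, using a hand-written response fragment rather than real API output. The sample values are made up for illustration, and the ```CarrierIds```, ```CarrierId``` and carrier ```Name``` keys are assumptions about the response shape; the other keys are the ones used elsewhere in this notebook.

```python
# Hand-written sample in the assumed response shape (NOT real API output).
sample = {
    "Quotes": [
        {
            "MinPrice": 35.0,
            "OutboundLeg": {"CarrierIds": [1], "OriginId": 100, "DestinationId": 200},
        }
    ],
    "Places": [
        {"PlaceId": 100, "Name": "Origin Airport"},
        {"PlaceId": 200, "Name": "Destination Airport"},
    ],
    "Carriers": [{"CarrierId": 1, "Name": "Example Air"}],
}

# Build ID -> name lookup tables once, then resolve each quote against them.
place_names = {p["PlaceId"]: p["Name"] for p in sample["Places"]}
carrier_names = {c["CarrierId"]: c["Name"] for c in sample["Carriers"]}

resolved = []
for quote in sample["Quotes"]:
    leg = quote["OutboundLeg"]
    resolved.append({
        "from": place_names[leg["OriginId"]],
        "to": place_names[leg["DestinationId"]],
        "airlines": [carrier_names[cid] for cid in leg["CarrierIds"]],
        "price": quote["MinPrice"],
    })

print(resolved)
```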
#
# But did you notice that we do not receive any kind of time? Look at the following:
#
# > "DepartureDate": "2019-12-22T00:00:00"
# Yes! This is what they call _Browsing_ for flights. Once we are more sure about the search, we have to get more information by sending a different request. That will give us more details and also the exact URL where we can book this selection.
# For now, let's focus on collecting the data. The simplest way I could imagine is:
# - We select which airports are suitable for us to fly out from, and the target airports in the destination country.
# - The prices of all the options are compared and the best 3 are presented.
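# The comparison step can be sketched with ```heapq.nsmallest``` from the standard library. The option list below is made-up placeholder data, and ```cheapest_options``` is a hypothetical helper name.

```python
import heapq

def cheapest_options(options, count=3):
    """Return the `count` cheapest (price, start, end) tuples, cheapest first."""
    return heapq.nsmallest(count, options, key=lambda option: option[0])

# Placeholder prices for illustration only (not real quotes).
options = [
    (52.0, "BERL-sky", "BCN-sky"),
    (31.0, "BERL-sky", "MAD-sky"),
    (78.0, "BERL-sky", "SVQ-sky"),
    (45.0, "BERL-sky", "VLC-sky"),
]

best_three = cheapest_options(options)
for price, start, end in best_three:
    print("%s --> %s: %s EUR" % (start, end, price))
```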
#
# Let's continue and create 2 arrays for origin and destination. The best thing about the API is that you can use whole countries as a place! So we can use ```IT-sky``` for all the airports in Italy, ```ES-sky``` for all the airports in Spain, and so on!!! But if we do that, we include _all_ the possible airports in the country. For Spain, that means it also selects Palma island, where we don't want to go (at least for now!). Let's just use the airports that we mentioned earlier.
#
# In[4]:
# Airports where we can fly from: Berlin
source_array = {"BERL-sky"}
# Our destination airports: Madrid, Barcelona, Seville, Valencia
destination_array = {"MAD-sky", "BCN-sky", "SVQ-sky", "VLC-sky"}
### ( Note that technically these are Python **Sets** and not arrays.)
# And to make our life easier
rootURL = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/"
# Now we loop through all possible options in the arrays and request the results for each pair. We are still looking for one-way results for 22 January.
#
# But we can already start making our algorithm smarter. Every time, we get a list of airports which we then have to cross-reference with the ```Places``` field. Let's create a simple Python dictionary which will save all the airports and their IDs. As Python dictionaries are hash maps, each lookup takes ```O(1)``` time in the average case. To top it off, let's add print statements which print the exact airports and price instead of the whole response JSON.
# In[5]:
airports = {}
for destination in destination_array:
    for source in source_array:
        myurl = rootURL + originCountry + "/" + currancy + "/" + locale + "/" + source + "/" + destination + "/" + "2020-01-22"
        response = requests.request("GET", myurl, headers=headers)
        temp = json.loads(response.text)
        # This checks if we have a quote or there were no flights
        if "Quotes" in temp:
            for Places in temp["Places"]:
                # Add the airport in the dictionary.
                airports[Places["PlaceId"]] = Places["Name"]
            for Quotes in temp["Quotes"]:
                print("************")
                # print("%s --> to -->%s" %(origin,destination))
                ori = Quotes["OutboundLeg"]["OriginId"]
                dest = Quotes["OutboundLeg"]["DestinationId"]
                # Look for Airports in the dictionary
                print("Journey: %s --> %s" % (airports[ori], airports[dest]))
                print("Price: %s EUR" % Quotes["MinPrice"])
# Interesting! We have 5 flight options and we already see that flying to Madrid will be the cheapest! Now, let's say we want to fly on some date between 18 Jan and 24 Jan. We will have to add another ```for``` loop which loops through all possible dates. And to ignore expensive flights, let's add a ```maxbudget``` variable which sets our one-way budget to 40 €.
# In[6]:
import time, datetime, dateutil
import pandas as pd
source_begin_date = "2020-01-18"
source_end_date = "2020-01-24"
daterange = pd.date_range(source_begin_date, source_end_date)
airports = { }
maxbudget = 40
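# For reference, ```pd.date_range``` called with two dates includes both end points, so ```daterange``` above holds 7 days. If pandas is not available, the same range can be sketched with the standard library; ```iter_dates``` is a hypothetical helper, not part of the original notebook.

```python
import datetime

def iter_dates(begin, end):
    """Yield every date from `begin` to `end`, both inclusive (ISO "YYYY-MM-DD" strings)."""
    current = datetime.date.fromisoformat(begin)
    last = datetime.date.fromisoformat(end)
    while current <= last:
        yield current
        current += datetime.timedelta(days=1)

dates = list(iter_dates("2020-01-18", "2020-01-24"))
print([d.strftime("%Y-%m-%d") for d in dates])
```

# The yielded ```datetime.date``` objects support ```strftime```, just like the pandas timestamps, so they can be formatted the same way.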
# I want to create a class so we can build a neat system. I know this might be overkill! But bear with me, it might be useful later!
# In[7]:
class findingCheapestFlights:
    def __init__(self, originCountry = "DE", currency = "EUR", locale = "en-US", rootURL="https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com"):
        self.currency = currency
        self.locale = locale
        self.rootURL = rootURL
        self.originCountry = originCountry

    def setHeaders(self, headers):
        self.headers = headers

    def browseQuotes(self, source, destination, date):
        quoteRequestPath = "/apiservices/browsequotes/v1.0/"
        browseQuotesURL = self.rootURL + quoteRequestPath + self.originCountry + "/" + self.currency + "/" + self.locale + "/" + source + "/" + destination + "/" + date.strftime("%Y-%m-%d")
        response = requests.request("GET", url = browseQuotesURL, headers = self.headers)
        resultJSON = json.loads(response.text)
        return resultJSON
# To analyze the performance of the code, I want to see in which parts the program is spending most of its time.
# In[11]:
import time
cheapest_flight_finder = findingCheapestFlights()
cheapest_flight_finder.setHeaders(headers)
total_compute_time = 0.0
total_request_time = 0.0
function_start = time.time()
for single_date in daterange:
    for destination in destination_array:
        for source in source_array:
            request_start = time.time()
            resultJSON = cheapest_flight_finder.browseQuotes(source, destination, single_date)
            request_end = time.time()
            if "Quotes" in resultJSON:
                for Places in resultJSON["Places"]:
                    # Add the airport in the dictionary.
                    airports[Places["PlaceId"]] = Places["Name"]
                for Quotes in resultJSON["Quotes"]:
                    if Quotes["MinPrice"] < maxbudget:
                        print("************")
                        print(single_date.strftime("%d-%b %a"))
                        # print("%s --> to -->%s" %(origin,destination))
                        # Use `ori` so the loop variable `source` is not overwritten
                        ori = Quotes["OutboundLeg"]["OriginId"]
                        dest = Quotes["OutboundLeg"]["DestinationId"]
                        # Look for Airports in the dictionary
                        print("Journey: %s --> %s" % (airports[ori], airports[dest]))
                        print("Price: %s EUR" % Quotes["MinPrice"])
            # Accumulate the timings once per request
            calculation_end = time.time()
            total_compute_time += calculation_end - request_end
            total_request_time += request_end - request_start
print("\nBenchmark Stats :")
print("Time spent in computing: %f seconds" % total_compute_time)
print("Time spent in requesting: %f seconds" % total_request_time)
print("Time spent in program: %f seconds" % (time.time() - function_start))
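# The manual start/end bookkeeping above can also be wrapped in a small context manager. This is only a sketch: it uses ```time.perf_counter``` (a high-resolution clock meant for measuring short durations) instead of ```time.time```, ```timed``` and ```totals``` are hypothetical names, and the ```sleep``` call merely stands in for a network request.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)

@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the `with` block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start

for _ in range(3):
    with timed("request"):
        time.sleep(0.01)  # stand-in for the HTTP request
    with timed("compute"):
        sum(range(1000))  # stand-in for parsing and printing

for label, seconds in totals.items():
    print("Time spent in %s: %f seconds" % (label, seconds))
```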
# In[ ]:
@vivekvashist vivekvashist commented Jan 17, 2020

Great article/work Shreyas.

A couple of things:

  1. destination_array = {"MAD-sky", "BCN-sky", "SVQ-sky", "VLC-sky"}

(Note that technically these are Python Dictionaries and not arrays.)
destination_array is a set

  2. As python dictionaries are hashmaps, we will get the result in O(n) complexity.

Dictionaries have O(1) lookup and lists have O(n) lookup.

Owner Author

@shreyasgokhale shreyasgokhale commented Jan 17, 2020

Hi Vivek!

Thank you for pointing that out! You are absolutely right! I have made appropriate changes to the notebook and the blog post!
