Last active
July 12, 2022 02:03
EuroTrip-Planner-Part-1
#!/usr/bin/env python
# coding: utf-8
# To begin, I created a simple search for flights going from Berlin (```BERL-sky```) to London (```LOND-sky```) on 22 Jan using the ```requests``` library in Python. (Note that ```headers``` contains a custom API key.)
# In[1]:
import requests, json
headers = {
    'x-rapidapi-host': "skyscanner-skyscanner-flight-search-v1.p.rapidapi.com",
    # Put your own RapidAPI key here; a real key should never be published.
    'x-rapidapi-key': "YOUR_RAPIDAPI_KEY"
}
# In[2]:
origin = "BERL-sky"
destination = "LOND-sky"
currancy = "EUR"
originCountry = "DE"
locale = "en-US"
# Path order: country / currency / locale / origin / destination / outbound date
myurl = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/" + originCountry + "/" + currancy + "/" + locale + "/" + origin + "/" + destination + "/" + "2020-01-22"
response = requests.request("GET", myurl, headers=headers)
print(response.text)
# Cool! We got our first result. But there is a lot to unpack to understand what just happened! First, let's have a look at what we requested. You see the last field in ```myurl```? That is the date we intend to fly. Before it come the origin and destination, which are specified in Skyscanner's location format. Now, Berlin has 2 airports, Tegel and Schoenefeld, but in the API we can just use ```BERL-sky```, which gives results from both! (Similarly, London has 4!)
#
# ```originCountry``` is the country we are searching from. A little side note: changing this can drastically change the prices!! I found the German prices to be the cheapest. However, changing the currency didn't affect the prices.
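# To make the URL structure easier to see, here is a minimal sketch of a helper that assembles the path from its parts. It assumes the segment order country / currency / locale / origin / destination / date that the loops further down use; ```BASE_URL``` and ```build_browse_quotes_url``` are names made up for this illustration, not part of the API.

```python
from urllib.parse import quote

# Assumed base path, copied from the request URL used in this notebook.
BASE_URL = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0"


def build_browse_quotes_url(country, currency, locale, origin, destination, outbound_date):
    """Join the path segments in a fixed order (hypothetical helper)."""
    segments = [country, currency, locale, origin, destination, outbound_date]
    # Percent-encode each segment so an unexpected character cannot break the path.
    return BASE_URL + "/" + "/".join(quote(segment, safe="") for segment in segments)


url = build_browse_quotes_url("DE", "EUR", "en-US", "BERL-sky", "LOND-sky", "2020-01-22")
print(url)
```

# Keeping the order in one place makes it harder to accidentally swap origin and destination when the URL is built by hand.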
# Let's parse and pretty-print the response JSON so we can clearly understand the information that we are getting back.
# In[3]:
j = json.loads(response.text)
print(json.dumps(j, indent=2))
# That is so much information!! First we have ```"Quotes"```. This gives us the cheapest quotes for that day and route. The exact airline, origin and destination are given as IDs, which have to be resolved by looking them up in the subsequent ```Carriers``` and ```Places``` fields. For example, ```"Places"``` is the JSON list of airports between which flights are possible on that day.
#
# But did you notice that we do not receive any kind of time? Look at the following:
#
# > "DepartureDate": "2019-12-22T00:00:00"
# Yes! This is what they call _Browsing_ for flights. Once we are more sure about the search, we can get more information by making a different request. That will give us more details and also the exact URL where we can book this selection.
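# Here is a small sketch of how the IDs in a quote could be resolved. The ```sample``` response is made up for illustration (IDs, names and price are not real data), and the ```CarrierIds``` / ```CarrierId``` field names are assumptions about the response layout; only ```Quotes```, ```Places```, ```Carriers```, ```PlaceId```, ```Name```, ```OutboundLeg```, ```OriginId```, ```DestinationId``` and ```MinPrice``` appear in this notebook.

```python
# Hypothetical, hand-written response with the same overall shape as the API result.
sample = {
    "Quotes": [
        {
            "MinPrice": 35.0,
            "OutboundLeg": {
                "CarrierIds": [10],  # assumed field name
                "OriginId": 1,
                "DestinationId": 2,
                "DepartureDate": "2020-01-22T00:00:00",
            },
        }
    ],
    "Places": [
        {"PlaceId": 1, "Name": "Berlin Tegel"},
        {"PlaceId": 2, "Name": "London Stansted"},
    ],
    "Carriers": [
        {"CarrierId": 10, "Name": "Example Air"},  # assumed field name
    ],
}

# Build lookup tables once, so every ID resolves with a single dictionary access.
places = {place["PlaceId"]: place["Name"] for place in sample["Places"]}
carriers = {carrier["CarrierId"]: carrier["Name"] for carrier in sample["Carriers"]}


def describe(quote):
    """Turn the IDs of one quote into a readable one-line summary."""
    leg = quote["OutboundLeg"]
    airline_names = ", ".join(carriers[carrier_id] for carrier_id in leg["CarrierIds"])
    return "%s --> %s with %s for %s EUR" % (
        places[leg["OriginId"]],
        places[leg["DestinationId"]],
        airline_names,
        quote["MinPrice"],
    )


summary = describe(sample["Quotes"][0])
print(summary)
```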
# For now, let's focus on collecting the data. The simplest way I could imagine is:
# - We select which airports are suitable for us to fly out from, and the target airports in the destination country.
# - The prices of all the options are compared and the best 3 are presented.
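# The "compare and present the best 3" step can be sketched on its own, before any requests are involved. The ```collected``` list below is invented data standing in for quotes gathered from the API, and ```best_options``` is a hypothetical helper name.

```python
import heapq

# Hypothetical (route, price in EUR) pairs standing in for collected quotes.
collected = [
    ("Berlin --> Madrid", 28.0),
    ("Berlin --> Barcelona", 45.0),
    ("Berlin --> Seville", 61.0),
    ("Berlin --> Valencia", 39.0),
    ("Berlin --> Madrid", 33.0),
]


def best_options(quotes, n=3):
    """Return the n cheapest quotes, cheapest first."""
    return heapq.nsmallest(n, quotes, key=lambda quote: quote[1])


top3 = best_options(collected)
for route, price in top3:
    print("%s: %s EUR" % (route, price))
```

# ```heapq.nsmallest``` avoids sorting the whole list, which only matters once many quotes are collected; for a handful of options ```sorted(quotes, key=...)[:3]``` would do just as well.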
#
# Let's continue and create 2 arrays, one for origins and one for destinations. The best thing about the API is that you can use whole countries as a place! So we can use ```IT-sky``` for all the airports in Italy, ```ES-sky``` for all the airports in Spain, and so on!!! But if we do that, we include _all_ the possible airports in the country. For Spain, that means it also selects the island airport of Palma, where we don't want to go (at least for now!). So let's just list the specific airports we are interested in.
#
# In[4]:
# Airports where we can fly from: Berlin
source_array = {"BERL-sky"}
# Our destination airports: Madrid, Barcelona, Seville, Valencia
destination_array = {"MAD-sky", "BCN-sky", "SVQ-sky", "VLC-sky"}
### (Note that technically these are Python **sets** and not arrays.)
# And to make our life easier
rootURL = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/"
# Now we loop through all possible options in the arrays and request the results for each pair. We are still looking for one-way results for 22 January.
#
# But we can already start making our algorithm smarter. Every response gives us a list of airport IDs which we then have to cross-reference with the ```Places``` field. Let's create a simple Python dictionary which will save all the airports and their IDs. As Python dictionaries are hash maps, each lookup takes ```O(1)``` time on average. To top it off, let's add print statements which print the exact airports and price instead of the whole response JSON.
# In[5]:
airports = {}
for destination in destination_array:
    for source in source_array:
        myurl = rootURL + originCountry + "/" + currancy + "/" + locale + "/" + source + "/" + destination + "/" + "2020-01-22"
        response = requests.request("GET", myurl, headers=headers)
        temp = json.loads(response.text)
        # This checks if we have a quote or there were no flights
        if "Quotes" in temp:
            for place in temp["Places"]:
                # Add the airport to the dictionary.
                airports[place["PlaceId"]] = place["Name"]
            for quote in temp["Quotes"]:
                print("************")
                # print("%s --> to -->%s" %(origin,destination))
                ori = quote["OutboundLeg"]["OriginId"]
                dest = quote["OutboundLeg"]["DestinationId"]
                # Look up the airports in the dictionary
                print("Journey: %s --> %s" % (airports[ori], airports[dest]))
                print("Price: %s EUR" % quote["MinPrice"])
# Interesting! We have 5 flight options and we already see that flying to Madrid will be the cheapest! Now, let's say we want to fly on some date between 18 Jan and 24 Jan. We will have to add another ```for``` loop which loops through all possible dates. And to ignore expensive flights, let's add a ```maxbudget``` variable which sets our one-way budget to 40 €.
# In[6]:
import time, datetime, dateutil
import pandas as pd
source_begin_date = "2020-01-18"
source_end_date = "2020-01-24"
daterange = pd.date_range(source_begin_date, source_end_date)
airports = {}
maxbudget = 40
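# ```pd.date_range``` includes both endpoints by default, so the range above covers 7 days. As a rough sketch, the same iteration can be written with only the standard library; ```inclusive_dates``` is a hypothetical helper and the ```prices``` list is invented to show how a strict ```<``` budget check behaves.

```python
from datetime import date, timedelta


def inclusive_dates(start, end):
    """Yield every date from start to end, both endpoints included."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


days = list(inclusive_dates(date(2020, 1, 18), date(2020, 1, 24)))
print(len(days), "days from", days[0].isoformat(), "to", days[-1].isoformat())

# Hypothetical one-way prices in EUR; a strict "<" comparison drops a fare of exactly 40.
prices = [55.0, 39.99, 40.0, 12.5]
max_budget = 40
affordable = [price for price in prices if price < max_budget]
print(affordable)
```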
# I want to create a class so we can build a neat system. I know this is overkill! But bear with me, it might be useful later!
# In[7]:
class findingCheapestFlights:
    def __init__(self, originCountry="DE", currency="EUR", locale="en-US", rootURL="https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com"):
        self.currency = currency
        self.locale = locale
        self.rootURL = rootURL
        self.originCountry = originCountry

    def setHeaders(self, headers):
        # Must be called before browseQuotes, which reads self.headers
        self.headers = headers

    def browseQuotes(self, source, destination, date):
        # Path order: country / currency / locale / source / destination / date
        quoteRequestPath = "/apiservices/browsequotes/v1.0/"
        browseQuotesURL = self.rootURL + quoteRequestPath + self.originCountry + "/" + self.currency + "/" + self.locale + "/" + source + "/" + destination + "/" + date.strftime("%Y-%m-%d")
        response = requests.request("GET", url=browseQuotesURL, headers=self.headers)
        resultJSON = json.loads(response.text)
        return resultJSON
# To analyze the performance of the code, I want to see in which parts the program is spending most of its time.
# In[11]:
import time
cheapest_flight_finder = findingCheapestFlights()
cheapest_flight_finder.setHeaders(headers)
total_compute_time = 0.0
total_request_time = 0.0
function_start = time.time()
for single_date in daterange:
    for destination in destination_array:
        for source in source_array:
            request_start = time.time()
            resultJSON = cheapest_flight_finder.browseQuotes(source, destination, single_date)
            request_end = time.time()
            if "Quotes" in resultJSON:
                for place in resultJSON["Places"]:
                    # Add the airport to the dictionary.
                    airports[place["PlaceId"]] = place["Name"]
                for quote in resultJSON["Quotes"]:
                    if quote["MinPrice"] < maxbudget:
                        print("************")
                        print(single_date.strftime("%d-%b %a"))
                        # print("%s --> to -->%s" %(origin,destination))
                        # Use ori/dest so the loop variables source/destination are not overwritten
                        ori = quote["OutboundLeg"]["OriginId"]
                        dest = quote["OutboundLeg"]["DestinationId"]
                        # Look up the airports in the dictionary
                        print("Journey: %s --> %s" % (airports[ori], airports[dest]))
                        print("Price: %s EUR" % quote["MinPrice"])
            calculation_end = time.time()
            total_compute_time += calculation_end - request_end
            total_request_time += request_end - request_start
print("\nBenchmark Stats :")
print("Time spent in computing: %f seconds" % total_compute_time)
print("Time spent in requesting: %f seconds" % total_request_time)
print("Time spent in program: %f seconds" % (time.time() - function_start))
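# The manual start/end bookkeeping above can also be wrapped in a small context manager. This is only a sketch of an alternative, not what the notebook uses: ```timed```, ```totals``` and ```fake_request``` are names invented here, and ```fake_request``` sleeps briefly instead of calling the real API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds per label, e.g. "request" and "compute".
totals = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the with-block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start


def fake_request():
    """Stand-in for a network call: sleeps, then returns made-up quotes."""
    time.sleep(0.01)
    return {"Quotes": [{"MinPrice": 30}, {"MinPrice": 50}]}


for _ in range(3):
    with timed("request"):
        result = fake_request()
    with timed("compute"):
        cheap = [quote for quote in result["Quotes"] if quote["MinPrice"] < 40]

print("Time spent in requesting: %f seconds" % totals["request"])
print("Time spent in computing: %f seconds" % totals["compute"])
```

# ```time.perf_counter``` is meant for measuring short durations, and the ```finally``` clause keeps the totals correct even if the timed block raises an exception.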
# In[ ]:
Hi Vivek!
Thank you for pointing that out! You are absolutely right! I have made appropriate changes to the notebook and the blog post!