Please refer to Instacart-scraper and myWebGrocerScraper for their individual READMEs.
Walmart is still under development (ideally completed before Thursday 22nd December).
data-preperation
{
  "name":"Nature's Promise Organics Beef Tenderloin Filet Grass-fed Fresh",
  "sozoId":"peapod210269",
  "source":"instacart",
  "itemIdInSource":"210269",
  "sku":"210269",
  "upc":"68826715092",
  "esin":null,
  "brandId":128,
  "brandName":"Nature's Promise",
{"categoryId": 2839,
 "id": 3756711,
 "isInStock": true,
 "price": 8.69,
 "priceQuantity": 0,
 "priceUnitOfMeasure": "each",
 "storeId": 1,
 "zoneId": 2}
{"aisle": "1255027787961",
 "department": "1255027787021",
 "id": "1255027787961",
 "name": "Formula & Baby Food",
 "parent_id": "1255027787021",
 "zipcode": "782491735"}
{"address1": "657 Phoenix Dr",
 "address2": "",
 "id": 2529,
 "lat": "36.822284",
 "long": "-76.068541",
 "name": "Walmart Virginia Beach Store #1688",
 "zipcode": "234527318"}
The complete scraping, "middleware" and "post-processing" is done by calling the master.sh bash script. Examining this file, along with the three Python files that are called from within it, will give you an understanding of how this spider works.
I have chosen to write zones, stores, warehouses, categories, and products to separate external files rather than keeping the data in memory, but we could amend the scrapers to use an item system instead.
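As a rough illustration of the "separate external files" approach, the sketch below appends each scraped record to one file per entity type instead of accumulating everything in memory. The JSON Lines format, the file names, and the helper names append_record and read_records are assumptions made for this example; the actual scripts may organise their output differently.

```python
import json
from pathlib import Path

# Entity types named in the text above; the file naming scheme is an assumption.
ENTITY_TYPES = ("zones", "stores", "warehouses", "categories", "products")


def append_record(out_dir, entity_type, record):
    """Append one scraped record to <out_dir>/<entity_type>.jsonl (hypothetical layout)."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    path = Path(out_dir) / f"{entity_type}.jsonl"
    # Append mode means nothing has to be held in memory between calls.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def read_records(out_dir, entity_type):
    """Yield records back one line at a time, e.g. for post-processing."""
    path = Path(out_dir) / f"{entity_type}.jsonl"
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```

Because each record is one line, a later step can stream a file with read_records without loading the whole data set.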
All scrapers within this folder are for scraping MyWebGrocer powered platforms.
MWG sites reside on the mywebgrocer domain, whereas Curbside Express, Harris Teeter and Shoprite have their own domains. The three external sites have a JSON API layer, so they are much quicker to scrape than the mywebgrocer sites, which require XPath scraping. There are similarities in the structure of the sites, but not enough to allow a single spider to cover multiple domains.
The complete scraping, "middleware" and "post-processing" is done by calling the Python files in the root whose names are prefixed with "run_".
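To show why a JSON API layer is less work than XPath scraping, here is a small side-by-side sketch. Both payloads are invented for the example and do not reflect the real responses of any of these sites. The markup is kept well-formed so that the standard library's limited XPath support can parse it; real pages would normally need an HTML-aware parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads; real responses from these sites will differ.
JSON_BODY = '{"products": [{"name": "Whole Milk", "price": 3.49}]}'
HTML_BODY = """
<html><body>
  <div class="product">
    <span class="name">Whole Milk</span>
    <span class="price">$3.49</span>
  </div>
</body></html>
"""


def products_from_json(body):
    """JSON API route: the structure is already machine-readable."""
    return [(p["name"], float(p["price"])) for p in json.loads(body)["products"]]


def products_from_markup(body):
    """XPath route: locate nodes by path, then clean up the text by hand."""
    root = ET.fromstring(body.strip())
    items = []
    for node in root.findall(".//div[@class='product']"):
        name = node.find("span[@class='name']").text.strip()
        price = node.find("span[@class='price']").text.strip().lstrip("$")
        items.append((name, float(price)))
    return items
```

Both functions return the same list of (name, price) tuples, but the XPath version depends on class names and text formatting that can change whenever the page layout does.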
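A minimal sketch of locating and launching those entry points, assuming they are ordinary Python scripts that sit directly in the project root and take no arguments. The helper names find_run_scripts and run_script are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def find_run_scripts(root):
    """Return the run_*.py files directly under root, sorted by name."""
    return sorted(Path(root).glob("run_*.py"))


def run_script(script):
    """Run one entry-point script with the current interpreter.

    Returns the exit code; the script's own directory is used as the
    working directory (an assumption about how the scripts expect to run).
    """
    script = Path(script)
    completed = subprocess.run([sys.executable, script.name], cwd=script.parent)
    return completed.returncode
```

Listing the scripts first makes it easy to see which sites have an entry point before starting a long scrape.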
/* This Fails tests */
public static boolean isSetOf1toN(int[][] t){
    int l = 0;
    for(int outer = 0; outer < t.length; outer++){
        l += t[outer].length;
    }
    int[] oneDimensional = new int[l];
    int i = 0;
    for(int outer = 0; outer < t.length; outer++){
Based on the method described at https://developer.genability.com/how-to/bill-to-kwh/#annual-bill-solve-request, we want users to input their annual bill ($) and have us determine the kWh that this equates to. Conversely, we want users to be able to input their yearly kWh usage and have us determine the $ amount this equates to.
Both calculations work well when performed during our initial account creation (before a billing usage profile is assigned to the account), but the kWh -> $ calculation misbehaves on subsequent requests.
Our user journey requires creating an account with either the user's yearly kWh or their yearly bill ($), with us working out the other value. From this result, we create an average usage billing profile for the account, based on the kWh amount they entered or, if they entered a bill instead, the calculated kWh amount. We store the other values in our own CRM.
This billing profile is required for us to run scenarios against their previous usage, and the future solar installation.
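To make the two directions of the calculation concrete, here is a purely local sketch that does not call the Genability API. It assumes a toy tariff (a flat energy rate plus a fixed monthly charge, both invented values), inverts it by bisection to go from $ to kWh, and reads "average usage" as annual kWh spread evenly over 12 months. All of these are assumptions for illustration only.

```python
def annual_cost(kwh, rate_per_kwh=0.15, fixed_monthly=10.0):
    """Toy tariff: flat energy rate plus a fixed monthly charge (assumed values)."""
    return kwh * rate_per_kwh + 12 * fixed_monthly


def solve_kwh_for_bill(target_bill, cost_fn=annual_cost, hi=100_000.0, tol=0.01):
    """Find the annual kWh whose cost matches target_bill, by bisection.

    Assumes cost_fn is non-decreasing in kWh and the answer lies in [0, hi].
    """
    lo = 0.0
    if not cost_fn(lo) <= target_bill <= cost_fn(hi):
        raise ValueError("target bill is outside the searchable range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cost_fn(mid) < target_bill:
            lo = mid  # cost too low, so the answer needs more kWh
        else:
            hi = mid  # cost high enough, so the answer is at or below mid
    return (lo + hi) / 2


def average_monthly_profile(annual_kwh):
    """Spread annual usage evenly over 12 months (assumed meaning of 'average')."""
    return [annual_kwh / 12] * 12
```

With this toy tariff, a $1,620 annual bill corresponds to roughly 10,000 kWh, and feeding that figure back through annual_cost reproduces the bill, which is the round trip the account creation flow relies on.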
{
  "status": "success",
  "count": 13,
  "type": "TariffRate",
  "results": [
    {
      "tariffRateId": 18274059,
      "tariffId": 3358751,
      "rateGroupName": "Consumption",
      "rateName": "Energy Surcharge",