Usage: python parser.py --input /your/input/directory --output /your/output/directory --sleep 5 --step 500
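
The script's own argument handling is not shown here, so the following is only a minimal sketch of how the four flags in the usage line could be declared with argparse. The function name `build_parser`, the integer types, the defaults (copied from the usage line) and the help texts are assumptions, not the script's actual code.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the usage line."""
    parser = argparse.ArgumentParser(prog="parser.py")
    parser.add_argument("--input", required=True,
                        help="directory containing the input files")
    parser.add_argument("--output", required=True,
                        help="directory the generated files are written to")
    # Assumed types and defaults; the values come from the usage line.
    parser.add_argument("--sleep", type=int, default=5)
    parser.add_argument("--step", type=int, default=500)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.output, args.sleep, args.step)
```
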
Output example:
---
layout: 'single-product'
categories: 'SaksFifthAvenue-UK Kids Toys-and-Books'
merchantName: 'Saks Fifth Avenue - UK'
manufacturer_name: 'Janod'
sku_number: '0405148128850'
product_id: '110013244684007288214612306030'
name: 'Barbecue Trolley'
primary: 'Kids'
secondary: 'Toys and Books'
product: 'http://click.linksynergy.com/link?id=v3EaLjWOvJQ&offerid=268285.110013244684007288214612306030&type=15&murl=http%3A%2F%2Fwww.saksfifthavenue.com%2Fmain%2FProductDetail.jsp%3FFOLDER%253C%253Efolder_id%3D2534374306439561%26PRODUCT%253C%253Eprd_id%3D845524446623895'
productImage: 'http://image.s5a.com/is/image/saks/0405148128850_396x528.jpg'
short: 'Your budding chef will cook up a storm on this rolling barbecue trolley, complete with one magnetic spatula, one magnetic barbecue fork, one piece of pork, two sausages, one fish, three tomatoes and one piece of beef.;Wheeled bottom;12.8" X 12.8" X 17.3";Recommended for ages 18 months and up;Assembly required;Wood;Wipe clean;Imported'
long: 'Your budding chef will cook up a storm on this rolling barbecue trolley, complete with one magnetic spatula, one magnetic barbecue fork, one piece of pork, two sausages, one fish, three tomatoes and one piece of beef.;Wheeled bottom;12.8" X 12.8" X 17.3";Recommended for ages 18 months and up;Assembly required;Wood;Wipe clean;Imported'
currency: 'GBP'
type: 'amount'
sale: '65.91'
retail: '65.91'
brand: 'Janod'
information: '5 - 14 business days'
availability: 'in stock'
keywords: 'Janod'
pixel: 'http://ad.linksynergy.com/fs-bin/show?id=v3EaLjWOvJQ&bids=268285.110013244684007288214612306030&type=15&subid=0'
class_id: '60'
Misc: 'No'
Age: 'Adult'
---
I misspelled manufacturerName. Fixed it to check the manufacturer_name attribute.
I added a check to skip duplicate titles based on filename.
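
As a rough illustration of skipping duplicate titles by filename, here is a minimal sketch. It assumes the output filename is derived from the product name and that a product is skipped when that file already exists. The helpers `title_to_filename` and `write_unique`, the `.md` extension and the slug rule are all hypothetical; the script's real logic may differ.

```python
import os
import re


def title_to_filename(title):
    """Hypothetical helper: turn a product name into a file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug + ".md"


def write_unique(products, output_dir):
    """Write one file per product, skipping names whose file already exists."""
    written, skipped = [], []
    for product in products:
        filename = title_to_filename(product["name"])
        path = os.path.join(output_dir, filename)
        if os.path.exists(path):
            # Same title maps to the same filename, so treat it as a duplicate.
            skipped.append(filename)
            continue
        with open(path, "w") as handle:
            handle.write("---\nname: '%s'\n---\n" % product["name"])
        written.append(filename)
    return written, skipped
```

With this approach two products named 'Barbecue Trolley' map to the same file, so only the first one is written.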
You only need the default Python on Ubuntu 14.04 (Python 2.7). No extra packages are required.