@renatocassino
Created February 4, 2020 17:57
Crawl Airbnb example
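The script below fetches the Airbnb search-results page for Miami, FL, follows the "Next" pagination link until no more pages remain, and collects the title and thumbnail image of each listing along the way, using axios for HTTP requests and cheerio for HTML parsing.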
import axios from 'axios';
import cheerio from 'cheerio';
import logger from './lib/logger';

const BASE_URL = 'https://www.airbnb.com';
const URL = `${BASE_URL}/s/Miami--FL--United-States/homes`;

// Obfuscated Airbnb class names for the listing cards, plus the "Next" pagination link.
const CSS_QUERY = '._fhph4u ._8ssblpx';
const CSS_QUERY_PAGINATOR = 'nav[data-id="SearchResultsPagination"] > ul > li a[aria-label="Next"]';

const content = {
  data: [],
};

// Extract the listing title and thumbnail image from a single result card.
const getPlaceInfo = (element) => {
  const $ = cheerio.load(element);
  const title = $('meta[itemprop="name"]').attr('content');
  const image = $('img').attr('src');
  return { title, image };
};

// Fetch one search-results page and return the "Next" link plus the listing cards.
const crawlPage = async (url) => {
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);

  const paginator = $(CSS_QUERY_PAGINATOR);
  const elements = $(CSS_QUERY);

  return {
    paginator,
    elements,
  };
};

const main = async () => {
  let url = URL;

  // Follow the "Next" link until the last page of results.
  while (true) {
    logger.info(`Reading page ${url}.`);
    const { paginator, elements } = await crawlPage(url);

    for (let i = 0; i < elements.length; i++) {
      content.data.push(getPlaceInfo(elements[i]));
    }

    const next = paginator.attr('href');
    if (!next) break;
    url = `${BASE_URL}${next}`;
  }

  logger.info(JSON.stringify(content, null, 2));
};

main();
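The script imports a local module at ./lib/logger that is not included in the gist. A minimal sketch of a compatible stand-in, assuming only an info method is needed (the real module could just as easily wrap winston or pino):

// lib/logger.js (hypothetical stand-in for the module the gist imports)
// Assumption: the crawler above only calls logger.info().
export default {
  info: (message) => console.log(`[INFO] ${new Date().toISOString()} ${message}`),
};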