Skip to content

Instantly share code, notes, and snippets.

@adrianhajdin
Created September 30, 2023 10:15
Show Gist options
  • Save adrianhajdin/686326bc20e24810128637a9053c49a0 to your computer and use it in GitHub Desktop.
Save adrianhajdin/686326bc20e24810128637a9053c49a0 to your computer and use it in GitHub Desktop.
Web Scraping Full Course 2023 | Build and Deploy eCommerce Price Tracker
import { NextResponse } from "next/server";
import { getLowestPrice, getHighestPrice, getAveragePrice, getEmailNotifType } from "@/lib/utils";
import { connectToDB } from "@/lib/mongoose";
import Product from "@/lib/models/product.model";
import { scrapeAmazonProduct } from "@/lib/scraper";
import { generateEmailBody, sendEmail } from "@/lib/nodemailer";
export const maxDuration = 300; // This function can run for a maximum of 300 seconds
export const dynamic = "force-dynamic";
export const revalidate = 0;
export async function GET(request: Request) {
try {
connectToDB();
const products = await Product.find({});
if (!products) throw new Error("No product fetched");
// ======================== 1 SCRAPE LATEST PRODUCT DETAILS & UPDATE DB
const updatedProducts = await Promise.all(
products.map(async (currentProduct) => {
// Scrape product
const scrapedProduct = await scrapeAmazonProduct(currentProduct.url);
if (!scrapedProduct) return;
const updatedPriceHistory = [
...currentProduct.priceHistory,
{
price: scrapedProduct.currentPrice,
},
];
const product = {
...scrapedProduct,
priceHistory: updatedPriceHistory,
lowestPrice: getLowestPrice(updatedPriceHistory),
highestPrice: getHighestPrice(updatedPriceHistory),
averagePrice: getAveragePrice(updatedPriceHistory),
};
// Update Products in DB
const updatedProduct = await Product.findOneAndUpdate(
{
url: product.url,
},
product
);
// ======================== 2 CHECK EACH PRODUCT'S STATUS & SEND EMAIL ACCORDINGLY
const emailNotifType = getEmailNotifType(
scrapedProduct,
currentProduct
);
if (emailNotifType && updatedProduct.users.length > 0) {
const productInfo = {
title: updatedProduct.title,
url: updatedProduct.url,
};
// Construct emailContent
const emailContent = await generateEmailBody(productInfo, emailNotifType);
// Get array of user emails
const userEmails = updatedProduct.users.map((user: any) => user.email);
// Send email notification
await sendEmail(emailContent, userEmails);
}
return updatedProduct;
})
);
return NextResponse.json({
message: "Ok",
data: updatedProducts,
});
} catch (error: any) {
throw new Error(`Failed to get all products: ${error.message}`);
}
}
export async function generateEmailBody(
product: EmailProductInfo,
type: NotificationType
) {
const THRESHOLD_PERCENTAGE = 40;
// Shorten the product title
const shortenedTitle =
product.title.length > 20
? `${product.title.substring(0, 20)}...`
: product.title;
let subject = "";
let body = "";
switch (type) {
case Notification.WELCOME:
subject = `Welcome to Price Tracking for ${shortenedTitle}`;
body = `
<div>
<h2>Welcome to PriceWise 🚀</h2>
<p>You are now tracking ${product.title}.</p>
<p>Here's an example of how you'll receive updates:</p>
<div style="border: 1px solid #ccc; padding: 10px; background-color: #f8f8f8;">
<h3>${product.title} is back in stock!</h3>
<p>We're excited to let you know that ${product.title} is now back in stock.</p>
<p>Don't miss out - <a href="${product.url}" target="_blank" rel="noopener noreferrer">buy it now</a>!</p>
<img src="https://i.ibb.co/pwFBRMC/Screenshot-2023-09-26-at-1-47-50-AM.png" alt="Product Image" style="max-width: 100%;" />
</div>
<p>Stay tuned for more updates on ${product.title} and other products you're tracking.</p>
</div>
`;
break;
case Notification.CHANGE_OF_STOCK:
subject = `${shortenedTitle} is now back in stock!`;
body = `
<div>
<h4>Hey, ${product.title} is now restocked! Grab yours before they run out again!</h4>
<p>See the product <a href="${product.url}" target="_blank" rel="noopener noreferrer">here</a>.</p>
</div>
`;
break;
case Notification.LOWEST_PRICE:
subject = `Lowest Price Alert for ${shortenedTitle}`;
body = `
<div>
<h4>Hey, ${product.title} has reached its lowest price ever!!</h4>
<p>Grab the product <a href="${product.url}" target="_blank" rel="noopener noreferrer">here</a> now.</p>
</div>
`;
break;
case Notification.THRESHOLD_MET:
subject = `Discount Alert for ${shortenedTitle}`;
body = `
<div>
<h4>Hey, ${product.title} is now available at a discount more than ${THRESHOLD_PERCENTAGE}%!</h4>
<p>Grab it right away from <a href="${product.url}" target="_blank" rel="noopener noreferrer">here</a>.</p>
</div>
`;
break;
default:
throw new Error("Invalid notification type.");
}
return { subject, body };
}
@tailwind base;
@tailwind components;
@tailwind utilities;
* {
margin: 0;
padding: 0;
box-sizing: border-box;
scroll-behavior: smooth;
}
@layer base {
body {
@apply font-inter;
}
}
@layer utilities {
.btn {
@apply py-4 px-4 bg-secondary hover:bg-opacity-70 rounded-[30px] text-white text-lg font-semibold;
}
.head-text {
@apply mt-4 text-6xl leading-[72px] font-bold tracking-[-1.2px] text-gray-900;
}
.section-text {
@apply text-secondary text-[32px] font-semibold;
}
.small-text {
@apply flex gap-2 text-sm font-medium text-primary;
}
.paragraph-text {
@apply text-xl leading-[30px] text-gray-600;
}
.hero-carousel {
@apply relative sm:px-10 py-5 sm:pt-20 pb-5 max-w-[560px] h-[700px] w-full bg-[#F2F4F7] rounded-[30px] sm:mx-auto;
}
.carousel {
@apply flex flex-col-reverse h-[700px];
}
.carousel .control-dots {
@apply static !important;
}
.carousel .control-dots .dot {
@apply w-[10px] h-[10px] bg-[#D9D9D9] rounded-full bottom-0 !important;
}
.carousel .control-dots .dot.selected {
@apply bg-[#475467] !important;
}
.trending-section {
@apply flex flex-col gap-10 px-6 md:px-20 py-24;
}
/* PRODUCT DETAILS PAGE STYLES */
.product-container {
@apply flex flex-col gap-16 flex-wrap px-6 md:px-20 py-24;
}
.product-image {
@apply flex-grow xl:max-w-[50%] max-w-full py-16 border border-[#CDDBFF] rounded-[17px];
}
.product-info {
@apply flex items-center flex-wrap gap-10 py-6 border-y border-y-[#E4E4E4];
}
.product-hearts {
@apply flex items-center gap-2 px-3 py-2 bg-[#FFF0F0] rounded-10;
}
.product-stars {
@apply flex items-center gap-2 px-3 py-2 bg-[#FBF3EA] rounded-[27px];
}
.product-reviews {
@apply flex items-center gap-2 px-3 py-2 bg-white-200 rounded-[27px];
}
/* MODAL */
.dialog-container {
@apply fixed inset-0 z-10 overflow-y-auto bg-black bg-opacity-60;
}
.dialog-content {
@apply p-6 bg-white inline-block w-full max-w-md my-8 overflow-hidden text-left align-middle transition-all transform shadow-xl rounded-2xl;
}
.dialog-head_text {
@apply text-secondary text-lg leading-[24px] font-semibold mt-4;
}
.dialog-input_container {
@apply px-5 py-3 mt-3 flex items-center gap-2 border border-gray-300 rounded-[27px];
}
.dialog-input {
@apply flex-1 pl-1 border-none text-gray-500 text-base focus:outline-none border border-gray-300 rounded-[27px] shadow-xs;
}
.dialog-btn {
@apply px-5 py-3 text-white text-base font-semibold border border-secondary bg-secondary rounded-lg mt-8;
}
/* NAVBAR */
.nav {
@apply flex justify-between items-center px-6 md:px-20 py-4;
}
.nav-logo {
@apply font-spaceGrotesk text-[21px] text-secondary font-bold;
}
/* PRICE INFO */
.price-info_card {
@apply flex-1 min-w-[200px] flex flex-col gap-2 border-l-[3px] rounded-10 bg-white-100 px-5 py-4;
}
/* PRODUCT CARD */
.product-card {
@apply sm:w-[292px] sm:max-w-[292px] w-full flex-1 flex flex-col gap-4 rounded-md;
}
.product-card_img-container {
@apply flex-1 relative flex flex-col gap-5 p-4 rounded-md;
}
.product-card_img {
@apply max-h-[250px] object-contain w-full h-full bg-transparent;
}
.product-title {
@apply text-secondary text-xl leading-6 font-semibold truncate;
}
/* SEARCHBAR INPUT */
.searchbar-input {
@apply flex-1 min-w-[200px] w-full p-3 border border-gray-300 rounded-lg shadow-xs text-base text-gray-500 focus:outline-none;
}
.searchbar-btn {
@apply bg-gray-900 border border-gray-900 rounded-lg shadow-xs px-5 py-3 text-white text-base font-semibold hover:opacity-90 disabled:pointer-events-none disabled:cursor-not-allowed disabled:opacity-40;
}
}
"use server"
import axios from 'axios';
import * as cheerio from 'cheerio';
import { extractCurrency, extractDescription, extractPrice } from '../utils';
export async function scrapeAmazonProduct(url: string) {
if(!url) return;
// BrightData proxy configuration
const username = String(process.env.BRIGHT_DATA_USERNAME);
const password = String(process.env.BRIGHT_DATA_PASSWORD);
const port = 22225;
const session_id = (1000000 * Math.random()) | 0;
const options = {
auth: {
username: `${username}-session-${session_id}`,
password,
},
host: 'brd.superproxy.io',
port,
rejectUnauthorized: false,
}
try {
// Fetch the product page
const response = await axios.get(url, options);
const $ = cheerio.load(response.data);
// Extract the product title
const title = $('#productTitle').text().trim();
const currentPrice = extractPrice(
$('.priceToPay span.a-price-whole'),
$('.a.size.base.a-color-price'),
$('.a-button-selected .a-color-base'),
);
const originalPrice = extractPrice(
$('#priceblock_ourprice'),
$('.a-price.a-text-price span.a-offscreen'),
$('#listPrice'),
$('#priceblock_dealprice'),
$('.a-size-base.a-color-price')
);
const outOfStock = $('#availability span').text().trim().toLowerCase() === 'currently unavailable';
const images =
$('#imgBlkFront').attr('data-a-dynamic-image') ||
$('#landingImage').attr('data-a-dynamic-image') ||
'{}'
const imageUrls = Object.keys(JSON.parse(images));
const currency = extractCurrency($('.a-price-symbol'))
const discountRate = $('.savingsPercentage').text().replace(/[-%]/g, "");
const description = extractDescription($)
// Construct data object with scraped information
const data = {
url,
currency: currency || '$',
image: imageUrls[0],
title,
currentPrice: Number(currentPrice) || Number(originalPrice),
originalPrice: Number(originalPrice) || Number(currentPrice),
priceHistory: [],
discountRate: Number(discountRate),
category: 'category',
reviewsCount:100,
stars: 4.5,
isOutOfStock: outOfStock,
description,
lowestPrice: Number(currentPrice) || Number(originalPrice),
highestPrice: Number(originalPrice) || Number(currentPrice),
averagePrice: Number(currentPrice) || Number(originalPrice),
}
return data;
} catch (error: any) {
console.log(error);
}
}
/** @type {import('next').NextConfig} */
const nextConfig = {
experimental: {
serverActions: true,
serverComponentsExternalPackages: ['mongoose']
},
images: {
domains: ['m.media-amazon.com']
}
}
module.exports = nextConfig
https://drive.google.com/file/d/1v6h993BgYX6axBoIXFbZ9HQAgqbR4PSH/view?usp=sharing
/** @type {import('tailwindcss').Config} */
module.exports = {
content: [
"./pages/**/*.{js,ts,jsx,tsx,mdx}",
"./components/**/*.{js,ts,jsx,tsx,mdx}",
"./app/**/*.{js,ts,jsx,tsx,mdx}",
],
theme: {
extend: {
colors: {
primary: {
DEFAULT: "#E43030",
"orange": "#D48D3B",
"green": "#3E9242"
},
secondary: "#282828",
"gray-200": "#EAECF0",
"gray-300": "D0D5DD",
"gray-500": "#667085",
"gray-600": "#475467",
"gray-700": "#344054",
"gray-900": "#101828",
"white-100": "#F4F4F4",
"white-200": "#EDF0F8",
"black-100": "#3D4258",
"neutral-black": "#23263B",
},
boxShadow: {
xs: "0px 1px 2px 0px rgba(16, 24, 40, 0.05)",
},
maxWidth: {
"10xl": '1440px'
},
fontFamily: {
inter: ['Inter', 'sans-serif'],
spaceGrotesk: ['Space Grotesk', 'sans-serif'],
},
borderRadius: {
10: "10px"
}
},
},
plugins: [],
};
export type PriceHistoryItem = {
price: number;
};
export type User = {
email: string;
};
export type Product = {
_id?: string;
url: string;
currency: string;
image: string;
title: string;
currentPrice: number;
originalPrice: number;
priceHistory: PriceHistoryItem[] | [];
highestPrice: number;
lowestPrice: number;
averagePrice: number;
discountRate: number;
description: string;
category: string;
reviewsCount: number;
stars: number;
isOutOfStock: Boolean;
users?: User[];
};
export type NotificationType =
| "WELCOME"
| "CHANGE_OF_STOCK"
| "LOWEST_PRICE"
| "THRESHOLD_MET";
export type EmailContent = {
subject: string;
body: string;
};
export type EmailProductInfo = {
title: string;
url: string;
};
import { PriceHistoryItem, Product } from "@/types";
const Notification = {
WELCOME: 'WELCOME',
CHANGE_OF_STOCK: 'CHANGE_OF_STOCK',
LOWEST_PRICE: 'LOWEST_PRICE',
THRESHOLD_MET: 'THRESHOLD_MET',
}
const THRESHOLD_PERCENTAGE = 40;
// Extracts and returns the price from a list of possible elements.
export function extractPrice(...elements: any) {
for (const element of elements) {
const priceText = element.text().trim();
if(priceText) {
const cleanPrice = priceText.replace(/[^\d.]/g, '');
let firstPrice;
if (cleanPrice) {
firstPrice = cleanPrice.match(/\d+\.\d{2}/)?.[0];
}
return firstPrice || cleanPrice;
}
}
return '';
}
// Extracts and returns the currency symbol from an element.
export function extractCurrency(element: any) {
const currencyText = element.text().trim().slice(0, 1);
return currencyText ? currencyText : "";
}
// Extracts description from two possible elements from amazon
export function extractDescription($: any) {
// these are possible elements holding description of the product
const selectors = [
".a-unordered-list .a-list-item",
".a-expander-content p",
// Add more selectors here if needed
];
for (const selector of selectors) {
const elements = $(selector);
if (elements.length > 0) {
const textContent = elements
.map((_: any, element: any) => $(element).text().trim())
.get()
.join("\n");
return textContent;
}
}
// If no matching elements were found, return an empty string
return "";
}
export function getHighestPrice(priceList: PriceHistoryItem[]) {
let highestPrice = priceList[0];
for (let i = 0; i < priceList.length; i++) {
if (priceList[i].price > highestPrice.price) {
highestPrice = priceList[i];
}
}
return highestPrice.price;
}
export function getLowestPrice(priceList: PriceHistoryItem[]) {
let lowestPrice = priceList[0];
for (let i = 0; i < priceList.length; i++) {
if (priceList[i].price < lowestPrice.price) {
lowestPrice = priceList[i];
}
}
return lowestPrice.price;
}
export function getAveragePrice(priceList: PriceHistoryItem[]) {
const sumOfPrices = priceList.reduce((acc, curr) => acc + curr.price, 0);
const averagePrice = sumOfPrices / priceList.length || 0;
return averagePrice;
}
export const getEmailNotifType = (
scrapedProduct: Product,
currentProduct: Product
) => {
const lowestPrice = getLowestPrice(currentProduct.priceHistory);
if (scrapedProduct.currentPrice < lowestPrice) {
return Notification.LOWEST_PRICE as keyof typeof Notification;
}
if (!scrapedProduct.isOutOfStock && currentProduct.isOutOfStock) {
return Notification.CHANGE_OF_STOCK as keyof typeof Notification;
}
if (scrapedProduct.discountRate >= THRESHOLD_PERCENTAGE) {
return Notification.THRESHOLD_MET as keyof typeof Notification;
}
return null;
};
export const formatNumber = (num: number = 0) => {
return num.toLocaleString(undefined, {
minimumFractionDigits: 0,
maximumFractionDigits: 0,
});
};
@MikeeBuilds
Copy link

Screenshot 2023-12-23 at 3 44 50 PM

import axios from "axios";
import * as cheerio from "cheerio";
import { extractPrice } from "../utils";

export async function scrapeAmazonProduct(url: string) {
    if(!url) return;

    

    const username = String(process.env.BRIGHT_DATA_USERNAME);
    const password = String(process.env.BRIGHT_DATA_PASSWORD);
    const port = 2225;
    const session_id = (1000000 * Math.random()) | 0;

    const options = {
        auth: {
            username: `${username}-session${session_id}`,
            password,
        },
        host: 'brd.superproxy.io',
        port,
        rejectUnauthorized: false,
    }

    try {
        // Fetch product page
        const response = await axios.get(url, options);
        const $ = cheerio.load(response.data);

        // Get product title
        const title = $('#productTitle').text().trim();
        const currentPrice = extractPrice(
            $('.priceToPay span.a-price-whole'),
            $('a.size.base.a-color-price'),
            $('.a-button-selected .a-color-base'),
        );

        console.log({title, currentPrice});
    } catch (error: any) {
        throw new Error(`Failed to scrape product: ${error.message}`);
    }
}

Can anyone help me figure out why price price is displaying these extra numbers? I cannot figure this out

@Harshkr75
Copy link

I've created and hosted my website on vercel but the website does not always work , it works sometimes properly and becomes unresponsive sometimes, It does not show any error in the console . Even the cron job is failing as i've recieved a few mails of cron jobs failure.

@Allan2000-Git
Copy link

Screenshot 2023-12-23 at 3 44 50 PM

import axios from "axios";
import * as cheerio from "cheerio";
import { extractPrice } from "../utils";

export async function scrapeAmazonProduct(url: string) {
    if(!url) return;

    

    const username = String(process.env.BRIGHT_DATA_USERNAME);
    const password = String(process.env.BRIGHT_DATA_PASSWORD);
    const port = 2225;
    const session_id = (1000000 * Math.random()) | 0;

    const options = {
        auth: {
            username: `${username}-session${session_id}`,
            password,
        },
        host: 'brd.superproxy.io',
        port,
        rejectUnauthorized: false,
    }

    try {
        // Fetch product page
        const response = await axios.get(url, options);
        const $ = cheerio.load(response.data);

        // Get product title
        const title = $('#productTitle').text().trim();
        const currentPrice = extractPrice(
            $('.priceToPay span.a-price-whole'),
            $('a.size.base.a-color-price'),
            $('.a-button-selected .a-color-base'),
        );

        console.log({title, currentPrice});
    } catch (error: any) {
        throw new Error(`Failed to scrape product: ${error.message}`);
    }
}

Can anyone help me figure out why price price is displaying these extra numbers? I cannot figure this out

You need to parse it properly. Try the below code
export function extractPrice(...args: any){ for(const element of args){ const price = element.text().trim(); if(price){ return price.replace(/\D/g, ''); } } return ""; }

@harshbajani
Copy link

I am not able to get any mails in my inbox after submitting even after doing everything correctly

"use server";

import { EmailContent, EmailProductInfo, NotificationType } from "@/types";
import nodemailer from "nodemailer";

const Notification = {
WELCOME: "WELCOME",
CHANGE_OF_STOCK: "CHANGE_OF_STOCK",
LOWEST_PRICE: "LOWEST_PRICE",
THRESHOLD_MET: "THRESHOLD_MET",
};

export async function generateEmailBody(
product: EmailProductInfo,
type: NotificationType
) {
const THRESHOLD_PERCENTAGE = 40;
// Shorten the product title
const shortenedTitle =
product.title.length > 20
? ${product.title.substring(0, 20)}...
: product.title;

let subject = "";
let body = "";

switch (type) {
case Notification.WELCOME:
subject = Welcome to Price Tracking for ${shortenedTitle};
body = <div> <h2>Welcome to PriceWise 🚀</h2> <p>You are now tracking ${product.title}.</p> <p>Here's an example of how you'll receive updates:</p> <div style="border: 1px solid #ccc; padding: 10px; background-color: #f8f8f8;"> <h3>${product.title} is back in stock!</h3> <p>We're excited to let you know that ${product.title} is now back in stock.</p> <p>Don't miss out - <a href="${product.url}" target="_blank" rel="noopener noreferrer">buy it now</a>!</p> <img src="https://i.ibb.co/pwFBRMC/Screenshot-2023-09-26-at-1-47-50-AM.png" alt="Product Image" style="max-width: 100%;" /> </div> <p>Stay tuned for more updates on ${product.title} and other products you're tracking.</p> </div>;
break;

case Notification.CHANGE_OF_STOCK:
  subject = `${shortenedTitle} is now back in stock!`;
  body = `
    <div>
      <h4>Hey, ${product.title} is now restocked! Grab yours before they run out again!</h4>
      <p>See the product <a href="${product.url}" target="_blank" rel="noopener noreferrer">here</a>.</p>
    </div>
  `;
  break;

case Notification.LOWEST_PRICE:
  subject = `Lowest Price Alert for ${shortenedTitle}`;
  body = `
    <div>
      <h4>Hey, ${product.title} has reached its lowest price ever!!</h4>
      <p>Grab the product <a href="${product.url}" target="_blank" rel="noopener noreferrer">here</a> now.</p>
    </div>
  `;
  break;

case Notification.THRESHOLD_MET:
  subject = `Discount Alert for ${shortenedTitle}`;
  body = `
    <div>
      <h4>Hey, ${product.title} is now available at a discount more than ${THRESHOLD_PERCENTAGE}%!</h4>
      <p>Grab it right away from <a href="${product.url}" target="_blank" rel="noopener noreferrer">here</a>.</p>
    </div>
  `;
  break;

default:
  throw new Error("Invalid notification type.");

}

return { subject, body };
}

const transporter = nodemailer.createTransport({
pool: true,
service: "hotmail",
port: 2525,
auth: {
user: process.env.EMAIL,
pass: process.env.EMAIL_PASSWORD,
},
maxConnections: 1,
});

export const sendEmail = async (
emailContent: EmailContent,
sendTo: string[]
) => {
const mailOptions = {
from: "harsh.xira@outlook.com",
to: sendTo,
html: emailContent.body,
subject: emailContent.subject,
};

transporter.sendMail(mailOptions, (error: any, info: any) => {
if (error) return console.log(error);

console.log("Email sent: ", info);

});
};

@Abubaker6898
Copy link

where is a zipped folder

@tito-Saptarshi
Copy link

can only receive mails on localhost, but after deploying, it's not working

@DillBal
Copy link

DillBal commented Mar 12, 2024

I am getting error 503 when I try to scrape a product. Anyone else dealing with this?

=> Already connected to MongoDB
  lib\actions\index.ts (50:10) @ scrapeAndStoreProductError: Failed to create/update product: Failed to scrape product: Request failed with status code 503
    at scrapeAndStoreProduct (./lib/actions/index.ts:60:15)
  48 |     revalidatePath(`/products/${newProduct._id}`);
  49 |   } catch (error: any) {
> 50 |     throw new Error(`Failed to create/update product: ${error.message}`);
     |          ^
  51 |   }
  52 | }
  53 |

Did you find a solution I am experiencing the same problem right now. It worked the first time, but now I get a 503 error.

Not yet, I will post one here if I find it, or maybe someone will post one here when they find it.

Alright thanks, he mentions in the video that it may not work during development when you are using a localhost, so I am hoping that is what is going wrong and it works in production.

i got its solution. it might be due to the imgUrls it was returning.

image: imgUrls[0]||productImg,

changing to this line helped solve the error

Brother where do you change that line?

@wilsonlaww
Copy link

After deployment, how can the public open the website? I tried opening the link and it says I am not a member of the team and therefore cant view the page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment