rex-chien / items.py
Last active October 22, 2019 09:32
Crawling iT 邦幫忙 technical articles and replies with Scrapy
import scrapy


class IthomeArticleItem(scrapy.Item):
    _id = scrapy.Field()
    url = scrapy.Field()
    title = scrapy.Field()
    author = scrapy.Field()
    publish_time = scrapy.Field()
    tags = scrapy.Field()
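# Hypothetical usage sketch, not part of the gist: an Item behaves like a dict
# restricted to the declared fields, so a spider callback fills it like this.
# The values below are placeholders for illustration.
article = IthomeArticleItem()
article['url'] = 'https://ithelp.ithome.com.tw/articles/...'
article['title'] = 'some article title'
article['tags'] = ['Python', 'Scrapy']
# Assigning a key that was not declared above raises KeyError, which is the
# point of declaring the Fields explicitly.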
rex-chien / ithome.py
Last active October 12, 2019 13:09
Crawling iT 邦幫忙 technical articles with Scrapy (using the Items class)
import scrapy
from datetime import datetime
import re

import ithome_crawlers.items as items


class IthomeSpider(scrapy.Spider):
    name = 'ithome'
    allowed_domains = ['ithome.com.tw']

    def start_requests(self):
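        # Plausible continuation, not the gist's actual code: request the first
        # ten pages of the tech-article list and yield one Item per article.
        # The URL pattern and CSS selectors are assumptions for illustration.
        for page in range(1, 11):
            yield scrapy.Request(
                f'https://ithelp.ithome.com.tw/articles?tab=tech&page={page}',
                callback=self.parse,
            )

    def parse(self, response):
        for href in response.css('a.qa-list__title-link::attr(href)').getall():
            yield scrapy.Request(href, callback=self.parse_article)

    def parse_article(self, response):
        article = items.IthomeArticleItem()
        article['url'] = response.url
        article['title'] = response.css('h2.qa-header__title::text').get().strip()
        yield article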
rex-chien / ithome.py
Created October 9, 2019 07:57
Crawling iT 邦幫忙 technical articles with Scrapy
import scrapy
from datetime import datetime
import re


class IthomeSpider(scrapy.Spider):
    name = 'ithome'
    allowed_domains = ['ithome.com.tw']

    def start_requests(self):
        for page in range(1, 11):
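            # Plausible continuation, not the gist's actual code: the URL
            # pattern and selectors are assumptions for illustration.
            yield scrapy.Request(
                f'https://ithelp.ithome.com.tw/articles?tab=tech&page={page}',
                callback=self.parse,
            )

    def parse(self, response):
        # This earlier version yields plain dicts rather than Item objects.
        for article in response.css('div.qa-list'):
            yield {
                'url': article.css('a.qa-list__title-link::attr(href)').get(),
                'title': article.css('a.qa-list__title-link::text').get(default='').strip(),
            }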
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import re
from pymongo import MongoClient
host = 'localhost'
dbname = '<your_database>'
client = MongoClient(host, 27017)
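# Plausible continuation, not the gist's actual code: select the database and
# collection, then store one document per crawled article. The collection name
# and field names are placeholders for illustration.
db = client[dbname]
collection = db['article']

def insert_article(doc):
    # doc is a dict of parsed article fields, e.g. {'url': ..., 'title': ...}
    collection.insert_one(doc)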
rex-chien / ithome_crawler_postgres.py
Last active February 7, 2021 08:03
iT 邦幫忙 technical article crawler, inserting into postgres
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import re
import psycopg2
host = "localhost"
user = "postgres"
dbname = "<your_database>"
password = "<server_admin_password>"
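# Plausible continuation, not the gist's actual code: connect with the settings
# above and insert one row per article. Table and column names are placeholders.
conn = psycopg2.connect(host=host, user=user, dbname=dbname, password=password)

def insert_article(title, url, publish_time):
    with conn.cursor() as cursor:
        cursor.execute(
            'INSERT INTO ithome_article (title, url, publish_time) VALUES (%s, %s, %s);',
            (title, url, publish_time),
        )
    conn.commit()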
rex-chien / ithome_crawler_2.py
Last active October 1, 2019 07:08
iT 邦幫忙 technical article crawler, inserting into postgres
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import re
import psycopg2
host = "localhost"
user = "postgres"
dbname = "ithome2019"
password = "<server_admin_password>"
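# Plausible sketch of the fetching/parsing side implied by the requests and
# BeautifulSoup imports above; not the gist's actual code. The URL pattern and
# CSS class are assumptions for illustration.
def fetch_article_list(page):
    response = requests.get(f'https://ithelp.ithome.com.tw/articles?tab=tech&page={page}')
    soup = BeautifulSoup(response.text, 'html.parser')
    for link in soup.find_all('a', class_='qa-list__title-link'):
        yield link.get_text(strip=True), link['href']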
rex-chien / ithome_crawler.py
Last active January 14, 2020 07:02
iT 邦幫忙 technical article crawler, inserting into postgres
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import re
import psycopg2
host = 'localhost'
user = 'postgres'
dbname = 'ithome2019'
password = '<server_admin_password>'
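# Plausible sketch of how the datetime and re imports above are typically used;
# not the gist's actual code. Normalises the publish-time text scraped from an
# article page into a datetime object.
def parse_publish_time(raw_text):
    # e.g. raw_text == '\n  2019-09-16 10:35:39\n'
    match = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', raw_text)
    return datetime.strptime(match.group(0), '%Y-%m-%d %H:%M:%S') if match else None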