@WayneCui
Last active December 19, 2015 18:18
REBOL []
start-url: http://exam.com
done-file: %done.txt
storage: %store.txt
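; Load already-processed URLs; fall back to an empty block if the file does not exist yet.
done-urls: any [attempt [read/lines done-file] copy []]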
urls: copy []
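; Collect links from the page at `url` and append new .htm/.html URLs to `urls`.
get-a: func [url [url!] /local rule data s] [
    ; For each <a ... href="...">, skip the opening quote and copy up to the closing quote.
    rule: [
        while [
            thru "<a" thru "href=" skip copy s to [{"} | {'}]
            (
                if all [
                    find/match s "http"              ; absolute links only
                    find [%.htm %.html] suffix? s    ; HTML pages only
                    not find urls to-url s           ; not queued yet (urls holds url! values)
                    not find done-urls s             ; not processed in an earlier run
                ] [append urls to-url s]
            )
        ]
    ]
    try/except [
        data: to-string read url
        parse data rule
        wait 1    ; pause between requests
    ] func [value] [?? value print ["error from get-a:" url] 0]
]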
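; Fetch the page at `url`, append its first 100 characters to `storage`,
; and record the URL in `done-file`.
extract-data: func [url [url!] /local data] [
    print url
    try/except [
        data: to-string read url
        write/lines/append storage copy/part data 100
        print "good!!!"
        ; Write the URL followed by a newline, with no separating space.
        write/append done-file join form url newline
    ] func [value] [?? value print ["error from extract-data:" url] 0]
]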
; Seed the queue from start-url, then crawl and extract every queued URL.
main: func [] [
    get-a start-url
    foreach url urls [
        get-a to-url url
        extract-data to-url url
    ]
]

forever [main]