@occidere
Created September 26, 2018 15:16
Test code that scrapes only the images from the first page of OH MY GIRL's Twitter. Streaming crawl of the entire page is not yet supported.
package org.occidere.test;

import org.apache.commons.lang3.StringUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.util.List;
import java.util.stream.Collectors;

/** Prints the image URLs found on the first page of the OH MY GIRL Twitter media timeline. */
public class OhMyGirlTwitterImageCrawler {
    public static void main(String[] args) throws Exception {
        String url = "https://twitter.com/WM_OHMYGIRL/media";
        // Fetch only the initially served HTML; content loaded later by scrolling is not included.
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0").get();

        List<String> imgUrlList = doc.getElementsByClass("AdaptiveMedia-container")
                .stream()
                // Elements.attr("src") reads only the first match, so flatten to visit every <img>.
                .flatMap(container -> container.select("img").stream())
                .map(imgTag -> imgTag.attr("src"))
                .filter(StringUtils::isNotBlank)
                .collect(Collectors.toList());

        imgUrlList.forEach(System.out::println);
    }
}
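
The pipeline above drops blank `src` values before printing. If the same image appears in more than one container, a further cleanup pass could also trim and deduplicate the collected URLs. The following is a minimal sketch of that idea using only the standard library. `ImageUrlCleaner` and `clean` are hypothetical names that are not part of the original gist, and the `example.com` URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (an assumption for illustration, not part of the original gist).
final class ImageUrlCleaner {

    private ImageUrlCleaner() {
    }

    /** Trims each value, skips null or blank entries, and removes duplicates in first-seen order. */
    static List<String> clean(List<String> rawUrls) {
        // LinkedHashSet rejects duplicates while remembering insertion order.
        Set<String> seen = new LinkedHashSet<>();
        for (String raw : rawUrls) {
            if (raw == null) {
                continue;
            }
            String trimmed = raw.trim();
            if (!trimmed.isEmpty()) {
                seen.add(trimmed);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Placeholder input standing in for scraped src attributes.
        List<String> raw = Arrays.asList(
                "https://example.com/a.jpg",
                " ",
                "https://example.com/a.jpg",
                "https://example.com/b.jpg");
        clean(raw).forEach(System.out::println);
    }
}
```

In the crawler, the call would be `ImageUrlCleaner.clean(imgUrlList)` just before printing; whether duplicates actually occur on the page is not stated in the source, so treat this step as optional.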