Cary FitzHugh caryfitzhugh
Edit /etc/tomcat7/tomcat7.conf:

JAVA_OPTS="-XX:SoftRefLRUPolicyMSPerMB=36000 -Xms8192m -Xmx8192m -XX:+UseConcMarkSweepGC -Djava.awt.headless=true "

Install Marlin Rasterizer:

Download. Put into

wget https://github.com/bourgesl/marlin-renderer/releases/download/v0_9_1/marlin-0.9.1-Unsafe.jar
(ns trebuchet-idp.jose
  (:require [clojure.reflect :as r]
            [clojure.data.json :as json])
  (:import com.nimbusds.jose.jwk.RSAKey
           org.bouncycastle.openssl.jcajce.JcaPEMWriter
           java.io.StringWriter))
(comment
(def jwk {:use "sig", :kty "RSA", :kid "public", :n "uLhh0hTuBEsKJ-Ujqr8tQnNRQPzeNlue9SEWt89xkzjTzENqZxj90JaWH1m6cTMYhByaGOAjylI9f47gfT92StCNyLsRt-eqOUig5Bqe93V9bLF9F-1BjgiLShZYMwM-dZ940pg9PqStosGmkAl2wA7hHq4zNltwbysyDgX_fuSn_UWLOmm5rigIl9xhcBpL5lVtiE2JTR46lPMENoFC5J8mcmSedDg92NdCujLky1SY0vszGVH7hphd_y3T-wX_2RfVPOeBFBKWw_rQ7SGTs3GD_akMO_QkAwm7DXBB7cuLfRcdvYhBkuzuiHzGS1TgRHFVBevQ-qE24ZobJvvMx7pfrbSqkBacT-THFACP0JgvGyxAYI_lMu-wRUAeBUsOT5c-XRaH3PoYhScKURBLEbA_wM7P2px88qGFZJEDAIVkRwe7Yob-F_i0cSKfeSQIPgQgA3ppXVCJdi4EBL075R5JILNAbemkdtKAbW75kn4ruh2P0ylUVfanF2ZjHwUkQ4_SGgKk5d7_KZu4cbmVAS7lazbLMJtYM6ml7gcFLlYJl-mgSa5qB9uq_lyGpfAsN8j6RylTWYGr2x3hCnhgCqfNYNFmgkPlJIv2Uu17zasbQVvnyQfIIUyMUeZKDqhJ0KfIvXZn-acIedEbgfhAig05V-OyA1aC7ft21qeDUb0", :e "AQAB"})
(jwk->pem-str jwk))
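
The definition of jwk->pem-str is not shown in the preview above. A minimal sketch of one way such a conversion could work, assuming an RSA public JWK with base64url-encoded :n and :e. It deliberately uses only JDK classes instead of the Nimbus RSAKey and BouncyCastle JcaPEMWriter imported above, and the names jwk->pem-str-sketch and b64url->bigint are hypothetical:

```clojure
(import '(java.math BigInteger)
        '(java.security KeyFactory)
        '(java.security.spec RSAPublicKeySpec)
        '(java.util Base64))

(defn b64url->bigint
  "Decodes an unpadded base64url string into a positive BigInteger."
  [^String s]
  (BigInteger. (int 1) (.decode (Base64/getUrlDecoder) s)))

(defn jwk->pem-str-sketch
  "Hypothetical stand-in for jwk->pem-str: builds an RSA public key from
  the :n (modulus) and :e (exponent) of a JWK map and returns it as an
  X.509 SubjectPublicKeyInfo PEM string."
  [{:keys [n e]}]
  (let [spec (RSAPublicKeySpec. (b64url->bigint n) (b64url->bigint e))
        pub  (.generatePublic (KeyFactory/getInstance "RSA") spec)
        ;; PEM bodies are base64 wrapped at 64 characters per line.
        enc  (Base64/getMimeEncoder 64 (.getBytes "\n" "US-ASCII"))
        body (.encodeToString enc (.getEncoded pub))]
    (str "-----BEGIN PUBLIC KEY-----\n"
         body
         "\n-----END PUBLIC KEY-----\n")))
```

Called as (jwk->pem-str-sketch jwk) with the map from the comment block, this should yield a PEM string that tools expecting a "PUBLIC KEY" block can read. The :use and :kid fields are ignored in this sketch.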
{
  ;; :data holds the subscriptions (as in reflux / re-frame)
  ;; Data flows from subscription to subscription
  ;; Some data is sourced by an API path (and is requested automatically if needed)
  ;; Other data is derived from other subscriptions (:sources, etc)
  :data {
    :all-tenants {
      :api-path "/v1/tenants" ;; This would be a source / root
      :key :tenants/all-tenants ;; Other subscriptions can use this key; the data is stored under it
    }
swagger: "2.0"
info:
  title: Test external refs
  version: 0.0.0
paths:
  /craziness/{complex-id}/:
    get:
      parameters:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <title>Yahoo Articles arstechnica</title>
    <description>The Articles from CondeNast for Yahoo Syndication, for brand: arstechnica</description>
    <link>https://s3.amazonaws.com/staging.publish.sindicati.cnds.io/yahoo-publications/arstechnica-articles.xml</link>
    <item>
      <media:content url="http://cdn.arstechnica.net/wp-content/uploads/2016/06/aip-classification-640x159.png" type="image/gif"/>
      <title>Azure Information Protection makes warding off data leaks easier</title>
      <description>Based on tech bought last year, new system builds on Azure Rights Management.</description>
(require '[clojure.core.async :as async])
(def file ["a" "b" {:url "http://ip.jsontest.com/"} "d" "e" {:url "http://time.jsontest.com"}])
(println "Loaded file..." (pr-str file))
(defmacro while-let
  "Repeatedly executes body while test expression is true, evaluating the body with binding-form bound to the value of test."
  [[form test] & body]
  `(loop [~form ~test]
     (when ~form
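
The preview above stops partway through the macro body. A minimal sketch of how a macro of this shape is commonly completed and used, assuming the body is spliced in and the loop then recurs on a fresh evaluation of the test. The names while-let-sketch, queue, pop-item! and drain-queue! are hypothetical, and a plain atom stands in for the core.async channel so the example has no dependencies:

```clojure
;; Assumed completion: run the body, then re-evaluate the test and loop.
(defmacro while-let-sketch
  [[form test] & body]
  `(loop [~form ~test]
     (when ~form
       ~@body
       (recur ~test))))

;; A stand-in work queue; the real snippet reads from a channel instead.
(def queue (atom ["a" "b" "c"]))

(defn pop-item!
  "Removes and returns the first queued item, or nil when the queue is empty."
  []
  (let [[old _] (swap-vals! queue rest)]
    (first old)))

(defn drain-queue!
  "Collects items until pop-item! returns nil, which ends the loop."
  []
  (let [seen (atom [])]
    (while-let-sketch [item (pop-item!)]
      (swap! seen conj item))
    @seen))
```

Because the loop stops on the first falsey value, a nil or false item in the queue would end it early. That matches how a closed core.async channel signals completion by returning nil.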
<html>
<body>
<select>
<option>1</option>
<option>2</option>
<option>3</option>
<option>4</option>
<option>5</option>
</select>
(def imgs [{:url "http://media.vanityfair.com/photos/55f700a1fad0d98d444d2531/master/w_690/Pandora_VF_690x460_Girl3_2.jpg"}
           {:url "http://media.vanityfair.com/photos/55f7002c200c34353591cd5d/master/w_690/Pandora_VF_690x460_Girl3_1.jpg"}
           {:url "http://media.vanityfair.com/photos/55f7002c200c34353591cd5a/master/w_690/Pandora_VF_690x460_Girl3_2.jpg"}])
(defn remove-dups
  [images]
  (:images (reduce (fn [results image]
                     (let [filename (last (clojure.string/split (:url image) #"/"))]
                       ;; If we've already seen this filename,
                       (if (contains? (:filenames results) filename)
                         ;; just return the results unchanged.
                         results
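
The preview of remove-dups ends before the else branch and the initial accumulator. A minimal self-contained sketch of the same idea, assuming the accumulator is a map with a :filenames set and an :images vector (only the :images and :filenames keys appear in the original). The names remove-dups-sketch and url-filename are hypothetical:

```clojure
(require '[clojure.string :as str])

(defn url-filename
  "Returns the last path segment of a URL string."
  [url]
  (last (str/split url #"/")))

(defn remove-dups-sketch
  "Keeps the first image for each distinct filename, preserving order."
  [images]
  (:images
   (reduce (fn [results image]
             (let [filename (url-filename (:url image))]
               (if (contains? (:filenames results) filename)
                 ;; Already seen: leave the accumulator unchanged.
                 results
                 ;; First sighting: remember the filename and keep the image.
                 (-> results
                     (update :filenames conj filename)
                     (update :images conj image)))))
           ;; Assumed initial accumulator shape.
           {:filenames #{} :images []}
           images)))
```

Applied to the imgs vector above, this sketch keeps the first two entries and drops the third, since its filename repeats that of the first even though the photo ID in the path differs.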
(defn sitemap-article-urls
  [sitemap-url syndicate]
  (try
    (let [
          ;; Look back N days (however long our publication lifespan is on the syndicate)
          scrape-date (tf/unparse (:date-time tf/formatters) (t/minus (t/now) (t/days (:plife syndicate))))
          log (logger/info "Scraping: " scrape-date (pr-str sitemap-url))
          ;; Scrape all those sitemaps and collect the set of unique URLs
          article-urls (set (first (sitemap/urls sitemap-url scrape-date)))
          log (logger/info "Article URLs: " sitemap-url " -> " (pr-str article-urls))
Spugnating url http://www.allure.com/beauty-trends/blogs/daily-beauty-reporter/2015/10/mom-does-amazing-braids-on-daughters.html
Spugnating doc:
{:description "Many parents would probably consider successfully packing a lunch while half-asleep an impressive accomplishment. And honestly, it is. But should you be interested in stepping up your kid-hairstyling game, we found some killer inspiration. Beth Belshaw, a hairstylist in...",
:original-url "http://www.allure.com/beauty-trends/blogs/daily-beauty-reporter/2015/10/mom-does-amazing-braids-on-daughters.html",
:uid "www.allure.com/beauty-trends/blogs/daily-beauty-reporter/2015/10/mom-does-amazing-braids-on-daughters",
:_scraper_version "1.1.22",
:section "beauty-trends",
:modified "2015-10-13T14:30:00.000+00:00",
:keywords ("braids" "hair" "parenting" "technology"),
:brand "Allure",