Miles Crawford milescrawford
Python 2.7.14 (default, Sep 23 2017, 22:06:14)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> foo = "hi"
>>> bar = "bye"
>>> (foo, bar,)
('hi', 'bye')
>>> (foo,)
('hi',)
>>> (foo)
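The session above is cut off after `(foo)`. As a minimal sketch of the distinction it appears to be probing, the following shows that the comma, not the parentheses, is what makes a tuple. The variable names `pair`, `single`, and `not_a_tuple` are illustrative and do not come from the original session.

```python
foo = "hi"
bar = "bye"

# A trailing comma is allowed in a tuple display and changes nothing here.
pair = (foo, bar,)

# With one element, the trailing comma is what makes this a tuple.
single = (foo,)

# Without a comma, the parentheses only group the expression.
not_a_tuple = (foo)

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # str
```

This is relevant wherever a one-element tuple is expected, for example as the argument list of a string-formatting call or a query parameter sequence.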
2018-04-03 23:30:14,574 ERROR [Executor task launch worker for task 571523] org.apache.spark.executor.Executor: Exception in task 254.0 in stage 107.0 (TID 571523)
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
    at java.lang.StringBuffer.append(StringBuffer.java:367)
    at java.io.BufferedReader.readLine(BufferedReader.java:370)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at org.apache.commons.io.IOUtils.readLines(IOUtils.java:1033)
    at org.apache.commons.io.IOUtils.readLines(IOUtils.java:987)
milesc@torre 0 tmp-docker ↠ cat Dockerfile
FROM library/ubuntu:16.04
ENTRYPOINT ["head", "-n", "1"]
milesc@torre 0 tmp-docker ↠ cat bigfile | docker run -i example && echo "sucesss"
paperid paper_title publisher doi field pdf_processed viewable users_28days users_7days frac_users_28days frac_users_7days
read unix @->/var/run/docker.sock: read: connection reset by peer
milesc@torre 0 tmp-docker ↠ head bigfile | docker run -i example && echo "sucesss"
paperid paper_title publisher doi field pdf_processed viewable users_28days users_7days frac_users_28days frac_users_7days
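One plausible reading of the transcript above is that the container exits as soon as `head -n 1` has printed the first line, while the client is still streaming the rest of `bigfile` into it, so the writer sees its connection closed. The sketch below reproduces the same shape of failure without Docker on a POSIX system. The child command in `CHILD` is a hypothetical stand-in for the `head -n 1` entrypoint, and `feed_first_line_reader` is an illustrative name, not something from the original gist.

```python
import subprocess
import sys

# Hypothetical stand-in for the container's `head -n 1` entrypoint:
# read one line from stdin, echo it, then exit.
CHILD = "import sys; sys.stdout.write(sys.stdin.readline())"


def feed_first_line_reader(line_count):
    """Write line_count lines to a child that only reads the first one.

    Returns (first_line, lines_written, pipe_broke).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        bufsize=0,  # unbuffered, so each write goes straight to the pipe
    )
    lines_written = 0
    pipe_broke = False
    try:
        for i in range(line_count):
            proc.stdin.write(b"row %d\n" % i)
            lines_written += 1
    except BrokenPipeError:
        # The reader exited while we still had data to send.
        pipe_broke = True
    finally:
        proc.stdin.close()
    first_line = proc.stdout.read()
    proc.wait()
    return first_line, lines_written, pipe_broke


if __name__ == "__main__":
    first, written, broke = feed_first_line_reader(1000000)
    print(first, written, broke)
```

With enough input to overflow the pipe buffer, the writer is interrupted partway through even though the reader already produced the output it wanted. Whether that counts as a failure is up to the writer, which may be why the `&& echo` in the first command never fires.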
Creative Commons Attribution License Http://creativecommons.org/licenses/by/3.0
Community-associated Methicillin-resistant Staphylococcus Aureus CA-MRSA
Alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic Acid AMPA Receptor
Community-associated Methicillin-resistant Staphylococcus Aureus
Endobronchial Ultrasound-guided Transbronchial Needle Aspiration
Matrix-assisted Laser Desorption/ionization Mass Spectrometry
Chromatin Immunoprecipitation Sequencing ChIP-seq Experiments
Reverse Transcription Loop-mediated Isothermal Amplification
National Polar-orbiting Operational Environmental Satellite
Creative Commons Attribution-NonCommercial-NoDerivs License
Miriam Blatt Dennis Chen Scott Cooke Piyush Desai Manjunath Doreswamy Mark Elgood Gary Feierbach Tim Goldsbury Dale Greenley Raju Joshi Mike Khosraviani Robert Kwong Manish Motwani Chitresh Narasimhaiah Sam J. Nicolino Jr. Tooru Ozeki Gary Peterson Chris Salzmann Nas James Gateley
A. Adalal J. Bauman P. Delisle P. Dedood P. Donehue M. Dell'OcaKhouja T. Doan M. Doreswamy P. Ferolito O. Geva D. Greenhill S. Gopaladhine J. Irwin L. Lev J. MacDonald M. Ma S. Mitra P. Patel A. Prabhu R. Puranik S. Rozanski N. Ross P. Saggurti S. Simovich R. Sunder A. Cao
Elena Biasibetti Alberto Valazza Maria T Capucchio Laura Annovazzi Luigi Battaglia Daniela Chirio Marina Gallarate Marta Mellai Elisabetta Muntoni Elena Peira Chiara Riganti Davide Schiffer Pierpaolo Panciani And Michele Lanotte
Galit H Frydman Robert P Marini Vasudevan Bakthavatchalu Kathleen E Biddle Sureshkumar Muthupalani Charles R Vanderburg Barry Lai Pavan K Bendapudi Ronald G Tompkins And James G Fox
En Representacion Del Grupo Colaborativo Para El Estudio
1. what's the news launch loves warm
Recording technology improves and makes television easier to edit, satellite technology continues to get better
2. what's the news 1 plus 1
See http://www.nytimes.com/2010/12/03/science/03arsenic.html?pagewanted=1&_r=3 and http://science.nasa.gov/science-news/science-at-nasa/2010/02dec_monolake/ for further information on this controversial finding.
3. how big is earth
5th largest planet in the solar system
4. what is the find mean orbit the sky
planets; sun
5. where does photosynthesis take place
In a plant's leaves
2017-03-23 20:25:00 [scrapy.extensions.logstats] INFO: Crawled 2631 pages (at 556 pages/min), scraped 96 items (at 25 items/min)
2017-03-23 20:25:06 [anansi.dao.frontier] INFO: ['infocenter.arm.com', 'landmark.cs.cornell.edu', 'dblp.uni-trier.de', 'aclanthology.info', 'events.cornell.edu']
2017-03-23 20:25:06 [anansi.dao.frontier] INFO: Dequeuing batch of frontier URIs; frontier size 912644; select using sample {TABLESAMPLE BERNOULLI(0.10957174977318648)}, dominant {AND uri NOT LIKE '%infocenter.arm.com%' AND uri NOT LIKE '%landmark.cs.cornell.edu%' AND uri NOT LIKE '%dblp.uni-trier.de%' AND uri NOT LIKE '%aclanthology.info%' AND uri NOT LIKE '%events.cornell.edu%'}
2017-03-23 20:25:07 [anansi.dao.frontier] INFO: Populated cache with 737 frontier URIs
2017-03-23 20:26:00 [scrapy.extensions.logstats] INFO: Crawled 3055 pages (at 424 pages/min), scraped 113 items (at 17 items/min)
2017-03-23 20:26:28 [anansi.dao.frontier] INFO: ['infocenter.arm.com', 'aclanthology.info', 'events.cornell.edu', 'dblp.uni-trier.de
Time: 361.734 ms
s2crawler=> explain analyze SELECT frontier_uri_id FROM frontier_uri TABLESAMPLE BERNOULLI(0.01) WHERE ( started IS NULL OR (completed IS NULL AND started < now() - interval '3 hours'));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Sample Scan on frontier_uri (cost=0.00..235476.19 rows=148 width=4) (actual time=3.139..349.858 rows=93 loops=1)
Sampling: bernoulli ('0.01'::real)
Filter: ((started IS NULL) OR ((completed IS NULL) AND (started < (now() - '03:00:00'::interval))))
Rows Removed by Filter: 1105
Planning time: 0.056 ms
Execution time: 350.009 ms
2017-03-21 22:04:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-21 22:04:59 [anansi.dao.frontier] INFO: Dequeuing batch of frontier URIs; frontier size 91134; select using {TABLESAMPLE BERNOULLI(1.0972853161278997)}
2017-03-21 22:04:59 [anansi.dao.frontier] INFO: Populated cache with 1005 frontier URIs
2017-03-21 22:04:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-21 22:05:07 [anansi.dao.frontier] INFO: Dequeuing batch of frontier URIs; frontier size 89157; select using {TABLESAMPLE BERNOULLI(1.1216169229561335)}
2017-03-21 22:05:08 [anansi.dao.frontier] INFO: Populated cache with 943 frontier URIs
2017-03-21 22:05:42 [anansi.dao.frontier] INFO: Dequeuing batch of frontier URIs; frontier size 81513; select using {TABLESAMPLE BERNOULLI(1.2267981794315017)}
2017-03-21 22:05:42 [anansi.dao.frontier] INFO: Populated cache with 985 frontier URIs
2017-03-21 22:06:06 [scrapy.extensions
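The logged percentages are consistent with sizing the Bernoulli sample so that it returns roughly 1000 rows regardless of frontier size: for example, 100 * 1000 / 91134 is about 1.0973. The sketch below shows that calculation. The target of 1000 rows is inferred from the log lines, and the function names `bernoulli_percent` and `sample_clause` are hypothetical, not taken from the crawler's code.

```python
def bernoulli_percent(frontier_size, target_rows=1000):
    """Percentage to pass to TABLESAMPLE BERNOULLI for ~target_rows rows.

    target_rows=1000 is an assumption inferred from the logged values.
    """
    if frontier_size <= 0:
        raise ValueError("frontier_size must be positive")
    # BERNOULLI takes a percentage in [0, 100], so cap small frontiers at 100.
    return min(100.0, 100.0 * target_rows / frontier_size)


def sample_clause(frontier_size, target_rows=1000):
    """Render the sampling clause in the form seen in the log output."""
    percent = bernoulli_percent(frontier_size, target_rows)
    return "TABLESAMPLE BERNOULLI({})".format(percent)


if __name__ == "__main__":
    for size in (91134, 89157, 81513):
        print(size, sample_clause(size))
```

Because Bernoulli sampling includes each row independently, the number of rows returned varies around the target, which fits the cache sizes of 1005, 943, and 985 reported above.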
milesc@torre 0 tmp-datasets ↠ grep -h 8677da2812971b454c26c29994cabd8dbf72aebf * | jq -S .
{
  "facets": [
    {
      "facetType": "dataset",
      "values": [
        "Penn Treebank",
        "QuestionBank"
      ]
    }