Deyan Ginev dginev
@dginev
dginev / subject_metadata.md
Created May 1, 2019
arXMLiv 08.2018 dataset, subject classification frequencies
| Subject | Document count |
|---|---|
| math | 334932 |
| astro-ph | 223437 |
| cond-mat | 212384 |
| cs | 132338 |
| hep-ph | 130788 |
| hep-th | 116499 |
| physics | 99881 |
| quant-ph | 80888 |
@dginev
dginev / arxiv_metadata_packer.rs
Last active Apr 24, 2019
Extracting arXiv category metadata from an OAI-PMH v2.0 XML harvest
//! Convert arXiv's OAI harvested XML files into a lookup table for classification labels
// Step 0. Prerequisite: download all needed arXiv metadata via OAI, e.g.
//```
// $ pip install git+http://github.com/bloomonkey/oai-harvest.git#egg=oaiharvest
// $ mkdir metadata/arxiv; cd metadata/arxiv
// $ oai-reg add arxiv http://export.arxiv.org/oai2?verb=Identify
// $ oai-harvest arxiv --until 2018-09-09
//```
// endpoint documentation at: https://arxiv.org/help/oa
use jwalk::WalkDir;
@dginev
dginev / corpus_statistics_ref.csv
Created Mar 30, 2019
"Words prior \ref", arXMLiv 08.2018
word frequency
figure 3290488
theorem 3052607
section 2802295
lemma 2408488
table 1544961
proposition 1334759
and 1031640
corollary 476062
appendix 416964
@dginev
dginev / apply_cutoffs.pl
Last active Mar 24, 2019
arXMLiv 08.2018, MathML element report
#!/usr/bin/env perl
# Applies cutoffs to the very noisy 250 MB mathml_statistics.txt
# which was generated by llamapun over arXMLiv 08.2018.
#
# It rewrites to a CSV file, throwing out all known erroneous markup, including:
# - discard all SVG-associated markup (wrongly in MathML)
# - discard all (non-math) HTML-associated markup (wrongly in MathML)
# - discard all XMath-associated markup (wrongly in MathML)
# - reduce noise from uninteresting values (numbers with known units, hex colors, open-ended id schemes, etc.)
#
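The cutoff rules listed in the header comment above can be sketched in Rust roughly as follows. This is only an illustration under stated assumptions, not the gist's Perl implementation: the input line shape (`name@attr[value] frequency`) is borrowed from the DLMF MathML report on this page, and the discard list, the function names `element_name`, `keep_entry` and `apply_cutoffs`, and the threshold of 100 are all hypothetical.

```rust
// A minimal sketch of cutoff filtering; NOT the original Perl script.
// Assumed input: one "key frequency" pair per line, where the key looks
// like "name", "name@attr" or "name@attr[value]".

/// Hypothetical, incomplete sample of element names that are not MathML
/// (SVG, HTML and LaTeXML XMath markup).
const NON_MATHML: &[&str] = &["svg", "path", "g", "div", "span", "XMath", "XMApp", "XMTok"];

/// Returns the element name of a key, i.e. the text before the first '@'.
fn element_name(key: &str) -> &str {
    key.split('@').next().unwrap_or(key)
}

/// Keeps an entry only if its element is not on the discard list and
/// its frequency reaches the (hypothetical) minimum.
fn keep_entry(key: &str, frequency: u64, min_frequency: u64) -> bool {
    !NON_MATHML.contains(&element_name(key)) && frequency >= min_frequency
}

/// Parses report lines and renders the surviving ones as "key,frequency" rows.
/// Lines without a numeric last field are skipped.
fn apply_cutoffs(input: &str, min_frequency: u64) -> Vec<String> {
    input
        .lines()
        .filter_map(|line| {
            // The frequency is the last whitespace-separated field.
            let (key, freq) = line.trim().rsplit_once(char::is_whitespace)?;
            let freq: u64 = freq.parse().ok()?;
            keep_entry(key.trim(), freq, min_frequency)
                .then(|| format!("{},{}", key.trim(), freq))
        })
        .collect()
}

fn main() {
    let sample = "mo 390704\nsvg@width 12\nmi@href 230061\nmfrac 3";
    for row in apply_cutoffs(sample, 100) {
        println!("{row}");
    }
}
```

With the sample input, the `svg@width` line is dropped by the discard list and `mfrac 3` by the frequency threshold. A real run would also need to quote keys that contain commas before writing them as CSV.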
@dginev
dginev / dlmf_mathml_report.csv
Created Mar 23, 2019
DLMF v0.1.20 MathML element report
name@attr[value] frequency
mo 390704
mi 317263
mrow 265247
mi@href 230061
math@display 108952
math@class 108952
math 108952
math@alttext 108952
math@class[ltx_Math] 108944
@dginev
dginev / rustc.log
Created Jan 30, 2019
rtx_package$ time cargo rustc -- -Z time-passes
time: 0.026; rss: 58MB parsing
time: 0.000; rss: 58MB attributes injection
time: 0.000; rss: 58MB garbage collect incremental cache directory
time: 0.000; rss: 58MB recursion limit
time: 0.000; rss: 58MB crate injection
time: 0.000; rss: 58MB plugin loading
time: 0.000; rss: 58MB plugin registration
time: 0.000; rss: 58MB background load prev dep-graph
time: 0.003; rss: 58MB pre ast expansion lint checks
time: 1.662; rss: 237MB expand crate
@dginev
dginev / custom_derive_lib.rs
Last active Jan 24, 2019
Contextual variable capture in Rust, via Custom Derive
static mut CONTEXT_DEPTH: u32 = 0;
#[proc_macro_derive(BoundState)]
pub fn bound_state(_input: TokenStream) -> TokenStream {
    let state_declaration = if unsafe { CONTEXT_DEPTH == 0 } {
        quote!(
            macro_rules! state {
                () => {
                    outer_state!()
                };
@dginev
dginev / annual_dependency_status.csv
Last active Sep 19, 2018
arXiv 08.2018, LaTeX dependencies report
00,-4,amsbsy.sty.ltxml,1
00,-4,amsfonts.sty.ltxml,4
00,-4,amsmath.sty.ltxml,1
00,-4,amsopn.sty.ltxml,1
00,-4,amssymb.sty.ltxml,3
00,-4,amstext.sty.ltxml,1
00,-4,amsthm.sty.ltxml,1
00,-4,array.sty.ltxml,5
00,-4,article.cls.ltxml,13
00,-4,color.sty.ltxml,2
@dginev
dginev / annual_conversion_status.csv
Last active Sep 18, 2018
Status report for LaTeXML 0.8.3 run over arXiv up to 08.2018 (in CorTeX)
Year,No Problems,Warnings,Errors,Fatal,Invalid
1991,23,153,87,41,2
1992,232,1532,1179,286,32
1993,606,3723,1612,480,309
1994,915,5582,1877,569,1145
1995,1189,7321,2069,495,1924
1996,1605,9209,3584,871,592
1997,1965,11333,4650,1180,472
1998,2552,13373,5952,1519,627
1999,2895,15432,6563,1643,860
@dginev
dginev / integral_example.xml
Last active Aug 20, 2018
Integral snippet for eq 7.2.1 of DLMF, August 2018
<apply>
  <apply>
    <csymbol cd="ambiguous">superscript</csymbol>
    <apply>
      <csymbol cd="ambiguous">subscript</csymbol>
      <int />
      <cn type="integer">0</cn>
    </apply>
    <ci>italic-z</ci>
  </apply>