Chubak Bidpaa Chubek

@Chubek
Chubek / README.md
Created March 29, 2024 14:29
This is, hands-down, the best way to convert PDFs to EPUB (or any other format)

This document describes several shell pipelines for converting PDF files to any format.

I'm not sure if it's true for everyone, but my e-reader sucks at displaying PDF, which is, in all reality, a giant executable file (we'll discuss this soon). Also, there are dozens of other reasons one may wish to convert a PDF to a better 'text format'. Let's say you wanna put it up on your website, feed it to a mathematical optimization model, feed it to a script, etc.

Before you read this document: yes, I know there is a utility, nay, dozens, that convert PDFs directly to text (like pdftotext). I ALSO know that there are millions, if not BILLIONS, of crappy web services that serve you malware on a platter alongside converting the files. So let's not talk about them! It's about "owning" your software, read this!

What are PDF Files?

This is not meant to be a description or history of PDF files, you can consult Sahih Al-Bukhari f

@Chubek
Chubek / ECMAScript.ebnf
Last active April 2, 2024 00:12
EBNF Grammar for JavaScript (aka ECMAScript)
# Syntactic Grammar for ECMAScript
ecma-script-module ::= { top-level | ignorable }
top-level ::= statement
| function-declaration
| class-declaration
function-declaration ::= [ "async" ] "function" identifier function-params-postfix compound-statement
@Chubek
Chubek / README.md
Last active March 10, 2024 13:04
EBNF Grammar for AWK

Note: if you wish to understand these notations, please read this: https://gist.github.com/Chubek/52884d1fa766fa16ae8d8f226ba105ad

So, again, why did I write the EBNF grammar for AWK?

Basically, I have two ongoing projects where AWK is involved. The first is Squawk, an implementation of AWK, and the second is AWK2c, which obviously translates AWK to C.

Plus, I am thinking of making a GitHub page called 'The Internet Grammar Database' where I would post EBNF, Yacc, PEG, and Lex definitions of languages. However, I don't have much experience in web development, so if you can help me, let me know (chubakbidpaa [at] riseup [dot] net).

So anyways, awk.ebnf contains the EBNF grammar for AWK. Some considerations:

@Chubek
Chubek / README.md
Last active March 9, 2024 16:04
EBNF Grammar for C

c.ebnf contains the grammar for C99. Note that this is ANSI C, not ISO C, so there are some omissions. The reason I wrote this is that I am currently writing a C compiler, with my own backend (and hopefully, frontend) in OCaml, and I needed to collect the grammar in one place.

How to Read EBNF Grammars?

Reading EBNF grammars is pretty simple:

  • Enclosed within two ?s is a global capture; it does not mean optional. It means 'I am writing a free-style sentence'.
  • Enclosed within { and } means: repeat zero or more times
  • Enclosed within [ and ] means: this is optional
  • Enclosed within ( and ) means: this is a group
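
To make the `{ }` and `[ ]` notations concrete, here is a minimal sketch of how they tend to map onto hand-written recognizer code. The rule `call`, the helper `word?`, and the pre-split token arrays are all hypothetical, made up for illustration; none of them come from c.ebnf.

```ruby
# Hypothetical rule (not taken from c.ebnf):
#   call ::= identifier "(" [ argument { "," argument } ] ")"
# Tokens arrive as an array of strings to keep the sketch short.

# True when the token looks like an identifier.
def word?(tok)
  tok.is_a?(String) && tok.match?(/\A[A-Za-z_]\w*\z/)
end

def parse_call(tokens)
  pos = 0
  return false unless word?(tokens[pos])      # identifier
  pos += 1
  return false unless tokens[pos] == "("
  pos += 1

  # [ ... ] is optional, so it becomes an `if` that may be skipped.
  if word?(tokens[pos])
    pos += 1
    # { ... } repeats zero or more times, so it becomes a `while` loop.
    while tokens[pos] == ","
      pos += 1
      return false unless word?(tokens[pos])
      pos += 1
    end
  end

  # The closing parenthesis must be the last token.
  tokens[pos] == ")" && pos == tokens.length - 1
end
```

Under these assumptions, `parse_call(%w[f ( a , b )])` is accepted, while a dangling comma as in `%w[f ( a , )]` is rejected because the loop body demands another argument.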
@Chubek
Chubek / ebnf.vim
Last active March 9, 2024 08:49
Syntax Highlighters for Neovim/Vim
" ebnf.vim - Syntax highlighting for EBNF
" Author: Chubak Bidpaa (chubakbidpaa@riseup.net)
if exists("b:current_syntax")
  finish
endif
syntax region ebnfComment start=/\v#\s+/ end=/\v$/
syntax region ebnfMultiCharTerminal start=/\v"/ end=/\v"/
syntax region ebnfSingleCharTerminal start=/\v'/ end=/\v'/
@Chubek
Chubek / allocpp.pl
Last active March 2, 2024 16:06
New AllocPP
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
my %opts;
getopts('i:o:', \%opts);
@Chubek
Chubek / README.md
Last active March 1, 2024 22:25
Simple Preprocessor in Perl for any File!

preprocess.pl is a 'general-purpose preprocessor' -- a very simple one at that. All it can do is to include files, and define simple phrases.

It currently accepts input via STDIN, and outputs the preprocessed file via STDOUT. There are two directives:

#include <path_to_file.ext>

#define Id Def
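
As a rough illustration of what these two directives involve, here is a minimal sketch in Ruby. It is not preprocess.pl: the function name `preprocess` and the semantics (definitions apply only to later lines, substitution matches whole words, include paths are used as given) are assumptions made for this example.

```ruby
# Sketch only: the directive semantics below are assumptions,
# not the behavior of preprocess.pl.
def preprocess(text, defs = {})
  out = []
  text.each_line do |line|
    case line
    when /\A#include\s+<([^>]+)>\s*\z/
      # Splice the included file in, sharing the same definitions table.
      out << preprocess(File.read($1), defs)
    when /\A#define\s+(\w+)\s+(.*?)\s*\z/
      # Remember the definition; the directive line itself is dropped.
      defs[$1] = $2
    else
      # Replace whole-word occurrences of every Id defined so far.
      defs.each do |id, body|
        line = line.gsub(/\b#{Regexp.escape(id)}\b/) { body }
      end
      out << line
    end
  end
  out.join
end
```

To mirror the STDIN-to-STDOUT flow described above, one could call `print preprocess($stdin.read)` at the end of the script.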

@Chubek
Chubek / README.md
Last active February 28, 2024 22:39
Writing Grammars: Mixing Regular Definitions with EBNF

The reason you could be writing a grammar is:

1- You wish people to learn the ins-and-outs of your language;

2- You wish to feed this grammar into an LLM, a lower-spec generator like BNFC, etc;

3- You wish to 'mindstorm' features of your language, or a superset that you are defining for an existing language.

I have devised a method of specifying grammars which is not very original, but the upside is that the lexical and the syntactic grammars are kept separate, so an LLM like ChatGPT can easily create a separate lexical analyzer spec, or even the lexical analyzer itself, apart from the parser (which is of course defined by the syntactic grammar).

@Chubek
Chubek / j2sexp.rb
Last active February 28, 2024 21:48
JSON to S-Expressions
#!/usr/bin/env ruby
# The following script translates a JSON file into S-Expressions. I wrote this after I learned about
# attempts to use Scheme as the lingua franca of Web in the early 90s. Someone told me S-Expressions
# are basically XML and well, he's right.
# You can either pass a JSON file as the first argument, or pipe it via STDIN. The translated S-Expression
# data will be written to STDOUT. A regular session would look like:
# cat my_file.json | ruby j2sexp.rb > my_file.sexp
# What is the use of this program? Well, basically, consider this: It's much easier to parse S-Expressions
# than it is to parse JSON. There, I don't need any further explanations!
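
The preview above cuts off before the translation code, so here is a minimal sketch of one way such a translation could work. The function name `to_sexp` and the mapping are assumptions for illustration, not necessarily what j2sexp.rb does: objects become lists of `(key value)` pairs, arrays become plain lists, `null` becomes `nil`, and booleans become `#t` / `#f`.

```ruby
require "json"

# Assumed mapping (hypothetical, not taken from j2sexp.rb):
#   object -> list of (key value) pairs
#   array  -> plain list
#   null   -> nil, true -> #t, false -> #f
def to_sexp(value)
  case value
  when Hash
    "(" + value.map { |k, v| "(#{k.to_s.to_json} #{to_sexp(v)})" }.join(" ") + ")"
  when Array
    "(" + value.map { |v| to_sexp(v) }.join(" ") + ")"
  when String
    value.to_json          # reuse JSON string escaping for the quoted form
  when nil
    "nil"
  when true
    "#t"
  when false
    "#f"
  else
    value.to_s             # integers and floats
  end
end
```

With this sketch, a session like the one described in the comments would end with something like `puts to_sexp(JSON.parse(ARGF.read))`, since `ARGF` reads from a file argument when one is given and from STDIN otherwise.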
@Chubek
Chubek / witty.rb
Created February 25, 2024 09:41
Witty.rb -> Parse .git/index
#!/usr/bin/env ruby
# === Witty.rb ===
# A very simple Ruby Script
# Author: Chubak Bidpaa (github.com/Chubek)
#
# ** What does this do? **
# This script demonstrates how to parse a Git index file (.git/index)
# using nothing but the language's IO facilities. This is perhaps best
# done in a systems language, or a strongly-typed language where there