@abathur
Created July 13, 2021 17:32
05:32 <abathur> this may end up being a dumb question, but does anyone with a decent grip on C have a guess at how much effort it'd take to compile any given coreutil into bash as a builtin?
05:33 <samueldr> I guess it's more about coreutil's design, no?
05:34 <samueldr> if a given function is self-contained, then it should be pretty trivial AFAIK
05:34 <samueldr> if it uses a bunch of helpers from within coreutils, less so
05:34 <abathur> that's my rough guess, especially since at least some coreutils grew out of shell builtins
05:36 <samueldr> (probably misusing some terminology) abathur: probably as complex as how many symbols it uses that are not from its own compilation unit
05:38 <abathur> (I've been vaguely wondering if there's much time on the table at, say, hydra scale, by generating a build-bash with some of the most-common commands built in)
05:42 <abathur> ls, cat, touch, tee, cp, mv, mkdir, ln, install, sort, uniq, stat, chmod, chown?
05:42 <samueldr> abathur: isn't that called busybox?
05:42 <samueldr> ;)
05:42 <abathur> it do
05:42 <samueldr> I kid
05:43 <samueldr> it sure is an interesting idea to fold in commonly used coreutils into bash itself
05:43 <samueldr> would it make sense to, at that point, fold all of coreutils into bash rather than pick a few tools?
05:44 <samueldr> bigger bash, but we were going to add a bunch of them anyway
05:44 <abathur> it was something I noticed when I played around with the bash FUSE, which comes with a builtin--I'd never played with the "loadable" builtins
05:44 <energizer> abathur: have you benchmarked any of them?
05:44 <samueldr> and they already know how to live in a single binary, given coreutils is...
05:44 <samueldr> uh... that thing...
05:44 <samueldr> multi-call binary?
05:45 <abathur> yeah, fair question, I'm not sure if it's relatively more/less work to split them or do them all
05:45 <samueldr> abathur: I wonder how hard it'd be to make coreutils.so out of coreutils, and use ctypes.sh
05:45 <abathur> hehe
05:45 <abathur> well
05:46 <abathur> there's already a concept of loadable builtins that are precompiled but added with 'enable' at runtime
05:46 <abathur> but there's a fair amount of overhead when you first enable, like 2-400ms iirc
05:47 <abathur> you can mitigate that by including many builtins in a single file
05:47 <Ke> notably, if you want speed, many things can be done with pure bash
05:47 <Ke> busybox can do many of the busybox commands without exec
05:47 <Ke> yes
05:47 <Ke> multicall
05:47 <Ke> like reading files and finding and transforming output
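A minimal sketch of what Ke seems to mean by doing the reading, finding, and transforming in the shell itself. It stands in for a `cat | grep | cut` pipeline using only `read`, `case`, and parameter expansion, none of which start an external process. The sample data and variable names are made up for illustration.

```shell
# Every step below is a shell builtin or a parameter expansion.
result=''
while IFS= read -r line; do          # read replaces cat
    case $line in
        '#'*|'') continue ;;         # skip comments and blank lines (grep -v)
        *=*)
            key=${line%%=*}          # text before the first '=' (cut -d= -f1)
            value=${line#*=}         # text after the first '='  (cut -d= -f2-)
            result="$result$key -> $value;"
            ;;
    esac
done <<'EOF'
# build settings
outputs=out dev

system=x86_64-linux
EOF
printf '%s\n' "$result"
```

Because the loop reads from a here-document rather than a pipe, it runs in the current shell and `result` is still set afterwards.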
05:47 <Ke> so if you run ftruncate from busybox ash, there is no fork-exec only ftruncate syscall
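One rough way to see the fork-exec cost being discussed is to time the same trivial command as a builtin and as an external binary. This is only a sketch: the loop count, the helper name `run`, and the use of `true` as the test command are arbitrary choices, and the numbers will vary by machine.

```shell
# Compare the builtin `true` with the external `true` binary.
n=200
runs=0
ext_true=$(type -P true) || ext_true=/bin/true   # path to the external binary

# run COUNT CMD...: execute CMD COUNT times in the current shell.
run() {
    local count=$1 i
    shift
    for ((i = 0; i < count; i++)); do
        "$@"
        runs=$((runs + 1))
    done
}

TIMEFORMAT='%R seconds'              # the time keyword reports on stderr
echo "builtin true, $n calls:"
time run "$n" true
echo "external $ext_true, $n calls:"
time run "$n" "$ext_true"
```

The second loop pays for a fork and an exec on every iteration, which is the overhead a builtin avoids.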
05:47 <abathur> but, I assume in a build context that enabling them for every bash invocation would chew through a fair slice of the gain
05:48 <Ke> not sure bash is a thing you want consolidated this way
05:48 <Ke> more like I do it, because I have no shame, but things that consolidate should move to a real programming language most of the time
05:48 <abathur> right, but bash is the Nix build shell, so it's the fungible unit
05:49 <Ke> hmm, you mean nix is considerably slowed down by bash?
05:49 <Ke> because nix itself is dead slow
05:49 <Ke> in my experience
05:49 <abathur> oh, sure
05:50 <Ke> and to note, I use nixos on aarch64
05:51 <samueldr> savings in nix wouldn't affect builds, and vice versa
05:51 <samueldr> saving in builds wouldn't affect nix evals
05:51 <abathur> my question isn't so much about *me* getting speed
05:51 <samueldr> yeah, is there some benefit to emphatten bash with more builtins *at hydra scale*
05:51 <Ke> normally, if I get to the build phase, I don't do things interactively, so it does not matter
05:51 <abathur> more like, at scale, is there enough on the table that it's worth the work
05:51 <Ke> nix evaluation I often watch, so I am not really fair that way
05:52 <energizer> but like, who's paying and do they care
05:52 <abathur> i.e., without having to go rewriting all of the code
05:52 <Ke> performance is not scientific time, but the use cases it allows for
05:52 <Ke> either you wait for things or you need to get coffee
05:52 <samueldr> this is a situation where the individual thinking a thought wouldn't work on the Nix eval side of things
05:52 <samueldr> so it's no good to go "but you should instead work on Nix eval" or similar
05:54 <Ke> maybe I shouldn't discourage others from doing things sure, if that's what you mean
05:54 <Ke> GNU people will probably do that for me though
05:55 <Ke> I made a patch for truncate that truncated a sparse file to trim out the sparse tail, maintainer said the patch should maybe go to fallocate, which is obviously wrong, as the syscall it finally makes is truncate
05:57 <Ke> which is better than the silence you get on most projects; never does anyone just tell you they don't want your patch
05:57 <abathur> energizer: if you want to poke at the existing "example" loadable builtins, a few of which are also in coreutils (though generally skeletal in comparison), you can look in `/nix/store/*-bash-*/lib/bash/`
05:58 <abathur> and then like, `enable <abspath> <builtin-name>` I think to load one
05:59 <Ke> loadable here does not mean .so, does it?
06:01 <abathur> /nix/store/lpiwyrgzffhndmxsx4b50y7as98qf3qv-bash-interactive-5.1-p4/lib/bash/accept: Mach-O 64-bit dynamically linked shared library x86_64
06:02 <abathur> I think that's equivalent? yeah?
06:03 <Ke> wow
06:03 <Ke> is that normal GNU bash?
06:04 <abathur> the ones nix builds all come in the source under examples/loadables iirc
06:04 <abathur> or similar
06:05 <Ke> /nix/store/ysi1wbcv30pcwbr06s66qx3li56vv2fp-bash-interactive-4.4-p23/lib/bash/printenv: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, not stripped
06:05 <Ke> I also learned something today
06:05 <abathur> https://git.savannah.gnu.org/cgit/bash.git/tree/examples/loadables?h=bash-5.1
06:06 <Ke> so can one just add more plugins there so that they work
06:06 <Ke> or are they hardcoded
06:06 <abathur> you can code up your own, don't even technically have to be there
06:06 <abathur> but you do have to pay the reaper to load them
06:07 <samueldr> I think this is done using the same mechanisms: https://github.com/taviso/ctypes.sh
06:07 <samueldr> not the ffi interfaces, but the way it gets loaded
06:07 <abathur> I found out about all of this when I stumbled on to https://github.com/zevweiss/booze
06:07 <abathur> that would make sense
06:08 <Ke> wonder, if bash does fork for the plugin calls
06:09 <Ke> if the coreutils do not clean up after themselves, it might hurt
06:09 <Ke> if there is fork, would not matter
06:09 <Ke> but perf is not that much better either
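On the fork question: as far as I understand bash, a builtin (loadable or not) called directly runs inside the shell's own process, while each element of a pipeline runs in a forked subshell. The sketch below uses the ordinary `read` builtin as a stand-in for any builtin. The variable names are made up.

```shell
# Called directly: the builtin runs in-process and can change shell state.
direct=unset
read -r direct <<'EOF'
set in-process
EOF

# Called in a pipeline: the builtin runs in a subshell, so the assignment
# is lost when that subshell exits.
piped=unset
echo 'set in a subshell' | read -r piped

echo "direct: $direct"
echo "piped:  $piped"
```

With bash's default settings the second variable keeps its old value. `shopt -s lastpipe` changes this for the last pipeline element when job control is off.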
06:11 <abathur> you can see the end of each one for some common data structures that seem to "define" each builtin
06:11 <abathur> e.g. https://github.com/zevweiss/booze/blob/master/booze.c#L684-L711
06:11 <Ke> also much of the utils use is in pipes, where you can't do things like this
06:11 <abathur> and likewise https://git.savannah.gnu.org/cgit/bash.git/tree/examples/loadables/printenv.c?h=bash-5.1#n80
06:12 <abathur> not sure I follow; can't do what in pipes?
06:13 <Ke> have things in same process without implementing scheduling and stuff
06:13 <Ke> that I do not think bash has
06:13 <V> abathur: you'd be surprised at how many basic UNIX commands rely on implicit behaviour like SIGPIPE
06:14 <Ke> like aa=$(builtin1 | builtin2)
06:14 <V> if you cat /some/humungous/file | head, cat automatically knows when to exit b/c it hits SIGPIPE b/c head exits
06:15 <V> there is no explicit communication here
06:15 <V> and scheduling is handled by the kernel, because it knows when the pipe is full. if it's still full, the blocked process (cat) will not be woken
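The implicit SIGPIPE behaviour V describes can be observed from bash through `PIPESTATUS`. This sketch uses `yes` as a stand-in for `cat` on a huge file, since it writes until something stops it.

```shell
# `yes` writes forever; `head` exits after one line. On its next write,
# `yes` gets SIGPIPE from the kernel, which ends the pipeline.
yes | head -n 1 > /dev/null
statuses=("${PIPESTATUS[@]}")        # copy immediately; the next command resets it

echo "yes exited with ${statuses[0]}, head with ${statuses[1]}"
```

The producer's status is usually 141 (128 plus signal number 13). If the calling environment ignores SIGPIPE, `yes` sees a write error instead and exits non-zero on its own.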
06:16 <Ke> I guess one can use poll and nonblock io there also
06:17 <V> sure, but you're reimplementing a preexisting kernel feature
06:17 <V> what's the point
06:17 <Ke> often when I do pipes, I do want the parallelism
06:17 <Ke> one part of the pipe is often lzop eg.
06:17 <Ke> well if you want to keep things in the same process
06:18 <V> write a unikernel ;)
06:19 <abathur> hmm
06:21 <abathur> energizer sorry, I lied, you need `enable -f filename builtin` :)
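A hedged sketch of the corrected `enable -f` form, loading the example `printenv` loadable mentioned earlier in the log. The two search locations are assumptions: the `lib/bash` directory next to the running bash (the Nix layout quoted above) and `/usr/lib/bash` (used by some distros). If neither has the file, nothing is loaded.

```shell
# Try to load the example `printenv` loadable from a guessed location.
loaded=no
kind=''
for dir in "${BASH%/bin/*}/lib/bash" /usr/lib/bash; do
    if [ -f "$dir/printenv" ] && enable -f "$dir/printenv" printenv 2>/dev/null; then
        loaded=yes
        break
    fi
done

if [ "$loaded" = yes ]; then
    kind=$(type -t printenv)         # "builtin" once the loadable is active
    echo "printenv is now a $kind (loaded from $dir)"
else
    echo 'no loadable printenv found; pass its full path to enable -f' >&2
fi
```

`enable -d printenv` removes a builtin loaded this way. Bash 4.4 and later also search the directories in `BASH_LOADABLES_PATH` when the filename given to `-f` has no slash.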
06:22 <Ke> this will look bad on your credit score
06:24 <abathur> :[