05:32 <abathur> this may end up being a dumb question, but does anyone with a decent grip on C have a guess at how much effort it'd take to compile any given coreutil into bash as a builtin? | |
05:33 <samueldr> I guess it's more about coreutil's design, no? | |
05:34 <samueldr> if a given function is self-contained, then it should be pretty trivial AFAIK | |
05:34 <samueldr> if it uses a bunch of helpers from within coreutils, less so | |
05:34 <abathur> that's my rough guess, especially since at least some coreutils grew out of shell builtins | |
05:36 <samueldr> (probably misusing some terminology) abathur: probably as complex as how many symbols it uses that are not from its own compilation unit | |
05:38 <abathur> (I've been vaguely wondering if there's much time on the table at, say, hydra scale, by generating a build-bash with some of the most-common commands built in | |
05:38 <abathur> ) | |
05:42 <abathur> ls, cat, touch, tee, cp, mv, mkdir, ln, install, sort, uniq, stat, chmod, chown? | |
05:42 <samueldr> abathur: isn't that called busybox? | |
05:42 <samueldr> ;) | |
05:42 <abathur> it do | |
05:42 <samueldr> I kid | |
05:43 <samueldr> it sure is an interesting idea to fold in commonly used coreutils into bash itself | |
05:43 <samueldr> would it make sense to, at that point, fold all of coreutils into bash rather than pick a few tools? | |
05:44 <samueldr> bigger bash, but we were going to add a bunch of them anyway | |
05:44 <abathur> it was something I noticed when I played around with the bash FUSE, which comes with a builtin--I'd never played with the "loadable" builtins | |
05:44 <energizer> abathur: have you benchmarked any of them? | |
05:44 <samueldr> and they already know how to live in a single binary, given coreutils is... | |
05:44 <samueldr> uh... that thing... | |
05:44 <samueldr> multi-call binary? | |
05:45 <abathur> yeah, fair question, I'm not sure if it's relatively more/less work to split them or do them all | |
05:45 <samueldr> abathur: I wonder how hard it'd be to make coreutils.so out of coreutils, and use ctypes.sh | |
05:45 <abathur> hehe | |
05:45 <abathur> well | |
05:46 <abathur> there's already a concept of loadable builtins that are precompiled but added with 'enable' at runtime | |
05:46 <abathur> but there's a fair amount of overhead when you first enable, like 2-400ms iirc | |
05:47 <abathur> you can mitigate that by including many builtins in a single file | |
05:47 <Ke> notably, if you want speed, many things can be done with pure bash | |
05:47 <Ke> busybox can do many of the busybox commands without exec | |
05:47 <Ke> yes | |
05:47 <Ke> multicall | |
05:47 <Ke> like reading files and finding and transforming output | |
05:47 <Ke> so if you run ftruncate from busybox ash, there is no fork-exec only ftruncate syscall | |
05:47 <abathur> but, I assume in a build context that enabling them for every bash invocation would chew through a fair slice of the gain | |
05:48 <Ke> not sure bash is a thing you want consolidated this way | |
05:48 <Ke> more like I do it, because I have no shame, but things that consolidate should move to a real programming language most of the time | |
05:48 <abathur> right, but bash is the Nix build shell, so it's the fungible unit | |
05:49 <Ke> hmm, you mean nix is considerably slowed down by bash? | |
05:49 <Ke> because nix itself is dead slow | |
05:49 <Ke> in my experience | |
05:49 <abathur> oh, sure | |
05:50 <Ke> and to note, I use nixos on aarch64 | |
05:51 <samueldr> savings in nix wouldn't affect builds, and vice versa | |
05:51 <samueldr> saving in builds wouldn't affect nix evals | |
05:51 <abathur> my question isn't so much about *me* getting speed | |
05:51 <samueldr> yeah, is there some benefit to emphatten bash with more builtins *at hydra scale* | |
05:51 <Ke> normally, if I get to the build phase, I don't do things interactively, so it does not matter | |
05:51 <abathur> more like, at scale, is there enough on the table that it's worth the work | |
05:51 <Ke> nix evaluation I often watch, so I am not really fair that way | |
05:52 <energizer> but like, who's paying and do they care | |
05:52 <abathur> i.e., without having to go rewriting all of the code | |
05:52 <Ke> performance is not scientific time, but the use cases it allows for | |
05:52 <Ke> either you wait for things or you need to get coffee | |
05:52 <samueldr> this is a situation where the individual thinking a thought wouldn't work on the Nix eval side of things | |
05:52 <samueldr> so it's no good to go "but you should instead work on Nix eval" or similar | |
05:54 <Ke> maybe I shouldn't discourage others from doing things sure, if that's what you mean | |
05:54 <Ke> GNU people will probably do that for me though | |
05:55 <Ke> I made a patch for truncate that truncated a sparse file to trim out the sparse tail, maintainer said the patch should maybe go to fallocate, which is obviously wrong, as the syscall it finally makes is truncate | |
05:57 <Ke> which is better than the silence you get on most projects, never does anyone just tell you they don't want your path | |
05:57 <Ke> patch | |
05:57 <abathur> energizer: if you want to poke at the existing "example" loadable builtins, a few of which are also in coreutils (though generally skeletal in comparison), you can look in `/nix/store/*-bash-*/lib/bash/` | |
05:58 <abathur> and then like, `enable <abspath> <builtin-name>` I think to load one | |
05:59 <Ke> loadable here does not mean .so, does it? | |
06:01 <abathur> /nix/store/lpiwyrgzffhndmxsx4b50y7as98qf3qv-bash-interactive-5.1-p4/lib/bash/accept: Mach-O 64-bit dynamically linked shared library x86_64 | |
06:02 <abathur> I think that's equivalent? yeah? | |
06:03 <Ke> wow | |
06:03 <Ke> is that normal GNU bash? | |
06:04 <abathur> the ones nix builds all come in the source under examples/loadables iirc | |
06:04 <abathur> or similar | |
06:05 <Ke> /nix/store/ysi1wbcv30pcwbr06s66qx3li56vv2fp-bash-interactive-4.4-p23/lib/bash/printenv: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, not stripped | |
06:05 <Ke> I also learned something today | |
06:05 <abathur> https://git.savannah.gnu.org/cgit/bash.git/tree/examples/loadables?h=bash-5.1 | |
06:06 <Ke> so can one just add more plugins there so that they work | |
06:06 <Ke> or are they hardcoded | |
06:06 <abathur> you can code up your own, don't even technically have to be there | |
06:06 <abathur> but you do have to pay the reaper to load them | |
06:07 <samueldr> I think this is done using the same mechanisms: https://github.com/taviso/ctypes.sh | |
06:07 <samueldr> not the ffi interfaces, but the way it gets loaded | |
06:07 <abathur> I found out about all of this when I stumbled on to https://github.com/zevweiss/booze | |
06:07 <abathur> that would make sense | |
06:08 <Ke> wonder, if bash does fork for the plugin calls | |
06:09 <Ke> if the coreutils do not clean up after themselves, it might hurt | |
06:09 <Ke> if there is fork, would not matter | |
06:09 <Ke> but perf is not that much better either | |
06:11 <abathur> you can see the end of each one for some common data structures that seem to "define" each builtin | |
06:11 <abathur> e.g. https://github.com/zevweiss/booze/blob/master/booze.c#L684-L711 | |
06:11 <Ke> also much of the utils use is in pipes, where you can't do things like this | |
06:11 <abathur> and likewise https://git.savannah.gnu.org/cgit/bash.git/tree/examples/loadables/printenv.c?h=bash-5.1#n80 | |
06:12 <abathur> not sure I follow; can't do what in pipes? | |
06:13 <Ke> have things in same process without implementing scheduling and stuff | |
06:13 <Ke> that I do not think bash has | |
06:13 <V> abathur: you'd be surprised at how many basic UNIX commands rely on implicit behaviour like SIGPIPE | |
06:14 <Ke> like aa=$(builtin1 | builtin2) | |
06:14 <V> if you cat /some/humungous/file | head, cat automatically knows when to exit b/c it hits SIGPIPE b/c head exits | |
06:15 <V> there is no explicit communication here | |
06:15 <V> and scheduling is handled by the kernel, because it knows when the pipe is full. if it's still full, the blocked process (cat) will not be woken | |
06:16 <Ke> I guess one can use poll and nonblock io there also | |
06:17 <V> sure, but you're reimplementing a preexisting kernel feature | |
06:17 <V> what's the point | |
06:17 <Ke> often when I do pipes, I do want the parallelism | |
06:17 <Ke> one part of the pipe is often lzop eg. | |
06:17 <Ke> well if you want to keep things in the same process | |
06:18 <V> write a unikernel ;) | |
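The SIGPIPE behaviour V describes can be observed from bash itself. The following is a minimal sketch, assuming `yes` and `head` are on the PATH and that SIGPIPE still has its default disposition in the calling environment. The `statuses` variable is only an illustrative name.

```shell
#!/usr/bin/env bash
# `yes` writes forever. `head` reads one line and exits, which closes the
# read end of the pipe. The next write by `yes` fails, and with the default
# SIGPIPE disposition the kernel terminates `yes`. Neither program talks to
# the other explicitly.
yes | head -n 1 >/dev/null

# PIPESTATUS holds the exit status of every command in the last pipeline.
# Copy it immediately, because the next command overwrites it.
statuses=("${PIPESTATUS[@]}")

# A process killed by a signal is reported as 128 + signal number, so the
# writer commonly shows 141 (128 + 13) here. If SIGPIPE is ignored in the
# environment, the writer sees a write error instead and exits non-zero.
echo "writer status: ${statuses[0]}, reader status: ${statuses[1]}"
```

A builtin running inside the shell process has no separate process for the kernel to stop this way, which is the gap Ke and V are discussing.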
06:19 <abathur> hmm | |
06:21 <abathur> energizer sorry, I lied, you need `enable -f filename builtin` :) | |
06:22 <Ke> this will look bad on your credit score | |
06:24 <abathur> :[ |
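The corrected `enable -f filename builtin` form from 06:21 can be wrapped as below. This is a sketch only: `load_builtin` is a hypothetical helper, not something bash ships, and the path in the commented example is a placeholder that differs per system or store path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load one builtin from a shared object, failing early
# with a clear message if the file is not readable.
load_builtin() {
  local file=$1 name=$2
  if [[ ! -r "$file" ]]; then
    echo "load_builtin: cannot read $file" >&2
    return 1
  fi
  # `enable -f FILE NAME` loads the builtin NAME from the shared object FILE.
  enable -f "$file" "$name"
}

# Example call (placeholder path, adjust to where your bash keeps loadables):
# load_builtin /path/to/lib/bash/printenv printenv && type -t printenv
```

Prefixing the call with bash's `time` keyword is one way to check the first-load overhead mentioned at 05:46 on a given machine.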