
@shelby3
Last active May 24, 2016 18:57

jl777 wrote:

why not have the way you allocate the memory determine how it behaves?

In a garbage-collected (GC) language such as JavaScript, you never explicitly allocate and deallocate memory. Thus we avoid the bug of crashing when a pointer points to memory that has already been freed, and the memory leaks due to forgetting to deallocate. GC languages can still have semantic memory leaks though, such as forgetting to remove an event handler. Automatic reference-counting pointers have the flaw that they leak memory when there is a circular reference (a pointer points to some structure which contains a pointer that points back to the structure containing the first pointer).
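Rust's `Rc` (a reference-counting pointer) makes this flaw easy to demonstrate. A minimal sketch (the `Node` type here is purely illustrative):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A node that can point to another node, allowing a cycle.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    // Close the cycle: a -> b -> a. Both refcounts are now 2.
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    // When a and b go out of scope, each count drops to 1 but never 0:
    // the nodes are unreachable yet never freed. A tracing GC would collect them.
    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(Rc::strong_count(&b), 2);
}
```

Rust's standard workaround is to break the cycle with a `Weak` pointer, which is exactly the kind of manual bookkeeping a tracing GC spares you.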

So GC is much easier for the programmer than manually tracking memory allocation and deallocation. But it has one disadvantage: it can require up to 5X the memory to achieve the same performance, and the mark-and-sweep pauses can lock up the machine for many seconds. There are ways to improve the GC. The main challenge is reducing the number of temporary objects that hang around too long and/or the GC cost of tracking and deallocating them. This can be improved by using a generational GC and employing some strategies similar to those used in Google's V8 JavaScript engine to group small objects:

http://jayconrod.com/posts/55/a-tour-of-v8-garbage-collection

http://v8project.blogspot.com/2015/08/getting-garbage-collection-for-free.html

I am going to improve upon V8 by automatically tracking temporary objects, employing the compiler to deallocate them without ever relying on the GC. Thus much of the garbage will never hit the generational GC. The Rust language is doing this, but instead of doing it only for temporary objects, it is doing it everywhere, which becomes a P.I.T.A. (pain in the ass), because in general imperative logic can't be shoehorned into a DAG structure of references (aka pointers) borrowing. I made this point recently in the Rust forum, and as you can imagine I instantly became a "persona non grata" there and was asked to move the discussion to the less trafficked "internals" forum. I see many flaws in the design of Rust, but I also see some good ideas there, and it appears to be a big improvement on C++.

Temporary objects are those that get allocated for a very short period of time, such as allocating a hash key to pass as the input to a function which does a lookup in a hash table. This key might be freed quickly after use. Often we can free these automatically by observing when they go out of scope of a stack frame, but we have to track whether we've shared the pointer with another data structure that is holding it longer term. This is why we have the compiler track the borrowing of pointers (aka references) automatically. Borrowing means that when a function returns, it gives back ownership of the pointer to the caller and relinquishes its usage of it. Thus the compiler is tracking that the said function didn't have the side effect of storing the borrowed pointer in another, more long-lived data structure on the heap.
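A minimal Rust sketch of this hash-key scenario (the names are illustrative): the compiler verifies that `lookup` only borrows the key and cannot stash the reference anywhere longer-lived, so the caller safely frees the temporary at the end of its scope:

```rust
use std::collections::HashMap;

// The key is only borrowed: `lookup` gives access back when it returns,
// and the compiler checks it has no side effect of storing the reference
// in a longer-lived structure.
fn lookup(table: &HashMap<String, u32>, key: &str) -> Option<u32> {
    table.get(key).copied()
}

fn main() {
    let mut table = HashMap::new();
    table.insert("alice".to_string(), 1u32);

    let key = String::from("alice"); // short-lived temporary allocation
    let id = lookup(&table, &key);   // borrowed, not moved
    assert_eq!(id, Some(1));
    // `key` is still owned here and is freed when it goes out of scope,
    // without any GC involvement.
}
```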

dealing with memory is most of what programming is

Well not entirely but agreed a very significant portion of what programming is, as explained above.

delegating that to an automated process is fine if there are no critical performance issues.

Agreed, as I also mentioned above, but the goal is to make something which is as efficient as manual memory management yet automatic, guaranteed never to crash with a use-after-free error, and able to eliminate more memory leaks.

If you can make your language support C as the low level basis, then I would be able to use it.

In low-level mode, it should end up looking and feeling very similar to C, but there will be some differences. I do think it will be very easy for you to grasp and use. And this would enable your code to run everywhere, since the compiler will transpile (output) to JavaScript.

The thing about languages is that the language is not the only thing that matters; the libraries that are available are the bottleneck, especially in this day of mega-packages

a language without libraries is not very useful. most of the cutting edge packages use C, so supporting C automatically leverages existing libs

Absolutely critical to have great and abundant libraries. Realize that JavaScript is gaining many libraries now because of Node.js. The library repository is NPM. So there is a lot of momentum in this direction, yet the world really needs a better compile-time-checked high-level language. The choices out there now pretty much suck for one reason or another:

  • C++ has a 1500-page manual; it is fugly, verbose complexity and a corner-case hell of duplicity. Not elegant at all. Too many ways to do the same thing.
  • C is too low-level to capture high-level semantics and thus for high-level code it is akin to building the walls of a house by stacking grains of sand instead of bricks. And the actual high-level semantics are obscured (hidden) by all the unnecessary detail, i.e. giving the programmer too much low-level control.
  • Java is too verbose and limited. No unsigned data types! WTF! Everything has to be a fucking class or method! OOP and subclassing are anti-patterns, don't ever use this method of programming!
  • Scala is a horror of mirrors of combining too many different programming paradigms achieving a clusterfuck of complexity and being a master of none.
  • Haskell is too obtuse for non-mathematicians. And the lazy evaluation, memoization, and referential transparency everywhere shoehorn all imperative programming into an obtuse, fugly monadic model, while burdening the programmer with time and space indeterminacy. Have fun debugging that shit when you have no way to know when a snippet of code will execute! The main claim to Haskell's fame is functional composability under laziness, but I have devised a way to achieve this with an eager-evaluation (normal style of) programming language.
  • Rust appears to be a major improvement, with typeclasses instead of that OOP anti-pattern, and it does both high-level and low-level similar to C++ but hopefully with less complexity. But some of their design decisions appear to me to be poor 80/20 priorities, as I alluded to earlier. Also no compilation to JavaScript, which I must have both for the cross-platform JIT execution and JS's ubiquity. The era of having users download a different binary for each platform is dead and gone.

I would be surprised if there is any high level construct that cant be implemented in C via macros plus functions

Although C can be compiled to JavaScript with Emscripten to obtain the ubiquity and JIT, the Emscripten output is fugly (I guess because it passes through LLVM bytecode first), thus the resultant JS is virtually incomprehensible to debug. I don't know of any debuggers yet which can handle that. I debugged my C with a normal C development environment and just prayed there were no nuanced differences in the Emscripten output. The Chrome/Google WebAssembly route is not ubiquitous.

One example is that C can't stop you from accidentally reading from uninitialized memory or writing and reading past the end of an allocated memory block, thus opening your code to innumerable security holes. You'll never know if you've caught all these bugs, which become zero-day exploits, because all your code is written with grains-of-sand detail. Rather, low-level code should only be used for the 5 - 20% of the code that needs to be finely tuned by an expert. High-level programming languages result in much faster programs most of the time in real-world scenarios (not benchmark contests). High-level programs are less verbose, provide more consistency of structure and performance, and enable less opaque expression of semantics (i.e. the program's logic and purpose). C obscures the actual semantics in loads of unnecessary low-level detail, such as manually managing pointers, and in its inability to express and abstract over high-level data structures such as collections and functional composition.
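As a sketch of the out-of-bounds point in a higher-level language: where a C read past the end of a buffer is silent undefined behavior, a checked accessor makes the mistake impossible to miss:

```rust
fn main() {
    let buf = vec![10u8, 20, 30]; // three bytes allocated

    // In C, buf[3] would silently read whatever lies past the allocation
    // (undefined behavior, and the raw material of zero-day exploits).
    // The checked accessor returns None instead of leaking adjacent memory.
    assert_eq!(buf.get(2), Some(&30));
    assert_eq!(buf.get(3), None);

    // Even the unchecked form `buf[3]` would panic deterministically at the
    // exact faulting line, rather than reading attacker-controlled garbage.
}
```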

The list of high-level abstractions that can't be accomplished with C would fill an encyclopedia. For someone who has never mastered a high-level language to pontificate about how C can do everything a high-level language can do seems Dunning-Kruger-esque. I like C a lot for low-level control, and I wrote part of a 100,000+ line program (actually perhaps a million lines of assembly with some C) in the 1980s (my Neocept WordUp), but I couldn't fathom writing all my code now in C given all the advances in PL design.

Here are some references:

http://programmers.stackexchange.com/questions/116863/can-you-learn-functional-programming-in-c

https://news.ycombinator.com/item?id=1034549

http://stackoverflow.com/questions/216037/what-tools-are-there-for-functional-programming-in-c

https://www.google.com/search?q=functional+programming+in+C

name a higher level construct that cannot be achieved with C

Please re-read what I wrote. Yes you can use a screwdriver as a forklift. You really can. But most people realize you shouldn't.

if you cannot name a high level construct that cannot be done via C, then I will assume I am correct

I mean that C can be made to do everything that I can do in a high-level language, just as there is probably a way to use a screwdriver to accomplish the same job as a forklift; but there is a reason we don't use screwdrivers to do the jobs of forklifts, and commensurately why we don't use C to do the job of a higher-level language.

Macros and functions are not enough. Read the references I cited for you. They explain why. You don't seem to understand the differences in type systems. C's type system can't model certain abstractions. Macros in C are not typed; they are outside the scope of the type system and have no knowledge of types (well, there are some recent extensions that give them very minor type information, but still nowhere near complete).

It doesn't matter how good of a C programmer you are, because there are abstractions that can't be expressed in C's type system. They can't be checked by the compiler.

I love C when I need its low-level control while forsaking high-level abstraction. I hate C when its type system can't model (and type check) the abstraction I need to express. I use C where it fits. I don't shoehorn C where it doesn't fit. I agree that all the high-level languages have design warts. But that is what I want to fix. And I shouldn't have to resort to FFI to call into C-like low-level code.

in general it takes me less lines of C code to get the same functionality that the C++ bitcoind has

Comparing the way one group writes C++ to C is not that relevant. Instead compare bitcoind to the Scala implementation by https://iohk.io/, which I've read is several times fewer LOC with more high-level, structured expression.

not sure what kinds of type systems you havent been able to model

Of course you don't. That is what you need to learn.

unions can be used to combine N lower level types into a combined higher level type

Unions are not first-class in C.

and I also have a FSM interpreter that is runtime creatable/verifiable

You are talking algorithms. Algorithms can be done in any language. Let's not confuse that with type checking.

High-level type checking is, for example, abstracting over collections of collections of function types whose inputs are collections of collections. Shit like that.
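A hedged sketch of what such a signature looks like when it is checkable at compile time (the type aliases are made up for illustration); C's type system cannot express or verify this shape, so a union-and-function-pointer encoding would only fail at runtime:

```rust
// A grid (collection of collections) of integers, and a grid of functions,
// each of which consumes a whole grid. The compiler verifies every element
// of `ops` has exactly this signature.
type Grid = Vec<Vec<i32>>;
type OpGrid = Vec<Vec<Box<dyn Fn(&Grid) -> i32>>>;

fn main() {
    let data: Grid = vec![vec![1, 2], vec![3, 4]];

    let ops: OpGrid = vec![vec![
        Box::new(|g: &Grid| g.iter().flatten().sum()),              // total
        Box::new(|g: &Grid| g.iter().flatten().copied().max().unwrap_or(0)), // max
    ]];

    assert_eq!(ops[0][0](&data), 10);
    assert_eq!(ops[0][1](&data), 4);
}
```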

C can do that

It can't type check it.

at runtime it can. I do

Runtime is not compile time checking.

compile time checking for general case, that seems like a very hard problem to solve and not needed to be solved

When you learn more about the use cases, you will learn that your assumption is incorrect. I don't have the patience to explain those use cases here in this chat, but if you don't have to deal with all the complexity of C++, maybe it will be more pleasant for you to see what those use cases are. I would have achieved my goal if so.

GC is possible

GC as a library is not the same at all. Nowhere near the same performance and integration is possible.

for memory usage that needs to be verified, I use a special malloc that tracks all the usage/freeing, etc.

of course, I also have a special allocate/free approach for the performance critical parts

i am guaranteed to never leave any loose pointers

But you can't free circular references with that ARC (automatic reference counting), thus you leak memory in the general case. Also ARC is pathologically slow in the multi-threaded shared (even immutable) data case.

I do a single init time allocation. then during runtime, I allocate space from the memory. after I am done, then I just clear the data. So by not doing runtime alloc/free there is no chance of messing it up

there could be 1 or thousands of "allocs", but at the end, just by clearing the amount used, all usage is freed
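The init-time pool pattern jl777 describes can be sketched as a bump arena; this is a hypothetical minimal version (not his actual code), with one up-front allocation, cheap "allocs" that advance an offset, and a reset that frees everything at once:

```rust
// A minimal bump arena: one allocation at init, then sub-allocations
// carved out of it by advancing `used`.
struct Arena {
    buf: Vec<u8>,
    used: usize,
}

impl Arena {
    fn new(cap: usize) -> Self {
        Arena { buf: vec![0; cap], used: 0 }
    }

    // Hand out `n` bytes from the pool; None if the init-time budget is exhausted.
    fn alloc(&mut self, n: usize) -> Option<&mut [u8]> {
        if self.used + n > self.buf.len() {
            return None;
        }
        let start = self.used;
        self.used += n;
        Some(&mut self.buf[start..start + n])
    }

    // "Clearing the amount used" frees every sub-allocation at once.
    fn reset(&mut self) {
        self.used = 0;
    }
}

fn main() {
    let mut arena = Arena::new(4096);
    arena.alloc(100).unwrap();
    arena.alloc(200).unwrap();
    assert_eq!(arena.used, 300);
    arena.reset(); // all "thousands of allocs" gone in one step
    assert_eq!(arena.used, 0);
}
```

Note the sketch also exposes the limitation discussed below: the 4096-byte budget is fixed at init, and any sub-allocation that is never logically released still counts against it until the whole arena is reset.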

If you can determine the limit of the memory you need at init time, then your program is not Turing-complete. That may be the case with a Bitcoin block chain, since it avoids Turing-complete scripting. But in general that is not true for programming.

Also your init time solution is irrelevant, for if you leak memory then you will eventually exhaust your init time allocation.

My point is unarguable. If you leak (fail to free) allocations, you eventually exhaust the memory pool. There is no way to argue against that. You may claim you don't have leaks.

You may have some simplistic scenario that can't create circular references. That is not applicable to general programming.

How do you ensure you never read/write past the end of your allocated array?

you mean bugs?

Yep.

unix seems to be very strict about going past the allocated memory

At 4096 byte page boundaries perhaps.

maybe it is possible to go way past the end of a buffer and it wont page fault, but usually even if going past by a bit, it page faults

Not fault, but rather reading the wrong data injected by the hacker.

The hacker is injecting some data into memory that you read by accident, because he observes that one of your arrays sometimes precedes a certain string that you store in memory. For example, you receive some input string from the Internet and store it on the heap; then, with a certain random probability, that string ends up being read due to a bug in your code that reads past the end of an array on the heap. In this way, the hacker has injected data where you did not expect it. This is one way zero-day exploits are made.

You see, a high-level language can make sure you don't have security holes and zero-day exploits related to low-level control over pointers. The C compiler can't check that for you.

Now you just learned something new. And there are many other things you need to learn about the differences between high-level coding and low-level.

any language that doesnt allow usage of pointers would seem to be quite a bit slower. and if something is so slow you cant run it, what good is it that it is theoretically more secure.

It is not the case that all code needs to be faster. Actually what we most often need is for coding to be faster. ;-)

so the issue is what language can create things with fewer bugs. for me, that is C.

I doubt anyone would agree that one can create fewer bugs on average with a low-level language. I use low-level coding when the performance required demands it.

There seem to be millions of coders for high level stuff, I will continue as low level coder to balance things out a bit

Well for sure being able to write tight C code and also highly optimized assembly code (SIMD, etc) is very much needed for those portions of a program which are performance critical. The demand for that skill will not diminish. One advantage I hope to achieve is one could do low-level and high-level in the same language, so that it is easier for C-like programmers to collaborate with high-level programmers.

My point is not to eliminate C-like coding. Rather it is to avoid the crap you're going through right now with WebAssembly.

We need a well integrated ubiquitous solution. Not all this piece-wise stuff.

C++ is a beast. A horror.

All that OOP crap in C++ and other crap that we don't really need to make a good high-level language. C++ templates are a fucking nightmare.

yes. i think if it can be linkable with C libs and supports a way for extern "C", and has threads

I think there will maybe be a way to transcode C to my language. It will look nearly identical. I am not sure about linking to C libraries. That seems to defeat the entire point of trying to make a ubiquitous (run every where, JIT) integration. We could transcode C to JS with Emscripten, but I think better to transcode C to my language because it removes lower layer dependencies.

I will leave the fancy higher level stuff to others

But the high-level stuff can be made comprehensible and sane. Typeclasses (aka traits in Rust) go a long way to making it simple:

https://doc.rust-lang.org/book/traits.html

That is not the same as OOP. Because a trait can be implemented on any struct at any time in the future.

ok, convenient, but nothing that cant be done via C union and a single function that calls function pointers.

No, that is not a substitute, because that solution does not make the two (traits and structs) orthogonal w.r.t. DNRY (do not repeat yourself). Rather, what we want is to be able to compose three distinct things as a separation of concerns:

  1. Data types, e.g. struct.
  2. Interfaces, e.g. trait.
  3. Polymorphic function bounds, e.g. trait bounds.

That is an example of something C can't type check.
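A hedged sketch of those three orthogonal pieces in Rust (the names `Satoshi`, `ToUsd`, and `report` are invented for illustration). The key property is that the trait can be implemented for any struct at any later time, including structs defined elsewhere, and the bound on `report` is checked at compile time:

```rust
// 1. Data type: knows nothing about any interface.
struct Satoshi(u64);

// 2. Interface: declared independently of any data type.
trait ToUsd {
    fn to_usd(&self, price: f64) -> f64;
}

// The glue: an impl can be added for any struct, at any time in the future,
// without modifying the struct or the trait (unlike OOP subclassing).
impl ToUsd for Satoshi {
    fn to_usd(&self, price: f64) -> f64 {
        self.0 as f64 / 100_000_000.0 * price
    }
}

// 3. Polymorphic function with a trait bound, verified by the compiler.
fn report<T: ToUsd>(x: &T, price: f64) -> f64 {
    x.to_usd(price)
}

fn main() {
    let balance = Satoshi(150_000_000); // 1.5 BTC
    assert_eq!(report(&balance, 400.0), 600.0);
}
```

A C union plus a function-pointer dispatcher collapses these three concerns into one closed structure: adding a new type means editing the union and the dispatcher, and nothing checks at compile time that every type supplies every operation.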

Seriously though, I hate complexity. I love the simplicity of C. The K&R C book is a masterpiece!

me too. I do my best to create complex behavior out of simple things

Me too ;)

I think when you hear "high-level" you think of a nightmare you had about reading C++ gobbledygook.

if you can make the language deal with multiple nodes, that would be nice

also maybe it can verify that the consensus rules wont have any exploits

and order ice cream on hot days

Of course you wouldn't want to build a general-purpose PL that has A.I. built in. You'd build that as a library.

Build from building blocks. I am just saying there are very relevant use cases that require higher-level building blocks than C's type system offers. Can't accomplish the type checking with a library.

i will limit myself to problems that C can do

C can handle any problem. That is not my point.

We use tools that make us more productive and help us check things that we can't remember to check every time ourself.

https://github.com/tensorflow/models/tree/master/syntaxnet

Again you are talking algorithms. I already made that point earlier. Algorithms are not type systems.

ok. I am a C coder and will keep doing algorithms to solve things, maybe even algorithms that do runtime type systems :)

Runtime type checking is no type checking. Due to the Halting problem (Alan Turing), it is impossible to know if your program has exercised all its code paths, so runtime checks can never cover every case. Only a compiler can make guarantees about correctness.

if hackers or cosmic rays or hardware failure are part of the correctness space, it seems nothing will be perfect. anyway,

That is like saying that if new incurable diseases are part of medicine, then it seems improving medicine will never be perfect anyway.

Seriously i think it was a productive exchange. I think we should share it with the public, or do you prefer it remains private?

I'd like for people to know my justification for expending effort on this task.

please dont post to bitcointalk

it will distract you. just when it is done, you can announce it

How about I put it in a Gist but never mention it is you? So I have a permanent record. And never post to BCT.

as long as it is not bitcointalk (or reddit) sure

getting involved with toxic waste dumps, just gets toxic stuff on you.

Should I mention it is jl777 or just say it is a conversation with an anonymous person?

you can use jl777

yep. I am done with BCT. Waste of our time. Let's go kick their ass with code.

they really said I ran for mayor of a city??

DecentralizedEconomics says that. Try to check eca.sh's last post at BCT and see if he ever replied with proof.

I never went back to check. I don't want to load BCT again. Will just make me angry.

dont even know this DecentralizedEconomics and he sure knows nothing about me

Welcome to BCT. Lol. You might want to refute him. eca.sh said he contacted you and you stated you'd never run for mayor. You might want to back that up, if you care to.

Okay James. Thanks for the discussion. I better go eat then code. See ya.


shelby3 commented May 22, 2016

@keean wrote:

When you have to write everything at the bottom level of the abstraction hierarchy like in C, then saving 50% of your time with a garbage collected language seems like a good saving.
...
However once that is done the remaining program becomes trivial and I no longer have to think about memory, and the time saving from the better abstraction makes the GC or no GC issue look insignificant.

Seems what would be ideal is no GC where you've abstracted the memory lifetimes so you can get maximum performance, but retaining GC everywhere else so there is no chance of "access after freed" faults and no chance of failing to release memory due to circular references. This is the goal I am trying to facilitate.

My experience with C++ tells me it was the STL that really worked; having Vector saved having to reimplement so much code

Note that where I wrote C++ templates suck, I didn't mean that template libraries aren't incredibly useful. I just mean the template capability of C++ is too powerful and too complex. The template type system is even Turing-complete. The Principle of Least Power applies. K.I.S.S.

I don't really need Rust's borrow checking, but Rust has raw pointers if you need them, so you can treat it as 'C' with type classes to some degree, if you derive Copy for everything and use raw pointer types. However references are useful because they can't be null.

But IMO (subject to change) Rust's design error is that borrowing should not be the default, because then there is additional noise to turn it off, and the user of the library is under the false impression that borrowing has been enforced globally. I'd prefer to use borrowing only locally, where it doesn't need to be turned off.

I can probably live with the move semantics and if you treat references as argument passing conventions it's not too bad, it does add complexity due to the restrictions and the special status of l-values, and this feels like something you cannot abstract away, so I am not sure I like it.

We have to remember that Rust is moving or borrowing potentially two distinct attributes: mutability and allocation lifetime.

In my proposal, moving lifetime is always a delegation to GC and thus is no longer distinguished from borrowing. So if you need to store the reference in any of your function hierarchy under the current function, then you must take a GC reference. I just don't see the 80/20 priority of tracking moves for lifetimes.

I hadn't considered moving exclusive access to mutability. If that turns out to be a needed case, then I will need to augment my proposal with it.

Overall I think Rust has the edge on C/C++ most of the time, having a lot of the simplicity of C and the abstraction power of C++ (and Haskell). It does have some annoying edge cases, and I do detect a bit of the same attitude that put me off Ada, where people looked down on doing stuff with memory, pointers, loops and indexes. There are other algorithms in the world than just map and fold.

Neither of us wants to demean Rust, because we both appreciate some of the good ideas in it. I concur that we can't compile-time check the world of imperative semantic possibilities. And we don't want to entirely forsake that imperative control when we need it. I think Rust is being somewhat unrealistic about the programmer's priorities, dreaming of some nirvana of correctness that doesn't exist and becomes something the programmer is always fighting against. But it is easier for me to criticize than to improve upon Rust, so I think one can't truly compare until there is a new language to play with.


shelby3 commented May 22, 2016

@keean wrote:

@shelby3 wrote:

Note where I wrote C++ templates suck,

C++ templates with "Concepts", which add some static typing, are not so bad. They are a kind of duck-typed type class, and as such are a better abstraction than all the OO gook in C++.


shelby3 commented May 24, 2016

The only exception I would make to the point of the following article, about using the most powerful high-level language for applications, is that if you want your code to be open source then your choice needs to be reasonably popular:

http://www.paulgraham.com/avg.html
