
@c-cube
Last active August 1, 2017 17:48
plans for OCaml's stdlib

Plans for the future of OCaml's stdlib

One of the most common complaints about OCaml, from newcomers and veterans alike, is that the stdlib is lacking in several areas. Among these we can list:

  • some modules are absent but should exist (e.g. Option)
  • some modules are present but lack some functionality (e.g. List could have many more combinators)
  • the lack of some transverse features (iterators, printers, monadic operators…)
  • the lack of generality of some constructs. Notably, in_channel and out_channel are closed types and are not composable. It is impossible to create them from, say, string buffers, or to create a gzipped/encrypted channel from a regular one.
  • the IO system is quite limited overall.
  • some modules are old and rarely used anymore (Genlex, Num, perhaps Stream) and could be moved into their own opam packages for backward compatibility.
  • documentation is good but succinct. More examples are needed, as well as a decent introduction to every module. Rust's stdlib and documentation could act as an example to follow.
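To make the channel complaint concrete, here is a minimal sketch of what a composable input channel could look like, assuming a hypothetical in_chan record of functions (this is not an existing stdlib type). Because a channel is just a record, it can be built from an in-memory string or wrapped by a transformer; the uppercasing transformer below stands in for a gzip or decryption layer, which would have the same shape.

```ocaml
(* Hypothetical extensible input channel: just a record of functions. *)
type in_chan = {
  input : bytes -> int -> int -> int;  (* fill up to len bytes at off; 0 means EOF *)
  close : unit -> unit;
}

(* Wrap a regular stdlib in_channel. *)
let of_in_channel (ic : in_channel) : in_chan =
  { input = (fun b off len -> input ic b off len); close = (fun () -> close_in ic) }

(* A channel reading from an in-memory string (impossible with in_channel). *)
let of_string (s : string) : in_chan =
  let pos = ref 0 in
  let input b off len =
    let n = min len (String.length s - !pos) in
    Bytes.blit_string s !pos b off n;
    pos := !pos + n;
    n
  in
  { input; close = (fun () -> ()) }

(* A transformer wrapping any channel; a decompressor would look similar. *)
let map_bytes (f : char -> char) (c : in_chan) : in_chan =
  let input b off len =
    let n = c.input b off len in
    for i = off to off + n - 1 do
      Bytes.set b i (f (Bytes.get b i))
    done;
    n
  in
  { c with input }

(* Generic consumer: works on any in_chan, whatever its source. *)
let read_all (c : in_chan) : string =
  let buf = Buffer.create 256 in
  let chunk = Bytes.create 256 in
  let rec loop () =
    let n = c.input chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf
```

For example, read_all (map_bytes Char.uppercase_ascii (of_string "hello")) reads "HELLO" without any file being involved.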

As noted in a PR, this can be helped by having a committee with a clear vision and the means to act. See also the existing guide for contributing.

Constraints

  • almost absolute backward compatibility. Likely exceptions for now:

    • String (with -safe-string)
    • Bigarray (move to the core stdlib, except the mmap feature, which should go into Unix)

    Strict backward compatibility is at odds with the goal of being consistent (notably for fold functions and other functions whose signatures are inconsistent throughout the stdlib).
  • portability on all platforms (except for the Unix-specific functions, which already aren't portable)

  • performance (we need some benchmarks to measure it)

  • cleanliness of code, for maintainability

    • remove uses of Obj as much as possible (already well in progress, see e.g. Queue.t)
    • avoid introducing hacks that could break with multicore/flambda/jsoo/…
  • more testing. Even with OCaml's type system, bugs can easily sneak in without a strong test suite. NOTE: there are some efforts towards a certified stdlib for OCaml; cooperating with this work would be very helpful.

  • no pollution of existing namespaces. Nothing should be added to Pervasives without a very good reason. Most likely exceptions to that rule: function combinators such as composition or reverse composition.
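As an illustration of the kind of Pervasives addition that might clear that bar, here is a sketch of composition and reverse composition. The operator names % and %> are hypothetical spellings chosen for this sketch; the proposal does not fix them. Pervasives already provides |> and @@ (since OCaml 4.01), so these would complete that small family.

```ocaml
(* Composition: (f % g) x = f (g x). Hypothetical operator name. *)
let ( % ) f g x = f (g x)

(* Reverse composition: (f %> g) x = g (f x), reads left to right like |>. *)
let ( %> ) f g x = g (f x)

let double x = x * 2
let incr x = x + 1

(* incr first, then double *)
let incr_then_double = double % incr

(* double first, then incr *)
let double_then_incr = double %> incr
```

With these definitions, incr_then_double 3 is 8 and double_then_incr 3 is 7.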

Goals

  • improve existing modules by adding much-needed functionality. New functionality should be general-purpose and widely useful. A good criterion for judging whether something is missing is to look at what already exists in 3rd-party libraries (core, batteries, containers, fmt, astring, …) and in other, closely related languages such as F# and Haskell. In particular:

    • many functions and combinators in List
    • many functions and combinators in String
    • safe (non-exception raising) functions in most containers, such as Map and Hashtbl. There are already functions such as find_opt in the current trunk.
  • make some existing functions safe w.r.t. stack overflows, in particular in List, where e.g. List.map is not tail-recursive. Whether very long lists are good practice is open to discussion, but we should respect the principle of least surprise.

  • add the more direly missing modules. Candidates would be at least:

    • IO, to put all the portable IO-related functions in one place, with convenient names. Some functions (read_all, with_file, etc.) should also be added as soon as possible.
    • Option with all the usual operators (intersection of existing stdlibs?)
    • UString module (possibly a private alias to string that ensures its contents are valid UTF-8). We need Unicode, at least the basics.
      • A good source of inspiration might be D. Bünzli's libraries; we could include at least the equivalent of uutf for UTF-8.
      • The other features could remain in 3rd-party libraries.
      • As pointed out by @dbuenzli, it is best to keep moving targets outside of the stdlib (anything that depends on new versions of Unicode).
      • We also need to ensure that these libraries can work with the new UString.t.
    • maybe a Heap structure. Heaps are very commonly needed, and a reasonably efficient functor would be useful enough to deserve a place in the stdlib.
  • make IO channels more generic (the ability to create them from a record of functions would be perfect). Open question: which other types would benefit from similar extensibility?

  • consistently add some transverse functionalities to every existing (and new) module and type defined. These should include at least:

    • printers (based on Format) or printer combinators (for polymorphic types)
    • specialized hash function
    • specialized equality function
    • specialized comparison function
    • conversion from/into some common iterator type. See this PR for a detailed discussion and proposal.
    • whenever relevant, safe resource handling functions with_resource : resource -> (acquired_resource -> 'a) -> 'a that properly release the resource whether the function fails or succeeds.
    • whenever relevant, add an Infix module to existing ones, containing only the module's infix operators. This allows (local) opens to not pollute the namespace too much.
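As a sketch of what these transverse functions could look like on a single small module, here is a hypothetical Option module with map/get_or, a specialized equality, comparison and hash, a Format printer combinator, and an Infix submodule. It is named Option_sketch to avoid clashing with the Stdlib.Option that OCaml later gained in 4.08; all function names here are illustrative assumptions, not a settled API.

```ocaml
(* Hypothetical module illustrating the transverse functions listed above. *)
module Option_sketch = struct
  type 'a t = 'a option

  let map f = function None -> None | Some x -> Some (f x)
  let flat_map f = function None -> None | Some x -> f x
  let get_or ~default = function None -> default | Some x -> x

  (* Specialized equality, parameterized by the element's equality. *)
  let equal eq a b =
    match a, b with
    | None, None -> true
    | Some x, Some y -> eq x y
    | _ -> false

  (* Specialized comparison: None sorts before any Some. *)
  let compare cmp a b =
    match a, b with
    | None, None -> 0
    | None, Some _ -> -1
    | Some _, None -> 1
    | Some x, Some y -> cmp x y

  (* Specialized hash, parameterized by the element's hash. *)
  let hash hash_elt = function
    | None -> 0
    | Some x -> Hashtbl.hash (1, hash_elt x)

  (* Printer combinator: takes a Format printer for the element type. *)
  let pp pp_elt fmt = function
    | None -> Format.fprintf fmt "None"
    | Some x -> Format.fprintf fmt "Some(%a)" pp_elt x

  (* Only the infix operators, so a local open pollutes little. *)
  module Infix = struct
    let ( >|= ) o f = map f o
    let ( >>= ) o f = flat_map f o
  end
end
```

A local open such as Option_sketch.Infix.(Some 2 >|= fun x -> x * 10) brings only the two operators into scope, which is the point of keeping them in a separate submodule.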

Roadmap

There needs to be a committee that leads this effort. The committee should be reasonably small and include people from various parts of the community: typically, at least one person from each of {core, batteries, containers} plus some other notable library writers. For example: @gasche, @c-cube, @dbuenzli, @diml, @aantron, and Stephen Weeks from JST, who between them cover a wide range of expertise.

Some code can be adapted (after careful vetting) from existing 3rd party stdlibs, if licenses permit. I can at least vouch for the possibility of moving some code (and tests!) from ocaml-containers.

Some new development is also needed for some of the issues raised above. Some of the proposed features already exist as PRs; others will require new patches (e.g. Infix modules, or extensible channels).

At some point we also need to write better testing tools focused on the stdlib. Property-based tests would help a lot.
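To show what a property-based test buys over hand-picked cases, here is a minimal hand-rolled sketch (a real suite would more likely use an existing library such as QCheck). It checks that a tail-recursive map, written here as List.rev of List.rev_map, agrees with List.map on random lists; the names map_tr, check and gen_int_list are hypothetical, introduced only for this example.

```ocaml
(* Tail-recursive map: List.rev_map and List.rev are both tail-recursive,
   so this does not overflow the stack on very long lists. *)
let map_tr f l = List.rev (List.rev_map f l)

(* Minimal property checker: runs [prop] on [n] random inputs from [gen]
   and returns the first counterexample found, or None if all pass. *)
let check ?(n = 200) (gen : Random.State.t -> 'a) (prop : 'a -> bool) : 'a option =
  let st = Random.State.make [| 42 |] in  (* fixed seed: reproducible runs *)
  let rec loop i =
    if i >= n then None
    else
      let x = gen st in
      if prop x then loop (i + 1) else Some x
  in
  loop 0

(* Generator: random int lists of length 0..99 with elements in 0..999. *)
let gen_int_list st =
  let len = Random.State.int st 100 in
  List.init len (fun _ -> Random.State.int st 1000)
```

A property such as check gen_int_list (fun l -> map_tr succ l = List.map succ l) returns None when no counterexample is found; a failing property returns the offending list, which makes bugs much easier to reproduce.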

@dbuenzli

Here are a few comments:

  1. Constraints. Even if that could take the form of a betting game, I would add "forward looking", i.e. it has to take into account the longer-term evolution of the language.
  2. I do have reservations about the Unicode bits. As I have said more than once, I think it's better to keep the moving parts of the Unicode standard (e.g. uuseg, which needs to be updated yearly) outside the stdlib. I do however have a few designs for a Unicode string data structure that I could dig out at some point, but even so it may be better off living outside the stdlib for a while, as the design space is quite large.
  3. I would also be careful with the IO bits. OCaml is being used more and more in constrained environments where IO as it exists in general-purpose operating systems does not necessarily make sense. FWIW, base seems to have left IO out of its scope, which I tend to find a good thing. The IO story is also going to change significantly with the advent of multicore and effects.
  4. It would be good to have a checklist that can be used to quickly evaluate stdlib PRs. It should start from the nature of the addition (is it a new function in a module? is it a new module? etc.) and then go on by asking a few questions to make a basic assessment of the proposal (does it respect existing naming conventions? do we have evidence that the function is needed? do we have evidence that it helps general interop? is it better off living outside the stdlib? etc.)
  5. Regarding deprecation of old modules, I think this Mantis issue (https://caml.inria.fr/mantis/view.php?id=7400) should be considered as well.

@damiendoligez

I would like to nominate Stephen Weeks (at Jane Street) for the committee, if he can find the time. Given his experience, any input you get from him will be extremely valuable.

@yallop

yallop commented Jun 30, 2017

safe (non-exception raising) functions in most containers, such as Map and Hashtbl. Some consistent naming must be found for this, for example get : key -> value option.

Don't we have this already, since ocaml/ocaml#885?
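For reference, this is what the option-returning lookups look like in use: Map.S.find_opt and Hashtbl.find_opt are in the stdlib since OCaml 4.05. The lookup_or helper is only an illustrative name, not a stdlib function.

```ocaml
(* A string-keyed map, from the stdlib Map functor. *)
module SMap = Map.Make (String)

let m : int SMap.t = SMap.(empty |> add "a" 1 |> add "b" 2)

(* Illustrative helper (hypothetical name): lookup with a default,
   no try ... with Not_found needed. *)
let lookup_or (default : int) (k : string) (m : int SMap.t) : int =
  match SMap.find_opt k m with
  | Some v -> v
  | None -> default

(* Hashtbl has the same option-returning variant. *)
let table : (string, int) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.replace table "x" 42
```

Here lookup_or 0 "z" m returns 0 instead of raising Not_found, and Hashtbl.find_opt table "y" returns None.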

@c-cube (author)

c-cube commented Jul 2, 2017

@yallop indeed, I forgot that. The point still stands :-)
@dbuenzli good point about moving targets in Unicode (I still think a good UString.t type backed by string/bytes in UTF-8 + uutf, as in Rust, would be very helpful)
I'll update the proposal with your comments.
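To make the UString.t idea concrete, here is a minimal sketch of the private-alias approach, assuming the module name UString from the proposal and a hand-rolled UTF-8 validator (this is not uutf's API, which would be the natural thing to reuse). Because t is private, any UString.t can be coerced to string for free, but the only way to create one is through validation.

```ocaml
(* Hypothetical UString: a private alias to string, guaranteed valid UTF-8. *)
module UString : sig
  type t = private string
  val of_string : string -> t option
end = struct
  type t = string

  (* Well-formedness check following RFC 3629: rejects overlong forms,
     UTF-16 surrogates, and code points above U+10FFFF. *)
  let is_valid (s : string) : bool =
    let len = String.length s in
    let byte i = Char.code s.[i] in
    (* [cont i]: byte i exists and is a continuation byte 10xxxxxx *)
    let cont i = i < len && byte i land 0xC0 = 0x80 in
    let rec go i =
      if i >= len then true
      else
        let b = byte i in
        if b < 0x80 then go (i + 1)                      (* ASCII *)
        else if b >= 0xC2 && b <= 0xDF then              (* 2-byte sequence *)
          cont (i + 1) && go (i + 2)
        else if b >= 0xE0 && b <= 0xEF then              (* 3-byte sequence *)
          cont (i + 1) && cont (i + 2)
          && (let b1 = byte (i + 1) in
              (b <> 0xE0 || b1 >= 0xA0)                  (* no overlong form *)
              && (b <> 0xED || b1 < 0xA0))               (* no surrogates *)
          && go (i + 3)
        else if b >= 0xF0 && b <= 0xF4 then              (* 4-byte sequence *)
          cont (i + 1) && cont (i + 2) && cont (i + 3)
          && (let b1 = byte (i + 1) in
              (b <> 0xF0 || b1 >= 0x90)                  (* no overlong form *)
              && (b <> 0xF4 || b1 < 0x90))               (* at most U+10FFFF *)
          && go (i + 4)
        else false                                       (* invalid lead byte *)
    in
    go 0

  let of_string s = if is_valid s then Some s else None
end
```

Given a value u : UString.t, the coercion (u :> string) passes it to any existing string function, so IO based on string/bytes keeps working unchanged.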

@c-cube (author)

c-cube commented Jul 3, 2017

@dbuenzli About IO, I think that since backward compatibility is critical, and the stdlib already contains IO functions, it makes sense to keep these functions. On top of them (mostly basic file/channel handling) it is perfectly doable to build a decent IO module with functions such as read_all : channel -> string, which everyone reimplements a few times, just as everyone keeps reimplementing string basics.

(as an aside, I think it's also important to consider that since IO uses string/bytes, we should provide a UTF-8-aware mechanism that is compatible with these)
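A minimal sketch of the read_all : channel -> string function mentioned above, built only on the existing channel functions, together with a with_resource-style helper that closes the file whether the callback succeeds or raises. The name with_in_file is hypothetical, chosen for this example.

```ocaml
(* Hypothetical with_resource-style helper: open [path], run [f] on the
   channel, and always close it, re-raising any exception from [f]. *)
let with_in_file (path : string) (f : in_channel -> 'a) : 'a =
  let ic = open_in_bin path in
  match f ic with
  | x -> close_in ic; x
  | exception e -> close_in_noerr ic; raise e

(* Read everything remaining on [ic], in fixed-size chunks. *)
let read_all (ic : in_channel) : string =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in  (* 0 means end of file *)
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf
```

Typical use is with_in_file "config.txt" read_all; binary mode is used so the bytes come back exactly as stored on every platform.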

@c-cube (author)

c-cube commented Jul 3, 2017

I would even propose that Daniel merge some of his libraries (if he's willing to) into the stdlib, in particular fmt and astring, whose documentation is much better than what we have.
