New scientific markup language - copied from https://hackpad.com/New-scientific-markup-language-utAjFcYuvvB

New scientific markup language

  • I am for open and collaborative culture, but disgusted by the edits of a few immature vandals (it takes only 2 among crowds of great contributors and discussants).

(It's collaborative! Feel invited to edit, comment, etc.)

Voting:

If you want to support (or contest) a thing already stated here, just:

  • add "+1", "-1",

    *   _Should a -1 refer to the item it's next to, or the item it's under? These mean different things in some cases._
    
  • a word (e.g. "slow", "great interface"),

  • or a full fledged comment (then start with a newline and indent).

Problem

LaTeX creates beautiful documents, and makes it easy to include citations, math formulae etc. However:

  • learning curve is steep

    *   +2 (and having to escape apostrophes, etc leads to buggy documents)
    
  • -1 (I disagree: IMO the steep learning curve is mainly in learning the markup as such. Installing TeX Live, MacTeX, or MiKTeX and compiling some pre-made document is quite straightforward.)

  • It is often "almost" straightforward. That is great if you know a little bit of technology, but if you don't, you're lost. My real-world example: I needed someone to check my English writing and I found two experts in Arizona. They both had computers and used them on a semi-daily basis. LaTeX was overkill, so I had them use LyX instead. Unfortunately they didn't know how to manage files, so I installed Dropbox and LyX for them. That almost worked. I ended up also having to install desktop-sharing software so that I could get on their machines when nothing worked. Of course, installing LyX and Dropbox and moving the files to the right locations was a no-go as well.

  • Conclusion: It has to be as easy to use as Hotmail/Gmail, or else lots and lots of people will be lost.

  • (The conclusion of all these -1 votes seems to be that LaTeX is the solution, but clearly it is not for all users, so we need some more +1 or additional points).

  • It's trivial in OS X; simply download and install (using the GUI) MacTeX and open one of the included editors.

  • It can be difficult for new users to learn to use the markup in LaTeX properly. New users easily run into compilation problems that they cannot decipher.

  • there is non-negligible overhead of preamble,

    *   Which could be automated.
    
            *   the whole point here is to automate it; BTW LaTeX is a layer over TeX
    *   (I don't get the downvote; here I was listing *problems* of LaTeX)
    *   (I am saying that I do not agree that preamble is a problem)
    
  • some other things are not as simple as they could be (e.g. lists),

    *   -1 LaTeX can always be extended.
    
  • it is hard to make a general LaTeX compiler, so

    *   -1 Do we really want a general LaTeX compiler?
    
            *   We don't, as long as texts are written in something that does not require full LaTeX compilation
    
  • full LaTeX is not suitable for dynamic rendering over web (or ePub, smaller displays, or XML for archiving, etc).

    *   -1 You can always just limit the subset, see: https://dev.wlan-si.net/wiki/SandBox
    
            *   But then you end up with something not powerful enough for all users
    *   That's not just a subset of latex -- the top level of the document is in markdown, only formulas and diagrams are latex.
    
    *   +1 The ability to dynamically rer
    
  • is not semantic markup, doesn't make it easy to impose our own style sheets, to extract information (who is the author, what is this paper that has been cited)

    *   -1 I would argue that LaTeX is by and large semantic markup. TeX is not.
    
    • -1 I agree, LaTeX is very semantic and can always be made even more so
  • cannot be easily edited/created on tablets

    *   +2 -2 (What _can_ easily be edited on tablets? That's not what they are for.) (There are lots of capable text editors for code, RTF, etc.)
    
    • -1 Why can't it be easily created on tablets? A JavaScript implementation is in progress. So we could just throw more hands at the JavaScript implementation and that would be it.

Requirements

  • Easy transition between different output formats

    *   (plain text, pdf (for different journals), html)
    
  • Supports positioning in a page-based context (see blog links below).

    *   -1 Please, forget pages. We do not need pages anymore.
    
    • Why don't we need pages to create documents and reports? I have not witnessed this future.

          *   Most web sites (including this very hackpad) are in this future. They are reflowed to fit your screen. Ever tried reading a LaTeX doc on a smartphone? Even if you play with page geometry, paginated docs are hard to read on small screens because you want to zoom in and out, but you don't want to constantly scroll right and left when zoomed in.
      
  • LaTeX for formulae

    *   -3 Why? TeX has the same high learning curve as LaTeX above. Learning LaTeX itself is not much harder than learning LaTeX formulae.
    
            *   Do you know _anything_ _close_ to LaTeX when it comes to mathematical formulae? I would love to see anything, even at the proof-of-principle stage, that is close to LaTeX in that respect; any links? (Especially as drawing formulae is way faster than writing them, it would be really beneficial.)
    *   +1 and it is easier to write $\sin(x)$ than to learn how to make the whole file compile; especially as in the first case you can move gradually, learning only the things that you really need
    *   we are talking about markup not compilation stack, you can imagine a cloud service which gives you a form to input document body and it deals with preamble and compiling automatically
    *   Yes and no. Would be nice to ditch LaTeX entirely.
    
  • Open format

    *   +5
    
  • Documents are compatible with version-control systems

    *   +4
    
    • +3 As long as we keep it plain text, it's going to be compatible with version control. (Except that things like RTF, which is text-based, do not preserve line wraps and the exact position of formatting between revisions.)
  • Relatively easy to parse

    *   By who or what? (Does it matter?)
    
            *   Easy to parse by a one-pass parser _and_ by humans. It _does_ matter: the simpler it is to parse, the harder it is to make an error (both in the text and in the parser implementation).
    
  • Packaged format for final distribution - although I don't want PDF, I also don't want HTML embedded in the publisher's website - I want to be able to download and process, archive, and read in any way I want. I just looked up several papers that were hosted by journals that have ceased to exist; their websites now host bridal ads... Could be a simple zip with a text file in some markup language, images, data files etc.

    *   epub?
    
    • the language should be easily mappable to multiple final distribution formats: epub, html, pdf

          *   +1
      
  • It is possible to edit it in a plaintext editor (Vim, Emacs, Sublime, etc), so that it is compatible with other editors and doesn't require changing one's workflow

    *   +10  If it can't be edited as plain text, then any manipulation of the format requires purpose-written software. That reduces the user's choice of editing software dramatically, initially to just one or two alpha-quality programs. Plain text is easy to parse, modify, and transmit, and is not tied to one architecture or display mode. If we want the format to have staying power (and one of TeX's strengths is the fact that it has remained essentially unchanged for decades), plain text is a must.
    
    • -1 No, let's finally move away from VT100 terminals and start living in the 21st century. For scientists to write papers we should simply make it easy for them to write; anything text-based is good maybe for geeks, but not for others. People should not have to know any special syntax to be able to write scientific papers.

          *   What are you proposing here exactly? To allow the format to be edited in MS Word and Libreoffice?
      
      • +3 No. the language should be editable with a plaintext editor. Easy gui tools should be built on top of it for the non experts.

      • So you are still programming in machine code? This is the same argument that was made to Grace Hopper after she made the first compiler. Nobody wanted to use it because they were saying that programming should be done directly in machine code. Yes, continue writing rich text in plain text if you wish, but maybe we should start working on something for the future.

      • +1 Benefits of being able to edit in a text editor: (1) Compatible with version control systems. (2) Human readable. (3) Will easily live for a long time, past when any GUI tool falls out of favor. (4) Works with existing tools, especially web-based.

      • You want to standardize at the simplest possible level. Complexity (GUI editors, formatting templates, etc) can be built on top of this base functionality. This is what makes standardization successful. (See IPv4, the end-to-end argument, for historical background.)

                *   +6
        
  • The format should include markup of annotation, i.e. so that it is possible to distinguish the document text itself from annotations (comments), possibly made by others.

  • Complete and unambiguous syntax definition, to ensure that all processing tools interpret the contents in the same way (the weak point for Markdown)

    *   +8
    
  • Support for publisher-specific formatting of references, interface to reference databases (-> BibTeX).

  • An integrated or associated Turing-complete programming language. This is what makes TeX so powerful (nothing is ever impossible in TeX), but it's also an important source of complexity. Striking the right balance on this point is very difficult.

    *   An integrated programming language is a security nightmare. That's why Adobe migrated the world from PostScript to PDF, and that's why macros in Word are evil.
    
    • -1 I agree. If you want that, you could simply allow HTML5 embedding with same-origin sandboxing.
    • -1 Let's keep things simple. Only geeks will use that kind of functionality, and those can easily pre-process their thing with whatever tool they like most.
    • +1 The integrated programming language should be for compile time, not run time. So I can, for example, pull in the latest data from an API at run time.
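
The packaged-format requirement above ("a simple zip with a text file in some markup language, images, data files") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file name `paper.txt`, the folder names, and the helper names `build_package` and `read_manuscript` are hypothetical and not part of any agreed format.

```python
import io
import zipfile


def build_package(markup_text, attachments):
    """Bundle a plain-text manuscript and its attachments into one zip archive.

    `attachments` maps archive paths (e.g. "figures/fig1.png") to bytes.
    Storing the manuscript as "paper.txt" is a hypothetical convention.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # A str passed to writestr is stored UTF-8 encoded.
        archive.writestr("paper.txt", markup_text)
        for path, data in sorted(attachments.items()):
            archive.writestr(path, data)
    return buffer.getvalue()


def read_manuscript(package_bytes):
    """Return the manuscript text from a package made by build_package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.read("paper.txt").decode("utf-8")


package = build_package(
    "Title: Example paper\n\nThe energy is $E = mc^2$.\n",
    {
        "data/results.csv": b"x,y\n1,2\n",
        "figures/fig1.png": b"placeholder image bytes",
    },
)
print(read_manuscript(package))
```

Because the archive holds only plain files, a reader can unpack it with any zip tool and keep the text under version control, which fits the plain-text and open-format requirements listed above.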

Stages

  • Initial authoring by single person
  • Collaboration within small group / lab
  • Submission of initial paper to journal
  • Editorial workflow (reviews etc)
  • Final published paper on the web

All these stages are different, and might require different solutions. It might be that we need to come up with an interchange format for submitting to journals (based on XML or whatever), but make it very easy to convert from Markdown, ReST, or whatever WYSIWYG tool with templates etc people are using into this tool. Similarly - what is the best format for publishing - letting me easily view in any browser (web, laptop, tablet, mobile), markup, with semantic citations etc? (Of course, this workflow should support both journal-based publishing, and self-publishing - reprints or grey papers, or post-publishing peer-review/overlay journals etc).

  • Can't we just embed information about that (and other things) in (YAML-like, for instance, please think outside the XML box) metadata?

Texts

Articles, essays, calls for action, random ramblings...

Solutions / implementations

LaTeX made easy

Make it easier to learn LaTeX, e.g. online, no installation required, starting with an example:

Or full WYSIWYM:

  • http://www.lyx.org/ (you have LaTeX for formulas, and a LaTeX lookup for everything, but you cannot edit it directly).

Markdown + LaTeX

  • https://stackedit.io/

    *   source: https://github.com/benweet/stackedit (2281 stars)
    
    • web-based, synchronizes with GoogleDrive, Dropbox and GitHub Gist
    • extendable
    • Documents are also stored in your browser, so you can work offline
    • votes/comments: +2
  • http://markx.herokuapp.com/

    *   source: https://github.com/yoavram/markx (93 stars)
    
    • web-based, synchronized with GitHub repositories
    • support for citations
    • Markdown processor easily extendable
    • file format conversion using pandoc (pdf, latex, word, html, etc.)
    • votes/comments: +1
  • http://www.authorea.com/

    *   Web-based, can synchronize with Git repos
    
    • Support for citations, figures, comments
    • Version controlled (Git)
    • LaTeX and/or Markdown -> HTML5
    • Export to PDF (also using journal styles)
    • votes/comments: +1
  • Gitit

  • Comments
    • LaTeX is hackish (i.e. to do simple things sometimes you need to do tricky stuff); using Markdown (as it does not offer a lot of functions) can be even more hackish (e.g. including LaTeX hacks in Markdown is worse than just using LaTeX hacks)

    Markdown + LaTeX inside some environment

    HTML5 + MathJax

    HTML5 has semantic tags. MathJax can render equations. It works everywhere, you can even embed, link and whatever. CSS sections can be used for pages. Footnotes also work with them. If needed, only some small JavaScript library can be added for some special things. People know HTML5 from elsewhere.

    • The HTML (or XML) way is one of the ways to go.
    • IMHO pure HTML gives too much freedom for that purpose (which, in certain cases, may be undesired or distracting).

    Something like this (I am just jotting; it is not well-considered stuff. I just want to show that it is not only about how things look, but what they mean, e.g. whether they are citations, internal references, etc.):

    • <title>On the paradigm shift of academic writing</title>
    •   <name>Piotr Migdał</name>
      
    •   <affiliation>ICFO</affiliation>
      
    •   <orcid>23971381900</orcid>
      
    • Introduction

    • As it was shown in Feynman's publication ...
    • Compare with . **
    • How will you specify how the citations are shown? in parenthesis style you may have:
    • "Hans C. Andersen (1892) discovered the need for self reflection, which subsequently has been studied widely (Schleier 1995, Hansen 2003, Dag 1987)." while in footnote citation formats you need both to be footnotes and for the first citation "(1892)" to also include the author, while in parenthesis style you only want the author in the second example...
      • External specification file, so you can easily change between journals. Or online, a reader can change according to her preferences.
      • Ok, but the point is that the citation specification isn't always enough. Take my example: Say the writer has written it for footnote citation support first. He then wants to turn it into parenthesis support instead. But how will the system know if he wants to include the author in the citation (example 2) or not (example 1)? Biblatex/Bibtex have quite advanced mechanisms to specify that. That would work, but it would mean that complexity increases again.
        • I am not sure if I understand. The point of some semantic language is to split the semantic meaning (e.g. "it is a citation") from how it is printed (e.g. (Smith and Kowalski 2005) vs [15]; footnote or on the margin, etc)). So, like with BibTeX citation styles, for rendering purposes, you will also need to have a library doing it for you.
          • Yes, but there is a difference between "Stephen Hawkins (2006) said something." and "Many scientists congratulated him (Warner 2003a, b, 2007; Milfgreen 2009, Butler 2011, 2012a, b, c, Hæfgens 2013)." You cannot just make one tag to cover both of them. As a minimum you need some information on whether it's \textcite or \parencite. If you look even closer, you realize you need even more things to specify. In the end you end up with a 255 page manual just for the various tags and options related to citation, like BibLaTeX (ftp://ftp.fu-berlin.de/tex/CTAN/macros/latex/contrib/biblatex/doc/biblatex.pdf). Which is fine, but it will be complex.
            • You can use XML attributes. That way there is one tag for citing, and you can just add attributes. But I agree that, after considering many things, it may turn out to be as complicated as LaTeX (perhaps up to a constant factor).
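
The thread above can be made concrete with a small Python sketch: one semantic citation object carries a `mode` attribute (textual, like \textcite, or parenthetical, like \parencite), and a separate style decides how it is printed. Everything here is a hypothetical illustration, not an existing library: the `Reference` and `Citation` classes, the `render` function, and the style names "author-year" and "numeric" are assumptions. Numbering by library order is a simplification, and real styles need far more options, which is exactly the complexity concern raised above.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    authors: str  # short author label, already formatted (e.g. "Andersen")
    year: str


@dataclass
class Citation:
    keys: list  # reference keys, in the order they are cited
    mode: str   # "text" (like \textcite) or "paren" (like \parencite)


def render(citation, library, style):
    """Render one semantic citation; `style` is "author-year" or "numeric"."""
    if style == "numeric":
        # Simplification: number references by their position in the library.
        order = list(library)
        numbers = [str(order.index(key) + 1) for key in citation.keys]
        return "[" + ", ".join(numbers) + "]"
    refs = [library[key] for key in citation.keys]
    if citation.mode == "text":
        return "; ".join(f"{ref.authors} ({ref.year})" for ref in refs)
    return "(" + ", ".join(f"{ref.authors} {ref.year}" for ref in refs) + ")"


library = {
    "andersen1892": Reference("Andersen", "1892"),
    "schleier1995": Reference("Schleier", "1995"),
    "hansen2003": Reference("Hansen", "2003"),
}
textual = Citation(["andersen1892"], mode="text")
parenthetical = Citation(["schleier1995", "hansen2003"], mode="paren")

for style in ("author-year", "numeric"):
    print(style, "|", render(textual, library, style), "|",
          render(parenthetical, library, style))
```

The source text stays the same when the style changes; only the `style` argument differs, which is the separation of meaning from presentation argued for in this thread.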

    Other approaches

    Supplementary tools

    We need a great template library, and a really good GUI tool for designing robust templates - both for PDFs, HTML, ePub etc. I should be able to quite easily implement for example my university's PhD thesis requirements (first page number of each chapter centered at the bottom, subsequent ones on the top right etc etc), possibly forking an existing template. The same with a journal or conference template. I should then be able to apply this template to any text written in a markup language. Right now, your PDF can look great if you use an existing LaTeX template, and follow the specifications, but it's a bit of a dark art to create or modify templates, and text isn't trivially compatible across different templates (I think).

    • We need WYSIWYG tools that understand styles, semantic metadata, citations etc that can output whatever format we want.

    • For semantic markup, you want a WYSIWYM tool.

    • See Fidus Writer (http://www.fiduswriter.org); it is exactly that. It is open source (AGPL), so you can change it to whatever you need. It is both WYSIWYM and WYSIWYG, depending on how you use it.

    • Command-line tools to convert (Pandoc style) between text formats and output formats

    • Great idea. There are a few initiatives for realtime collaboration in editors out there already. Technically most advanced is likely the Substance.io team with their "Operational changes in javascript" concept. But this would be a good thing to build a small working group around.

    • 1 + 1

    Expanding on what Stian said above, a finalized format - replacing pdf - which allows some reuse of content with attribution. I'm imagining that the finalized format is essentially the same as the original markdown source, with in-place editing disallowed, but where it is easy to split the document into sections and use a section in your own documents (which could be research notes, lecture notes to be distributed to students, etc.). Ideally these sections would contain some way to link back to the original document, or attribute the original author. This would be the same Scrivener-like sectioning ability Stian mentioned in the real-time collaboration feature, just expanded to be a general feature of reading or editing marked-up files.

    • creating a new format will require new plugins for everyone.

    Meta-comment. If Markdown was a great collaboration language, we would be using it right now instead of a WYSIWYG environment.

    • Well, it works great for our purpose, but it is not a system you can build on top of. (E.g. integrating with other tools, writing custom WYSIWYGs, using it with version control, etc.)
    • And anyway, some of the strengths of HackPad (IMHO) are abstractions of things (e.g. fonts).

    General comment: metadata should be included in a human-readable format, e.g. JSON or YAML (what does MD and its derivatives exactly do for that?), according to the general philosophy of a readable markup language. This means moving away from obfuscated HTML or XML-like solutions.
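
As a minimal sketch of the general comment above, the following Python snippet reads human-readable metadata placed at the top of a plain-text document. It assumes a hypothetical convention that is not proposed anywhere in this discussion: the metadata is a JSON object followed by a line containing only `---`. JSON is used here only because Python's standard library can parse it; a YAML header would follow the same pattern with a YAML parser. The function name `split_front_matter` is made up for illustration.

```python
import json


def split_front_matter(document):
    """Split a document into (metadata dict, body text).

    Assumes the hypothetical convention that metadata is a JSON object
    placed before a line containing only "---".
    """
    header, separator, body = document.partition("\n---\n")
    if not separator:
        return {}, document  # no metadata block present
    return json.loads(header), body


document = """{
  "title": "On the paradigm shift of academic writing",
  "authors": [{"name": "Piotr Migdał", "affiliation": "ICFO"}]
}
---
Introduction

Body text goes here.
"""

metadata, body = split_front_matter(document)
print(metadata["title"])
print([author["name"] for author in metadata["authors"]])
```

A tool that only wants to know who the author is can read the header without rendering the rest, while a human can still read and edit the same block in any text editor.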
