@voischev
Last active October 4, 2016 00:02
PostHTMLTree.js ideas [WIP]
/*
format PostHTMLTree
@see https://dev.w3.org/html5/html-author/
*/
/*
declarations
@see https://dev.w3.org/html5/html-author/#doctype-declaration
<!DOCTYPE
HTML
PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
*/
{
type: 'declaration',
name: 'doctype',
raw: '<!DOCTYPE\n HTML\n PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
// <!-- comment -->
{
type: 'declaration',
name: 'comment',
raw: '<!-- comment -->',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
// <![CDATA[x<y]]>
{
type: 'declaration',
name: 'cdata',
raw: '<![CDATA[x<y]]>',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
// <?php $php = 1 ?>
{
type: 'declaration',
name: 'php',
raw: '<?php $php = 1 ?>',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
/*
elements
normal, void, raw text, RCDATA and foreign elements
@see https://dev.w3.org/html5/html-author/#elements
@see https://dev.w3.org/html5/html-author/#tags
@see https://dev.w3.org/html5/html-author/#void
@see https://dev.w3.org/html5/html-author/#raw-text-elements
@see https://dev.w3.org/html5/html-author/#rcdata-elements
@see https://dev.w3.org/html5/html-author/#foreign-elements & http://www.w3.org/TR/html5/syntax#foreign-elements
@see https://dev.w3.org/html5/html-author/#normal-elements
*/
/*
tag
@see https://dev.w3.org/html5/html-author/#tags
example: <p>The quick brown fox jumps over the lazy dog.</p>
*/
{
type: 'element',
term: 'normal',
name: 'p',
syntax: 'normal',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
},
content: [...]
}
// self-closing tag
// <p>The quick brown fox<br/>
// jumps over the lazy dog.</p>
{
type: 'element',
term: 'void',
name: 'br',
syntax: 'self-closing',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
// <span name="value"/>
{
type: 'element',
term: 'void',
name: 'span',
syntax: 'self-closing',
attrs: ...,
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
/*
void elements
@see https://dev.w3.org/html5/html-author/#void
example: <hr>
*/
// <hr>
{
type: 'element',
term: 'void',
name: 'hr',
syntax: 'normal',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
// <hr/>
{
type: 'element',
term: 'void',
name: 'hr',
syntax: 'self-closing',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
/*
raw-text elements
@see https://dev.w3.org/html5/html-author/#raw-text-elements
example: <script>var a = 'a';</script>
*/
{
type: 'element',
name: 'script',
spec: {
term: 'raw-text',
syntax: 'normal'
},
data: 'var a = \'a\';',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
/*
RCDATA elements
@see https://dev.w3.org/html5/html-author/#rcdata-elements
example:
<textarea>
This can contain character references like &amp;, &lt; and &gt;,
but such characters can also be written directly as &, < and >.
Strings that look like <!-- comments --> or other elements <span>
are treated as plain text, instead of markup.
</textarea>
*/
{
type: 'element',
name: 'textarea',
spec: {
term: 'rcdata',
syntax: 'normal'
},
data: 'This can contain character references like &amp;, &lt; and &gt;,\n but such characters can also be written directly as &, < and >.\n Strings that look like <!-- comments --> or other elements <span>\n are treated as plain text, instead of markup.',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
/*
foreign-elements
@see https://dev.w3.org/html5/html-author/#foreign-elements & http://www.w3.org/TR/html5/syntax#foreign-elements
example:
<p>
<svg>
<metadata>
<!-- this is invalid -->
<cdr:license xmlns:cdr="http://www.example.com/cdr/metadata" name="MIT"/>
</metadata>
</svg>
</p>
*/
{
type: 'element',
spec: {
term: 'foreign',
syntax: 'self-closing'
},
name: 'cdr:license',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
/*
normal elements
@see https://dev.w3.org/html5/html-author/#normal-elements
example: <div><br/></div>
*/
{
type: 'element',
spec: {
term: 'normal',
syntax: 'normal'
},
name: 'div',
content: [
{
type: 'element',
spec: {
term: 'void',
syntax: 'self-closing'
},
name: 'br',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
],
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
/*
text
example: Привет
Мир!
*/
{
type: 'text',
data: 'Привет\n Мир!',
position: { // for example
start: { line: 1, column: 1, offset: 0 },
end: { line: 1, column: 16, offset: 15 }
}
}
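
To make the shape of the format above easier to picture, here is a minimal sketch of a depth-first walk over such a tree. It assumes that child nodes always live in an element's `content` array, as in the `div` example; the `walk` helper and the sample tree are illustrative only and are not part of the gist. Positions are omitted for brevity.

```javascript
// Depth-first walk over a tree in the proposed format.
// Assumption: children live in `content`, as in the element examples above.
function walk (nodes, visit) {
  for (const node of nodes) {
    visit(node)
    if (node.type === 'element' && Array.isArray(node.content)) {
      walk(node.content, visit)
    }
  }
}

// Tree for: <div><br/></div>Hello
const tree = [
  {
    type: 'element',
    spec: { term: 'normal', syntax: 'normal' },
    name: 'div',
    content: [
      { type: 'element', spec: { term: 'void', syntax: 'self-closing' }, name: 'br' }
    ]
  },
  { type: 'text', data: 'Hello' }
]

// Collect the names of all elements in document order.
const names = []
walk(tree, (node) => {
  if (node.type === 'element') names.push(node.name)
})
```

Because every node carries a `type`, a visitor can branch on it without having to check whether a child is a string or an object.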
@michael-ciniawsky

michael-ciniawsky commented Aug 26, 2016

Cool 👍

attrs: ... :D What will they look like?

{
  type: 'element', // :+1:
  spec: { syntax: 'normal', term: 'rcdata' }, // ? but yeah.. just to throw it in
  name: 'div', // Is name really the best solution? Personal likes/dislikes aside, is there a reason for using name, e.g. the spec?
               // 100% correct would be tagName (ugly/verbose :)), so the decision is between name and tag;
               // tag is more specific to an HTML element than the agnostic term name (and shorter + familiar ;))
  content: [], // array always? :+1:
  position: {} // :+1:
}

text nodes 👀 ? Or will/should plain text in general be represented as a string? Or is that what raw-text elements are for and I'm dumb :D

How/when/where will the vanilla parser come into play, is the work on the future parser vs. posthtml core conditionally?

@michael-ciniawsky

What about XML/SVG?

@voischev
Author

@michael-ciniawsky text nodes are coming soon ;)

@voischev
Author

What about XML/SVG?

These nodes look the same for XML/SVG. Examples later

@voischev
Author

voischev commented Aug 28, 2016

is the work on the future parser vs. posthtml core conditionally?

This work is about a new tree for posthtml. A parser will need to be found or created for this format. The main idea is treex + an adapter for the posthtml tree format
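
A rough sketch of what such an adapter might look like, converting the proposed nodes into the current PostHTML tree shape (`{ tag, attrs, content }` objects, with text and comments as plain strings). The `toPostHTML` name is hypothetical, and since the shape of `attrs` is not defined in the gist, it is simply passed through as-is here.

```javascript
// Convert one node of the proposed format into the current PostHTML shape:
// elements become { tag, attrs, content }, everything else becomes a string.
function toPostHTML (node) {
  if (node.type === 'text') return node.data
  if (node.type === 'declaration') return node.raw

  const result = { tag: node.name }
  // Assumption: attrs is a plain key/value object and can be reused unchanged.
  if (node.attrs) result.attrs = node.attrs

  if (typeof node.data === 'string') {
    // raw-text / RCDATA elements keep their body in `data`
    result.content = [node.data]
  } else if (Array.isArray(node.content)) {
    result.content = node.content.map((child) => toPostHTML(child))
  }
  return result
}

const legacy = [
  { type: 'declaration', name: 'comment', raw: '<!-- comment -->' },
  {
    type: 'element',
    spec: { term: 'normal', syntax: 'normal' },
    name: 'div',
    content: [
      { type: 'element', spec: { term: 'void', syntax: 'self-closing' }, name: 'br' },
      { type: 'text', data: 'Hi' }
    ]
  },
  {
    type: 'element',
    spec: { term: 'raw-text', syntax: 'normal' },
    name: 'script',
    data: "var a = 'a';"
  }
].map((node) => toPostHTML(node))
```

The conversion is lossy: `spec` and `position` have no counterpart in the current format and are dropped.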

@michael-ciniawsky

michael-ciniawsky commented Aug 30, 2016

text nodes are coming soon ;)

👍

These nodes look the same for XML/SVG. Examples later

👍

The main idea is treex + an adapter for the posthtml tree format

Adapter? :) Like a simpler 'Plugin/Transform' Tree and the Parser/PostHTML Tree? Can we develop treex here at posthtml? That would be a bit more 'official' and better for contributions, e.g. to get started I could write docs and take care of all the other chores involved.

The result will break most of the plugins i suggest? We then need to discuss how we intend to migrate (npm i posthtml@next for a while) and how/what first etc. I would also like to talk about a file[name] option for posthtml.process and whether streams in sync mode would be considered a valuable option.

Note: I'm working on a Stream Wrapper/Middleware for PostHTML as a streaming replacement for e.g. express-posthtml, koa-posthtml etc., so maybe core is not necessary/the best place for stream support, but I can't tell for sure yet. I'm not the most seasoned stream user tbh :) E.g. async streams with/without returning a promise? Is something like that technically possible/beneficial? ¯\_(ツ)_/¯

posthtml.process('file.html', { file: 'name', sync: true }).<html|tree> // uses streams internally, but emits the whole file .on('end')
posthtml.process('file.html', { file: 'name', sync: 'stream' }).pipe(stream) // returns readable stream for piping
import tap from 'gulp-tap'
import posthtml from 'gulp-posthtml'
import { task, src, dest } from 'gulp'

// Current
task('html', () => {
  let path

  // NOTE: `path` is still undefined when this template literal is evaluated;
  // the tap() below only assigns it once files start flowing.
  const plugins = [ require('posthtml-include')({ root: `${path}` }) ]
  const options = {}

  return src('src/**/*.html') // return the stream so gulp can detect completion
    .pipe(tap((file) => path = file.path))
    .pipe(posthtml(plugins, options))
    .pipe(dest('build/'))
})

// with options.file
task('html', () => {
  const plugins = [ require('posthtml-include')({ root: 'components' }) ] // relative path, static and cleaner setup  :)
  const options = {}

  return src('src/**/*.html')
    .pipe(tap((file) => options.file = file.path))
    .pipe(posthtml(plugins, options))
    .pipe(dest('build/'))
})
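
One possible reading of the `options.file` idea is that a plugin could resolve its relative `root` against the directory of the file being processed. The sketch below is only an illustration under that assumption; `resolveRoot` is a hypothetical helper, not an existing PostHTML or posthtml-include API, and it uses `path.posix` so the result does not depend on the platform.

```javascript
const path = require('path')

// Hypothetical helper: resolve a plugin's relative `root` against the
// directory of the file being processed (the proposed `options.file`).
function resolveRoot (root, file) {
  // Absolute roots, or a missing file name, leave `root` untouched.
  if (path.posix.isAbsolute(root) || !file) return root
  return path.posix.join(path.posix.dirname(file), root)
}

const fromFile = resolveRoot('components', 'src/pages/index.html')
const absolute = resolveRoot('/shared/components', 'src/pages/index.html')
const noFile = resolveRoot('components', undefined)
```

With something like this inside the plugin, the gulp setup can keep a static `root: 'components'` and pass only the file path through the options.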

@voischev
Author

voischev commented Aug 31, 2016

Can we develop treex here at posthtml?

No. Treex is only an idea so far, and I should check that it works before it becomes 'more official' :) But you can contribute in the repo ;)

The result will break most of the plugins i suggest?

We'll see later

Stream

That is a very good point! I'll think about it

@michael-ciniawsky

michael-ciniawsky commented Sep 1, 2016

text nodes 👍

What happens if a plugin changes e.g node.name = 'p' -> node.name = 'br' in relation to node.spec?
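
One way to picture the problem: if `spec.term` is stored on the node, it goes stale as soon as `name` changes, unless something recomputes it. The sketch below shows a hypothetical `rename` helper that derives the term from the new name. The element lists follow the HTML syntax categories (void, raw text, RCDATA); foreign elements depend on context rather than on the name, so they are deliberately not handled here.

```javascript
// Element categories from the HTML syntax; everything else is treated as 'normal'.
const VOID = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'param', 'source', 'track', 'wbr'
])
const RAW_TEXT = new Set(['script', 'style'])
const RCDATA = new Set(['textarea', 'title'])

function termFor (name) {
  if (VOID.has(name)) return 'void'
  if (RAW_TEXT.has(name)) return 'raw-text'
  if (RCDATA.has(name)) return 'rcdata'
  return 'normal'
}

// Hypothetical helper: rename a node and keep spec.term in sync.
// spec.syntax is left as it was, since it describes how the tag was written.
function rename (node, name) {
  node.name = name
  node.spec = Object.assign({}, node.spec, { term: termFor(name) })
  return node
}

const node = rename(
  { type: 'element', name: 'p', spec: { term: 'normal', syntax: 'normal' }, content: [] },
  'br'
)
```

Whether the tree should store `term` at all, or derive it on demand as `termFor` does, is exactly the open design question.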

element node with attrs proposal :)

{
  type: 'element',
  name: 'div',
  spec: {},
  attrs: {
    id: 'text',
    class: 'text text',
    position: {
      id: { line: 1, column: 5, offset: 4 },
      class: {}
    }
  },
  content: [],
  position: {
    start: { line: 1, column: 1, offset: 0 },
    attrs: { id: {}, class: {} }, // or here maybe...
    end: { line: 1, column: 16, offset: 15 }
  }
}
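
The position objects in these examples pair a 1-based `line` and `column` with a 0-based `offset`. As a minimal sketch of how a parser could derive them, assuming lines are separated by `\n` only, a hypothetical `positionAt` helper might look like this:

```javascript
// Derive { line, column, offset } for a 0-based offset into the source.
// Assumption: lines are separated by '\n'; line and column are 1-based.
function positionAt (source, offset) {
  let line = 1
  let lineStart = 0
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === '\n') {
      line++
      lineStart = i + 1
    }
  }
  return { line, column: offset - lineStart + 1, offset }
}

const start = positionAt('<div id="text"></div>', 0)
const attr = positionAt('<div id="text"></div>', 4)
const secondLine = positionAt('a\nbc', 3)
```

Scanning from the start on every call is quadratic over a whole document; a real parser would more likely track the line and column incrementally while tokenizing.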

But you can contribute in the repo ;)

kk, yep, 'official' was meant more in the sense that there is no way back then ;). Are there any architectural/design patterns the parser will/should follow? If the format stands, should we 'polyfill' posthtml-parser with parse5 to start upgrading plugins and all the smaller core-related stuff?

file option 👀 ? 😍 In general, ways must be found to replace the path && fs deps in various plugins if browser usage is considered.

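import { load } from 'posthtml-utils'
// ...
  node.content = load('component.html') // note: load() returns a Promise
// ...
// in the direction like ...
// (readFile is Node's fs.readFile; isClient and request are placeholders)
  function load (url) {
    return new Promise((resolve, reject) => {
      if (isClient()) {
        // browser: fetch the file over the network
        request(url).then(resolve, reject)
      } else {
        // node: read the file from disk
        readFile(url, 'utf8', (err, html) => {
          if (err) return reject(err)
          resolve(html)
        })
      }
    })
  }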

That is a very good point! I'll think about it

👍

@michael-ciniawsky

Any updates on this ? :)

@michael-ciniawsky

michael-ciniawsky commented Oct 4, 2016

We could try one of these

- peg.js
- ohm.js

or vanilla :) Should the parser be able to handle spec-related fixes like parse5 does? I think it's better to be forgiving (svg, xml) like htmlparser2, while being as spec compliant as possible (or provide 2 different parsing modes if that is impossible to implement). Do you have other resources on parser architecture in mind/available?
