@andy0130tw
Last active May 27, 2019 00:07
Building a beautifully typeset e-book with CSS3 Paged Media feat. PrinceXML
This file is left blank intentionally.

README

"Read me, dot MD" ~WF, 2016

Some snippets I wrote while trying to convert "Butterick’s Practical Typography" into a PDF. This is an experiment for my project on publishing books from Markdown or HTML documents.

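The scripts must be run in a Node.js environment and use two dependencies. PrinceXML, the product used here, is free for non-commercial use.

This is only an experiment, and the output is for personal reading only. The code is pretty exciting, though (?). I even wrote a whole bunch of comments.

File descriptions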

gen.js

The generator. It stitches together and outputs the book's main HTML file, and pulls external URLs out of the main text into footnotes.
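
As a rough illustration of the stitching step, gen.js pulls each page's main block out with a regular expression before concatenating the pages. The sketch below applies the same pattern to a made-up page; `extractContent` and the sample HTML are hypothetical and exist only for this example.

```javascript
// Same pattern gen.js uses: grab <div class="content"> up to the last
// </div> that is followed by newline(s) and an HTML comment.
var CONTENT_RE = /<div class="content">[\s\S]+<\/div>(?=\n+<!--)/;

// Hypothetical helper: returns the matched block, or null if none is found.
function extractContent(html) {
  var result = html.match(CONTENT_RE);
  return result ? result[0] : null;
}

// Made-up sample page, not taken from the book.
var samplePage = [
  '<body>',
  '<div class="content"><p>Hello</p><div>nested</div></div>',
  '',
  '<!-- footer -->',
  '</body>'
].join('\n');

console.log(extractContent(samplePage));
```

Because the match is greedy and anchored on the trailing comment, nested `</div>` tags stay inside the extracted block. A page without such a trailing comment would not match at all, which is the case gen.js reports as "No partial found".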

book-index.js

The book's table of contents, some helper HTML fragments, and a few dirty functions for rewriting content.

book-util.js

Just the things the two files above share.

fix-css.js

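Some rules in the original stylesheet could not be overridden for reasons I don't understand, so this script simply strips them out of that stylesheet.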

pdf-styles.css

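Styles written for the PDF output, working together with the generator. They demonstrate CSS Paged Media features that most browsers have not implemented yet, such as left/right page layout, running headers and footers, page-break rules, PDF bookmarks, footnotes, cross-references, and a table of contents. There are also parts that browsers already implement widely, such as chapter counters. It also overrides some styles defined in the original stylesheet.

What about the rest?

For copyright reasons, files that contain the author's content, such as __prologue.html and __epilogue.html, are not included.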

To Kick Things Off

# wget the site (very dangerous; you may get banned easily)
wget --mirror --convert-links --adjust-extension --page-requisites ***URL_HERE***

# copy and paste from the mirrored files, with small additions, to produce:
#   * __cover.html
#   * __toc.html
#   * __prologue.html
#   * __epilogue.html

# install node dependencies
npm install css
npm install cheerio

# a global module to create a temporary server
# install it only if you don't have one
# IMPORTANT! this module is named node-static
# previously I'd mistaken the name!!
npm install -g node-static

# fix the css
node ./fix-css.js > styles-fix.css
# generate content to a file
node ./gen.js > __page.html

# this starts a local web server
static . &
# make the PDF
prince http://localhost:8080/__page.html -o 1.pdf -s styles-fix.css -s pdf-styles.css

# view the result (optional?)
xdg-open 1.pdf

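Notes

After a semester of solid training in DSnP and a talk by jserv on embedded systems, I suddenly remembered a project I had never finished implementing (as I recall, the idea came up after MOPCON). I wanted to practice solving problems entirely by hacking, and to find out what an architecture I developed in the shortest possible time could and could not do, so I started this experiment.

It took about 36 hours in total to write. Handling hyperlinks is probably where I spent the most time (and it is also the dirtiest part), while the side notes are what I think is the most neatly handled part of the whole thing. It could serve as hackathon practice, even though there is no business model.

Finally, I offer this as the end of my winter-break coding.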

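Acknowledgements

  1. Thanks to a junior schoolmate who proofread this very carefully for me. We often study typography together.
  2. Thanks to a kid who let me bombard him with bugs (and who showed off in front of me).
  3. Thanks to a senior schoolmate for chatting with me and letting me throw code at him >////<
  4. Thanks to you for reading to the end Q__Q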

book-util.js

var FONT_SIZE = 13;

module.exports = {
  SITE_DOMAIN: '***URL_HERE***',
  HOME_URL: 'index.html',
  FONT_SIZE: FONT_SIZE, // font size in pixels
  REPLACE_LINKS: true,
  replaceRemUnit: function(text) {
    return text.replace(/([\d.]+)rem/g, function(matched, p1) {
      var num = p1 - 0;
      if (!num) return '0';
      return Math.round(num * 1e8) * FONT_SIZE / 1e8 + 'px';
    });
  },
  collectSrc: function(entryArr, objRef) {
    // scan over the entries and collect locations
    // value 1 is for valid links
    entryArr.forEach(function(v) {
      if (v.src)
        objRef[v.src] = v.disabled ? 0 : 1;
    });
    return objRef;
  }
};
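
A quick way to see what the two helpers do is to call them on toy input. The sketch below copies both functions so it runs on its own; the sample CSS string and the `entries` array are made up for illustration (the real entries live in book-index.js, which is not shown here).

```javascript
var FONT_SIZE = 13; // same base size as book-util.js

// Copy of replaceRemUnit: rewrites every "<n>rem" as a pixel length.
function replaceRemUnit(text) {
  return text.replace(/([\d.]+)rem/g, function(matched, p1) {
    var num = p1 - 0;
    if (!num) return '0';
    return Math.round(num * 1e8) * FONT_SIZE / 1e8 + 'px';
  });
}

// Copy of collectSrc: maps each entry's src to 1 (enabled) or 0 (disabled).
function collectSrc(entryArr, objRef) {
  entryArr.forEach(function(v) {
    if (v.src)
      objRef[v.src] = v.disabled ? 0 : 1;
  });
  return objRef;
}

// Made-up inputs for illustration only.
var sampleCss = 'aside { margin: 1.5rem 0rem; width: 2.5rem; }';
var entries = [
  { src: 'a.html', title: 'A' },
  { src: 'b.html', title: 'B', disabled: true },
  { text: '<p>inline entry without a source file</p>' }
];

console.log(replaceRemUnit(sampleCss));
console.log(collectSrc(entries, {}));
```

Entries without a `src` are skipped by `collectSrc`, and a zero length loses its unit entirely, so `0rem` becomes a bare `0`.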

fix-css.js

#!/usr/bin/node
var bookUtil = require('./book-util');
/* WARNING: some dirty fix involved */
var fs = require('fs');
var process = require('process');

var result = fs.readFileSync('styles.css', 'utf-8')
  .replace(/@media.+/g, '')
  .replace(/max-width:1000px;/g, '')
  .replace(/min-width:520px;/g, '')
  .replace(/html {\s*height: 100%;\s*}/, '');
result = bookUtil.replaceRemUnit(result);
process.stdout.write(result);
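
The replacement chain is easiest to judge on a small input. The sketch below runs the same four patterns over a made-up stylesheet; `stripRules` and the sample CSS are hypothetical. Note that `/@media.+/g` only removes text up to the end of the line, so it presumably relies on each `@media` block sitting on a single line, as in a minified stylesheet.

```javascript
// Hypothetical wrapper around the same replace() chain as fix-css.js.
function stripRules(cssText) {
  return cssText
    .replace(/@media.+/g, '') // drops everything from "@media" to end of line
    .replace(/max-width:1000px;/g, '')
    .replace(/min-width:520px;/g, '')
    .replace(/html {\s*height: 100%;\s*}/, '');
}

// Made-up, minified-style sample; not the site's real stylesheet.
var sample = [
  'body{max-width:1000px;min-width:520px;color:#333}',
  '@media print{body{color:#000}}',
  'html { height: 100%; }',
  'p{margin:0}'
].join('\n');

console.log(JSON.stringify(stripRules(sample)));

// A multi-line @media block loses only its first line.
console.log(JSON.stringify(stripRules('@media print {\n  p{color:red}\n}')));
```

The second call shows the limitation: the rule body and closing brace survive, so the chain would need a different approach for a stylesheet that is not minified.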

gen.js

#!/usr/bin/node
var fs = require('fs');
var process = require('process');
var cheerio = require('cheerio');
var css = require('css');
var bookUtil = require('./book-util');
var bookIndex = require('./book-index');

function makeAnnotation(counter, title) {
  var lineLen = 71; // 78 char - 7 markup
  var hr = '#'.repeat(lineLen);
  var text = (counter ? counter + ' ' : '') + title;
  var len = text.length + 4; // 4 spaces
  var leftLen = Math.floor((lineLen - len) / 2);
  var rightLen = lineLen - len - leftLen;
  // two spaces on each side of the text, matching the 4 counted above
  return [hr, '#'.repeat(leftLen) + '  ' + text + '  ' + '#'.repeat(rightLen), hr]
    .map(function(v) {
      return '<!--' + v + '-->';
    }).join('\n');
}

function rPrefixCSSFN(entry) {
  return function(match, p1, p2, p3) {
    var ast = css.parse(p2);
    var rules = ast.stylesheet.rules;
    rules
      .filter(function(node) { return node.type == 'rule' })
      .forEach(function(node) {
        node.selectors = node.selectors.map(function(rule) {
          return '#' + entry.src.replace(/\./g, '\\.') + ' ' + rule;
        });
      });
    var p2p = css.stringify(ast);
    return p1 + '/* prefixed */' + p2p + p3;
  }
}

function rRewriteLinkFN() {
  // hey there, welcome to the most HARDCORE part of this snippet!
  return function(match, p1) {
    if (!bookUtil.REPLACE_LINKS) return match;
    var dom = cheerio('a', match, { decodeEntities: false });
    var href = dom.attr('href');
    var classStr = dom.attr('class');
    var classList = classStr ? classStr.split(' ') : [];
    var REPL_CONTENT = '__DUMMY_' + Math.random() + '__';
    dom.html(REPL_CONTENT);

    // empty links ------
    // without destination; maybe a marker
    if (!href || href == '#') {
      if (!dom.attr('name') || dom.attr('id')) return match;
      // princexml ignores empty tag...
      dom.css('visibility', 'hidden');
      return cheerio.html(dom).replace(REPL_CONTENT, '.');
    }

    // special transforms ------
    // not sure why the parser escaped it
    href = unescape(href);
    dom.attr('href', href);

    // some xrefs were never marked as such, but their href is still rewritten,
    // so the xref class is not reliable; we are actually preserving its style
    // and ignoring its meaning!

    // some links contain the hash (or named anchors)
    // fragments are not rewritten so maybe we can strip out the hash part here
    // well... name of fragments should be prefixed to prevent collision across documents
    var hashCharIdx = href.indexOf('#');
    var hashPart; // INCLUDING # char
    if (hashCharIdx >= 0) {
      var actualHref = href.substring(0, hashCharIdx);
      // if the destination is inside a disabled entry we have to say sorry
      if (internalSrc[actualHref]) {
        // strip out hash part
        href = actualHref;
        // NOTE: href has just been truncated, so this substring is always ''
        // and the "type 3" branch below never fires; the link ends up as type 1 or 2
        hashPart = href.substring(hashCharIdx);
      }
    }

    if (
      // 1) for linking to TOC
      href == bookUtil.HOME_URL ||
      // 2) links on the TOC; 0 if disabled or blacklisted
      internalSrc.hasOwnProperty(href) ||
      // 3) a hash or indirectly a hash
      hashPart) {
      // cross-reference-style links ------
      if (!internalSrc[href]) {
        // disabled links get no page hint
        dom.addClass('__link_disabled__');
        // but preserve their linking ability
        href = bookUtil.SITE_DOMAIN + href;
      } else {
        dom.addClass('__show_page_hint__');
        if (href == bookUtil.HOME_URL) // type 1
          href = '#toc';
        else if (hashPart) // type 3
          href = href + hashPart;
        else // type 2 and NOT disabled
          href = '#' + href;
      }
      dom.attr('href', href);
    } else {
      // external links ------
      // -> pointing to the original website (unreliable method)
      // ... payment link must be handled externally
      // ... if (href.indexOf('bc.html?') == 0) {
      if (href.indexOf(':') < 0) {
        href = bookUtil.SITE_DOMAIN + href;
        dom.attr('href', href);
      }
      // else -> pointing to the universe
      // ... add something indicating external links having no default styling
      if (!classList.length)
        dom.addClass('__ext_link_default__');
      p1 += '<span class="__footnote__">' + unescape(href) + '</span>';
      // return match + '<span class="__footnote__">' + unescape(href) + '</span>';
    }
    return cheerio.html(dom).replace(REPL_CONTENT, p1);
    // console.log('unhandled link: ' + p1, href, classList);
  }
}

var internalSrc = bookUtil.collectSrc(bookIndex, {});

var content = bookIndex
  .filter(function(v) { return !v.disabled })
  .map(function(v, i) {
    var text;
    var debugMsg;
    if (v.text)
      text = v.text;
    else if (v.src) {
      text = fs.readFileSync('practicaltypography.com/' + v.src, 'utf-8');
      debugMsg = v.src;
    } else
      throw new Error('No content/source for entry #"' + i + ': ' + JSON.stringify(v) + '"!');

    if (v.src && !v.partial) {
      // dig out the main content
      var re = /<div class="content">[\s\S]+<\/div>(?=\n+<!--)/;
      var result = text.match(re);
      if (!result)
        throw new Error('No partial found in file "' + v.src + '"!');
      text = result[0];
    }

    // apply transformer if available
    if (v.transformer)
      text = v.transformer(text);

    if (v.src && !v.skipDefaultTransform) {
      // fix horrible, unsupported rem unit
      text = bookUtil.replaceRemUnit(text);
      text = text
        // strip empty areas that steal spaces
        .replace(/<a id="links"><\/a>/g, '')
        .replace(/<ul class="children"><\/ul>/, '')
        .replace(/<div style="height:1em"><\/div>/, '')
        // strip and prefix inline css
        .replace(/(<style.+?>)([\s\S]*?)(<\/style>)/g, rPrefixCSSFN(v))
        // strip out links
        .replace(/<a\s+[\s\S]*?>([\s\S]*?)<\/a>/g, rRewriteLinkFN());
      // wrap it around; adding information about the section
      if (!v.partial)
        text = '<div class="__section__" id="' + v.src + '" data-title="' + v.title + '">' + text + '</div>';
    }

    var annotation = v.title ? (makeAnnotation(v.counter, v.title) + '\n') : '';
    // if (debugMsg)
    //   process.stderr.write('entry processed: ' + v.src + '\n');
    return annotation + text;
  });

process.stdout.write(fs.readFileSync('__prologue.html', 'utf-8'));
process.stdout.write(content.join('\n\n'));
process.stdout.write(fs.readFileSync('__epilogue.html', 'utf-8'));
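
The href handling in rRewriteLinkFN can be hard to follow through the cheerio calls, so here is a simplified sketch of just the decision it makes for non-empty links. It is a restatement under assumptions, not the generator itself: `rewriteHref`, the `kind` labels, the sample `internalSrc` map, and the `https://example.com/` domain are all hypothetical, and class handling and the footnote span are left out. It mirrors the listing as shown, where a fragment on an enabled page is dropped and the link points at the whole section.

```javascript
// Simplified restatement of the href decision in rRewriteLinkFN.
// internalSrc maps file names to 1 (enabled) or 0 (disabled).
function rewriteHref(href, internalSrc, homeUrl, siteDomain) {
  var hashIdx = href.indexOf('#');
  if (hashIdx >= 0) {
    var stem = href.substring(0, hashIdx);
    // only strip the fragment when the page itself is enabled
    if (internalSrc[stem]) href = stem;
  }
  var known = Object.prototype.hasOwnProperty.call(internalSrc, href);
  if (href === homeUrl || known) {
    if (!internalSrc[href])
      return { kind: 'disabled', href: siteDomain + href };
    if (href === homeUrl)
      return { kind: 'toc', href: '#toc' };
    return { kind: 'internal', href: '#' + href };
  }
  // anything else is external; relative URLs are pointed back at the site
  if (href.indexOf(':') < 0) href = siteDomain + href;
  return { kind: 'external', href: href };
}

// Made-up data for illustration only.
var sampleSrc = { 'index.html': 1, 'a.html': 1, 'b.html': 0 };
var SITE = 'https://example.com/';
['index.html', 'a.html#part', 'b.html', 'fonts.html', 'https://example.org/x']
  .forEach(function(h) {
    console.log(h, '->', rewriteHref(h, sampleSrc, 'index.html', SITE));
  });
```

In the real generator the `internal` and `toc` cases get the `__show_page_hint__` class, the `disabled` case gets `__link_disabled__`, and the `external` case gets a `__footnote__` span carrying the URL.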

pdf-styles.css

img { image-resolution: auto, 300dpi; }
* { background-image-resolution: auto, 300dpi; }

/***** metric-dependent stuffs *****/
/*
  I personally convert rem to actual length units by scaling
  down from a 1000px (its max-width of <body>) wide viewport

   2.5    7     2       ?        2.5  (rem)
  +--------------------------------+
  |                                |  3
  |  +-------+  +------------+     |
  |  | ..... |  | xxxxx...   |     |
  |  | ..... |  |            |     |
  =  -       -  -            -     =
      ^^^^^^^ side note

  so we simply get the factor: 1000 px -> 5.8 inches, and
  it looks really good when font size is set to 12-14px.
  (read the 4th point of "Typography in Ten Minutes" to see
  why I said this)
*/
html { font-size: 13px; }

@page {
  size: A5; /* 5.8in x 8.3in */
  margin-top: .518in; /* 72px + .1in */
  margin-bottom: .318in; /* 72px - .1in */
  margin-inside: .35in; /* ~60px */
  margin-outside: 1.6in; /* ~276px */
  padding-top: .1in;
  @footnotes {
    /*border-top: solid #777 thin;*/
    margin-top: .15in;
    max-height: 8in;
  }
}
@page cover {
  margin: .2in; /* casual */
  padding: 0;
}
@page:left {
  @top-right { font-size: 12px; }
  @top-left-corner { font-size: 14px; margin-outside: .25in; }
}
@page:right {
  @top-left { font-size: 12px; }
  @top-right-corner { font-size: 14px; margin-outside: .25in; }
}

/* delicate side notes (be careful since they are fragile) */
title-block, aside {
  /* width 0.974in
     gut   0.276in
     reset the metrics caused by a bug where negative
     width object will collapse in float environments */
  width: .974in;
  margin-outside: -1.25in;
  margin-inside: .276001in;
  /*width: 1.526001in;*/
  /*border-left: .276in solid transparent;*/
  /*border-right: .276in solid transparent;*/
}

/* fix on margin-collapsing on indented elements */
li aside {
  /* add .17in (32.5px ~= .34in, make it half) */
  /* that way, side notes on the right will be unaffected
     but those on the left will be shifted by twice the amount */
  margin-outside: -1.42in;
  margin-inside: .446001in;
  right: .17in;
}

/***** page definitions *****/
@page:left {
  @top-right {
    content: counter(chapter) '\2003' string(chapter-title);
    font-family: 'equity-caps';
    text-transform: lowercase;
  }
  @top-left-corner {
    text-align: left;
    content: counter(page);
    font-family: 'concourse-t3';
    font-weight: 700;
  }
}
@page:right {
  @top-left {
    content: counter(chapter) '.' counter(section) '\2003' string(section-title);
    font-family: 'equity-caps';
    text-transform: lowercase;
  }
  @top-right-corner {
    text-align: right;
    content: counter(page);
    font-family: 'concourse-t3';
    font-weight: 700;
  }
}
@page cover {
  @top-left-corner { content: normal; }
  @top-right-corner { content: normal; }
}
@page:first:left { @top-right { content: normal; } }
@page:first:right { @top-left { content: normal; } }
@page toc:left {
  @top-right { content: 'table of content'; }
  @top-left-corner { content: counter(page, upper-roman); }
}
@page toc:right {
  @top-left { content: 'table of content'; }
  @top-right-corner { content: counter(page, upper-roman); }
}
@page partBeforeMain:left { @top-right { content: string(chapter-title); } }
@page partBeforeMain:right { @top-left { content: string(chapter-title); } }
@page partAppendix:left { @top-right { content: 'appendix'; } }
@page partAppendix:right { @top-left { content: 'A.' counter(section) '\2003' string(section-title); } }
@page endCredits:left { @top-right { content: 'end credits'; } }
@page endCredits:right { @top-left { content: 'end credits'; } }
@page blank, :blank {
  @top-left { content: normal; }
  @top-left-corner { content: normal; }
  @top-right { content: normal; }
  @top-right-corner { content: normal; }
}

/* holy debugging zone */
/*
@page partBeforeMain:left { background-color: red; }
@page partBeforeMain:right { background-color: green; }
@page blank { background: black; }
@page content { background: yellow; }
@page cover { background: purple; }
*/

/***** styles.css override BEGIN *****/
body > * {
  margin-left: 0 !important;
  margin-right: 0 !important;
}
div.content {
  display: block;
  padding: 0;
}
body {
  /* the original css has it wrong */
  -moz-font-feature-settings: 'kern=1', 'liga=1';
  -moz-font-feature-settings: 'kern' 1, 'liga' 1;
  -webkit-font-feature-settings: 'kern' 1, 'liga' 1;
  -o-font-feature-settings: 'kern' 1, 'liga' 1;
  -ms-font-feature-settings: 'kern' 1, 'liga' 1;
  font-feature-settings: 'kern' 1, 'liga' 1;
}
/* princeXML doesn't support `font-feature-settings` for now,
   but it supports `font-variant` with `prince-` prefixed flavor
   not knowing if it works */
body {
  font-variant: prince-opentype(kern, liga);
}
sig,
.subhead, .howto-name,
program,
a.xref,
toc-topic {
  font-variant: prince-opentype(c2sc);
}
ol li {
  font-variant: prince-opentype(liga, ss01 off);
}
.btw-title,
a.specimen, table.buy-table td a, a.direct-payment {
  font-variant: prince-opentype(case);
}
a.__ext_link_default__ {
  /* there is no hover effect in a PDF */
  /* only affect unhandled && unstyled links */
  background: #fbf3f3;
  border-radius: 4px;
}
title-block, aside {
  position: relative;
  float: outside;
  left: auto;
  text-align: inside !important;
  display: block;
  overflow: auto;
  box-sizing: border-box;
}
aside.__old_float_method__ {
  position: absolute;
  right: 0;
  margin-outside: 0;
  margin-inside: 0;
}
/* to fix the nasty border-top issue; no longer needed */
/*title-block:before, title-block:after {
  content: '';
  display: block;
  position: absolute;
  background: white;
  height: 4px;
  width: .276in;
  top: -3px;
}
title-block:before { left: -.276in; }
title-block:after { right: -.276in; }*/
/***** styles.css override END *****/

/***** page breaking policy *****/
.__page_toc__ { page-break-after: right; }
.__chapter__ { page-break-after: right; prince-page-group: start; }
.__page_partBeforeMain__ .__chapter__ { page-break-after: always; }
.__section__ { page-break-after: always; }
topic {
  page-break-inside: avoid;
  page-break-after: avoid;
}
.btw-title,
.subhead,
.howto-name {
  page-break-after: avoid;
}
.btw li,
#the-infinite-pixel-screen\.html table {
  page-break-inside: avoid;
}

/***** book structure *****/
.__blank__ { page: blank; }
.__page_cover__ { page: cover; }
.__page_toc__ { page: toc; }
.__page_partBeforeMain__ { page: partBeforeMain; }
.__page_appendix__ { page: partAppendix; }
.__page_endCredits__ { page: endCredits; }
.__reset_page_number__ { counter-reset: page 1; }
/*.__reset_chapter_number__ { counter-reset: chapter; }*/
.__chapter__ {
  counter-increment: chapter;
  counter-reset: section;
}
.__section__ {
  counter-increment: section;
}
/* do not increment a counter for the first section of each chapter */
.__chapter__ .__section__:first-child {
  counter-increment: none;
}
.__chapter__ .__section__:first-child topic {
  string-set: chapter-title content(), section-title content();
}
.__chapter__ .__section__:first-child {
  bookmark-label: attr(data-title);
  bookmark-state: closed;
  bookmark-level: 1;
}
.__section__ topic {
  string-set: section-title content();
}
.__section__ {
  bookmark-label: attr(data-title);
  bookmark-level: 2;
  /*margin-bottom: .418in;*/
  margin-bottom: .627in;
}
.__page_cover__ {
  bookmark-label: 'Cover';
  bookmark-level: 1;
}
.__page_toc__ {
  bookmark-label: 'Table of Content';
  bookmark-level: 1;
}
.__page_appendix__ {
  counter-reset: section;
}
.__page_endCredits__ topic {
  bookmark-label: content();
  bookmark-level: 2;
}

/* hide footnotes by default */
.__footnote__ { display: none; }

/********************************************************/
/******** link-to-foot-note conversion for books ********/
/******** remove lines below when making e-books ********/
/********************************************************/
.__section__ { counter-reset: footnote; }
.__footnote__ {
  display: initial;
  font: initial;
  float: footnote;
  word-break: break-all;
  text-align: left;
  color: #777;
  font-family: 'equity-text';
  font-size: 8px;
  line-height: 1;
}
.__footnote__::footnote-call {
  /* from the lovely circle */
  position: relative;
  color: #933;
  font-size: 80%;
  top: -1em;
  margin-left: .10em;
  font-family: 'concourse-t3';
}
type-specimen .__footnote__ {
  display: none;
}
a {
  background: initial;
  border-radius: initial;
}
a:after {
  content: '';
}
a.xref.__show_page_hint__ {
  /* don't break page with a page number */
  page-break-inside: avoid;
}
a.xref.__show_page_hint__:after {
  font-size: 50%;
  position: relative;
  top: -.6em;
  margin-left: .10em;
  font-family: 'concourse-t3';
  content: 'p.\00a0' target-counter(attr(href), page);
  /*content: '\00a0(' target-counter(attr(href), chapter) '.' target-counter(attr(href), section) ')';*/
}
a.toc.__show_page_hint__:after,
ul.children a.xref.__show_page_hint__:after {
  position: static;
  color: inherit;
  font-size: 100%;
  content: leader('.') target-counter(attr(href), page);
}
.__page_toc__ toc-topic a.toc.__show_page_hint__:after {
  content: leader('') target-counter(attr(href), page);
}
.content ul.children a.xref.__show_page_hint__:after {
  font-family: 'equity-caps';
}
/* style for disabled links on the (main or per-chapter) TOC
   the __show_page_hint__ class is never applied to them during processing
   so there is no need to override the content of their :after
   you can hide them but I don't recommend that
   you can add back the red background (and that will be lovely indeed >///<)
*/
.__page_toc__ toc-topic a.toc.__link_disabled__ { }
.content ul.children a.xref.__link_disabled__ { color: #667; }
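
The margin values in the `@page` rule follow from the scaling factor stated in the stylesheet's comment (a 1000px-wide layout maps to 5.8in). The sketch below redoes that arithmetic; `pxToIn` is a hypothetical helper, and the pixel figures are the ones quoted in the comments next to each margin.

```javascript
// Scale factor from the stylesheet comment: 1000px of layout maps to 5.8in.
var INCHES_PER_PX = 5.8 / 1000;

// Hypothetical helper: convert a length in the site's px to inches on the page.
function pxToIn(px) {
  return px * INCHES_PER_PX;
}

// Margins as annotated in the @page rule.
var margins = {
  top: pxToIn(72) + 0.1,    // "72px + .1in", declared as .518in
  bottom: pxToIn(72) - 0.1, // "72px - .1in", declared as .318in
  inside: pxToIn(60),       // "~60px", declared as .35in
  outside: pxToIn(276)      // "~276px", declared as 1.6in
};

Object.keys(margins).forEach(function(side) {
  console.log(side + ': ' + margins[side].toFixed(3) + 'in');
});
```

The computed values land within a few thousandths of an inch of the declared ones, which is consistent with the "~" in the original comments.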