@zhgchgli0718
Last active May 22, 2022 02:50

Background

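I'm writing a small tool that fully converts a Medium post to Markdown: given an article link, it automatically scrapes the content, downloads the images, and converts everything to Markdown.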

Tech

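The scraper pulls the front-end JSON source of the Medium article. The source contains information about every paragraph; the tool walks through the paragraphs one by one and converts each to Markdown.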

Question

The JSON source describes a paragraph's styling in the following format:

"Paragraph": {
    "text": "code in text, and link in text, and ZhgChgLi, and bold, and I, only i",
    "markups": [
      {
        "type": "CODE",
        "start": 5,
        "end": 7
      },
      {
        "start": 18,
        "end": 22,
        "href": "http://zhgchg.li",
        "type": "LINK"
      },
      {
        "type": "STRONG",
        "start": 50,
        "end": 63
      },
      {
        "type": "EM",
        "start": 55,
        "end": 69
      }
    ]
  }

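This means that, in the text code in text, and link in text, and ZhgChgLi, and bold, and I, only i (offsets are 0-based and, judging by the expected output below, end is exclusive):

- characters 5 to 7 are marked as Code (wrapped as `Text`)
- characters 18 to 22 are marked as a URL (wrapped as [Text](URL))
- characters 50 to 63 are marked as Strong (wrapped as **Text**)
- characters 55 to 69 are marked as EM (wrapped as _Text_)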
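To make the offsets concrete, here is a minimal Python sketch that slices the text for each markup (Python is used only for illustration). It assumes offsets are 0-based positions into the string with `end` exclusive, which is what the expected output implies; `covered_text` is a hypothetical helper name.

```python
text = "code in text, and link in text, and ZhgChgLi, and bold, and I, only i"
markups = [
    {"type": "CODE", "start": 5, "end": 7},
    {"type": "LINK", "start": 18, "end": 22, "href": "http://zhgchg.li"},
    {"type": "STRONG", "start": 50, "end": 63},
    {"type": "EM", "start": 55, "end": 69},
]


def covered_text(text, markup):
    """Return the substring a markup covers (assumption: `end` is exclusive)."""
    return text[markup["start"]:markup["end"]]


for markup in markups:
    # repr() makes leading and trailing spaces visible.
    print(markup["type"], repr(covered_text(text, markup)))
```

Note that the STRONG and EM ranges both begin or end on a space, which matters once the delimiters are inserted.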

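Ranges 5-7 and 18-22 are easy to handle in this example because they don't overlap. Ranges 50-63 and 55-69 do overlap, and Markdown cannot express them by interleaving the delimiters like this:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, **only i_

The correct combined result is:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, _**_only i_

That is, the overlapping ranges are split into segments: 50-55 STRONG; 55-63 STRONG + EM; 63-69 EM.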
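The segment breakdown can be computed by cutting the text at every markup boundary. A minimal Python sketch, assuming half-open `[start, end)` offsets; `split_segments` is a hypothetical name, not part of any existing tool:

```python
text = "code in text, and link in text, and ZhgChgLi, and bold, and I, only i"
markups = [
    {"type": "CODE", "start": 5, "end": 7},
    {"type": "LINK", "start": 18, "end": 22, "href": "http://zhgchg.li"},
    {"type": "STRONG", "start": 50, "end": 63},
    {"type": "EM", "start": 55, "end": 69},
]


def split_segments(text, markups):
    """Cut the text at every markup boundary.

    Returns (start, end, types) tuples, where `types` lists every markup
    type that fully covers the segment.
    """
    bounds = sorted(
        {0, len(text)}
        | {m["start"] for m in markups}
        | {m["end"] for m in markups}
    )
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        active = [
            m["type"] for m in markups
            if m["start"] <= start and end <= m["end"]
        ]
        segments.append((start, end, active))
    return segments


for start, end, types in split_segments(text, markups):
    print(start, end, types, repr(text[start:end]))
```

Each segment has a fixed set of styles, so overlap no longer exists at the segment level; what remains is deciding how to open and close delimiters between neighbouring segments.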

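Also note that the opening and closing strings of a wrapper must be distinguishable. Strong just happens to use ** for both; for a Link the opening is [ and the closing is ](URL).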
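One way to keep opening and closing strings separate is to look them up as a pair per markup. A small Python sketch with hypothetical `delimiters` and `wrap` helpers; treating both `LINK` and `A` as links is an assumption based on the two samples in this post:

```python
def delimiters(markup):
    """Return the (opening, closing) Markdown strings for one markup."""
    kind = markup["type"]
    if kind == "CODE":
        return "`", "`"
    if kind == "STRONG":
        return "**", "**"
    if kind == "EM":
        return "_", "_"
    if kind in ("LINK", "A"):  # assumption: both types mean a hyperlink
        return "[", "](" + markup["href"] + ")"
    raise ValueError("unsupported markup type: " + kind)


def wrap(fragment, markup):
    """Wrap a text fragment in the delimiters of a single markup."""
    opening, closing = delimiters(markup)
    return opening + fragment + closing


link = {"type": "LINK", "start": 18, "end": 22, "href": "http://zhgchg.li"}
print(wrap("link", link))
print(wrap("bold", {"type": "STRONG", "start": 50, "end": 63}))
```

Returning a pair means any close-and-reopen logic can ask for the closing string on its own, without assuming it equals the opening one.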

Result

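Whoever proposes a working algorithm will be credited in the README, for algorithm and technical support, once the project is finished and open-sourced.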

Use case 2

"Paragraph": {
    "text": "iCloud Private Relay is an iCloud+ service that prevents networks and servers from monitoring a person’s activity across the internet. Discover how your app can participate in this transition to a more secure and private internet: We’ll show you how to prepare your apps, servers, and networks to work with iCloud Private Relay.",
    "markups": [
      {
        "type": "A",
        "start": 24,
        "end": 201,
        "href": "https://medium.com/"
      },
      {
        "type": "STRONG",
        "start": 48,
        "end": 65
      },
      {
        "type": "STRONG",
        "start": 125,
        "end": 147
      },
      {
        "type": "EM",
        "start": 27,
        "end": 133
      }
    ]
  }

result:

iCloud Private Relay is [an _iCloud+ service that _**_prevents networks_**_ and servers from monitoring a person’s activity across the _**_internet_. Discover how** your app can participate in this transition to a more](https://medium.com) secure and private internet: We’ll show you how to prepare your apps, servers, and networks to work with iCloud Private Relay.

iCloud Private Relay is an _iCloud+ service that prevents networks and servers from monitoring a person’s activity across the _internet. Discover how your app can participate in this transition to a more secure and private internet: We’ll show you how to prepare your apps, servers, and networks to work with iCloud Private Relay.

@Nick0603

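Ranges 5-7 and 18-22 are easy to handle in this example because they don't overlap. Ranges 50-63 and 55-69 do overlap, and Markdown cannot express them by interleaving the delimiters like this:

I don't quite follow the combination you call correct. I would expect the correct combination to look like this:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold, _and I,_** _only i_

Because of the overlap, EM has to be split into a part applied inside STRONG and a part applied outside it.
One note up front: I found that if the first character inside ** or _ is a space, the syntax also breaks, so the boundaries may need to be shifted when the markup is applied.

As for the algorithm, I wonder whether the text should be converted into some kind of decorator form when the styles are applied, with styles added layer by layer toward the inside, and everything emitted together at the end.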
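One possible reading of this idea, as a sketch rather than a finished algorithm: walk the text character by character, keep a stack of open markups, close and reopen inner layers when an outer one ends, and hold whitespace back so delimiters always touch non-space characters. All function names are hypothetical, offsets are assumed to be half-open, and `LINK`/`A` are both assumed to mean a hyperlink.

```python
PAIRS = {"CODE": ("`", "`"), "STRONG": ("**", "**"), "EM": ("_", "_")}


def delimiters(markup):
    """Return (opening, closing) strings; links close with the URL."""
    if markup["type"] in ("LINK", "A"):
        return "[", "](" + markup["href"] + ")"
    return PAIRS[markup["type"]]


def to_markdown(text, markups):
    out = []       # output pieces, joined at the end
    stack = []     # indices of open markups, outermost first
    pending = ""   # whitespace held back until we know what closes before it
    for pos, ch in enumerate(text):
        if ch.isspace():
            pending += ch
            continue
        wanted = [i for i, m in enumerate(markups)
                  if m["start"] <= pos < m["end"]]
        # Close from the top of the stack down to the outermost markup
        # that no longer applies; still-wanted layers are reopened below.
        stale = [depth for depth, i in enumerate(stack) if i not in wanted]
        if stale:
            while len(stack) > stale[0]:
                out.append(delimiters(markups[stack.pop()])[1])
        out.append(pending)  # whitespace lands after closers, before openers
        pending = ""
        # Open what is wanted but not open, longest-lasting markup first.
        to_open = sorted((i for i in wanted if i not in stack),
                         key=lambda i: -markups[i]["end"])
        for i in to_open:
            out.append(delimiters(markups[i])[0])
            stack.append(i)
        out.append(ch)
    while stack:  # close whatever is still open at the end of the text
        out.append(delimiters(markups[stack.pop()])[1])
    out.append(pending)
    return "".join(out)


text = "code in text, and link in text, and ZhgChgLi, and bold, and I, only i"
markups = [
    {"type": "CODE", "start": 5, "end": 7},
    {"type": "LINK", "start": 18, "end": 22, "href": "http://zhgchg.li"},
    {"type": "STRONG", "start": 50, "end": 63},
    {"type": "EM", "start": 55, "end": 69},
]
print(to_markdown(text, markups))
```

On the first sample this prints the nested form proposed in this comment. Escaping of literal `*` or `_`, markup inside code spans, and delimiters that end up next to punctuation are not handled.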

@Nick0603

Nick0603 commented May 21, 2022
