@zhgchgli0718
Last active May 22, 2022

Background

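I am writing a small tool that fully converts a Medium post to Markdown: given an article link, it automatically fetches the content, downloads the images, and converts everything to Markdown.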

Tech

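The tool reads the front-end JSON source of the Medium article. The source describes every paragraph, so the tool walks through the paragraphs one by one and converts each to Markdown.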

Question

The JSON source represents a paragraph's styling in the following format:

"Paragraph": {
    "text": "code in text, and link in text, and ZhgChgLi, and bold, and I, only i",
    "markups": [
      {
        "type": "CODE",
        "start": 5,
        "end": 7
      },
      {
        "start": 18,
        "end": 22,
        "href": "http://zhgchg.li",
        "type": "LINK"
      },
      {
        "type": "STRONG",
        "start": 50,
        "end": 63
      },
      {
        "type": "EM",
        "start": 55,
        "end": 69
      }
    ]
  }

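This means that, in the text code in text, and link in text, and ZhgChgLi, and bold, and I, only i (offsets are 0-based and the end offset is exclusive):

- Characters 5 to 7 are marked as Code (wrapped as `Text`)
- Characters 18 to 22 are marked as a link (wrapped as [Text](URL))
- Characters 50 to 63 are marked as Strong (wrapped as **Text**)
- Characters 55 to 69 are marked as Em (wrapped as _Text_)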
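As a quick sanity check of the offsets listed above, the following Python sketch slices the sample text with each markup's start and end. It assumes the offsets are 0-based with an exclusive end, which is what the sample data suggests; `covered_text` is a hypothetical helper name, not part of the tool.

```python
# Sample paragraph from the question. Offsets are assumed to be
# 0-based with an exclusive end, which matches Python slicing.
text = "code in text, and link in text, and ZhgChgLi, and bold, and I, only i"
markups = [
    {"type": "CODE", "start": 5, "end": 7},
    {"type": "LINK", "start": 18, "end": 22, "href": "http://zhgchg.li"},
    {"type": "STRONG", "start": 50, "end": 63},
    {"type": "EM", "start": 55, "end": 69},
]


def covered_text(text, markup):
    """Return the substring that a markup applies to (hypothetical helper)."""
    return text[markup["start"]:markup["end"]]


for m in markups:
    print(m["type"], repr(covered_text(text, m)))
# CODE 'in'
# LINK 'link'
# STRONG 'bold, and I, '
# EM ' and I, only i'
```

Note that the STRONG range ends with a space and the EM range starts with one.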

Ranges 5-7 and 18-22 are easy to handle in this example because they do not overlap. Ranges 50-63 and 55-69 do overlap, and Markdown cannot express interleaved markers like this:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, **only i_

The correct combined result is:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, _**_only i_

In other words, the overlapping ranges are split into three segments:

- 50-55: STRONG
- 55-63: STRONG, EM
- 63-69: EM
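The segment breakdown above can be computed mechanically. The following is a minimal Python sketch, not necessarily how the tool does it: it cuts the text at every markup boundary and records which markup types cover each piece. `split_segments` is a hypothetical name.

```python
def split_segments(length, markups):
    """Cut [0, length) at every markup boundary and list the markup
    types that cover each resulting piece (hypothetical helper)."""
    cuts = {0, length}
    for m in markups:
        cuts.update((m["start"], m["end"]))
    points = sorted(cuts)

    segments = []
    for lo, hi in zip(points, points[1:]):
        # A markup is active on a piece when it fully covers that piece.
        active = [m["type"] for m in markups
                  if m["start"] <= lo and hi <= m["end"]]
        segments.append((lo, hi, active))
    return segments


markups = [
    {"type": "STRONG", "start": 50, "end": 63},
    {"type": "EM", "start": 55, "end": 69},
]
for lo, hi, active in split_segments(69, markups):
    if active:
        print(f"{lo}-{hi}", ", ".join(active))
# 50-55 STRONG
# 55-63 STRONG, EM
# 63-69 EM
```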

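Also note that the opening and closing wrapper strings must be distinguishable. Strong just happens to use ** for both, but a Link opens with [ and closes with ](URL).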
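One possible way to turn such markups into Markdown while keeping the opening and closing strings separate is sketched below in Python. It is only an illustration under stated assumptions: `PRIORITY`, `opener`, `closer`, and `render` are hypothetical names, and the fixed nesting order (link outermost, then STRONG, then EM, then CODE) is an assumption chosen for this sketch. Between consecutive boundaries it compares the markups that were open with the ones that should be open, closes everything past their common prefix from the inside out, and opens whatever is new.

```python
# Assumed nesting order for this sketch: lower value = further outside.
PRIORITY = {"LINK": 0, "A": 0, "STRONG": 1, "EM": 2, "CODE": 3}
OPENERS = {"LINK": "[", "A": "[", "STRONG": "**", "EM": "_", "CODE": "`"}


def opener(m):
    return OPENERS[m["type"]]


def closer(m):
    # Links close with a different string than they open with.
    if m["type"] in ("LINK", "A"):
        return f"]({m['href']})"
    return OPENERS[m["type"]]


def render(text, markups):
    cuts = sorted({0, len(text)}
                  | {m[k] for m in markups for k in ("start", "end")})
    out, stack = [], []  # stack: indexes into markups, outermost first
    for lo, hi in zip(cuts, cuts[1:]):
        active = [i for i, m in enumerate(markups)
                  if m["start"] <= lo and hi <= m["end"]]
        active.sort(key=lambda i: PRIORITY[markups[i]["type"]])
        # Length of the common prefix of the old and new open markups.
        keep = 0
        while (keep < min(len(stack), len(active))
               and stack[keep] == active[keep]):
            keep += 1
        out.extend(closer(markups[i]) for i in reversed(stack[keep:]))
        out.extend(opener(markups[i]) for i in active[keep:])
        out.append(text[lo:hi])
        stack = active
    out.extend(closer(markups[i]) for i in reversed(stack))
    return "".join(out)


text = "code in text, and link in text, and ZhgChgLi, and bold, and I, only i"
markups = [
    {"type": "CODE", "start": 5, "end": 7},
    {"type": "LINK", "start": 18, "end": 22, "href": "http://zhgchg.li"},
    {"type": "STRONG", "start": 50, "end": 63},
    {"type": "EM", "start": 55, "end": 69},
]
print(render(text, markups))
# code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, _**_only i_
```

The sketch does not escape Markdown special characters and does not move whitespace out of the markers, so a real converter would need more than this.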

Result

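Whoever proposes a working algorithm will be credited in the project's README, once the project is finished and open-sourced, for providing algorithm support.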

Use case 2

"Paragraph": {
    "text": "iCloud Private Relay is an iCloud+ service that prevents networks and servers from monitoring a person’s activity across the internet. Discover how your app can participate in this transition to a more secure and private internet: We’ll show you how to prepare your apps, servers, and networks to work with iCloud Private Relay.",
    "markups": [
      {
        "type": "A",
        "start": 24,
        "end": 201,
        "href": "https://medium.com/"
      },
      {
        "type": "STRONG",
        "start": 48,
        "end": 65
      },
      {
        "type": "STRONG",
        "start": 125,
        "end": 147
      },
      {
        "type": "EM",
        "start": 27,
        "end": 133
      }
    ]
  }

result:

iCloud Private Relay is [an _iCloud+ service that _**_prevents networks_**_ and servers from monitoring a person’s activity across the _**_internet_. Discover how** your app can participate in this transition to a more](https://medium.com/) secure and private internet: We’ll show you how to prepare your apps, servers, and networks to work with iCloud Private Relay.

iCloud Private Relay is an _iCloud+ service that prevents networks and servers from monitoring a person’s activity across the _internet. Discover how your app can participate in this transition to a more secure and private internet: We’ll show you how to prepare your apps, servers, and networks to work with iCloud Private Relay.

Nick0603 commented May 21, 2022

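Ranges 5-7 and 18-22 are easy to handle in this example because they do not overlap. Ranges 50-63 and 55-69 do overlap, and Markdown cannot express interleaved markers like this:

I don't quite follow the correct combination you describe. I would expect the correct result to look like this:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold, _and I,_** _only i_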

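Because of the overlap, EM has to be split and applied separately to the text inside STRONG and the text outside it.
Note: I found that if the first character inside ** or _ is a space, the syntax also breaks, so the markers may need to be shifted when they are applied.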
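The space issue noted above could be handled by a small helper like the following Python sketch, which keeps leading and trailing whitespace outside the markers. `wrap` is a hypothetical name, and the sketch assumes the same marker string opens and closes the span (so it fits STRONG and EM, not links).

```python
def wrap(segment, marker):
    """Wrap segment in marker, leaving any leading or trailing
    whitespace outside the markers (hypothetical helper)."""
    core = segment.strip()
    if not core:
        return segment  # whitespace only: emit it unstyled
    lead = segment[:len(segment) - len(segment.lstrip())]
    trail = segment[len(segment.rstrip()):]
    return f"{lead}{marker}{core}{marker}{trail}"


print(repr(wrap(" and I, ", "_")))  # ' _and I,_ '
print(repr(wrap("only i", "_")))    # '_only i_'
```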

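As for the algorithm, I wonder whether the text should be converted into some kind of decorator form when styles are applied, with styles added layer by layer toward the inside and everything rendered together at the end.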
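One loose reading of this layered idea, sketched in Python under assumptions: every character carries a set of styles, each markup adds its type to the characters it covers, and consecutive characters with identical sets are grouped into runs at the end. `layer_styles` is a hypothetical name, the sample uses only the tail of the text (offsets shifted by 50 for brevity), and link targets are ignored.

```python
from itertools import groupby


def layer_styles(text, markups):
    """Apply markups one layer at a time, then group equal-styled
    neighbouring characters into runs (hypothetical helper)."""
    layers = [set() for _ in text]
    for m in markups:
        for i in range(m["start"], m["end"]):
            layers[i].add(m["type"])

    runs, pos = [], 0
    for styles, group in groupby(layers, key=frozenset):
        n = len(list(group))
        runs.append((text[pos:pos + n], sorted(styles)))
        pos += n
    return runs


# Tail of the sample text; original offsets 50-63 and 55-69 minus 50.
tail = "bold, and I, only i"
markups = [
    {"type": "STRONG", "start": 0, "end": 13},
    {"type": "EM", "start": 5, "end": 19},
]
for run in layer_styles(tail, markups):
    print(run)
# ('bold,', ['STRONG'])
# (' and I, ', ['EM', 'STRONG'])
# ('only i', ['EM'])
```

Because styles are stored per character as plain type names, two adjacent markups of the same type would merge into one run in this sketch.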
