Convert Medium Posts to Markdown|Effortless Backup & Formatting Tool

Easily backup your valuable Medium articles and convert them into clean Markdown format with this simple tool, solving content preservation and formatting challenges for bloggers and writers.

This post was translated with AI assistance — let me know if anything sounds off!


Converting Medium Posts to Markdown

Writing a Tool to Backup Medium Articles & Convert Them to Markdown Format

[ZhgChgLi](https://github.com/ZhgChgLi){:target="_blank"} [ZMediumToMarkdown](https://github.com/ZhgChgLi/ZMediumToMarkdown){:target="_blank"}

ZhgChgLi / ZMediumToMarkdown

[EN] ZMediumToMarkdown

I’ve written a project that lets you download Medium posts and convert them to markdown format easily.

Features

  • Support downloading posts and converting them to markdown format

  • Support downloading all posts and converting them to markdown format from any user without login access.

  • Support downloading paid content

  • Support downloading all images in a post to local storage and converting their paths to local ones

  • Support parsing Twitter tweet content into blockquotes

  • Support command line interface

  • Convert Gist source code to markdown code block

  • Convert YouTube links embedded in posts to preview images

  • Adjust post’s last modification date from Medium to the local downloaded markdown file

  • Auto skip when the post has been downloaded and the last modification date from Medium hasn’t changed (convenient for auto-sync or auto-backup services, saving server bandwidth and execution time)

  • Support using GitHub Actions as an auto sync/backup service

  • Highly optimized markdown format for Medium

  • Native Markdown Style Render Engine (Feel free to contribute if you have any optimization ideas! MarkupStyleRender.rb )

  • Jekyll & social share (og: tag) friendly

  • 100% Ruby @ RubyGem

[CH] ZMediumToMarkdown

A small backup tool that can crawl the content of Medium article links and all articles by a Medium user, convert them into Markdown format, and download them along with the images in the articles.

[2022/07/18 Update]: Step-by-step Guide to Seamlessly Migrate Medium to a Self-Hosted Website

Features

  • No login required, no special permissions needed

  • Support downloading and converting a single article or all articles by a user into Markdown

  • Support downloading and backing up all images within the article and converting them to corresponding image paths

  • Support deep parsing of Gist embedded within the article and convert it into Markdown Code Blocks of the corresponding language.

  • Support parsing Twitter content and embedding it into articles

  • Support parsing YouTube videos embedded in the article, converting them into video thumbnails and links displayed in Markdown

  • When downloading all user articles, the system scans for embedded related articles and replaces the links with local ones if found.

  • Specially optimized for Medium format style

  • Automatically change the last modified/created time of the downloaded article to match the Medium article’s publish time

  • Automatically compare the last modification date of the downloaded article; if it is not earlier than the Medium article’s last modification date, skip the update.
    (This helps users create automatic Sync/Backup tools, saving server bandwidth and time.)

  • CLI Operation, Supports Automation
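The two time-related features above (stamping the downloaded file with Medium's timestamp, then skipping posts whose local copy is not older) can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the helper names `needs_download?` and `stamp_with_remote_time` and the sample timestamp are made up for the example. Note that `File.utime` only sets access and modification times, not a creation time.

```ruby
require "tmpdir"

# True when the local copy is missing or older than the remote post.
# (Hypothetical helper name, for illustration only.)
def needs_download?(path, remote_updated_at)
  return true unless File.exist?(path)
  File.mtime(path) < remote_updated_at
end

# Set the file's access and modification times to the remote timestamp.
def stamp_with_remote_time(path, remote_updated_at)
  File.utime(remote_updated_at, remote_updated_at, path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "post.md")
  remote_time = Time.at(1_650_000_000) # assumed timestamp reported by Medium

  puts needs_download?(path, remote_time)      # file is missing
  File.write(path, "# Hello")
  stamp_with_remote_time(path, remote_time)
  puts needs_download?(path, remote_time)      # local copy is up to date
  puts needs_download?(path, remote_time + 60) # remote post changed later
end
```

Because the comparison uses the file's own modification time, no separate state file is needed between sync runs.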

This project and article are for technical research only. Do not use for any commercial or illegal purposes. The author is not responsible for any illegal activities conducted using this content.

Please ensure you have the rights to use and copyright of the article before downloading and backing it up.

Origin

In my third year of writing on Medium, I have published over 65 articles. All of them were written directly on the Medium platform, with no other backups. Honestly, I have always feared that a problem with Medium, or some other factor, might wipe out years of hard work.

I used to back up manually, which was very boring and time-consuming. So, I have been looking for a tool that can automatically back up and download all articles, preferably with the ability to convert them into Markdown format.

Backup Requirements

  • Markdown Format

  • Automatically download all Medium posts of a User based on the User ID

  • Article images should also be downloadable for backup purposes.

  • Must be able to parse Gist into Markdown code blocks
    (My Medium heavily uses gist to embed source code, so this feature is very important)

Backup Plan

Medium Official

Although Medium officially provides an export backup feature, the exported format can only be used for importing back into Medium. It is not Markdown or any other common format, and it does not handle embedded content such as GitHub Gist, etc.

The API provided by Medium is not well maintained and only offers the Create Post function.

This makes sense, as Medium presumably does not want users to easily transfer their content to other platforms.

Chrome Extension

I tried several Chrome Extensions (most have since been removed), but the results were poor. First, you have to manually open and back up each article one by one. Second, the parsed output has many formatting errors, and the extensions cannot deeply parse Gist source code or back up all the images in an article.

medium-to-markdown command line

A skilled developer wrote this in JS, enabling basic downloading and conversion to Markdown, but it still lacks image backup and deep parsing of Gist source code.

ZMediumToMarkdown

After finding no perfect solution, I decided to write a backup and conversion tool myself. It is written in Ruby and took about three weeks of after-work hours to complete.

Technical Details

How to Get a List of Articles by Entering a Username?

  1. Get UserID: View the source of the user's homepage (https://medium.com/@#{username}) to find the UserID corresponding to the Username
    Note that since Medium has reopened custom domains, you need to handle 3xx redirects accordingly

  2. Sniffing network requests reveals that Medium uses GraphQL to fetch the homepage article list information

  3. Copy the Query & Replace UserID in the Request Information

HOST: https://medium.com/_/graphql
METHOD: POST

  4. Get Response

You can only fetch 10 items at a time; pagination is required.

  • Article list: can be obtained in result[0]->userResult->homepagePostsConnection->posts

  • homepagePostsFrom pagination info: can be found in result[0]->userResult->homepagePostsConnection->pagingInfo->next
    Use homepagePostsFrom in the request to access the next page. nil means there are no more pages.
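The paging loop described above might look like the sketch below. It is not the tool's real implementation: `fetch_page` is a hypothetical callable standing in for the GraphQL POST request, the stubbed responses are invented, and the cursor found under `pagingInfo -> next` is treated as an opaque value that is passed back unchanged (the real payload may nest it differently).

```ruby
# Collect every post by following the paging cursor until it is nil.
# `fetch_page` is a hypothetical callable: it receives the cursor
# (nil for the first page) and returns the parsed response for that page.
def fetch_all_posts(fetch_page)
  posts = []
  cursor = nil
  loop do
    connection = fetch_page.call(cursor).dig(0, "userResult", "homepagePostsConnection") || {}
    posts.concat(connection["posts"] || [])
    cursor = connection.dig("pagingInfo", "next")
    break if cursor.nil? # nil means there are no more pages
  end
  posts
end

# Stubbed responses (assumed shapes) so the sketch runs without network access.
pages = {
  nil => [{ "userResult" => { "homepagePostsConnection" => {
    "posts" => [{ "id" => "a" }, { "id" => "b" }],
    "pagingInfo" => { "next" => "page-2" }
  } } }],
  "page-2" => [{ "userResult" => { "homepagePostsConnection" => {
    "posts" => [{ "id" => "c" }],
    "pagingInfo" => { "next" => nil }
  } } }]
}

all_posts = fetch_all_posts(->(cursor) { pages.fetch(cursor) })
puts all_posts.map { |post| post["id"] }.join(", ")
```

Passing the request in as a callable keeps the loop independent of the HTTP client, so it can be exercised with stubbed pages like the ones here.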

How to Analyze Article Content?

After inspecting the page source, it can be seen that Medium is built using Apollo Client; the HTML is actually rendered from JS. Therefore, you can check the <script> section in the source code to find the window.__APOLLO_STATE__ field, which contains the entire article’s paragraph structure. Medium breaks your article into sentence-by-sentence paragraphs and then renders them back to HTML through the JS engine.
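As a rough sketch of that first step, the state object can be pulled out of the page HTML with a regular expression and parsed as JSON. The function name `extract_apollo_state`, the sample HTML, and the `Paragraph:abc_0` key are assumptions made for the example; the real page is much larger and its exact layout may differ.

```ruby
require "json"

# Pull the JSON object assigned to window.__APOLLO_STATE__ out of the page HTML.
# Returns nil when no such assignment is found.
def extract_apollo_state(html)
  raw = html[/window\.__APOLLO_STATE__\s*=\s*(\{.*?\})\s*;?\s*<\/script>/m, 1]
  raw && JSON.parse(raw)
end

# Tiny stand-in for a real article page (assumed structure).
html = <<~HTML
  <html><body>
  <script>window.__APOLLO_STATE__ = {"Paragraph:abc_0": {"type": "P", "text": "Hello"}}</script>
  </body></html>
HTML

state = extract_apollo_state(html)
puts state["Paragraph:abc_0"]["text"]
```

The pattern stops at the first closing brace that is followed by `</script>`, which is good enough for a sketch but would need hardening if the JSON itself could contain that sequence.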

What we need to do is the same: parse this JSON, match the Type with Markdown styles, and assemble the Markdown format.
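A minimal sketch of the type-to-style matching is shown below. The type names (`H3`, `H4`, `BQ`, `ULI`, `OLI`, `PRE`, `P`) and the chosen Markdown prefixes are assumptions for illustration; check them against the actual payload before relying on them. Inline styles are deliberately ignored here.

```ruby
# Assumed mapping from paragraph "type" values to Markdown line prefixes.
PREFIXES = {
  "H3"  => "# ",
  "H4"  => "## ",
  "BQ"  => "> ",
  "ULI" => "- ",
  "OLI" => "1. "
}.freeze

# Convert one paragraph hash into a Markdown block.
def paragraph_to_markdown(paragraph)
  text = paragraph["text"].to_s
  fence = "`" * 3
  # Preformatted paragraphs become fenced code blocks.
  return "#{fence}\n#{text}\n#{fence}" if paragraph["type"] == "PRE"
  # Unknown types (including plain "P") fall back to the bare text.
  "#{PREFIXES.fetch(paragraph["type"], "")}#{text}"
end

paragraphs = [
  { "type" => "H3",  "text" => "Origin" },
  { "type" => "P",   "text" => "I used to back up manually." },
  { "type" => "ULI", "text" => "Markdown Format" }
]

puts paragraphs.map { |paragraph| paragraph_to_markdown(paragraph) }.join("\n\n")
```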

Technical Challenges

Here is a technical challenge when rendering paragraph text styles: Medium provides the following structure:

"Paragraph": {
    "text": "code in text, and link in text, and ZhgChgLi, and bold, and I, only i",
    "markups": [
      {
        "type": "CODE",
        "start": 5,
        "end": 7
      },
      {
        "start": 18,
        "end": 22,
        "href": "http://zhgchg.li",
        "type": "LINK"
      },
      {
        "type": "STRONG",
        "start": 50,
        "end": 63
      },
      {
        "type": "EM",
        "start": 55,
        "end": 69
      }
    ]
}

For the text "code in text, and link in text, and ZhgChgLi, and bold, and I, only i", these markups mean:

- Characters 5 to 7 should be marked as code (wrapped with `Text`)
- Characters 18 to 22 should be marked as a link (using [Text](URL) format)
- Characters 50 to 63 should be marked as bold (using **Text** format)
- Characters 55 to 69 should be marked as italic (using _Text_ format)

Ranges 5–7 and 18–22 are easy to handle in this example because they do not overlap. Ranges 50–63 and 55–69 do overlap, and Markdown cannot represent that by simply interleaving the delimiters like this:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, **only i_

The correct combined result is as follows:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, _**_only i_

50–55 STRONG
55–63 STRONG, EM
63–69 EM
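The split into non-overlapping ranges listed above can be computed by cutting the text at every markup start and end offset. The sketch below is one possible way to do it, not the approach the tool currently uses; `split_markups` is a hypothetical helper name, and `end` is treated as exclusive, which matches the sample data.

```ruby
# Split overlapping markup ranges into non-overlapping segments.
# Each markup is a Hash with "type", "start" and "end" (end exclusive).
def split_markups(text, markups)
  # Every start/end offset is a point where the set of active styles can change.
  cuts = ([0, text.length] + markups.flat_map { |m| [m["start"], m["end"]] }).uniq.sort
  cuts.each_cons(2).map do |from, to|
    # A markup applies to a segment when it fully covers it.
    types = markups.select { |m| m["start"] <= from && m["end"] >= to }.map { |m| m["type"] }
    { from: from, to: to, text: text[from...to], types: types }
  end
end

text = "code in text, and link in text, and ZhgChgLi, and bold, and I, only i"
markups = [
  { "type" => "CODE",   "start" => 5,  "end" => 7 },
  { "type" => "LINK",   "start" => 18, "end" => 22, "href" => "http://zhgchg.li" },
  { "type" => "STRONG", "start" => 50, "end" => 63 },
  { "type" => "EM",     "start" => 55, "end" => 69 }
]

split_markups(text, markups).each do |segment|
  next if segment[:types].empty? # unstyled text needs no delimiters
  puts "#{segment[:from]}-#{segment[:to]} #{segment[:types].join(', ')}"
end
```

Once every segment carries its full set of styles, each one can be wrapped independently, so delimiters never need to cross a segment boundary.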

Also, please note:

  • The opening and closing delimiters of each style must be tracked separately. Strong happens to use ** for both the start and the end, while a Link starts with [ and ends with ](URL).

  • When wrapping text in Markdown symbols, make sure there are no spaces directly inside the delimiters (right after the opening one or right before the closing one), or the style will not render.
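Both notes can be illustrated with a small wrapper that looks up a separate opening and closing delimiter per style and keeps surrounding whitespace outside them. This is only a sketch under the assumption that the markup hashes look like the sample above; `delimiters_for` and `wrap` are hypothetical names.

```ruby
# Opening and closing delimiters per markup type; a LINK closes with its URL.
def delimiters_for(markup)
  case markup["type"]
  when "CODE"   then ["`", "`"]
  when "STRONG" then ["**", "**"]
  when "EM"     then ["_", "_"]
  when "LINK"   then ["[", "](#{markup['href']})"]
  else ["", ""]
  end
end

# Wrap text so that leading/trailing whitespace stays outside the delimiters.
def wrap(text, markup)
  opener, closer = delimiters_for(markup)
  lead, core, trail = text.match(/\A(\s*)(.*?)(\s*)\z/m).captures
  return text if core.empty? # nothing but whitespace: leave it unstyled
  "#{lead}#{opener}#{core}#{closer}#{trail}"
end

puts wrap("bold, ", { "type" => "STRONG" })
puts wrap("link", { "type" => "LINK", "href" => "http://zhgchg.li" })
```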

See the full question here.

I studied this problem for a long time. For now, the tool relies on an existing library to solve it: reverse_markdown.

Special thanks to my former colleagues Nick, Chun-Hsiu Liu, and James for researching this together. I will rewrite it natively when I have time.

Results

Original -> Converted Markdown Result

If you have any questions or feedback, feel free to contact me.


This post was originally published on Medium (View original post), and automatically converted and synced by ZMediumToMarkdown.

This post is licensed under CC BY 4.0 by the author.