Converting Medium Posts to Markdown

Creating a tool to back up my cherished Medium articles and convert them into Markdown format.

ℹ️ℹ️ℹ️ The following content is translated by OpenAI.

Click here to view the original Chinese version. | 點此查看本文中文版


[ZhgChgLi](https://github.com/ZhgChgLi) / [ZMediumToMarkdown](https://github.com/ZhgChgLi/ZMediumToMarkdown)

ZhgChgLi / ZMediumToMarkdown

[EN] ZMediumToMarkdown

I’ve developed a project that allows you to easily download Medium posts and convert them to Markdown format.

Features

  • Supports downloading posts and converting them to Markdown format.
  • Allows downloading all posts from any user without requiring login access.
  • Supports downloading paid content.
  • Downloads all images from the post to local storage and converts them to local paths.
  • Parses Twitter tweet content into blockquotes.
  • Supports command line interface.
  • Converts Gist source code into Markdown code blocks.
  • Converts embedded YouTube links in posts to preview images.
  • Adjusts the last modification date of the post from Medium to the locally downloaded Markdown file.
  • Automatically skips posts that have already been downloaded if the last modification date from Medium hasn’t changed (convenient for auto-sync or backup services, saving server bandwidth and execution time).
  • Supports using GitHub Actions as an auto sync/backup service.
  • Highly optimized Markdown format for Medium.
  • Native Markdown Style Render Engine (Feel free to contribute if you have any optimization ideas! MarkupStyleRender.rb).
  • Jekyll & social share (og: tag) friendly.
  • 100% Ruby @ RubyGem.

[CH] ZMediumToMarkdown

This tool allows you to scrape content from Medium article links or all articles from a Medium user, backing them up and converting them into Markdown format along with the images in the articles.

[2022/07/18 Update]: A step-by-step guide to smoothly migrating from Medium to your own website

Key Features

  • No login or special permissions required.
  • Supports downloading and converting individual articles or all articles from a user into Markdown.
  • Downloads and backs up all images within the articles, converting them to corresponding image paths.
  • Deeply parses embedded Gist content and converts it into Markdown code blocks for the relevant language.
  • Parses embedded tweets and reproduces their content within the articles.
  • Parses embedded YouTube videos in articles, converting them into preview images and links displayed in Markdown.
  • When downloading all articles from a user, it scans for any embedded related articles and replaces the links with local ones if found.
  • Specially optimized for Medium’s formatting styles.
  • Automatically updates the last modification/creation time of the downloaded articles to match the publication time of the corresponding Medium articles.
  • Automatically compares the last modification time of each downloaded article with that of the Medium article; if the local copy is at least as recent, the download is skipped (convenient for building an automatic sync/backup tool, saving server bandwidth and time).
  • CLI operation, supporting automation.

This project and article are for technical research purposes only. Please do not use them for any commercial or illegal activities. The author is not responsible for any illegal actions taken by others based on this.

Please ensure you have the rights to use and download the articles before backing them up.

Background

After three years of using Medium, I have published over 65 articles; all of them were written directly on the Medium platform without any other backups. Honestly, I have always been worried that issues with the Medium platform or other factors could lead to the loss of my hard work over the years.

I previously attempted to back up manually, which was tedious and time-consuming, so I was looking for a tool that could automatically back up all my articles and ideally convert them into Markdown format.

Backup Requirements

  • Markdown format.
  • Automatically download all Medium posts from a user.
  • Images from the articles should also be downloadable.
  • Must be able to parse Gist into Markdown code blocks (I heavily use Gist for embedding source code in my Medium articles, so this feature is crucial).

Backup Solutions

Official Medium

Medium does provide an export feature, but the exported files can only be imported back into Medium. They are not Markdown or any other common format, and embedded content such as GitHub Gist is not handled.

The Medium API is not well-maintained and only offers the ability to create posts.

This is reasonable, as Medium does not want users to easily transfer content to other platforms.

Chrome Extensions

I found and tried several Chrome extensions (most of which have been removed), but they were ineffective. One required manually clicking through each article to back it up, and the parsed format had many errors and could not deeply parse Gist source code or back up all images from the articles.

medium-to-markdown command line

A developer created this using JavaScript, which achieves basic downloading and converting to Markdown functionality, but it also lacks image backup and deep parsing of Gist source code.

ZMediumToMarkdown

After struggling to find a perfect solution, I decided to create my own backup and conversion tool; it took me about three weeks of after-work time to complete it using Ruby.

Technical Details

How to get the list of articles by inputting a username?

  1. Obtain the UserID: view the source code of the user’s homepage (https://medium.com/@{username}) to find the UserID that corresponds to the username. Note that Medium has reopened custom domains, so you need to handle 30X redirects.

  2. By sniffing network requests, you can discover that Medium uses GraphQL to obtain the list of articles on the homepage.

  3. Copy the Query and replace the UserID in the request information:

    HOST: https://medium.com/_/graphql
    METHOD: POST

  4. Obtain the Response.

You can only retrieve 10 articles at a time, so pagination is necessary.

  • Article list: You can find it in result[0]->userResult->homepagePostsConnection->posts.
  • Pagination info: You can find it in result[0]->userResult->homepagePostsConnection->pagingInfo->next. You can use homepagePostsFrom in the request to access the next page; nil indicates there are no more pages.
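
The pagination described above could be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `collect_posts` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever sends the GraphQL POST request. The sketch passes the value of `pagingInfo->next` back to `fetch_page` unchanged, so how that value maps onto `homepagePostsFrom` is left to the caller.

```ruby
# Collects every post by following pagingInfo.next until it is nil.
# `fetch_page` is a hypothetical callable: it receives the paging cursor
# (nil for the first page) and returns the parsed GraphQL response.
def collect_posts(fetch_page)
  posts = []
  cursor = nil

  loop do
    response = fetch_page.call(cursor)
    connection = response.dig(0, "userResult", "homepagePostsConnection")
    break if connection.nil?

    posts.concat(connection["posts"] || [])

    # nil means there are no more pages
    cursor = connection.dig("pagingInfo", "next")
    break if cursor.nil?
  end

  posts
end
```

Taking the request logic as a parameter keeps the loop independent of the exact query text, which has to be copied from the browser's network inspector anyway.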

How to parse article content?

By examining the article’s source code, you can see that Medium is built on Apollo Client and that the HTML is actually rendered by JavaScript. You can therefore look for the window.__APOLLO_STATE__ field in a <script> section of the source code; it contains the structure of every paragraph in the article. Medium breaks the entire article into individual paragraphs and then renders them back into HTML using the JS engine.

What we need to do is parse this JSON, match the types to Markdown styles, and assemble the Markdown format.
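
As a rough sketch of the first step, the state object could be pulled out of the page source like this. It assumes the state is assigned as a plain JSON literal inside a single <script> tag, which may not hold for every page; `extract_apollo_state` is a hypothetical helper name, not part of the tool.

```ruby
require "json"

# Extracts the window.__APOLLO_STATE__ object from an article's HTML.
# Assumption: the value is a JSON literal that ends right before </script>.
# Returns the parsed Hash, or nil if the field is missing or not valid JSON.
def extract_apollo_state(html)
  match = html.match(/window\.__APOLLO_STATE__\s*=\s*(\{.*?\})\s*;?\s*<\/script>/m)
  return nil if match.nil?

  JSON.parse(match[1])
rescue JSON::ParserError
  nil
end
```

The resulting Hash can then be walked to find the paragraph entries and map each one to its Markdown equivalent.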

Technical Challenges

One technical challenge is rendering the paragraph text styles. The structure provided by Medium looks like this:

"Paragraph": {
    "text": "code in text, and link in text, and ZhgChgLi, and bold, and I, only i",
    "markups": [
      {
        "type": "CODE",
        "start": 5,
        "end": 7
      },
      {
        "start": 18,
        "end": 22,
        "href": "http://zhgchg.li",
        "type": "LINK"
      },
      {
        "type": "STRONG",
        "start": 50,
        "end": 63
      },
      {
        "type": "EM",
        "start": 55,
        "end": 69
      }
    ]
}

This means that in the text "code in text, and link in text, and ZhgChgLi, and bold, and I, only i":

- Characters 5 to 7 should be marked as code (wrapped in `Text` format).
- Characters 18 to 22 should be marked as a link (wrapped in [Text](URL) format).
- Characters 50 to 63 should be marked as bold (wrapped in **Text** format).
- Characters 55 to 69 should be marked as italic (wrapped in _Text_ format).

Characters 5 to 7 and 18 to 22 are straightforward to handle since they don’t overlap; however, 50–63 and 55–69 present an overlap issue that Markdown cannot represent in the following interleaved manner:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, **only i_

The correct combined result should be:

code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, _**_only i_

In other words: 50–55 is STRONG only, 55–63 is both STRONG and EM, and 63–69 is EM only.
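
One way to arrive at that segmentation is to cut the text at every markup boundary and record which styles are active in each piece. The sketch below is only an illustration of that idea, not how the tool or reverse_markdown does it; `split_markups` is a hypothetical name, and `filter_map` assumes Ruby 2.7 or later.

```ruby
# Splits possibly overlapping markups into consecutive, non-overlapping
# segments, each listing the markup types active over that range.
# Ranges that no markup covers are dropped.
def split_markups(markups)
  boundaries = markups.flat_map { |m| [m["start"], m["end"]] }.uniq.sort

  boundaries.each_cons(2).filter_map do |from, to|
    types = markups.select { |m| m["start"] <= from && m["end"] >= to }
                   .map { |m| m["type"] }
    { from: from, to: to, types: types } unless types.empty?
  end
end

# Usage with the STRONG/EM overlap from the example paragraph
markups = [
  { "type" => "STRONG", "start" => 50, "end" => 63 },
  { "type" => "EM",     "start" => 55, "end" => 69 }
]
split_markups(markups).each do |seg|
  puts "#{seg[:from]}-#{seg[:to]} #{seg[:types].join(', ')}"
end
```

Because every segment carries its full set of styles, each one can be opened and closed on its own, so the Markdown symbols never interleave.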

Additionally, it’s important to note:

  • The wrapping format’s start and end must be distinguishable; for example, Strong is marked with ** at both ends, while Link starts with [ and ends with ](URL).
  • Care must be taken that there are no spaces before or after Markdown symbols when combined with strings, as this would render them ineffective.
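
The whitespace problem can be handled by moving any leading or trailing spaces outside the Markdown symbols before wrapping. A minimal sketch, assuming the same symbol is used on both sides (as with ** or _); `wrap_trimmed` is a hypothetical helper, not a function from the tool:

```ruby
# Wraps `text` in `marker`, keeping leading/trailing whitespace outside
# the marker so the Markdown symbols sit directly against the text.
# Whitespace-only input is returned unchanged.
def wrap_trimmed(text, marker)
  core = text.strip
  return text if core.empty?

  leading  = text[/\A\s*/]
  trailing = text[/\s*\z/]
  "#{leading}#{marker}#{core}#{marker}#{trailing}"
end
```

For example, `wrap_trimmed("bold, ", "**")` yields `**bold,** ` with the space after the closing symbols rather than before them.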

For the complete issue, please see this.

I researched this for quite a while and am currently using an existing package to solve it: reverse_markdown.

Special thanks to my former colleagues Nick, Chun-Hsiu Liu, and James for their collaborative research. I plan to write my own native solution when I have time.

Results

Original Article -> Converted Markdown Result.

If you have any questions or feedback, feel free to contact me.


This article was first published on Medium ➡️ Click Here

Automatically converted and synchronized using ZMediumToMarkdown and Medium-to-jekyll-starter.

This post is licensed under CC BY 4.0 by the author.