Convert Medium Posts to Markdown|Effortless Backup & Formatting Tool
Easily backup your valuable Medium articles and convert them into clean Markdown format with this simple tool, solving content preservation and formatting challenges for bloggers and writers.
This post was translated with AI assistance — let me know if anything sounds off!
Converting Medium Posts to Markdown
Writing a Tool to Backup Medium Articles & Convert Them to Markdown Format
[EN] ZMediumToMarkdown
I’ve written a project that lets you download Medium posts and convert them to markdown format easily.
Features
Support downloading posts and converting them to markdown format
Support downloading all posts and converting them to markdown format from any user without login access.
Support downloading paid content
Support downloading all images in a post to local storage and converting their paths to local ones
Support parsing Twitter tweet content into blockquotes
Support command line interface
Convert Gist source code to markdown code block
Convert YouTube links embedded in posts to preview images
Set the downloaded markdown file's last modification date to match the post's last modification date on Medium
Auto skip when the post has been downloaded and the last modification date from Medium hasn’t changed (convenient for auto-sync or auto-backup services, saving server bandwidth and execution time)
Highly optimized markdown format for Medium
Native Markdown Style Render Engine (Feel free to contribute if you have any optimization ideas! MarkupStyleRender.rb)
jekyll & social share (og: tag) friendly
100% Ruby @ RubyGem
[CH] ZMediumToMarkdown
A small backup tool that can crawl the content of Medium article links and all articles by a Medium user, convert them into Markdown format, and download them along with the images in the articles.
[2022/07/18 Update]: Step-by-step Guide to Seamlessly Migrate Medium to a Self-Hosted Website
Features
No login required, no special permissions needed
Support downloading and converting a single article or all articles by a user into Markdown
Support downloading and backing up all images within the article and converting them to corresponding image paths
Support deep parsing of Gist embedded within the article and convert it into Markdown Code Blocks of the corresponding language.
Support parsing Twitter content and embedding it into articles
Support parsing YouTube videos embedded in the article, converting them into video thumbnails and links displayed in Markdown
When downloading all user articles, the system scans for embedded related articles and replaces the links with local ones if found.
Specially optimized for Medium format style
Automatically change the last modified/created time of the downloaded article to match the Medium article’s publish time
Automatically compare the last modification date of the downloaded article; if it is not earlier than the Medium article’s last modification date, skip the update.
(This helps users create automatic Sync/Backup tools, saving server bandwidth and time.)
CLI Operation, Supports Automation
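The skip rule in the list above can be sketched in a few lines of Ruby. This is only an illustration, not the project's code: the helper names are hypothetical, and it assumes the Medium last-modified time is already available as a Time object.

```ruby
# Hypothetical helpers illustrating the "skip if unchanged" rule.

# True when the local copy exists and is not older than the Medium version,
# in which case the download can be skipped.
def up_to_date?(path, remote_modified_at)
  File.exist?(path) && File.mtime(path) >= remote_modified_at
end

# After writing the markdown file, stamp it with Medium's modification time
# so the next run can compare against it.
def stamp_with_remote_time(path, remote_modified_at)
  File.utime(remote_modified_at, remote_modified_at, path)
end
```

A sync job would call up_to_date? before fetching a post and stamp_with_remote_time right after saving it.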
This project and article are for technical research only. Do not use for any commercial or illegal purposes. The author is not responsible for any illegal activities conducted using this content.
Please ensure you hold the copyright or the usage rights to an article before downloading and backing it up.
Origin
In my third year of managing Medium, I have published over 65 articles; all were written directly on the Medium platform without any other backups. Honestly, I have always feared that issues with Medium or other factors might cause the loss of years of hard work.
I used to back up manually, which was very boring and time-consuming. So, I have been looking for a tool that can automatically back up and download all articles, preferably with the ability to convert them into Markdown format.
Backup Requirements
Markdown Format
Automatically download all Medium posts of a User based on the User ID
Article images should also be downloadable for backup purposes.
Must be able to parse Gist into Markdown code blocks
(My Medium heavily uses gist to embed source code, so this feature is very important)
Backup Plan
Medium Official
Although Medium provides an official export feature, the export format can only be used for importing back into Medium. It is not Markdown or any other common format, and it does not handle embedded content such as GitHub Gist.
The API provided by Medium is not well maintained and only offers the Create Post function.
This makes sense, as Medium does not want users to easily move their content to other platforms.
Chrome Extension
I tried several Chrome Extensions (most have since been removed), but the results were poor. First, you have to manually open and back up each article one by one. Second, the parsed format has many errors, and the extensions cannot deeply parse Gist source code or back up all images in the articles.
medium-to-markdown command line
A skilled developer wrote this in JS, enabling basic downloading and conversion to Markdown, but it still lacks image backup and deep parsing of Gist source code.
ZMediumToMarkdown
After finding no perfect solution, I decided to write a backup conversion tool myself; it took about three weeks of after-work hours using Ruby to complete.
Technical Details
How to Get a List of Articles by Entering a Username?
Get UserID: View the user’s homepage source code (https://medium.com/@#{username}) to find the UserID corresponding to the Username
Note that since Medium has reopened custom domains, you need to handle 30X redirects accordingly
Sniffing network requests reveals that Medium uses GraphQL to fetch the homepage article list information
Copy the Query & Replace UserID in the Request Information
HOST: https://medium.com/_/graphql
METHOD: POST
- Get Response
You can only fetch 10 items at a time; pagination is required.
Article list: can be obtained in result[0]->userResult->homepagePostsConnection->posts
homepagePostsFrom pagination info: can be found in result[0]->userResult->homepagePostsConnection->pagingInfo->next
Use homepagePostsFrom in the request to access the next page. nil means there are no more pages.
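The pagination loop described above might look like the following Ruby sketch. It is not the project's implementation. fetch_page is a hypothetical callable that performs the GraphQL POST for a given homepagePostsFrom value (nil for the first page) and returns the parsed JSON. The key path follows the one written above, so adjust it if the real response nests these keys differently. The sketch also assumes the value under next can be handed back to the fetcher as-is.

```ruby
# Collect every post by following pagingInfo->next until it is nil.
# `fetch_page` is a hypothetical callable: cursor in, parsed JSON response out.
def fetch_all_posts(fetch_page)
  posts = []
  cursor = nil # nil requests the first page
  loop do
    connection = fetch_page.call(cursor)
                           .dig(0, "userResult", "homepagePostsConnection") || {}
    posts.concat(connection["posts"] || [])
    # Assumption: this value is what the next request needs as homepagePostsFrom.
    cursor = connection.dig("pagingInfo", "next")
    break if cursor.nil? # nil means there are no more pages
  end
  posts
end
```

Keeping the HTTP call behind a callable makes the loop easy to exercise with canned responses before pointing it at https://medium.com/_/graphql.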
How to Analyze Article Content?
After inspecting the page source, it can be seen that Medium is built using Apollo Client; the HTML is actually rendered from JS. Therefore, you can check the <script> section in the source code to find the window.__APOLLO_STATE__ field, which contains the entire article’s paragraph structure. Medium breaks your article into sentence-by-sentence paragraphs and then renders them back to HTML through the JS engine.
What we need to do is the same: parse this JSON, match the Type with Markdown styles, and assemble the Markdown format.
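As a rough illustration of that idea, the Ruby sketch below pulls the JSON out of the page source and maps a paragraph's type to a Markdown prefix. It is not the project's parser. It assumes the state is assigned as a plain JSON object literal immediately followed by the closing script tag, and that each paragraph entry carries "type" and "text" keys. The type names and the prefixes chosen for them are illustrative assumptions, not a complete or confirmed list.

```ruby
require "json"

# Extract and parse window.__APOLLO_STATE__ from raw HTML.
# Returns nil when the assignment is not found. A regex is fragile here
# (it would break on a "}</script>" sequence inside a string value).
def extract_apollo_state(html)
  raw = html[/window\.__APOLLO_STATE__\s*=\s*(\{.*?\})\s*;?\s*<\/script>/m, 1]
  raw && JSON.parse(raw)
end

# Illustrative type-to-prefix table (assumed names); unknown types get no prefix.
BLOCK_PREFIX = { "H3" => "## ", "H4" => "### ", "BQ" => "> ", "ULI" => "- " }.freeze

# Render one paragraph hash as a Markdown line, ignoring inline markups.
def paragraph_to_markdown(paragraph)
  "#{BLOCK_PREFIX.fetch(paragraph['type'], '')}#{paragraph['text']}"
end
```

Inline styles such as bold or links are deliberately left out here, since they need the range handling discussed next.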
Technical Challenges
Here is a technical challenge when rendering paragraph text styles: Medium provides the following structure:
"Paragraph": {
  "text": "code in text, and link in text, and ZhgChgLi, and bold, and I, only i",
  "markups": [
    {
      "type": "CODE",
      "start": 5,
      "end": 7
    },
    {
      "start": 18,
      "end": 22,
      "href": "http://zhgchg.li",
      "type": "LINK"
    },
    {
      "type": "STRONG",
      "start": 50,
      "end": 63
    },
    {
      "type": "EM",
      "start": 55,
      "end": 69
    }
  ]
}
For the text "code in text, and link in text, and ZhgChgLi, and bold, and I, only i", these markups mean:
- Characters 5 to 7 should be marked as code (wrapped with `Text`)
- Characters 18 to 22 should be marked as a link (using [Text](URL) format)
- Characters 50 to 63 should be marked as bold (using **Text** format)
- Characters 55 to 69 should be marked as italic (using _Text_ format)
Ranges 5 to 7 and 18 to 22 are easy to handle in this example because they do not overlap. Ranges 50–63 and 55–69, however, do overlap, and Markdown cannot represent that by simply interleaving the delimiters like this:
code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, **only i_
The correct combined result is as follows:
code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, _**_only i_
50–55 STRONG
55–63 STRONG, EM
63–69 EM
Also, please note:
The string delimiters of the wrapping format must be distinguishable. Strong happens to use ** for both the start and the end, while for a Link the start is [ and the end is ](URL).
When combining Markdown symbols with text, be careful not to include spaces before or after, or it will not work.
We studied this part for a long time. For now, we use an existing library to solve it: reverse_markdown.
Special thanks to former colleagues Nick, Chun-Hsiu Liu, James for their collaborative research. I will rewrite it natively when I have time.
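To make the range splitting in this section concrete (50 to 55 STRONG, 55 to 63 STRONG and EM, 63 to 69 EM), here is a minimal Ruby sketch. It is neither the project's renderer nor reverse_markdown. The function names are hypothetical, and it assumes the start and end offsets count characters with an exclusive end. It also takes a simpler route than the combined string shown above: every segment is closed and reopened on its own, and leading or trailing whitespace is moved outside the delimiters, so the output differs from that string.

```ruby
# Wrappers are applied innermost first: EM, then STRONG, CODE, LINK.
WRAPPERS = {
  "EM"     => ->(s, _m) { "_#{s}_" },
  "STRONG" => ->(s, _m) { "**#{s}**" },
  "CODE"   => ->(s, _m) { "`#{s}`" },
  "LINK"   => ->(s, m)  { "[#{s}](#{m['href']})" }
}.freeze

# Cut the text at every markup boundary so no segment is partially covered.
def split_segments(text, markups)
  chars = text.chars
  bounds = ([0, chars.length] + markups.flat_map { |m| [m["start"], m["end"]] })
           .map { |b| b.clamp(0, chars.length) }.uniq.sort
  bounds.each_cons(2).map do |from, to|
    active = markups.select { |m| m["start"] <= from && m["end"] >= to }
    { text: chars[from...to].join, markups: active }
  end
end

# Wrap one segment, keeping surrounding whitespace outside the delimiters.
def render_segment(segment)
  lead, core, trail = segment[:text].match(/\A(\s*)(.*?)(\s*)\z/m).captures
  return segment[:text] if core.empty?
  wrapped = WRAPPERS.reduce(core) do |acc, (type, wrap)|
    markup = segment[:markups].find { |m| m["type"] == type }
    markup ? wrap.call(acc, markup) : acc
  end
  "#{lead}#{wrapped}#{trail}"
end

def render_paragraph(text, markups)
  split_segments(text, markups).map { |s| render_segment(s) }.join
end

text = "code in text, and link in text, and ZhgChgLi, and bold, and I, only i"
markups = [
  { "type" => "CODE", "start" => 5, "end" => 7 },
  { "type" => "LINK", "start" => 18, "end" => 22, "href" => "http://zhgchg.li" },
  { "type" => "STRONG", "start" => 50, "end" => 63 },
  { "type" => "EM", "start" => 55, "end" => 69 }
]
puts render_paragraph(text, markups)
# prints: code `in` text, and [link](http://zhgchg.li) in text, and ZhgChgLi, and **bold,** **_and I,_** _only i_
```

Closing every segment separately costs a few extra delimiters, but it avoids both problems noted above: delimiters never cross each other, and none of them sits next to a space.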
Results
Original -> Converted Markdown Result
If you have any questions or feedback, feel free to contact me.
This post was originally published on Medium (View original post), and automatically converted and synced by ZMediumToMarkdown.