The Journey of Building a Custom HTML Parser

A detailed account of developing the ZMarkupParser HTML to NSAttributedString rendering engine

ℹ️ℹ️ℹ️ The following content is translated by OpenAI.



This article covers the tokenization of HTML strings, normalization processes, the generation of an Abstract Syntax Tree, the application of the Visitor and Builder patterns, and some additional thoughts.

Continuation

Last year, I published an article titled “TL;DR on Implementing iOS NSAttributedString HTML Rendering,” which briefly introduced how to use XMLParser to parse HTML and convert it into NSAttributedString.Key. The structure and ideas presented in that article were quite scattered, as it was more of a record of the issues I encountered at the time, and I didn’t spend much time researching the topic.

Convert HTML String to NSAttributedString

Revisiting this topic, we need a way to convert HTML strings provided by the API into NSAttributedString and apply the corresponding styles for display in UITextView/UILabel.

For example, <b>Test<a>Link</a></b> should render as a bold “Test” followed by a bold, linked “Link”.

  • Note 1: It is not recommended to use HTML as a rendering medium for communication between the app and data, as the HTML specification is too flexible. The app cannot support all HTML styles, and there is no official HTML conversion rendering engine.
  • Note 2: Starting from iOS 15, you can use the native AttributedString to parse Markdown, or import the apple/swift-markdown Swift Package to parse Markdown.
  • Note 3: Due to the large scale of our company’s project and the long-standing use of HTML as a medium, we cannot fully switch to Markdown or other markup languages at this time.
  • Note 4: The HTML here is not intended to display an entire HTML webpage; it is merely used as a styled Markdown rendering string. (To render a full page of complex HTML, including images and tables, you still need to use WebView to load HTML.)

I strongly recommend using Markdown as the string rendering medium. If your project faces similar challenges and you have to use HTML without an elegant tool for converting to NSAttributedString, then please proceed with caution.

Friends who remember the previous article can jump directly to the ZhgChgLi / ZMarkupParser section.

NSAttributedString.DocumentType.html

Most methods found online for converting HTML to NSAttributedString involve directly using the options provided by NSAttributedString to render HTML. Here’s an example:

let htmlString = "<b>Test<a>Link</a></b>"
let data = htmlString.data(using: String.Encoding.utf8)!
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey: Any] = [
  .documentType: NSAttributedString.DocumentType.html,
  .characterEncoding: String.Encoding.utf8.rawValue
]
let attributedString = try! NSAttributedString(data: data, options: attributedOptions, documentAttributes: nil)

Problems with this approach:

  • Poor performance: This method renders styles through WebKit and then switches back to the main thread for UI display; rendering a string of just over 300 characters takes about 0.03 seconds.
  • Text loss: For example, marketing copy might use <Congratulation!>, which would be treated as an HTML tag and removed.
  • Lack of customization: For instance, you cannot specify how heavy the font weight should be for bold HTML text in the resulting NSAttributedString.
  • Random crashes starting from iOS 12 with no official solution.
  • A significant number of crashes appeared in iOS 15, with tests showing that under low battery conditions, it crashes 100% of the time (fixed in iOS ≥ 15.2).
  • Long strings cause crashes; testing shows that inputting strings longer than 54,600 characters results in a 100% crash (EXC_BAD_ACCESS).

The most painful issue for us remains the crashing problem. From the release of iOS 15 until the fix in 15.2, our app was consistently plagued by this issue. Data shows that from March 11, 2022, to June 8, 2022, it caused over 2.4K crashes, affecting more than 1.4K users.

This crashing issue has existed since iOS 12; iOS 15 merely made it far more frequent. I suspect that the fix in iOS 15.2 is just a patch and that the root cause has not been eliminated.

The next issue is performance. As a markup language for string styles, it is heavily used in UILabel/UITextView throughout the app. As mentioned earlier, rendering a single label takes 0.03 seconds, and multiplying that across a list of UILabels/UITextViews can lead to noticeable lag in user interactions.

XMLParser

The second solution is to use XMLParser to parse the HTML into corresponding NSAttributedString keys and apply styles, as introduced in the previous article.

You can refer to the implementation of SwiftRichString and the content of the previous article for more details.

The previous article only explored the possibility of using XMLParser to parse HTML and perform corresponding conversions, completing an experimental implementation without designing it as a well-structured and extensible “tool.”

Problems with this approach:

  • Zero fault tolerance: HTML like <br>, <Congratulation!>, and <b>Bold<i>Bold+Italic</b>Italic</i> can lead to errors in XMLParser, throwing an error and displaying a blank result.
  • When using XMLParser, the HTML string must strictly adhere to XML rules, unlike browsers or NSAttributedString.DocumentType.html, which can display with some fault tolerance.
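The zero fault tolerance is easy to reproduce. The following is a minimal sketch, assuming only Foundation's XMLParser; the helper name isWellFormedXML and the <root> wrapper element are my own illustration and are not part of ZMarkupParser or the previous article.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives in this module on Linux
#endif

/// Wraps the fragment in a root element and reports whether XMLParser accepts it.
/// (Hypothetical helper for illustration only.)
func isWellFormedXML(_ fragment: String) -> Bool {
    let parser = XMLParser(data: Data("<root>\(fragment)</root>".utf8))
    // parse() returns false as soon as the parser hits a well-formedness error
    return parser.parse()
}

let strictlyValid = isWellFormedXML("<b>Test<a>Link</a></b>")
let missingCloseTag = isWellFormedXML("Test<br>Link")
let textThatLooksLikeATag = isWellFormedXML("<Congratulation!>")
let misalignedTags = isWellFormedXML("<b>Bold<i>Bold+Italic</b>Italic</i>")
```

Only the first input is expected to parse; the other three are the everyday HTML patterns listed above, and each one makes the whole parse fail rather than degrade gracefully.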

Standing on the Shoulders of Giants

Neither of the above solutions perfectly and elegantly solves the HTML problem, so I began searching for existing solutions.

  • johnxnguyen / Down only supports converting Markdown to Any (XML/NSAttributedString…) but does not support converting HTML.
  • malcommac / SwiftRichString uses XMLParser under the hood, and tests show that it has the same zero fault tolerance issues as mentioned earlier.
  • scinfu / SwiftSoup only supports HTML parsing (Selector) and does not support conversion to NSAttributedString as noted in this issue.

After searching extensively, I found that the results were similar to the projects mentioned above. There were no giants’ shoulders to stand on.

ZhgChgLi/ZMarkupParser

With no giants’ shoulders to rely on, I had to become the giant myself and developed a tool to convert HTML strings to NSAttributedString.

Developed entirely in Swift, it uses Regex to extract HTML tags and tokenize the string, analyzes and corrects tag validity (fixing missing end tags and misaligned tags), converts the result into an abstract syntax tree, and finally uses the Visitor Pattern to map HTML tags to abstract styles and produce the final NSAttributedString. The tool does not rely on any parser libraries.

Features

  • Supports HTML rendering (to NSAttributedString), stripping HTML tags, and selector functionality.
  • Higher performance than NSAttributedString.DocumentType.html.
  • Automatically analyzes and corrects tag validity (fixing missing end tags and misaligned tags).
  • Supports dynamic style settings from style="color:red...".
  • Allows customization of style specifications, such as how bold a bold tag should be.
  • Supports flexible extensibility for tags or custom tags and attributes.

For detailed introduction and installation instructions, please refer to this article: “ZMarkupParser HTML String to NSAttributedString Tool”.

You can directly git clone the project and open the ZMarkupParser.xcworkspace project, selecting the ZMarkupParser-Demo target to build and run it for experimentation.

[ZMarkupParser](https://github.com/ZhgChgLi/ZMarkupParser)

Technical Details

Next, I want to share the technical details regarding the development of this tool.

Overview of Operation Process

The above image illustrates the general operation process. The following sections introduce each step and include code snippets.

⚠️️️️️️ This article will simplify demo code, reduce abstraction, and focus on explaining the operational principles. For the final results, please refer to the project source code.

Code Implementation — Tokenization

a.k.a. parser, parsing

When it comes to HTML rendering, the most crucial aspect is the parsing stage. Previously, I used XMLParser to treat HTML as XML for parsing; however, it couldn’t overcome the fact that everyday HTML usage is not 100% XML-compliant, leading to parser errors and an inability to dynamically correct them.

After ruling out the use of XMLParser, the only option left in Swift was to use Regex for matching and parsing.

Initially, I thought I could directly use regex to extract “paired” HTML tags and recursively search for HTML tags layer by layer until completion. However, this approach does not address the nesting of HTML tags or the need for fault tolerance for misaligned tags. Therefore, we changed our strategy to extract “single” HTML tags, recording whether they are start tags, close tags, or self-closing tags, along with other string combinations to form the parsing result array.

The Tokenization Structure is as follows:

enum HTMLParsedResult {
    case start(StartItem) // <a>
    case close(CloseItem) // </a>
    case selfClosing(SelfClosingItem) // <br/>
    case rawString(NSAttributedString)
}

extension HTMLParsedResult {
    class SelfClosingItem {
        let tagName: String
        let tagAttributedString: NSAttributedString
        let attributes: [String: String]?
        
        init(tagName: String, tagAttributedString: NSAttributedString, attributes: [String : String]?) {
            self.tagName = tagName
            self.tagAttributedString = tagAttributedString
            self.attributes = attributes
        }
    }
    
    class StartItem {
        let tagName: String
        let tagAttributedString: NSAttributedString
        let attributes: [String: String]?

        // Start Tag may be an abnormal HTML Tag or normal text e.g. <Congratulation!>. After normalization, if it is found to be an isolated Start Tag, it will be marked as True.
        var isIsolated: Bool = false
        
        init(tagName: String, tagAttributedString: NSAttributedString, attributes: [String : String]?) {
            self.tagName = tagName
            self.tagAttributedString = tagAttributedString
            self.attributes = attributes
        }
        
        // For subsequent normalization automatic correction
        func convertToCloseParsedItem() -> CloseItem {
            return CloseItem(tagName: self.tagName)
        }
        
        // For subsequent normalization automatic correction
        func convertToSelfClosingParsedItem() -> SelfClosingItem {
            return SelfClosingItem(tagName: self.tagName, tagAttributedString: self.tagAttributedString, attributes: self.attributes)
        }
    }
    
    class CloseItem {
        let tagName: String
        init(tagName: String) {
            self.tagName = tagName
        }
    }
}

The regex used is as follows:

<(?:(?<closeTag>\/)?(?<tagName>[A-Za-z0-9]+)(?<tagAttributes>(?:\s*(\w+)\s*=\s*(["|']).*?\5)*)\s*(?<selfClosingTag>\/)?>)

-> Online Regex101 Playground

  • closeTag: matches the / in </a>
  • tagName: matches the a in <a> or </a>
  • tagAttributes: matches href="https://zhgchg.li" style="color:red" in <a href="https://zhgchg.li" style="color:red">
  • selfClosingTag: matches the / in <br/>

*This regex can still be optimized further; I will address that later.

The latter part of the article provides additional information about regex for those interested.
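To see what the named capture groups return in practice, here is a small self-contained sketch that runs the regex above through NSRegularExpression. The names tagPattern and describeTags are hypothetical and exist only for this illustration; the project's real tokenizer produces HTMLParsedResult values instead of strings.

```swift
import Foundation

// The regex from above, written as a raw string so the backslashes stay literal.
let tagPattern = #"<(?:(?<closeTag>\/)?(?<tagName>[A-Za-z0-9]+)(?<tagAttributes>(?:\s*(\w+)\s*=\s*(["|']).*?\5)*)\s*(?<selfClosingTag>\/)?>)"#

/// Returns one description per matched tag, e.g. "start a", "close a", "selfClosing br".
/// (Hypothetical helper for illustration only.)
func describeTags(in html: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: tagPattern) else { return [] }
    let source = NSString(string: html)
    let fullRange = NSRange(location: 0, length: source.length)
    return regex.matches(in: html, range: fullRange).map { match -> String in
        // A group that did not take part in the match reports NSNotFound
        func group(_ name: String) -> String? {
            let range = match.range(withName: name)
            return range.location == NSNotFound ? nil : source.substring(with: range)
        }
        let name = (group("tagName") ?? "").lowercased()
        if group("selfClosingTag") != nil { return "selfClosing \(name)" }
        if group("closeTag") != nil { return "close \(name)" }
        return "start \(name)"
    }
}

let described = describeTags(in: "<a href=\"https://zhgchg.li\">Link</a><br/>")
```

The optional closeTag and selfClosingTag groups are what let a single pattern classify every tag; the text between matches ("Link" here) is not matched at all and has to be collected separately, which is what the tokenization code does with lastMatch.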

Putting it all together:

var tokenizationResult: [HTMLParsedResult] = []

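// `pattern` is the regex shown above; it is a constant, so a failure here is a programmer error
let expression = try! NSRegularExpression(pattern: pattern, options: expressionOptions)
let attributedString = NSAttributedString(string: "<a href=\"https://zhgchg.li\">Li<b>nk</a>Bold</b>")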
let totalLength = attributedString.string.utf16.count // utf-16 support emoji
var lastMatch: NSTextCheckingResult?

// Start Tags Stack, First In Last Out (FILO)
// Check if the HTML string requires subsequent normalization to correct misalignment or add Self-Closing Tags
var stackStartItems: [HTMLParsedResult.StartItem] = []
var needForamatter: Bool = false

expression.enumerateMatches(in: attributedString.string, range: NSMakeRange(0, totalLength)) { match, _, _ in
  if let match = match {
    // Check the string between tags or the string before the first tag
    // e.g. Test<a>Link</a>zzz<b>bold</b>Test2 - > Test,zzz
    let lastMatchEnd = lastMatch?.range.upperBound ?? 0
    let currentMatchStart = match.range.lowerBound
    if currentMatchStart > lastMatchEnd {
      let rawStringBetweenTag = attributedString.attributedSubstring(from: NSMakeRange(lastMatchEnd, (currentMatchStart - lastMatchEnd)))
      tokenizationResult.append(.rawString(rawStringBetweenTag))
    }

    // Helper: returns nil when a capture group did not take part in the match (NSNotFound)
    func substring(_ range: NSRange) -> NSAttributedString? {
      guard range.location != NSNotFound else { return nil }
      return attributedString.attributedSubstring(from: range)
    }

    // <a href="https://zhgchg.li">, </a>
    let matchAttributedString = substring(match.range)
    // a, a
    let matchTag = substring(match.range(withName: "tagName"))?.string.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    // false, true
    let matchIsEndTag = substring(match.range(withName: "closeTag"))?.string.trimmingCharacters(in: .whitespacesAndNewlines) == "/"
    // href="https://zhgchg.li", nil
    // Use regex to further extract HTML attributes, to [String: String], please refer to Source Code
    let matchTagAttributes = parseAttributes(substring(match.range(withName: "tagAttributes")))
    // false, false
    let matchIsSelfClosingTag = substring(match.range(withName: "selfClosingTag"))?.string.trimmingCharacters(in: .whitespacesAndNewlines) == "/"

    if let matchAttributedString = matchAttributedString,
       let matchTag = matchTag {
        if matchIsSelfClosingTag {
          // e.g. <br/>
          tokenizationResult.append(.selfClosing(.init(tagName: matchTag, tagAttributedString: matchAttributedString, attributes: matchTagAttributes)))
        } else {
          // e.g. <a> or </a>
          if matchIsEndTag {
            // e.g. </a>
            // Retrieve the position of the same TagName from the Stack, starting from the last
            if let index = stackStartItems.lastIndex(where: { $0.tagName == matchTag }) {
              // If it's not the last one, it indicates there is a misalignment or a missing closing tag
              if index != stackStartItems.count - 1 {
                  needForamatter = true
              }
              tokenizationResult.append(.close(.init(tagName: matchTag)))
              stackStartItems.remove(at: index)
            } else {
              // Extra close tag e.g </a>
              // Does not affect subsequent processing, simply ignore
            }
          } else {
            // e.g. <a>
            let startItem: HTMLParsedResult.StartItem = HTMLParsedResult.StartItem(tagName: matchTag, tagAttributedString: matchAttributedString, attributes: matchTagAttributes)
            tokenizationResult.append(.start(startItem))
            // Push to Stack
            stackStartItems.append(startItem)
          }
        }
     }

    lastMatch = match
  }
}

// Check the RawString at the end
// e.g. Test<a>Link</a>Test2 - > Test2
if let lastMatch = lastMatch {
  let currentIndex = lastMatch.range.upperBound
  if totalLength > currentIndex {
    // There are remaining strings
    let resetString = attributedString.attributedSubstring(from: NSMakeRange(currentIndex, (totalLength - currentIndex)))
    tokenizationResult.append(.rawString(resetString))
  }
} else {
  // lastMatch = nil, indicating no tags were found, all are plain text
  let resetString = attributedString.attributedSubstring(from: NSMakeRange(0, totalLength))
  tokenizationResult.append(.rawString(resetString))
}

Check if the Stack is empty. If it isn’t, it means there are Start Tags without corresponding End Tags, which should be marked as isolated Start Tags.

for stackStartItem in stackStartItems {
  stackStartItem.isIsolated = true
  needForamatter = true
}

print(tokenizationResult)
// [
//    .start("a",["href":"https://zhgchg.li"])
//    .rawString("Li")
//    .start("b",nil)
//    .rawString("nk")
//    .close("a")
//    .rawString("Bold")
//    .close("b")
// ]

The operation flow is shown in the image above.

In the end, we will obtain a Tokenization result array.

See HTMLStringToParsedResultProcessor.swift in the source code for the full implementation.

Normalization

Also known as Formatter, normalization.

The previous step produces a preliminary parsing result. If tokenization flagged that normalization is needed, this step automatically corrects the HTML Tag issues.

There are three types of HTML Tag issues:

  • HTML Tag missing a Close Tag: for example, <br>
  • Regular text being treated as an HTML Tag: for example, <Congratulation!>
  • HTML Tag misalignment issues: for example, <a>Li<b>nk</a>Bold</b>

The correction method is quite simple; we need to traverse the elements of the Tokenization result and attempt to fill in the gaps.

The operation flow is shown in the image above.

var normalizationResult = tokenizationResult

// Start Tags Stack, First In Last Out (FILO)
var stackExpectedStartItems: [HTMLParsedResult.StartItem] = []
var itemIndex = 0
while itemIndex < normalizationResult.count {
    switch normalizationResult[itemIndex] {
    case .start(let item):
        if item.isIsolated {
            // If it is an isolated Start Tag
            if WC3HTMLTagName(rawValue: item.tagName) == nil && (item.attributes?.isEmpty ?? true) {
                // If it is not a W3C defined HTML Tag & has no HTML Attributes
                // Refer to the WC3HTMLTagName Enum in the Source Code
                // Considered as regular text treated as an HTML Tag
                // Change to raw string type
                normalizationResult[itemIndex] = .rawString(item.tagAttributedString)
            } else {
                // Otherwise, change to self-closing tag, e.g. <br> -> <br/>
                normalizationResult[itemIndex] = .selfClosing(item.convertToSelfClosingParsedItem())
            }
            itemIndex += 1
        } else {
            // Normal Start Tag, add to Stack
            stackExpectedStartItems.append(item)
            itemIndex += 1
        }
    case .close(let item):
        // Encounter Close Tag
        // Get the Tags between the Start Stack Tag and this Close Tag
        // e.g <a><u><b>[CurrentIndex]</b></u></a> -> gap 0
        // e.g <a><u><b>[CurrentIndex]</a></u></b> -> gap b,u

        let reversedStackExpectedStartItems = Array(stackExpectedStartItems.reversed())
        guard let reversedStackExpectedStartItemsOccurredIndex = reversedStackExpectedStartItems.firstIndex(where: { $0.tagName == item.tagName }) else {
            itemIndex += 1
            continue
        }
        
        let reversedStackExpectedStartItemsOccurred = Array(reversedStackExpectedStartItems.prefix(upTo: reversedStackExpectedStartItemsOccurredIndex))
        
        // Gap 0 means the tag is not misaligned
        guard reversedStackExpectedStartItemsOccurred.count != 0 else {
            // It's a pair, pop
            stackExpectedStartItems.removeLast()
            itemIndex += 1
            continue
        }
        
        // If there are other gaps, automatically insert the missing Tags
        // e.g <a><u><b>[CurrentIndex]</a></u></b> ->
        // e.g <a><u><b>[CurrentIndex]</b></u></a><u><b></u></b>
        let stackExpectedStartItemsOccurred = Array(reversedStackExpectedStartItemsOccurred.reversed())
        let afterItems = stackExpectedStartItemsOccurred.map({ HTMLParsedResult.start($0) })
        let beforeItems = reversedStackExpectedStartItemsOccurred.map({ HTMLParsedResult.close($0.convertToCloseParsedItem()) })
        // Read from and write to the same array so the indices stay in sync
        normalizationResult.insert(contentsOf: afterItems, at: normalizationResult.index(after: itemIndex))
        normalizationResult.insert(contentsOf: beforeItems, at: itemIndex)
        
        // Continue at the first re-opened Start Tag so it is pushed onto the Stack again
        itemIndex = normalizationResult.index(after: itemIndex) + stackExpectedStartItemsOccurred.count
        
        // Update Start Stack Tags
        // e.g. -> b,u
        stackExpectedStartItems.removeAll { startItem in
            return reversedStackExpectedStartItems.prefix(through: reversedStackExpectedStartItemsOccurredIndex).contains(where: { $0 === startItem })
        }
    case .selfClosing, .rawString:
        itemIndex += 1
    }
}

print(normalizationResult)
// [
//    .start("a",["href":"https://zhgchg.li"])
//    .rawString("Li")
//    .start("b",nil)
//    .rawString("nk")
//    .close("b")
//    .close("a")
//    .start("b",nil)
//    .rawString("Bold")
//    .close("b")
// ]

See HTMLParsedResultFormatterProcessor.swift in the source code for the full implementation.
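For readers who want to experiment with the misalignment fix in isolation, here is a stripped-down sketch of the same idea. It is not the project's implementation: Token and normalize are hypothetical names, tokens carry plain strings instead of NSAttributedString, isolated Start Tags are not handled, and the output is built in a new array instead of being edited in place.

```swift
enum Token: Equatable {
    case start(String)
    case close(String)
    case text(String)
}

/// Closes and re-opens tags so that every Close Tag matches the innermost open tag.
func normalize(_ tokens: [Token]) -> [Token] {
    var result: [Token] = []
    var stack: [String] = [] // names of currently open tags, innermost last
    for token in tokens {
        switch token {
        case .start(let name):
            stack.append(name)
            result.append(token)
        case .text:
            result.append(token)
        case .close(let name):
            // A Close Tag without a matching Start Tag is dropped
            guard let index = stack.lastIndex(of: name) else { continue }
            // Tags opened after `name` that are still open (the "gap")
            let crossed = Array(stack[(index + 1)...])
            // Close the crossed tags innermost-first, then the requested tag
            for tag in crossed.reversed() { result.append(.close(tag)) }
            result.append(.close(name))
            stack.removeSubrange(index...)
            // Re-open the crossed tags in their original order
            for tag in crossed {
                result.append(.start(tag))
                stack.append(tag)
            }
        }
    }
    return result
}

// <a>Li<b>nk</a>Bold</b>
let misaligned: [Token] = [
    .start("a"), .text("Li"), .start("b"), .text("nk"),
    .close("a"), .text("Bold"), .close("b")
]
let normalized = normalize(misaligned)
```

For this input the sketch yields the same token order as the printed normalizationResult above: the b tag is closed before </a> and re-opened right after it.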

Abstract Syntax Tree

Also known as AST, Abstract Tree.

After completing the data preprocessing with Tokenization & Normalization, we will now convert the results into an abstract tree 🌲.

As shown in the image above.

Converting to an abstract tree allows us to facilitate future operations and expansions, such as implementing Selector functionality or performing other transformations, like HTML to Markdown; or if we want to add Markdown to NSAttributedString in the future, we just need to implement Markdown’s Tokenization & Normalization to achieve it.

First, we define a Markup Protocol with Child & Parent properties to record information about leaves and branches:

protocol Markup: AnyObject {
    var parentMarkup: Markup? { get set }
    var childMarkups: [Markup] { get set }
    
    func appendChild(markup: Markup)
    func prependChild(markup: Markup)
    func accept<V: MarkupVisitor>(_ visitor: V) -> V.Result
}

extension Markup {
    func appendChild(markup: Markup) {
        markup.parentMarkup = self
        childMarkups.append(markup)
    }
    
    func prependChild(markup: Markup) {
        markup.parentMarkup = self
        childMarkups.insert(markup, at: 0)
    }
}

Additionally, we use the Visitor Pattern: each style is defined as an Element object, and different Visitor implementations produce different results from the same tree.

protocol MarkupVisitor {
    associatedtype Result
        
    func visit(markup: Markup) -> Result
    
    func visit(_ markup: RootMarkup) -> Result
    func visit(_ markup: RawStringMarkup) -> Result
    
    func visit(_ markup: BoldMarkup) -> Result
    func visit(_ markup: LinkMarkup) -> Result
    //...
}

extension MarkupVisitor {
    func visit(markup: Markup) -> Result {
        return markup.accept(self)
    }
}

Basic Markup Nodes:

// Root Node
final class RootMarkup: Markup {
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []
    
    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

// Leaf Node
final class RawStringMarkup: Markup {
    let attributedString: NSAttributedString
    
    init(attributedString: NSAttributedString) {
        self.attributedString = attributedString
    }
    
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []
    
    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

Defining Markup Style Nodes:

// Branch Node:

// Link Style
final class LinkMarkup: Markup {
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []
    
    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

// Bold Style
final class BoldMarkup: Markup {
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []
    
    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

See Markup in the source code for the full implementation.
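The payoff of the Visitor Pattern is that one tree can be walked by several visitors, each producing its own result. Here is a minimal, self-contained sketch of that idea. All names (Node, NodeVisitor, TextNode, BoldNode, PlainTextVisitor, MarkdownVisitor) are hypothetical stand-ins for the Markup types above, reduced to two node kinds.

```swift
protocol NodeVisitor {
    associatedtype Result
    func visit(_ node: TextNode) -> Result
    func visit(_ node: BoldNode) -> Result
}

protocol Node: AnyObject {
    var children: [Node] { get set }
    func accept<V: NodeVisitor>(_ visitor: V) -> V.Result
}

// Leaf node holding raw text
final class TextNode: Node {
    let text: String
    var children: [Node] = []
    init(_ text: String) { self.text = text }
    func accept<V: NodeVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}

// Branch node representing bold styling
final class BoldNode: Node {
    var children: [Node]
    init(children: [Node]) { self.children = children }
    func accept<V: NodeVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}

// Strategy 1: drop all styling and keep only the text
struct PlainTextVisitor: NodeVisitor {
    func visit(_ node: TextNode) -> String { node.text }
    func visit(_ node: BoldNode) -> String {
        node.children.map { $0.accept(self) }.joined()
    }
}

// Strategy 2: render the same tree as Markdown
struct MarkdownVisitor: NodeVisitor {
    func visit(_ node: TextNode) -> String { node.text }
    func visit(_ node: BoldNode) -> String {
        "**" + node.children.map { $0.accept(self) }.joined() + "**"
    }
}

let tree = BoldNode(children: [TextNode("Test"), TextNode("Link")])
let plain = tree.accept(PlainTextVisitor())
let markdown = tree.accept(MarkdownVisitor())
```

Adding a new output format means adding one visitor type; the node classes stay untouched, which is the property the Markup design relies on.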

Before converting to an abstract tree, we still need to…

MarkupComponent

Because our tree structure does not depend on any data structure, the nodes carry no data of their own (for example, a LinkMarkup node needs URL information before it can be rendered). To address this, we define a container that stores a tree node together with its related data:

protocol MarkupComponent {
    associatedtype T
    var markup: Markup { get }
    var value: T { get }
    
    init(markup: Markup, value: T)
}

extension Sequence where Iterator.Element: MarkupComponent {
    func value(markup: Markup) -> Element.T? {
        return self.first(where:{ $0.markup === markup })?.value as? Element.T
    }
}

See MarkupComponent in the source code for the full implementation.

We could also declare Markup as Hashable and store the values directly in a Dictionary ([Markup: Any]), but Markup could then no longer be used as a regular type; it would have to be written as any Markup.
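The container approach boils down to a side table that is searched by object identity (===) rather than by hashing. A minimal sketch of that lookup, using hypothetical names (TreeItem, ItemComponent, value(for:)) in place of Markup and MarkupComponent:

```swift
// Stand-in for a tree node; it stores no data itself
final class TreeItem {}

// Pairs a node with the data that belongs to it
struct ItemComponent<Value> {
    let item: TreeItem
    let value: Value
}

extension Array {
    /// Finds the value stored for a node by comparing object identity.
    func value<Value>(for item: TreeItem) -> Value? where Element == ItemComponent<Value> {
        first(where: { $0.item === item })?.value
    }
}

let linkItem = TreeItem()
let boldItem = TreeItem()
let components = [ItemComponent(item: linkItem, value: "https://zhgchg.li")]
let linkValue = components.value(for: linkItem)
let boldValue = components.value(for: boldItem)
```

The lookup is linear in the number of components, which is a trade-off of this sketch; the benefit is that the node type needs no Hashable conformance.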

HTMLTag & HTMLTagName & HTMLTagNameVisitor

We also abstract the HTML Tag Name part, allowing users to decide which Tags need to be processed, making future expansions easier. For example, the <strong> Tag Name can also correspond to BoldMarkup.

public protocol HTMLTagName {
    var string: String { get }
    func accept<V: HTMLTagNameVisitor>(_ visitor: V) -> V.Result
}

public struct A_HTMLTagName: HTMLTagName {
    public let string: String = WC3HTMLTagName.a.rawValue
    
    public init() {
        
    }
    
    public func accept<V>(_ visitor: V) -> V.Result where V : HTMLTagNameVisitor {
        return visitor.visit(self)
    }
}

public struct B_HTMLTagName: HTMLTagName {
    public let string: String = WC3HTMLTagName.b.rawValue
    
    public init() {
        
    }
    
    public func accept<V>(_ visitor: V) -> V.Result where V : HTMLTagNameVisitor {
        return visitor.visit(self)
    }
}

See HTMLTagNameVisitor in the source code for the full implementation.

Additionally, the HTML tag name enum in WC3HTMLTagName.swift is based on the tag list in the W3C wiki.

HTMLTag is simply a container object, as we want to allow external specification of the styles corresponding to HTML Tags, so we declare a container to hold them together:

struct HTMLTag {
    let tagName: HTMLTagName
    let customStyle: MarkupStyle? // To be explained in the Render section later
    
    init(tagName: HTMLTagName, customStyle: MarkupStyle? = nil) {
        self.tagName = tagName
        self.customStyle = customStyle
    }
}

See HTMLTag in the source code for the full implementation.

HTMLTagNameToHTMLMarkupVisitor

struct HTMLTagNameToMarkupVisitor: HTMLTagNameVisitor {
    typealias Result = Markup
    
    let attributes: [String: String]?
    
    func visit(_ tagName: A_HTMLTagName) -> Result {
        return LinkMarkup()
    }
    
    func visit(_ tagName: B_HTMLTagName) -> Result {
        return BoldMarkup()
    }
    //...
}

See HTMLTagNameToHTMLMarkupVisitor in the source code for the full implementation.

Converting to an Abstract Tree with HTML Data

We need to convert the normalized HTML data results into an abstract tree. First, we declare a data structure for MarkupComponent that can hold HTML data:

struct HTMLElementMarkupComponent: MarkupComponent {
    struct HTMLElement {
        let tag: HTMLTag
        let tagAttributedString: NSAttributedString
        let attributes: [String: String]?
    }
    
    typealias T = HTMLElement
    
    let markup: Markup
    let value: HTMLElement
    init(markup: Markup, value: HTMLElement) {
        self.markup = markup
        self.value = value
    }
}

Converting to a Markup Abstract Tree:

var htmlElementComponents: [HTMLElementMarkupComponent] = []
let rootMarkup = RootMarkup()
var currentMarkup: Markup = rootMarkup

let htmlTags: [String: HTMLTag]
init(htmlTags: [HTMLTag]) {
  self.htmlTags = Dictionary(uniqueKeysWithValues: htmlTags.map{ ($0.tagName.string, $0) })
}

// Start Tags Stack, ensuring correct pop tag
// Normalization has already been done, so there shouldn't be any errors, just ensuring
var stackExpectedStartItems: [HTMLParsedResult.StartItem] = []
for thisItem in from {
    switch thisItem {
    case .start(let item):
        let visitor = HTMLTagNameToMarkupVisitor(attributes: item.attributes)
        let htmlTag = self.htmlTags[item.tagName] ?? HTMLTag(tagName: ExtendTagName(item.tagName))
        // Use Visitor to ask for the corresponding Markup
        let markup = visitor.visit(tagName: htmlTag.tagName)
        
        // Add itself as the current branch's leaf node
        // It becomes the current branch node
        htmlElementComponents.append(.init(markup: markup, value: .init(tag: htmlTag, tagAttributedString: item.tagAttributedString, attributes: item.attributes)))
        currentMarkup.appendChild(markup: markup)
        currentMarkup = markup
        
        stackExpectedStartItems.append(item)
    case .selfClosing(let item):
        // Directly add as the current branch's leaf node
        let visitor = HTMLTagNameToMarkupVisitor(attributes: item.attributes)
        let htmlTag = self.htmlTags[item.tagName] ?? HTMLTag(tagName: ExtendTagName(item.tagName))
        let markup = visitor.visit(tagName: htmlTag.tagName)
        htmlElementComponents.append(.init(markup: markup, value: .init(tag: htmlTag, tagAttributedString: item.tagAttributedString, attributes: item.attributes)))
        currentMarkup.appendChild(markup: markup)
    case .close(let item):
        if let lastTagName = stackExpectedStartItems.popLast()?.tagName,
           lastTagName == item.tagName {
            // Encounter Close Tag, go back to the previous level
            currentMarkup = currentMarkup.parentMarkup ?? currentMarkup
        }
    case .rawString(let attributedString):
        // Directly add as the current branch's leaf node
        currentMarkup.appendChild(markup: RawStringMarkup(attributedString: attributedString))
    }
}

// print(htmlElementComponents)
// [(markup: LinkMarkup, (tag: a, attributes: ["href":"zhgchg.li"]...)]

The operation result is shown in the image above.

See HTMLParsedResultToHTMLElementWithRootMarkupProcessor.swift in the source code for the full implementation.
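The tree-building loop is easier to follow without the HTML-specific types. The sketch below keeps only the core mechanism: a "current node" pointer that moves down on a Start Tag and back up on a Close Tag. Token, TreeNode, buildTree, and outline are hypothetical names used only here, and the input is assumed to be already normalized.

```swift
enum Token {
    case start(String)
    case close(String)
    case text(String)
}

final class TreeNode {
    let name: String   // tag name, "#root", or "#text"
    let text: String?  // only set for text leaves
    weak var parent: TreeNode?
    var children: [TreeNode] = []

    init(name: String, text: String? = nil) {
        self.name = name
        self.text = text
    }

    func append(_ child: TreeNode) {
        child.parent = self
        children.append(child)
    }
}

func buildTree(from tokens: [Token]) -> TreeNode {
    let root = TreeNode(name: "#root")
    var current = root
    for token in tokens {
        switch token {
        case .start(let name):
            // The new node becomes a child and then the current branch
            let node = TreeNode(name: name)
            current.append(node)
            current = node
        case .close:
            // Input is normalized, so a Close Tag always ends the current branch
            current = current.parent ?? current
        case .text(let string):
            current.append(TreeNode(name: "#text", text: string))
        }
    }
    return root
}

/// Renders the tree as nested parentheses, e.g. "a(Li,b(nk))", for inspection.
func outline(_ node: TreeNode) -> String {
    if let text = node.text { return text }
    return "\(node.name)(\(node.children.map(outline).joined(separator: ",")))"
}

// Normalized form of <a>Li<b>nk</a>Bold</b>
let tokens: [Token] = [
    .start("a"), .text("Li"), .start("b"), .text("nk"),
    .close("b"), .close("a"), .start("b"), .text("Bold"), .close("b")
]
let treeOutline = outline(buildTree(from: tokens))
```

The parent reference is weak, as in the Markup classes above, so the tree is owned top-down through the children arrays and no retain cycle is created.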

At this point, we have actually completed the Selector functionality 🎉

public class HTMLSelector: CustomStringConvertible {
    
    let markup: Markup
    let components: [HTMLElementMarkupComponent]
    init(markup: Markup, components: [HTMLElementMarkupComponent]) {
        self.markup = markup
        self.components = components
    }
    
    public func filter(_ htmlTagName: String) -> [HTMLSelector] {
        let result = markup.childMarkups.filter({ components.value(markup: $0)?.tag.tagName.isEqualTo(htmlTagName) ?? false })
        return result.map({ .init(markup: $0, components: components) })
    }

    //...
}

We can filter leaf node objects layer by layer.

This corresponds to the HTMLSelector implementation in the source code.
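As a rough illustration of layer-by-layer filtering, here is a self-contained sketch on a toy tree. TagNode and TagSelector are hypothetical names for this example only; they are not the library's types.

```swift
// Hypothetical toy tree: each node only knows its tag name and its children.
final class TagNode {
    let tagName: String
    let children: [TagNode]

    init(_ tagName: String, _ children: [TagNode] = []) {
        self.tagName = tagName
        self.children = children
    }
}

// Hypothetical selector: filters the direct children of the wrapped node.
struct TagSelector {
    let node: TagNode

    func filter(_ tagName: String) -> [TagSelector] {
        node.children
            .filter { $0.tagName == tagName.lowercased() }
            .map { TagSelector(node: $0) }
    }
}

// <div><a><b></b><b></b></a><a></a></div>
let rootNode = TagNode("root", [
    TagNode("div", [
        TagNode("a", [TagNode("b"), TagNode("b")]),
        TagNode("a"),
    ]),
])

// One filter call per layer: root -> div -> a -> b
let links = TagSelector(node: rootNode).filter("div").flatMap { $0.filter("a") }
let boldsInFirstLink = links.first?.filter("b") ?? []
print(links.count, boldsInFirstLink.count)
```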

Parser — HTML to MarkupStyle (Abstract of NSAttributedString.Key)

Next, we need to complete the conversion of HTML to MarkupStyle (NSAttributedString.Key).

NSAttributedString uses NSAttributedString.Key Attributes to set text styles. We abstract all fields of NSAttributedString.Key to MarkupStyle, MarkupStyleColor, MarkupStyleFont, and MarkupStyleParagraphStyle.

Purpose:

  • The original attributes data structure is [NSAttributedString.Key: Any?]. If we exposed it directly, it would be hard to control the values users pass in, and a wrong value such as .font: 123 could cause a crash.
  • Styles need to be inheritable. For example, in <a><b>test</b></a> the string “test” carries both the link style and the bold style (bold + link). If we exposed the dictionary directly, the inheritance rules would be hard to manage.
  • Encapsulate iOS/macOS (UIKit/AppKit) related objects.

MarkupStyle Struct

public struct MarkupStyle {
    public var font: MarkupStyleFont
    public var paragraphStyle: MarkupStyleParagraphStyle
    public var foregroundColor: MarkupStyleColor? = nil
    public var backgroundColor: MarkupStyleColor? = nil
    public var ligature: NSNumber? = nil
    public var kern: NSNumber? = nil
    public var tracking: NSNumber? = nil
    public var strikethroughStyle: NSUnderlineStyle? = nil
    public var underlineStyle: NSUnderlineStyle? = nil
    public var strokeColor: MarkupStyleColor? = nil
    public var strokeWidth: NSNumber? = nil
    public var shadow: NSShadow? = nil
    public var textEffect: String? = nil
    public var attachment: NSTextAttachment? = nil
    public var link: URL? = nil
    public var baselineOffset: NSNumber? = nil
    public var underlineColor: MarkupStyleColor? = nil
    public var strikethroughColor: MarkupStyleColor? = nil
    public var obliqueness: NSNumber? = nil
    public var expansion: NSNumber? = nil
    public var writingDirection: NSNumber? = nil
    public var verticalGlyphForm: NSNumber? = nil
    //...

    // Inherits from...
    // Default: If fields are nil, fill in from the provided object
    mutating func fillIfNil(from: MarkupStyle?) {
        guard let from = from else { return }
        
        var currentFont = self.font
        currentFont.fillIfNil(from: from.font)
        self.font = currentFont
        
        var currentParagraphStyle = self.paragraphStyle
        currentParagraphStyle.fillIfNil(from: from.paragraphStyle)
        self.paragraphStyle = currentParagraphStyle
        //..
    }

    // Convert MarkupStyle to NSAttributedString.Key: Any
    func render() -> [NSAttributedString.Key: Any] {
        var data: [NSAttributedString.Key: Any] = [:]
        
        if let font = font.getFont() {
            data[.font] = font
        }

        if let ligature = self.ligature {
            data[.ligature] = ligature
        }
        //...
        return data
    }
}

public struct MarkupStyleFont: MarkupStyleItem {
    public enum FontWeight {
        case style(FontWeightStyle)
        case rawValue(CGFloat)
    }
    public enum FontWeightStyle: String {
        case ultraLight, light, thin, regular, medium, semibold, bold, heavy, black
        // ...
    }
    
    public var size: CGFloat?
    public var weight: FontWeight?
    public var italic: Bool?
    //...
}

public struct MarkupStyleParagraphStyle: MarkupStyleItem {
    public var lineSpacing: CGFloat? = nil
    public var paragraphSpacing: CGFloat? = nil
    public var alignment: NSTextAlignment? = nil
    public var headIndent: CGFloat? = nil
    public var tailIndent: CGFloat? = nil
    public var firstLineHeadIndent: CGFloat? = nil
    public var minimumLineHeight: CGFloat? = nil
    public var maximumLineHeight: CGFloat? = nil
    public var lineBreakMode: NSLineBreakMode? = nil
    public var baseWritingDirection: NSWritingDirection? = nil
    public var lineHeightMultiple: CGFloat? = nil
    public var paragraphSpacingBefore: CGFloat? = nil
    public var hyphenationFactor: Float? = nil
    public var usesDefaultHyphenation: Bool? = nil
    public var tabStops: [NSTextTab]? = nil
    public var defaultTabInterval: CGFloat? = nil
    public var textLists: [NSTextList]? = nil
    public var allowsDefaultTighteningForTruncation: Bool? = nil
    public var lineBreakStrategy: NSParagraphStyle.LineBreakStrategy? = nil
    //...
}

public struct MarkupStyleColor {
    let red: Int
    let green: Int
    let blue: Int
    let alpha: CGFloat
    //...
}

This corresponds to the MarkupStyle implementation in the source code.
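To make the fillIfNil inheritance rule concrete, here is a minimal sketch with a reduced style type. SimpleStyle and its three fields are hypothetical; the real MarkupStyle has many more fields, but the rule is the same: a value is copied from the parent only where the child has none.

```swift
// Hypothetical, reduced stand-in for MarkupStyle with three optional fields.
struct SimpleStyle: Equatable {
    var isBold: Bool? = nil
    var colorName: String? = nil
    var link: String? = nil

    // Copy a value from `other` only where this style has none.
    mutating func fillIfNil(from other: SimpleStyle?) {
        guard let other = other else { return }
        isBold = isBold ?? other.isBold
        colorName = colorName ?? other.colorName
        link = link ?? other.link
    }
}

// <a><b>test</b></a>: the inner <b> style inherits from the outer <a> style.
let linkStyle = SimpleStyle(colorName: "blue", link: "https://zhgchg.li")
var boldStyle = SimpleStyle(isBold: true)
boldStyle.fillIfNil(from: linkStyle)
print(boldStyle)

// A value that is already set is never overwritten by the parent.
var redStyle = SimpleStyle(colorName: "red")
redStyle.fillIfNil(from: linkStyle)
print(redStyle)
```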

Additionally, based on the color names listed in the W3C wiki, MarkupStyleColorName.swift enumerates each color name together with its RGB values.

HTMLTagStyleAttribute & HTMLTagStyleAttributeVisitor

I would like to mention these two objects because HTML Tags can be styled using CSS. We apply the same abstraction from HTMLTagName to HTML Style Attributes.

For example, HTML might provide: <a style="color:red;font-size:14px">RedLink</a>, indicating that this link should be styled in red with a font size of 14px.

public protocol HTMLTagStyleAttribute {
    var styleName: String { get }
    
    func accept<V: HTMLTagStyleAttributeVisitor>(_ visitor: V) -> V.Result
}

public protocol HTMLTagStyleAttributeVisitor {
    associatedtype Result
    
    func visit(styleAttribute: HTMLTagStyleAttribute) -> Result
    func visit(_ styleAttribute: ColorHTMLTagStyleAttribute) -> Result
    func visit(_ styleAttribute: FontSizeHTMLTagStyleAttribute) -> Result
    //...
}

public extension HTMLTagStyleAttributeVisitor {
    func visit(styleAttribute: HTMLTagStyleAttribute) -> Result {
        return styleAttribute.accept(self)
    }
}

This corresponds to the HTMLTagStyleAttribute implementation in the source code.
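The double dispatch behind these two protocols can be sketched in a self-contained form. All names below (StyleAttribute, StyleAttributeVisitor, DescribeVisitor, describe) are hypothetical and reduced for illustration; the visitor here only returns a description string instead of a MarkupStyle.

```swift
import Foundation

// Hypothetical, reduced versions of the attribute and visitor protocols.
protocol StyleAttributeVisitor {
    associatedtype Result
    func visit(_ attribute: ColorAttribute) -> Result
    func visit(_ attribute: FontSizeAttribute) -> Result
}

protocol StyleAttribute {
    var styleName: String { get }
    func accept<V: StyleAttributeVisitor>(_ visitor: V) -> V.Result
}

struct ColorAttribute: StyleAttribute {
    let styleName = "color"
    func accept<V: StyleAttributeVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}

struct FontSizeAttribute: StyleAttribute {
    let styleName = "font-size"
    func accept<V: StyleAttributeVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}

// The visitor carries the attribute value; the attribute type selects the visit overload.
struct DescribeVisitor: StyleAttributeVisitor {
    let value: String
    func visit(_ attribute: ColorAttribute) -> String { "foreground color = \(value)" }
    func visit(_ attribute: FontSizeAttribute) -> String { "font size = \(value)" }
}

let knownAttributes: [StyleAttribute] = [ColorAttribute(), FontSizeAttribute()]

// Splits an inline style string and dispatches each known declaration to the visitor.
func describe(inlineStyle: String) -> [String] {
    var results: [String] = []
    for declaration in inlineStyle.split(separator: ";") {
        let parts = declaration.split(separator: ":", maxSplits: 1)
            .map { $0.trimmingCharacters(in: .whitespaces) }
        guard parts.count == 2,
              let attribute = knownAttributes.first(where: { $0.styleName == parts[0] }) else {
            continue // unknown or malformed declarations are skipped
        }
        results.append(attribute.accept(DescribeVisitor(value: parts[1])))
    }
    return results
}

print(describe(inlineStyle: "color:red; font-size:14px; margin:0"))
```

Adding a new output format then means adding a new visitor type, without touching the attribute types.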

HTMLTagStyleAttributeToMarkupStyleVisitor

struct HTMLTagStyleAttributeToMarkupStyleVisitor: HTMLTagStyleAttributeVisitor {
    typealias Result = MarkupStyle?
    
    let value: String
    
    func visit(_ styleAttribute: ColorHTMLTagStyleAttribute) -> Result {
        // Use regex to extract Color Hex or map from HTML Pre-defined Color Name; see Source Code
        guard let color = MarkupStyleColor(string: value) else { return nil }
        return MarkupStyle(foregroundColor: color)
    }
    
    func visit(_ styleAttribute: FontSizeHTMLTagStyleAttribute) -> Result {
        // Use regex to extract 10px -> 10; see Source Code
        guard let size = self.convert(fromPX: value) else { return nil }
        return MarkupStyle(font: MarkupStyleFont(size: CGFloat(size)))
    }
    // ...
}

This corresponds to the HTMLTagAttributeToMarkupStyleVisitor.swift implementation in the source code.

The value passed to init is the style attribute's value; each visit method converts it into the corresponding MarkupStyle field according to the attribute type being visited.
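The value conversions mentioned in the comments above ("10px -> 10", hex colors) are not shown in the excerpt, so here is one possible sketch. The helpers pixels(from:) and rgb(fromHex:) are hypothetical and only cover the simplest cases; the library's actual regular expressions may differ.

```swift
import Foundation

// Hypothetical helper: extracts the numeric part of a CSS pixel value such as "14px".
func pixels(from value: String) -> Double? {
    let pattern = "^\\s*([0-9]+(?:\\.[0-9]+)?)\\s*px\\s*$"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return nil
    }
    let fullRange = NSRange(value.startIndex..<value.endIndex, in: value)
    guard let match = regex.firstMatch(in: value, options: [], range: fullRange),
          let numberRange = Range(match.range(at: 1), in: value) else {
        return nil
    }
    return Double(String(value[numberRange]))
}

// Hypothetical helper: parses "#RRGGBB" into 0-255 components.
func rgb(fromHex value: String) -> (red: Int, green: Int, blue: Int)? {
    let hex = value.trimmingCharacters(in: .whitespacesAndNewlines)
    guard hex.hasPrefix("#"), hex.count == 7,
          let number = Int(hex.dropFirst(), radix: 16) else {
        return nil
    }
    return (red: (number >> 16) & 0xFF, green: (number >> 8) & 0xFF, blue: number & 0xFF)
}

print(pixels(from: "14px") ?? -1)
print(rgb(fromHex: "#FF8000") ?? (red: -1, green: -1, blue: -1))
```

Returning nil for anything unrecognized fits the visitor's MarkupStyle? result: an unparsable value simply contributes no style.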

HTMLElementMarkupComponentMarkupStyleVisitor

After introducing the MarkupStyle object, we will convert the results from Normalization’s HTMLElementComponents into MarkupStyle.

// MarkupStyle Strategy
public enum MarkupStylePolicy {
    case respectMarkupStyleFromCode // Prioritize styles from Code, filling in from HTML Style Attributes
    case respectMarkupStyleFromHTMLStyleAttribute // Prioritize styles from HTML Style Attributes, filling in from Code
}

struct HTMLElementMarkupComponentMarkupStyleVisitor: MarkupVisitor {

    typealias Result = MarkupStyle?
    
    let policy: MarkupStylePolicy
    let components: [HTMLElementMarkupComponent]
    let styleAttributes: [HTMLTagStyleAttribute]

    func visit(_ markup: BoldMarkup) -> Result {
        // .bold is just a default style defined in MarkupStyle; see Source Code
        return defaultVisit(components.value(markup: markup), defaultStyle: .bold)
    }
    
    func visit(_ markup: LinkMarkup) -> Result {
        // .link is just a default style defined in MarkupStyle; see Source Code
        var markupStyle = defaultVisit(components.value(markup: markup), defaultStyle: .link) ?? .link
        
        // Retrieve the corresponding HtmlElement from HtmlElementComponents for LinkMarkup
        // Look for the href parameter in the HtmlElement's attributes (the way HTML carries URL Strings)
        if let href = components.value(markup: markup)?.attributes?["href"] as? String,
           let url = URL(string: href) {
            markupStyle.link = url
        }
        return markupStyle
    }

    // ...
}

extension HTMLElementMarkupComponentMarkupStyleVisitor {
    // Retrieve the specified custom MarkupStyle from the HTMLTag container
    private func customStyle(_ htmlElement: HTMLElementMarkupComponent.HTMLElement?) -> MarkupStyle? {
        guard let customStyle = htmlElement?.tag.customStyle else {
            return nil
        }
        return customStyle
    }
    
    // Default action
    func defaultVisit(_ htmlElement: HTMLElementMarkupComponent.HTMLElement?, defaultStyle: MarkupStyle? = nil) -> Result {
        var markupStyle: MarkupStyle? = customStyle(htmlElement) ?? defaultStyle
        // Retrieve the HtmlElement that corresponds to the visited Markup from HtmlElementComponents
        // Check whether the HtmlElement's attributes contain a `style` attribute
        guard let styleString = htmlElement?.attributes?["style"],
              styleAttributes.count > 0 else {
            // None
            return markupStyle
        }

        // There are Style Attributes
        // Split the Style Value string into an array
        // font-size:14px;color:red -> ["font-size":"14px","color":"red"]
        let styles = styleString.split(separator: ";").filter { $0.trimmingCharacters(in: .whitespacesAndNewlines) != "" }.map { $0.split(separator: ":") }
        
        for style in styles {
            guard style.count == 2 else {
                continue
            }
            // e.g. font-size
            let key = style[0].trimmingCharacters(in: .whitespacesAndNewlines)
            // e.g. 14px
            let value = style[1].trimmingCharacters(in: .whitespacesAndNewlines)
            
            if let styleAttribute = styleAttributes.first(where: { $0.isEqualTo(styleName: key) }) {
                // Use the previously mentioned HTMLTagStyleAttributeToMarkupStyleVisitor to convert back to MarkupStyle
                let visitor = HTMLTagStyleAttributeToMarkupStyleVisitor(value: value)
                if var thisMarkupStyle = visitor.visit(styleAttribute: styleAttribute) {
                    // If the Style Attribute has a value...
                    // Merge the previous MarkupStyle result
                    thisMarkupStyle.fillIfNil(from: markupStyle)
                    markupStyle = thisMarkupStyle
                }
            }
        }
        
        // If there is a default Style
        if var defaultStyle = defaultStyle {
            switch policy {
            case .respectMarkupStyleFromHTMLStyleAttribute:
                // Style Attribute MarkupStyle takes precedence, then
                // merge the defaultStyle result
                markupStyle?.fillIfNil(from: defaultStyle)
            case .respectMarkupStyleFromCode:
                // defaultStyle takes precedence, then
                // merge the Style Attribute MarkupStyle result
                defaultStyle.fillIfNil(from: markupStyle)
                markupStyle = defaultStyle
            }
        }
        
        return markupStyle
    }
}

This corresponds to the HTMLElementMarkupComponentMarkupStyleVisitor implementation in the source code.

We will define some default styles in MarkupStyle, which will be used when certain Markup does not have externally specified styles.

There are two style inheritance strategies:

  • respectMarkupStyleFromCode: The default style from code is primary; Style Attributes only fill in fields that are still empty and never overwrite existing values.
  • respectMarkupStyleFromHTMLStyleAttribute: The Style Attributes are primary; the default style only fills in fields that are still empty and never overwrites existing values.
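The difference between the two strategies is easiest to see with a conflicting value. The following is a minimal sketch with hypothetical types (TinyStyle, StylePolicy, merge), not the library's own implementation: the code style says blue, while the HTML says style="color:red;font-size:14px".

```swift
// Hypothetical, reduced style with two optional fields.
struct TinyStyle: Equatable {
    var fontSize: Double? = nil
    var colorName: String? = nil

    // Fill only the fields that are still nil.
    mutating func fillIfNil(from other: TinyStyle) {
        fontSize = fontSize ?? other.fontSize
        colorName = colorName ?? other.colorName
    }
}

// Hypothetical mirror of the two policies described above.
enum StylePolicy {
    case respectStyleFromCode
    case respectStyleFromHTMLStyleAttribute
}

// The primary style wins on conflicts; the secondary one only fills the gaps.
func merge(codeStyle: TinyStyle, htmlStyle: TinyStyle, policy: StylePolicy) -> TinyStyle {
    var primary: TinyStyle
    let secondary: TinyStyle
    switch policy {
    case .respectStyleFromCode:
        primary = codeStyle
        secondary = htmlStyle
    case .respectStyleFromHTMLStyleAttribute:
        primary = htmlStyle
        secondary = codeStyle
    }
    primary.fillIfNil(from: secondary)
    return primary
}

let codeStyle = TinyStyle(colorName: "blue")
let htmlStyle = TinyStyle(fontSize: 14, colorName: "red")

let preferCode = merge(codeStyle: codeStyle, htmlStyle: htmlStyle, policy: .respectStyleFromCode)
let preferHTML = merge(codeStyle: codeStyle, htmlStyle: htmlStyle, policy: .respectStyleFromHTMLStyleAttribute)
print(preferCode) // color stays blue, size is filled in from the HTML
print(preferHTML) // color is red, size comes from the HTML
```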

HTMLElementWithMarkupToMarkupStyleProcessor

This converts the normalization results into an AST & MarkupStyleComponent.

We declare a new MarkupComponent to store the corresponding MarkupStyle:

struct MarkupStyleComponent: MarkupComponent {
    typealias T = MarkupStyle
    
    let markup: Markup
    let value: MarkupStyle
    init(markup: Markup, value: MarkupStyle) {
        self.markup = markup
        self.value = value
    }
}

A simple traversal of the Markup Tree & HTMLElementMarkupComponent structure:

let styleAttributes: [HTMLTagStyleAttribute]
let policy: MarkupStylePolicy
    
func process(from: (Markup, [HTMLElementMarkupComponent])) -> [MarkupStyleComponent] {
  var components: [MarkupStyleComponent] = []
  let visitor = HTMLElementMarkupComponentMarkupStyleVisitor(policy: policy, components: from.1, styleAttributes: styleAttributes)
  walk(markup: from.0, visitor: visitor, components: &components)
  return components
}
    
func walk(markup: Markup, visitor: HTMLElementMarkupComponentMarkupStyleVisitor, components: inout [MarkupStyleComponent]) {
        
  if let markupStyle = visitor.visit(markup: markup) {
    components.append(.init(markup: markup, value: markupStyle))
  }
        
  for markup in markup.childMarkups {
    walk(markup: markup, visitor: visitor, components: &components)
  }
}

// print(components)
// [(markup: LinkMarkup, MarkupStyle(link: https://zhgchg.li, color: .blue)]
// [(markup: BoldMarkup, MarkupStyle(font: .init(weight: .bold))]

This corresponds to the HTMLElementWithMarkupToMarkupStyleProcessor.swift implementation in the source code.

The process result is shown in the image above

Render — Convert To NSAttributedString

Now that we have the abstract tree structure of HTML Tags and the corresponding MarkupStyle, the final step is to produce the final NSAttributedString rendering result.

MarkupNSAttributedStringVisitor

This visitor walks the Markup tree and produces the NSAttributedString:

struct MarkupNSAttributedStringVisitor: MarkupVisitor {
    typealias Result = NSAttributedString
    
    let components: [MarkupStyleComponent]
    // Root/base MarkupStyle, specified externally, e.g., can specify the size of the entire text
    let rootStyle: MarkupStyle?
    
    func visit(_ markup: RootMarkup) -> Result {
        // Look down to the RawString object
        return collectAttributedString(markup)
    }
    
    func visit(_ markup: RawStringMarkup) -> Result {
        // Return Raw String
        // Collect all MarkupStyles along the chain
        // Apply Style to NSAttributedString
        return applyMarkupStyle(markup.attributedString, with: collectMarkupStyle(markup))
    }
    
    func visit(_ markup: BoldMarkup) -> Result {
        // Look down to the RawString object
        return collectAttributedString(markup)
    }
    
    func visit(_ markup: LinkMarkup) -> Result {
        // Look down to the RawString object
        return collectAttributedString(markup)
    }
    // ...
}

private extension MarkupNSAttributedStringVisitor {
    // Apply Style to NSAttributedString
    func applyMarkupStyle(_ attributedString: NSAttributedString, with markupStyle: MarkupStyle?) -> NSAttributedString {
        guard let markupStyle = markupStyle else { return attributedString }
        let mutableAttributedString = NSMutableAttributedString(attributedString: attributedString)
        mutableAttributedString.addAttributes(markupStyle.render(), range: NSMakeRange(0, mutableAttributedString.string.utf16.count))
        return mutableAttributedString
    }

    func collectAttributedString(_ markup: Markup) -> NSMutableAttributedString {
        // Collect from downstream
        // Root -> Bold -> String("Bold")
        //      \
        //       > String("Test")
        // Result: Bold Test
        // Recursively visit and combine the final NSAttributedString layer by layer
        return markup.childMarkups.compactMap({ visit(markup: $0) }).reduce(NSMutableAttributedString()) { partialResult, attributedString in
            partialResult.append(attributedString)
            return partialResult
        }
    }
    
    func collectMarkupStyle(_ markup: Markup) -> MarkupStyle? {
        // Collect from upstream
        // String("Test") -> Bold -> Italic -> Root
        // Result: style: Bold+Italic
        // Recursively find the parent tag's markup style
        // Then inherit styles layer by layer
        var currentMarkup: Markup? = markup.parentMarkup
        var currentStyle = components.value(markup: markup)
        while let thisMarkup = currentMarkup {
            guard let thisMarkupStyle = components.value(markup: thisMarkup) else {
                currentMarkup = thisMarkup.parentMarkup
                continue
            }

            if var thisCurrentStyle = currentStyle {
                thisCurrentStyle.fillIfNil(from: thisMarkupStyle)
                currentStyle = thisCurrentStyle
            } else {
                currentStyle = thisMarkupStyle
            }

            currentMarkup = thisMarkup.parentMarkup
        }
        
        if var currentStyle = currentStyle {
            currentStyle.fillIfNil(from: rootStyle)
            return currentStyle
        } else {
            return rootStyle
        }
    }
}

This corresponds to the MarkupNSAttributedStringVisitor.swift implementation in the source code.

Operation flow and results are shown in the image above
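The two directions of traversal (strings collected downstream, styles collected upstream) can be sketched without UIKit. RenderNode, Run, collectTraits, and render are hypothetical names, and a set of trait names is a simplification of MarkupStyle and fillIfNil: it only shows which ancestors contribute to each text run.

```swift
// Hypothetical stand-ins: a node either carries text (leaf) or style traits (tag).
final class RenderNode {
    let text: String?
    let traits: Set<String>
    weak var parent: RenderNode?
    private(set) var children: [RenderNode] = []

    init(text: String? = nil, traits: Set<String> = [], children: [RenderNode] = []) {
        self.text = text
        self.traits = traits
        for child in children {
            child.parent = self
            self.children.append(child)
        }
    }
}

// One styled piece of text in the final output.
struct Run: Equatable {
    let text: String
    let traits: [String]
}

// Upstream: walk the parent chain and accumulate every ancestor's traits.
func collectTraits(of node: RenderNode) -> [String] {
    var traits = node.traits
    var current = node.parent
    while let ancestor = current {
        traits.formUnion(ancestor.traits)
        current = ancestor.parent
    }
    return traits.sorted()
}

// Downstream: visit children recursively and concatenate the styled leaves.
func render(_ node: RenderNode) -> [Run] {
    if let text = node.text {
        return [Run(text: text, traits: collectTraits(of: node))]
    }
    return node.children.flatMap { render($0) }
}

// <a>Li<b>nk</b></a><b>Bold</b>
let root = RenderNode(children: [
    RenderNode(traits: ["link"], children: [
        RenderNode(text: "Li"),
        RenderNode(traits: ["bold"], children: [RenderNode(text: "nk")]),
    ]),
    RenderNode(traits: ["bold"], children: [RenderNode(text: "Bold")]),
])
let runs = render(root)
for run in runs {
    print(run.text, run.traits)
}
```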

The final rendered result looks like this:

Li{
    NSColor = "Blue";
    NSFont = "<UICTFont: 0x145d17600> font-family: \".SFUI-Regular\"; font-weight: normal; font-style: normal; font-size: 13.00pt";
    NSLink = "https://zhgchg.li";
}nk{
    NSColor = "Blue";
    NSFont = "<UICTFont: 0x145d18710> font-family: \".SFUI-Semibold\"; font-weight: bold; font-style: normal; font-size: 13.00pt";
    NSLink = "https://zhgchg.li";
}Bold{
    NSFont = "<UICTFont: 0x145d18710> font-family: \".SFUI-Semibold\"; font-weight: bold; font-style: normal; font-size: 13.00pt";
}

🎉🎉🎉🎉 Completed 🎉🎉🎉🎉

We have now completed the entire process of converting an HTML String to NSAttributedString.

Stripper — Removing HTML Tags

Removing HTML Tags is relatively simple; it only requires:

func attributedString(_ markup: Markup) -> NSAttributedString {
  if let rawStringMarkup = markup as? RawStringMarkup {
    return rawStringMarkup.attributedString
  } else {
    return markup.childMarkups.compactMap({ attributedString($0) }).reduce(NSMutableAttributedString()) { partialResult, attributedString in
      partialResult.append(attributedString)
      return partialResult
    }
  }
}

This corresponds to the MarkupStripperProcessor.swift implementation in the source code.

This is similar to Render, except that it applies no styles: it simply returns the content of every RawStringMarkup it finds.

Extend — Dynamic Expansion

To cover HTML tags and style attributes that are not built in, an extension point was added so that new tag and style attribute objects can be defined dynamically from code.

public struct ExtendTagName: HTMLTagName {
    public let string: String
    
    public init(_ w3cHTMLTagName: WC3HTMLTagName) {
        self.string = w3cHTMLTagName.rawValue
    }
    
    public init(_ string: String) {
        self.string = string.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    }
    
    public func accept<V>(_ visitor: V) -> V.Result where V : HTMLTagNameVisitor {
        return visitor.visit(self)
    }
}
// to
final class ExtendMarkup: Markup {
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []

    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

//----

public struct ExtendHTMLTagStyleAttribute: HTMLTagStyleAttribute {
    public let styleName: String
    public let render: ((String) -> (MarkupStyle?)) // Dynamic closure to change MarkupStyle
    
    public init(styleName: String, render: @escaping ((String) -> (MarkupStyle?))) {
        self.styleName = styleName
        self.render = render
    }
    
    public func accept<V>(_ visitor: V) -> V.Result where V : HTMLTagStyleAttributeVisitor {
        return visitor.visit(self)
    }
}

ZHTMLParserBuilder

Finally, we use the Builder Pattern to allow external modules to quickly construct the objects needed for ZMarkupParser, while also managing access level control.

public final class ZHTMLParserBuilder {
    
    private(set) var htmlTags: [HTMLTag] = []
    private(set) var styleAttributes: [HTMLTagStyleAttribute] = []
    private(set) var rootStyle: MarkupStyle?
    private(set) var policy: MarkupStylePolicy = .respectMarkupStyleFromCode
    
    public init() {
        
    }
    
    public static func initWithDefault() -> Self {
        var builder = Self.init()
        for htmlTagName in ZHTMLParserBuilder.htmlTagNames {
            builder = builder.add(htmlTagName)
        }
        for styleAttribute in ZHTMLParserBuilder.styleAttributes {
            builder = builder.add(styleAttribute)
        }
        return builder
    }
    
    public func set(_ htmlTagName: HTMLTagName, withCustomStyle markupStyle: MarkupStyle?) -> Self {
        return self.add(htmlTagName, withCustomStyle: markupStyle)
    }
    
    public func add(_ htmlTagName: HTMLTagName, withCustomStyle markupStyle: MarkupStyle? = nil) -> Self {
        // Only one instance of the same tagName can exist
        htmlTags.removeAll { htmlTag in
            return htmlTag.tagName.string == htmlTagName.string
        }
        
        htmlTags.append(HTMLTag(tagName: htmlTagName, customStyle: markupStyle))
        
        return self
    }
    
    public func add(_ styleAttribute: HTMLTagStyleAttribute) -> Self {
        styleAttributes.removeAll { thisStyleAttribute in
            return thisStyleAttribute.styleName == styleAttribute.styleName
        }
        
        styleAttributes.append(styleAttribute)
        
        return self
    }
    
    public func set(rootStyle: MarkupStyle) -> Self {
        self.rootStyle = rootStyle
        return self
    }
    
    public func set(policy: MarkupStylePolicy) -> Self {
        self.policy = policy
        return self
    }
    
    public func build() -> ZHTMLParser {
        // ZHTMLParser init is only accessible internally; it cannot be directly initialized from outside
        // It can only be initialized through ZHTMLParserBuilder
        return ZHTMLParser(htmlTags: htmlTags, styleAttributes: styleAttributes, policy: policy, rootStyle: rootStyle)
    }
}

This corresponds to the ZHTMLParserBuilder.swift implementation in the source code.
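The fluent style and the "only one instance of the same tagName" rule can be shown with a reduced, self-contained sketch. TagConfig and TagConfigBuilder are hypothetical; tag names and styles are plain strings here rather than HTMLTagName and MarkupStyle values.

```swift
// Hypothetical, reduced configuration entry.
struct TagConfig: Equatable {
    let tagName: String
    let customStyle: String?
}

// Hypothetical, reduced builder that mirrors the add/build shape shown above.
final class TagConfigBuilder {
    private(set) var tags: [TagConfig] = []

    @discardableResult
    func add(_ tagName: String, withCustomStyle style: String? = nil) -> Self {
        // Only one entry per tag name: drop any earlier registration first.
        tags.removeAll { $0.tagName == tagName }
        tags.append(TagConfig(tagName: tagName, customStyle: style))
        return self
    }

    func build() -> [TagConfig] {
        return tags
    }
}

let builtTags = TagConfigBuilder()
    .add("a")
    .add("b")
    .add("a", withCustomStyle: "red") // replaces the earlier "a" entry
    .build()
print(builtTags.map { "\($0.tagName):\($0.customStyle ?? "default")" })
```

Because every setter returns the builder itself, configuration reads as one chained expression, and the last registration for a tag name wins.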

The initWithDefault method registers every implemented HTMLTagName and Style Attribute by default.

public extension ZHTMLParserBuilder {
    static var htmlTagNames: [HTMLTagName] {
        return [
            A_HTMLTagName(),
            B_HTMLTagName(),
            BR_HTMLTagName(),
            DIV_HTMLTagName(),
            HR_HTMLTagName(),
            I_HTMLTagName(),
            LI_HTMLTagName(),
            OL_HTMLTagName(),
            P_HTMLTagName(),
            SPAN_HTMLTagName(),
            STRONG_HTMLTagName(),
            U_HTMLTagName(),
            UL_HTMLTagName(),
            DEL_HTMLTagName(),
            TR_HTMLTagName(),
            TD_HTMLTagName(),
            TH_HTMLTagName(),
            TABLE_HTMLTagName(),
            IMG_HTMLTagName(handler: nil),
            // ...
        ]
    }
}

public extension ZHTMLParserBuilder {
    static var styleAttributes: [HTMLTagStyleAttribute] {
        return [
            ColorHTMLTagStyleAttribute(),
            BackgroundColorHTMLTagStyleAttribute(),
            FontSizeHTMLTagStyleAttribute(),
            FontWeightHTMLTagStyleAttribute(),
            LineHeightHTMLTagStyleAttribute(),
            WordSpacingHTMLTagStyleAttribute(),
            // ...
        ]
    }
}

The ZHTMLParser initialization is only accessible internally; it cannot be directly initialized from outside, and can only be initialized through ZHTMLParserBuilder.

ZHTMLParser encapsulates Render/Selector/Stripper operations:

public final class ZHTMLParser: ZMarkupParser {
    let htmlTags: [HTMLTag]
    let styleAttributes: [HTMLTagStyleAttribute]
    let rootStyle: MarkupStyle?

    internal init(...) {
    }
    
    // Retrieve link style attributes
    public var linkTextAttributes: [NSAttributedString.Key: Any] {
        // ...
    }
    
    public func selector(_ string: String) -> HTMLSelector {
        // ...
    }
    
    public func selector(_ attributedString: NSAttributedString) -> HTMLSelector {
        // ...
    }
    
    public func render(_ string: String) -> NSAttributedString {
        // ...
    }
    
    // Allows rendering of NSAttributedString within nodes using HTMLSelector results
    public func render(_ selector: HTMLSelector) -> NSAttributedString {
        // ...
    }
    
    public func render(_ attributedString: NSAttributedString) -> NSAttributedString {
        // ...
    }
    
    public func stripper(_ string: String) -> String {
        // ...
    }
    
    public func stripper(_ attributedString: NSAttributedString) -> NSAttributedString {
        // ...
    }
    
  // ...
}

This corresponds to the ZHTMLParser.swift implementation in the source code.

UIKit Issues

The result of NSAttributedString is most commonly displayed in a UITextView, but there are some important considerations:

  • UITextView applies a single link style to all links, determined by its linkTextAttributes setting. It ignores the NSAttributedString.Key settings on the link text and cannot style links individually; this is why the ZMarkupParser.linkTextAttributes property exists.
  • UILabel currently has no way to change link styles. Also, because UILabel has no NSTextStorage, loading NSTextAttachment images requires separate handling for UILabel.
public extension UITextView {
    func setHtmlString(_ string: String, with parser: ZHTMLParser) {
        self.setHtmlString(NSAttributedString(string: string), with: parser)
    }
    
    func setHtmlString(_ string: NSAttributedString, with parser: ZHTMLParser) {
        self.attributedText = parser.render(string)
        self.linkTextAttributes = parser.linkTextAttributes
    }
}
public extension UILabel {
    func setHtmlString(_ string: String, with parser: ZHTMLParser) {
        self.setHtmlString(NSAttributedString(string: string), with: parser)
    }
    
    func setHtmlString(_ string: NSAttributedString, with parser: ZHTMLParser) {
        let attributedString = parser.render(string)
        attributedString.enumerateAttribute(NSAttributedString.Key.attachment, in: NSMakeRange(0, attributedString.string.utf16.count), options: []) { (value, _, _) in
            guard let attachment = value as? ZNSTextAttachment else {
                return
            }
            
            attachment.register(self)
        }
        
        self.attributedText = attributedString
    }
}

Thus, we extended UIKit so that external modules can simply call setHtmlString(_:with:) to complete the binding.

Complex Rendering Items — Item Lists

Here’s a record of the implementation regarding item lists.

Using <ol> / <ul> to wrap <li> in HTML represents an item list:

<ul>
    <li>ItemA</li>
    <li>ItemB</li>
    <li>ItemC</li>
    <!-- ... -->
</ul>

Using the parsing method mentioned earlier, we can retrieve other list items in visit(_ markup: ListItemMarkup) to know the current list index (thanks to the conversion to AST).

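func visit(_ markup: ListItemMarkup) -> Result {
  // Sibling <li> nodes that share the same parent list
  let siblingListItems = markup.parentMarkup?.childMarkups.filter({ $0 is ListItemMarkup }) ?? []
  // Zero-based index of the current item among its siblings
  let position = (siblingListItems.firstIndex(where: { $0 === markup }) ?? 0)
  // ...
}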

NSParagraphStyle has an NSTextList object that can be used to display list items, but in practice the width of the whitespace after the marker cannot be customized (I personally feel the whitespace is too large). In addition, the whitespace between the bullet and the string can trigger a line break there, which looks a bit odd, as shown in the image below:

A better result can potentially be achieved by setting headIndent, firstLineHeadIndent, and NSTextTab, but testing revealed that if the string is too long or the size changes, the result is still not perfect.

For now, the only acceptable solution we found is to manually insert the list marker string before the item text.

We only use NSTextList.MarkerFormat to generate the list item symbols, rather than using NSTextList directly.
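The "insert the marker manually" approach can be sketched with plain strings. ListKind and renderList are hypothetical names, and the hard-coded markers below are a stand-in for the symbols that NSTextList.MarkerFormat would generate.

```swift
// Hypothetical list kinds for this sketch: <ol> and <ul>.
enum ListKind {
    case ordered
    case unordered
}

// Prefixes every item with its marker; the index plays the role of `position` above.
func renderList(_ items: [String], kind: ListKind) -> String {
    var lines: [String] = []
    for (index, item) in items.enumerated() {
        // Hard-coded markers; a stand-in for NSTextList.MarkerFormat output.
        let marker = kind == .ordered ? "\(index + 1)." : "\u{2022}"
        lines.append("\(marker) \(item)")
    }
    return lines.joined(separator: "\n")
}

let orderedList = renderList(["ItemA", "ItemB", "ItemC"], kind: .ordered)
let unorderedList = renderList(["ItemA", "ItemB"], kind: .unordered)
print(orderedList)
print(unorderedList)
```

Since the marker is ordinary text, its spacing is fully under our control, at the cost of losing automatic hanging indentation for wrapped lines.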

For a list of supported list symbols, refer to: MarkupStyleList.swift

Final display result: ( <ol><li> )

Complex Rendering Items — Tables

Similar to the implementation of item lists, but for tables.

In HTML, <table> wraps <tr> elements for table rows, and each row wraps <td>/<th> elements for table columns:

<table>
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
</table>

Testing revealed that the native NSAttributedString.DocumentType.html renders tables with NSTextBlock, which is public API only on macOS (AppKit) and private on iOS; that is how it can fully render HTML table styles and content.

A bit of a cheat! We cannot use private APIs 🥲

func visit(_ markup: TableColumnMarkup) -> Result {
    let attributedString = collectAttributedString(markup)
    let siblingColumns = markup.parentMarkup?.childMarkups.filter({ $0 is TableColumnMarkup }) ?? []
    let position = (siblingColumns.firstIndex(where: { $0 === markup }) ?? 0)
    
    // Optionally specify the desired width from the outside; can set to .max to avoid truncating the string
    var maxLength: Int? = markup.fixedMaxLength
    if maxLength == nil {
        // If not specified, find the length of the string in the first row of the same column as the max length
        if let tableRowMarkup = markup.parentMarkup as? TableRowMarkup,
           let firstTableRow = tableRowMarkup.parentMarkup?.childMarkups.first(where: { $0 is TableRowMarkup }) as? TableRowMarkup {
            let firstTableRowColumns = firstTableRow.childMarkups.filter({ $0 is TableColumnMarkup })
            if firstTableRowColumns.indices.contains(position) {
                let firstTableRowColumnAttributedString = collectAttributedString(firstTableRowColumns[position])
                let length = firstTableRowColumnAttributedString.string.utf16.count
                maxLength = length
            }
        }
    }
    
    if let maxLength = maxLength {
        // If the column exceeds maxLength, truncate the string
        if attributedString.string.utf16.count > maxLength {
            attributedString.mutableString.setString(String(attributedString.string.prefix(maxLength))+"...")
        } else {
            attributedString.mutableString.setString(attributedString.string.padding(toLength: maxLength, withPad: " ", startingAt: 0))
        }
    }
    
    if position < siblingColumns.count - 1 {
        // Add whitespace as spacing; the external can specify how many spaces to use for spacing
        attributedString.append(makeString(in: markup, string: String(repeating: " ", count: markup.spacing)))
    }
    
    return attributedString
}

func visit(_ markup: TableRowMarkup) -> Result {
    let attributedString = collectAttributedString(markup)
    attributedString.append(makeBreakLine(in: markup)) // Add a line break; see source code for details
    return attributedString
}

func visit(_ markup: TableMarkup) -> Result {
    let attributedString = collectAttributedString(markup)
    attributedString.append(makeBreakLine(in: markup)) // Add a line break; see source code for details
    attributedString.insert(makeBreakLine(in: markup), at: 0) // Add a line break; see source code for details
    return attributedString
}
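
The column alignment idea in isolation: every cell in a column is forced to the same width, either by truncating it or by padding it with spaces. The sketch below is a simplified plain-String version under a few assumptions (hypothetical fitCell and renderTable helpers, ASCII content, and a monospaced font so that equal character counts actually line up); it is not the library's code.

```swift
import Foundation

// Hypothetical helper: force `text` to `width` characters.
// Longer text is cut and marked with "...", shorter text is padded with spaces.
func fitCell(_ text: String, width: Int) -> String {
    if text.count > width {
        return String(text.prefix(width)) + "..."
    }
    return text.padding(toLength: width, withPad: " ", startingAt: 0)
}

// Renders rows as lines. Column widths come from the first row,
// mirroring the "use the first row as max length" rule in the visitor above.
func renderTable(_ rows: [[String]], spacing: Int = 3) -> String {
    guard let header = rows.first else { return "" }
    let widths = header.map { $0.count }
    let gap = String(repeating: " ", count: spacing)
    var lines: [String] = []
    for row in rows {
        var cells: [String] = []
        for (index, cell) in row.enumerated() {
            // Extra cells beyond the header's column count are left untouched.
            cells.append(index < widths.count ? fitCell(cell, width: widths[index]) : cell)
        }
        lines.append(cells.joined(separator: gap))
    }
    return lines.joined(separator: "\n")
}

print(renderTable([["Company", "Contact"], ["Alfreds Futterkiste", "Maria"]]))
```

Using the first row as the reference width keeps the logic simple, at the cost of truncating any later cell that is longer than its header.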

The final result is shown below:

Not perfect, but acceptable.

Complex Rendering Items — Images

Finally, the biggest challenge is loading remote images into NSAttributedString.

Using <img> in HTML to represent an image:

<img src="https://user-images.githubusercontent.com/33706588/219608966-20e0c017-d05c-433a-9a52-091bc0cfd403.jpg" width="300" height="125"/>

You can specify the desired display size through the width / height HTML attributes.

Displaying images in NSAttributedString is more complex than expected, and there isn’t a perfect implementation. I had already run into some pitfalls earlier when working on text wrapping in UITextView, and further research this time confirmed that there still isn’t a perfect solution.

For now, we set aside the problem that the native NSTextAttachment cannot be reused and does not release memory, and focus on downloading the image from a remote source, placing it into an NSTextAttachment, and making the content update automatically.

This series of operations has been split out into a separate small project, ZNSTextAttachment, for easier optimization and reuse in other projects.

The main reference is the series of articles on Asynchronous NSTextAttachments, but I modified the final content update part (after downloading, the UI needs to refresh to display the image) and added a Delegate/DataSource for external extensibility.

The operational flow and relationships are as follows:

  • Declare ZNSTextAttachmentable, an abstraction over the NSTextStorage object (which UITextView has) and over UILabel itself (since UILabel has no NSTextStorage). Its only job is to replace the attributed string in a given NSRange: func replace(attachment: ZNSTextAttachment, to: ZResizableNSTextAttachment).
  • First, ZNSTextAttachment wraps the imageURL, the PlaceholderImage, and the size to display, and the placeholder image is shown right away.
  • When the system needs this image on screen, it calls the image(forBounds…) method, and at that point we start downloading the image data.
  • The DataSource lets external code customize how the image is downloaded or which image cache policy applies; by default, the image data is requested directly with URLSession.
  • After the download finishes, a new ZResizableNSTextAttachment is created; the logic for applying the custom image size lives in attachmentBounds(for…).
  • Call replace(attachment: ZNSTextAttachment, to: ZResizableNSTextAttachment) to replace the ZNSTextAttachment at its position with the ZResizableNSTextAttachment.
  • Emit a didLoad Delegate notification so that external code can hook in if needed.
  • Done.
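
The steps above can be modeled without UIKit to make the control flow easier to see. Everything in this sketch is a simplified stand-in with hypothetical names (Attachment, AttachmentHost, FakeTextStorage, AttachmentLoader); the real ZNSTextAttachment types subclass NSTextAttachment and work on attributed string ranges, which is omitted here.

```swift
import Foundation

// Stand-ins for ZNSTextAttachment (placeholder) and ZResizableNSTextAttachment (loaded).
enum Attachment: Equatable {
    case placeholder(url: String)
    case loaded(url: String, byteCount: Int)
}

// Plays the role of ZNSTextAttachmentable: anything that can swap one attachment for another.
protocol AttachmentHost: AnyObject {
    func replace(_ old: Attachment, with new: Attachment)
}

// Stands in for NSTextStorage (or a UILabel's attributedText).
final class FakeTextStorage: AttachmentHost {
    var content: [Attachment]
    init(content: [Attachment]) { self.content = content }

    func replace(_ old: Attachment, with new: Attachment) {
        // In the real flow, replacing the content at a known range is what refreshes the UI.
        if let index = content.firstIndex(of: old) { content[index] = new }
    }
}

final class AttachmentLoader {
    // DataSource hook: how to fetch the bytes for a URL (a URLSession request by default in the real flow).
    let fetch: (String, @escaping (Data) -> Void) -> Void
    // Delegate hook: called after the replacement has happened.
    var didLoad: ((String) -> Void)?

    init(fetch: @escaping (String, @escaping (Data) -> Void) -> Void) {
        self.fetch = fetch
    }

    // Models the moment the system asks for the image, i.e. image(forBounds…).
    func imageRequested(for attachment: Attachment, in host: AttachmentHost) {
        guard case let .placeholder(url) = attachment else { return }
        fetch(url) { data in
            host.replace(attachment, with: .loaded(url: url, byteCount: data.count))
            self.didLoad?(url)
        }
    }
}

// Usage: a fake data source that "downloads" three bytes immediately.
let demoStorage = FakeTextStorage(content: [.placeholder(url: "a.jpg")])
let demoLoader = AttachmentLoader(fetch: { _, done in done(Data([1, 2, 3])) })
demoLoader.imageRequested(for: demoStorage.content[0], in: demoStorage)
print(demoStorage.content)
```

Keeping the download behind a closure (the DataSource role) is what allows callers to plug in their own cache or networking layer without touching the replacement logic.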

For detailed code, refer to the Source Code.

We do not use NSLayoutManager.invalidateLayout(forCharacterRange: range, actualCharacterRange: nil) or NSLayoutManager.invalidateDisplay(forCharacterRange: range) to refresh the UI because, in testing, the UI did not display the update correctly. Since we already know the range, directly replacing that part of the NSAttributedString ensures the UI updates correctly.

The final display result is as follows:

<span style="color:red">こんにちは</span>こんにちはこんにちは <br />
<img src="https://user-images.githubusercontent.com/33706588/219608966-20e0c017-d05c-433a-9a52-091bc0cfd403.jpg"/>

Testing & Continuous Integration

This project has unit tests and also snapshot tests for integration testing, which make it easy to compare the final rendered NSAttributedString as a whole.

The main functional logic is covered by unit tests, along with integration tests, resulting in a final Test Coverage of around 85%.

[ZMarkupParser — codecov](https://app.codecov.io/gh/ZhgChgLi/ZMarkupParser)

Snapshot Test

Directly import the framework:

import SnapshotTesting
// ...
func testShouldKeepNSAttributedString() {
  let parser = ZHTMLParserBuilder.initWithDefault().build()
  let textView = UITextView()
  textView.frame.size.width = 390
  textView.isScrollEnabled = false
  textView.backgroundColor = .white
  textView.setHtmlString("html string...", with: parser)
  textView.layoutIfNeeded()
  assertSnapshot(matching: textView, as: .image, record: false)
}
// ...

This directly compares the final result to ensure that adjustments made during integration do not cause any issues.

Codecov Test Coverage

Integrating with Codecov.io (free for public repositories) allows for evaluating test coverage. You just need to install the Codecov GitHub App and set it up.

Once Codecov and the GitHub repository are connected, you can also add a codecov.yml file in the root directory:

comment:                  # this is a top-level key
  layout: "reach, diff, flags, files"
  behavior: default
  require_changes: false  # if true: only post the comment if coverage changes
  require_base: no        # [yes :: must have a base report to post]
  require_head: yes       # [yes :: must have a head report to post]

This configuration makes Codecov automatically comment on each PR with the coverage results once CI has run.

Continuous Integration

GitHub Action CI integration: ci.yml

name: CI

on:
  workflow_dispatch:
  pull_request:
    types: [opened, reopened]
  push:
    branches:
    - main

jobs:
  build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v3
      - name: spm build and test
        run: |
          set -o pipefail
          xcodebuild test -workspace ZMarkupParser.xcworkspace -testPlan ZMarkupParser -scheme ZMarkupParser -enableCodeCoverage YES -resultBundlePath './scripts/TestResult.xcresult' -destination 'platform=iOS Simulator,name=iPhone 14,OS=16.1' build test | xcpretty
      - name: Codecov
        uses: codecov/codecov-action@v3.1.1
        with:
          xcode: true
          xcode_archive_path: './scripts/TestResult.xcresult'

This configuration runs the build and tests when a PR is opened or reopened, when commits are pushed to the main branch, or when triggered manually (workflow_dispatch), and then uploads the test coverage report to Codecov.

Regex

Every time I use regular expressions, my understanding gets a little sharper. I didn’t use them extensively this time, but I initially wanted to extract paired HTML tags with a regex, so I spent some time researching how to write one.

Here are some cheat sheet notes from what I learned this time:

  • (?:...) groups a sub-pattern without capturing it. e.g. (?:https?:\/\/)?(?:www\.)?example\.com matches the entire URL https://www.example.com without producing separate captures for https:// and www.
  • .+? performs a non-greedy (lazy) match, stopping at the closest possible end. e.g. <.+?> matches <a> and </a> separately in <a>test</a> instead of the entire string.
  • (?=XYZ) is a lookahead: it requires XYZ to follow without consuming it. Combined with .+?, it matches everything up to the string XYZ; by contrast, [^XYZ] matches any single character other than X, Y, or Z. e.g. (?:__)(.+?(?=__))(?:__) (matches any string until __) captures test from __test__.
  • (?R) recursively applies the whole pattern. e.g. \((?:[^()]|((?R)))+\) will match (simple) and (and(nested)) in (simple) (and(nested)), including the inner (nested).
  • (?<GroupName>...) names a group, and \k<GroupName> matches the same text that group matched. e.g. (?<tagName><a>).*(\k<tagName>)
  • (?(X)yes|no) matches yes if group X has matched (group names can also be used), otherwise it matches no. Swift does not currently support this.
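
Some of these rules can be tried out with NSRegularExpression. The sketch below is only a quick experiment, not part of ZMarkupParser: allMatches is a hypothetical helper, and the sample inputs are chosen for illustration. The recursive and conditional forms are left out.

```swift
import Foundation

// Hypothetical helper: returns every match of `pattern` in `text`.
// Each match is [whole match, group 1, group 2, ...]; unmatched groups become "".
func allMatches(of pattern: String, in text: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).map { (result: NSTextCheckingResult) -> [String] in
        (0..<result.numberOfRanges).map { (index: Int) -> String in
            guard let range = Range(result.range(at: index), in: text) else { return "" }
            return String(text[range])
        }
    }
}

// Non-capturing groups: only the whole match is reported.
print(allMatches(of: #"(?:https?:\/\/)?(?:www\.)?example\.com"#, in: "https://www.example.com"))
// [["https://www.example.com"]]

// Lazy quantifier: each tag is matched separately.
print(allMatches(of: #"<.+?>"#, in: "<a>test</a>"))
// [["<a>"], ["</a>"]]

// Lookahead: group 1 stops right before the closing "__".
print(allMatches(of: #"(?:__)(.+?(?=__))(?:__)"#, in: "__test__"))
// [["__test__", "test"]]

// Named group plus backreference: the second group repeats what tagName matched.
print(allMatches(of: #"(?<tagName><a>).*(\k<tagName>)"#, in: "<a>x<a>"))
// [["<a>x<a>", "<a>", "<a>"]]
```

Raw string literals (#"..."#) keep the patterns identical to how they are written in the notes above, without doubling every backslash.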

Other good Regex resources:

Swift Package Manager & CocoaPods

This was also my first time developing with SPM and CocoaPods, which was quite interesting. SPM is really convenient; however, if two projects depend on the same package, opening both projects at the same time can leave one of them unable to find the package and failing to build.

ZMarkupParser has been published to CocoaPods, but I haven’t tested that integration since I use SPM myself. 😝

ChatGPT

In my actual development experience, I found it most useful for assisting with editing the README. I haven’t felt a significant impact during development; when asking mid-senior level questions, it often doesn’t provide accurate answers and sometimes gives incorrect ones (I encountered this when asking about regex rules, and the answers were not quite right). So, I ultimately returned to Google for accurate solutions.

Not to mention asking it to write code: unless it is generating simple boilerplate objects, don’t expect it to produce the structure of an entire tool. (At least for now, it seems that Copilot might be more helpful for writing code.)

However, it can provide a general direction for knowledge gaps, allowing us to quickly understand how certain things should be done. Sometimes, when our grasp is too weak, it can be difficult to quickly locate the correct direction on Google, and that’s when ChatGPT becomes quite helpful.

Declaration

After more than three months of research and development, I am exhausted, but I want to clarify that this approach is merely a feasible result of my research and may not be the best solution or may still have areas for optimization. This project is more like a stepping stone, hoping to achieve a perfect solution for converting a markup language to NSAttributedString. Contributions are very welcome; many aspects still need the power of the community to improve.

Contributing

[ZMarkupParser](https://github.com/ZhgChgLi/ZMarkupParser) ⭐

Here are some areas I think could be improved as of now (2023/03/12), which I will document in the repo:

  1. Performance/algorithm optimization; although it’s faster and more stable than the native NSAttributedString.DocumentType.html, there is still room for improvement. I believe its performance is definitely not on par with XMLParser. I hope one day it can achieve the same performance while maintaining customization and automatic error correction.
  2. Support for more HTML tags and style attribute conversions.
  3. Further optimization of ZNSTextAttachment to implement reuse capabilities and release memory; may need to research CoreText.
  4. Support for Markdown parsing, as the underlying abstraction is not limited to HTML; thus, once the Markdown to Markup object is established, Markdown parsing can be completed. That’s why I named it ZMarkupParser instead of ZHTMLParser, hoping that one day it can also support Markdown to NSAttributedString.
  5. Support for Any to Any conversions, e.g., HTML to Markdown, Markdown to HTML, since we have the original AST tree (Markup object), so implementing conversions between any markup is possible.
  6. Implement CSS !important functionality to enhance the inheritance strategy of MarkupStyle.
  7. Strengthen HTML Selector functionality; currently, it only has the most basic filtering capabilities.
  8. So many more improvements; feel free to open an issue.

If you feel you can’t contribute but want to help, you can give me a ⭐ to make the repo more visible, which may lead to contributions from GitHub experts!

Summary

[ZMarkupParser](https://github.com/ZhgChgLi/ZMarkupParser)

This concludes all the technical details and my journey in developing ZMarkupParser. It took me nearly three months of after-work and weekend time, countless research and practical processes, writing tests, improving test coverage, and establishing CI; finally, I have a somewhat presentable result. I hope this tool helps those who face similar challenges, and I look forward to everyone working together to make it even better.

[pinkoi.com](https://www.pinkoi.com)

Currently, it is applied in our company’s iOS app on pinkoi.com, and I haven’t encountered any issues. 😄

Further Reading

If you have any questions or feedback, feel free to contact me.


This article was first published on Medium.

Automatically converted and synchronized using ZMediumToMarkdown and Medium-to-jekyll-starter.

This post is licensed under CC BY 4.0 by the author.