The Chronicles of Crafting an HTML Parser from Scratch

Development Record of ZMarkupParser HTML to NSAttributedString Rendering Engine

This article covers HTML string tokenization, normalization, abstract syntax tree generation, the application of Visitor Pattern / Builder Pattern, and some miscellaneous discussions.

Continuing from the Previous Article

Last year, I published an article titled “[TL;DR] Implementing iOS NSAttributedString HTML Render”, which briefly introduced using XMLParser to parse HTML and map it to NSAttributedString.Key attributes. The architecture and approach described there were quite disorganized, because the article was merely a record of the challenges I ran into, written without much time spent exploring the topic in depth.

Convert HTML String to NSAttributedString

Revisiting the topic, our goal is to convert HTML strings provided by an API into NSAttributedString and apply corresponding styles to display them in UITextView/UILabel.

For example, <b>Test<a>Link</a></b> should be displayed as Test Link.

  • Note 1 Using HTML as the communication and rendering medium between the app and data is not recommended. HTML specifications are too flexible, and apps cannot support all HTML styles without an official HTML conversion rendering engine.

  • Note 2 Starting from iOS 15, you can use the native AttributedString to parse Markdown, or you can introduce the apple/swift-markdown Swift Package to parse Markdown.

  • Note 3 Due to the large size of our company’s project and the extensive use of HTML as a medium for many years, we are unable to completely switch to Markdown or other markup languages at the moment.

  • Note 4 The HTML used here is not meant to display entire HTML webpages; it is only used the way Markdown would be, as markup for styling strings. (To render entire pages with complex content, including images and tables, WebView with loadHTML is still required.)

It is strongly recommended to use Markdown as the string rendering language. However, if your project faces similar challenges to mine, where you have to use HTML and lack an elegant tool for converting to NSAttributedString, then please proceed with using HTML.

For those who remember the previous article, you can directly skip to the section ZhgChgLi / ZMarkupParser.

NSAttributedString.DocumentType.html

The HTML to NSAttributedString approaches found on the internet usually involve directly using NSAttributedString’s built-in options to render HTML, as shown below:

let htmlString = "<b>Test<a>Link</a></b>"
let data = htmlString.data(using: String.Encoding.utf8)!
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey: Any] = [
  .documentType: NSAttributedString.DocumentType.html,
  .characterEncoding: String.Encoding.utf8.rawValue
]
let attributedString = try! NSAttributedString(data: data, options: attributedOptions, documentAttributes: nil)

Issues with this approach:

  • Poor performance: This method renders styles through the WebView core (WebKit) and then switches back to the Main Thread for UI display. Rendering around 300 characters takes 0.03 seconds.
  • Content loss: For example, marketing copy such as <Congratulation!> is treated as an HTML tag and removed.
  • Limited customization: For instance, you cannot specify how bold HTML bold text should be when it is converted to NSAttributedString.
  • Intermittent crashes on iOS 12 and later, with no official solution.
  • Extensive crashes observed on iOS 15, particularly when the device’s battery is low (fixed in iOS 15.2).
  • Crash when the string is too long; in testing, an input string of length 54,600+ crashed 100% of the time (EXC_BAD_ACCESS).

The most painful issue is undoubtedly the crashing. From the release of iOS 15 until the fix in 15.2, this problem plagued the app continuously. According to the data, between 2022/03/11 and 2022/06/08, there were more than 2.4K crashes, impacting over 1.4K users.

The second problem is performance. As HTML is used as a markup language for string styles, it is heavily applied to UILabel/UITextView in the app. As mentioned earlier, rendering one label takes 0.03 seconds, and when multiplied across many UILabel/UITextView instances, it leads to noticeable lag during user interaction.

XMLParser

The second approach is the one introduced in the previous article, which involves using XMLParser to parse HTML and apply the corresponding NSAttributedString Key to implement the styles.

You can refer to the implementation in SwiftRichString and the content covered in the previous article.

The previous article only explored the possibility of using XMLParser to parse HTML and perform the corresponding conversions. While an experimental implementation was completed, it was not designed as a well-structured, extensible “tool”.

Issues with this approach:

  • Zero fault tolerance: <br> / <Congratulation!> / <b>Bold<i>Bold+Italic</b>Italic</i>. In each of these three cases, XMLParser throws an error and nothing is displayed.
  • With XMLParser, HTML strings must fully comply with XML rules; malformed input cannot be displayed tolerantly the way a browser or NSAttributedString.DocumentType.html would.
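The strictness is easy to reproduce. The following is a minimal sketch, not code from the project: isWellFormedXML is a hypothetical helper that wraps a fragment in a <root> element (an assumption, so that every fragment has a single root) and reports whether Foundation's XMLParser accepts it.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives in a separate module on Linux
#endif

/// Hypothetical helper: returns true only if XMLParser parses the fragment
/// without reporting an error.
func isWellFormedXML(_ fragment: String) -> Bool {
    // Wrap the fragment so that the document has exactly one root element.
    let parser = XMLParser(data: Data("<root>\(fragment)</root>".utf8))
    return parser.parse()
}

let xmlSamples = [
    "<b>Test<a>Link</a></b>",              // properly nested
    "<br>",                                 // unclosed tag
    "<Congratulation!>",                    // plain text that looks like a tag
    "<b>Bold<i>Bold+Italic</b>Italic</i>"   // misplaced close tags
]
for sample in xmlSamples {
    print(sample, "->", isWellFormedXML(sample) ? "parsed" : "rejected")
}
```

Only the first sample is well-formed XML; the other three are the cases listed above.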

Standing on the Shoulders of Giants

Neither of the two solutions can perfectly and elegantly solve the HTML issues, so I started searching for existing solutions.

After an extensive search, it seems that all the results are similar to the projects mentioned above, Orz, there’s no giant’s shoulder to stand on.

ZhgChgLi/ZMarkupParser

With no giants to rely on, I had to become the giant myself and developed the HTML String to NSAttributedString tool.

Developed purely in Swift, it uses Regex to extract HTML tags, tokenizes them, and then checks and repairs the tag structure (fixing unclosed tags and misplaced tags). It then converts the parsed data into an abstract syntax tree and uses the Visitor Pattern to map HTML tags to abstract styles, resulting in the final NSAttributedString. The tool does not rely on any external parser library.

Features

  • Supports HTML Render (to NSAttributedString) / Stripper (removing HTML tags) / Selector functionalities.
  • Higher performance compared to NSAttributedString.DocumentType.html.
  • Automatically checks and repairs tag structure (fixing unclosed tags and misplaced tags).
  • Supports dynamic styling from style="color:red…".
  • Supports custom style specifications, for example, requiring extra boldness.
  • Offers flexibility for extending or customizing tags and attributes.

For detailed information on installation and usage, please refer to the article: “ZMarkupParser HTML String to NSAttributedString Tool.”

To try it out directly, you can git clone the project, open the ZMarkupParser.xcworkspace project, select the ZMarkupParser-Demo target, and build & run the project.

ZMarkupParser

Technical Details

Now let’s get to the technical details behind the development of this tool.

Overview of the Process

The above image provides a rough overview of the process. In the following sections, each step is explained in detail with accompanying code.

⚠️ This article simplifies the demo code and leaves out abstractions and performance considerations in order to focus on the working principles. For the final implementation, please refer to the Source Code of the project.

Code-Based Tokenization

When it comes to HTML rendering, the most crucial step is parsing. In the past, HTML was parsed using XMLParser as if it were XML. However, this approach couldn’t handle the fact that HTML, in everyday use, is not always 100% XML-compliant, leading to parsing errors and an inability to dynamically correct them.

After ruling out the XMLParser approach, the only option left for us in Swift was to use regular expressions (Regex) for matching and parsing.

Initially, I didn’t think too deeply and assumed I could use regular expressions to extract “paired” HTML tags directly, then recursively search for HTML tags inside each pair until nothing was left. However, this method could neither handle nested HTML tags nor tolerate misplaced tags. Therefore, I changed the strategy to extract “individual” HTML tags, record whether each one is a Start Tag, Close Tag, or Self-Closing Tag, and combine them with the plain strings in between into an array of parsing results.

The structure of Tokenization is as follows:

enum HTMLParsedResult {
    case start(StartItem) // <a>
    case close(CloseItem) // </a>
    case selfClosing(SelfClosingItem) // <br/>
    case rawString(NSAttributedString)
}

extension HTMLParsedResult {
    class SelfClosingItem {
        let tagName: String
        let tagAttributedString: NSAttributedString
        let attributes: [String: String]?
        
        init(tagName: String, tagAttributedString: NSAttributedString, attributes: [String : String]?) {
            self.tagName = tagName
            self.tagAttributedString = tagAttributedString
            self.attributes = attributes
        }
    }
    
    class StartItem {
        let tagName: String
        let tagAttributedString: NSAttributedString
        let attributes: [String: String]?

        // A Start Tag may be a real HTML Tag or just normal text, e.g., <Congratulation!>. If tokenization finds no matching Close Tag, the Start Tag is isolated and isIsolated is set to true.
        var isIsolated: Bool = false
        
        init(tagName: String, tagAttributedString: NSAttributedString, attributes: [String : String]?) {
            self.tagName = tagName
            self.tagAttributedString = tagAttributedString
            self.attributes = attributes
        }
        
        // Used to auto-complete missing Close Tags during subsequent normalization
        func convertToCloseParsedItem() -> CloseItem {
            return CloseItem(tagName: self.tagName)
        }
        
        // Used to turn an isolated Start Tag into a Self-Closing Tag during subsequent normalization
        func convertToSelfClosingParsedItem() -> SelfClosingItem {
            return SelfClosingItem(tagName: self.tagName, tagAttributedString: self.tagAttributedString, attributes: self.attributes)
        }
    }
    
    class CloseItem {
        let tagName: String
        init(tagName: String) {
            self.tagName = tagName
        }
    }
}

The regular expression used is as follows:

<(?:(?<closeTag>\/)?(?<tagName>[A-Za-z0-9]+)(?<tagAttributes>(?:\s*(\w+)\s*=\s*(["|']).*?\5)*)\s*(?<selfClosingTag>\/)?>)

-> Online Regex101 Playground

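  • closeTag: Matches the / of a close tag, e.g., </a>
  • tagName: Matches the tag name, e.g., the a in <a> or </a>
  • tagAttributes: Matches the attributes, e.g., href="https://zhgchg.li" style="color:red" in <a href="https://zhgchg.li" style="color:red">
  • selfClosingTag: Matches the trailing / of a self-closing tag, e.g., <br/>

Note: This regex can still be optimized, which we can address in the future.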

The latter part of the article provides additional information about the regex for those interested in delving deeper.
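To see what the named groups capture, here is a minimal sketch using NSRegularExpression. It assumes the pattern above with its outer non-capturing group closed after the final >; listTags is a hypothetical helper written for this illustration and is not part of ZMarkupParser.

```swift
import Foundation

// Assumption: the pattern from the article, with the outer group closed after `>`.
let tagPattern = #"<(?:(?<closeTag>\/)?(?<tagName>[A-Za-z0-9]+)(?<tagAttributes>(?:\s*(\w+)\s*=\s*(["|']).*?\5)*)\s*(?<selfClosingTag>\/)?>)"#

/// Hypothetical helper: lists every tag found in `html` as "kind:name".
func listTags(in html: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: tagPattern) else { return [] }
    let text = html as NSString
    let matches = regex.matches(in: html, range: NSRange(location: 0, length: text.length))
    return matches.map { match -> String in
        // A named group that did not take part in the match has location NSNotFound.
        let isClose = match.range(withName: "closeTag").location != NSNotFound
        let isSelfClosing = match.range(withName: "selfClosingTag").location != NSNotFound
        let name = text.substring(with: match.range(withName: "tagName")).lowercased()
        let kind = isClose ? "close" : (isSelfClosing ? "selfClosing" : "start")
        return "\(kind):\(name)"
    }
}

print(listTags(in: "Test<a href=\"https://zhgchg.li\">Link</a><br/>"))
```

Each match is a single tag rather than a pair, which is what allows the later steps to deal with misplaced or unclosed tags.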

The combined code is as follows:

var tokenizationResult: [HTMLParsedResult] = []

// `pattern` is the regular expression shown above; `expressionOptions` are the NSRegularExpression.Options in use (see the Source Code)
let expression = try? NSRegularExpression(pattern: pattern, options: expressionOptions)
let attributedString = NSAttributedString(string: "<a href=\"https://zhgchg.li\">Li<b>nk</a>Bold</b>")
let totalLength = attributedString.string.utf16.count // UTF-16 length, so that ranges stay correct with emoji
var lastMatch: NSTextCheckingResult?

// Returns nil when a capture group did not take part in the match (its location is NSNotFound)
func attributedSubstring(_ range: NSRange) -> NSAttributedString? {
    guard range.location != NSNotFound else { return nil }
    return attributedString.attributedSubstring(from: range)
}

// Start Tags Stack, first in, last out (FILO)
var stackStartItems: [HTMLParsedResult.StartItem] = []
// Whether the HTML string needs the subsequent normalization step to fix misplaced tags or complete Self-Closing Tags
var needForFormatter: Bool = false

expression?.enumerateMatches(in: attributedString.string, range: NSMakeRange(0, totalLength)) { match, _, _ in
    if let match = match {
        // Check the string between tags or to the first tag, e.g., "Test<a>Link</a>zzz<b>bold</b>Test2" -> "Test,zzz"
        let lastMatchEnd = lastMatch?.range.upperBound ?? 0
        let currentMatchStart = match.range.lowerBound
        if currentMatchStart > lastMatchEnd {
            let rawStringBetweenTag = attributedString.attributedSubstring(from: NSMakeRange(lastMatchEnd, (currentMatchStart - lastMatchEnd)))
            tokenizationResult.append(.rawString(rawStringBetweenTag))
        }

        // <a href="https://zhgchg.li">, </a>
        let matchAttributedString = attributedSubstring(match.range)
        // a, a
        let matchTag = attributedSubstring(match.range(withName: "tagName"))?.string.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
        // false, true
        let matchIsEndTag = attributedSubstring(match.range(withName: "closeTag"))?.string.trimmingCharacters(in: .whitespacesAndNewlines) == "/"
        // href="https://zhgchg.li", nil
        // Use regex to extract HTML attributes into [String: String], please refer to the Source Code for details
        let matchTagAttributes = parseAttributes(attributedSubstring(match.range(withName: "tagAttributes")))
        // false, false
        let matchIsSelfClosingTag = attributedSubstring(match.range(withName: "selfClosingTag"))?.string.trimmingCharacters(in: .whitespacesAndNewlines) == "/"

        if let matchAttributedString = matchAttributedString,
            let matchTag = matchTag {
            if matchIsSelfClosingTag {
                // e.g. <br/>
                tokenizationResult.append(.selfClosing(.init(tagName: matchTag, tagAttributedString: matchAttributedString, attributes: matchTagAttributes)))
            } else {
                // e.g. <a> or </a>
                if matchIsEndTag {
                    // e.g. </a>
                    // Retrieve the position of the corresponding Start Tag from the Stack, starting from the last occurrence
                    if let index = stackStartItems.lastIndex(where: { $0.tagName == matchTag }) {
                        // If it's not the last one, it means there are misplacements or missing closing Tags
                        if index != stackStartItems.count - 1 {
                            needForFormatter = true
                        }
                        tokenizationResult.append(.close(.init(tagName: matchTag)))
                        stackStartItems.remove(at: index)
                    } else {
                        // Redundant close tag, e.g., </a>
                        // It doesn't affect subsequent steps, so we ignore it
                    }
                } else {
                    // e.g. <a>
                    let startItem: HTMLParsedResult.StartItem = HTMLParsedResult.StartItem(tagName: matchTag, tagAttributedString: matchAttributedString, attributes: matchTagAttributes)
                    tokenizationResult.append(.start(startItem))
                    // Push it onto the Stack
                    stackStartItems.append(startItem)
                }
            }
        }

        lastMatch = match
    }
}

// Check the ending RawString, e.g., "Test<a>Link</a>Test2" -> "Test2"
if let lastMatch = lastMatch {
    let currentIndex = lastMatch.range.upperBound
    if totalLength > currentIndex {
        // There are remaining characters
        let resetString = attributedString.attributedSubstring(from: NSMakeRange(currentIndex, (totalLength - currentIndex)))
        tokenizationResult.append(.rawString(resetString))
    }
} else {
    // lastMatch = nil, meaning no tags were found, and it's all plain text
    let resetString = attributedString.attributedSubstring(from: NSMakeRange(0, totalLength))
    tokenizationResult.append(.rawString(resetString))
}

// Check if the Stack is empty, if not, it means there are Start Tags without corresponding End Tags
// Mark them as isolated Start Tags
for stackStartItem in stackStartItems {
    stackStartItem.isIsolated = true
    needForFormatter = true
}

print(tokenizationResult)
// [
//    .start("a",["href":"https://zhgchg.li"])
//    .rawString("Li")
//    .start("b",nil)
//    .rawString("nk")
//    .close("a")
//    .rawString("Bold")
//    .close("b")
// ]

Operation process as shown in the diagram above.

In the end, you will get a Tokenization result array.

Corresponding implementation in the source code: HTMLStringToParsedResultProcessor.swift
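The stack bookkeeping is the part that decides whether normalization is needed, so it can help to look at it in isolation. The sketch below is a simplified illustration under stated assumptions: TagToken and inspect are hypothetical names, tags are reduced to their names, and raw strings and self-closing tags are left out.

```swift
enum TagToken {
    case start(String)
    case close(String)
}

/// Hypothetical helper mirroring the stack check of the tokenizer.
/// Returns whether normalization is needed and which start tags stayed unclosed.
func inspect(_ tokens: [TagToken]) -> (needsNormalization: Bool, isolated: [String]) {
    var openTags: [String] = []   // stack of start tags seen so far
    var needsNormalization = false
    for token in tokens {
        switch token {
        case .start(let name):
            openTags.append(name)
        case .close(let name):
            // Redundant close tags are ignored.
            guard let index = openTags.lastIndex(of: name) else { continue }
            // A match below the top of the stack means tags are misplaced or unclosed.
            if index != openTags.count - 1 { needsNormalization = true }
            openTags.remove(at: index)
        }
    }
    // Whatever is left on the stack never received a close tag.
    if !openTags.isEmpty { needsNormalization = true }
    return (needsNormalization, openTags)
}

// <a>Li<b>nk</a>Bold</b>: </a> arrives while <b> is still on top of the stack
let misplaced = inspect([.start("a"), .start("b"), .close("a"), .close("b")])
print(misplaced.needsNormalization, misplaced.isolated)
```

A properly nested input never triggers the flag, which lets well-formed HTML skip the normalization step entirely.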

Standardization — Normalization

Also known as Formatter, normalization.

The previous step yields a preliminary parsing result. If tokenization found that normalization is required (needForFormatter is true), this step automatically corrects the HTML tag issues.

There are three types of HTML tag issues:

  • HTML tag with a missing close tag, for example, <br>
  • Regular text being treated as an HTML tag, for example, <Congratulation!>
  • HTML tags with misplacement, for example, <a>Li<b>nk</a>Bold</b>

The correction process is straightforward; we need to iterate through the elements of the Tokenization result and attempt to fill in the missing parts.

Operation process as shown in the diagram above

var normalizationResult = tokenizationResult

// Start Tags Stack, First In Last Out (FILO)
var stackExpectedStartItems: [HTMLParsedResult.StartItem] = []
var itemIndex = 0
while itemIndex < normalizationResult.count {
    switch normalizationResult[itemIndex] {
    case .start(let item):
        if item.isIsolated {
            // If it is an isolated Start Tag
            if WC3HTMLTagName(rawValue: item.tagName) == nil && (item.attributes?.isEmpty ?? true) {
                // If it is not a W3C-defined HTML Tag and has no HTML Attribute
                // Treat it as regular raw string type
                normalizationResult[itemIndex] = .rawString(item.tagAttributedString)
            } else {
                // Otherwise, convert it to a self-closing tag, e.g., <br> -> <br/>
                normalizationResult[itemIndex] = .selfClosing(item.convertToSelfClosingParsedItem())
            }
            itemIndex += 1
        } else {
            // Normal Start Tag, add to the Stack
            stackExpectedStartItems.append(item)
            itemIndex += 1
        }
    case .close(let item):
        // Encountered a Close Tag
        // Get the Start Tags on the Stack that sit between the matching Start Tag and this Close Tag
        // e.g., <a><u><b>[CurrentIndex]</b></u></a> -> Interval 0
        // e.g., <a><u><b>[CurrentIndex]</a></u></b> -> Interval b,u

        let reversedStackExpectedStartItems = Array(stackExpectedStartItems.reversed())
        guard let reversedStackExpectedStartItemsOccurredIndex = reversedStackExpectedStartItems.firstIndex(where: { $0.tagName == item.tagName }) else {
            itemIndex += 1
            continue
        }
        
        let reversedStackExpectedStartItemsOccurred = Array(reversedStackExpectedStartItems.prefix(upTo: reversedStackExpectedStartItemsOccurredIndex))
        
        // Interval 0 means the tags are not misplaced
        guard reversedStackExpectedStartItemsOccurred.count != 0 else {
            // If it is a pair, pop it
            stackExpectedStartItems.removeLast()
            itemIndex += 1
            continue
        }
        
        // If there are other intervals, automatically fill in the missing tags in between
        // e.g., <a><u><b>[CurrentIndex]</a></u></b> ->
        // e.g., <a><u><b>[CurrentIndex]</b></u></a><u><b></u></b>
        let stackExpectedStartItemsOccurred = Array(reversedStackExpectedStartItemsOccurred.reversed())
        let afterItems = stackExpectedStartItemsOccurred.map({ HTMLParsedResult.start($0) })
        let beforeItems = reversedStackExpectedStartItemsOccurred.map({ HTMLParsedResult.close($0.convertToCloseParsedItem()) })
        normalizationResult.insert(contentsOf: afterItems, at: normalizationResult.index(after: itemIndex))
        normalizationResult.insert(contentsOf: beforeItems, at: itemIndex)
        
        itemIndex = normalizationResult.index(after: itemIndex) + stackExpectedStartItemsOccurred.count
        
        // Update Start Stack Tags
        // e.g., -> b,u
        stackExpectedStartItems.removeAll { startItem in
            return reversedStackExpectedStartItems.prefix(through: reversedStackExpectedStartItemsOccurredIndex).contains(where: { $0 === startItem })
        }
    case .selfClosing, .rawString:
        itemIndex += 1
    }
}

print(normalizationResult)
// [
//    .start("a",["href":"https://zhgchg.li"])
//    .rawString("Li")
//    .start("b",nil)
//    .rawString("nk")
//    .close("b")
//    .close("a")
//    .start("b",nil)
//    .rawString("Bold")
//    .close("b")
// ]

Corresponding implementation in the source code: HTMLParsedResultFormatterProcessor.swift
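The misplaced-tag repair can also be written as a single pass that builds a new array instead of inserting into the existing one. This is a simplified sketch of the same idea and not the project's implementation: SimpleToken, normalize, and render are hypothetical names, tags carry only their names, and isolated start tags (the <br> and <Congratulation!> cases) are assumed to have been handled already.

```swift
enum SimpleToken: Equatable {
    case start(String)
    case close(String)
    case text(String)
}

/// Closes every tag that a close tag "crosses", then reopens it afterwards.
func normalize(_ tokens: [SimpleToken]) -> [SimpleToken] {
    var result: [SimpleToken] = []
    var openTags: [String] = []   // stack of currently open tag names
    for token in tokens {
        switch token {
        case .start(let name):
            openTags.append(name)
            result.append(token)
        case .text:
            result.append(token)
        case .close(let name):
            // Redundant close tag: nothing to close, so drop it.
            guard let index = openTags.lastIndex(of: name) else { continue }
            // Tags opened after `name` that are still open.
            let crossed = Array(openTags[(index + 1)...])
            // Close the crossed tags first, innermost first, then the tag itself.
            for tag in crossed.reversed() { result.append(.close(tag)) }
            result.append(.close(name))
            openTags.removeSubrange(index...)
            // Reopen the crossed tags in their original order.
            for tag in crossed {
                result.append(.start(tag))
                openTags.append(tag)
            }
        }
    }
    return result
}

/// Turns tokens back into an HTML-like string for inspection.
func render(_ tokens: [SimpleToken]) -> String {
    tokens.map { token -> String in
        switch token {
        case .start(let name): return "<\(name)>"
        case .close(let name): return "</\(name)>"
        case .text(let string): return string
        }
    }.joined()
}

// <a>Li<b>nk</a>Bold</b>
let fixed = normalize([.start("a"), .text("Li"), .start("b"), .text("nk"),
                       .close("a"), .text("Bold"), .close("b")])
print(render(fixed)) // <a>Li<b>nk</b></a><b>Bold</b>
```

The output has the same shape as the normalization result printed above: the b tag is closed before </a> and reopened right after it.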

Abstract Syntax Tree

AKA AST, or Abstract Tree.

After completing Tokenization & Normalization data preprocessing, the next step is to transform the result into an abstract syntax tree 🌲.

As shown above.

Converting it into an abstract syntax tree allows us to perform future operations and extensions more conveniently. For example, implementing the Selector feature or performing other transformations, such as HTML to Markdown. Additionally, if we want to add Markdown to NSAttributedString in the future, we only need to implement Markdown’s Tokenization & Normalization to achieve it.

First, we define a Markup Protocol with Child & Parent properties to record information about leaves and branches:

protocol Markup: AnyObject {
    var parentMarkup: Markup? { get set }
    var childMarkups: [Markup] { get set }
    
    func appendChild(markup: Markup)
    func prependChild(markup: Markup)
    func accept<V: MarkupVisitor>(_ visitor: V) -> V.Result
}

extension Markup {
    func appendChild(markup: Markup) {
        markup.parentMarkup = self
        childMarkups.append(markup)
    }
    
    func prependChild(markup: Markup) {
        markup.parentMarkup = self
        childMarkups.insert(markup, at: 0)
    }
}

In addition, we use the Visitor Pattern: each style is defined as a Markup element, and different visitors can then produce different results from the same tree.

protocol MarkupVisitor {
    associatedtype Result
        
    func visit(markup: Markup) -> Result
    
    func visit(_ markup: RootMarkup) -> Result
    func visit(_ markup: RawStringMarkup) -> Result
    
    func visit(_ markup: BoldMarkup) -> Result
    func visit(_ markup: LinkMarkup) -> Result
    //...
}

extension MarkupVisitor {
    func visit(markup: Markup) -> Result {
        return markup.accept(self)
    }
}

Basic Markup nodes:

// Root node
final class RootMarkup: Markup {
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []
    
    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

// Leaf node
final class RawStringMarkup: Markup {
    let attributedString: NSAttributedString
    
    init(attributedString: NSAttributedString) {
        self.attributedString = attributedString
    }
    
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []
    
    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

Definition of Markup style nodes:

// Branch nodes:

// Link style
final class LinkMarkup: Markup {
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []
    
    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

// Bold style
final class BoldMarkup: Markup {
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []
    
    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

Corresponds to the Markup implementation in the source code.
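To show how a visitor walks such a tree, here is a self-contained sketch with deliberately simplified, hypothetical names (Node, NodeVisitor, PlainTextVisitor); it is not the project's Markup or MarkupVisitor code. The visitor collects the plain text of the tree, which is roughly what a tag stripper has to do.

```swift
protocol NodeVisitor {
    associatedtype Result
    func visit(_ node: RootNode) -> Result
    func visit(_ node: BoldNode) -> Result
    func visit(_ node: TextNode) -> Result
}

protocol Node: AnyObject {
    var children: [Node] { get set }
    // Double dispatch: each node type calls the visit overload for its own type.
    func accept<V: NodeVisitor>(_ visitor: V) -> V.Result
}

final class RootNode: Node {
    var children: [Node] = []
    func accept<V: NodeVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}

final class BoldNode: Node {
    var children: [Node] = []
    func accept<V: NodeVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}

final class TextNode: Node {
    let text: String
    var children: [Node] = []
    init(_ text: String) { self.text = text }
    func accept<V: NodeVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}

/// One possible visitor: ignores styles and concatenates the text of all leaves.
struct PlainTextVisitor: NodeVisitor {
    typealias Result = String

    func visit(_ node: RootNode) -> String { joinChildren(of: node) }
    func visit(_ node: BoldNode) -> String { joinChildren(of: node) }
    func visit(_ node: TextNode) -> String { node.text }

    private func joinChildren(of node: Node) -> String {
        node.children.map { $0.accept(self) }.joined()
    }
}

// <b>Test</b>Link
let sampleRoot = RootNode()
let sampleBold = BoldNode()
sampleBold.children = [TextNode("Test")]
sampleRoot.children = [sampleBold, TextNode("Link")]
print(sampleRoot.accept(PlainTextVisitor())) // TestLink
```

Adding a second visitor with a different Result type would produce a different output from the same tree without touching the node classes.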

Before converting it into an abstract syntax tree, we still need…

MarkupComponent

Our tree nodes do not carry any data themselves, yet rendering needs it (e.g., a LinkMarkup node needs its URL information to proceed with rendering). For this, we define another container that stores a tree node together with its related data:

protocol MarkupComponent {
    associatedtype T
    var markup: Markup { get }
    var value: T { get }
    
    init(markup: Markup, value: T)
}

extension Sequence where Iterator.Element: MarkupComponent {
    func value(markup: Markup) -> Element.T? {
        return self.first(where:{ $0.markup === markup })?.value as? Element.T
    }
}

Corresponds to the MarkupComponent implementation in the source code.

Alternatively, Markup could be declared Hashable so that values are stored directly in a Dictionary, [Markup: Any]. However, Hashable brings Self requirements, so Markup could then no longer be used as a plain type and would have to be written as any Markup.
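The container approach boils down to looking up a value by object identity. A minimal sketch of that lookup, assuming hypothetical stand-in types (NodeRef for a tree node, NodeComponent for the container) rather than the project's own:

```swift
final class NodeRef {}   // hypothetical stand-in for a Markup node

struct NodeComponent<Value> {
    let node: NodeRef
    let value: Value
}

extension Sequence {
    /// Finds the value stored for `node`, comparing nodes by identity (===).
    func value<Value>(for node: NodeRef) -> Value? where Element == NodeComponent<Value> {
        first(where: { $0.node === node })?.value
    }
}

let linkNode = NodeRef()
let boldNode = NodeRef()
let urlComponents = [NodeComponent(node: linkNode, value: "https://zhgchg.li")]

print(urlComponents.value(for: linkNode) ?? "none") // https://zhgchg.li
print(urlComponents.value(for: boldNode) ?? "none") // none
```

The lookup is linear in the number of components, which is the price of keeping the node type free of Hashable.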

HTMLTag & HTMLTagName & HTMLTagNameVisitor

We have also abstracted the HTML Tag Name part, allowing users to decide which tags need to be processed and facilitating future extensions. For example, the <strong> tag name can correspond to BoldMarkup.

public protocol HTMLTagName {
    var string: String { get }
    func accept<V: HTMLTagNameVisitor>(_ visitor: V) -> V.Result
}

public struct A_HTMLTagName: HTMLTagName {
    public let string: String = WC3HTMLTagName.a.rawValue
    
    public init() {
        
    }
    
    public func accept<V>(_ visitor: V) -> V.Result where V : HTMLTagNameVisitor {
        return visitor.visit(self)
    }
}

public struct B_HTMLTagName: HTMLTagName {
    public let string: String = WC3HTMLTagName.b.rawValue
    
    public init() {
        
    }
    
    public func accept<V>(_ visitor: V) -> V.Result where V : HTMLTagNameVisitor {
        return visitor.visit(self)
    }
}

Corresponds to the HTMLTagNameVisitor implementation in the source code.

In addition, the HTML tag name enum, compiled with reference to the W3C wiki, is defined in WC3HTMLTagName.swift.

HTMLTag is simply a container object because we want to allow external specification of the style corresponding to HTML tags. So, we declare a container to put them together:

struct HTMLTag {
    let tagName: HTMLTagName
    let customStyle: MarkupStyle? // We'll explain Render later.
    
    init(tagName: HTMLTagName, customStyle: MarkupStyle? = nil) {
        self.tagName = tagName
        self.customStyle = customStyle
    }
}

Corresponds to the implementation of HTMLTag in the source code.

HTMLTagNameToHTMLMarkupVisitor

struct HTMLTagNameToMarkupVisitor: HTMLTagNameVisitor {
    typealias Result = Markup
    
    let attributes: [String: String]?
    
    func visit(_ tagName: A_HTMLTagName) -> Result {
        return LinkMarkup()
    }
    
    func visit(_ tagName: B_HTMLTagName) -> Result {
        return BoldMarkup()
    }
    //...
}

Corresponds to the implementation of HTMLTagNameToHTMLMarkupVisitor in the source code.

Converting to Abstract Syntax Tree with HTML Data

We need to convert the normalized HTML data result into an abstract syntax tree. First, let’s declare a data structure, MarkupComponent, that can hold HTML data:

struct HTMLElementMarkupComponent: MarkupComponent {
    struct HTMLElement {
        let tag: HTMLTag
        let tagAttributedString: NSAttributedString
        let attributes: [String: String]?
    }
    
    typealias T = HTMLElement
    
    let markup: Markup
    let value: HTMLElement
    init(markup: Markup, value: HTMLElement) {
        self.markup = markup
        self.value = value
    }
}

Converting to Markup Abstract Syntax Tree:

var htmlElementComponents: [HTMLElementMarkupComponent] = []
let rootMarkup = RootMarkup()
var currentMarkup: Markup = rootMarkup

let htmlTags: [String: HTMLTag]
init(htmlTags: [HTMLTag]) {
  self.htmlTags = Dictionary(uniqueKeysWithValues: htmlTags.map{ ($0.tagName.string, $0) })
}

// Start Tags Stack, ensuring correct popping of tags
// Normalization has been done earlier, so it should not result in errors, just to be sure
var stackExpectedStartItems: [HTMLParsedResult.StartItem] = []
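// `normalizationResult` is the output of the previous step (or the tokenization result when no normalization was needed)
for thisItem in normalizationResult {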
    switch thisItem {
    case .start(let item):
        let visitor = HTMLTagNameToMarkupVisitor(attributes: item.attributes)
        let htmlTag = self.htmlTags[item.tagName] ?? HTMLTag(tagName: ExtendTagName(item.tagName))
        // Using the Visitor to determine the corresponding Markup
        let markup = visitor.visit(tagName: htmlTag.tagName)
        
        // Append the new node as a child of the current branch,
        // then make it the current branch
        htmlElementComponents.append(.init(markup: markup, value: .init(tag: htmlTag, tagAttributedString: item.tagAttributedString, attributes: item.attributes)))
        currentMarkup.appendChild(markup: markup)
        currentMarkup = markup
        
        stackExpectedStartItems.append(item)
    case .selfClosing(let item):
        // Adding directly as a leaf node of the current branch
        let visitor = HTMLTagNameToMarkupVisitor(attributes: item.attributes)
        let htmlTag = self.htmlTags[item.tagName] ?? HTMLTag(tagName: ExtendTagName(item.tagName))
        let markup = visitor.visit(tagName: htmlTag.tagName)
        htmlElementComponents.append(.init(markup: markup, value: .init(tag: htmlTag, tagAttributedString: item.tagAttributedString, attributes: item.attributes)))
        currentMarkup.appendChild(markup: markup)
    case .close(let item):
        if let lastTagName = stackExpectedStartItems.popLast()?.tagName,
           lastTagName == item.tagName {
            // When encountering a Close Tag, go back to the previous level
            currentMarkup = currentMarkup.parentMarkup ?? currentMarkup
        }
    case .rawString(let attributedString):
        // Adding directly as a leaf node of the current branch
        currentMarkup.appendChild(markup: RawStringMarkup(attributedString: attributedString))
    }
}

// print(htmlElementComponents)
// [(markup: LinkMarkup, (tag: a, attributes: ["href":"https://zhgchg.li"]...)]

The operation result is shown in the above image.

Corresponds to the implementation of HTMLParsedResultToHTMLElementWithRootMarkupProcessor.swift in the source code.
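Stripped of visitors and components, the conversion is a loop that moves a "current node" pointer down on a start tag and back up on a close tag. The sketch below illustrates only that mechanism; ParsedToken, TreeNode, buildTree, and outline are hypothetical names, and the input is assumed to be already normalized.

```swift
enum ParsedToken {
    case start(String)
    case selfClosing(String)
    case close(String)
    case text(String)
}

final class TreeNode {
    let name: String            // tag name, or "#root" / "#text"
    let text: String?
    weak var parent: TreeNode?  // weak, to avoid a parent <-> child retain cycle
    var children: [TreeNode] = []

    init(name: String, text: String? = nil) {
        self.name = name
        self.text = text
    }

    func append(_ child: TreeNode) {
        child.parent = self
        children.append(child)
    }
}

func buildTree(from tokens: [ParsedToken]) -> TreeNode {
    let root = TreeNode(name: "#root")
    var current = root
    for token in tokens {
        switch token {
        case .start(let name):
            // New branch: attach it, then descend into it.
            let node = TreeNode(name: name)
            current.append(node)
            current = node
        case .selfClosing(let name):
            // Leaf: attach it and stay on the current branch.
            current.append(TreeNode(name: name))
        case .close(let name):
            // Go back up one level when the close tag matches the current branch.
            if current.name == name, let parent = current.parent {
                current = parent
            }
        case .text(let string):
            current.append(TreeNode(name: "#text", text: string))
        }
    }
    return root
}

/// Prints the tree in a compact parenthesized form.
func outline(_ node: TreeNode) -> String {
    if let text = node.text { return "\"" + text + "\"" }
    let parts = [node.name] + node.children.map(outline)
    return "(" + parts.joined(separator: " ") + ")"
}

// <a>Li<b>nk</b></a><b>Bold</b>
let tree = buildTree(from: [.start("a"), .text("Li"), .start("b"), .text("nk"),
                            .close("b"), .close("a"), .start("b"), .text("Bold"), .close("b")])
print(outline(tree)) // (#root (a "Li" (b "nk")) (b "Bold"))
```

Because normalization has already repaired the nesting, the loop never needs to search the ancestors for a matching tag.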

At this point, we have actually completed the functionality of Selector 🎉

public class HTMLSelector: CustomStringConvertible {
    
    let markup: Markup
    let components: [HTMLElementMarkupComponent]
    init(markup: Markup, components: [HTMLElementMarkupComponent]) {
        self.markup = markup
        self.components = components
    }
    
    public func filter(_ htmlTagName: String) -> [HTMLSelector] {
        let result = markup.childMarkups.filter({ components.value(markup: $0)?.tag.tagName.isEqualTo(htmlTagName) ?? false })
        return result.map({ .init(markup: $0, components: components) })
    }

    //...
}

We can filter leaf node objects layer by layer.

Corresponds to the implementation of HTMLSelector in the source code.
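
As a rough illustration of filtering layer by layer, here is a minimal, self-contained sketch. TagNode and its filter method are hypothetical stand-ins for HTMLSelector, not the library's types; they only keep a tag name and child nodes.

```swift
// Hypothetical stand-in for HTMLSelector: a node that only knows its tag name and children.
struct TagNode {
    let tagName: String
    var children: [TagNode] = []

    // Return the direct children whose tag name matches (the query is lowercased first).
    func filter(_ name: String) -> [TagNode] {
        return children.filter { $0.tagName == name.lowercased() }
    }
}

// Tree for: <a><b>Link</b></a><b>Bold</b>
let root = TagNode(tagName: "root", children: [
    TagNode(tagName: "a", children: [TagNode(tagName: "b")]),
    TagNode(tagName: "b"),
])

// First layer: direct <b> children of the root.
let topLevelBold = root.filter("b")
// Second layer: <b> nested inside the first <a>.
let boldInsideLink = root.filter("a").first?.filter("B") ?? []

print(topLevelBold.count, boldInsideLink.count)
```

Each call only looks at the direct children of the current node, so reaching a deeper tag means chaining one filter call per layer.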

Parser — HTML to MarkupStyle (Abstract of NSAttributedString.Key)

Next, we need to complete the process of converting HTML to MarkupStyle (NSAttributedString.Key).

NSAttributedString sets the style of the text using NSAttributedString.Key Attributes. We have abstracted all the fields of NSAttributedString.Key to correspond to MarkupStyle, MarkupStyleColor, MarkupStyleFont, and MarkupStyleParagraphStyle.

Purpose:

  • The raw attributes are a [NSAttributedString.Key: Any?] dictionary. If we exposed it directly, we could not control the values users pass in, and an incorrect value such as .font: 123 could crash the app.
  • Styles need to be inheritable. For example, in <a><b>test</b></a>, the text “test” combines its own bold style with the style inherited from the link (bold + link). If we exposed the Dictionary directly, it would be hard to control the inheritance rules.
  • Encapsulate the objects that belong to iOS/macOS (UIKit/AppKit).

MarkupStyle Struct

public struct MarkupStyle {
    public var font: MarkupStyleFont
    public var paragraphStyle: MarkupStyleParagraphStyle
    public var foregroundColor: MarkupStyleColor? = nil
    public var backgroundColor: MarkupStyleColor? = nil
    public var ligature: NSNumber? = nil
    public var kern: NSNumber? = nil
    public var tracking: NSNumber? = nil
    public var strikethroughStyle: NSUnderlineStyle? = nil
    public var underlineStyle: NSUnderlineStyle? = nil
    public var strokeColor: MarkupStyleColor? = nil
    public var strokeWidth: NSNumber? = nil
    public var shadow: NSShadow? = nil
    public var textEffect: String? = nil
    public var attachment: NSTextAttachment? = nil
    public var link: URL? = nil
    public var baselineOffset: NSNumber? = nil
    public var underlineColor: MarkupStyleColor? = nil
    public var strikethroughColor: MarkupStyleColor? = nil
    public var obliqueness: NSNumber? = nil
    public var expansion: NSNumber? = nil
    public var writingDirection: NSNumber? = nil
    public var verticalGlyphForm: NSNumber? = nil
    //...

    // Inherited from...
    // Default: When a field is nil, it is filled with data from the "from" MarkupStyle object.
    mutating func fillIfNil(from: MarkupStyle?) {
        guard let from = from else { return }

        var currentFont = self.font
        currentFont.fillIfNil(from: from.font)
        self.font = currentFont

        var currentParagraphStyle = self.paragraphStyle
        currentParagraphStyle.fillIfNil(from: from.paragraphStyle)
        self.paragraphStyle = currentParagraphStyle
        //...
    }

    // Convert MarkupStyle to NSAttributedString.Key: Any
    func render() -> [NSAttributedString.Key: Any] {
        var data: [NSAttributedString.Key: Any] = [:]

        if let font = font.getFont() {
            data[.font] = font
        }

        if let ligature = self.ligature {
            data[.ligature] = ligature
        }
        //...
        return data
    }
}

public struct MarkupStyleFont: MarkupStyleItem {
    public enum FontWeight {
        case style(FontWeightStyle)
        case rawValue(CGFloat)
    }
    public enum FontWeightStyle: String {
        case ultraLight, light, thin, regular, medium, semibold, bold, heavy, black
        // ...
    }

    public var size: CGFloat?
    public var weight: FontWeight?
    public var italic: Bool?
    //...
}

public struct MarkupStyleParagraphStyle: MarkupStyleItem {
    public var lineSpacing: CGFloat? = nil
    public var paragraphSpacing: CGFloat? = nil
    public var alignment: NSTextAlignment? = nil
    public var headIndent: CGFloat? = nil
    public var tailIndent: CGFloat? = nil
    public var firstLineHeadIndent: CGFloat? = nil
    public var minimumLineHeight: CGFloat? = nil
    public var maximumLineHeight: CGFloat? = nil
    public var lineBreakMode: NSLineBreakMode? = nil
    public var baseWritingDirection: NSWritingDirection? = nil
    public var lineHeightMultiple: CGFloat? = nil
    public var paragraphSpacingBefore: CGFloat? = nil
    public var hyphenationFactor: Float? = nil
    public var usesDefaultHyphenation: Bool? = nil
    public var tabStops: [NSTextTab]? = nil
    public var defaultTabInterval: CGFloat? = nil
    public var textLists: [NSTextList]? = nil
    public var allowsDefaultTighteningForTruncation: Bool? = nil
    public var lineBreakStrategy: NSParagraphStyle.LineBreakStrategy? = nil
    //...
}

public struct MarkupStyleColor {
    let red: Int
    let green: Int
    let blue: Int
    let alpha: CGFloat
    //...
}

This corresponds to the implementation of MarkupStyle in the source code.
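
To make the inheritance rule behind fillIfNil concrete, here is a small sketch with a reduced style type. SimpleStyle and its three fields are assumptions made for illustration; the real MarkupStyle has many more fields, but the rule is the same: a value that is already set wins, and only nil fields are filled from the parent.

```swift
// A simplified stand-in for MarkupStyle with only three fields (hypothetical type).
struct SimpleStyle: Equatable {
    var fontSize: Double? = nil
    var isBold: Bool? = nil
    var link: String? = nil

    // Keep every value that is already set; copy only the missing ones from `from`.
    mutating func fillIfNil(from: SimpleStyle?) {
        guard let from = from else { return }
        fontSize = fontSize ?? from.fontSize
        isBold = isBold ?? from.isBold
        link = link ?? from.link
    }
}

// <a><b>test</b></a>: the text starts with the style of <b> and inherits from <a>.
var textStyle = SimpleStyle(isBold: true)
let linkStyle = SimpleStyle(fontSize: 13, isBold: false, link: "https://zhgchg.li")
textStyle.fillIfNil(from: linkStyle)

print(textStyle)
```

The text keeps its own bold flag even though the link style says otherwise, and picks up the font size and the link from the parent.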

Additionally, we referred to the W3C wiki, which enumerates the browser-predefined color names together with their color text and R, G, B values: MarkupStyleColorName.swift.

HTMLTagStyleAttribute & HTMLTagStyleAttributeVisitor

These two objects deserve a brief mention, because HTML tags can carry CSS settings in their style attribute. To support this, we apply the same abstraction used for HTMLTagName to HTML style attributes.

For instance, the HTML might be <a style="color:red;font-size:14px">RedLink</a>, which means this link should be rendered in red with a font size of 14px.

public protocol HTMLTagStyleAttribute {
    var styleName: String { get }
    
    func accept<V: HTMLTagStyleAttributeVisitor>(_ visitor: V) -> V.Result
}

public protocol HTMLTagStyleAttributeVisitor {
    associatedtype Result
    
    func visit(styleAttribute: HTMLTagStyleAttribute) -> Result
    func visit(_ styleAttribute: ColorHTMLTagStyleAttribute) -> Result
    func visit(_ styleAttribute: FontSizeHTMLTagStyleAttribute) -> Result
    //...
}

public extension HTMLTagStyleAttributeVisitor {
    func visit(styleAttribute: HTMLTagStyleAttribute) -> Result {
        return styleAttribute.accept(self)
    }
}

Corresponding implementation of HTMLTagStyleAttribute in the source code.

HTMLTagStyleAttributeToMarkupStyleVisitor

struct HTMLTagStyleAttributeToMarkupStyleVisitor: HTMLTagStyleAttributeVisitor {
    typealias Result = MarkupStyle?
    
    let value: String
    
    func visit(_ styleAttribute: ColorHTMLTagStyleAttribute) -> Result {
        // Extract Color Hex or Mapping from HTML Pre-defined Color Name using regex, please refer to the source code.
        guard let color = MarkupStyleColor(string: value) else { return nil }
        return MarkupStyle(foregroundColor: color)
    }
    
    func visit(_ styleAttribute: FontSizeHTMLTagStyleAttribute) -> Result {
        // Extract 10px -> 10 using regex, please refer to the source code.
        guard let size = self.convert(fromPX: value) else { return nil }
        return MarkupStyle(font: MarkupStyleFont(size: CGFloat(size)))
    }
    // ...
}

Corresponding implementation of HTMLTagAttributeToMarkupStyleVisitor.swift in the source code.

The visitor is initialized with the attribute's value, and each visit method converts that value into the corresponding MarkupStyle field according to the type of style attribute being visited.
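
The listing refers to the source code for the actual value conversion, which uses regular expressions. As a hedged illustration of what such a conversion has to do, the sketch below uses plain string operations instead. The helper names pixels(from:) and rgb(fromHex:) are hypothetical and are not the library's functions; named colors such as red are not covered here.

```swift
import Foundation

// Hypothetical helper: "14px" -> 14.0. Returns nil for anything else.
func pixels(from value: String) -> Double? {
    let trimmed = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    guard trimmed.hasSuffix("px") else { return nil }
    let digits = String(trimmed.dropLast(2))
    guard let number = Double(digits), number.isFinite else { return nil }
    return number
}

// Hypothetical helper: "#RRGGBB" -> (red, green, blue), each in 0...255.
func rgb(fromHex value: String) -> (red: Int, green: Int, blue: Int)? {
    let trimmed = value.trimmingCharacters(in: .whitespacesAndNewlines)
    guard trimmed.hasPrefix("#") else { return nil }
    let hex = trimmed.dropFirst()
    guard hex.count == 6, hex.allSatisfy({ $0.isHexDigit }),
          let number = Int(hex, radix: 16) else { return nil }
    return (red: (number >> 16) & 0xFF, green: (number >> 8) & 0xFF, blue: number & 0xFF)
}

print(pixels(from: "14px") as Any, rgb(fromHex: "#FF8000") as Any)
```

Returning nil for unreadable input matches the visitor's Result type of MarkupStyle?, so an unsupported value simply contributes no style.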

HTMLElementMarkupComponentMarkupStyleVisitor

With the MarkupStyle object in place, we need to convert the HTMLElementComponents produced by normalization into MarkupStyle.

// MarkupStyle policy
public enum MarkupStylePolicy {
    case respectMarkupStyleFromCode // Take the style from Code as the main one and use HTML Style Attribute to fill in the gaps
    case respectMarkupStyleFromHTMLStyleAttribute // Take the style from HTML Style Attribute as the main one and use Code to fill in the gaps
}

struct HTMLElementMarkupComponentMarkupStyleVisitor: MarkupVisitor {

    typealias Result = MarkupStyle?
    
    let policy: MarkupStylePolicy
    let components: [HTMLElementMarkupComponent]
    let styleAttributes: [HTMLTagStyleAttribute]

    func visit(_ markup: BoldMarkup) -> Result {
        // The `.bold` is just a default style defined in MarkupStyle. Please refer to the Source Code.
        return defaultVisit(components.value(markup: markup), defaultStyle: .bold)
    }
    
    func visit(_ markup: LinkMarkup) -> Result {
        // The `.link` is just a default style defined in MarkupStyle. Please refer to the Source Code.
        var markupStyle = defaultVisit(components.value(markup: markup), defaultStyle: .link) ?? .link
        
        // Get the corresponding HTMLElement for LinkMarkup from HTMLElementComponents
        // Find the href parameter in the attributes of HtmlElement (in the form of an HTML URL string)
        if let href = components.value(markup: markup)?.attributes?["href"] as? String,
           let url = URL(string: href) {
            markupStyle.link = url
        }
        return markupStyle
    }

    // ...
}

extension HTMLElementMarkupComponentMarkupStyleVisitor {
    // Get the specified customized MarkupStyle from the HTMLTag container
    private func customStyle(_ htmlElement: HTMLElementMarkupComponent.HTMLElement?) -> MarkupStyle? {
        guard let customStyle = htmlElement?.tag.customStyle else {
            return nil
        }
        return customStyle
    }
    
    // Default action
    func defaultVisit(_ htmlElement: HTMLElementMarkupComponent.HTMLElement?, defaultStyle: MarkupStyle? = nil) -> Result {
        var markupStyle: MarkupStyle? = customStyle(htmlElement) ?? defaultStyle
        // Check whether the HTMLElement has a `style` attribute
        // and whether any supported style attributes are registered
        guard let styleString = htmlElement?.attributes?["style"],
              styleAttributes.count > 0 else {
            // If not, return the markupStyle as is
            return markupStyle
        }

        // If there are Style Attributes
        // Split the Style Value string into an array of [name, value] pairs
        // e.g. font-size:14px;color:red -> [["font-size", "14px"], ["color", "red"]]
        let styles = styleString.split(separator: ";").filter { $0.trimmingCharacters(in: .whitespacesAndNewlines) != "" }.map { $0.split(separator: ":") }
        
        for style in styles {
            guard style.count == 2 else {
                continue
            }
            // e.g. font-size
            let key = style[0].trimmingCharacters(in: .whitespacesAndNewlines)
            // e.g. 14px
            let value = style[1].trimmingCharacters(in: .whitespacesAndNewlines)
            
            if let styleAttribute = styleAttributes.first(where: { $0.isEqualTo(styleName: key) }) {
                // Use the HTMLTagStyleAttributeToMarkupStyleVisitor from the previous context to convert to MarkupStyle
                let visitor = HTMLTagStyleAttributeToMarkupStyleVisitor(value: value)
                if var thisMarkupStyle = visitor.visit(styleAttribute: styleAttribute) {
                    // When the Style Attribute has a converted value..
                    // Merge the previous MarkupStyle result with this one
                    thisMarkupStyle.fillIfNil(from: markupStyle)
                    markupStyle = thisMarkupStyle
                }
            }
        }
        
        // If there is a default Style
        if var defaultStyle = defaultStyle {
            switch policy {
            case .respectMarkupStyleFromHTMLStyleAttribute:
                // Take the Style Attribute MarkupStyle as the main one and then merge with the defaultStyle result
                markupStyle?.fillIfNil(from: defaultStyle)
            case .respectMarkupStyleFromCode:
                // Take the defaultStyle as the main one and then merge with the Style Attribute MarkupStyle result
                defaultStyle.fillIfNil(from: markupStyle)
                markupStyle = defaultStyle
            }
        }
        
        return markupStyle
    }
}

The implementation corresponds to the original code in HTMLTagAttributeToMarkupStyleVisitor.swift.
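
The splitting step in defaultVisit can be tried in isolation. The sketch below is a hedged variant rather than the library's code: parseStyleDeclarations is a hypothetical helper, and it splits each declaration on the first colon only, so a value that itself contains a colon (a URL, for example) is kept instead of being skipped by a count == 2 check.

```swift
import Foundation

// Hypothetical helper, not part of the library: parse "name:value;name:value" declarations.
func parseStyleDeclarations(_ styleString: String) -> [(name: String, value: String)] {
    var result: [(name: String, value: String)] = []
    for declaration in styleString.split(separator: ";") {
        // Split on the first ":" only, so values such as URLs keep their own colons.
        let parts = declaration.split(separator: ":", maxSplits: 1)
        guard parts.count == 2 else { continue }
        let name = parts[0].trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
        let value = parts[1].trimmingCharacters(in: .whitespacesAndNewlines)
        guard !name.isEmpty, !value.isEmpty else { continue }
        result.append((name: name, value: value))
    }
    return result
}

let declarations = parseStyleDeclarations("font-size:14px; color : red;;background:url(http://a.b/c.png)")
for declaration in declarations {
    print(declaration.name, "=", declaration.value)
}
```

Empty declarations produced by a doubled or trailing semicolon are dropped by split, and surrounding whitespace is trimmed before the name is matched against the registered style attributes.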

We will define some default styles in MarkupStyle. In some cases, if certain Markup elements do not have the desired styles specified externally, they will use the default styles.

There are two style inheritance strategies:

  • respectMarkupStyleFromCode: The default styles take precedence; then, check the Style Attributes to see if any additional styles can be applied, but ignore them if they already have a value.
  • respectMarkupStyleFromHTMLStyleAttribute: The Style Attributes take precedence; then, check the default styles to see if any additional styles can be applied, but ignore them if they already have a value.
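
The difference between the two strategies is easiest to see on a conflicting value. The following sketch is a simplification under stated assumptions: MiniStyle, MiniPolicy, and resolve are hypothetical reduced versions of MarkupStyle, MarkupStylePolicy, and the merge at the end of defaultVisit, with only two style fields.

```swift
// Hypothetical, reduced versions of MarkupStyle and MarkupStylePolicy.
struct MiniStyle: Equatable {
    var color: String? = nil
    var fontSize: Double? = nil

    // Return a copy where nil fields are filled from `other`.
    func filled(from other: MiniStyle) -> MiniStyle {
        return MiniStyle(color: color ?? other.color, fontSize: fontSize ?? other.fontSize)
    }
}

enum MiniPolicy {
    case respectStyleFromCode
    case respectStyleFromHTMLStyleAttribute
}

// The style named by the policy is the main one; the other one only fills the gaps.
func resolve(codeStyle: MiniStyle, attributeStyle: MiniStyle, policy: MiniPolicy) -> MiniStyle {
    switch policy {
    case .respectStyleFromCode:
        return codeStyle.filled(from: attributeStyle)
    case .respectStyleFromHTMLStyleAttribute:
        return attributeStyle.filled(from: codeStyle)
    }
}

// The code says links are blue; the HTML says <a style="color:red;font-size:14px">.
let codeStyle = MiniStyle(color: "blue")
let attributeStyle = MiniStyle(color: "red", fontSize: 14)

let fromCode = resolve(codeStyle: codeStyle, attributeStyle: attributeStyle, policy: .respectStyleFromCode)
let fromHTML = resolve(codeStyle: codeStyle, attributeStyle: attributeStyle, policy: .respectStyleFromHTMLStyleAttribute)
print(fromCode, fromHTML)
```

With the code-first policy the link stays blue and only gains the font size from the HTML; with the HTML-first policy the color from the style attribute wins.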

HTMLElementWithMarkupToMarkupStyleProcessor

This processor converts the normalization result into an AST & MarkupStyleComponent.

Declare a new MarkupComponent to hold the corresponding MarkupStyle:

struct MarkupStyleComponent: MarkupComponent {
    typealias T = MarkupStyle
    
    let markup: Markup
    let value: MarkupStyle
    init(markup: Markup, value: MarkupStyle) {
        self.markup = markup
        self.value = value
    }
}

Simple traversal of the Markup Tree & HTMLElementMarkupComponent structure:

let styleAttributes: [HTMLTagStyleAttribute]
let policy: MarkupStylePolicy
    
func process(from: (Markup, [HTMLElementMarkupComponent])) -> [MarkupStyleComponent] {
  var components: [MarkupStyleComponent] = []
  let visitor = HTMLElementMarkupComponentMarkupStyleVisitor(policy: policy, components: from.1, styleAttributes: styleAttributes)
  walk(markup: from.0, visitor: visitor, components: &components)
  return components
}
    
func walk(markup: Markup, visitor: HTMLElementMarkupComponentMarkupStyleVisitor, components: inout [MarkupStyleComponent]) {
        
  if let markupStyle = visitor.visit(markup: markup) {
    components.append(.init(markup: markup, value: markupStyle))
  }
        
  for markup in markup.childMarkups {
    walk(markup: markup, visitor: visitor, components: &components)
  }
}

// print(components)
// [(markup: LinkMarkup, MarkupStyle(link: https://zhgchg.li, color: .blue)]
// [(markup: BoldMarkup, MarkupStyle(font: .init(weight: .bold))]

Corresponding implementation in the source code can be found in HTMLElementWithMarkupToMarkupStyleProcessor.swift.

Flow result as shown in the above image

Render — Convert To NSAttributedString

Now that we have the abstract HTML Tag tree structure and corresponding MarkupStyle, we can proceed with the final step of generating the NSAttributedString rendering result.

MarkupNSAttributedStringVisitor

This is the implementation of the MarkupVisitor protocol to convert markup into NSAttributedString.

struct MarkupNSAttributedStringVisitor: MarkupVisitor {
    typealias Result = NSAttributedString
    
    let components: [MarkupStyleComponent]
    // MarkupStyle for root/base, externally specified, for example, to set the overall font size.
    let rootStyle: MarkupStyle?
    
    func visit(_ markup: RootMarkup) -> Result {
        // Traverse to the RawString object.
        return collectAttributedString(markup)
    }
    
    func visit(_ markup: RawStringMarkup) -> Result {
        // Return the Raw String.
        // Collect all MarkupStyles in the chain.
        // Apply the Style to NSAttributedString.
        return applyMarkupStyle(markup.attributedString, with: collectMarkupStyle(markup))
    }
    
    func visit(_ markup: BoldMarkup) -> Result {
        // Traverse to the RawString object.
        return collectAttributedString(markup)
    }
    
    func visit(_ markup: LinkMarkup) -> Result {
        // Traverse to the RawString object.
        return collectAttributedString(markup)
    }
    // ...
}

private extension MarkupNSAttributedStringVisitor {
    // Apply the Style to NSAttributedString.
    func applyMarkupStyle(_ attributedString: NSAttributedString, with markupStyle: MarkupStyle?) -> NSAttributedString {
        guard let markupStyle = markupStyle else { return attributedString }
        let mutableAttributedString = NSMutableAttributedString(attributedString: attributedString)
        mutableAttributedString.addAttributes(markupStyle.render(), range: NSMakeRange(0, mutableAttributedString.string.utf16.count))
        return mutableAttributedString
    }

    func collectAttributedString(_ markup: Markup) -> NSMutableAttributedString {
        // Collect from downstream.
        // Root -> Bold -> String("Bold")
        //      \
        //       > String("Test")
        // Result: Bold Test
        // Traverse down the tree to find raw strings, recursively visit and combine them into the final NSAttributedString.
        return markup.childMarkups.compactMap({ visit(markup: $0) }).reduce(NSMutableAttributedString()) { partialResult, attributedString in
            partialResult.append(attributedString)
            return partialResult
        }
    }
    
    func collectMarkupStyle(_ markup: Markup) -> MarkupStyle? {
        // Collect from upstream.
        // String("Test") -> Bold -> Italic -> Root
        // Result: style: Bold+Italic
        // Traverse up the tree to find parent tag's markup style.
        // Then inherit styles layer by layer.
        var currentMarkup: Markup? = markup.parentMarkup
        var currentStyle = components.value(markup: markup)
        while let thisMarkup = currentMarkup {
            guard let thisMarkupStyle = components.value(markup: thisMarkup) else {
                currentMarkup = thisMarkup.parentMarkup
                continue
            }

            if var thisCurrentStyle = currentStyle {
                thisCurrentStyle.fillIfNil(from: thisMarkupStyle)
                currentStyle = thisCurrentStyle
            } else {
                currentStyle = thisMarkupStyle
            }

            currentMarkup = thisMarkup.parentMarkup
        }
        
        if var currentStyle = currentStyle {
            currentStyle.fillIfNil(from: rootStyle)
            return currentStyle
        } else {
            return rootStyle
        }
    }
}

This corresponds to the MarkupNSAttributedStringVisitor.swift in the source code.

The workflow and result are depicted in the above image.

Finally, we arrive at the following:

Li{
    NSColor = "Blue";
    NSFont = "<UICTFont: 0x145d17600> font-family: \".SFUI-Regular\"; font-weight: normal; font-style: normal; font-size: 13.00pt";
    NSLink = "https://zhgchg.li";
}nk{
    NSColor = "Blue";
    NSFont = "<UICTFont: 0x145d18710> font-family: \".SFUI-Semibold\"; font-weight: bold; font-style: normal; font-size: 13.00pt";
    NSLink = "https://zhgchg.li";
}Bold{
    NSFont = "<UICTFont: 0x145d18710> font-family: \".SFUI-Semibold\"; font-weight: bold; font-style: normal; font-size: 13.00pt";
}

🎉🎉🎉🎉 It’s done! 🎉🎉🎉🎉

We have now completed the entire conversion process from HTML String to NSAttributedString.

Stripper — Removing HTML Tags

Stripping HTML tags is relatively simple, requiring only the following code snippet:

func attributedString(_ markup: Markup) -> NSAttributedString {
  if let rawStringMarkup = markup as? RawStringMarkup {
    return rawStringMarkup.attributedString
  } else {
    return markup.childMarkups.compactMap({ attributedString($0) }).reduce(NSMutableAttributedString()) { partialResult, attributedString in
      partialResult.append(attributedString)
      return partialResult
    }
  }
}

The corresponding implementation can be found in the MarkupStripperProcessor.swift file.

It works much like Render, except that it simply returns the content of each RawStringMarkup it encounters, without applying any style.
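
The same recursion can be tried without UIKit. The sketch below is a minimal illustration that uses plain String instead of NSAttributedString; the Node enum and the strip function are hypothetical and only mirror the shape of the Markup tree.

```swift
// Hypothetical minimal tree: a node is either raw text or a tag with children.
enum Node {
    case text(String)
    case tag(name: String, children: [Node])
}

// Concatenate every raw string in document order and ignore the tags themselves.
func strip(_ node: Node) -> String {
    switch node {
    case .text(let string):
        return string
    case .tag(_, let children):
        return children.map(strip).joined()
    }
}

// Tree for: <b>Test<a>Link</a></b>
let tree = Node.tag(name: "root", children: [
    .tag(name: "b", children: [
        .text("Test"),
        .tag(name: "a", children: [.text("Link")]),
    ]),
])

print(strip(tree)) // prints "TestLink"
```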

Extend — Dynamic Extension

To cover HTML tags and style attributes that are not built in, we adopted a dynamic extension mechanism, so that new ones can be added directly from code:

public struct ExtendTagName: HTMLTagName {
    public let string: String
    
    public init(_ w3cHTMLTagName: WC3HTMLTagName) {
        self.string = w3cHTMLTagName.rawValue
    }
    
    public init(_ string: String) {
        self.string = string.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    }
    
    public func accept<V>(_ visitor: V) -> V.Result where V : HTMLTagNameVisitor {
        return visitor.visit(self)
    }
}
// ExtendTagName is converted to ExtendMarkup:
final class ExtendMarkup: Markup {
    weak var parentMarkup: Markup? = nil
    var childMarkups: [Markup] = []

    func accept<V>(_ visitor: V) -> V.Result where V : MarkupVisitor {
        return visitor.visit(self)
    }
}

//----

public struct ExtendHTMLTagStyleAttribute: HTMLTagStyleAttribute {
    public let styleName: String
    public let render: ((String) -> (MarkupStyle?)) // Dynamic use of closure to change MarkupStyle
    
    public init(styleName: String, render: @escaping ((String) -> (MarkupStyle?))) {
        self.styleName = styleName
        self.render = render
    }
    
    public func accept<V>(_ visitor: V) -> V.Result where V : HTMLTagStyleAttributeVisitor {
        return visitor.visit(self)
    }
}

ZHTMLParserBuilder

Finally, we employ the Builder Pattern to allow external modules to swiftly construct the necessary objects for ZMarkupParser and handle Access Level Control.

public final class ZHTMLParserBuilder {
    
    private(set) var htmlTags: [HTMLTag] = []
    private(set) var styleAttributes: [HTMLTagStyleAttribute] = []
    private(set) var rootStyle: MarkupStyle?
    private(set) var policy: MarkupStylePolicy = .respectMarkupStyleFromCode
    
    public init() {
        
    }
    
    public static func initWithDefault() -> Self {
        var builder = Self.init()
        for htmlTagName in ZHTMLParserBuilder.htmlTagNames {
            builder = builder.add(htmlTagName)
        }
        for styleAttribute in ZHTMLParserBuilder.styleAttributes {
            builder = builder.add(styleAttribute)
        }
        return builder
    }
    
    public func set(_ htmlTagName: HTMLTagName, withCustomStyle markupStyle: MarkupStyle?) -> Self {
        return self.add(htmlTagName, withCustomStyle: markupStyle)
    }
    
    public func add(_ htmlTagName: HTMLTagName, withCustomStyle markupStyle: MarkupStyle? = nil) -> Self {
        // Only one instance of the same tagName can exist
        htmlTags.removeAll { htmlTag in
            return htmlTag.tagName.string == htmlTagName.string
        }
        
        htmlTags.append(HTMLTag(tagName: htmlTagName, customStyle: markupStyle))
        
        return self
    }
    
    public func add(_ styleAttribute: HTMLTagStyleAttribute) -> Self {
        styleAttributes.removeAll { thisStyleAttribute in
            return thisStyleAttribute.styleName == styleAttribute.styleName
        }
        
        styleAttributes.append(styleAttribute)
        
        return self
    }
    
    public func set(rootStyle: MarkupStyle) -> Self {
        self.rootStyle = rootStyle
        return self
    }
    
    public func set(policy: MarkupStylePolicy) -> Self {
        self.policy = policy
        return self
    }
    
    public func build() -> ZHTMLParser {
        // ZHTMLParser init is only accessible internally, and external entities cannot initialize it directly.
        // It can only be initialized through ZHTMLParserBuilder init.
        return ZHTMLParser(htmlTags: htmlTags, styleAttributes: styleAttributes, policy: policy, rootStyle: rootStyle)
    }
}

Corresponding implementation for ZHTMLParserBuilder.swift in the source code.

The initWithDefault function registers all implemented HTML tag names and style attributes by default.

public extension ZHTMLParserBuilder {
    static var htmlTagNames: [HTMLTagName] {
        return [
            A_HTMLTagName(),
            B_HTMLTagName(),
            BR_HTMLTagName(),
            DIV_HTMLTagName(),
            HR_HTMLTagName(),
            I_HTMLTagName(),
            LI_HTMLTagName(),
            OL_HTMLTagName(),
            P_HTMLTagName(),
            SPAN_HTMLTagName(),
            STRONG_HTMLTagName(),
            U_HTMLTagName(),
            UL_HTMLTagName(),
            DEL_HTMLTagName(),
            TR_HTMLTagName(),
            TD_HTMLTagName(),
            TH_HTMLTagName(),
            TABLE_HTMLTagName(),
            IMG_HTMLTagName(handler: nil),
            // ...
        ]
    }
}

public extension ZHTMLParserBuilder {
    static var styleAttributes: [HTMLTagStyleAttribute] {
        return [
            ColorHTMLTagStyleAttribute(),
            BackgroundColorHTMLTagStyleAttribute(),
            FontSizeHTMLTagStyleAttribute(),
            FontWeightHTMLTagStyleAttribute(),
            LineHeightHTMLTagStyleAttribute(),
            WordSpacingHTMLTagStyleAttribute(),
            // ...
        ]
    }
}

The initializer of ZHTMLParser is internal, so it cannot be called from outside the module; a ZHTMLParser can only be created through ZHTMLParserBuilder.
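
As a rough, self-contained picture of this arrangement, the sketch below reduces the builder to two settings. MiniParser and MiniParserBuilder are hypothetical types, and fileprivate stands in for the library's internal initializer because the example lives in a single file.

```swift
// Hypothetical product type; in the library this role is played by ZHTMLParser.
struct MiniParser {
    let tagNames: [String]
    let rootFontSize: Double?

    // fileprivate stands in for the library's internal initializer in this single-file sketch.
    fileprivate init(tagNames: [String], rootFontSize: Double?) {
        self.tagNames = tagNames
        self.rootFontSize = rootFontSize
    }
}

final class MiniParserBuilder {
    private(set) var tagNames: [String] = []
    private(set) var rootFontSize: Double?

    // Only one entry per tag name: a later add replaces the earlier one.
    func add(_ tagName: String) -> Self {
        tagNames.removeAll { $0 == tagName }
        tagNames.append(tagName)
        return self
    }

    func set(rootFontSize: Double) -> Self {
        self.rootFontSize = rootFontSize
        return self
    }

    func build() -> MiniParser {
        return MiniParser(tagNames: tagNames, rootFontSize: rootFontSize)
    }
}

// Every setter returns the builder itself, so the configuration reads as one chain.
let parser = MiniParserBuilder()
    .add("a")
    .add("b")
    .add("a")
    .set(rootFontSize: 13)
    .build()

print(parser.tagNames, parser.rootFontSize as Any)
```

Adding "a" a second time moves it to the end instead of duplicating it, which mirrors the removeAll-then-append step in the add methods of the listing.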

ZHTMLParser encapsulates the operations for rendering, selecting, and stripping:

public final class ZHTMLParser: ZMarkupParser {
    let htmlTags: [HTMLTag]
    let styleAttributes: [HTMLTagStyleAttribute]
    let rootStyle: MarkupStyle?

    internal init(...) {
    }
    
    // Retrieves link style attributes
    public var linkTextAttributes: [NSAttributedString.Key: Any] {
        // ...
    }
    
    public func selector(_ string: String) -> HTMLSelector {
        // ...
    }
    
    public func selector(_ attributedString: NSAttributedString) -> HTMLSelector {
        // ...
    }
    
    public func render(_ string: String) -> NSAttributedString {
        // ...
    }
    
    // Allows rendering NSAttributedString within a node using the HTMLSelector result
    public func render(_ selector: HTMLSelector) -> NSAttributedString {
        // ...
    }
    
    public func render(_ attributedString: NSAttributedString) -> NSAttributedString {
        // ...
    }
    
    public func stripper(_ string: String) -> String {
        // ...
    }
    
    public func stripper(_ attributedString: NSAttributedString) -> NSAttributedString {
        // ...
    }
    
    // ...
}

This corresponds to the implementation in the ZHTMLParser.swift source code.

UIKit Issue

When using NSAttributedString, the most common scenario is to display it in a UITextView. However, there are some considerations to be aware of:

  • The link style inside a UITextView is uniformly determined by the linkTextAttributes property, and it won’t take into account the settings in NSAttributedString.Key. Moreover, individual link styles cannot be set separately. This is why we have the ZMarkupParser.linkTextAttributes available.
  • As for UILabel, there is currently no direct way to change the link style. Also, since UILabel does not have TextStorage, NSTextAttachment images need separate handling if you want to include them.

To cover both cases, we extend UITextView and UILabel:

public extension UITextView {
    func setHtmlString(_ string: String, with parser: ZHTMLParser) {
        self.setHtmlString(NSAttributedString(string: string), with: parser)
    }
    
    func setHtmlString(_ string: NSAttributedString, with parser: ZHTMLParser) {
        self.attributedText = parser.render(string)
        self.linkTextAttributes = parser.linkTextAttributes
    }
}

public extension UILabel {
    func setHtmlString(_ string: String, with parser: ZHTMLParser) {
        self.setHtmlString(NSAttributedString(string: string), with: parser)
    }
    
    func setHtmlString(_ string: NSAttributedString, with parser: ZHTMLParser) {
        let attributedString = parser.render(string)
        attributedString.enumerateAttribute(NSAttributedString.Key.attachment, in: NSMakeRange(0, attributedString.string.utf16.count), options: []) { (value, effectiveRange, _) in
            guard let attachment = value as? ZNSTextAttachment else {
                return
            }
            
            attachment.register(self)
        }
        
        self.attributedText = attributedString
    }
}

With these extensions added to UIKit, external users can simply call setHtmlString() to accomplish the binding without worrying about these details.

Handling Complex Rendering - Item Lists

Here, we document the implementation of item lists.

Using <ol> / <ul> in HTML to represent item lists:

<ul>
    <li>ItemA</li>
    <li>ItemB</li>
    <li>ItemC</li>
    <!-- ... -->
</ul>

Using the same parsing method mentioned earlier, we can obtain the other list items in visit(_ markup: ListItemMarkup) and know the current list index (thanks to the conversion to AST).

func visit(_ markup: ListItemMarkup) -> Result {
  let siblingListItems = markup.parentMarkup?.childMarkups.filter({ $0 is ListItemMarkup }) ?? []
  let position = (siblingListItems.firstIndex(where: { $0 === markup }) ?? 0)
  // ...
}

NSParagraphStyle has an NSTextList object that can be used to display list items, but the width of the blank after the item symbol cannot be customized (personally, I find the default too wide). In addition, any space between the item symbol and the string allows a line break to occur there, resulting in a strange display, as shown below:

Line Break Issue

Setting headIndent, firstLineHeadIndent, and NSTextTab might give better results, but in our tests it still did not render perfectly for longer strings or varying font sizes.

For now, we have reached an acceptable result by manually composing the item list strings and inserting them before the content.

We only utilize NSTextList.MarkerFormat to generate item list symbols, rather than directly using NSTextList.

Supported list item symbols can be found here: MarkupStyleList.swift
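
The idea of composing the item symbols by hand can be sketched with plain strings. This is an assumption-laden illustration, not the library's implementation: ListKind and marker(for:position:) are hypothetical, only two marker styles are covered, and the real code derives the symbols from NSTextList.MarkerFormat.

```swift
// Hypothetical list kinds; the library supports more marker formats than these two.
enum ListKind {
    case ordered
    case unordered
}

// Build the visible marker for the item at a zero-based position.
func marker(for kind: ListKind, position: Int) -> String {
    switch kind {
    case .ordered:
        return "\(position + 1). "
    case .unordered:
        return "\u{2022} "
    }
}

let items = ["ItemA", "ItemB", "ItemC"]

// Insert the marker before each item's content, one item per line.
let rendered = items.enumerated()
    .map { marker(for: .ordered, position: $0.offset) + $0.element }
    .joined(separator: "\n")

print(rendered)
```

Because the marker is part of the string itself, its spacing is fully under our control, at the cost of handling wrapped lines ourselves.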

The final display result: ( <ol><li> )

Final Display

Handling Complex Rendering - Tables

Similar to item lists, but this time for tables.

Using <table> in HTML to represent a table, <tr> for table rows, and <td>/<th> for table cells:

<table>
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
</table>

Testing with the native NSAttributedString.DocumentType.html shows that it renders HTML tables through NSTextBlock, which is a public AppKit class on macOS but a private API on iOS. This is how it manages to display table styles and contents accurately.

However, we cannot rely on a private API in our own code 🥲

func visit(_ markup: TableColumnMarkup) -> Result {
    let attributedString = collectAttributedString(markup)
    let siblingColumns = markup.parentMarkup?.childMarkups.filter({ $0 is TableColumnMarkup }) ?? []
    let position = (siblingColumns.firstIndex(where: { $0 === markup }) ?? 0)
    
    // Use the column width specified externally, if there is one
    var maxLength: Int? = markup.fixedMaxLength
    if maxLength == nil {
        // Otherwise, use the length of the same column in the first row as the maximum length
        if let tableRowMarkup = markup.parentMarkup as? TableRowMarkup,
           let firstTableRow = tableRowMarkup.parentMarkup?.childMarkups.first(where: { $0 is TableRowMarkup }) as? TableRowMarkup {
            let firstTableRowColumns = firstTableRow.childMarkups.filter({ $0 is TableColumnMarkup })
            if firstTableRowColumns.indices.contains(position) {
                let firstTableRowColumnAttributedString = collectAttributedString(firstTableRowColumns[position])
                let length = firstTableRowColumnAttributedString.string.utf16.count
                maxLength = length
            }
        }
    }
    
    if let maxLength = maxLength {
        // Truncate the field if it exceeds maxLength
        if attributedString.string.utf16.count > maxLength {
            attributedString.mutableString.setString(String(attributedString.string.prefix(maxLength)) + "...")
        } else {
            attributedString.mutableString.setString(attributedString.string.padding(toLength: maxLength, withPad: " ", startingAt: 0))
        }
    }
    
    if position < siblingColumns.count - 1 {
        // Add whitespace as spacing, external spacing width can be specified in number of whitespace characters
        attributedString.append(makeString(in: markup, string: String(repeating: " ", count: markup.spacing)))
    }
    
    return attributedString
}

func visit(_ markup: TableRowMarkup) -> Result {
    let attributedString = collectAttributedString(markup)
    attributedString.append(makeBreakLine(in: markup)) // Add line break, please refer to Source Code for details
    return attributedString
}

func visit(_ markup: TableMarkup) -> Result {
    let attributedString = collectAttributedString(markup)
    attributedString.append(makeBreakLine(in: markup)) // Add line break, please refer to Source Code for details
    attributedString.insert(makeBreakLine(in: markup), at: 0) // Add line break, please refer to Source Code for details
    return attributedString
}

**Final rendering effect as shown in the figure below:**

![Rendered Table](/assets/2724f02f6e7/1*Dft7H2BbeyWIO-dH4QpuSw.png)

The implementation is not perfect, but it is acceptable.
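
The padding and truncation idea used in visit(_ markup: TableColumnMarkup) can be tried in isolation on plain strings. The following is a minimal sketch, not the library's code: fitColumn and renderTable are hypothetical names, widths are counted in Characters rather than UTF-16 units, and the first row is assumed to define the width of each column.

```swift
/// Pads `text` with spaces up to `width`, or truncates it and appends "..." when it is longer.
func fitColumn(_ text: String, width: Int) -> String {
    if text.count > width {
        return String(text.prefix(width)) + "..."
    }
    return text + String(repeating: " ", count: width - text.count)
}

/// Renders rows as lines of text; each column takes its width from the first row.
func renderTable(_ rows: [[String]], spacing: Int = 2) -> String {
    guard let header = rows.first else { return "" }
    let widths = header.map { $0.count }
    let gap = String(repeating: " ", count: spacing)
    return rows.map { row in
        row.enumerated().map { index, cell in
            // Cells beyond the header's column count are left untouched
            index < widths.count ? fitColumn(cell, width: widths[index]) : cell
        }.joined(separator: gap)
    }.joined(separator: "\n")
}

let table = renderTable([
    ["Company", "Country"],
    ["Alfreds Futterkiste", "Germany"],
], spacing: 1)
// table == "Company Country\nAlfreds... Germany"
```

Note that this alignment only holds visually with a monospaced font, which is one reason the approach is acceptable rather than perfect.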

#### Complex Rendering Item — Image

Now, let's talk about the ultimate challenge - loading remote images into NSAttributedString.

**In HTML, use `<img>` to represent an image:**
```xml
<img src="https://user-images.githubusercontent.com/33706588/219608966-20e0c017-d05c-433a-9a52-091bc0cfd403.jpg" width="300" height="125"/>
```

You can specify the desired display size using the width / height HTML attributes.

Displaying images in NSAttributedString is much more complicated than expected, and there is no perfect solution yet. I had already run into several pitfalls when implementing text wrapping around images in UITextView; this time I researched the topic again, but I still haven't found a perfect solution.

For now, let’s ignore the issue of NSTextAttachment not being reusable and not releasing memory. We’ll focus on implementing a solution where we download the image from a remote source, place it in an NSTextAttachment, and then add it to NSAttributedString, with automatic content updates.

I have separated this functionality into a smaller project, so it can be optimized and reused in other projects:

ZNSTextAttachment on GitHub

The main idea is inspired by the series of articles Asynchronous NSTextAttachments. However, I replaced the final content update part (to display the downloaded image properly), and I added a Delegate/DataSource for external extensions.

Workflow and Relationships as shown in the image above

  • Declare the ZNSTextAttachmentable object, encapsulating the NSTextStorage object (built-in with UITextView) and UILabel itself (UILabel does not have NSTextStorage).
    • The replace(attachment: ZNSTextAttachment, to: ZResizableNSTextAttachment) function is used to implement replacing the attributedString within a specific NSRange.
  • The process involves wrapping the imageURL, PlaceholderImage, and desired display size in a ZNSTextAttachment, and initially displaying the image using a placeholder.
  • When the system needs to display the image on the screen, it will call the image(forBounds… method, and we start downloading the image data.
  • The DataSource is used externally to decide how to download or implement the Image Cache Policy. By default, URLSession is used to request the image data.
  • Once the download is complete, a new ZResizableNSTextAttachment is created, and the logic to customize the image size is implemented in attachmentBounds(for….
  • The replace(attachment: ZNSTextAttachment, to: ZResizableNSTextAttachment) method is called to replace the ZNSTextAttachment with the ZResizableNSTextAttachment.
  • A didLoad Delegate notification is sent out, allowing external connections if needed.
  • Completion.

For detailed code, please refer to the Source Code repository.
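
As an illustration of the sizing step in the workflow above, here is a minimal sketch of how a display size could be derived from the downloaded image size and the optional width / height attributes. Size and displaySize are hypothetical names, and keeping the aspect ratio when only one dimension is given is my assumption for the sketch, not necessarily what attachmentBounds(for… does in the repository.

```swift
/// Plain value type used instead of CGSize to keep the sketch dependency-free.
struct Size: Equatable {
    var width: Double
    var height: Double
}

/// Assumption: when only one dimension is requested, the other follows the image's aspect ratio.
func displaySize(image: Size, requestedWidth: Double?, requestedHeight: Double?) -> Size {
    guard image.width > 0, image.height > 0 else { return image }
    switch (requestedWidth, requestedHeight) {
    case let (width?, height?):
        return Size(width: width, height: height)
    case let (width?, nil):
        return Size(width: width, height: width * image.height / image.width)
    case let (nil, height?):
        return Size(width: height * image.width / image.height, height: height)
    case (nil, nil):
        return image
    }
}

let fitted = displaySize(image: Size(width: 600, height: 250), requestedWidth: 300, requestedHeight: nil)
// fitted == Size(width: 300, height: 125)
```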

The reason for not refreshing the UI with NSLayoutManager.invalidateLayout(forCharacterRange: range, actualCharacterRange: nil) and NSLayoutManager.invalidateDisplay(forCharacterRange: range) is that the UI did not update correctly with them. Since we already know the specific range, we can directly replace that range of the NSAttributedString to ensure the UI updates correctly.
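
The "find the range, then replace it" step can be sketched with Foundation alone. This is a simplified stand-in rather than the ZNSTextAttachment code: placeholderKey and replacePlaceholder are hypothetical names, and a custom string attribute plays the role that the NSTextAttachment under the .attachment key plays in a real NSTextStorage.

```swift
import Foundation

// Hypothetical attribute key standing in for NSAttributedString.Key.attachment
let placeholderKey = NSAttributedString.Key("sketch.placeholder")

/// Finds the range tagged with `id` and replaces it; returns false if no such range exists.
func replacePlaceholder(id: String,
                        in text: NSMutableAttributedString,
                        with replacement: NSAttributedString) -> Bool {
    var found: NSRange?
    let fullRange = NSRange(location: 0, length: text.length)
    text.enumerateAttribute(placeholderKey, in: fullRange, options: []) { value, range, stop in
        if let value = value as? String, value == id {
            found = range
            stop.pointee = true   // stop at the first matching placeholder
        }
    }
    guard let range = found else { return false }
    text.replaceCharacters(in: range, with: replacement)
    return true
}

let text = NSMutableAttributedString(string: "Hello ")
text.append(NSAttributedString(string: "\u{FFFC}", attributes: [placeholderKey: "img-1"]))
text.append(NSAttributedString(string: " world"))
let replaced = replacePlaceholder(id: "img-1", in: text, with: NSAttributedString(string: "[image]"))
// text.string is now "Hello [image] world"
```

In a UITextView the same replacement performed on its NSTextStorage triggers a layout pass for the edited range, which is why no explicit invalidation call is needed.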

The final display result is as follows:

<span style="color:red">こんにちは</span>こんにちはこんにちは <br />
<img src="https://user-images.githubusercontent.com/33706588/219608966-20e0c017-d05c-433a-9a52-091bc0cfd403.jpg"/>

![](/assets/2724f02f6e7/1*bl65v-SVOK3H9ajR-Ksg6w.png)

Testing & Continuous Integration

For this project, in addition to writing Unit Tests for individual testing, Snapshot Tests were established to perform integration testing for an overall assessment of NSAttributedString.

The main functional logic has UnitTests, and combined with integration testing, the final Test Coverage is approximately 85%.

ZMarkupParser — codecov

Snapshot Test

Import the SnapshotTesting framework directly and use it:

import SnapshotTesting
// ...
func testShouldKeepNSAttributedString() {
  let parser = ZHTMLParserBuilder.initWithDefault().build()
  let textView = UITextView()
  textView.frame.size.width = 390
  textView.isScrollEnabled = false
  textView.backgroundColor = .white
  textView.setHtmlString("html string...", with: parser)
  textView.layoutIfNeeded()
  assertSnapshot(matching: textView, as: .image, record: false)
}
// ...

![](/assets/2724f02f6e7/1*hLPeaOTOviA0jTPNOPu1hg.png)

Directly comparing the final result to the expected one ensures that the integration is functioning without any abnormalities.

Codecov Test Coverage

Test coverage is evaluated by integrating with Codecov.io (free for public repositories). Simply install the Codecov GitHub App and configure it.

After setting up the connection between Codecov and the GitHub repository, you can also add a codecov.yml file in the root directory of the project.

comment:                  # this is a top-level key
  layout: "reach, diff, flags, files"
  behavior: default
  require_changes: false  # if true: only post the comment if coverage changes
  require_base: no        # [yes :: must have a base report to post]
  require_head: yes       # [yes :: must have a head report to post]

With this configuration, after each PR is created, the result of the CI run is automatically posted as a comment on the PR.

![](/assets/2724f02f6e7/1*AcKpF4dijglahV-iVYLvvA.png)

Continuous Integration

GitHub Actions CI integration: ci.yml

name: CI

on:
  workflow_dispatch:
  pull_request:
    types: [opened, reopened]
  push:
    branches:
    - main

jobs:
  build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v3
      - name: spm build and test
        run: |
          set -o pipefail
          xcodebuild test -workspace ZMarkupParser.xcworkspace -testPlan ZMarkupParser -scheme ZMarkupParser -enableCodeCoverage YES -resultBundlePath './scripts/TestResult.xcresult' -destination 'platform=iOS Simulator,name=iPhone 14,OS=16.1' build test | xcpretty
      - name: Codecov
        uses: codecov/codecov-action@v3.1.1
        with:
          xcode: true
          xcode_archive_path: './scripts/TestResult.xcresult'

This workflow builds the project and runs the tests when a PR is opened or reopened, when commits are pushed to the main branch, or when it is triggered manually (workflow_dispatch). The test coverage report is then uploaded to Codecov.

Regex

When it comes to regular expressions, every time I use them, I improve my skills. In this project, I didn’t use them extensively, but I wanted to extract paired HTML tags using regex, so I researched how to write the expression for that purpose.

Here are some cheat sheet notes on what I learned this time:

  • The ?: construct allows () to match and group the result but does not capture and return it. e.g., (?:https?:\/\/)?(?:www\.)?example\.com will return the entire URL https://www.example.com instead of just https:// and www.
  • The .+? construct performs a non-greedy match (finds the closest match and returns it). e.g., <.+?> will return <a> and </a> instead of the entire string <a>test</a>.
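  • The (?=XYZ) construct is a lookahead: it asserts that the string XYZ comes next without consuming it, so the pattern before it can match everything up to XYZ. Note that [^XYZ] looks similar but is a character class that matches any single character other than X, Y, or Z. e.g., (?:__)(.+?(?=__))(?:__) applied to __test__ will capture test.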
  • The ?R construct recursively searches for values with the same rule. e.g., \((?:[^()]|((?R)))+\) will match (simple), (and(nested)), and (nested) in (simple) (and(nested)).

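Note: as far as I know, the recursive ?R construct is the one that NSRegularExpression in Swift does not currently support; non-capturing groups, non-greedy quantifiers, and lookahead are all available.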
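
To be precise about what can be tried from Swift: to my knowledge, NSRegularExpression handles the first three constructs in the list, while ?R is rejected when the pattern is compiled. Below is a minimal sketch that runs the cheat sheet examples; the helper name matches(of:in:group:) is my own and not part of Foundation.

```swift
import Foundation

/// Returns every match of `pattern` in `text`.
/// `group` selects the capture group to return (0 means the whole match).
func matches(of pattern: String, in text: String, group: Int = 0) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { result in
        Range(result.range(at: group), in: text).map { String(text[$0]) }
    }
}

// Non-greedy match: each tag is returned separately
let tags = matches(of: "<.+?>", in: "<a>test</a>")
// tags == ["<a>", "</a>"]

// Non-capturing groups: the whole URL is the match, nothing is captured separately
let url = matches(of: #"(?:https?://)?(?:www\.)?example\.com"#, in: "https://www.example.com")
// url == ["https://www.example.com"]

// Lookahead: group 1 holds the text between the double underscores
let inner = matches(of: "(?:__)(.+?(?=__))(?:__)", in: "__test__", group: 1)
// inner == ["test"]
```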

Swift Package Manager & CocoaPods

This was my first time developing with SPM and CocoaPods, and it was quite interesting. SPM is genuinely convenient. However, I ran into an issue when two projects depended on the same package: opening both projects at the same time caused one of them to fail to build because it could not find the package.

I uploaded ZMarkupParser to CocoaPods, but I haven’t tested whether it works properly since I developed it with SPM 😝.

ChatGPT

Based on my experience using ChatGPT in development, I found it most useful for assisting in proofreading Readme files. Regarding development questions, I didn’t always get the most accurate answers, especially when asking mid-senior level questions. In those cases, ChatGPT couldn’t provide a definite or correct answer (I encountered this when asking about certain regex rules).

Moreover, I wouldn’t rely on ChatGPT to write complex code. While it can help with simple code generation for objects, it’s not capable of completing an entire tool architecture. (At least, that’s how it is currently. Copilot might be more helpful for writing code in the future)

However, ChatGPT can provide guidance on certain knowledge gaps, giving us a general direction on how to approach certain tasks. Sometimes, our understanding might be too limited to effectively search for the right solution on Google, and that’s when ChatGPT becomes helpful.

Declaration

After more than three months of research and development, I am quite exhausted. Nevertheless, I want to emphasize that this project represents the feasible results I obtained through my research. It may not be the optimal solution, and there may still be room for improvement. This project is more like a starting point, and I welcome contributions to achieve the perfect solution for Markup Language to NSAttributedString conversion. Your contributions are greatly appreciated, as many aspects need the power of the community to improve.

Contributing

ZMarkupParser

At this moment (2023/03/12), I can think of several areas for improvement, and I will document them in the repository later:

  1. Performance and algorithm optimization: Although it is faster and more stable than the native NSAttributedString.DocumentType.html, there is still room for improvement. I believe the performance is not as good as XMLParser. I hope that one day, we can achieve the same performance while maintaining customization and automatic error correction.
  2. Support for more HTML tags and style attribute conversions.
  3. Further optimization of ZNSTextAttachment to implement reuse and memory release, which may require studying CoreText.
  4. Support for Markdown parsing: As the underlying abstraction is not limited to HTML, it should be possible to create a front-end conversion from Markdown to Markup objects. Therefore, I named it ZMarkupParser instead of ZHTMLParser, hoping that one day it can also support Markdown to NSAttributedString conversion.
  5. Support for Any to Any conversion, e.g., HTML to Markdown, Markdown to HTML. Since we have the original AST tree (Markup objects), it is feasible to implement conversion between any Markup formats.
  6. Implement CSS !important functionality and enhance the inheritance strategy of MarkupStyle.
  7. Strengthen HTML selector functionality, which currently only provides basic filtering.
  8. And many more improvements. Please feel free to open issues.
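
To illustrate why items 4 and 5 look feasible, here is a minimal sketch of one tree rendered into two output formats. It uses a deliberately simplified, hypothetical Node enum; the real Markup objects in ZMarkupParser are reference types with parent links and are traversed by visitors, so this only shows the general idea.

```swift
/// Hypothetical, simplified markup tree used only for this sketch.
enum Node {
    case text(String)
    case bold([Node])
    case link(href: String, [Node])
}

/// Renders the tree back to HTML.
func renderHTML(_ nodes: [Node]) -> String {
    nodes.map { node -> String in
        switch node {
        case .text(let string):
            return string
        case .bold(let children):
            return "<b>" + renderHTML(children) + "</b>"
        case .link(let href, let children):
            return "<a href=\"\(href)\">" + renderHTML(children) + "</a>"
        }
    }.joined()
}

/// Renders the same tree to Markdown.
func renderMarkdown(_ nodes: [Node]) -> String {
    nodes.map { node -> String in
        switch node {
        case .text(let string):
            return string
        case .bold(let children):
            return "**" + renderMarkdown(children) + "**"
        case .link(let href, let children):
            return "[" + renderMarkdown(children) + "](" + href + ")"
        }
    }.joined()
}

let tree: [Node] = [.bold([.text("Test"), .link(href: "https://example.com", [.text("Link")])])]
let html = renderHTML(tree)
// html == "<b>Test<a href=\"https://example.com\">Link</a></b>"
let markdown = renderMarkdown(tree)
// markdown == "**Test[Link](https://example.com)**"
```

Each output format only needs its own renderer over the shared tree, and each input format only needs its own parser that produces it.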

If you find yourself with some spare time and want to support this project without coding, giving it a ⭐ will help more people discover the repo, and maybe some GitHub wizards will contribute!

Summary

ZMarkupParser

These are all the technical details and thoughts behind my development of ZMarkupParser. It has taken me nearly three months of after-work and holiday time, countless research and experimentation, and finally, writing tests, increasing test coverage, and setting up CI to achieve a somewhat presentable result. I hope this tool can solve similar problems for others, and I hope we can all work together to make this tool even better.

pinkoi.com

Currently, it is used in the iOS app of our company, pinkoi.com, and no issues have been found. 😄

For any questions or suggestions, please feel free to contact me.


This post is licensed under CC BY 4.0 by the author.
