
iOS Vision Framework x WWDC 24: Discover Swift Enhancements in the Vision Framework Session

A review of the Vision framework features & hands-on with the new Swift API in iOS 18

ℹ️ℹ️ℹ️ The following content is translated by OpenAI.

Click here to view the original Chinese version. | 點此查看本文中文版


Photo by [BoliviaInteligente](https://unsplash.com/@boliviainteligente?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash){:target="_blank"}


Topic

The relationship between Vision Pro and hot dogs is as unrelated as it gets.


Vision Framework

The Vision framework is Apple’s integrated, machine-learning-powered image recognition framework, which lets developers implement common image recognition features easily and quickly. Launched with iOS 11.0 in 2017 (the iPhone 8 era), it has been continuously iterated and optimized, including tighter integration with Swift Concurrency for better performance. Starting with iOS 18.0, it introduces a brand-new Swift Vision API that takes full advantage of Swift Concurrency.

Features of the Vision Framework

  • Built-in methods for various image recognition and dynamic tracking (31 methods available as of iOS 18)
  • On-device processing using the phone’s chip, ensuring fast and secure recognition without relying on cloud services
  • Simple and user-friendly API
  • Supported across all Apple platforms: iOS 11.0+, iPadOS 11.0+, Mac Catalyst 13.0+, macOS 10.13+, tvOS 11.0+, visionOS 1.0+
  • Released for several years (2017-present) with ongoing updates
  • Integrated Swift language features to enhance computational performance

I played around with this 6 years ago: An Introduction to Vision — Automatic Face Cropping for App Profile Pictures (Swift)

This time, I revisited it alongside the WWDC 24 Discover Swift enhancements in the Vision framework Session to explore the new Swift features again.

CoreML

Apple also has another framework called CoreML, an on-device machine learning framework that lets you train your own models (e.g., for recognizing objects or documents) and integrate them directly into your app. Interested developers can give it a try (e.g., Real-time Article Classification, Real-time Spam Detection …).
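
If you just want a feel for how a custom Core ML model plugs into Vision, here is a minimal sketch using the pre-iOS 18 API (the classic VNCoreMLRequest flow). `ArticleClassifier` is a hypothetical, Xcode-generated model class standing in for your own trained .mlmodel.

```swift
import UIKit
import Vision
import CoreML

// Minimal sketch: run a custom Core ML classifier through Vision (pre-iOS 18 API).
// "ArticleClassifier" is a hypothetical, Xcode-generated model class.
func classify(_ image: UIImage) throws {
    let coreMLModel = try ArticleClassifier(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel) { request, error in
        guard error == nil,
              let observations = request.results as? [VNClassificationObservation] else { return }
        // Print the top 3 labels with their confidence
        observations.prefix(3).forEach { print("\($0.identifier): \($0.confidence)") }
    }

    guard let cgImage = image.cgImage else { return }
    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}
```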

P.S.

Vision vs. VisionKit:

Vision is primarily used for image analysis tasks such as face recognition, barcode detection, and text recognition. It provides powerful APIs for processing and analyzing visual content in static images or videos.

VisionKit is specifically designed for tasks related to document scanning. It provides a scanner view controller that can be used to scan documents and generate high-quality PDFs or images.
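
For contrast, a minimal VisionKit sketch: presenting the built-in document scanner and reading back the scanned pages (the scanner UI lives in VisionKit, while the analysis requests discussed in this article live in Vision).

```swift
import UIKit
import VisionKit

// Minimal sketch: VisionKit's document scanner returning scanned pages as images.
final class ScanViewController: UIViewController, VNDocumentCameraViewControllerDelegate {
    func presentScanner() {
        guard VNDocumentCameraViewController.isSupported else { return }
        let scanner = VNDocumentCameraViewController()
        scanner.delegate = self
        present(scanner, animated: true)
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        // Each page is returned as a processed UIImage, ready to save as a PDF or feed into Vision.
        for pageIndex in 0..<scan.pageCount {
            let pageImage = scan.imageOfPage(at: pageIndex)
            print("Scanned page \(pageIndex): \(pageImage.size)")
        }
        controller.dismiss(animated: true)
    }
}
```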

The Vision framework cannot run in the Simulator on Apple Silicon (M1) Macs; it can only be tested on physical devices. Running it in the Simulator throws a “Could not create Espresso context” error. I checked the official forum discussion but couldn’t find a solution.

Since I don’t have a physical iOS 18 device for testing, all execution results in this article are based on older code (pre-iOS 18). If any errors arise with the new code, please feel free to leave a comment.

WWDC 2024 — Discover Swift Enhancements in the Vision Framework

[Discover Swift enhancements in the Vision framework](https://developer.apple.com/videos/play/wwdc2024/10163/?time=45){:target="_blank"}


This article shares notes from the WWDC 24 session on Discover Swift enhancements in the Vision framework along with some personal experimental insights.

Introduction — Vision Framework Features

Face Recognition and Contour Detection

Text Recognition in Image Content

As of iOS 18, it supports 18 languages.

// List of supported languages
if #available(iOS 18.0, *) {
  print(RecognizeTextRequest().supportedRecognitionLanguages.map { "\($0.languageCode!)-\(($0.region?.identifier ?? $0.script?.identifier)!)" })
} else {
  print(try! VNRecognizeTextRequest().supportedRecognitionLanguages())
}

// The actual available recognition languages are as follows:
// The output from iOS 18 shows the following results:
// ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR", "zh-Hans", "zh-Hant", "yue-Hans", "yue-Hant", "ko-KR", "ja-JP", "ru-RU", "uk-UA", "th-TH", "vi-VT", "ar-SA", "ars-SA"]
// I did not see the Swedish language mentioned at WWDC; it's unclear if it hasn't been released yet or if it's related to device region and language settings.

Dynamic Motion Capture

  • Enables dynamic capture of people and objects
  • Gesture recognition allows for air signature functionality (see the sketch after this list)
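
The session doesn’t show code for this, but as a rough sketch of the air-signature idea (my own assumption, using the pre-iOS 18 hand-pose API since I can’t test on iOS 18): track the index fingertip frame by frame and collect its positions as a drawn path.

```swift
import Vision
import UIKit

// Rough sketch (not from the session): collect the index fingertip position per frame
// to build up an "air signature" path, using the pre-iOS 18 hand-pose API.
var signaturePoints: [CGPoint] = []

func appendFingertipPoint(from cgImage: CGImage) throws {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1

    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])

    guard let observation = request.results?.first else { return }
    let indexTip = try observation.recognizedPoint(.indexTip)
    guard indexTip.confidence > 0.3 else { return }

    // Vision points are normalized (0...1) with the origin at the lower-left corner,
    // so flip the y-axis before drawing in UIKit coordinates.
    signaturePoints.append(CGPoint(x: indexTip.location.x, y: 1 - indexTip.location.y))
}
```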

What’s New in Vision? (iOS 18) — Image Scoring Feature (Quality, Memorability)

  • Can calculate a score for input images, making it easier to pick out high-quality photos
  • The scoring covers multiple dimensions: not just image quality, but also lighting, angle, subject matter, and whether the shot has a memorable point

In the WWDC session, three example images (of equal technical quality) were used to illustrate the scoring:

  • High-scoring image: good composition, lighting, and memorable points
  • Low-scoring image: lacks a subject, appears to be a casual or accidental shot
  • Utility image: technically well-taken but lacks memorable points, like stock images.

iOS ≥ 18 New API: CalculateImageAestheticsScoresRequest

let request = CalculateImageAestheticsScoresRequest()
let result = try await request.perform(on: URL(string: "https://zhgchg.li/assets/cb65fd5ab770/1*yL3vI1ADzwlovctW5WQgJw.jpeg")!)

// Photo score
print(result.overallScore)

// Whether it is classified as a utility image
print(result.isUtility)

What’s New in Vision? (iOS 18) — Simultaneous Detection of Body and Gesture Poses

Previously, body poses and hand poses could only be detected with separate requests. This update lets developers detect body and hand poses at the same time, combining them into a single request and result, which makes it easier to build richer features.

iOS ≥ 18 New API: DetectHumanBodyPoseRequest

var request = DetectHumanBodyPoseRequest()
// Also detect hand poses
request.detectsHands = true

guard let bodyPose = try await request.perform(on: image).first else { return }

// Body Pose Joints
let bodyJoints = bodyPose.allJoints()
// Left Hand Pose Joints
let leftHandJoints = bodyPose.leftHand.allJoints()
// Right Hand Pose Joints
let rightHandJoints = bodyPose.rightHand.allJoints()

New Vision API

In this update, Apple provides a new Swift-native Vision API. Beyond covering the original functionality, the focus is on Swift 6 / Swift Concurrency support, offering better performance and a more Swift-like API style.

Get Started with Vision

The speaker revisited the basics of the Vision framework: Apple packages common image recognition tasks into 31 request types (as of iOS 18), each with a corresponding observation result object.

  1. Request: DetectFaceRectanglesRequest for face-area detection → Result: FaceObservation. The earlier article “An Introduction to Vision — Automatic Face Cropping for App Profile Pictures (Swift)” used this pair of request and observation.
  2. Request: RecognizeTextRequest for text recognition → Result: RecognizedTextObservation
  3. Request: GenerateObjectnessBasedSaliencyImageRequest for subject (salient object) detection → Result: SaliencyImageObservation
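
For pair 1, a minimal sketch of the request → perform → observation pattern with the new API might look like this (untested here, since it requires a physical iOS 18 device; the image URL is just a placeholder):

```swift
import Vision

// Minimal sketch of the request → perform → observation pattern with the new Swift API.
if #available(iOS 18.0, *) {
    Task {
        do {
            let imageURL = URL(string: "https://example.com/face.jpg")! // Placeholder image URL
            let request = DetectFaceRectanglesRequest()
            let observations: [FaceObservation] = try await request.perform(on: imageURL)
            observations.forEach { observation in
                // Normalized bounding box of each detected face
                print("Face at: \(observation.boundingBox.cgRect)")
            }
        } catch {
            print("Request failed: \(error)")
        }
    }
}
```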

All 31 Types of Requests:

VisionRequest.

Here’s the translated text in naturalistic English while preserving the original markdown image sources:

| Request | Purpose | Observation | Description |
| --- | --- | --- | --- |
| CalculateImageAestheticsScoresRequest | Calculate the aesthetics score of an image. | AestheticsObservation | Returns the aesthetics score of the image, considering factors like composition and color. |
| ClassifyImageRequest | Classify the content of an image. | ClassificationObservation | Returns classification labels and confidence levels for objects or scenes in the image. |
| CoreMLRequest | Analyze the image using a Core ML model. | CoreMLFeatureValueObservation | Generates observations based on the output of the Core ML model. |
| DetectAnimalBodyPoseRequest | Detect the pose of animals in the image. | RecognizedPointsObservation | Returns the skeletal points of the animal and their locations. |
| DetectBarcodesRequest | Detect barcodes in the image. | BarcodeObservation | Returns barcode data and types (e.g., QR code). |
| DetectContoursRequest | Detect contours in the image. | ContoursObservation | Returns the detected contour lines in the image. |
| DetectDocumentSegmentationRequest | Detect and segment documents in the image. | RectangleObservation | Returns the rectangular boundary positions of the document. |
| DetectFaceCaptureQualityRequest | Evaluate the quality of a face capture. | FaceObservation | Returns a quality assessment score for the face image. |
| DetectFaceLandmarksRequest | Detect facial landmarks. | FaceObservation | Returns detailed positions of facial landmarks (e.g., eyes, nose). |
| DetectFaceRectanglesRequest | Detect faces in the image. | FaceObservation | Returns the bounding box positions of the faces. |
| DetectHorizonRequest | Detect the horizon in the image. | HorizonObservation | Returns the angle and position of the horizon. |
| DetectHumanBodyPose3DRequest | Detect 3D human body poses in the image. | RecognizedPointsObservation | Returns 3D skeletal points of the human body and their spatial coordinates. |
| DetectHumanBodyPoseRequest | Detect human body poses in the image. | RecognizedPointsObservation | Returns skeletal points of the human body and their coordinates. |
| DetectHumanHandPoseRequest | Detect hand poses in the image. | RecognizedPointsObservation | Returns skeletal points of the hand and their locations. |
| DetectHumanRectanglesRequest | Detect humans in the image. | HumanObservation | Returns the bounding box positions of the human figures. |
| DetectRectanglesRequest | Detect rectangles in the image. | RectangleObservation | Returns the coordinates of the four vertices of the rectangle. |
| DetectTextRectanglesRequest | Detect text areas in the image. | TextObservation | Returns the positions and bounding boxes of the text areas. |
| DetectTrajectoriesRequest | Detect and analyze the motion trajectories of objects. | TrajectoryObservation | Returns the motion trajectory points and their time series. |
| GenerateAttentionBasedSaliencyImageRequest | Generate an attention-based saliency image. | SaliencyImageObservation | Returns a saliency map highlighting the most attention-grabbing areas in the image. |
| GenerateForegroundInstanceMaskRequest | Generate a foreground instance mask image. | InstanceMaskObservation | Returns the mask of the foreground object. |
| GenerateImageFeaturePrintRequest | Generate an image feature fingerprint for comparison. | FeaturePrintObservation | Returns the feature fingerprint data of the image for similarity comparison. |
| GenerateObjectnessBasedSaliencyImageRequest | Generate an objectness-based saliency image. | SaliencyImageObservation | Returns a saliency map highlighting salient areas based on objectness. |
| GeneratePersonInstanceMaskRequest | Generate a person instance mask image. | InstanceMaskObservation | Returns the mask of the person instance. |
| GeneratePersonSegmentationRequest | Generate a person segmentation image. | SegmentationObservation | Returns a binary image of the person segmentation. |
| RecognizeAnimalsRequest | Detect and identify animals in the image. | RecognizedObjectObservation | Returns the type of animal and its confidence level. |
| RecognizeTextRequest | Detect and recognize text in the image. | RecognizedTextObservation | Returns the detected text content and its area location. |
| TrackHomographicImageRegistrationRequest | Track homographic image registration. | ImageAlignmentObservation | Returns the homographic transformation matrix between images for alignment. |
| TrackObjectRequest | Track objects in the image. | DetectedObjectObservation | Returns the position and speed information of the object in the image. |
| TrackOpticalFlowRequest | Track optical flow in the image. | OpticalFlowObservation | Returns the optical flow vector field describing pixel movement. |
| TrackRectangleRequest | Track rectangles in the image. | RectangleObservation | Returns the position, size, and rotation angle of the rectangle in the image. |
| TrackTranslationalImageRegistrationRequest | Track translational image registration. | ImageAlignmentObservation | Returns the translational transformation matrix between images for alignment. |

  • Prefixing with VN indicates the old API syntax (for versions prior to iOS 18).

The speaker mentioned several commonly used requests, as follows.

ClassifyImageRequest

Recognize the input image and obtain classification labels and confidence levels.

[Travelogue] 2024 Second Visit to Kyushu 9-Day Free Trip, Entering via Busan → Hakata Cruise


if #available(iOS 18.0, *) {
    // New API using Swift features
    let request = ClassifyImageRequest()
    Task {
        do {
            let observations = try await request.perform(on: URL(string: "https://zhgchg.li/assets/cb65fd5ab770/1*yL3vI1ADzwlovctW5WQgJw.jpeg")!)
            observations.forEach {
                observation in
                print("\(observation.identifier): \(observation.confidence)")
            }
        }
        catch {
            print("Request failed: \(error)")
        }
    }
} else {
    // Old syntax
    let completionHandler: VNRequestCompletionHandler = {
        request, error in
        guard error == nil else {
            print("Request failed: \(String(describing: error))")
            return
        }
        guard let observations = request.results as? [VNClassificationObservation] else {
            return
        }
        observations.forEach {
            observation in
            print("\(observation.identifier): \(observation.confidence)")
        }
    }

    let request = VNClassifyImageRequest(completionHandler: completionHandler)
    DispatchQueue.global().async {
        let handler = VNImageRequestHandler(url: URL(string: "https://zhgchg.li/assets/cb65fd5ab770/1*3_jdrLurFuUfNdW4BJaRww.jpeg")!, options: [:])
        do {
            try handler.perform([request])
        }
        catch {
            print("Request failed: \(error)")
        }
    }
}

Analysis Results:

  outdoor: 0.75392926
  sky: 0.75392926
  blue_sky: 0.7519531
  machine: 0.6958008
  cloudy: 0.26538086
  structure: 0.15728651
  sign: 0.14224191
  fence: 0.118652344
  banner: 0.0793457
  material: 0.075975396
  plant: 0.054406323
  foliage: 0.05029297
  light: 0.048126098
  lamppost: 0.048095703
  billboards: 0.040039062
  art: 0.03977703
  branch: 0.03930664
  decoration: 0.036868922
  flag: 0.036865234
.... and more

RecognizeTextRequest

Recognize the text content in the image (a.k.a. image-to-text).

[[Travelogue] 2023 Tokyo 5-Day Free Trip](../9da2c51fa4f2/)


if #available(iOS 18.0, *) {
    // New API using Swift features
    var request = RecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.recognitionLanguages = [.init(identifier: "ja-JP"), .init(identifier: "en-US")] // Specify recognition language codes, e.g., Japanese and English
    Task {
        do {
            let observations = try await request.perform(on: URL(string: "https://zhgchg.li/assets/9da2c51fa4f2/1*fBbNbDepYioQ-3-0XUkF6Q.jpeg")!)
            observations.forEach {
                observation in
                let topCandidate = observation.topCandidates(1).first
                print(topCandidate?.string ?? "No text recognized")
            }
        }
        catch {
            print("Request failed: \(error)")
        }
    }
} else {
    // Old syntax
    let completionHandler: VNRequestCompletionHandler = {
        request, error in
        guard error == nil else {
            print("Request failed: \(String(describing: error))")
            return
        }
        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            return
        }
        observations.forEach {
            observation in
            let topCandidate = observation.topCandidates(1).first
            print(topCandidate?.string ?? "No text recognized")
        }
    }

    let request = VNRecognizeTextRequest(completionHandler: completionHandler)
    request.recognitionLevel = .accurate
    request.recognitionLanguages = ["ja-JP", "en-US"] // Specify language code, e.g., Traditional Chinese
    DispatchQueue.global().async {
        let handler = VNImageRequestHandler(url: URL(string: "https://zhgchg.li/assets/9da2c51fa4f2/1*fBbNbDepYioQ-3-0XUkF6Q.jpeg")!, options: [:])
        do {
            try handler.perform([request])
        }
        catch {
            print("Request failed: \(error)")
        }
    }
}

Analysis Results:

LE LABO 青山店
TEL:03-6419-7167
*Thank you for your purchase*
No: 21347
Date: 2023/06/10 14:14:57
Person in charge:
1690370
Register: 008A 1
Product Name
Tax-inclusive Price Quantity Total Price
Kaiak 10 EDP FB 15ML
J1P7010000S
16,800
16,800
Another 13 EDP FB 15ML
J1PJ010000S
10,700
10,700
Lip Balm 15ML
JOWC010000S
2,000
1
Total Amount
(Tax included)
CARD
2,000
3 items purchased
29,500
0
29,500
29,500

DetectBarcodesRequest

Detect barcodes and QR code data in the image.

Recommended by locals in Thailand: Goose Brand Cooling Balm


let filePath = Bundle.main.path(forResource: "IMG_6777", ofType: "png")! // Local test image
let fileURL = URL(filePath: filePath)
if #available(iOS 18.0, *) {
    // New API using Swift features
    let request = DetectBarcodesRequest()
    Task {
        do {
            let observations = try await request.perform(on: fileURL)
            observations.forEach {
                observation in
                print("Payload: \(observation.payloadString ?? "No payload")")
                print("Symbology: \(observation.symbology)")
            }
        }
        catch {
            print("Request failed: \(error)")
        }
    }
} else {
    // Old syntax
    let completionHandler: VNRequestCompletionHandler = {
        request, error in
        guard error == nil else {
            print("Request failed: \(String(describing: error))")
            return
        }
        guard let observations = request.results as? [VNBarcodeObservation] else {
            return
        }
        observations.forEach {
            observation in
            print("Payload: \(observation.payloadStringValue ?? "No payload")")
            print("Symbology: \(observation.symbology.rawValue)")
        }
    }

    let request = VNDetectBarcodesRequest(completionHandler: completionHandler)
    DispatchQueue.global().async {
        let handler = VNImageRequestHandler(url: fileURL, options: [:])
        do {
            try handler.perform([request])
        }
        catch {
            print("Request failed: \(error)")
        }
    }
}

Analysis Results:

Payload: 8859126000911
Symbology: VNBarcodeSymbologyEAN13
Payload: https://lin.ee/hGynbVM
Symbology: VNBarcodeSymbologyQR
Payload: http://www.hongthaipanich.com/
Symbology: VNBarcodeSymbologyQR
Payload: https://www.facebook.com/qr?id=100063856061714
Symbology: VNBarcodeSymbologyQR

RecognizeAnimalsRequest

Identify animals in the image along with their confidence levels.

[meme Source](https://www.redbubble.com/i/canvas-print/Funny-AI-Woman-yelling-at-a-cat-meme-design-Machine-learning-by-omolog/43039298.5Y5V7){:target="_blank"}


let filePath = Bundle.main.path(forResource: "IMG_5026", ofType: "png")! // Local test image
let fileURL = URL(filePath: filePath)
if #available(iOS 18.0, *) {
    // New API using Swift features
    let request = RecognizeAnimalsRequest()
    Task {
        do {
            let observations = try await request.perform(on: fileURL)
            observations.forEach { observation in
                let labels = observation.labels
                labels.forEach { label in
                    print("Detected animal: \(label.identifier) with confidence: \(label.confidence)")
                }
            }
        } catch {
            print("Request failed: \(error)")
        }
    }
} else {
    // Old method
    let completionHandler: VNRequestCompletionHandler = { request, error in
        guard error == nil else {
            print("Request failed: \(String(describing: error))")
            return
        }
        guard let observations = request.results as? [VNRecognizedObjectObservation] else {
            return
        }
        observations.forEach { observation in
            let labels = observation.labels
            labels.forEach { label in
                print("Detected animal: \(label.identifier) with confidence: \(label.confidence)")
            }
        }
    }

    let request = VNRecognizeAnimalsRequest(completionHandler: completionHandler)
    DispatchQueue.global().async {
        let handler = VNImageRequestHandler(url: fileURL, options: [:])
        do {
            try handler.perform([request])
        } catch {
            print("Request failed: \(error)")
        }
    }
}

Analysis Results:

Detected animal: Cat with confidence: 0.77245045

Others:

  • Detect humans in images: DetectHumanRectanglesRequest (see the sketch after this list)
  • Detect poses of humans and animals (both 3D and 2D): DetectAnimalBodyPoseRequest, DetectHumanBodyPose3DRequest, DetectHumanBodyPoseRequest, DetectHumanHandPoseRequest
  • Detect and track the motion trajectories of objects (in videos and animations): DetectTrajectoriesRequest, TrackObjectRequest, TrackRectangleRequest
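
For the first item, a quick sketch with the pre-iOS 18 API (matching the old-API style used elsewhere in this post); the iOS 18 version would follow the same pattern as the other new-API examples, just without the VN prefix and with async/await.

```swift
import Vision

// Quick sketch: detect people in an image and print their normalized bounding boxes (pre-iOS 18 API).
func detectHumans(in fileURL: URL) {
    let request = VNDetectHumanRectanglesRequest { request, error in
        guard error == nil,
              let observations = request.results as? [VNHumanObservation] else { return }
        observations.forEach { observation in
            // Normalized bounding box (origin at the lower-left corner)
            print("Human at: \(observation.boundingBox)")
        }
    }
    request.upperBodyOnly = false // Detect whole bodies, not just upper bodies (iOS 15+)

    DispatchQueue.global().async {
        let handler = VNImageRequestHandler(url: fileURL, options: [:])
        do {
            try handler.perform([request])
        } catch {
            print("Request failed: \(error)")
        }
    }
}
```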

iOS ≥ 18 Update Highlights:

VN*Request -> *Request (e.g. VNDetectBarcodesRequest -> DetectBarcodesRequest)
VN*Observation -> *Observation (e.g. VNRecognizedObjectObservation -> RecognizedObjectObservation)
VNRequestCompletionHandler -> async/await
VNImageRequestHandler.perform([VN*Request]) -> *Request.perform()

WWDC Example

The official WWDC video uses a supermarket product scanner as an example.

Most products have barcodes that can be scanned.

We can obtain the barcode’s location from observation.boundingBox; however, unlike the usual UIView coordinate system, the bounding box is normalized (values from 0 to 1) with its origin at the bottom-left corner.

let filePath = Bundle.main.path(forResource: "IMG_6785", ofType: "png")! // Local test image
let fileURL = URL(filePath: filePath)
if #available(iOS 18.0, *) {
    // New API using Swift features
    var request = DetectBarcodesRequest()
    request.symbologies = [.ean13] // Specify to scan only EAN13 Barcode for better performance
    Task {
        do {
            let observations = try await request.perform(on: fileURL)
            if let observation = observations.first {
                DispatchQueue.main.async {
                    self.infoLabel.text = observation.payloadString
                    // Color layer for marking
                    let colorLayer = CALayer()
                    // iOS >=18 new coordinate conversion API toImageCoordinates
                    // Not tested; actual calculations may require adjustments for ContentMode = AspectFit:
                    colorLayer.frame = observation.boundingBox.toImageCoordinates(self.baseImageView.frame.size, origin: .upperLeft)
                    colorLayer.backgroundColor = UIColor.red.withAlphaComponent(0.5).cgColor
                    self.baseImageView.layer.addSublayer(colorLayer)
                }
                print("BoundingBox: \(observation.boundingBox.cgRect)")
                print("Payload: \(observation.payloadString ?? "No payload")")
                print("Symbology: \(observation.symbology)")
            }
        } catch {
            print("Request failed: \(error)")
        }
    }
} else {
    // Old method
    let completionHandler: VNRequestCompletionHandler = { request, error in
        guard error == nil else {
            print("Request failed: \(String(describing: error))")
            return
        }
        guard let observations = request.results as? [VNBarcodeObservation] else {
            return
        }
        if let observation = observations.first {
            DispatchQueue.main.async {
                self.infoLabel.text = observation.payloadStringValue
                // Color layer for marking
                let colorLayer = CALayer()
                colorLayer.frame = self.convertBoundingBox(observation.boundingBox, to: self.baseImageView)
                colorLayer.backgroundColor = UIColor.red.withAlphaComponent(0.5).cgColor
                self.baseImageView.layer.addSublayer(colorLayer)
            }
            print("BoundingBox: \(observation.boundingBox)")
            print("Payload: \(observation.payloadStringValue ?? "No payload")")
            print("Symbology: \(observation.symbology.rawValue)")
        }
    }

    let request = VNDetectBarcodesRequest(completionHandler: completionHandler)
    request.symbologies = [.ean13] // Specify to scan only EAN13 Barcode for better performance
    DispatchQueue.global().async {
        let handler = VNImageRequestHandler(url: fileURL, options: [:])
        do {
            try handler.perform([request])
        } catch {
            print("Request failed: \(error)")
        }
    }
}

iOS ≥ 18 Update Highlights:

// iOS >=18 new coordinate conversion API toImageCoordinates
observation.boundingBox.toImageCoordinates(CGSize, origin: .upperLeft)
// https://developer.apple.com/documentation/vision/normalizedpoint/toimagecoordinates(from:imagesize:origin:)

Helper:

// Generated by ChatGPT 4o
// Since the photo in the ImageView is set with ContentMode = AspectFit
// We need to calculate the vertical displacement caused by the fit
func convertBoundingBox(_ boundingBox: CGRect, to view: UIImageView) -> CGRect {
    guard let image = view.image else {
        return .zero
    }

    let imageSize = image.size
    let viewSize = view.bounds.size
    let imageRatio = imageSize.width / imageSize.height
    let viewRatio = viewSize.width / viewSize.height
    var scaleFactor: CGFloat
    var offsetX: CGFloat = 0
    var offsetY: CGFloat = 0
    if imageRatio > viewRatio {
        // Image fits in width
        scaleFactor = viewSize.width / imageSize.width
        offsetY = (viewSize.height - imageSize.height * scaleFactor) / 2
    } else {
        // Image fits in height
        scaleFactor = viewSize.height / imageSize.height
        offsetX = (viewSize.width - imageSize.width * scaleFactor) / 2
    }

    let x = boundingBox.minX * imageSize.width * scaleFactor + offsetX
    let y = (1 - boundingBox.maxY) * imageSize.height * scaleFactor + offsetY
    let width = boundingBox.width * imageSize.width * scaleFactor
    let height = boundingBox.height * imageSize.height * scaleFactor
    return CGRect(x: x, y: y, width: width, height: height)
}

Output Results

BoundingBox: (0.5295758928571429, 0.21408638121589782, 0.0943080357142857, 0.21254415360708087)
Payload: 4710018183805
Symbology: VNBarcodeSymbologyEAN13

Some products do not have barcodes, such as bulk fruits that only have product labels.

Therefore, our scanner also needs to support scanning plain text labels.

let filePath = Bundle.main.path(forResource: "apple", ofType: "jpg")! // Local test image
let fileURL = URL(filePath: filePath)
if #available(iOS 18.0, *) {
    // New API using Swift features
    var barcodesRequest = DetectBarcodesRequest()
    barcodesRequest.symbologies = [.ean13] // Specify to scan only EAN13 Barcode for better performance
    var textRequest = RecognizeTextRequest()
    textRequest.recognitionLanguages = [.init(identifier: "zh-Hant"), .init(identifier: "en-US")]
    Task {
        do {
            let handler = ImageRequestHandler(fileURL)
            // Parameter pack syntax; we must wait for all requests to finish before using their results.
            // let (barcodesObservation, textObservation, ...) = try await handler.perform(barcodesRequest, textRequest, ...)
            let (barcodesObservation, textObservation) = try await handler.perform(barcodesRequest, textRequest)
            if let observation = barcodesObservation.first {
                DispatchQueue.main.async {
                    self.infoLabel.text = observation.payloadString
                    // Color layer for marking
                    let colorLayer = CALayer()
                    // iOS >=18 new coordinate conversion API toImageCoordinates
                    // Not tested; actual calculations may require adjustments for ContentMode = AspectFit:
                    colorLayer.frame = observation.boundingBox.toImageCoordinates(self.baseImageView.frame.size, origin: .upperLeft)
                    colorLayer.backgroundColor = UIColor.red.withAlphaComponent(0.5).cgColor
                    self.baseImageView.layer.addSublayer(colorLayer)
                }
                print("BoundingBox: \(observation.boundingBox.cgRect)")
                print("Payload: \(observation.payloadString ?? "No payload")")
                print("Symbology: \(observation.symbology)")
            }
            textObservation.forEach { observation in
                let topCandidate = observation.topCandidates(1).first
                print(topCandidate?.string ?? "No text recognized")
            }
        } catch {
            print("Request failed: \(error)")
        }
    }
} else {
    // Old method
    let barcodesCompletionHandler: VNRequestCompletionHandler = { request, error in
        guard error == nil else {
            print("Request failed: \(String(describing: error))")
            return
        }
        guard let observations = request.results as? [VNBarcodeObservation] else {
            return
        }
        if let observation = observations.first {
            DispatchQueue.main.async {
                self.infoLabel.text = observation.payloadStringValue
                // Color layer for marking
                let colorLayer = CALayer()
                colorLayer.frame = self.convertBoundingBox(observation.boundingBox, to: self.baseImageView)
                colorLayer.backgroundColor = UIColor.red.withAlphaComponent(0.5).cgColor
                self.baseImageView.layer.addSublayer(colorLayer)
            }
            print("BoundingBox: \(observation.boundingBox)")
            print("Payload: \(observation.payloadStringValue ?? "No payload")")
            print("Symbology: \(observation.symbology.rawValue)")
        }
    }

    let textCompletionHandler: VNRequestCompletionHandler = { request, error in
        guard error == nil else {
            print("Request failed: \(String(describing: error))")
            return
        }
        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            return
        }
        observations.forEach { observation in
            let topCandidate = observation.topCandidates(1).first
            print(topCandidate?.string ?? "No text recognized")
        }
    }

    let barcodesRequest = VNDetectBarcodesRequest(completionHandler: barcodesCompletionHandler)
    barcodesRequest.symbologies = [.ean13] // Specify to scan only EAN13 Barcode for better performance
    let textRequest = VNRecognizeTextRequest(completionHandler: textCompletionHandler)
    textRequest.recognitionLevel = .accurate
    textRequest.recognitionLanguages = ["en-US"]
    DispatchQueue.global().async {
        let handler = VNImageRequestHandler(url: fileURL, options: [:])
        do {
            try handler.perform([barcodesRequest, textRequest])
        } catch {
            print("Request failed: \(error)")
        }
    }
}

Output Results:

94128s
ORGANIC
Pink Lady®
Produce of USh

iOS ≥ 18 Update Highlights:

let handler = ImageRequestHandler(fileURL)
// Parameter pack syntax; we must wait for all requests to finish before using their results.
// let (barcodesObservation, textObservation, ...) = try await handler.perform(barcodesRequest, textRequest, ...)
let (barcodesObservation, textObservation) = try await handler.perform(barcodesRequest, textRequest)

iOS ≥ 18 performAll() Method

The previous perform(barcodesRequest, textRequest) method requires waiting for both requests to complete before proceeding; starting with iOS 18, a new performAll() method is provided, allowing for streaming responses. You can handle results as soon as one of the requests is completed, such as responding immediately upon scanning a barcode.

if #available(iOS 18.0, *) {
    // New API using Swift features
    var barcodesRequest = DetectBarcodesRequest()
    barcodesRequest.symbologies = [.ean13] // Specify to scan only EAN13 Barcode for better performance
    var textRequest = RecognizeTextRequest()
    textRequest.recognitionLanguages = [.init(identifier: "zh-Hant"), .init(identifier: "en-US")]
    Task {
        let handler = ImageRequestHandler(fileURL)
        let observation = handler.performAll([barcodesRequest, textRequest] as [any VisionRequest])
        for try await result in observation {
            switch result {
                case .detectBarcodes(_, let barcodesObservation):
                    if let observation = barcodesObservation.first {
                        DispatchQueue.main.async {
                            self.infoLabel.text = observation.payloadString
                            // Color layer for marking
                            let colorLayer = CALayer()
                            // iOS >=18 new coordinate conversion API toImageCoordinates
                            // Not tested; actual calculations may require adjustments for ContentMode = AspectFit:
                            colorLayer.frame = observation.boundingBox.toImageCoordinates(self.baseImageView.frame.size, origin: .upperLeft)
                            colorLayer.backgroundColor = UIColor.red.withAlphaComponent(0.5).cgColor
                            self.baseImageView.layer.addSublayer(colorLayer)
                        }
                        print("BoundingBox: \(observation.boundingBox.cgRect)")
                        print("Payload: \(observation.payloadString ?? "No payload")")
                        print("Symbology: \(observation.symbology)")
                    }
                case .recognizeText(_, let textObservation):
                    textObservation.forEach { observation in
                        let topCandidate = observation.topCandidates(1).first
                        print(topCandidate?.string ?? "No text recognized")
                    }
                default:
                    print("Unrecognized result: \(result)")
            }
        }
    }
}

Optimize with Swift Concurrency

Assuming we have a list of image thumbnails, where each image needs to be automatically cropped to focus on the main subject, we can effectively utilize Swift Concurrency to enhance loading efficiency.

Original Implementation

func generateThumbnail(url: URL) async throws -> UIImage {
  let request = GenerateAttentionBasedSaliencyImageRequest()
  let saliencyObservation = try await request.perform(on: url)
  return cropImage(url, to: saliencyObservation.salientObjects)
}
    
func generateAllThumbnails() async throws {
  for image in images {
    image.thumbnail = try await generateThumbnail(url: image.url)
  }
}

This approach processes one image at a time, which is slow and inefficient.

Optimization (1) — TaskGroup Concurrency

func generateAllThumbnails() async throws {
  try await withThrowingDiscardingTaskGroup { taskGroup in
    for image in images {
      taskGroup.addTask {
        image.thumbnail = try await generateThumbnail(url: image.url)
      }
    }
  }
}

Here, each task is added to a TaskGroup for concurrent execution.

Note: Image recognition and cropping operations are very memory-intensive. If too many concurrent tasks are initiated without restraint, it may lead to user lag or out-of-memory (OOM) crashes.

Optimization (2) — TaskGroup Concurrency + Limiting Concurrent Tasks

func generateAllThumbnails() async throws {
    try await withThrowingDiscardingTaskGroup { taskGroup in
        // Limit the maximum number of concurrent tasks to 5
        let maxImageTasks = min(5, images.count)
        // Initially fill 5 tasks
        for index in 0..<maxImageTasks {
            taskGroup.addTask {
                images[index].thumbnail = try await generateThumbnail(url: images[index].url)
            }
        }
        var nextIndex = maxImageTasks
        for try await _ in taskGroup {
            // When a task in the taskGroup completes...
            // Check whether there are still images left to process
            if nextIndex < images.count {
                let image = images[nextIndex]
                // Continue adding tasks (keeping the limit at 5)
                taskGroup.addTask {
                    image.thumbnail = try await generateThumbnail(url: image.url)
                }
                nextIndex += 1
            }
        }
    }
}

Update an Existing Vision App

  1. Vision will remove CPU and GPU support for certain requests on devices equipped with a Neural Engine. On these devices, the Neural Engine is the optimal choice for performance. You can check this using the supportedComputeDevices() API.
  2. Remove the VN prefix: VNXXXRequest → XXXRequest, VNXXXObservation → XXXObservation.
  3. Use async/await instead of the original VNRequestCompletionHandler.
  4. Directly use *Request.perform() instead of the original VNImageRequestHandler.perform([VN*Request]).

Wrap-Up

  • APIs designed with new Swift language features.
  • New functionalities and methods are Swift-only, available for iOS ≥ 18.
  • New image scoring features, body and hand motion tracking.

Thanks!

KKday Recruitment

👉👉👉 This sharing session originates from the weekly technical sharing activities of the KKday App Team. The team is currently actively recruiting Senior iOS Engineers. Interested candidates are welcome to submit their resumes. 👈👈👈

References

Discover Swift enhancements in the Vision framework

The Vision Framework API has been redesigned to leverage modern Swift features like concurrency, making it easier and faster to integrate a wide array of Vision algorithms into your app. We’ll tour the updated API and share sample code, along with best practices, to help you get the benefits of this framework with less coding effort. We’ll also demonstrate two new features: image aesthetics and holistic body pose.


Vision framework Apple Developer Documentation

If you have any questions or feedback, feel free to contact me.


This article was first published on Medium ➡️ Click Here

This post is licensed under CC BY 4.0 by the author.