Vision Introduction — Automatic Face Detection and Cropping for App Avatar Upload (Swift)
Practical Application of Vision
[2024/08/13 Update]
- Please refer to the new article and new API: 「 iOS Vision framework x WWDC 24 Discover Swift enhancements in the Vision framework Session 」
Without further ado, here is a finished product image:

Before Optimization vs. After Optimization — 結婚吧 APP
Recently, iOS 12 was released, bringing an updated version of the Core ML machine learning framework. It caught my attention and seemed quite interesting, so I started thinking about where it could be applied in our current product.
CoreML Preview Article Now Published: Automatically Predict Article Categories Using Machine Learning, Including Training the Model Yourself
Core ML provides interfaces for training machine learning models for text and images and integrating them into apps. My original idea was to use it for face detection, to solve the issue of heads or faces being cut off when the app crops images. As shown in the left image above, if the face appears near the edge, scaling and cropping can easily leave the face incomplete.
After some online research, I realized my knowledge was out of date: this capability already shipped in iOS 11 as the “Vision” framework, which supports text detection, face detection, image matching, QR code detection, object tracking, and more.
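For orientation, each of those capabilities corresponds to its own VNRequest subclass. Here is a quick illustrative list (not code from this article's project):

import Vision

// iOS 11 request types behind the features listed above (illustrative only)
let textRequest = VNDetectTextRectanglesRequest()   // text detection
let faceRequest = VNDetectFaceRectanglesRequest()   // face detection
let barcodeRequest = VNDetectBarcodesRequest()      // barcode / QR code detection
// Image matching and object tracking use VNTranslationalImageRegistrationRequest and
// VNTrackObjectRequest respectively; both need extra setup, so they are omitted here.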
Here we use the face detection feature to produce the optimized result shown on the right above: it locates the face and crops the image centered on it.
Let’s get started:
First, let’s create a function that can mark face positions to get a basic understanding of how to use Vision.

Demo APP
As shown in the completed image above, it can mark the location of faces in the photo.
P.S. Only the “face” itself gets marked; the whole head, hair included, is out of scope 😅
This code is mainly divided into two parts. The first part addresses the blank space that appears when the original image is scaled to fit into the ImageView. Simply put, we want the ImageView to match the image's size exactly; if we assign the image directly, the layout shifts as shown below.

You might think of simply changing the contentMode to fill, fit, or redraw, but that causes distortion or cuts off parts of the image.
let ratio = UIScreen.main.bounds.size.width
// The UIImageView is pinned to the left and right edges with a 1:1 aspect ratio
let sourceImage = UIImage(named: "Demo2")?.kf.resize(to: CGSize(width: ratio, height: CGFloat.leastNonzeroMagnitude), for: .aspectFill)
// Use Kingfisher's resizing: the width is fixed and the height scales to keep the aspect ratio
imageView.contentMode = .redraw
// Set contentMode to .redraw so the view fills with the image
imageView.image = sourceImage
// Assign the image
imageViewConstraints.constant = (ratio - (sourceImage?.size.height ?? 0))
imageView.layoutIfNeeded()
imageView.sizeToFit()
// Adjust the imageView's constraints here; see the full example at the end
This completes the processing for the image.
The cropping part uses Kingfisher to help us, but you can replace it with another library or your own method, as sketched below.
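For example, if you would rather not pull in Kingfisher just for this, the width-based resize can be done with plain UIKit. The helper below is a minimal sketch under my own naming (resizedToWidth is hypothetical, not part of this article's project):

import UIKit

// Resizes an image so its width matches targetWidth, keeping the aspect ratio.
// A plain-UIKit stand-in for the Kingfisher resize call used above (hypothetical helper).
func resizedToWidth(_ image: UIImage, targetWidth: CGFloat) -> UIImage {
    let scale = targetWidth / image.size.width
    let targetSize = CGSize(width: targetWidth, height: image.size.height * scale)
    let renderer = UIGraphicsImageRenderer(size: targetSize)
    return renderer.image { _ in
        image.draw(in: CGRect(origin: .zero, size: targetSize))
    }
}

UIGraphicsImageRenderer handles the screen scale for you, which is why no explicit scale factor appears here.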
Part Two: Getting Straight to the Code
if let image = sourceImage, #available(iOS 11.0, *), let ciImage = CIImage(image: image) {
    // Vision is supported only from iOS 11 onwards; build the CIImage the request handler needs
    let completionHandle: VNRequestCompletionHandler = { request, error in
        if let faceObservations = request.results as? [VNFaceObservation] {
            // Faces detected
            DispatchQueue.main.async {
                // Update the UI on the main thread
                let size = self.imageView.frame.size
                faceObservations.forEach { faceObservation in
                    // Coordinate system conversion: scale the normalized box to the view size,
                    // then flip the y-axis (Vision's origin is bottom-left, UIKit's is top-left)
                    let translate = CGAffineTransform.identity.scaledBy(x: size.width, y: size.height)
                    let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -size.height)
                    let transRect = faceObservation.boundingBox.applying(translate).applying(transform)
                    let markerView = UIView(frame: transRect)
                    markerView.backgroundColor = UIColor(red: 0, green: 1, blue: 0, alpha: 0.3)
                    self.imageView.addSubview(markerView)
                }
            }
        } else {
            print("No faces detected")
        }
    }
    // Detection request
    let baseRequest = VNDetectFaceRectanglesRequest(completionHandler: completionHandle)
    let faceHandle = VNImageRequestHandler(ciImage: ciImage, options: [:])
    DispatchQueue.global().async {
        // Detection takes time, so run it on a background thread to avoid blocking the UI
        do {
            try faceHandle.perform([baseRequest])
        } catch {
            print("Throws: \(error)")
        }
    }
} else {
    print("Not supported")
}
The main point to note is the coordinate system conversion. Vision returns each result's boundingBox in a normalized (0...1) coordinate space relative to the image, with the origin at the bottom left; we need to convert it into actual coordinates inside the enclosing ImageView before we can use it.
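To make that conversion reusable, you can wrap it in a small helper. This is just a sketch of the same math as above (the function name is my own):

import UIKit
import Vision

// Converts a Vision bounding box (normalized 0...1, origin at bottom-left)
// into a UIKit rect (points, origin at top-left) for a view of the given size.
func uiKitRect(for boundingBox: CGRect, in viewSize: CGSize) -> CGRect {
    // Scale the normalized box up to the view's dimensions
    let scale = CGAffineTransform.identity.scaledBy(x: viewSize.width, y: viewSize.height)
    // Flip the y-axis, since Vision measures from the bottom and UIKit from the top
    let flip = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -viewSize.height)
    return boundingBox.applying(scale).applying(flip)
}

From iOS 11 you can also use Vision's VNImageRectForNormalizedRect(_:_:_:) for the scaling step, but note that it does not flip the y-axis for you.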
Next, let’s move on to the main part: accurately cropping the avatar according to the detected face position.
let ratio = UIScreen.main.bounds.size.width
// The UIImageView is pinned left and right at 0 with a 1:1 aspect ratio; see the full example at the end
let sourceImage = UIImage(named: "Demo")
imageView.contentMode = .scaleAspectFill
// Use scaleAspectFill so the view is filled
imageView.image = sourceImage
// Assign the original image directly; we will replace it after detection
if let image = sourceImage, #available(iOS 11.0, *), let ciImage = CIImage(image: image) {
    let completionHandle: VNRequestCompletionHandler = { request, error in
        if request.results?.count == 1, let faceObservation = request.results?.first as? VNFaceObservation {
            // Exactly one face detected
            let size = CGSize(width: ratio, height: ratio)
            // Coordinate system conversion, same as in the marking example above
            let translate = CGAffineTransform.identity.scaledBy(x: size.width, y: size.height)
            let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -size.height)
            let finalRect = faceObservation.boundingBox.applying(translate).applying(transform)
            // Calculate the center point of the face area, expressed as an offset from the image center
            let center = CGPoint(x: (finalRect.origin.x + finalRect.width / 2 - size.width / 2),
                                 y: (finalRect.origin.y + finalRect.height / 2 - size.height / 2))
            // Resize, then crop the image anchored on that center point
            let newImage = image.kf.resize(to: size, for: .aspectFill).kf.crop(to: size, anchorOn: center)
            DispatchQueue.main.async {
                // Update the UI on the main thread
                self.imageView.image = newImage
            }
        } else {
            print("Detected multiple faces or no face detected")
        }
    }
    let baseRequest = VNDetectFaceRectanglesRequest(completionHandler: completionHandle)
    let faceHandle = VNImageRequestHandler(ciImage: ciImage, options: [:])
    DispatchQueue.global().async {
        // Run detection on a background thread to avoid blocking the UI
        do {
            try faceHandle.perform([baseRequest])
        } catch {
            print("Throws: \(error)")
        }
    }
} else {
    print("Not supported")
}
The principle is similar to marking the face position. The difference is that the avatar has a fixed size (e.g., 300x300), so we skip the first part where the Image needs to fit the ImageView.
Another difference is that we need to calculate the center point of the face area and use this center point as the basis for cropping the image.

The red dot marks the center point of the face area
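If you want to replace Kingfisher's crop(to:anchorOn:) here as well, the same face-centered crop can be sketched with Core Graphics. The helper below is my own illustration, with one assumption added: it clamps the crop rect so a face near the edge never pushes the crop outside the image.

import UIKit
import Vision

// Crops a square of side length `side` (in points) out of image, centered on the detected face.
// A hypothetical Core Graphics replacement for the Kingfisher calls above.
// Assumes side does not exceed the image's dimensions.
func cropSquare(around boundingBox: CGRect, in image: UIImage, side: CGFloat) -> UIImage? {
    let imageSize = image.size
    // Project the normalized Vision box into top-left-origin image coordinates (points)
    let scale = CGAffineTransform(scaleX: imageSize.width, y: imageSize.height)
    let flip = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -imageSize.height)
    let faceRect = boundingBox.applying(scale).applying(flip)
    // Center the square on the face, then clamp it so it stays inside the image
    var origin = CGPoint(x: faceRect.midX - side / 2, y: faceRect.midY - side / 2)
    origin.x = max(0, min(origin.x, imageSize.width - side))
    origin.y = max(0, min(origin.y, imageSize.height - side))
    let cropRect = CGRect(origin: origin, size: CGSize(width: side, height: side))
    // CGImage works in pixels, so convert the rect using the image's scale factor
    let pixelRect = cropRect.applying(CGAffineTransform(scaleX: image.scale, y: image.scale))
    guard let cropped = image.cgImage?.cropping(to: pixelRect) else { return nil }
    return UIImage(cgImage: cropped, scale: image.scale, orientation: image.imageOrientation)
}

Unlike the article's pipeline, this cuts the square straight out of the source image instead of resizing first, so you would pass the avatar's target side length and scale the result afterwards if needed.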
Final result image:

The frame just before the pan shows the original image position
Complete APP Example:

The code has been uploaded to GitHub: Click here


