
The Tech Behind Polar

Polaroids are instantly recognizable pieces of pop culture, and the name has become a de facto umbrella term for all instant photos. There really is something special about capturing a moment and seeing a physical print appear in your hands. This unique format has attracted a vibrant online community that shares its creations with the world. These kinds of communities would not be possible without a nice way to scan your shots.

What’s so hard about scanning images?

This is nothing new, right? Scanning has been around for ages. Instant photos, however, have a property that makes them a bit hard to scan without a nice high-density flatbed scanner: they are glossy. Taking a simple photo with your phone will cover your beautiful image with a reflection of your phone and the lights around you.

There is a simple fix for this, however: you just take the picture from an angle that does not cause any reflections on the image. To get a nice shareable image you then have to fire up Photoshop, correct the perspective, crop the image, and fix the colors.

To make this tedious process simpler, the nice folks at The Impossible Project created an app quite a few years back. It lets you do all of this on your phone and streamlines the process. This app served its purpose, but after several years of no maintenance and plenty of technological advancement, we can do better.

This is how the idea for Polar was born.



What does Polar do differently?

To scan an image with The Impossible Project app you have to select all four corners of the image yourself. This takes about 30 seconds once you get the right perspective on the actual image. That doesn't seem like much, but it gets really annoying when scanning multiple shots.

But wait – what we are actually asking the user to do is detect a rectangle on a plane. That is a pretty simple problem for a machine learning algorithm, and if you are developing for iOS, it is as simple as using the built-in Vision framework.

After the rectangle is detected, we have to do a couple more steps to create a perfect scan for our user: crop the Polaroid out of the image, correct the perspective, and finally correct the colors.

The result is an app that makes scanning Polaroids a very satisfying breeze – Polar.

Let’s talk about the actual theme of this blog post: how is all of this implemented? We will go over the process step by step.

Rectangle Detection

I mentioned earlier that the Vision API provided by Apple on all iOS devices solves this problem easily. The Vision framework allows you to send a VNDetectRectanglesRequest, which accepts a single image as the input and provides all detected rectangles as the output. This feature works surprisingly well, and best of all – it is blazing fast!
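As a sketch of what such a request can look like (the specific configuration values, like the confidence threshold, are my assumptions, not Polar's actual settings):

```swift
import Vision
import CoreImage

// Minimal sketch: detect rectangles in a single CIImage.
// The observations' corner points come back normalised to [0, 1].
func detectRectangles(in image: CIImage,
                      completion: @escaping ([VNRectangleObservation]) -> Void) {
    let request = VNDetectRectanglesRequest { request, _ in
        completion(request.results as? [VNRectangleObservation] ?? [])
    }
    request.maximumObservations = 1   // only the best candidate
    request.minimumConfidence = 0.8   // assumed threshold – tune as needed
    let handler = VNImageRequestHandler(ciImage: image, options: [:])
    try? handler.perform([request])
}
```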

I run the rectangle detection subroutine for every frame of the camera output. This way I can draw the detected rectangle on the screen on each frame and provide a live preview to the user. When the user taps on the preview of the detected rectangle, they begin the scanning process for the detected photo.


Cropping & Perspective Correction

As a result of our previous operation we get four points that represent the four corners of the rectangle in the image. Those are given as points on a [0,1] x [0,1] plane which means we just have to scale the coordinates to the resolution of the image:
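The snippet below is a method on a DetectedRectangle type whose definition isn't shown in this post; a minimal version of it might look like this (an assumption on my part):

```swift
import CoreGraphics

// Assumed shape of the type used below: a simple value type
// holding the four detected corner points.
struct DetectedRectangle {
    let bottomLeft: CGPoint
    let bottomRight: CGPoint
    let topLeft: CGPoint
    let topRight: CGPoint
}
```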

func getScaled(scaleX: CGFloat, scaleY: CGFloat) -> DetectedRectangle {
    return DetectedRectangle(
        bottomLeft: CGPoint(x: bottomLeft.x * scaleX, y: bottomLeft.y * scaleY),
        bottomRight: CGPoint(x: bottomRight.x * scaleX, y: bottomRight.y * scaleY),
        topLeft: CGPoint(x: topLeft.x * scaleX, y: topLeft.y * scaleY),
        topRight: CGPoint(x: topRight.x * scaleX, y: topRight.y * scaleY)
    )
}

Extracting the perspective-corrected rectangle we just detected from the image is quite easy. We can use the CIPerspectiveCorrection image filter that comes bundled with the Core Image framework. This filter accepts an image along with the four corners of the rectangle we detected as its input. It outputs exactly what we want: a perspective-corrected and cropped image of our rectangle.

static func extractRectangleFrom(image: CIImage, detectedRectangle: DetectedRectangle) -> CGImage {
    let filter: CIFilter = CIFilter(name: "CIPerspectiveCorrection")!
    filter.setValue(image, forKey: "inputImage")
    filter.setValue(CIVector(cgPoint: detectedRectangle.topLeft), forKey: "inputTopLeft")
    filter.setValue(CIVector(cgPoint: detectedRectangle.topRight), forKey: "inputTopRight")
    filter.setValue(CIVector(cgPoint: detectedRectangle.bottomLeft), forKey: "inputBottomLeft")
    filter.setValue(CIVector(cgPoint: detectedRectangle.bottomRight), forKey: "inputBottomRight")
    return CIContext(options: nil).createCGImage(filter.outputImage!, from: filter.outputImage!.extent)!
}
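Putting the two helpers together, hypothetical usage might look like this (names like `ciImage` and `rectangle` are placeholders I've made up for illustration):

```swift
// Scale the normalised Vision corners to pixel space, then extract
// the perspective-corrected Polaroid from the source image.
let corners = rectangle.getScaled(scaleX: ciImage.extent.width,
                                  scaleY: ciImage.extent.height)
let scan = extractRectangleFrom(image: ciImage, detectedRectangle: corners)
```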

Color Correction

When scanning with our phone we do not have the luxury of the controlled lighting conditions of a flatbed scanner, which means we need to do some colour correction to complete our scanning process. This could be done by the user in any photo editing app, but why not automate the process if it is at all possible?

When I talk about colour correction of the scan, I am actually talking about fixing its white balance. White balance is the main thing that differently coloured lighting affects. The goal of our white-balance correction is to make whites look white in the digital image. If you take a look at my perspective-corrected image above, you will see that the white frame appears yellow; this is because I scanned it under tungsten lighting.

One way to correct the white balance is to contrast-stretch all three colour channels of the image. This is a pretty basic technique, but it works well for the most part. Here is an image of how the histograms look on the original image next to the goal we are trying to achieve.

We want each colour channel to utilise the whole range of the histogram, without leaving unused space at either end.
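Per channel, this stretch is a simple linear remap. A sketch of the per-value maths (not Polar's actual code – the Accelerate functions discussed below do this in bulk, per channel):

```swift
// Linearly remap a channel value from its observed [vMin, vMax] range
// onto the full 0–255 range.
func stretch(_ v: UInt8, vMin: UInt8, vMax: UInt8) -> UInt8 {
    guard vMax > vMin else { return v }
    return UInt8((Int(v) - Int(vMin)) * 255 / (Int(vMax) - Int(vMin)))
}
```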

This effect can be achieved using the Accelerate framework available on iOS. Accelerate provides APIs for running complex image-processing algorithms quickly, even on a relatively slow processor. Using the vImageContrastStretch_ARGB8888 function it publishes, we can get a decent result right away (middle image). However, the whites in the middle image are still not as white as I want them to be. This is because vImageContrastStretch is really careful not to destroy any image data, even if that data is almost insignificant. You can see the parts of the histogram it didn't want to remove in the image above: there was some information in those parts it just didn't want to flatten.

To produce results a bit more in line with my expectations, we can use vImageEndsInContrastStretch_ARGB8888, which allows us to specify a percentage of colour information we don't mind losing. In my case, I chose to snip off 1% of the colour information from both ends.

// Ends for [alpha, red, green, blue]: clip 1% from each colour channel, 0% from alpha
let low = [0, 1, 1, 1].map { return UInt32($0) }
let hi = [0, 1, 1, 1].map { return UInt32($0) }
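A sketch of the surrounding call, assuming `src` and `dest` are valid, equally sized ARGB8888 vImage_Buffers prepared elsewhere:

```swift
import Accelerate

// Clip 1% of the colour information from each end of every colour
// channel, then stretch what remains across the full histogram.
let error = vImageEndsInContrastStretch_ARGB8888(
    &src, &dest,
    low, hi,
    vImage_Flags(kvImageNoFlags)
)
assert(error == kvImageNoError)
```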


A couple of really useful APIs slapped together, and we have an app that scans instant photos in a heartbeat. You can watch the full-size demo video here, or check it out for yourself on the App Store.
