Opt-In Vision

Transforming images to retain utility while minimizing sensitive information
Sander De Coninck, 2025-08-01

Images are powerful. In a single frame, you can capture not just what’s happening, but who’s involved, where it’s taking place, what time of day it is, and countless other details. This richness of information makes images a goldmine for machine learning applications. But that same richness can be a liability, particularly when it comes to privacy.

Take a simple task like counting how many people pass by a certain street each day. A seemingly straightforward solution would be to install a camera and use one of the many off-the-shelf person detection models to track and count individuals. Easy enough. But the camera doesn’t just capture silhouettes or counts: it records faces, behaviors, social interactions, even potentially sensitive activities. In other words, it captures far more than what’s needed for the task.

This over-collection is at the heart of the problem: images are inherently too informative. Someone counting passers-by doesn’t need to know who those people are, what they’re doing, or where they’re headed, yet all of that is captured by default. The challenge, then, is to retain just enough information for your task while protecting everything else.


Two Paths Forward

There are two common strategies to mitigate these privacy concerns:

  1. Process on the edge: Keep the footage local to the camera. Run your person detection model there, and only output the final count—never transferring the raw footage anywhere else.
  2. Transform before sending: Instead of full footage, apply a lightweight transformation to the image that removes unnecessary information but retains what your model needs. The transformed image can then be safely sent elsewhere for processing.

We focus on the second approach, and here’s why.

Edge compute resources are limited. While lightweight person detection might work on-device, more complex tasks (like behavior analysis or multi-object tracking) often require models too large to run locally, especially if privacy-preserving computation (like secure enclaves) is added to the mix.

Additionally, in many real-world applications, you may rely on third-party services. These services aren’t going to give you their proprietary models to run on the edge, which makes cloud deployment more practical. Centralized models are also easier to maintain and update, which is crucial for scalability.

So if cloud-based processing is the goal, the question becomes: How do we transform images so they’re safe to transmit and store, but still useful for the task at hand?


Our Approach: Learning to Obfuscate

We propose a system that uses a lightweight autoencoder neural network to transform the raw image into a new representation that retains only the information relevant to the task (like person detection), while discarding everything else. This is a form of “opt-in” privacy: only the information the task explicitly requires is kept, and everything else is withheld by default.
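
To make the idea concrete, here is a minimal sketch of what such a lightweight obfuscator could look like in PyTorch. The layer sizes, names, and overall architecture are illustrative assumptions, not the exact network from the paper.

```python
# Minimal sketch of a lightweight obfuscator (illustrative; the exact
# architecture, layer sizes, and names are assumptions, not the paper's).
import torch
import torch.nn as nn

class Obfuscator(nn.Module):
    """Small convolutional autoencoder mapping an RGB image to an
    obfuscated image of the same shape."""
    def __init__(self, hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, hidden, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden * 2, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden * 2, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(hidden, 3, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # keep pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

The deobfuscator introduced below can reuse a similar encoder-decoder structure; only its training objective differs.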

Training such a system is tricky. How do you ensure that the transformation truly removes sensitive details while keeping the image useful?

We tackle this using an adversarial training scheme, inspired by generative adversarial networks. The core idea is to pit two networks against each other:

  • The obfuscator, which learns to strip away as much irrelevant information as possible while preserving task-specific utility.
  • The deobfuscator, whose job is to reconstruct the original image from the obfuscated version.

The more information the obfuscator lets through, the better the deobfuscator will do. So the obfuscator learns to make the deobfuscator’s job as hard as possible, effectively minimizing the mutual information between the original and obfuscated image.
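
One way to write the adversarial part down (our shorthand, not notation taken from the paper): let O be the obfuscator, D the deobfuscator, x an input image, and L_rec a pixel-wise reconstruction loss.

```latex
% Adversarial part of the objective (shorthand; notation is illustrative).
% The deobfuscator D tries to minimize the reconstruction error,
% while the obfuscator O tries to maximize it.
\max_{O} \; \min_{D} \; \mathbb{E}_{x}\left[ \mathcal{L}_{\mathrm{rec}}\big( D(O(x)),\, x \big) \right]
```

In practice this is optimized with alternating updates, much like a GAN.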

But we add an important constraint: the obfuscated image still needs to be usable for the target task. So we pass the obfuscated images to a pre-trained, frozen network for that task (e.g., pedestrian detection), and include its loss in the obfuscator’s training objective. This ensures that while the obfuscator removes excess information, it retains what the task-specific model needs to perform well.
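
Putting the pieces together, one training step could alternate between the two networks roughly as follows. This is a sketch that assumes the illustrative Obfuscator/Deobfuscator modules above, a frozen pre-trained task_model exposing a loss(...) method, and an L1 reconstruction loss; the names, interfaces, and adv_weight balance are placeholders rather than the paper's exact setup.

```python
# Sketch of one alternating training step (names, losses, and weights are illustrative).
import torch
import torch.nn.functional as F

def train_step(obfuscator, deobfuscator, task_model, images, targets,
               opt_obf, opt_deobf, adv_weight=1.0):
    # 1) Update the deobfuscator: reconstruct the original image
    #    from the (detached) obfuscated version.
    obfuscated = obfuscator(images).detach()
    recon = deobfuscator(obfuscated)
    deobf_loss = F.l1_loss(recon, images)
    opt_deobf.zero_grad()
    deobf_loss.backward()
    opt_deobf.step()

    # 2) Update the obfuscator: keep the frozen, pre-trained task model
    #    accurate on the obfuscated image, while making reconstruction
    #    as hard as possible for the deobfuscator.
    obfuscated = obfuscator(images)
    task_loss = task_model.loss(obfuscated, targets)  # frozen detector, not updated
    recon = deobfuscator(obfuscated)
    adv_loss = -F.l1_loss(recon, images)              # maximize reconstruction error
    obf_loss = task_loss + adv_weight * adv_loss
    opt_obf.zero_grad()
    obf_loss.backward()
    opt_obf.step()

    return deobf_loss.item(), obf_loss.item()
```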

The result is a system that learns what not to share, automatically.


Putting It to the Test

In our paper, we introduce this framework and demonstrate its effectiveness using pedestrian detection as a case study. The system is lightweight, privacy-aware, and retains high task performance without exposing more information than necessary. For a sneak peek at the results, you can find an image, its obfuscated version, and the resulting person detections below.

You can find the full version of our paper, with more details on the method and results, here: https://link.springer.com/article/10.1007/s10489-024-05489-9.