Overview

The Audience Measurement Engine is a C++ computer vision library that measures how people interact with a product or display that appears in a camera image (for example, a shop window, a product stand, or a digital signage screen). It processes video frames, detects faces, estimates gaze and emotions, and produces both annotated output images and numeric indicators that can be consumed by an analytics layer.

At a high level, the engine answers three key questions for a configured region of interest in the scene:

How many people looked at the product? — number of detected viewers and the time they spent in front of the camera.
How strongly did the product capture their attention? — attention estimation based on head and eye gaze direction toward the target region.
What emotional response did it evoke? — total time spent in six discrete emotion categories, aggregated across viewers and over time.
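
The first and third questions both reduce to accumulating frame durations. The following is a minimal sketch of that idea, not the engine's actual implementation: SessionStats and addFrame() are hypothetical names, the emotion labels are whatever keys the classifier map carries, and the sketch assumes a single viewer per frame whose whole frame duration is credited to the most probable emotion.

```cpp
#include <map>
#include <string>

// Hypothetical session accumulator; not part of the engine's API.
struct SessionStats
{
    double dwellSeconds = 0.0;                     // time with a viewer present
    std::map<std::string, double> emotionSeconds;  // time per dominant emotion label

    // Adds one frame that lasted frameSeconds. An empty probability map
    // means no viewer was detected, so nothing is accumulated.
    void addFrame(const std::map<std::string, double>& emotionProbabilities,
                  double frameSeconds)
    {
        if (emotionProbabilities.empty())
            return;

        dwellSeconds += frameSeconds;

        // Find the most probable emotion for this frame.
        auto best = emotionProbabilities.begin();
        for (auto it = emotionProbabilities.begin(); it != emotionProbabilities.end(); ++it)
        {
            if (it->second > best->second)
                best = it;
        }

        // Credit the whole frame duration to that emotion.
        emotionSeconds[best->first] += frameSeconds;
    }
};
```

Crediting only the dominant emotion keeps the per-category totals summing to the dwell time; an alternative would be to split each frame's duration in proportion to the probabilities.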

Main capabilities

Face detection and tracking using a DNN-based face detector and dense facial landmarks.
Gaze estimation for head and eyes to determine whether a viewer is actually looking at the product or display.
Facial emotion recognition based on a neural network classifier, providing per-frame emotion probabilities for each detected viewer.
Temporal aggregation of per-frame results into viewer-level and session-level statistics (counts, dwell times, emotional time distribution).
On-image visualization such as face outlines, halos around active detections, information boxes, and simple history charts for debugging and demo purposes.
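
To illustrate how a gaze direction can be turned into a "looking at the product" decision, here is a small geometric sketch. It is an assumption-laden example rather than the engine's method: Vec3, Rect, and gazeHitsRegion() are hypothetical names, the target region is modeled as an axis-aligned rectangle lying in the plane z = 0, and the viewer is assumed to stand at z > 0.

```cpp
// Hypothetical types for illustration; not part of the engine's API.
struct Vec3 { double x, y, z; };
struct Rect { double minX, minY, maxX, maxY; };  // target region in the plane z = 0

// Returns true if the gaze ray starting at `eye` with direction `direction`
// intersects the rectangle `region` in the plane z = 0.
bool gazeHitsRegion(const Vec3& eye, const Vec3& direction, const Rect& region)
{
    // The viewer must be in front of the plane and the ray must point toward it;
    // otherwise the gaze is parallel to the plane or directed away from it.
    if (eye.z <= 0.0 || direction.z >= 0.0)
        return false;

    // Ray parameter at which the ray reaches z = 0 (positive given the check above).
    const double t = -eye.z / direction.z;

    const double hitX = eye.x + t * direction.x;
    const double hitY = eye.y + t * direction.y;

    return hitX >= region.minX && hitX <= region.maxX &&
           hitY >= region.minY && hitY <= region.maxY;
}
```

In practice a tolerance margin around the region is often added, because gaze estimates from a single camera are noisy.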

Architecture and key components

Internally, the Audience Measurement Engine builds on a small set of core components:

EngineBase — common base class that handles initialization, image pre-/post-processing, and basic visualization helpers (watermark/signature and simple history charts).
AudienceMeasurementEngine — specialization of EngineBase that wires together face detection, face mesh, gaze estimation, and emotion classification into a single processing pipeline exposed via init() and process().
core::Detection — lightweight structure that stores detection results (bounding box, landmarks, optional features and property map) and is passed between modules.
Face analysis stack:
- core::face::FaceDetectorYunet — finds faces in the input frame using OpenCV’s DNN-based YuNet detector.
- core::face::FaceMesh — predicts dense facial landmarks for each detected face.
- core::filter::LandmarkFlowFilter — tracks landmarks over time using optical flow to obtain stable trajectories in video.
- core::filter::PoseKalmanFilter (used by gaze modules) — smooths noisy pose estimates with a Kalman filter.
Gaze estimation stack:
- gaze::HeadGazeEstimation — estimates 3D head pose and gaze direction.
- gaze::EyeGazeEstimation — refines gaze using eye-region landmarks and also estimates eye openness.
- gaze::GazeEstimation — wrapper that combines head and eye gaze into a single interface used by the engine.
Emotion recognition stack:
- core::NeuralBase — common infrastructure for loading, configuring and running OpenCV DNN models.
- core::NeuralClassifierBase — convenience wrapper that runs a classification network and returns a map of class probabilities.
- core::face::EmotionClassifier — concrete classifier for facial emotions; its outputs are normalized by AudienceMeasurementEngine::normalizeMap() and aggregated over time.
Visualization helpers in the visualization namespace:
- Drawing of facial landmarks and triangulations for debugging (FaceDrawer).
- Halos and halo animations around faces to highlight active detections.
- Overlay of key-value information boxes and detection properties next to each face.
- Support for simple on-frame charts showing the history of selected metrics.
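
The smoothing role described for core::filter::PoseKalmanFilter can be illustrated with a one-dimensional Kalman filter applied to a single pose angle. This is a simplified sketch under stated assumptions, not the class's real interface: ScalarKalmanFilter is a hypothetical name, the state model is "the angle stays constant between frames", and the two noise variances are tuning parameters chosen by the caller.

```cpp
// Hypothetical scalar Kalman filter for one pose angle; not the engine's class.
class ScalarKalmanFilter
{
public:
    ScalarKalmanFilter(double processNoise, double measurementNoise)
        : processNoise_(processNoise), measurementNoise_(measurementNoise) {}

    // Feeds one noisy measurement and returns the smoothed estimate.
    double update(double measurement)
    {
        if (!initialized_)
        {
            // The first measurement seeds the state.
            estimate_ = measurement;
            variance_ = measurementNoise_;
            initialized_ = true;
            return estimate_;
        }

        // Predict: the state is assumed constant, so only the uncertainty grows.
        variance_ += processNoise_;

        // Correct: blend prediction and measurement using the Kalman gain.
        const double gain = variance_ / (variance_ + measurementNoise_);
        estimate_ += gain * (measurement - estimate_);
        variance_ *= (1.0 - gain);
        return estimate_;
    }

private:
    double processNoise_;
    double measurementNoise_;
    double estimate_ = 0.0;
    double variance_ = 0.0;
    bool initialized_ = false;
};
```

A larger process noise makes the filter follow fast head movements more closely, while a larger measurement noise gives smoother but more delayed output.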

Processing pipeline

For each incoming frame, AudienceMeasurementEngine::process() executes the following steps:

Pre-processing — resize and pad the input frame to the working resolution and color format expected by the underlying models.
Face detection — run face detection and create core::Detection objects with bounding boxes and (optionally) sparse landmarks.
Landmarks and pose — refine landmarks with the face mesh network, track them over time, and estimate head pose.
Gaze estimation — compute gaze direction and eye openness, which are used to determine whether a viewer is actually looking at the configured region of interest.
Emotion classification — crop the face, run the emotion network, and obtain a probability distribution over the configured emotion classes.
Aggregation — update internal history buffers with the latest attention and emotion scores so that time-based statistics (dwell time, total time per emotion category, trends) can be computed.
Visualization and output — draw optional overlays (halos, info boxes, history charts, watermark) onto an output frame and return it to the caller along with any exported metrics.
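
The "resize and pad" operation in the pre-processing step is commonly done as an aspect-preserving letterbox. The sketch below only computes the geometry and is an assumption about how such a step might work, not a description of the engine's code: Letterbox and computeLetterbox() are hypothetical names, and centering the image inside the padding is one possible choice.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical result type; not part of the engine's API.
struct Letterbox
{
    double scale;      // uniform scale factor applied to the source frame
    int scaledWidth;   // size of the resized image before padding
    int scaledHeight;
    int padLeft;       // padding needed to reach the working resolution
    int padTop;
    int padRight;
    int padBottom;
};

// Computes how to fit a srcWidth x srcHeight frame into dstWidth x dstHeight
// without distorting the aspect ratio. All sizes must be positive.
Letterbox computeLetterbox(int srcWidth, int srcHeight, int dstWidth, int dstHeight)
{
    Letterbox lb{};

    // Use the smaller of the two ratios so the whole frame fits.
    lb.scale = std::min(static_cast<double>(dstWidth) / srcWidth,
                        static_cast<double>(dstHeight) / srcHeight);

    // Round to whole pixels and clamp against rounding overshoot.
    lb.scaledWidth = std::min(dstWidth, static_cast<int>(std::lround(srcWidth * lb.scale)));
    lb.scaledHeight = std::min(dstHeight, static_cast<int>(std::lround(srcHeight * lb.scale)));

    // Split the remaining space so the image is centered.
    const int padX = dstWidth - lb.scaledWidth;
    const int padY = dstHeight - lb.scaledHeight;
    lb.padLeft = padX / 2;
    lb.padRight = padX - lb.padLeft;
    lb.padTop = padY / 2;
    lb.padBottom = padY - lb.padTop;
    return lb;
}
```

For example, fitting a 1920x1080 frame into a 640x640 working resolution gives a 640x360 resized image with 140 rows of padding above and below. With OpenCV, the values could then be passed to cv::resize and cv::copyMakeBorder; the same scale and offsets are needed later to map detections back to the original frame.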

Typical usage

The engine is designed to be embedded into a host application that acquires video frames from a camera and forwards them to the processing pipeline. A minimal integration in C++ looks like this:

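#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
// Also include the engine header that declares AudienceMeasurementEngine.

int main()
{
    AudienceMeasurementEngine engine;

    // Load models, configuration files and visual assets from a resource folder.
    engine.init("path/to/resources");

    // Camera index 0 is an assumption; use whatever source the host application provides.
    cv::VideoCapture camera(0);

    cv::Mat inputFrame;
    cv::Mat outputFrame;

    // 1. Grab a frame from the camera into inputFrame; stop when no frame arrives.
    while (camera.read(inputFrame))
    {
        // 2. Process the frame with the engine.
        engine.process(inputFrame, outputFrame);

        // 3. Display or stream outputFrame, and collect exported metrics
        //    (viewer counts, attention scores, emotion statistics).
    }

    return 0;
}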

The resource directory passed to init() typically contains the neural network model files (for face detection, face mesh, gaze estimation, and emotion recognition), class label files, and any visual assets required by the on-frame overlays.

Use cases

Audience measurement for products placed in a shop window or behind glass.
Measuring viewer engagement for in-store promotional stands and shelves.
Evaluating the impact of digital signage content on passers-by.
Running A/B tests on product placement or creative content by comparing traffic, attention, and emotional response.

How to navigate this documentation

Use the Namespaces tab to explore modules such as visualization and core.
Use the Classes tab to inspect details of classes like AudienceMeasurementEngine, EngineBase, EmotionClassifier, and the gaze-related components.
Use the Files tab to see the public headers that make up the engine API.

Together, these pages give a detailed view of how the Audience Measurement Engine is implemented and how to integrate it into your own applications.