The diversity of sign representation is essential for Sign Language Production (SLP), as it captures variations in appearance, facial expressions, and hand movements. However, existing SLP models often fail to capture this diversity while preserving visual quality and modelling non-manual attributes such as emotions. To address this problem, we propose a novel approach that leverages a Latent Diffusion Model (LDM) to synthesise photorealistic digital avatars from a generated reference image. We further introduce a sign feature aggregation module that explicitly models the non-manual features (e.g., the face) and the manual features (e.g., the hands). We show that this module preserves the linguistic content while seamlessly using reference images of signers with different ethnic backgrounds to ensure diversity. Experiments on the YouTube-SL-25 sign language dataset show that our pipeline achieves superior visual quality compared to state-of-the-art methods, with significant improvements on perceptual metrics.
Given a sequence of sign language video frames \(\mathcal{V} = \{ \mathbf{V}_i \}\), our goal is to synthesise a diverse output sequence \(\mathcal{O} = \{ \mathbf{O}_i \}\) that faithfully preserves both the manual and non-manual linguistic features while allowing for variation across different signer appearances. Our feature aggregation module, \(\Psi_{\text{motion}}\), uses multi-scale dilated convolutions with dilation rates \(d \in \{1, 2, 4\}\) to fuse fine-grained non-manual details (e.g., facial expressions) and coarse manual gestures (e.g., hand movements) into a unified representation. The LDM then generates each frame \(\mathbf{O}_i\) through an iterative denoising process in the latent space, guided by the aggregated features from \(\Psi_{\text{motion}}\), enabling the synthesis of signers with diverse ethnic and visual characteristics.
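To make the aggregation step concrete, below is a minimal PyTorch sketch of the multi-scale dilated-convolution fusion described above: face and hand feature maps are concatenated, passed through parallel branches with dilation rates 1, 2, and 4, and merged into a single conditioning map for the LDM. The module name, input names, and channel sizes (MotionAggregator, face_feat, hand_feat, 256 channels) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MotionAggregator(nn.Module):
    """Sketch of a module fusing non-manual (face) and manual (hand) feature
    maps with multi-scale dilated convolutions (dilation rates 1, 2, 4)."""

    def __init__(self, in_channels: int = 256, out_channels: int = 256):
        super().__init__()
        # One branch per dilation rate; padding = dilation keeps spatial size.
        self.branches = nn.ModuleList([
            nn.Conv2d(2 * in_channels, out_channels, kernel_size=3,
                      padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        # 1x1 convolution merges the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(3 * out_channels, out_channels, kernel_size=1)
        self.act = nn.SiLU()

    def forward(self, face_feat: torch.Tensor, hand_feat: torch.Tensor) -> torch.Tensor:
        # face_feat, hand_feat: (B, C, H, W) feature maps of the non-manual
        # and manual streams respectively.
        x = torch.cat([face_feat, hand_feat], dim=1)
        multi_scale = [self.act(branch(x)) for branch in self.branches]
        return self.fuse(torch.cat(multi_scale, dim=1))

if __name__ == "__main__":
    agg = MotionAggregator(in_channels=256)
    face = torch.randn(1, 256, 32, 32)   # e.g. facial-expression features
    hands = torch.randn(1, 256, 32, 32)  # e.g. hand-gesture features
    cond = agg(face, hands)              # unified conditioning signal for the LDM
    print(cond.shape)                    # torch.Size([1, 256, 32, 32])

In this sketch the fused map would serve as the guidance signal injected into the LDM's denoising steps; the actual conditioning mechanism (e.g., cross-attention or concatenation in the latent space) is not specified here and would follow the authors' pipeline.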
@article{diverse_sign,
  title={Diverse Signer Avatars with Manual and Non-Manual Feature Modelling for Sign Language Production},
  author={Mohamed Ilyes Lakhal and Richard Bowden},
  journal={arXiv},
  year={2026}
}