A deep learning approach for automatic video coding of deictic gestures in children with autism
Mastrogiuseppe, Marilina
2023-01-01
Abstract
Autism is a heterogeneous neurodevelopmental condition characterized by impairments in social communication, along with restrictive and repetitive patterns of interests and behaviors and sensory atypicalities. Early impairments in gestural communication, especially in deictic gestures, are significantly associated with autism and are strong predictors of language development. Although the role of deictic gestures in autism has been acknowledged, it has not been sufficiently explored by artificial intelligence. To address this gap, this paper proposes an automatic digital coding approach based on deep learning models. Using a transformer architecture, a multi-frame modeling strategy was implemented and applied to 37 video clips of naturalistic mother-child interactions, with the aim of recognizing four main deictic gestures: pointing, giving, showing, and requesting. The system was trained and validated on 31 clips, internally tested on 6 clips, and externally tested on 5 additional clips, using Python. The preprocessing phase uses a 1024-dimensional feature extractor based on DenseNet pretrained on ImageNet. Preliminary results showed 100% accuracy on the training set, 80% on the validation set, and 67% on the internal testing set. These findings suggest that the proposed system is a promising approach for the automatic analysis of deictic gestures. In future work, we plan to validate the model on a larger number of samples to achieve higher and more reliable performance.
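The pipeline described in the abstract (ImageNet-pretrained DenseNet frame features feeding a transformer encoder for multi-frame modeling, followed by a four-way gesture classifier) can be sketched as below. This is a hypothetical reconstruction, not the authors' released code: the backbone variant (DenseNet-121, whose final feature map happens to be 1024-dimensional), clip length, positional embedding, pooling, and layer counts are all assumptions; only the 1024-d DenseNet/ImageNet extractor, the transformer architecture, and the four gesture classes come from the abstract.

```python
# Minimal hypothetical sketch of the described pipeline:
# per-frame DenseNet-121 features (1024-d, ImageNet-pretrained)
# -> transformer encoder over the frame sequence -> 4-way gesture head.
import torch
import torch.nn as nn
from torchvision import models

GESTURES = ["pointing", "giving", "showing", "requesting"]

class DeicticGestureClassifier(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, n_layers=2, max_frames=32):
        super().__init__()
        # DenseNet-121's final feature map is 1024-dimensional, matching the
        # "1024 feature extractor based on DenseNet" in the abstract.
        backbone = models.densenet121(
            weights=models.DenseNet121_Weights.IMAGENET1K_V1)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Learned positional embedding so the encoder sees frame order
        # (an assumption; the paper does not specify this detail).
        self.pos = nn.Parameter(torch.zeros(1, max_frames, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, len(GESTURES))

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t, c, h, w = clips.shape
        x = self.features(clips.reshape(b * t, c, h, w))
        x = self.pool(torch.relu(x)).flatten(1)    # (B*T, 1024) per frame
        x = x.reshape(b, t, -1) + self.pos[:, :t]  # restore the time axis
        x = self.encoder(x)                        # multi-frame modeling
        return self.head(x.mean(dim=1))            # pool frames, classify

model = DeicticGestureClassifier()
logits = model(torch.randn(2, 16, 3, 224, 224))    # 2 clips of 16 frames
print(logits.shape)                                # torch.Size([2, 4])
```

Averaging frame embeddings before the classification head is one simple readout; a CLS token or attention pooling would be equally plausible given the information in the abstract.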
File | Description | Type | License | Size | Format
---|---|---|---|---|---
A_deep_learning_approach_for_automatic_video_coding_of_deictic_gestures_in_children_with_autism.pdf | Contribution (closed access) | Published version | Publisher copyright | 946.99 kB | Adobe PDF