
Self-supervised vision-language alignment of deep learning representations for bone X-ray analysis


  • In the medical domain, particularly in radiography, large-scale datasets are generally limited to English reports and to specific body areas. Data annotation is expensive and scarce, which hinders the training of supervised deep learning models.

  • To explore vision-language pretraining using bone X-rays paired with French reports sourced from a single university hospital department.

  • We leveraged bone radiographs and their associated French reports from a single hospital to pretrain a versatile vision-language model, intended as a backbone for a variety of downstream tasks trained with limited supervision (see the sketch after this list).

  • The resulting multi-modal representation yields downstream task performance competitive with models trained with a significantly larger amount of human supervision.
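
A common way to implement this kind of self-supervised vision-language alignment is a CLIP-style symmetric contrastive objective between image and report embeddings. The minimal sketch below illustrates that idea only; the encoder choices, embedding dimension, temperature, and variable names are assumptions for illustration, not the project's actual implementation.

```python
# Minimal sketch of CLIP-style contrastive alignment between X-ray and report
# embeddings. All names and hyperparameters here are hypothetical.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss pulling matched image/report pairs together."""
    # L2-normalise both modalities so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: row i should peak at column i (the matched pair).
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average image-to-text and text-to-image cross-entropy.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    # Toy usage with random tensors standing in for encoder outputs.
    batch, dim = 8, 512
    xray_features = torch.randn(batch, dim)    # vision encoder output (hypothetical)
    report_features = torch.randn(batch, dim)  # French-report text encoder output (hypothetical)
    print(contrastive_alignment_loss(xray_features, report_features))
```

Once pretrained this way, either encoder can serve as a backbone and be fine-tuned on downstream tasks with only limited labeled data.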

Project type: research

Team status: complete

Initiator: Alexandre Englebert, MD

Co-investigators: Anne-Sophie Collin, Olivier Cornu, MD & PhD, Christophe De Vleeschouwer, PhD

Funding: FNRS/FRIA grant to Alexandre Englebert for his PhD

Project status: done
