RESEARCH
Multilingual AI // Computer Vision // NLP
Multilingual Referring Expression Comprehension
Master's Thesis • Instituto Superior Técnico, Lisbon • 2024-2025
Overview
This research addresses a gap in multilingual referring expression comprehension, the task of localizing objects in images from natural language descriptions, by developing AI systems that perform it across multiple languages.
The project demonstrates that effective multilingual referring expression comprehension can be achieved through strategic dataset expansion and architecture design, enabling more inclusive AI systems accessible to non-English speakers worldwide.
Key Contributions
- Multilingual Dataset
Unified corpus spanning 10 languages: English, Portuguese, Spanish, French, German, Dutch, Italian, Korean, Chinese, and Russian. Contains 8 million referring expressions, 70,000 images, and 346,000 annotated objects. Built by expanding 12 existing English benchmarks.
- Neural Architecture
Attention-anchored approach using frozen multilingual SigLIP2 encoders. Generates spatial anchors from attention distributions, refined through learned residuals for precise object localization.
- Comprehensive Evaluation
Designed evaluation pipeline measuring model performance across languages and metrics, enabling systematic analysis of cross-lingual capabilities.
- Open Resources
Published complete dataset, model weights, and evaluation code for community use on GitHub and Hugging Face platforms.
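The attention-anchored idea from the Neural Architecture item above can be illustrated with a minimal sketch. The source only states that spatial anchors are generated from attention distributions and then refined through learned residuals. Everything else here is an assumption made for illustration: the helper names (`softmax`, `attention_anchor`, `refine`), the choice of an attention-weighted mean and spread to form the anchor, the residual parameterization, and the hand-written scores that stand in for real SigLIP2 encoder outputs.

```python
from math import exp


def softmax(scores):
    """Convert raw patch scores into an attention distribution that sums to 1."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_anchor(attn, grid_h, grid_w):
    """Build a box (cx, cy, w, h) in normalized [0, 1] coordinates from an
    attention distribution over a grid_h x grid_w patch grid (row-major).

    Assumption: the center is the attention-weighted mean of patch centers and
    the size is four weighted standard deviations, at least one patch wide.
    """
    centers = []
    for i in range(grid_h * grid_w):
        row, col = divmod(i, grid_w)
        centers.append(((col + 0.5) / grid_w, (row + 0.5) / grid_h))

    cx = sum(a * x for a, (x, _) in zip(attn, centers))
    cy = sum(a * y for a, (_, y) in zip(attn, centers))
    var_x = sum(a * (x - cx) ** 2 for a, (x, _) in zip(attn, centers))
    var_y = sum(a * (y - cy) ** 2 for a, (_, y) in zip(attn, centers))

    w = min(max(4 * var_x ** 0.5, 1 / grid_w), 1.0)
    h = min(max(4 * var_y ** 0.5, 1 / grid_h), 1.0)
    return (cx, cy, w, h)


def refine(anchor, residual):
    """Apply a residual (dx, dy, dw, dh) to an anchor box.

    Assumption: center shifts are scaled by the box size and sizes are scaled
    exponentially, a common box-regression parameterization. In a trained
    model the residual would be predicted by a learned head.
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = residual
    return (cx + dx * w, cy + dy * h, w * exp(dw), h * exp(dh))


# Toy example: a 4x4 patch grid where attention peaks on patch (row 1, col 1).
scores = [0.0] * 16
scores[5] = 8.0
attn = softmax(scores)
anchor = attention_anchor(attn, 4, 4)
box = refine(anchor, (0.1, -0.1, 0.2, 0.0))  # made-up residual values
print(anchor, box)
```

In this toy setup the anchor lands close to the center of the peaked patch, and the residual only nudges it. Here the anchor supplies a coarse location and the residual a small correction, which is one plausible reading of "refined through learned residuals".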
Performance Metrics
Available Resources
Publication
APA Citation
Nogueira, F. R., Bernardino, A., & Martins, B. (2025).
Comprehension of Multilingual Expressions Referring to Target Objects in Visual Inputs.
arXiv preprint arXiv:2511.11427.
BibTeX
@misc{nogueira2025comprehensionmultilingualexpressionsreferring,
title={Comprehension of Multilingual Expressions Referring to Target Objects in Visual Inputs},
author={Francisco Nogueira and Alexandre Bernardino and Bruno Martins},
year={2025},
eprint={2511.11427},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.11427},
}
Research Interests
Multilingual NLP
Developing AI systems that work across languages, enabling more inclusive and accessible technology for diverse global communities.
Computer Vision
Developing vision systems that bridge language and visual understanding, enabling machines to interpret and reason about visual content through natural language interactions.
Previous Research Experience
Research Assistant - Data Analysis
Universidade de São Paulo & Universidade do Rio de Janeiro • Aug 2021 - Aug 2022
Applied topological data analysis to multivariate biological datasets for epidemiological research. Developed data visualizations and analysis pipelines for understanding complex biological patterns.