RESEARCH

Multilingual AI // Computer Vision // NLP


🔬

Multilingual Referring Expression Comprehension

Master's Thesis • Instituto Superior Técnico, Lisbon • 2024-2025

Overview

This research addresses a significant gap in multilingual referring expression comprehension by developing AI systems that can localize objects in images based on natural language descriptions across multiple languages.

The project demonstrates that effective multilingual referring expression comprehension can be achieved through strategic dataset expansion and architecture design, enabling more inclusive AI systems accessible to non-English speakers worldwide.

Key Contributions

  • Multilingual Dataset

    Unified corpus spanning 10 languages: English, Portuguese, Spanish, French, German, Dutch, Italian, Korean, Chinese, and Russian. Contains 8 million referring expressions, 70,000 images, and 346,000 annotated objects. Built by expanding 12 existing English benchmarks.

  • Neural Architecture

    Attention-anchored approach using frozen multilingual SigLIP2 encoders. Generates spatial anchors from attention distributions, refined through learned residuals for precise object localization.

  • Comprehensive Evaluation

    Designed evaluation pipeline measuring model performance across languages and metrics, enabling systematic analysis of cross-lingual capabilities.

  • Open Resources

    Published complete dataset, model weights, and evaluation code for community use on GitHub and Hugging Face platforms.
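The attention-anchored idea in the Neural Architecture item above can be illustrated with a small sketch. This is not the thesis implementation: it assumes the attention distribution is available as a 2D grid of weights over image patches, derives the anchor from the attention-weighted mean and standard deviation, and applies a residual of the form (dx, dy, dw, dh). The function names, the `spread` factor, and the residual parameterization are all hypothetical choices made for illustration.

```python
import math


def attention_anchor(attn, spread=4.0):
    """Turn a 2D attention map into a rough (cx, cy, w, h) box in [0, 1] coordinates.

    attn:   list of rows of non-negative weights over an H x W patch grid.
    spread: hypothetical factor mapping the attention standard deviation
            to a box side length (an assumption, not a value from the thesis).
    """
    h, w = len(attn), len(attn[0])
    total = sum(sum(row) for row in attn)
    if total <= 0:
        raise ValueError("attention map must have positive mass")

    # Patch centers in normalized image coordinates.
    xs = [(j + 0.5) / w for j in range(w)]
    ys = [(i + 0.5) / h for i in range(h)]

    # The attention-weighted mean gives the anchor center.
    cx = cy = 0.0
    for i in range(h):
        for j in range(w):
            p = attn[i][j] / total
            cx += p * xs[j]
            cy += p * ys[i]

    # The attention-weighted variance gives the anchor extent.
    var_x = var_y = 0.0
    for i in range(h):
        for j in range(w):
            p = attn[i][j] / total
            var_x += p * (xs[j] - cx) ** 2
            var_y += p * (ys[i] - cy) ** 2

    # Clamp each side between one patch and the full image.
    bw = min(1.0, max(1.0 / w, spread * math.sqrt(var_x)))
    bh = min(1.0, max(1.0 / h, spread * math.sqrt(var_y)))
    return cx, cy, bw, bh


def refine(anchor, residual):
    """Apply a residual (dx, dy, dw, dh) to an anchor box.

    Center offsets are scaled by the anchor size and size offsets are applied
    in log space, a common box-regression convention assumed here.
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = residual
    return cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh)
```

In a setup like this the anchor already lands near the attended region, so a learned head would only need to predict a small correction instead of regressing box coordinates from scratch.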

Performance Metrics

  • 86.9%: Accuracy at IoU@50 (RefCOCO Multilingual)
  • 2-4%: Gap from English (Romance Languages)
  • <8%: Performance Variance (Across Language Families)
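For readers unfamiliar with the metric, accuracy at IoU@50 counts a prediction as correct when the intersection-over-union between the predicted and ground-truth boxes reaches 0.5. The sketch below shows one way to compute it per language. It assumes boxes in (x1, y1, x2, y2) format and treats IoU >= 0.5 as a hit (some benchmarks use a strict >). The function names and the sample layout are illustrative and are not taken from the published evaluation code.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(samples, threshold=0.5):
    """Per-language fraction of predictions whose IoU meets the threshold.

    samples: iterable of (language, predicted_box, ground_truth_box) tuples.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for lang, pred, gt in samples:
        counts[lang] += 1
        if iou(pred, gt) >= threshold:
            hits[lang] += 1
    return {lang: hits[lang] / counts[lang] for lang in counts}
```

Grouping by language in this way makes it straightforward to compare each language against English, which is how a gap such as the one reported for Romance languages would be read off.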

Publication

APA Citation

Nogueira, F. R., Bernardino, A., & Martins, B. (2025). Comprehension of Multilingual Expressions Referring to Target Objects in Visual Inputs. arXiv preprint arXiv:2511.11427.

BibTeX

@misc{nogueira2025comprehensionmultilingualexpressionsreferring,
      title={Comprehension of Multilingual Expressions Referring to Target Objects in Visual Inputs},
      author={Francisco Nogueira and Alexandre Bernardino and Bruno Martins},
      year={2025},
      eprint={2511.11427},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.11427},
}

Research Interests

Multilingual NLP

Developing AI systems that work across languages, enabling more inclusive and accessible technology for diverse global communities.

Computer Vision

Developing vision systems that bridge language and visual understanding, enabling machines to interpret and reason about visual content through natural language interactions.

Previous Research Experience

Research Assistant – Data Analysis

Universidade de São Paulo & Universidade do Rio de Janeiro • Aug 2021 - Aug 2022

Applied topological data analysis to multivariate biological datasets for epidemiological research. Developed data visualizations and analysis pipelines for understanding complex biological patterns.