GRADUATE PROGRAM IN COMPUTER SCIENCE

Phone/Extension: (35) 3829-5195/5195
E-mail: posgrad_si.icet@ufla.br
News

DEFENSE committee: JOAO PAULO PAIVA LIMA

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: JOAO PAULO PAIVA LIMA
DATE: 22/05/2026
TIME: 14:00
LOCATION: meet.google.com/row-mwxp-sdv
TITLE:
How Good Lusophones are Data Science LLM Agents?: Evaluating Agentic Approaches for Data Science in Portuguese Contexts

KEYWORDS:

Large Language Models; Data Science; Machine Learning; Portuguese; Natural Language Processing.


PAGES: 109
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computing Methodology and Techniques
ABSTRACT:
Data Science (DS), a complex field that often involves advanced concepts from statistics, mathematics, and programming, has historically been restricted to deeply knowledgeable human professionals. Recent advances in artificial reasoning and the expanding toolset of Large Language Models (LLMs) have challenged this paradigm. More specifically, agentic frameworks, which provide LLMs with tools such as code execution and document search, have pushed automated DS performance to near-human levels. However, current research on LLM-based methods for automated DS is still heavily biased towards the English language. In this work, we assess agentic DS when the prompts, data, and metadata are in Portuguese, in comparison to English. To this end, we built on previous work to develop DataExplainer, an agentic framework for generating DS solutions with step-by-step explanations in the target language. We used this framework to evaluate four different LLMs on a set of seven translated Kaggle competitions. All assessed LLMs produced coherent Portuguese solutions (more than 97% of the generated text was in the correct language), with the agent powered by GPT-OSS surpassing the average Kaggle user in five of the seven evaluated competitions. However, a performance gap remains: mean scores across models were 10.1% lower for Portuguese than for English. Additionally, the data indicate a larger gap for code-tuned models and a smaller gap for thinking models.
 

COMMITTEE MEMBERS:
External to the Institution - WLADMIR CARDOSO BRANDÃO - PUC-MG (Member)
Internal - MARLUCE RODRIGUES PEREIRA (Member)
Internal - LUIZ HENRIQUE DE CAMPOS MERSCHMANN (Member)
Internal - ELAINE CECILIA GATTO (Alternate)
Chair - DENILSON ALVES PEREIRA (Member)
External to the Institution - ANDERSON ALMEIDA FERREIRA - UFOP (Alternate)
News item registered on: 11/05/2026 07:58