How Good Lusophones are Data Science LLM Agents?: Evaluating Agentic Approaches for Data Science in Portuguese Contexts
Large Language Models; Data Science; Machine Learning; Portuguese; Natural Language Processing.
Data science (DS) and data analysis are complex fields that often require highly specialized professionals and time-intensive methods. Analyzing data and building predictive models commonly demand intricate planning and reasoning, capabilities once considered exclusive to humans. Recent advances in large language models (LLMs), such as complex reasoning and tool use, have challenged this notion, as reflected in the numerous recent studies that effectively apply tool-using LLM agents to data analysis and machine learning tasks. These developments are not as accessible as one might hope, however: most frameworks and evaluations rely exclusively on English prompts, data, and metadata. In particular, LLM-automated DS in Portuguese contexts remains largely unassessed in the literature. To address this gap, we aim to evaluate how capable language models are at conducting data analysis in Portuguese-language contexts. To that end, we will develop a new evaluation set for automated data science in Portuguese and use it to assess LLMs' accuracy and linguistic consistency when performing exploratory data analysis (EDA) and machine learning engineering (MLE). Additionally, we will explore a novel approach to assessing agents on automated exploratory analysis: ranking their analyses by the improvement they provide to the subsequent task of automated MLE, compared to an EDA-free baseline.
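The ranking idea in the last sentence can be illustrated with a minimal sketch. It assumes a single downstream MLE metric where higher is better, and it takes the "improvement" to be a simple difference from the EDA-free baseline; the abstract does not specify either choice. The function name, agent names, and scores below are hypothetical placeholders, not results.

```python
from typing import Dict, List, Tuple


def rank_by_eda_gain(
    baseline_score: float,
    scores_with_eda: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank agents by how much their EDA improves the downstream MLE score.

    Assumption: higher scores are better, and gain = score - baseline.
    """
    gains = {
        agent: score - baseline_score
        for agent, score in scores_with_eda.items()
    }
    # Largest gain first; a negative gain means the EDA hurt the MLE step.
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder values, for illustration only.
example_ranking = rank_by_eda_gain(
    baseline_score=0.70,
    scores_with_eda={"agent_a": 0.78, "agent_b": 0.69, "agent_c": 0.74},
)
for agent, gain in example_ranking:
    print(f"{agent}: {gain:+.2f}")
```

For metrics where lower is better (for example, an error measure), the sign of the gain would need to be flipped under the same assumptions.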