This study evaluates the performance of two generative pre-trained transformer (GPT) models, ChatGPT 4o and Gemini Pro 1.5, in responding to a banking sector outlook survey designed to support supervisory policy assessment. The study is structured around three main exercises. The first classifies the survey questions related to the analysis of financial risks, challenges, and outlooks. The second employs prompt engineering to mitigate biases and address potential hallucinations in the models. The third constructs Cosine Similarity Indices (CSIs) to compare the responses of the GPT models with those of banks.
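As a point of reference, and assuming the CSIs rest on the standard cosine similarity (the precise index construction is detailed in the paper), the measure compares vector representations $\mathbf{a}$ and $\mathbf{b}$ of a GPT response and a bank response, where $a_i$ and $b_i$ denote the $n$ components of these illustrative vectors:

$$\mathrm{CSI}(\mathbf{a},\mathbf{b}) \;=\; \cos\theta \;=\; \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert} \;=\; \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\;\sqrt{\sum_{i=1}^{n} b_i^2}}$$

Values close to 1 indicate that the two responses are closely aligned in the chosen vector space, while values near 0 indicate little overlap.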
The findings suggest that both models can serve as valuable tools for assessing a bank’s financial performance, risks, and macroeconomic outlook. These results underscore their potential to complement a supervisor’s analytical tasks. However, the observed differences in responses may point to the complex and potentially non-linear relationships that underlie bank performance, thereby reinforcing the continued importance of human judgment in bank supervision.