|
Title:
|
ARTIFICIAL INTELLIGENCE (AI) ASSISTANTS' EVALUATION OF ENVIRONMENTAL, SOCIAL, AND GOVERNANCE (ESG): HOW CONSISTENT AND RELIABLE ARE THE ASSISTANTS? |
|
Author(s):
|
Victor K. Y. Chan |
|
ISBN:
|
978-989-8704-62 |
|
Editors:
|
Paula Miranda and Pedro Isaías |
|
Year:
|
2024 |
|
Edition:
|
Single |
|
Keywords:
|
Environmental, Social, Governance, ESG, Generative Artificial Intelligence (AI), S&P 500 |
|
Type:
|
Full |
|
First Page:
|
285 |
|
Last Page:
|
292 |
|
Language:
|
English |
|
Paper Abstract:
|
This paper investigates how consistent, and thus how reliable, individual popular generative artificial intelligence (AI)
assistants are in evaluating the environmental, social, and governance (ESG) performance of the top companies/stocks
in the S&P 500. The three assistants employed in the underlying study were Meta Llama, Google PaLM, and
Microsoft Copilot, which were independently asked to award rating scores for the three ESG performance
components, namely, (1) Environmental, (2) Social, and (3) Governance, of the top 40 companies/stocks in the S&P
500. For each of the three assistants, the minimum, the maximum, the range, and the standard deviation of the rating
scores for each of the three components were calculated across all 40 companies/stocks. The rating score difference
for each of the three components between each pair of the three assistants was computed for each company/stock.
The mean of the absolute value, the minimum, the maximum, the range, and the standard deviation of these differences for
each component between each pair of assistants were calculated across all the companies/stocks. A paired-sample t-test
was then applied, for each component, to the rating score differences between each assistant pair over all the
companies/stocks. Finally, Cronbach's coefficient alpha of the rating scores was computed for each of the three
components across the three assistants and all the companies/stocks. These computational results were intended to indicate
whether the three assistants discriminated among the companies/stocks in evaluating each component, whether
each assistant, vis-à-vis each other assistant, erratically or systematically overrated or underrated any component across the
companies/stocks, and whether the three assistants were consistent and reliable in evaluating each component across the
companies/stocks. Apart from some ancillary results, it was affirmed that the three assistants were marginally consistent
and thus reliable, at least in a sense analogous to convergent validity and internal consistency, in evaluating all three
components of the top 40 companies/stocks in the S&P 500. |
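The statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the function names and the rating scores are hypothetical, the rating scale is assumed, and Cronbach's alpha is computed with the standard formula, treating each assistant as an "item" and each company/stock as a "subject".

```python
from math import sqrt
from statistics import mean, stdev, variance


def mean_abs_difference(x, y):
    """Mean absolute rating difference between two assistants."""
    return mean([abs(a - b) for a, b in zip(x, y)])


def paired_t_statistic(x, y):
    """t statistic of a paired-sample t-test on the differences x - y.

    A large |t| suggests one assistant systematically rates higher
    or lower than the other. Undefined if all differences are equal.
    """
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def cronbach_alpha(raters):
    """Cronbach's alpha; `raters` is a list of equal-length score lists."""
    k = len(raters)
    item_variance_sum = sum(variance(r) for r in raters)
    # Total score per company/stock, summed over all assistants.
    totals = [sum(scores) for scores in zip(*raters)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for one ESG component over five companies/stocks
# (illustrative only; not data from the paper).
llama = [7, 8, 6, 9, 5]
palm = [6, 8, 7, 9, 4]
copilot = [7, 7, 6, 8, 5]

print("mean |diff| Llama vs PaLM:", mean_abs_difference(llama, palm))
print("paired t Llama vs PaLM:", paired_t_statistic(llama, palm))
print("Cronbach's alpha:", cronbach_alpha([llama, palm, copilot]))
```

In this sketch only the t statistic is computed; obtaining a p-value would additionally require the t distribution with n - 1 degrees of freedom, for example from a statistics library.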
|