|
Title:
|
EXPLORING RANKING CONSISTENCY OF GENERATIVE AI IN MOOC PLATFORM EVALUATION: A NON-PARAMETRIC APPROACH |
|
Author(s):
|
Victor K. Y. Chan |
|
ISBN:
|
978-989-8704-72-6 |
|
Editors:
|
Demetrios G. Sampson, Dirk Ifenthaler and Pedro Isaías |
|
Year:
|
2025 |
|
Edition:
|
Single |
|
Keywords:
|
Generative AI, MOOC Platforms, Ranking Consistency, Relative Rankings |
|
Type:
|
Full Paper |
|
First Page:
|
53 |
|
Last Page:
|
60 |
|
Language:
|
English |
|
|
Paper Abstract:
|
This paper extends a prior study on the consistency of generative Artificial Intelligence (AI) models in evaluating Massive Open Online Course (MOOC) platforms. While the original work focused on the consistency of direct numerical scores, this research investigates the consistency of the rankings derived from those scores. When evaluating platforms, the relative order (i.e., which platform is better than another) is often more critical to a decision-maker than the absolute scores, which may be subject to systematic biases. This study analyzes the scores of 31 MOOC platforms across eight dimensions as evaluated by two AI models, Claude+ and Dragonfly. A suite of non-parametric statistical methods is employed, including Spearman's Rank Correlation Coefficient (ρ), Kendall's Tau (τ), and the top-weighted Rank-Biased Overlap (RBO), to measure the concordance of the platform rankings produced by each model. The Wilcoxon Signed-Rank Test is used to assess systematic differences in scoring. Results indicate a moderate to strong monotonic correlation in rankings for dimensions such as (2) pedagogical design, (1) content/course quality, and (6) learner engagement, reinforcing the original study's findings of consistency. However, the RBO analysis reveals that this agreement is weaker for the top-ranked platforms, providing a more nuanced understanding of AI evaluation consistency. The systematic scoring bias found in the original study is also reaffirmed here. This rank-based analysis offers a robust alternative to score-based comparisons, mitigating the effects of differing internal scoring scales and highlighting the practical utility of AI evaluations for comparative decision-making. By shifting the focus from absolute scores to relative rankings, this study underscores the practical value of generative AI as a decision-support tool in educational technology evaluation. The findings not only enhance methodological rigor in AI-based assessments but also provide actionable insights for learners and institutions navigating an increasingly complex MOOC landscape. |
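The non-parametric measures named in the abstract can be sketched in a few lines of pure Python. This is a minimal illustration, not the paper's analysis: the platform scores below are hypothetical, the implementations assume no tied scores, and the RBO is the standard truncated top-weighted formulation with persistence parameter p (overlap at shallow depths is weighted most heavily):

```python
def rank(scores):
    # 1-based ranks, highest score gets rank 1 (assumes no ties)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman_rho(x, y):
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid without ties
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    # tau-a: (concordant pairs - discordant pairs) / (n choose 2)
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += 1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
    return s / (n * (n - 1) / 2)

def rbo(list1, list2, p=0.9):
    # Truncated Rank-Biased Overlap: (1 - p) * sum over depths d of
    # p^(d-1) * |overlap of the top-d prefixes| / d  (top-weighted)
    k = min(len(list1), len(list2))
    score = 0.0
    for d in range(1, k + 1):
        overlap = len(set(list1[:d]) & set(list2[:d]))
        score += (p ** (d - 1)) * overlap / d
    return (1 - p) * score

# Hypothetical scores for five platforms from two models (illustrative only)
model_a = [9.1, 8.4, 7.9, 6.5, 5.2]
model_b = [8.8, 8.9, 7.1, 6.9, 5.5]

print(spearman_rho(model_a, model_b))  # 0.9
print(kendall_tau(model_a, model_b))   # 0.8

# RBO compares the ranked lists themselves (platform indices, best first)
ranked_a = sorted(range(5), key=lambda i: -model_a[i])
ranked_b = sorted(range(5), key=lambda i: -model_b[i])
print(rbo(ranked_a, ranked_b))
```

Note how the two correlation coefficients are high here even though the two models disagree on which platform is best; the truncated RBO, being top-weighted, penalizes exactly that disagreement at rank 1, which mirrors the abstract's finding that agreement is weaker among the top-ranked platforms.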
|
|
|
|
|
|