VERIFICATION OF THE INTER-RATER RELIABILITY IN THE NATIONAL UNIVERSITY CORPORATION EVALUATION IN JAPAN
1 National Institution for Academic Degrees and Quality Enhancement of Higher Education (JAPAN)
2 Bunkyo University (JAPAN)
3 Toyohashi University of Technology (JAPAN)
About this paper:
Appears in: INTED2023 Proceedings
Publication year: 2023
Page: 1051 (abstract only)
ISBN: 978-84-09-49026-4
ISSN: 2340-1079
doi: 10.21125/inted.2023.0312
Conference name: 17th International Technology, Education and Development Conference
Dates: 6-8 March, 2023
Location: Valencia, Spain
Abstract:
There is an increasing demand for university evaluation to assure the quality of higher education. Maintaining the quality of an evaluation system requires a sound quality assurance system, of which the peer-review process is an integral part; that process should therefore be made visible and verifiable. However, few studies have examined the peer-review judgment process. This paper examined the extent to which research performance ratings in university evaluations are consistent across multiple evaluators.

An extract from the evaluation and self-assessment reports on the judgment of the level of research achievement at national universities in FY2020 was analyzed. External evaluators judged research achievement using the “Research Achievement Statement” submitted by each university. The evaluation was made from two perspectives, “academic significance” and “social, economic, and cultural significance,” on a three-point scale: “SS,” “S,” or “less than S.” Each evaluation report was rated by at least two external evaluators assigned to the relevant area. Because some academic fields received fewer research achievements than required for quantitative examination, the analysis included only combinations with 20 or more research achievements. Consequently, 248 academic fields were analyzed for “academic significance” and 94 for “social, economic, and cultural significance.” The weighted Cohen's kappa coefficient, which ranges from −1.0 to 1.0 with higher values indicating greater agreement in judgments, was used as the indicator of inter-rater reliability on this ordinal scale.
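
The following is a minimal sketch of this reliability computation in Python, assuming linear weights (the abstract does not state whether linear or quadratic weighting was used) and illustrative ratings for 20 research achievements encoded on the ordinal scale (0 = “less than S,” 1 = “S,” 2 = “SS”); cohen_kappa_score is from scikit-learn:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical ratings by two external evaluators for one academic field
    # (20 research achievements, matching the inclusion criterion above).
    rater_a = [2, 1, 1, 0, 2, 1, 0, 1, 2, 1, 1, 0, 1, 2, 1, 0, 1, 1, 2, 0]
    rater_b = [2, 1, 0, 0, 1, 1, 0, 2, 2, 1, 0, 0, 1, 2, 1, 1, 1, 0, 2, 0]

    # Weighted kappa penalizes distant disagreements (e.g., "SS" vs. "less
    # than S") more than adjacent ones (e.g., "SS" vs. "S"); range -1.0 to 1.0.
    kappa = cohen_kappa_score(rater_a, rater_b, labels=[0, 1, 2], weights="linear")
    print(f"weighted kappa = {kappa:.3f}")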

Results showed that the most frequently observed agreement level was “Fair (0.2 < κ <= 0.4),” followed by “Slight (0.0 < κ <= 0.2)” and “No agreement (κ <= 0.0).” Regarding differences by academic field, agreement among raters was high in traditional fields (e.g., science, pharmacy) but lower in interdisciplinary fields. In conclusion, overall ratings tended to agree fairly well with each other. Even when evaluators disagreed, their evaluations were not necessarily rejected, possibly reflecting a balancing act to avoid bias toward fields represented by the same keywords within a single academic field. Even when conflicting results existed for the same research achievement, the results of the individual judgments tended to be consistent. These results will provide new insights into the future development of transparent evaluation methods for quality assurance agencies and universities.
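
As a reading aid, the agreement bands quoted in the results can be expressed as a small helper; the cut-points come directly from the abstract, while the label for values above 0.4 is an assumption following the conventional Landis–Koch bands:

    def agreement_band(kappa: float) -> str:
        """Map a weighted kappa value to the agreement bands used above."""
        if kappa <= 0.0:
            return "No agreement"
        if kappa <= 0.2:
            return "Slight"
        if kappa <= 0.4:
            return "Fair"
        return "Moderate or better"  # assumed label; not enumerated in the abstract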
Keywords:
University evaluation, inter-rater reliability, evaluation reports, kappa coefficient.