M. Oliveira1, B.J. Duarte1, R. Escarpini1, L. Silva1, B. Almeida Pimentel1, J. Fusco Lobo2, N. Cruz1, R. de Amorim Silva1
Disagreements among individuals are common in everyday life. However, when these differences arise in structured evaluation tasks—such as digital textbook assessments—their consequences can be significant. This is particularly true in large-scale public programs like Brazil’s Programa Nacional do Livro e do Material Didático (PNLD), the National Textbook and Teaching Materials Program. Managed by the Ministry of Education, PNLD is one of the largest textbook distribution programs in the world. It is responsible for acquiring, evaluating, and distributing instructional materials to millions of students and educators across Brazilian public schools.
Each textbook submitted to the PNLD undergoes a thorough evaluation process involving multiple reviewers who assess pedagogical quality, curricular alignment, and technical aspects of the material. However, divergent reviews often emerge during this process, which can hinder the effectiveness of decision-making. These discrepancies not only compromise the reliability of the selection process but can also generate rework, delay approvals, and affect the financial negotiations involved in the procurement of educational content.
In editorial contexts where review outcomes determine pricing or market access, subjective or inconsistent evaluations can result in biased decisions and financial inefficiencies. Therefore, strategies that provide transparent, data-driven support for resolving evaluation impasses are crucial. This study responds to that need by proposing an automated, metric-based method designed to enhance objectivity and improve confidence in final decisions.
This study proposes an automated method, the Automated Minerva Vote (AVM), to support decision-making in cases of impasse between reviewers. The method introduces quantitative metrics for assessing the reliability of the reviews issued and their degree of alignment with final decisions.
The AVM was developed using real data drawn from a digital textbook evaluation process within the Brazilian National Textbook Program and was implemented in Python. Three main metrics were defined to support the selection of the most reliable review: (i) Magnitude of Divergence, which measures the distance between the reviews; (ii) Relative Weight, which assesses the reviewer's history of alignment with previous final decisions; and (iii) Accuracy Rate, which calculates the frequency with which a reviewer's assessments coincided with the final outcome of the process. These metrics were calculated individually for each reviewer and exported to structured CSV files, enabling subsequent analysis and the creation of informative dashboards.
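To make the three metrics concrete, the Python sketch below shows one way they could be computed per reviewer and exported to CSV. The field names (reviewer, score, final_score), the use of mean absolute distance as the magnitude of divergence, and the normalization used for the relative weight are illustrative assumptions, not the exact definitions of the AVM implementation described above.

```python
import csv
from collections import defaultdict

# Illustrative sketch: field names and formulas are assumptions, not the
# exact definitions used by the AVM implementation described in this study.

def compute_reviewer_metrics(rows):
    """Aggregate per-reviewer metrics from evaluation records.

    Each row is a dict with hypothetical keys 'reviewer', 'score' (the review
    issued) and 'final_score' (the final outcome of that evaluation process).
    """
    per_reviewer = defaultdict(list)
    for row in rows:
        per_reviewer[row["reviewer"]].append(
            (float(row["score"]), float(row["final_score"]))
        )

    metrics = {}
    for reviewer, pairs in per_reviewer.items():
        n = len(pairs)
        # Magnitude of Divergence: mean absolute distance between the review
        # and the final outcome (assumed formulation).
        divergence = sum(abs(s - f) for s, f in pairs) / n
        # Accuracy Rate: fraction of reviews that coincided exactly with the
        # final outcome (assumes scores on a common discrete scale).
        accuracy_rate = sum(1 for s, f in pairs if s == f) / n
        metrics[reviewer] = {"divergence": divergence, "accuracy_rate": accuracy_rate}

    # Relative Weight: assumed here to be the reviewer's accuracy normalized
    # across all reviewers, so more historically aligned reviewers weigh more.
    total_accuracy = sum(m["accuracy_rate"] for m in metrics.values()) or 1.0
    for m in metrics.values():
        m["relative_weight"] = m["accuracy_rate"] / total_accuracy
    return metrics


def export_metrics(metrics, path="reviewer_metrics.csv"):
    """Write the per-reviewer metrics to a structured CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["reviewer", "divergence", "accuracy_rate", "relative_weight"])
        for reviewer, m in sorted(metrics.items()):
            writer.writerow([
                reviewer,
                f"{m['divergence']:.3f}",
                f"{m['accuracy_rate']:.3f}",
                f"{m['relative_weight']:.3f}",
            ])


if __name__ == "__main__":
    sample = [
        {"reviewer": "A", "score": 4, "final_score": 4},
        {"reviewer": "A", "score": 2, "final_score": 3},
        {"reviewer": "B", "score": 3, "final_score": 3},
        {"reviewer": "B", "score": 5, "final_score": 3},
    ]
    export_metrics(compute_reviewer_metrics(sample))
```

In this sketch the resulting CSV has one row per reviewer, which matches the per-reviewer aggregation and dashboard-oriented export described above; the specific weighting scheme would need to be replaced by the one actually adopted in the AVM.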
The application of the AVM revealed recurring patterns of divergence among certain reviewers. It was possible to identify which reviews were more or less consistent with the consensus decisions, enabling better-targeted interventions by editorial teams. The approach proved to be a useful and scalable tool for supporting the resolution of impasses and improving the reliability of the review process.
The use of the AVM represents a step forward in the pursuit of more objective and impartial reviews in editorial contexts. It is argued that the model can increase transparency, mitigate bias, and reduce losses caused by inconsistent judgments. The model is also extensible: it can be adapted to contexts with more than two reviewers and applied to other areas, such as peer review in scientific journals and the selection of cultural projects.
Keywords: Educational Evaluation, Textbook Assessment, Digital Textbooks, Public Policy in Education, National Textbook Program (PNLD).