To assess rater agreement, we first calculated two reliable change indices (RCIs), one based on the test-retest reliability reported in the ELAN manual, the second based on the ICC for our study population. Note that although both reliability measures can be used to calculate the RCI, they are not equivalent in terms of accuracy and rigor. Test-retest correlations represent a very conservative estimate of the instrument's reliability (with respect to a construct that is stable over time), whereas inter-rater reliability reflects the accuracy of the rating process. The proportion of (reliable) agreement was assessed using both reliability estimates to show how the choice of reliability measure affects the evaluation and interpretation of agreement. In addition to the absolute proportion of agreement, information on the magnitude of (reliable) differences and on a possible systematic direction of the differences is also relevant to a full evaluation of rater agreement. Thus, this report considers three aspects of agreement: the percentage of ratings that differ reliably; if any do, the extent to which they differ; and the direction of the difference (i.e., a systematic response tendency of one group of raters relative to the other). In the analyses presented here, we also relate the magnitude of the differences to factors that may influence the likelihood of diverging ratings in our sample: the sex of the rated child, a bilingual family environment, and the rater subgroup. We calculated inter-rater reliability for the mother-father and the parent-teacher rating subgroups, as well as for the study population as a whole.
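The three aspects of agreement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: it uses the common reliable-change formulation in which the standard error of the difference is `sd * sqrt(2 * (1 - reliability))`, it assumes ratings are T-scores with a standard deviation of 10, and the function names and the 1.96 cut-off (two-sided, α = 0.05) are choices made for this example.

```python
import math


def critical_difference(reliability, sd=10.0, z=1.96):
    """Smallest absolute rating difference treated as reliable.

    Assumptions (not taken from the text): T-scores with sd = 10 and the
    standard reliable-change formula based on the standard error of the
    difference between two scores.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    sem = sd * math.sqrt(1.0 - reliability)   # standard error of measurement
    s_diff = math.sqrt(2.0) * sem             # standard error of a difference
    return z * s_diff


def agreement_summary(ratings_a, ratings_b, reliability, sd=10.0):
    """Summarise agreement between two raters over the same children."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    threshold = critical_difference(reliability, sd)
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    reliable = [d for d in diffs if abs(d) > threshold]
    return {
        # 1) share of rating pairs that differ reliably
        "share_reliably_different": len(reliable) / len(diffs),
        # 2) magnitude of those reliable differences
        "mean_abs_reliable_diff": (
            sum(abs(d) for d in reliable) / len(reliable) if reliable else 0.0
        ),
        # 3) direction: a mean signed difference far from 0 suggests a
        #    systematic tendency of one rater group relative to the other
        "mean_signed_diff": sum(diffs) / len(diffs),
    }
```

Calling `agreement_summary` once with a test-retest reliability and once with an inter-rater ICC makes the effect of the reliability estimate visible: the lower the reliability, the larger the critical difference and the fewer rating pairs count as reliably different.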
We calculated the intra-class correlation coefficient (ICC) as a measure of inter-rater reliability, which reflects the accuracy of the rating process, using the formula proposed by Bortz and Döring (2006); see also Shrout and Fleiss (1979). Inter-rater reliability was calculated within each subgroup and across the whole study population.
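The formula itself is not reproduced here. As a rough illustration, the following sketch computes a one-way random-effects ICC in the mean-squares form given by Shrout and Fleiss (1979), often written ICC(1,1). Whether this matches the exact variant the authors used is an assumption, and the function name `icc_oneway` is hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for a single rating, ICC(1,1).

    `ratings` is a list of rows, one row per rated child, each row holding
    the k ratings that child received. This is the Shrout and Fleiss (1979)
    mean-squares form; it is assumed, not confirmed, to correspond to the
    formula referenced in the text.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two rated children")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every child needs the same number (>= 2) of ratings")

    row_means = [sum(row) / k for row in ratings]
    grand_mean = sum(row_means) / n

    # Between-children mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-children mean square (disagreement among raters of one child)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_between - ms_within) / denominator
```

For rating pairs such as mother-father or parent-teacher, each row would hold the two T-values given for one child, so `k` is 2.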
For the mother-father rating subgroup, the intra-class correlation coefficient (ICC) was rICC = 0.906; for the parent-teacher rating subgroup, an ICC of rICC = 0.793 was found. For the study population as a whole, the ICC calculation yielded a reliability of rICC = 0.837. The confidence intervals (α = 0.05) of the ICCs for the subgroups and for the study population overlap, indicating that the reliabilities do not differ from each other (see Figure 2 for the ICCs and the corresponding confidence intervals). Thus, we found no evidence that the ability of the ELAN to distinguish between children with high and low vocabulary is reduced when a parent and a teacher, instead of two parents, provide the ratings.
Variation between raters in measurement procedures and variability in the interpretation of measurement results are two examples of sources of error variance in rating measures. Clear guidelines for rendering ratings are required for reliability in ambiguous or demanding measurement scenarios. Another way to illustrate the magnitude of the differences is to display the distribution of significant differences, with the mean T-values plotted against the absolute difference values, as proposed by Bland and Altman (1986, 2003).
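The quantities behind such a plot can be sketched as follows. This is a minimal illustration, not the authors' procedure: for each rating pair it computes the mean and the signed difference, plus the mean difference (bias) and the conventional limits of agreement at bias ± 1.96 standard deviations. The function name `bland_altman` and the returned structure are assumptions made for this example, and plotting is left out.

```python
import math


def bland_altman(ratings_a, ratings_b, z=1.96):
    """Return the points and summary lines of a Bland-Altman style plot.

    x-values are the pairwise means, y-values the pairwise differences
    (a - b). Taking abs() of the differences gives the absolute-difference
    variant described in the text.
    """
    n = len(ratings_a)
    if n != len(ratings_b) or n < 2:
        raise ValueError("need two rating lists of equal length (>= 2)")

    means = [(a + b) / 2 for a, b in zip(ratings_a, ratings_b)]
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]

    bias = sum(diffs) / n                       # systematic offset between raters
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    limits = (bias - z * sd, bias + z * sd)     # limits of agreement

    return {"means": means, "diffs": diffs, "bias": bias, "limits": limits}
```

A bias near zero with narrow limits indicates that neither rater group rates systematically higher and that individual rating pairs seldom diverge strongly.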