What is reliability? Why do I need to do a reliability test?

In content analysis, whether conducted with traditional methods or with big data methods, the reliability test requires two or more coders to independently judge the characteristics of the same piece of information or record and to reach consistent conclusions. Reliability refers to the consistency of a measure, that is, whether the results can be reproduced under the same conditions.

What is a reliability test?

Reliability refers to the degree to which the data produced in a research process do not depend on the person who applies the measuring instrument. It reflects how consistent the results are when different researchers repeatedly measure the same phenomenon [1].

In other words, if the measurement process is carried out two or more times, it should yield similar conclusions. This is what makes the results of content analysis dependable.

Building on scientific sampling, the reliability test further controls the influence researchers have on the data. Establishing reliability among coders guarantees that the data are handled more consistently, which leads to more objective research results.

What is inter-coder reliability?

In content analysis, two or more coders are normally required to independently code a piece of information or record the features (i.e., coding units) of the content and reach a consensus. This consensus is measured quantitatively and is referred to as inter-coder reliability [2].

Since one of the goals of big data content analysis is to define and record the features of information objectively, reliability is extremely important. Without established reliability, the measurements of big data content analysis are little more than empty talk [3].

Why is reliability so important?

Inter-coder reliability is a standard of research quality. High variability among coders usually signals weak research methods, including poor operational definitions, categories, and coder training [4]. The information we study typically contains both manifest content and latent content. For manifest content, such as the size of a page or the source of a message, it is relatively easy to achieve a high level of objectivity and consensus. For latent content, however, such as the tone or values conveyed in a report, coders must make subjective interpretations based on their own cognitive systems. In this case, the intersubjective agreement among coders becomes even more important, because these subjective judgments should ideally be shared by other readers as well [5].

From a practical perspective, inter-coder reliability is crucial because high reliability reduces the chance that decision-makers act on incorrect conclusions [6]. Inter-coder reliability is a necessary (though not sufficient) condition for the validity of a content analysis study. Without reliability, the conclusions of the study become questionable or even meaningless.

How to conduct a reliability test?

Reliability can be measured with suitable tools, either by hand using the formulas or with computer programs. Take assessing inter-coder reliability in big data content analysis as an example. The specific steps are as follows:

Step 1: Develop a coding guide based on the codebook. The coding guide should provide clear, unified instructions. It helps coders familiarize themselves with the topic, understand the codebook categories, and ensures that every coder interprets each category in the same way.

Step 2: Conduct a coding test. Select a test sample for coding. During the coding test, each coder should code independently, without discussing the sample or guiding one another. If machine coding has been executed, the reliability test for machine coding can be run directly (in DiVoMiner, the reliability test for machine coding is completed with a single click).
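
To illustrate what the outcome of a coding test looks like, here is a minimal sketch, assuming two human coders have independently coded the same small test sample; the unit IDs, category labels, and data are invented for illustration and do not reflect how DiVoMiner stores codings. It simply computes the share of units on which the two coders assign the same code (simple percent agreement):

    # Hypothetical coding-test results: unit ID -> assigned code, one dict per coder.
    coder_a = {"doc01": "positive", "doc02": "neutral", "doc03": "negative", "doc04": "positive"}
    coder_b = {"doc01": "positive", "doc02": "negative", "doc03": "negative", "doc04": "positive"}

    # Only units coded by both coders can be compared.
    shared_units = coder_a.keys() & coder_b.keys()

    # Simple percent agreement: share of compared units that received the same code.
    agreements = sum(1 for unit in shared_units if coder_a[unit] == coder_b[unit])
    percent_agreement = agreements / len(shared_units)

    print(f"Units compared: {len(shared_units)}, agreement: {percent_agreement:.2f}")  # 0.75 here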

Step 3: Perform coding corrections. If the coding test does not reach the desired reliability, the test needs to be re-conducted. Before retesting, give the coders additional training and guidance, especially for the categories where coding results differ significantly. If machine coding is involved, recheck and revise the keywords for the options in the coding manual, refining the options of the problematic categories, and then re-run the reliability test for machine coding.
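
A quick way to see which categories most need retraining or keyword revision is to compute agreement separately for each category before re-running the test. The sketch below continues the hypothetical layout from Step 2; the variables, codes, and the 0.8 threshold are illustrative assumptions rather than a prescribed standard:

    # Hypothetical test results per category: variable -> (coder A's codes, coder B's codes),
    # where the i-th entry of each list refers to the same unit.
    results = {
        "tone":    (["pos", "neu", "neg", "pos", "neu"], ["pos", "neg", "neg", "pos", "neu"]),
        "source":  (["gov", "ngo", "gov", "media", "gov"], ["gov", "ngo", "gov", "media", "gov"]),
        "framing": (["conflict", "human", "human", "econ", "conflict"],
                    ["human", "human", "econ", "econ", "conflict"]),
    }

    THRESHOLD = 0.8  # illustrative target; use the standard chosen for the study

    for variable, (codes_a, codes_b) in results.items():
        agreement = sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)
        flag = "" if agreement >= THRESHOLD else "  <- retrain coders / revise keywords"
        print(f"{variable:8s} agreement = {agreement:.2f}{flag}")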

Step 4: Proceed with formal coding. Once all coders achieve the desired reliability, formal coding can begin.

Click here to learn how to conduct a reliability test on DiVoMiner

A comprehensive reliability report should include the following information:

  • Number of units in the reliability sample and the rationale for that number.
  • Relationship between the reliability samples and the total data: whether the reliability samples are a subset of the total samples or additional samples.
  • Coder information: number of coders (must be 2 or more), background, and whether the researchers are also coders.
  • Quantity of coding performed by each coder.
  • Selection of reliability indicators and rationale.
  • Inter-coder reliability for each variable.
  • Training time for coders.
  • Approach to handling disagreements among coders during the coding process for the total samples.
  • Where readers can access detailed coding guidelines, procedures, and coding tables.
  • Report the reliability level for each variable individually, rather than reporting an overall reliability for all variables.

TIPS:

Currently, there are 39 different agreement indices [7]. The ones commonly used in communication research are percent agreement, Holsti's coefficient of reliability, Scott's pi (π), Cohen's kappa (κ), and Krippendorff's alpha (α). Holsti's coefficient of reliability is currently the most popular metric.
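
As a rough illustration of how some of these indices are computed for two coders and nominal categories, here is a minimal sketch implementing percent agreement (which equals Holsti's coefficient when both coders code exactly the same units), Scott's pi, and Cohen's kappa from their standard formulas; the example data are invented, this is not DiVoMiner's implementation, and in practice Krippendorff's alpha is usually obtained from dedicated software:

    from collections import Counter

    def agreement_indices(codes_a, codes_b):
        """Percent agreement, Scott's pi, and Cohen's kappa for two coders coding
        the same units with nominal categories (illustrative sketch)."""
        n = len(codes_a)

        # Observed agreement; with identical units per coder this equals
        # Holsti's coefficient 2M / (N1 + N2).
        observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

        # Scott's pi: expected agreement from the pooled category distribution.
        pooled = Counter(codes_a) + Counter(codes_b)
        expected_pi = sum((count / (2 * n)) ** 2 for count in pooled.values())
        scotts_pi = (observed - expected_pi) / (1 - expected_pi)

        # Cohen's kappa: expected agreement from each coder's own distribution.
        dist_a, dist_b = Counter(codes_a), Counter(codes_b)
        categories = dist_a.keys() | dist_b.keys()
        expected_k = sum((dist_a[c] / n) * (dist_b[c] / n) for c in categories)
        cohens_kappa = (observed - expected_k) / (1 - expected_k)

        return observed, scotts_pi, cohens_kappa

    # Invented example: two coders judging the same ten units.
    a = ["pos", "pos", "neu", "neg", "pos", "neu", "neg", "pos", "neu", "pos"]
    b = ["pos", "neu", "neu", "neg", "pos", "neu", "pos", "pos", "neu", "pos"]
    print(agreement_indices(a, b))  # (observed agreement, Scott's pi, Cohen's kappa)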

References:

[1] Babbie, E. R. (2005). The Practice of Social Research. Beijing: Huaxia Publishing House, 137-140; Singletary, M. W. (2000). Mass Communication Research: Contemporary Methods and Applications. Beijing: Huaxia Publishing House, 94-97; Zeller, R. A. (1979). Reliability and Validity Assessment. Beverly Hills, CA: Sage, 12.

[2] Lombard, M., Snyder-Duch, J., & Bracken, C. C. (2004). A call for standardization in content analysis reliability. Human Communication Research, 30(3), 434-437.

[3] Neuendorf, K. A. (2002). The Content Analysis Guidebook. Thousand Oaks, CA: Sage Publications.

[4] Kolbe, R. H., & Burnett, M. S. (1991). Content-analysis research: An examination of applications with directives for improving research reliability and objectivity. Journal of Consumer Research, 18(2), 243-250.

[5] Potter, W. J., & Levine-Donnerstein, D. (1999). Rethinking validity and reliability in content analysis. Journal of Applied Communication Research, 27(3), 258-284.

[6] Rust, R. T., & Cooil, B. (1994). Reliability measures for qualitative data: theory and implications. Journal of Marketing Research, 31(1), 1-14.

[7] Popping, R. (1988). On agreement indices for nominal data. In W. E. Saris & I. N. Gallhofer (Eds.), Sociometric Research: Volume 1, Data Collection and Scaling (pp. 90-105). New York: St. Martin's Press.
