
Phred33 Quality Scores in Bioinformatics

Phred33 is a widely used quality score encoding system in bioinformatics, especially in next-generation sequencing (NGS) data. It represents the confidence of each base call in a DNA sequence and is essential for evaluating sequencing accuracy and data reliability.

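Phred33 (often written Phred+33) is the standard encoding in current FASTQ files. Illumina pipelines have used it since version 1.8, replacing the older Phred+64 encoding, and most bioinformatics tools assume it by default.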

What Is a Phred Quality Score?

A Phred quality score (Q score) quantifies the probability that a nucleotide base is incorrectly identified during sequencing.

The score is calculated using the formula:

$$Q = -10 \log_{10}(P)$$

Where:

  • P = probability that the base call is wrong
  • A higher Q score means higher confidence

Example:

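  • Q20 → P = 0.01, about 1 error in 100 base calls (99% accuracy)
  • Q30 → P = 0.001, about 1 error in 1,000 base calls (99.9% accuracy)
  • Q40 → P = 0.0001, about 1 error in 10,000 base calls (99.99% accuracy)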
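The relationship between Q and P can be checked with a few lines of Python. This is a minimal sketch of the formula above and its inverse; the function names `phred_to_error_prob` and `error_prob_to_phred` are illustrative, not part of any library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(P) to get the error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Compute Q = -10 * log10(P) for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability and accuracy for the example scores.
    for q in (20, 30, 40):
        p = phred_to_error_prob(q)
        print(f"Q{q}: P = {p:g}, accuracy = {100 * (1 - p):g}%")
```

Because the scale is logarithmic, every increase of 10 in Q corresponds to a tenfold drop in the error probability.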

What Does “Phred33” Mean?

In FASTQ files, quality scores are stored as ASCII characters.

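Phred33 means that each quality score Q is stored as the single character whose ASCII code is Q + 33. The offset of 33 skips the non-printable control characters and the space, so Q = 0 maps to "!" (ASCII 33) and Q = 40 maps to "I" (ASCII 73).

$$\text{ASCII code} = Q + 33$$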

This encoding stores each quality score as one printable text character, so the quality line of a FASTQ record has exactly one character per base in the sequence line.
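To make the offset concrete, the sketch below decodes a FASTQ quality string into numeric Phred scores and encodes scores back into characters. It assumes the input really is Phred33-encoded; the names `PHRED33_OFFSET`, `decode_phred33`, and `encode_phred33` are hypothetical helpers chosen for illustration, and the sample string is made up.

```python
from typing import Iterable, List

PHRED33_OFFSET = 33  # ASCII code of '!', which represents Q = 0


def decode_phred33(quality: str) -> List[int]:
    """Convert a Phred33 quality string into a list of integer Q scores."""
    scores = [ord(ch) - PHRED33_OFFSET for ch in quality]
    if any(q < 0 for q in scores):
        raise ValueError("character below '!' found; not a Phred33 string")
    return scores


def encode_phred33(scores: Iterable[int]) -> str:
    """Convert integer Q scores into a Phred33 quality string."""
    return "".join(chr(q + PHRED33_OFFSET) for q in scores)


if __name__ == "__main__":
    # '!' = 33, '5' = 53, '?' = 63, 'I' = 73 in ASCII.
    print(decode_phred33("!5?I"))          # [0, 20, 30, 40]
    print(encode_phred33([0, 20, 30, 40]))  # !5?I
```

In practice, an established parser such as Biopython's `SeqIO` with the `"fastq"` format handles this decoding, so a hand-written version is mainly useful for understanding the encoding.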


https://pubmed.ncbi.nlm.nih.gov/15459287/