AddAdapters

Script: addadapters.sh
Package: jgi
Class: AddAdapters.java
Status: DEPRECATED for paired reads

Tool designed for benchmarking adapter-trimming software. Adds synthetic adapters to reads for testing trimmer performance, or evaluates adapter trimming accuracy on previously processed files.

⚠️ DEPRECATED for paired reads: This tool does not understand insert size, making adapter placement unrealistic for paired-end data. Use RandomReads instead for paired reads as it adds adapters at biologically correct locations based on insert size, enabling overlap-based adapter detection.

Purpose and Scope

AddAdapters is specifically designed for grading the performance of adapter-trimming tools. It serves two primary functions:

  1. Synthetic Data Generation: Creates test datasets by adding known adapter contamination to clean reads
  2. Performance Evaluation: Analyzes trimmed reads to calculate trimming accuracy metrics

Recommended Workflow for Paired Reads

Instead of AddAdapters, use RandomReads for realistic adapter contamination:

randomreads.sh ref=ref.fa out=reads.fq len=150 paired reads=100k \
    mininsert=50 maxinsert=350 fragadapter1=AGATCGGAAGAGC fragadapter2=CTGTCTCTTATAC
rename.sh in=reads.fq out=renamed.fq renamebytrim interleaved

Once these reads have been processed by an adapter trimmer, the trimmed output can still be evaluated by AddAdapters in grade mode.

Basic Usage

addadapters.sh in=<file> in2=<file2> out=<outfile> out2=<outfile2> adapters=<file>

Operation Modes

Add Mode (default)
Synthetically contaminates clean reads with adapter sequences at random positions. Encodes the correct trimming answer in read headers for later evaluation.
Grade Mode
Evaluates adapter trimming performance by comparing actual read lengths against the encoded correct answers.

Parameters

Input/Output Parameters

in=<file>
Primary input file (FASTQ/FASTA format). Can be stdin.
in2=<file>
Secondary input file for paired reads (optional).
out=<file>
Primary output file. Required in add mode, unused in grade mode.
out2=<file>
Secondary output file for paired reads (optional).
ow=f
(overwrite) Overwrites files that already exist.
int=f
(interleaved) Determines whether INPUT file is considered interleaved.

Quality Parameters

qin=auto
ASCII offset for input quality. May be 33 (Sanger), 64 (Illumina), or auto.
qout=auto
ASCII offset for output quality. May be 33 (Sanger), 64 (Illumina), or auto (same as input).
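
To illustrate what the ASCII offset means, here is a minimal Python sketch of decoding a FASTQ quality string under either offset. It is not BBTools code, and the function name decode_quality is hypothetical.

```python
def decode_quality(qual_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    offset=33 corresponds to Sanger encoding, offset=64 to older Illumina encoding.
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    return [ord(ch) - offset for ch in qual_string]


if __name__ == "__main__":
    # The same Phred score of 40 is written as 'I' with offset 33 and 'h' with offset 64.
    print(decode_quality("I", 33), decode_quality("h", 64))
```

Decoding a file with the wrong offset shifts every score by 31, which is why qin matters for any step that uses quality values as error probabilities.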

Operation Mode Parameters

add
Add adapters to input files. Default mode.
grade
Evaluate trimmed input files.

Adapter Configuration

adapters=<file>
FASTA file of adapter sequences. Required in add mode.
literal=<sequence>
Comma-delimited list of adapter sequences as alternative to file.
left
Place adapters on the left (5') end of reads.
right
Place adapters on the right (3') end of reads. Default mode.
arc=f
Add reverse-complemented adapters as well as forward orientation.
rate=0.5
Fraction of reads that receive adapter contamination (0.0-1.0).
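
As an illustration of what arc=t adds to the adapter set, the following Python sketch expands a list of adapters with their reverse complements. The helper names are hypothetical and this is not the tool's own implementation.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def expand_adapters(adapters, add_rc=False):
    """Return the adapter list, optionally followed by each reverse complement."""
    expanded = list(adapters)
    if add_rc:
        expanded.extend(reverse_complement(a) for a in adapters)
    return expanded


if __name__ == "__main__":
    print(expand_adapters(["AGATCGGAAGAGC"], add_rc=True))
```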

Adapter Addition Parameters

adderrors=t
Add sequencing errors to adapter bases using quality score error probabilities.
addpaired=t
Place adapters at the same position in both reads of a pair. Note: position is relative within each read, not based on insert size.
minlength=1
(minlen/ml) Minimum read length to consider valid after adapter addition.
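
The adderrors option can be pictured with the standard Phred relationship, where a quality score Q implies an error probability of 10^(-Q/10). The sketch below is an assumption-laden illustration, not the tool's code: the function name add_errors is hypothetical, and substituting a uniformly chosen different base is an assumption about how an error is realized.

```python
import random


def add_errors(adapter, qualities, rng):
    """Mutate each adapter base with probability 10 ** (-Q / 10).

    qualities holds one Phred score per adapter base.
    Assumption: an error replaces the base with a different one chosen uniformly.
    """
    out = []
    for base, q in zip(adapter, qualities):
        p_err = 10 ** (-q / 10)
        if rng.random() < p_err:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)
    # Low qualities (Q=3, about 50% error) visibly corrupt the adapter.
    print(add_errors("AGATCGGAAGAGC", [3] * 13, rng))
```

Adapters with injected errors make a more demanding test, since a trimmer that only finds exact matches will leave some of them behind.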

Examples

Basic Synthetic Contamination (Single-End)

addadapters.sh in=clean_reads.fq out=contaminated.fq adapters=adapters.fa

Adds adapter sequences to 50% of reads (default rate) for benchmarking adapter trimming tools.

Controlled Contamination Rate

addadapters.sh in=reads.fq out=contaminated.fq literal=AGATCGGAAGAGC rate=0.3

Contaminates 30% of reads with the specified Illumina TruSeq adapter sequence.

Evaluate Trimming Performance

addadapters.sh in=trimmed_reads.fq grade

Analyzes trimmed reads to calculate adapter removal accuracy, over-trimming, and under-trimming rates.

Paired-End Contamination (Not Recommended)

addadapters.sh in1=r1.fq in2=r2.fq out1=cont_r1.fq out2=cont_r2.fq adapters=adapters.fa addpaired=t

Warning: This places adapters at the same relative position in both reads, which is not biologically realistic. Consider RandomReads instead.

Recommended Paired-End Workflow

# Generate realistic paired-end data with proper adapter placement
randomreads.sh ref=genome.fa out=reads.fq len=150 paired reads=100k \
    mininsert=50 maxinsert=350 fragadapter1=AGATCGGAAGAGC fragadapter2=CTGTCTCTTATAC

# Rename reads for compatibility with AddAdapters grading
rename.sh in=reads.fq out=test_data.fq renamebytrim interleaved

# Test your adapter trimmer
your_trimmer.sh in=test_data.fq out=trimmed.fq

# Grade the trimming performance
addadapters.sh in=trimmed.fq grade

This workflow creates biologically realistic test data where adapters appear due to read-through based on actual insert sizes.
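
The read-through relationship can be sketched in a few lines of Python: when the insert is shorter than the read, adapter sequence begins at the index equal to the insert size. The function names below are hypothetical, and the sketch ignores what a real sequencer emits after the adapter ends.

```python
def adapter_start(insert_size, read_length):
    """Return the 0-based index where adapter sequence begins in a read,
    or None when the insert is at least as long as the read."""
    if insert_size < read_length:
        return insert_size
    return None


def simulate_read(insert_seq, adapter, read_length):
    """Read through the insert into the adapter, truncated to read_length.

    The result may be shorter than read_length if insert plus adapter is short.
    """
    return (insert_seq + adapter)[:read_length]


if __name__ == "__main__":
    print(adapter_start(100, 150))   # adapter begins at index 100
    print(adapter_start(350, 150))   # no adapter in this read
    print(simulate_read("ACGTACGT", "AGATCGGAAGAGC", 12))
```

Because both mates of a pair see the same insert, the adapter starts at the same index in each, which is what makes overlap-based detection possible.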

Algorithm Details

Synthetic Contamination Strategy

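In add mode, AddAdapters implements a controlled contamination algorithm: a fraction of reads set by rate is selected, an adapter is placed at a random position in each selected read (with sequencing errors drawn from the quality scores when adderrors=t), and the correct trimming answer is encoded in the read header.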
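
A rough Python sketch of add mode for a single read, with right-end placement, might look like the following. It is an illustration under stated assumptions rather than the tool's actual code: the function name contaminate is hypothetical, the space after the adapter is filled with random bases (an assumption), untouched reads are also given a header prefix (an assumption), and quality strings are left out for brevity.

```python
import random


def contaminate(name, bases, adapter, rate, rng):
    """Return (new_name, new_bases) for one read.

    The header prefix follows the "<original_length>_<remaining_length>"
    layout described in this document.
    """
    length = len(bases)
    if length == 0 or rng.random() >= rate:
        # Read left clean: the full length is valid.
        return f"{length}_{length} {name}", bases
    # Random index at which adapter sequence begins.
    pos = rng.randrange(length)
    # Assumption: random bases fill whatever follows the adapter.
    filler = "".join(rng.choice("ACGT") for _ in range(length))
    new_bases = (bases[:pos] + adapter + filler)[:length]
    return f"{length}_{pos} {name}", new_bases


if __name__ == "__main__":
    rng = random.Random(7)
    print(contaminate("read1", "ACGT" * 10, "AGATCGGAAGAGC", 0.5, rng))
```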

Performance Evaluation Metrics

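In grade mode, the tool compares each read's actual length against the answer encoded in its header and reports the reads and bases that are perfectly correct, incorrect, still carrying adapter sequence, or missing non-adapter sequence.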
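
The per-read comparison behind these statistics can be sketched as follows, assuming right-end adapters so that the trimmed length alone determines the outcome. The function names and category labels are hypothetical and only loosely mirror the report lines shown under Grade Mode Statistics.

```python
from collections import Counter


def grade(expected_remaining, actual_length):
    """Classify one trimmed read by comparing lengths."""
    if actual_length == expected_remaining:
        return "correct"
    if actual_length > expected_remaining:
        return "adapter_remaining"    # under-trimmed
    return "non_adapter_removed"      # over-trimmed


def summarize(pairs):
    """Map each category to (read count, percent of reads).

    pairs is an iterable of (expected_remaining, actual_length).
    """
    counts = Counter(grade(e, a) for e, a in pairs)
    total = sum(counts.values())
    return {k: (v, 100.0 * v / total) for k, v in counts.items()}


if __name__ == "__main__":
    print(summarize([(85, 85), (85, 90), (100, 100), (100, 97)]))
```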

Limitations and Considerations

Single-End Reads

AddAdapters is suitable for single-end read benchmarking, where adapter contamination occurs through random fragmentation or read-through events.

Paired-End Limitations

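For paired reads, AddAdapters has significant limitations: it does not model insert size, and with addpaired=t it places adapters at the same relative position in both reads of a pair, which is not biologically realistic.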

RandomReads Advantages

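RandomReads addresses these limitations by placing adapters at biologically correct locations based on insert size, so that adapters appear through read-through and overlap-based adapter detection can be exercised.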

Memory and Performance

Output Interpretation

Add Mode Output

Contaminated reads with modified headers encoding the correct trimming position:

@150_85 original_read_header
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATGAATCTCGTATGCCGTCTTCTGCTTG...
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII...

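Header format: "<original_length>_<correct_remaining_length>", followed by the original read header. In the example above, 150_85 marks a 150-base read of which 85 bases should remain after correct trimming.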
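
For downstream scripts, a header in this layout could be parsed as in the sketch below. It assumes exactly the two-field prefix shown in this document; the function name parse_header is hypothetical, and the real tool's headers may carry additional fields.

```python
def parse_header(header):
    """Split '@150_85 original_read_header' into
    (original_length, remaining_length, original_header)."""
    text = header[1:] if header.startswith("@") else header
    prefix, _, rest = text.partition(" ")
    original, remaining = prefix.split("_")
    return int(original), int(remaining), rest


if __name__ == "__main__":
    print(parse_header("@150_85 original_read_header"))
```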

Grade Mode Statistics

Total output:                        10000 reads                  1500000 bases          
Perfectly Correct (% of output):     8750 reads (87.500%)        1312500 bases (87.500%)
Incorrect (% of output):             1250 reads (12.500%)        187500 bases (12.500%)

Adapters Remaining (% of adapters):  125 reads (2.500%)          18750 bases (1.250%)
Non-Adapter Removed (% of valid):    50 reads (0.500%)           7500 bases (0.571%)

Best Practices