# BBTools Changelog and TODO List

## V39

### 39.00
- Refactored Bloom Filter code and added increment-by-amount function
- Eliminated some old versions of Bloom Filters; renamed KmerCount7MTA class to ReadCounter
- Added kmer count output to bloomfilter.sh, and supported mincount=0
- Fixed a crash when trimming mapped sam files
- Filtersam no longer automatically trims qname and rname after the first whitespace
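
The increment-by-amount change in 39.00 can be pictured with a toy counting Bloom filter (a count-min sketch). This is only a sketch under stated assumptions, not the BBTools code: the class name, the salted `blake2b` hashing, and the saturating cells are all hypothetical choices made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter (count-min sketch) over string k-mers.

    Hypothetical illustration only; names and hashing are assumptions.
    """

    def __init__(self, cells=1 << 16, hashes=3, bits=4):
        self.cells = cells
        self.hashes = hashes
        self.max_count = (1 << bits) - 1  # counters saturate at this value
        self.table = [0] * cells

    def _indices(self, kmer):
        # One salted hash per hash function; the salt is the function's index.
        for i in range(self.hashes):
            digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                                     salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.cells

    def increment(self, kmer, amount=1):
        """Add `amount` to each distinct cell for this k-mer, with saturation."""
        for idx in set(self._indices(kmer)):
            self.table[idx] = min(self.max_count, self.table[idx] + amount)

    def count(self, kmer):
        """Estimated count: the minimum over the k-mer's cells."""
        return min(self.table[idx] for idx in self._indices(kmer))
```

Taking the minimum over cells means collisions can only inflate an estimate, never lower it, until a counter saturates.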

### 39.01
- Accelerated SortByName sequence mode using kmer prefixes
- Adjusted some SortByName memory parameters to be more conservative; fixed pairing-related assertions
- Added a Clumpify-like sorting mode
- Added a BloomFilter results parser for collating bulk output
- Read Streams can now report their input source

**TODO:** randomgenome percent repeat and repeat length

## V38

### 38.00
- Moved ByteBuilder to Structures
- Added some formatting and comments to SuperLongList
- JsonObject printing now has an inArray state that prevents newlines from arrays of JsonObjects
- Improved JsonParser handling of booleans
- Added a JsonParser validate command
- Wrote TaxClient for internally doing tax lookups from the TaxServer
- Added post mode to TaxClient and TaxServer, for URLs over 2000 characters
- Moved StringNum to Structures
- Accession loader now sorts files in ascending order of size and can load some before others
- Fixed a flaw in the hash function for accession numbers that may have allowed collisions
- TaxTree.parseNodeFromHeader will now try harder for headers with certain formatting
- Fixed potential overflows by changing Integer.MAX_VALUE to Shared.MAX_ARRAY_LEN
- SketchTool now has a custom, low-garbage loader instead of relying on ByteFile
- RQCFilter2 now uses half as many threads for pigz as logical cores
- Wrote BloomFilter and BloomFilterWrapper
- Added BloomFilter support into BBMap and RQCFilter
- Wrote a better available memory estimation function for BloomFilter
- Accelerated BloomFilter lookup when minConsecutiveMatches>1
- Fixed logging of BBSplit vs BBMap in RQCFilter2
- Bloom filter creation from BBMap index now uses multiple threads per chunk
- Fixed a null pointer in TextStringWriter
- Fixed a static variable (ef) persisting in RQCFilter, which slowed human removal

### 38.01
- Added support for lowercase letters in accessions
- gi2ncbi now supports streaming and some other options like shrinknames in server mode
- Sketch can now return json format from a curl call
- Sketch server no longer crashes from invalid symbols in sequence in local mode
- SketchMaker now has a local cache of SketchHeaps per thread in per-taxa mode, allowing a 6x speedup by reducing synchronization and rework
- RefSeq now uses a 250-species blacklist limit with sizemult=2 instead of 300
- Wrote MergeSorted and mergesorted.sh to resume SortByName runs that crashed or were killed during merging
- Removed DumpCount from SortByName and CrisContainer. It was too confusing. To shuffle large datasets, they can be merged round-robin
- Fixed an error message when autodetecting quality encoding
- Refseq sketch server is now double the normal resolution (sizemult=2)
- SendSketch defaults to sizemult=2 for RefSeq
- Sketch server startup script now sets sizemult=2 for refseq
- Added logscale peak calling
- Added peaks file GC annotation
- Fixed an array out of bounds in EntropyTracker
- CallVariants now ignores duplicates by default (0x400 bit)
- StatsWrapper will now append to the gc output if there are multiple assemblies
- Wrote AnalyzeAccession and analyzeaccession.sh to reduce the memory footprint of accessions in the tax server
- Added entropy filter flag to RQCFilter2
- BloomFilter can now act as a highpass filter
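
The 0x400 bit mentioned for CallVariants is the SAM FLAG bit marking a PCR or optical duplicate. A minimal sketch of skipping such records follows; the function names are hypothetical and this is not how CallVariants itself is written.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def is_duplicate(sam_line):
    """Return True if the SAM record's FLAG field has the 0x400 bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second mandatory column
    return (flag & DUPLICATE_FLAG) != 0


def drop_duplicates(sam_lines):
    """Yield header lines and non-duplicate records unchanged."""
    for line in sam_lines:
        if line.startswith("@") or not is_duplicate(line):
            yield line
```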

### 38.02
- BloomFilter can now do error correction, using the Tadpole algorithm
- Added merge and unmerge to Tadpole and BloomFilter for dramatic error correction improvements
- Improved BloomFilter error correction defaults and added smoothing
- Improved BloomFilter's memory management and added a memfraction flag
- Fixed tuc not working
- Tadpole.BloomFilter ECC_ROLLBACK will now roll back merges also (but not ecco currently)
- Wrote Rollback object to simplify rollbacks during error correction
- Spun BloomFilterCorrectorWrapper off from BloomFilterWrapper
- Spun bbcms.sh off of bloomfilter.sh
- Fixed a bug in msa.sh handling of reverse-complements
- Improved msa.sh to fully expand undefined bases, accept fasta files, and name the output such that it is clear whether an alignment was forward or reverse
- msa.sh now allows a cutoff for min identity
- Improved bbcms smoothing
- bbcms now allows a minimum fraction of kmers above a certain count to be specified
- bbcms now prints more statistics about the loaded bloom filter

### 38.03
- Fixed broken interleaving in bbcms output
- Added seed flag to bbcms and bloomfilter
- Added BBMerge vstrict and ustrict flags to bbcms
- Added mergeOK and testmerge flags to BBMerge
- Added BloomFilter support to BBMerge
- BBMerge now automatically writes both mergeable and unmergeable pairs to out if ecco=t and mix is unset
- testmerge flag now works with ecco
- Fixed indentation for Tadpole/bbcms results

### 38.04
- bbcms and bloom filter now allow random seeds
- Changed version printing to not repeat arguments
- Eliminated redundant copies of mergeOK functions
- Fixed bbcms testmerge flag
- Fixed trim/qtrim flag in BBSplit help
- Added relative error threshold for mergeOK. TODO: Does not seem to help in my test; try on single cell data
- Added variable smooth width to bbcms
- Changed bbcms default bits to 4 after testing
- Fixed bbcms extra flag

### 38.05
- Fixed interleaving detection in SortByName
- Changed interleaving detection in FileFormat to autodetect more aggressively
- Fixed a bug with RQCFilter2 interleaving settings carrying over from BBMerge to FilterByTaxa

### 38.06
- Changed KmerArray to collide all possible kmer extensions into the same cell
- Wrote FillFast to grab all 4 possible kmer extensions with a single modulo operation
- Simplified some of BBDuk pair-tracking and discarding logic
- Added trimfailures bbduk flag
- Fixed a division by zero bug in SortByName.mergeRecursive
- Fixed an array-out-of-bounds in CallPeaks
- Made dual-kmer ANI estimation from Sketch more accurate
- Added loglog support to BBMerge and Seal
- Added loglogout support to BBMerge, BBDuk, and Seal
- RQCFilter2 status.log now tracks kmers
- Removed RQCFilter and pointed rqcfilter.sh to rqcfilter2.sh

### 38.07
- Changed KmerTable increment functions to require an incr value
- Added sortbuffer flag to Tadpole, but speed was barely improved on high-depth Clumpified data
- Migrated coremask and fillfast to tadpole2, but they make it slower for some reason
- Migrated shave and rinse improvements to Tadpole2; these can make those steps dramatically faster in metagenomes
- Added BloomFilter serialization
- Increased default k and minhits of Bloom filter in RQCFilter2 and added serialized filters
- Reduced RandomReads default quality
- Made gaussian insert size distribution default for RandomReads
- Wrote FastaShredInputStream for faster Bloom filter loading with lower memory consumption
- Fixed number of threads allocated to Bloom filter loading from index

### 38.08
- FilterByTaxa and RQCFilter no longer crash if a header cannot be parsed and the accession tables are not loaded

### 38.09
- bbcms default bits changed from 1 to 2
- Improved bbcms tossjunk function
- Added documentation to bbcms and Tadpole
- Added fixextensions flag, and enabled it for CallVariants, BBDuk, Reformat, RQCFilter, BBNorm, BBMerge, BBMap, Tadpole, and bbcms
- RQCFilter now extends reads prior to merging if there is enough memory. This means the insert size histogram will take longer to generate, but it can include non-overlapping inserts
- BBMap now tracks statistics correctly when Bloom filter is enabled
- Fixed Children flag in TaxServer
- Shave and rinse no longer checks owner for initial high kmers
- Shave and rinse now ignores initial high kmers above the isJunction trigger for extension in some cases, for a large speedup in isolates (uses shaveFast flag)
- Changed RandomReads default insert size distribution to more closely match JGI fragment library targets
- Multithreaded KmerCountArray/KmerCountArrayU ownership array allocation via OwnershipThread for a large speed increase in assembly
- Added 2passresize flag to Tadpole but it didn't seem to speed things up
- Added Constellation-like output option for CompareSketch
- Major changes to Kmer table sizing - a premade resize schedule is now used. Only for Kmer so far, not UKmer

### 38.10
- Merged dev python changes

### 38.11
- Ported schedule to UKmer
- Fixed a bytesPerKmer bug in KmerCountExact for k>31
- Accelerated kmer lookups for k>31
- Condensed code for shave/rinse, but no speed increase
- Changed default exploredist from 100 to 300

### 38.12
- Stats now omits the first size bracket if it is less than minscaf
- Fixed problems with extended stats in format 4-6
- Fixed a bug in reporting the amount of spike-in removed in RQCFilter
- Multithreaded kmer frequency histogram generation using kmer and ukmer packages
- mutate.sh now outputs vcf files
- Fixed processing of sam files with M, =, and X in cigar string
- Fixed a bloom filter BBMap bug in counting reads
- Updated some pipelines shell scripts
- Started writing a new KCountArray class, but abandoned it as the current one looks as efficient as possible
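
For the 38.12 CIGAR fix: in the SAM format, `=` (sequence match) and `X` (mismatch) consume both reference and query bases, exactly as `M` does. A small sketch of computing the spans, with hypothetical names and no claim about the BBTools parser:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")    # operations that advance along the reference
CONSUMES_QUERY = set("MIS=X")  # operations that advance along the read


def cigar_lengths(cigar):
    """Return (reference_length, query_length) spanned by a CIGAR string."""
    ref_len = query_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in CONSUMES_REF:
            ref_len += n
        if op in CONSUMES_QUERY:
            query_len += n
    return ref_len, query_len
```

A parser that only recognizes `M` would compute a span of zero for a string such as `5=1X4=`, which should span the same ten bases as `10M`.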

### 38.13
- Fixed a casting exception in Shared.sort
- Fixed missing column from mutate.sh vcf output
- Addslash for RandomReads now works with the illuminanames flag
- Fixed mutate.sh VCF files
- Wrote Contig and Edge classes
- Wrote ContigLengthComparator
- Transitioned Tadpole from building Reads to building Contigs
- Wrote ProcessContigThread
- Tadpole now writes additional information about contig ends to headers
- Tadpole now strictly uses F_BRANCH and B_BRANCH instead of just BRANCH (TODO: D_BRANCH)
- Tadpole output should now have canonical orientation, order, and names (apart from circular contigs)
- Tadpole1 now has a preliminary contig graph processing phase (in progress)
- Tadpole now supports preliminary dot output (not yet correct)
- Added appendln to some ByteBuilder methods
- Added print(Contig) to bsw

### 38.14-38.15
- Integrated dev Python changes; merging Git branches

### 38.16
- Ported Tadpole1 ProcessContigThread to Tadpole2
- Added perfile flag to CompareSketch, which allows multithreaded loading
- Added prealloc flag to CompareSketch
- Revised TaxServer to use Sketch index, and typically run 1 thread per sketch
- Added outsketch flag to CompareSketch
- Modified RandomGenome to be faster and more flexible, and added a shell script

### 38.17
- Added Sketch minLevelExtended flag
- Fixed bbcms loglog using quality scores from the wrong read
- Wrote MergeSketch and mergesketch.sh
- Fixed a major bug in TaxTree.getNodeAtLevel and restarted all servers
- Wrote KmerLimit and kmerlimit.sh
- Wrote Shuffle2 and shuffle2.sh
- Changed blacklist_nt_species_1000.sketch to blacklist_nt_species_500.sketch

### 38.18
- Modified RQCFilter and BBMap to correctly track and report unmapped reads and bases when using the Bloom filter
- Wrote RQCFilterStats for tracking relevant RQCFilter stats. This is printed to filterStats2.txt
- Added some columns to BBMap scafstats/refstats where a read is assigned to at most a single reference
- All classes that used ThreadLocalRandom now call Shared.threadLocalRandom() to comply with Java 6
- Wrote KmerLimit and kmerlimit.sh to restrict a randomly-ordered file to a specific number of unique kmers
- Wrote KmerLimit2 and kmerlimit2.sh to restrict an arbitrarily-ordered file to a specific number of unique kmers via subsampling
- Updated /pipelines/ scripts for fetching and sketching

### 38.19
- Updated RQCFilterData tar
- Updated wrapper shellscripts to handle Cori error messages
- Fixed a bug in tracking duplicate reads in RQCFilter

### 38.20
- Added logsum and powsum to stats.sh gc output format 5
- Fixed a bug in tracking reads in RQCFilter
- Fixed a basic to extended taxonomy translation routine in TaxTree
- Added JSON (format 8) to stats.sh
- Fixed(?) BBMap tracking of trimmed/untrimmed bases for mapped and unmapped reads
- Fixed bugs in RQCFilter tracking of trimmed/untrimmed mapped bases

### 38.21
- Wrote JsonLiteral and modified Stats to not put quotes around formatted floats
- Added support for accession, gi, and header lookups to RenameGiToNcbi
- --help or --version now exit with status 0 rather than 1
- Updated some documentation
- Added BBDuk trimpolyg flag
- FlowCell MicroTiles now track more data and have more methods
- Wrote PlotFlowCell and plotflowcell.sh, to look at the distribution of polyG in NovaSeq runs
- Fixed a broken if-else in AccessionToTaxId that was causing TaxServer to start with prealloc false
- Fixed a bug in verifying other mapped stats in RQCFilter2

### 38.22
- Added getters for sketch.Comparison and sketch.CompareBuffer, and made fields private
- Fixed bug causing Sketch unique count to display incorrectly - bitsetbits had been changed from 2 to 1. It should be 2; made static final
- Fixed an array size bug in Tadpole caused by increasing the range of termination codes
- Fixed a problem of Kmers being appended to ByteBuilders reverse-complemented. This impacted Shaver2
- Fixed a static variable (MASK_CORE) hangover from Tadpole1 into Tadpole2 with TadWrapper
- Added more BBDuk polyG options
- Added polyG options and tracking to RQCFilter
- Fixed an incident where a new KmerComparator was created unnecessarily
- Clumpify now correctly counts the number of reads when a temp file is streamed without being clumped

### 38.23
- Wrote hiseq.CycleTracker
- Fixed a parse error in AnalyzeFlowCell
- Added preliminary G-bubble-detection and elimination to AnalyzeFlowCell, but it is not clear if it is working correctly
- Wrote hiseq.IlluminaHeaderParser
- Revised A_Sample, A_SampleMT, and A_SampleByteFile with additional submethods to reduce the length of long methods
- Removed JNI path flag from BBMerge, BBMap, and RQCFilter shell scripts
- Fixed a bug in reading adaptersOut.fa from RQCFilter2
- Changed the way path is appended to output files in RQCFilter2
- Added poly-C flags to BBDuk
- Wrote PolymerTracker
- Added polymer count tracking to BBDuk and RQCFilter
- Added clipfilter to Reformat

### 38.24
- *Skipped this version*

### 38.25
- Added maxcov flag to Tadpole
- Seal now supports filenames without the ref= flag to allow wildcard expansion
- Removed calcmem.sh perl dependency on Genepool, since Genepool is gone
- Fixed a logging bug in RQCFilter
- Added optical alias to RQCFilter
- Modified mergesorted.sh
- SortByName and MergeSorted buffer-resizing logic made safer
- Fixed leftRatio calculation in Tadpole for printing in contig headers
- Fixed an unwanted print statement in Tadpole dot generation
- Fixed a crash in Clumpify when handling Ns
- BBMap bloomserial now defaults to true
- Deleted normandcorrectwrapper.sh
- Updated removehuman, removehuman2, etc. to use Bloom filters and clarified that the scripts are for NERSC
- Wrote PercentEncoding for translating URLs, and made it more efficient by removing String functions

### 38.26
- Improved Blacklist name translation
- Data internmap is now faster and takes less memory
- Made prok package for prok gene-calling
- Moved LOGICAL_PROCESSORS to Shared to avoid an initialization order problem
- Fixed a bug in FastaReadInputStream with buffer resizing logic
- Disabled some assertions in BBIndex that do not appear to be valid with a long maxindel and many short contigs
- Added nl() and tab() to ByteBuilder
- Reduced memory prealloc request for kmer tables on high memory (>120G) nodes
- Fixed CallVariants reporting of deletion count
- Clarified CallVariants SamStreamer flag, and capped it at Shared.threads()
- Clarified callvariants2.sh purpose and function
- Wrote AnalyzeGenes, CallGenes, and CompareGff
- Added amino acid output to CallGenes

### 38.27
- Bugfixes and improvements to gene calling
- Began adding RNA models to gene calling
- Refactored gene-caller to allow more flexibility with models; pgm format changed
- Adjusted default gene model

### 38.28
- Multithreaded AnalyzeGenes
- Wrote FloatList
- Fixed a bug in Tools.reverseInPlace for partial arrays
- Added trimcircular flag to Tadpole to trim ends of loop-loop contigs, which are presumably circular
- Finished tRNA and rRNA models and calling functions
- Fixed a bug in 3-column Sketch colors

### 38.29
- Calibration of gene models
- Fixed a bug with chloroOutFile/fbtOutFile name in RQCFilter2
- Sketch now allows integrated gene-calling for nucleotide to protein translation
- Added minsize and maxsize to RepresentativeSet

### 38.30
- More calibration of gene models
- Fixed some incorrect assumptions in percent encoding
- Modified GatherKapaStats to output raw data
- Generated a minimal representation of RefSeq Microbial... achieved 80% size reduction
- Changed the way pileup calculates coverage from soft-clipped bases; they are now ignored
- Changed the way samtools/sambamba exclusion flags are processed to be more flexible and faster
- Pileup now uses samtools to parse the header and sambamba to parse the reads, since sambamba is slow at reading headers
- Added key=value pair output to pileup
- Wrote ScoreTracker to track scores of accepted and rejected ORFs when calling genes

### 38.31
- Added long kmer support to RNA calling in CallGenes
- Added BBMerge flags maxmismatches and forcemerge
- Added Tadpole flag filtermem

### 38.32
- Tadpole now refuses to run with no input files
- BBMerge now supports filtermemory flag
- Wrote KmerFilterSetMaker and kmerfilterset.sh to generate small covering sets of kmers for use with BBDuk
- Added silent flags to suppress screen messages from BBDuk, Reformat, and KmerTableSet-related classes
- Added reformat padding flags

### 38.33
- Shred now validates input files
- Reformat now has options for padding sequences
- KmerFilterSet now accepts an initial kmer set
- Wrote IntList3
- Wrote HashArrayHybridFast
- Changed HashArray bulk add contract
- Back-ported HashArrayHybridFast changes to KmerNode2D
- Seal now uses HashArrayHybrid; indexing Silva became >100x faster
- Sketch now uses HashArrayHybrid; indexing speed increased somewhat
- Added amino support to BBDuk
- Added amino support to KmerCountExact
- Added amino support to EntropyTracker
- Modified entropy defaults for amino acid mode with Sketch(?) and BBDuk(?)
- Fixed tracking of PercentOfPairs for insert size statistics
- CompareSketch now automatically sets the protein, fungi, or mito path on NERSC
- mutate.sh now works on amino acid sequences
- Validated CompareSketch on raw reads in protein space; it works amazingly well

### 38.34
- Wrote MetagenomeDataWriter to produce some stats for Brian Foster
- Modified PreParser and Shared to deal with determining the original command line
- TODO: (Brian Foster) report base and read counts exactly, not rounded to the nearest million
- Refactored and commented IntList classes
- Added merge and ecco to CallGenes
- Wrote MetadataWriter to allow unified reads in and out nomenclature for certain programs
- Removed a constructor from PreParser
- Fixed MetadataWriter for AssemblyStats
- Added support for protein Sketch server
- Fixed some printing errors in CallGenes
- Added recode and retranslate to CallGenes
- Increased SendSketch default sizemult for RefSeq and proteins to 2.2

### 38.35
- Added sketchonly flag to CompareSketch, allowing it to just sketch and write files but not actually run comparisons
- Protein sketch server is now active
- Added TaxTree.descendsFrom(child, parent)
- TaxTree now classifies species-attached no-rank archaeal nodes as strains in addition to bacteria
- pigz --version is now recorded to determine whether -11 and -I flags are supported
- Added sketch sixframes flag, for dealing with indels. This works surprisingly well but bloats the genome size. Probably the size should be divided by 6
- Added prokprot sketch to RQCFilter
- Sketch now ignores AA kmers spanning stop codons in sixframes mode
- Fixed a flaw in rkmer generation following Ns, in many classes
- Added Sketch toValue2 function to process dual kmers in an unbiased manner. This yields more accurate ANI
- Added comparison logic for tracking k1 and k2 matches independently
- toValue2 now handles aminos as well
- Changed default kmer lengths from 31,0 to 32,23, and 10,7 to 11,7
- Simplified some parts of Sketch, like removing aniFromWkid flag
- Changed an assertion in TaxTree to a warning, because the latest version of NCBI taxdump contains errors
- Validation of K and hash version between sketches is now more robust
- Fixed all instances of kmer bitmasks to work correctly with k=32; prior limit was k=31
- Added 1-bit antialiasing to Sketch hashcodes
- Bumped hash version to 2
- Increased amino default kmer length to 12,8 to increase specificity
- Fixed an assertion failure in comparesketch perfile mode
- Increased size of prokprot blacklist
- Added Sketch refhits flag, to indicate the number of references sharing kmers with keys hitting a reference
- Remade prokprot blacklists at a higher taxonomic level to deal with high conservation
- Fixed an assertion with regards to sketchonly mode in comparesketch
- avgrefhits is now weakly factored into score
- Modified some rqcfilter2 sketch flags such as minprob
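
The changelog does not say what was wrong with the k=32 bitmasks fixed in 38.35. One common way such a limit arises with 2-bit-encoded kmers in a 64-bit `long` is a shift overflow, sketched below by emulating Java's shift semantics. All function names are hypothetical, and the failure mode itself is an assumption.

```python
MASK64 = (1 << 64) - 1


def java_shl(value, shift):
    """Emulate Java's `long << int`: only the low 6 bits of the shift are used."""
    return (value << (shift & 63)) & MASK64


def naive_mask(k):
    """Emulates `(1L << (2*k)) - 1`; at k=32 the shift wraps, so the mask is 0."""
    return (java_shl(1, 2 * k) - 1) & MASK64


def safe_mask(k):
    """Emulates `-1L >>> (64 - 2*k)`, which is valid for 1 <= k <= 32."""
    return MASK64 >> (64 - 2 * k)
```

Both forms agree for k up to 31; only the second yields an all-ones mask at k=32.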

### 38.36
- Increased Sketch minprob to 0.0008. Q7 (80% accurate) areas will be used but Q6 (75%) will be ignored; before it was 0.0001 (Q6.1). This slightly increases accuracy with raw reads
- Trimrname now works on sam headers
- Trimrname is now automatically set to the same as trd unless explicitly overridden with the trimrname flag
- Added small RNA adapters to adapters.fa (thanks to Daniel N.)
- Sketch now reports the number of unique kmers indexed
- BBTools can now read embl and gbk formats
- Added support for subcohort taxonomic level
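
The minprob note in 38.36 relates Phred quality to accuracy: a base with quality Q is correct with probability 1 - 10^(-Q/10), so Q7 is about 80% and Q6 about 75%. The sketch below assumes, without confirmation from the changelog, that minprob is compared against the product of per-base accuracies over a kmer of length 32 (the default long kmer length since 38.35). Function names are hypothetical.

```python
def base_accuracy(q):
    """Probability that a base with Phred quality q is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def kmer_probability(quals):
    """Probability that every base in a kmer is correct (independent errors)."""
    p = 1.0
    for q in quals:
        p *= base_accuracy(q)
    return p


def passes_minprob(quals, minprob=0.0008):
    """Assumed rule: keep the kmer if its correctness probability >= minprob."""
    return kmer_probability(quals) >= minprob
```

Under these assumptions a 32-mer of uniform Q7 bases clears the 0.0008 threshold while one of uniform Q6 bases does not, matching the behavior described in the entry.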

### 38.37
- Fixed a bug in BBDuk JSON readsOut reporting
- BBSketch format 3 now prints taxID
- Fixed broken qin flag (was being overridden by autodetection)
- Improved quality autodetection for out-of-range quality scores
- FastqReadInputStream now correctly inherits interleaving from FileFormat rather than running internal tests
- Added JsonParser.parseJsonObjectStatic
- Added Blacklist.toBlacklist
- Added SendSketch.toAddress, .setFromAddress, and .sendSketch (static)
- Simplified SendSketch parsing
- TestFormat now automatically tries to detect organism with SendSketch
- ReadStats bhist is now faster by formatting with ByteBuilder
- Added TestFormat bhistlen flag to disable gigantic bhists

### 38.38
- Fixed a parsing error in SendSketch
- Wrote docs/RestartingServers.txt
- Fixed CallGenes load failure with under 9 threads
- Added a 100k limit to SendSketch queries per instance, and added reference tars to the website
- Increased buffer sizes of SendSketch
- Reduced number of threads per session for Sketch servers
- Added trackers for number of Sketches processed, bytes received, and bytes sent to Sketch server

### 38.39
- Fixed a bug in phist (required polysymbol to be set)
- Fixed a bug in BBDuk amino mode (failure to support k=12)
- Fixed a bug in bhist (no newlines!)
- Sketch and Tax servers now track single versus bulk queries
- Converted several ReadStats histograms from TextStreamWriter to ByteStreamWriter

### 38.40
- Replaced some obsolete StringBuilder methods (mainly for read printing) with ByteBuilder
- Deleted obsolete classes ReadStreamStringWriter and SortByMapping
- Replaced many instances of StringBuilder with ByteBuilder
- Moved some fields from Gene to Shared
- Made Header class
- Fixed a float-to-int rounding-down problem making BBMerge not strictly obey the maxmismatches flag
- Redid RandomReads naming format to be pair-capable in sam format
- Converted all known header-parsing functions to use the new format
- Wrote SuperLongList.toString
- Added Reformat prioritizelength flag for subsampling variable-length reads
- Fixed trailing whitespace in bhist

### 38.41
- Fixed a compile error

### 38.42
- Wrote SubSketch and subsketch.sh to pull partial sketches out of larger sketches (e.g. to shrink RefSeq)
- Added stats handler to TaxServer, with version and quantity tracking
- Added bbversion field to sendsketch header
- Fixed SendSketch address parsing
- Added p and q suffixes to parseKMG
- Added PacBio read length modelling to RandomReads
- Fixed a CallVariants assertion with SamLine.RNAME_AS_BYTES
- Fixed major bug in vcf line reading, misinterpreting variant types, preventing BBDuk from parsing vcf properly
- Wrote SamStreamerMF, a multifile SamStreamer
- Integrated SamStreamerMF into CallVariants. Now, with 8 sam.gz files, CallVariants is about 5x as fast on a 32-core node
- Fixed CallVariants vcf output MCOV reporting -1 when out= is set instead of vcf=
- Fixed ihist not working in BBDuk

### 38.43
- Wrote var2.VarKey for hashing. May not use it
- Added indel processing to fixVars, and Read.containsVars()
- Fixed bugs in reading insertions from VCF files
- TaxServer usage no longer displays stats (stats are on the /stats page)
- Added ref flag to CompareVCF
- Added shist to FilterVCF (for vars passing filter)
- FilterVCF no longer requires a reference (in most cases) if the VCF has a correct header
- CallVariants modified to reduce negative impact of strand bias and read bias on score, in cases that otherwise appear fine
- Demuxbyname can now do 1 file per sequence header, but it does not close the streams as soon as a sequence is written. This would be better as a custom program
- Removed a mysterious automatic newline from Read.toSam(bb)
- Wrote CoverageArray3A, Atomic version
- Added atomic flag to CallVariants, which increases speed by up to 300 percent
- Increased speed of multithreaded coverage calculation even without atomic flag
- Fixed stranded coverage default to false
- Added CoverageArray.incrementRangeSynchronized
- CallVariants trackstrand now correctly defaults to false, which disables the DP4 field
- CalcTrueQuality should now ignore indels declared in a VCF

### 38.44
- Fixed a bug in Tools.parseKMG
- Added qualhist to CallVariants
- Added code in CallVariants to deal with recalibrated base quality
- CallVariants no longer needs ref= prefix before fasta reference
- FilterVCF can now split alleles
- Modified mutate.sh to allow variable-length indels, and not put them too close together (to allow better grading)
- **Major:** Fixed BBDuk/Seal/Clumpify issue in failure to correctly reverse-complement some kmers

### 38.45
- Last restarted timestamp fixed for TaxServer stats page
- Clarified randomreads.sh description of generating twin files versus interleaved
- Added Read.countVars, CallVariants.findUniqueVars
- Added support for indels and border to FilterSam
- CallVariants can now force calls of specific alleles with an input vcf
- VarMap is now iterable over values
- Modified ShrinkAccession to optionally retain GI numbers
- Fixed VCF genotype call of 1 for haploids failing filters
- Updated GiToNcbi to read gi numbers from accession files since gi files will disappear soon
- Clarified bbduk.sh comment on maxlength
- Added unzip.sh script
- Split Sketch displayfname into rfname and qfname
- Fixed file column being enabled by default for sendsketch
- Changed VarMap WAYS to 8, allowing 16 billion variants
- Short match strings no longer generate consecutive symbols like mm because it is hard to parse
- MSA.score() now accepts short or long match strings
- CallVariants no longer generates long match strings prior to trimming, for perfect matches; 5-10% faster
- FilterVCF can now split long substitutions into SNPs with the splitsubs flag
- Fixed CalcTrueQuality ploidy unset warning
- Added ls to testfilesystem. May be inaccurate due to cache effects
- Added amino acid codes B and Z, mapped to ANY (same as X)
- CallVariants now integrated into FilterSam
- BBCMS now supports sam files, if error-correction is disabled (depth filtering is allowed)
- Added some columns to CallVariants screen output for average allele depth
- Added taxonomic levels series and section
- Added RenameGiToTaxid badheaders flag for logging
- Added RenameGiToTaxid maxbadheaders flag for early termination when exceeded, and included it in the download scripts (at 5000 since recent nt contains 2440 headers with no TaxID)
- Removed sharedVarMap from CallVariants2; replace with forcedVars1 or forcedVars2 for the two passes
- FungalRelease agp generation now uses ByteStreamWriter over tsw and Read.breakAtGaps uses ByteBuilder over sb to save memory
- Fully commented MSA11ts fullUnlimited

### 38.46
- Added Unzip.java and fixed unzip.sh. It is pretty resource-intensive, though, for a program that does nothing. This could be improved
- Added KID and WKID to Sketch format 3, and flags to disable them
- CompareVCF now prints results to screen correctly when there is no output file
- TaxServer now defaults to 200k max reads in local mode
- In local mode, TaxServer no longer reads files with pigz
- FilterVCF now correctly observes del and ins flags
- Added Var.COMPOUND type for multiallelic variations
- Added VCFLine.trimPrefix() and trimSuffix()
- Fixed bugs in trimToCanonical handling of compound variations
- VCFLines split by allele now split INFO fields as well
- Wrote demuxbyname2, to support massively multiplexed Novaseq runs
- Splitting alleles now also splits the info field of VCFLines

### 38.47
- Added demuxbyname2 hamming distance support
- Renamed Var.COMPOUND to Var.MULTI and added Var.COMPLEX
- Modified demuxbyname2.sh to use pigz
- Increased compiler error level (@Override, shadowing) and fixed resulting errors
- Wrote MultiCros3, which supports concurrent streams; this makes DemuxByName2 faster
- Made BufferedMultiCros an abstract superclass of MultiCros2 and MultiCros3

### 38.48
- Added samline field to Read. obj field is no longer used for SamLines. Caused substantial refactoring; may have introduced bugs when processing sam files (they will not be subtle if present)
- BufferedMultiCros now offers a threaded mode, but this has not improved performance
- BufferedMultiCros now supports minReadsToDump and puts residual reads into unknown
- Fixed DemuxByName2 hamming distance code, and improved it to only remove colliding keys
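
The idea of removing only colliding keys can be sketched as follows, assuming a Hamming distance of 1 and a lookup table from observed barcode to expected barcode. The function names and the alphabet are hypothetical; this is not the DemuxByName2 code.

```python
def neighbors(barcode, alphabet="ACGTN"):
    """All strings at Hamming distance exactly 1 from `barcode`."""
    for i, original in enumerate(barcode):
        for symbol in alphabet:
            if symbol != original:
                yield barcode[:i] + symbol + barcode[i + 1:]


def build_lookup(barcodes):
    """Map each barcode and its distance-1 neighbors to the expected barcode.

    A neighbor reachable from two different barcodes is ambiguous, so only
    that colliding key is dropped; exact barcodes always map to themselves.
    """
    lookup = {}
    collisions = set()
    for bc in barcodes:
        for key in neighbors(bc):
            if key in lookup and lookup[key] != bc:
                collisions.add(key)
            lookup[key] = bc
    for key in collisions:
        del lookup[key]
    for bc in barcodes:
        lookup[bc] = bc  # exact matches take priority over mutated keys
    return lookup
```

Dropping only the ambiguous keys keeps every unambiguous one-mismatch barcode assignable, rather than disabling mismatch tolerance for any barcode that has a close relative.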

### 38.49
- Fully commented DemuxByName2, BufferedMultiCros, MultiCros2, and MultiCros3
- Fixed a bug in MultiCros3 that created some duplicate reads. Speed is now >950MB/s for twin files

### 38.50
- Added bgzip control flags and version parsing
- .vcf.gz files now default to being written and read by bgzip
- All gzip files now default to being read with bgzip over pigz
- Non-vcf files will only be written with bgzip if the bgzip flag is added (for now)
- Added alternate Sketch addresses via vm flag
- minProb and minQual moved from SketchObject to DisplayParams, requiring the modification of many methods
- Simplified some Sketch method signatures by allowing DisplayParams to substitute for multiple parameters
- Added Locale to all String formatting without it
- Refactored DemuxByName2
- Improved commenting of DemuxByName2 and related classes
- Added PacBio subread support to PartitionReads (partition.sh)
- Disabled ByteFile1 being forced outside of JGI. ByteFile2 caused some problems, but those should be resolved now, I think...
- Added loglog and barcode flags to DemuxByName2
- Fixed order of SendSketch setting server address to allow alternate (VM) server use
- Fixed DemuxByName2 order of parsing parser args, allowing the barcode flag to trigger
- Unified DemuxByName2 modes under a single mode field
- Fixed maxrecords not being observed in Sketch JSON format
- TaxServer sketch handler now does full parsing of URL arguments
- Added D3 support to Sketch results

### 38.51
- Changed handling of same-name JSON keys; by default they are now replaced
- Improved Sketch D3 output - added more keys, fixed depth handling
- Subprocess testing now returns false for exit codes 126 and higher (missing libraries yield 127)
- Turned bgzip and pigz on by default for all programs
- Made bgzip the default for RQCFilter
- Modified TaxServer sketch portion to prevent carryover of parameters from previous queries
- Fixed Sketch header reporting observed depth as actual depth
- Wrote IceCreamFinder, IceCreamAligner, and icecreamfinder.sh
- Wrote A_Sample_Generator, IceCreamMaker, and icecreammaker.sh
- Moved A_Sample classes to new templates package
- Changed some new Random() calls to Shared.threadLocalRandom()
- Added jsonarrays flag to Sketch
- Wrote IceCreamGrader and icecreamgrader.sh
- Renamed demuxbyname2.sh to demuxbyname.sh
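
The subprocess test described in 38.51 (exit codes 126 and higher count as failure) might look like the sketch below. The function names are hypothetical, and treating codes 1 through 125 as "usable" is an assumption; the changelog only states the 126-and-higher rule.

```python
import subprocess
import sys


def exit_code_ok(code):
    """Treat 126 and above as unusable (missing libraries yield 127)."""
    return 0 <= code < 126


def tool_is_usable(command):
    """Run a probe command quietly and report whether the tool looks usable."""
    try:
        result = subprocess.run(command, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        return False  # executable not found or not runnable
    return exit_code_ok(result.returncode)


if __name__ == "__main__":
    # Probe the running Python interpreter as a stand-in for pigz or bgzip.
    print(tool_is_usable([sys.executable, "--version"]))
```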

### 38.52
- Made IceCreamFinder ~50% faster by debranching loops and optimizing cache footprint
- Added IceCreamFinder junction output
- Simplified shell scripts by centralizing path-setting commands
- Moved JNI library loading to Shared
- Wrote IceCreamAligner JNI version

### 38.53
- Made IceCreamAligner JNI faster by adding functions for all alignments and adding 16-bit versions
- Fixed bugs in calcmem.sh path setting and module loading on Cori
- Automated jni library path setting (-Djava.library.path flag is no longer required)
- Disabled BBMerge attempt to load JNI libraries
- Added magic number detection for .gz files
- Disabled bgzip reading of non-bgzip .gz files, awaiting new bgzip release, because current bgzip breaks on concatenated gzip files (supposedly addressed after v1.9)
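
Magic number detection for .gz files, and telling bgzip output apart from ordinary gzip, can be sketched from the published formats: gzip streams start with bytes 0x1f 0x8b, and BGZF blocks additionally carry a `BC` subfield in the gzip extra field. The function names are hypothetical and this is not the BBTools implementation.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"
FEXTRA = 0x04  # gzip FLG bit indicating an extra field is present


def is_gzip(header):
    """True if the bytes start with the gzip magic number."""
    return header[:2] == GZIP_MAGIC


def is_bgzf(header):
    """True if the first gzip member carries the BGZF 'BC' extra subfield."""
    if len(header) < 12 or not is_gzip(header) or not (header[3] & FEXTRA):
        return False
    (xlen,) = struct.unpack_from("<H", header, 10)  # XLEN follows the 10-byte header
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen  # skip this subfield's 4-byte header and payload
    return False
```

Checking the first few bytes rather than the file extension lets a reader fall back to a general gzip decoder when a `.gz` file was not written by bgzip.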

### 38.54
- Changed a method to avoid a Java 11 dependency
- Added ZMW stats to IceCreamFinder
- Added preliminary adapter detection to IceCreamFinder

### 38.55
- Fixed a JNI bug in RQCFilter with BBMerge

### 38.56
- Improved and accelerated IceCreamFinder adapter detection
- Reduced discarding of reads with adapters only at the tips

### 38.57
- Greatly improved IceCreamFinder adapter detection sensitivity by aligning to more reads
- Increased speed of adapter aligner
- Added less-specific adapter-screening phases to reduce calls to the adapter aligner
- Added ambig output stream and changed the logic for determining ambiguous inverted repeats
- Adapter-containing inverted repeats no longer go to junctions output
- Improved timeless adapter aligner and made it default
- Added start location to low bits of timeless aligner score, but it does not seem to work

### 38.58
- Fixed PreParser failure when encountering a standalone equals sign
- Fixed a bug in automatically setting Sketch blacklists for known databases
- Updated server-starting shellscripts to point to the new URLs
- Renamed Missing Adapter as Absent Adapter
- Changed ambiguity logic to better classify reads when there are 2 passes
- Adapter alignment is slightly more lenient when an inverted repeat is detected
- Slightly accelerated adapter detection by changing conditionals to array lookups in the inner loop
- SendSketch can now load TaxTree
- Increased Sketch number of comparisons returned, to compensate for potential losses during TaxFilter

### 38.59
- Added json output and stats redirection to IceCreamFinder
- Added preliminary SamStreamer support to IceCreamFinder
- SamStreamer now supports a limited number of reads

*[Changelog continues with detailed version history...]*