TextFile
Displays contents of a text file. Start line and stop line are zero-based. Start is inclusive, stop is exclusive.
Basic Usage
textfile.sh <file> <start line> <stop line>
This tool prints a selected range of lines from a text file. For example, a start line of 50 and a stop line of 75 prints lines 50 through 74.
Parameters
TextFile uses positional arguments rather than named parameters:
Positional Arguments
- <file>: Input text file to read. Use "stdin" to read from standard input.
- <start line>: Zero-based line number to start reading from (inclusive). Optional; defaults to 0 if not specified.
- <stop line>: Zero-based line number to stop reading at (exclusive). Optional; defaults to start + 100 if not specified.
Special Modes
- speedtest: When given as the second argument, runs in speed test mode to benchmark file reading performance without printing content.
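The defaults described above can be sketched in Java as follows. This is a minimal illustration and not TextFile's actual argument handling: the class RangeArgs and its parse method are hypothetical names, the exact-match test for "speedtest" is an assumption, and the treatment of malformed numbers is not described on this page.

```java
/** Hypothetical holder for the parsed positional arguments (not the TextFile source). */
final class RangeArgs {
    final String file;
    final long start;        // zero-based, inclusive
    final long stop;         // zero-based, exclusive
    final boolean speedtest;

    RangeArgs(String file, long start, long stop, boolean speedtest) {
        this.file = file;
        this.start = start;
        this.stop = stop;
        this.speedtest = speedtest;
    }

    /** Applies the documented defaults: start = 0, stop = start + 100. */
    static RangeArgs parse(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException(
                "usage: textfile.sh <file> <start line> <stop line>");
        }
        String file = args[0];
        // Assumption: speed test mode is selected by an exact match on the second argument.
        if (args.length > 1 && args[1].equals("speedtest")) {
            // Speed test mode covers the whole file.
            return new RangeArgs(file, 0L, Long.MAX_VALUE, true);
        }
        long start = args.length > 1 ? Long.parseLong(args[1]) : 0L;
        long stop = args.length > 2 ? Long.parseLong(args[2]) : start + 100L;
        return new RangeArgs(file, start, stop, false);
    }
}
```

With only a file name the sketch yields the range 0 to 100; with a start of 50 and no stop it yields 50 to 150.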
Examples
Display First 100 Lines
textfile.sh myfile.txt
Displays lines 0-99 (first 100 lines) from myfile.txt.
Display Specific Line Range
textfile.sh myfile.txt 50 75
Displays lines 50-74 (25 lines total) from myfile.txt.
Read from Standard Input
cat largefile.txt | textfile.sh stdin 1000 1010
Reads from stdin and displays lines 1000-1009.
Speed Test Mode
textfile.sh largefile.txt speedtest
Benchmarks reading performance of the entire file without displaying content. Reports processing statistics including lines processed, bytes read, and throughput.
Single Line Display
textfile.sh config.txt 10 11
Displays only line 10 (zero-based) from config.txt.
Algorithm Details
File Reading Strategy
TextFile reads through Java's BufferedReader with an explicit buffer size:
- Buffered I/O: Uses BufferedReader with a 32,768-character buffer (BufferedReader(isr, 32768)); the size argument of BufferedReader counts chars, not bytes
- Line Skipping: Skips to the start line by calling nextLine() in a loop, without storing the skipped content
- Memory Efficiency: Processes files line-by-line without loading entire file into memory
- Stream Support: Can read from files, stdin, or any InputStream
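The strategy above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the TextFile source: LineRangeReader and readRange are hypothetical names, UTF-8 decoding is assumed, readLine() stands in for the tool's nextLine(), and the selected lines are collected into a list (rather than printed) so the result is easy to inspect.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not the TextFile source. */
final class LineRangeReader {

    /** Returns the lines in [start, stop), zero-based, from the given stream. */
    static List<String> readRange(InputStream in, long start, long stop) {
        List<String> out = new ArrayList<>();
        // Assumption of this sketch: the input is UTF-8 text.
        InputStreamReader isr = new InputStreamReader(in, StandardCharsets.UTF_8);
        try (BufferedReader br = new BufferedReader(isr, 32768)) {
            long lineNumber = 0;
            // Skip phase: read and discard lines before the start line.
            while (lineNumber < start && br.readLine() != null) {
                lineNumber++;
            }
            // Read phase: keep lines until the stop line or end of input.
            String line;
            while (lineNumber < stop && (line = br.readLine()) != null) {
                out.add(line);
                lineNumber++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Only one line is held at a time during the skip phase, so memory use depends on the size of the requested range and not on the size of the input.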
Performance Characteristics
The tool keeps memory use bounded even for large text files:
- Memory Usage: Fixed 120MB maximum heap size (-Xmx120m) regardless of input file size
- Scalability: Can handle files larger than available RAM, because lines are streamed rather than held in memory
- Speed Test Mode: Uses Timer class to measure throughput via Tools.timeLinesBytesProcessed()
- Blank Line Handling: Optionally skips blank lines during processing
Line Indexing Convention
TextFile uses zero-based line indexing with inclusive start and exclusive stop boundaries:
- Zero-based: First line of file is line 0, not line 1
- Inclusive Start: Start line number is included in output
- Exclusive Stop: Stop line number is NOT included in output
- Range Calculation: Number of lines displayed = stop - start
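The range calculation can be written as a small helper. This is a hedged sketch: LineRangeMath and linesDisplayed are hypothetical names, and the clamp to the file's line count is this sketch's way of handling a stop line beyond the end of the file, a case the rules above do not spell out.

```java
/** Illustrative arithmetic for the half-open range [start, stop). */
final class LineRangeMath {

    /**
     * Number of lines that would be shown for a file with totalLines lines.
     * Equals stop - start when the whole range lies inside the file.
     */
    static long linesDisplayed(long start, long stop, long totalLines) {
        // A stop line past the end of the file cannot yield more lines than exist.
        long effectiveStop = Math.min(stop, totalLines);
        // A start line at or past the effective stop yields nothing.
        return Math.max(0L, effectiveStop - start);
    }
}
```

For a 1,000-line file, a range of 50 to 75 gives 25 lines and a range of 10 to 11 gives exactly one line, matching the examples earlier on this page.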
Error Handling
The implementation handles errors and resources as follows:
- File Validation: Uses file.exists() method and checks for "stdin" or "jar:" prefixes
- Stream Management: Closes BufferedReader, InputStreamReader, and InputStream via ReadWrite.finishReading()
- Exception Recovery: Catches exceptions in readLine() and reports file path, line number, and file length
- Subprocess Support: Uses ReadWrite.getInputStream() with allowSubprocess flag for decompression
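The file validation step might look roughly like the following. This is a sketch under assumptions, not the actual check: InputCheck and isReadableSource are hypothetical names, and treating both "stdin" and "jar:" as prefixes is one reading of the rule above.

```java
import java.io.File;

/** Illustrative input validation; not the TextFile source. */
final class InputCheck {

    /** True if the name denotes stdin, a jar: resource, or an existing path. */
    static boolean isReadableSource(String name) {
        if (name == null) {
            return false;
        }
        // Special names are accepted without touching the file system.
        if (name.startsWith("stdin") || name.startsWith("jar:")) {
            return true;
        }
        // Everything else must exist on disk.
        return new File(name).exists();
    }
}
```

Checking the special names first avoids a pointless file system lookup for inputs that are not files.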
Utility Methods
The TextFile class provides additional utility methods for specialized text processing:
- toStringLines(): Loads entire file into String array for in-memory processing
- countLines(): Counts total lines using nextLine() loop, then calls reset()
- doublesplitTab(): Parses tab-delimited data into 2D arrays
- doublesplitWhitespace(): Parses whitespace-delimited data
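As an illustration of what parsing tab-delimited data into a 2D array involves, here is a minimal sketch. It is not the doublesplitTab() implementation, whose signature is not shown on this page: TabSplitSketch and splitTab are hypothetical names, and keeping trailing empty fields is a choice made for this example.

```java
/** Illustrative tab splitting; not the TextFile utility methods. */
final class TabSplitSketch {

    /** Splits each line on tabs, giving one row of fields per input line. */
    static String[][] splitTab(String[] lines) {
        String[][] rows = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            // A limit of -1 keeps trailing empty fields, so "1\t\t" has three columns.
            rows[i] = lines[i].split("\t", -1);
        }
        return rows;
    }
}
```

Rows may have different lengths when the input is ragged, so callers should check each row's length before indexing into it.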
Technical Implementation
Core Reading Loop
The main reading algorithm follows this pattern:
- Stream Initialization: Opens a BufferedReader with a 32,768-character buffer via the open() method
- Line Skipping: Advances to start line using nextLine() calls
- Content Reading: Reads and outputs lines from start to stop
- Statistics Tracking: Counts lines and bytes processed
- Resource Cleanup: Closes streams and releases resources
Speed Test Implementation
Speed test mode measures raw I/O performance by:
- Reading the entire file with a nextLine() loop, from the first line up to Long.MAX_VALUE, without calling System.out.println()
- Tracking total lines processed and bytes read
- Measuring elapsed time using shared.Timer class with start/stop methods
- Calculating throughput using Tools.timeLinesBytesProcessed(timer, lines, bytes, 8)
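The measurement steps above can be approximated with the standard library. This sketch swaps in System.nanoTime() for the shared.Timer class and does its own arithmetic in place of Tools.timeLinesBytesProcessed(), whose output format is not shown here. SpeedTestSketch, Result, and run are hypothetical names, and the byte count is approximated by character count without line terminators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Illustrative read-only benchmark; not the TextFile speed test. */
final class SpeedTestSketch {

    static final class Result {
        final long lines;
        final long approxBytes; // character count, excluding line terminators
        final long nanos;

        Result(long lines, long approxBytes, long nanos) {
            this.lines = lines;
            this.approxBytes = approxBytes;
            this.nanos = nanos;
        }

        /** Lines per second, or 0 if no measurable time elapsed. */
        double linesPerSecond() {
            return nanos == 0 ? 0.0 : lines / (nanos / 1e9);
        }
    }

    /** Reads every line without printing and records counts and elapsed time. */
    static Result run(Reader source) {
        long lines = 0;
        long approxBytes = 0;
        long t0 = System.nanoTime();
        try (BufferedReader br = new BufferedReader(source, 32768)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines++;
                // Equals the byte count only for single-byte encodings.
                approxBytes += line.length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Result(lines, approxBytes, System.nanoTime() - t0);
    }
}
```

Because nothing is printed, the measured time reflects reading and line splitting only, which is the point of a speed test mode.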
Input Source Flexibility
TextFile supports multiple input sources using ReadWrite.getInputStream():
- Regular Files: Direct file system access
- Compressed Files: Automatic decompression via subprocess
- Standard Input: Stream processing from pipes
- JAR Resources: Files embedded in JAR archives
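Choosing a stream by input name could be sketched as below. This is not ReadWrite.getInputStream(): InputSourceSketch and open are hypothetical names, the sketch swaps in the in-process java.util.zip.GZIPInputStream for .gz files where the list above describes decompression via a subprocess, and JAR resources are left out because this page does not show how such names are resolved.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;

/** Illustrative input selection; not the ReadWrite implementation. */
final class InputSourceSketch {

    /** Opens stdin, a gzip-compressed file, or a regular file, based on the name. */
    static InputStream open(String name) {
        // Assumption: names beginning with "stdin" select standard input.
        if (name.startsWith("stdin")) {
            return System.in;
        }
        try {
            InputStream raw = new FileInputStream(name);
            if (!name.endsWith(".gz")) {
                return raw;
            }
            try {
                // In-process decompression, used here in place of a subprocess.
                return new GZIPInputStream(raw);
            } catch (IOException e) {
                raw.close(); // do not leak the file handle if the gzip header is invalid
                throw e;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned stream can be wrapped in an InputStreamReader and BufferedReader, so the line-reading code does not need to know which kind of source it is reading.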
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org