TestFilesystem
Benchmarks and logs filesystem I/O performance in a continuous monitoring loop using file copy operations, metadata operations (creating, reading, and deleting 1000 small files), and directory listing tests.
Basic Usage
testfilesystem.sh <in> <out> <log> <size> <ways> <interval in seconds>
All six parameters are positional arguments passed directly to the Java class. The tool reuses existing test files when they match the requested size and creates them otherwise.
Parameters
TestFilesystem uses positional arguments to configure the benchmarking process.
Positional Arguments
- <in>
- Input file template. Should contain the # symbol if ways > 1 to allow multiple parallel files (e.g., "testfile#.dat"). Default: "foo#.txt"
- <out>
- Output file for copy operations. Use "null" to disable output file creation. Default: "bar.txt"
- <log>
- Log file for performance metrics. Use "stdout" or "null" to output to console. Default: "log.txt"
- <size>
- Size in bytes of test files to create/copy. Large files test sustained I/O performance. Default: 5000000000 (5GB)
- <ways>
- Number of parallel test files to create. Higher values test concurrent I/O performance. Default: 40
- <interval in seconds>
- Time interval between benchmark cycles in seconds. Use 3600 for hourly monitoring. Default: 3600 seconds (1 hour)
Examples
Basic Filesystem Benchmark
testfilesystem.sh testfile#.dat copy.dat results.log 1000000000 10 60
Benchmarks filesystem with 10 parallel 1GB files, copying to copy.dat, logging results every minute to results.log
Hourly Monitoring
testfilesystem.sh data#.bin null stdout 5000000000 40 3600
Monitors filesystem performance with 40 parallel 5GB files every hour, outputting metrics to console
High-Frequency Testing
testfilesystem.sh benchmark#.tmp output.tmp perf.log 100000000 5 30
Tests with 5 parallel 100MB files every 30 seconds for high-frequency performance monitoring
Single File Mode
testfilesystem.sh singlefile.dat copy.dat metrics.txt 2000000000 1 300
Tests single-file performance with 2GB file, copying every 5 minutes
Algorithm Details
Benchmark Methodology
TestFilesystem executes three distinct performance tests in each monitoring cycle, each with its own measurements and data structures:
File Copy Performance
The core benchmark uses a multi-threaded producer-consumer architecture for file copying operations:
- Buffer Management: Uses 65KB read buffers with a queue-based recycling system for memory efficiency
- Threading Strategy: Separates read and write operations into different threads with ArrayBlockingQueue coordination
- Round-Robin Selection: Cycles through available input files using modulo arithmetic (iteration % ways)
- Throughput Measurement: Calculates MB/s based on file size and elapsed copy time
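The sketch below illustrates this producer-consumer pattern in plain Java. The class name, buffer size, queue capacity, and error handling are illustrative assumptions, not the actual TestFilesystem internals; round-robin selection would simply choose the source file as inputFiles[iteration % ways] before each call.

import java.io.*;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;

public class CopyBenchSketch {
    static final int BUF_SIZE = 65536;         // read-buffer size (illustrative)
    static final int QUEUE_CAP = 8;            // bounds the number of in-flight buffers
    static final byte[] POISON = new byte[0];  // end-of-stream marker

    // Copies src to dst with a reader thread producing buffers and the caller
    // consuming them; buffers are recycled through a second queue. Returns MB/s.
    static double copy(File src, File dst) throws Exception {
        ArrayBlockingQueue<byte[]> full = new ArrayBlockingQueue<>(QUEUE_CAP);
        ArrayBlockingQueue<byte[]> recycle = new ArrayBlockingQueue<>(QUEUE_CAP);
        for (int i = 0; i < QUEUE_CAP; i++) recycle.add(new byte[BUF_SIZE]);

        long start = System.currentTimeMillis();
        Thread reader = new Thread(() -> {
            try (InputStream in = new FileInputStream(src)) {
                for (byte[] buf = recycle.take(); ; buf = recycle.take()) {
                    int n = in.read(buf);
                    if (n < 0) break;
                    full.put(n == buf.length ? buf : Arrays.copyOf(buf, n));
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                try { full.put(POISON); } catch (InterruptedException ignored) {}
            }
        });
        reader.start();

        long bytes = 0;
        try (OutputStream out = new FileOutputStream(dst)) {
            for (byte[] buf = full.take(); buf != POISON; buf = full.take()) {
                out.write(buf, 0, buf.length);
                bytes += buf.length;
                if (buf.length == BUF_SIZE) recycle.put(buf);  // recycle full-size buffers
            }
        }
        reader.join();
        double seconds = Math.max(System.currentTimeMillis() - start, 1) / 1000.0;
        return (bytes / 1e6) / seconds;  // throughput in MB/s
    }
}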
Metadata Operations Testing
Tests filesystem metadata performance with small file operations:
- File Creation: Creates 1000 small files in a "meta" subdirectory
- Read Operations: Performs single-byte reads from each created file
- Cleanup: Deletes all test files to measure deletion performance
- Operations Rate: Calculates operations per second for metadata-intensive workflows
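A minimal sketch of such a metadata loop follows; the file-naming scheme and directory handling are illustrative, and only the counts (1000 files, 3 operations each) come from the description above.

import java.io.*;

public class MetaBenchSketch {
    // Creates, reads, and deletes 'count' tiny files in 'dir', returning ops/second.
    // 1000 files => 3000 metadata operations (create + read + delete).
    static double metadataOps(File dir, int count) throws IOException {
        dir.mkdirs();
        long start = System.currentTimeMillis();
        File[] files = new File[count];
        for (int i = 0; i < count; i++) {               // create
            files[i] = new File(dir, "m" + i + ".tmp");
            try (FileOutputStream out = new FileOutputStream(files[i])) {
                out.write(1);
            }
        }
        for (File f : files) {                          // single-byte read
            try (FileInputStream in = new FileInputStream(f)) {
                in.read();
            }
        }
        for (File f : files) {                          // delete
            f.delete();
        }
        long elapsed = Math.max(System.currentTimeMillis() - start, 1);
        return (3.0 * count) / (elapsed / 1000.0);      // operations per second
    }
}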
Directory Listing Performance
Measures directory traversal and listing performance:
- Current Directory Scan: Uses File.listFiles() on the current working directory
- File Attribute Access: Tests canRead() method on each directory entry
- Timing Measurement: Records time required for complete directory traversal
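A short sketch of this measurement using the standard java.io.File API (names are illustrative):

import java.io.File;

public class ListBenchSketch {
    // Times a full listing of the working directory plus a canRead() probe per entry.
    static long listingTimeMillis() {
        long start = System.currentTimeMillis();
        File[] entries = new File(".").listFiles();
        int readable = 0;
        if (entries != null) {
            for (File f : entries) {
                if (f.canRead()) readable++;            // touches file attributes
            }
        }
        return System.currentTimeMillis() - start;
    }
}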
Test File Management
Test files are created and validated before benchmarking begins (a sketch follows this list):
- Size Validation: Checks that existing files are within 5% of the target size before reusing them
- Random Data Generation: Uses DiskBench.writeRandomData() for realistic I/O patterns
- File Naming: Supports template-based naming with # placeholder for parallel files
- Automatic Cleanup: Recreates files that don't match size requirements
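The following sketch shows the size check and '#' template expansion described above. The method writeRandomFile() is a simple stand-in for the DiskBench.writeRandomData() call, whose actual signature is not documented here.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;

public class TestFileSetupSketch {
    // Returns true if an existing file is within 5% of the target size and can be reused.
    static boolean sizeOk(File f, long targetBytes) {
        if (!f.exists()) return false;
        return Math.abs(f.length() - targetBytes) <= targetBytes * 0.05;
    }

    // Ensures each parallel test file exists at roughly the target size, expanding
    // the '#' placeholder in the template to indices 0..ways-1.
    static void prepare(String template, long targetBytes, int ways) throws IOException {
        for (int i = 0; i < ways; i++) {
            String name = (ways > 1) ? template.replace("#", Integer.toString(i)) : template;
            File f = new File(name);
            if (!sizeOk(f, targetBytes)) {
                f.delete();                              // recreate files that fail validation
                writeRandomFile(f, targetBytes);         // stand-in for DiskBench.writeRandomData()
            }
        }
    }

    // Simple random-data writer used only for this sketch.
    static void writeRandomFile(File f, long bytes) throws IOException {
        Random rand = new Random();
        byte[] buf = new byte[65536];
        try (FileOutputStream out = new FileOutputStream(f)) {
            for (long written = 0; written < bytes; ) {
                rand.nextBytes(buf);
                int n = (int) Math.min(buf.length, bytes - written);
                out.write(buf, 0, n);
                written += n;
            }
        }
    }
}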
Performance Logging
Each benchmark cycle appends one tab-delimited line of metrics with the following fields (a sketch follows this list):
- Timestamp: System time in milliseconds for precise timing correlation
- Copy Metrics: File size, copy time in milliseconds, and calculated throughput (MB/s)
- Metadata Metrics: Number of operations (3000: create + read + delete), time, ops/second
- Directory Listing Time: Time for ls operations in milliseconds
- Human-Readable Date: Formatted timestamp for log analysis
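As an illustration only, one record could be assembled as below; the field order follows the Output Format section, but the number formatting and date pattern are assumptions.

import java.text.SimpleDateFormat;
import java.util.Date;

public class LogLineSketch {
    // Builds one tab-delimited record in the column order shown under "Output Format".
    static String format(long sizeBytes, long copyMillis, double mbPerSec,
                         int metaOps, long metaMillis, double opsPerSec, long lsMillis) {
        long now = System.currentTimeMillis();
        String date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date(now));
        return now + "\t" + sizeBytes + "\t" + copyMillis + "\t"
                + String.format("%.2f", mbPerSec) + "\t" + metaOps + "\t" + metaMillis + "\t"
                + String.format("%.2f", opsPerSec) + "\t" + lsMillis + "\t" + date;
    }
}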
Memory and Resource Management
Resource utilization patterns for long-running benchmarks:
- Minimal Memory Usage: Default 50MB heap (-Xmx50m) for lightweight operation
- Buffer Recycling: ByteBuilder objects are reused between read/write cycles
- Queue-Based Coordination: ArrayBlockingQueue with capacity limits prevents memory bloat
- Thread Lifecycle Management: Proper thread termination and resource cleanup
Timing and Scheduling
Precise timing control for consistent benchmarking intervals:
- Interval Enforcement: Uses Object.wait() for precise timing between test cycles
- Minimum Wait Time: 10-millisecond minimum wait to prevent CPU spinning
- Continuous Operation: Infinite loop design for long-term monitoring
- Time Drift Correction: Calculates next execution time to maintain consistent intervals
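A minimal sketch of this drift-corrected scheduling loop, using illustrative names; the actual loop inside the Java class may differ in detail.

public class IntervalLoopSketch {
    private final Object lock = new Object();

    // Runs 'task' every 'intervalMillis', correcting for how long the task took so
    // cycle start times stay on a fixed grid; always waits at least 10 ms per cycle.
    void runForever(Runnable task, long intervalMillis) throws InterruptedException {
        long next = System.currentTimeMillis();
        while (true) {
            task.run();
            next += intervalMillis;                                  // next slot on the fixed grid
            long wait = Math.max(next - System.currentTimeMillis(), 10);
            synchronized (lock) {
                lock.wait(wait);                                     // sleep until the next cycle
            }
        }
    }
}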
Performance Characteristics
System Requirements
- Memory Usage: Minimal 50MB heap size for long-running operation
- Disk Space: Requires space for the test files (size × ways; 200 GB at the default 5 GB × 40 files)
- I/O Capacity: Tests both sequential and random I/O patterns
- Thread Overhead: One additional thread per copy operation
Scalability
- Parallel Files: Supports 1 to many parallel test files via ways parameter
- File Size Range: Handles files from KB to multi-GB range
- Monitoring Duration: Designed for continuous long-term operation
- Log File Growth: Appends one line per interval, manageable growth rate
Use Cases
- Storage Validation: Test new storage systems before production deployment
- Performance Regression Testing: Monitor filesystem performance changes over time
- Capacity Planning: Determine I/O limits for bioinformatics workloads
- Troubleshooting: Identify storage bottlenecks in analysis pipelines
Technical Notes
Output Format
Log files contain tab-delimited data with the following columns:
#time size copyTime MB/s metaOps metaTime ops/s lsTime date
File Naming Convention
When using multiple parallel files (ways > 1), the input template must contain a '#' character that will be replaced with the file index (0 to ways-1).
Error Handling
The tool includes error handling for common filesystem issues:
- Automatic retry for interrupted I/O operations
- Graceful handling of missing directories
- Size validation and automatic file recreation
- Thread interruption recovery
Limitations
- Requires write permissions in the working directory
- Test files consume disk space during benchmark execution
- Long-running operation may be affected by system maintenance
- Results may vary based on system load and concurrent I/O
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org