TestFilesystem
Benchmarks and logs filesystem I/O performance in a continuous monitoring loop using file copy operations, metadata operations (creating, reading, and deleting 1000 small files), and directory listing tests.
Basic Usage
testfilesystem.sh <in> <out> <log> <size> <ways> <interval in seconds>
All six parameters are positional arguments passed directly to the Java class. The tool reuses existing test files when they match the requested size and creates them otherwise.
Parameters
TestFilesystem uses positional arguments to configure the benchmarking process.
Positional Arguments
- <in>
- Input file template. Should contain the # symbol if ways > 1 to allow multiple parallel files (e.g., "testfile#.dat"). Default: "foo#.txt"
- <out>
- Output file for copy operations. Use "null" to disable output file creation. Default: "bar.txt"
- <log>
- Log file for performance metrics. Use "stdout" or "null" to output to console. Default: "log.txt"
- <size>
- Size in bytes of test files to create/copy. Large files test sustained I/O performance. Default: 5000000000 (5GB)
- <ways>
- Number of parallel test files to create. Higher values test concurrent I/O performance. Default: 40
- <interval in seconds>
- Time interval between benchmark cycles in seconds. Use 3600 for hourly monitoring. Default: 3600 seconds (1 hour)
Examples
Basic Filesystem Benchmark
testfilesystem.sh testfile#.dat copy.dat results.log 1000000000 10 60
Benchmarks filesystem with 10 parallel 1GB files, copying to copy.dat, logging results every minute to results.log
Hourly Monitoring
testfilesystem.sh data#.bin null stdout 5000000000 40 3600
Monitors filesystem performance with 40 parallel 5GB files every hour, outputting metrics to console
High-Frequency Testing
testfilesystem.sh benchmark#.tmp output.tmp perf.log 100000000 5 30
Tests with 5 parallel 100MB files every 30 seconds for high-frequency performance monitoring
Single File Mode
testfilesystem.sh singlefile.dat copy.dat metrics.txt 2000000000 1 300
Tests single-file performance with 2GB file, copying every 5 minutes
Algorithm Details
Benchmark Methodology
TestFilesystem executes three distinct performance tests in each monitoring cycle, each with its own measurements and data structures:
File Copy Performance
The core benchmark uses a multi-threaded producer-consumer architecture for file copying operations:
- Buffer Management: Uses 65KB read buffers with a queue-based recycling system for memory efficiency
- Threading Strategy: Separates read and write operations into different threads with ArrayBlockingQueue coordination
- Round-Robin Selection: Cycles through available input files using modulo arithmetic (iteration % ways)
- Throughput Measurement: Calculates MB/s based on file size and elapsed copy time
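The sketch below illustrates this producer-consumer pattern in plain Java. The class name, buffer size, queue capacity, and error handling are illustrative assumptions, not the actual TestFilesystem internals; round-robin selection would simply choose the source file as inputFiles[iteration % ways] before each call.

import java.io.*;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;

public class CopyBenchSketch {
    static final int BUF_SIZE = 65536;         // read-buffer size (illustrative)
    static final int QUEUE_CAP = 8;            // bounds the number of in-flight buffers
    static final byte[] POISON = new byte[0];  // end-of-stream marker

    // Copies src to dst with a reader thread producing buffers and the caller
    // consuming them; buffers are recycled through a second queue. Returns MB/s.
    static double copy(File src, File dst) throws Exception {
        ArrayBlockingQueue<byte[]> full = new ArrayBlockingQueue<>(QUEUE_CAP);
        ArrayBlockingQueue<byte[]> recycle = new ArrayBlockingQueue<>(QUEUE_CAP);
        for (int i = 0; i < QUEUE_CAP; i++) recycle.add(new byte[BUF_SIZE]);

        long start = System.currentTimeMillis();
        Thread reader = new Thread(() -> {
            try (InputStream in = new FileInputStream(src)) {
                for (byte[] buf = recycle.take(); ; buf = recycle.take()) {
                    int n = in.read(buf);
                    if (n < 0) break;
                    full.put(n == buf.length ? buf : Arrays.copyOf(buf, n));
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                try { full.put(POISON); } catch (InterruptedException ignored) {}
            }
        });
        reader.start();

        long bytes = 0;
        try (OutputStream out = new FileOutputStream(dst)) {
            for (byte[] buf = full.take(); buf != POISON; buf = full.take()) {
                out.write(buf, 0, buf.length);
                bytes += buf.length;
                if (buf.length == BUF_SIZE) recycle.put(buf);  // recycle full-size buffers
            }
        }
        reader.join();
        double seconds = Math.max(System.currentTimeMillis() - start, 1) / 1000.0;
        return (bytes / 1e6) / seconds;  // throughput in MB/s
    }
}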
Metadata Operations Testing
Tests filesystem metadata performance with small file operations:
- File Creation: Creates 1000 small files in a "meta" subdirectory
- Read Operations: Performs single-byte reads from each created file
- Cleanup: Deletes all test files to measure deletion performance
- Operations Rate: Calculates operations per second for metadata-intensive workflows
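A minimal sketch of such a metadata loop follows; the file-naming scheme and directory handling are illustrative, and only the counts (1000 files, 3 operations each) come from the description above.

import java.io.*;

public class MetaBenchSketch {
    // Creates, reads, and deletes 'count' tiny files in 'dir', returning ops/second.
    // 1000 files => 3000 metadata operations (create + read + delete).
    static double metadataOps(File dir, int count) throws IOException {
        dir.mkdirs();
        long start = System.currentTimeMillis();
        File[] files = new File[count];
        for (int i = 0; i < count; i++) {               // create
            files[i] = new File(dir, "m" + i + ".tmp");
            try (FileOutputStream out = new FileOutputStream(files[i])) {
                out.write(1);
            }
        }
        for (File f : files) {                          // single-byte read
            try (FileInputStream in = new FileInputStream(f)) {
                in.read();
            }
        }
        for (File f : files) {                          // delete
            f.delete();
        }
        long elapsed = Math.max(System.currentTimeMillis() - start, 1);
        return (3.0 * count) / (elapsed / 1000.0);      // operations per second
    }
}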
Directory Listing Performance
Measures directory traversal and listing performance:
- Current Directory Scan: Uses File.listFiles() on the current working directory
- File Attribute Access: Tests canRead() method on each directory entry
- Timing Measurement: Records time required for complete directory traversal
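A short sketch of this measurement using the standard java.io.File API (names are illustrative):

import java.io.File;

public class ListBenchSketch {
    // Times a full listing of the working directory plus a canRead() probe per entry.
    static long listingTimeMillis() {
        long start = System.currentTimeMillis();
        File[] entries = new File(".").listFiles();
        int readable = 0;
        if (entries != null) {
            for (File f : entries) {
                if (f.canRead()) readable++;            // touches file attributes
            }
        }
        return System.currentTimeMillis() - start;
    }
}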
Test File Management
Test files are created and validated before benchmarking begins (a sketch follows this list):
- Size Validation: Checks that existing files are within 5% of the target size before reusing them
- Random Data Generation: Uses DiskBench.writeRandomData() for realistic I/O patterns
- File Naming: Supports template-based naming with # placeholder for parallel files
- Automatic Cleanup: Recreates files that don't match size requirements
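The following sketch shows the size check and '#' template expansion described above. The method writeRandomFile() is a simple stand-in for the DiskBench.writeRandomData() call, whose actual signature is not documented here.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;

public class TestFileSetupSketch {
    // Returns true if an existing file is within 5% of the target size and can be reused.
    static boolean sizeOk(File f, long targetBytes) {
        if (!f.exists()) return false;
        return Math.abs(f.length() - targetBytes) <= targetBytes * 0.05;
    }

    // Ensures each parallel test file exists at roughly the target size, expanding
    // the '#' placeholder in the template to indices 0..ways-1.
    static void prepare(String template, long targetBytes, int ways) throws IOException {
        for (int i = 0; i < ways; i++) {
            String name = (ways > 1) ? template.replace("#", Integer.toString(i)) : template;
            File f = new File(name);
            if (!sizeOk(f, targetBytes)) {
                f.delete();                              // recreate files that fail validation
                writeRandomFile(f, targetBytes);         // stand-in for DiskBench.writeRandomData()
            }
        }
    }

    // Simple random-data writer used only for this sketch.
    static void writeRandomFile(File f, long bytes) throws IOException {
        Random rand = new Random();
        byte[] buf = new byte[65536];
        try (FileOutputStream out = new FileOutputStream(f)) {
            for (long written = 0; written < bytes; ) {
                rand.nextBytes(buf);
                int n = (int) Math.min(buf.length, bytes - written);
                out.write(buf, 0, n);
                written += n;
            }
        }
    }
}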
Performance Logging
Each benchmark cycle appends one tab-delimited line of metrics with the following fields (a sketch follows this list):
- Timestamp: System time in milliseconds for precise timing correlation
- Copy Metrics: File size, copy time in milliseconds, and calculated throughput (MB/s)
- Metadata Metrics: Number of operations (3000: create + read + delete), time, ops/second
- Directory Listing Time: Time for ls operations in milliseconds
- Human-Readable Date: Formatted timestamp for log analysis
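As an illustration only, one record could be assembled as below; the field order follows the Output Format section, but the number formatting and date pattern are assumptions.

import java.text.SimpleDateFormat;
import java.util.Date;

public class LogLineSketch {
    // Builds one tab-delimited record in the column order shown under "Output Format".
    static String format(long sizeBytes, long copyMillis, double mbPerSec,
                         int metaOps, long metaMillis, double opsPerSec, long lsMillis) {
        long now = System.currentTimeMillis();
        String date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date(now));
        return now + "\t" + sizeBytes + "\t" + copyMillis + "\t"
                + String.format("%.2f", mbPerSec) + "\t" + metaOps + "\t" + metaMillis + "\t"
                + String.format("%.2f", opsPerSec) + "\t" + lsMillis + "\t" + date;
    }
}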
Memory and Resource Management
Resource utilization patterns for long-running benchmarks:
- Minimal Memory Usage: Default 50MB heap (-Xmx50m) for lightweight operation
- Buffer Recycling: ByteBuilder objects are reused between read/write cycles
- Queue-Based Coordination: ArrayBlockingQueue with capacity limits prevents memory bloat
- Thread Lifecycle Management: Proper thread termination and resource cleanup
Timing and Scheduling
Precise timing control for consistent benchmarking intervals:
- Interval Enforcement: Uses Object.wait() for precise timing between test cycles
- Minimum Wait Time: 10-millisecond minimum wait to prevent CPU spinning
- Continuous Operation: Infinite loop design for long-term monitoring
- Time Drift Correction: Calculates next execution time to maintain consistent intervals
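A minimal sketch of this drift-corrected scheduling loop, using illustrative names; the actual loop inside the Java class may differ in detail.

public class IntervalLoopSketch {
    private final Object lock = new Object();

    // Runs 'task' every 'intervalMillis', correcting for how long the task took so
    // cycle start times stay on a fixed grid; always waits at least 10 ms per cycle.
    void runForever(Runnable task, long intervalMillis) throws InterruptedException {
        long next = System.currentTimeMillis();
        while (true) {
            task.run();
            next += intervalMillis;                                  // next slot on the fixed grid
            long wait = Math.max(next - System.currentTimeMillis(), 10);
            synchronized (lock) {
                lock.wait(wait);                                     // sleep until the next cycle
            }
        }
    }
}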
Performance Characteristics
System Requirements
- Memory Usage: Minimal 50MB heap size for long-running operation
- Disk Space: Requires space for the test files (size × ways; 200 GB at the default 5 GB × 40 files)
- I/O Capacity: Tests both sequential and random I/O patterns
- Thread Overhead: One additional thread per copy operation
Scalability
- Parallel Files: Supports 1 to many parallel test files via ways parameter
- File Size Range: Handles files from KB to multi-GB range
- Monitoring Duration: Designed for continuous long-term operation
- Log File Growth: Appends one line per interval, manageable growth rate
Use Cases
- Storage Validation: Test new storage systems before production deployment
- Performance Regression Testing: Monitor filesystem performance changes over time
- Capacity Planning: Determine I/O limits for bioinformatics workloads
- Troubleshooting: Identify storage bottlenecks in analysis pipelines
Technical Notes
Output Format
Log files contain tab-delimited data with the following columns:
#time size copyTime MB/s metaOps metaTime ops/s lsTime date
File Naming Convention
When using multiple parallel files (ways > 1), the input template must contain a '#' character that will be replaced with the file index (0 to ways-1).
Error Handling
The tool includes error handling for common filesystem issues:
- Automatic retry for interrupted I/O operations
- Graceful handling of missing directories
- Size validation and automatic file recreation
- Thread interruption recovery
Limitations
- Requires write permissions in the working directory
- Test files consume disk space during benchmark execution
- Long-running operation may be affected by system maintenance
- Results may vary based on system load and concurrent I/O
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org