# BigGrep Rust Implementation - Usage Guide

This guide provides comprehensive examples and usage patterns for all BigGrep Rust tools.

## Table of Contents

- [Quick Start](#quick-start)
- [Tool Overview](#tool-overview)
- [rs-bgindex - File Indexer](#rs-bgindex---file-indexer)
- [rs-bgsearch - Search Engine](#rs-bgsearch---search-engine)
- [rs-bgparse - File Parser](#rs-bgparse---file-parser)
- [rs-bgverify - File Verifier](#rs-bgverify---file-verifier)
- [rs-bgextractfile - File Extractor](#rs-bgextractfile---file-extractor)
- [Complete Workflows](#complete-workflows)
- [Performance Tips](#performance-tips)
- [Troubleshooting](#troubleshooting)

## Quick Start

### Building All Tools

```bash
# Build all tools in release mode for optimal performance
cargo build --release

# Or build individual tools
cargo build -p rs-bgindex --release
cargo build -p rs-bgsearch --release
cargo build -p rs-bgparse --release
cargo build -p rs-bgverify --release
cargo build -p rs-bgextractfile --release
```

### Basic Workflow

1. **Index your files** using `rs-bgindex`
2. **Search patterns** using `rs-bgsearch`
3. **Verify results** using `rs-bgverify`
4. **Parse structured data** using `rs-bgparse`
5. **Extract files** using `rs-bgextractfile`

## Tool Overview

| Tool | Purpose | Key Features |
|------|---------|-------------|
| `rs-bgindex` | Build N-gram indexes | 3-gram/4-gram indexing, PFOR compression, parallel processing |
| `rs-bgsearch` | Search across indexes | Multi-format patterns, metadata filtering, YARA integration |
| `rs-bgparse` | Parse structured data | JSON/XML/CSV parsing, batch processing, data extraction |
| `rs-bgverify` | Verify search results | Pattern verification, checksum validation, integrity checking |
| `rs-bgextractfile` | Extract files | Archive extraction, pattern-based extraction, batch processing |

## rs-bgindex - File Indexer

Build N-gram indexes for fast pattern searching.

### Basic Indexing

```bash
# Index all files in current directory (recursive)
find . -type f | rs-bgindex -p my_corpus_index

# Index specific file types
find . -name "*.txt" -o -name "*.log" | rs-bgindex -p text_index

# Index with custom prefix and verbose output
find /var/log -type f | rs-bgindex -p system_logs -v
```
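A file shorter than three bytes cannot yield a single 3-gram, so it adds nothing to the index. As a hedged sketch (how `rs-bgindex` itself treats such files is not specified here), `find` can drop them before they reach the indexer:

```bash
# Scratch directory with one 2-byte file and one 3-byte file.
dir=$(mktemp -d)
printf 'ab'  > "$dir/tiny.bin"
printf 'abc' > "$dir/ok.bin"

# -size +2c keeps only files larger than 2 bytes, i.e. long enough for one 3-gram.
find "$dir" -type f -size +2c
```

The same predicate slots into the pipelines above, for example `find . -type f -size +2c | rs-bgindex -p my_corpus_index`.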

### Advanced Indexing Options

```bash
# High-performance indexing with maximum parallelism:
#   -S 8  8 shingling threads
#   -C 8  8 compression threads
#   -L    use lockfree queues
find . -type f | rs-bgindex \
  -p high_perf_index \
  -S 8 \
  -C 8 \
  -L \
  -v

# Mixed 3-gram/4-gram indexing for dense corpora:
#   -M 1000000       max unique n-grams per file
#   -O overflow.txt  overflow files list
find . -type f | rs-bgindex \
  -p mixed_index \
  -M 1000000 \
  -O overflow.txt
```

### Index File Management

```bash
# Create index with custom PFOR parameters:
#   -b 64  PFOR blocksize
#   -e 4   max exceptions per block
#   -m 8   min entries for PFOR
find . -type f | rs-bgindex -p custom_index \
  -b 64 \
  -e 4 \
  -m 8

# Index with diagnostic information
find . -type f | rs-bgindex \
  -p diagnostic_index \
  -d \
  -l index_build.log
```

### File List Format

Input can be in multiple formats:

```bash
# Simple paths (auto-assigned file IDs)
echo "/path/to/file1.txt" > file_list.txt
echo "/path/to/file2.txt" >> file_list.txt
rs-bgindex -p index1 < file_list.txt

# File with explicit file IDs
cat > files.txt << EOF
0:/path/to/file1.txt
1:/path/to/file2.log
2:/path/to/file3.bin
EOF

cat files.txt | rs-bgindex -p indexed_files
```
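One way to produce the explicit `ID:path` list is to number the output of `find`. The sketch below is a minimal example, assuming IDs are zero-based and sequential as in the listing above; the scratch directory and the `list` variable are placeholders.

```bash
# Build a small throwaway corpus so the example is self-contained.
corpus=$(mktemp -d)
list=$(mktemp)
printf 'alpha\n' > "$corpus/a.txt"
printf 'beta\n'  > "$corpus/b.log"

# Sort for a stable order, then prefix each path with a zero-based ID.
find "$corpus" -type f | sort | awk '{ print NR - 1 ":" $0 }' > "$list"
cat "$list"
```

The resulting list can be fed to `rs-bgindex` on stdin in the same way as `files.txt` above.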

### Performance Optimization

```bash
# For large corpora (>1TB), use optimized settings:
#   -S 16   more shingling threads
#   -C 16   more compression threads
#   -L      lockfree queues for high throughput
#   -b 128  larger PFOR blocksize
find /data -type f | rs-bgindex \
  -p massive_corpus \
  -S 16 \
  -C 16 \
  -L \
  -b 128
```
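For very large file lists, another option is to split the list into batches and build one index per batch. This is only a sketch: the batch size of 1000, the `/data/file_N` paths, and the `index_batch_*` prefixes are illustrative assumptions, and the loop prints the commands rather than running them.

```bash
work=$(mktemp -d)
# Stand-in for a real `find /data -type f` listing: 2500 hypothetical paths.
seq 1 2500 | sed 's|^|/data/file_|' > "$work/all_files.txt"

# Split into batches of 1000 paths: batch_aa, batch_ab, batch_ac.
split -l 1000 "$work/all_files.txt" "$work/batch_"

for batch in "$work"/batch_*; do
  # Print the command for each batch; drop the echo to actually run it.
  echo "rs-bgindex -p index_$(basename "$batch") < $batch"
done
```

Each batch index can then be searched together by pointing `rs-bgsearch -d` at the directory that holds them.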

## rs-bgsearch - Search Engine

Search across BigGrep indexes with advanced filtering and verification.

### Basic Search Patterns

```bash
# ASCII text search
rs-bgsearch -a "password" -d ./indexes

# Binary hex pattern search (ZIP file signature)
rs-bgsearch -b "504b0304" -d ./indexes

# Unicode text search
rs-bgsearch -u "café résumé" -d ./indexes
```
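The `-b` option takes the pattern as a hex string. When the bytes of interest are easier to write as text, a small helper can do the conversion. This is a sketch using only POSIX `od` and `tr`; the `to_hex` name is hypothetical and not part of BigGrep.

```bash
# to_hex: print the bytes of its argument as a lowercase hex string.
to_hex() {
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}

to_hex "MZ"   # prints 4d5a
```

It can be combined with a search as `rs-bgsearch -b "$(to_hex "MZ")" -d ./indexes`.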

### Directory and Recursive Search

```bash
# Search specific directory
rs-bgsearch -a "config" -d /path/to/indexes

# Recursive search across subdirectories
rs-bgsearch -a "API_KEY" -d /path/to/indexes -r

# Search multiple index directories
rs-bgsearch -a "suspicious" -d /indexes1 -d /indexes2
```

### Metadata Filtering

```bash
# Filter by file size
rs-bgsearch -a "credentials" -d ./indexes -f "size>=1024"
rs-bgsearch -a "config" -d ./indexes -f "size<65536"

# Filter by file type and timestamp
rs-bgsearch -a "password" -d ./indexes -f "type=executable"
rs-bgsearch -a "vulnerable" -d ./indexes -f "timestamp>=1640995200"

# Combine multiple filters
rs-bgsearch -a "malware" -d ./indexes \
  -f "size>=1024" \
  -f "arch=x86_64" \
  -f "os!=Windows"
```
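The `timestamp` values in these examples look like Unix epoch seconds (1640995200 is 2022-01-01 00:00:00 UTC). Assuming that is what the filter expects, the value can be computed rather than hard-coded. The sketch relies on GNU `date`; BSD and macOS `date` take different options.

```bash
# Convert a calendar date (UTC) to epoch seconds with GNU date.
since=$(date -u -d "2022-01-01 00:00:00" +%s)
echo "timestamp>=$since"
```

The resulting string can be passed directly as a filter, for example `-f "timestamp>=$since"`.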

### Verification and Validation

```bash
# Enable result verification
rs-bgsearch -a "password" -d ./indexes -v

# YARA rule-based verification
rs-bgsearch -a "suspicious" -d ./indexes -y malware_rules.yar

# Limit verification candidates
rs-bgsearch -a "pattern" -d ./indexes -v -l 5000

# Combine verification and filtering
rs-bgsearch -a "API_KEY" -d ./indexes \
  -f "size>=512" \
  -v \
  -y api_key_rules.yar
```

### Output Formats and Options

```bash
# JSON output with metadata
rs-bgsearch -a "error" -d ./indexes -o json

# CSV output for spreadsheet analysis
rs-bgsearch -a "fail" -d ./indexes -o csv

# Text output without metadata
rs-bgsearch -a "debug" -d ./indexes -o text --no-metadata

# Show timing metrics
rs-bgsearch -a "query" -d ./indexes --metrics

# Limit results
rs-bgsearch -a "pattern" -d ./indexes -l 100
```

### Performance Tuning

```bash
# Use multiple CPU cores
rs-bgsearch -a "search_term" -d ./indexes -n 24

# Shuffle index order for better load distribution
rs-bgsearch -a "pattern" -d ./indexes --index-order shuffle

# Verbose output for debugging
rs-bgsearch -a "query" -d ./indexes -V

# Log to syslog
rs-bgsearch -a "error" -d ./indexes --syslog
```

### Configuration File Usage

Create `bgsearch.conf`:

```toml
[search]
directories = ["/data/indexes", "/backup/indexes"]
recursive = true
verify = false
numprocs = 16
index_order = "shuffle"
candidate_limit = 100000

[logging]
level = "info"
syslog = true

[output]
format = "json"
show_metadata = true
```

Then use:

```bash
rs-bgsearch -a "pattern" --config bgsearch.conf
```

## rs-bgparse - File Parser

Parse structured data formats and extract specific information.

### JSON Parsing

```bash
# Extract specific fields from JSON files
rs-bgparse --input data.json --extract "users.*name" --format json

# Parse multiple JSON files
rs-bgparse --input *.json --extract "metadata.version" --format json

# Batch parse directory
rs-bgparse batch --input /path/to/json_files --format json --output parsed/
```

### XML Parsing

```bash
# Extract XML elements
rs-bgparse --input config.xml --extract "//user/name" --format xml

# Parse XML with namespaces
rs-bgparse --input soap.xml --extract "//soap:Envelope/soap:Body" --format xml

# Batch XML processing
rs-bgparse batch --input /data/xml/ --format xml --output extracted/
```

### CSV Processing

```bash
# Extract CSV columns
rs-bgparse --input data.csv --extract "username,email" --format csv

# Parse CSV with headers
rs-bgparse --input users.csv --extract "name,email,department" --format csv

# Process large CSV files
rs-bgparse batch --input /data/*.csv --format csv --output analysis/
```

### Custom Data Extraction

```bash
# Extract using regex patterns
rs-bgparse --input logfile.log --extract "ERROR:.*" --format text

# Multi-field extraction
rs-bgparse --input config.txt \
  --extract "server_ip:(\\d+\\.\\d+\\.\\d+\\.\\d+)" \
  --format text

# Extract and format as JSON
rs-bgparse --input data.txt \
  --extract "name=(\\w+),age=(\\d+)" \
  --format json
```

### Batch Processing

```bash
# Process entire directory
rs-bgparse batch --input /path/to/files --format auto --output /tmp/parsed

# Process specific file types
rs-bgparse batch --input /data --format json --include "*.json" --output json_out/

# Process with parallel execution
rs-bgparse batch --input /large/dataset --format xml --parallel --threads 8
```

### Format Auto-Detection

```bash
# Auto-detect file formats
rs-bgparse batch --input /mixed/formats --format auto --output extracted/

# Process mixed formats with specific outputs
rs-bgparse batch --input /data \
  --format json --include "*.json" \
  --format xml --include "*.xml" \
  --format csv --include "*.csv" \
  --output structured/
```

## rs-bgverify - File Verifier

Verify search results and check file integrity.

### Pattern Verification

```bash
# Verify search results for specific patterns
rs-bgverify --input file1.txt file2.log file3.bin \
  --patterns "password" "API_KEY" "secret"

# Verify with case sensitivity
rs-bgverify --input results.txt --patterns "Error" --case-sensitive

# Verify binary patterns
rs-bgverify --input binaries/ --patterns "DEADBEEF" "CAFEBABE" --binary
```

### File Integrity Checking

```bash
# Check file checksums against known values
rs-bgverify integrity --files *.txt --checksum-file checksums.sha256

# Verify MD5 checksums
rs-bgverify integrity --files /data/ --md5-file known.md5

# Cross-validate multiple hash types
rs-bgverify integrity --files important/ \
  --sha256-file hashes.sha256 \
  --md5-file hashes.md5
```
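A checksum file has to exist before it can be checked. The sketch below records SHA-256 hashes with GNU coreutils `sha256sum` and re-checks them with `sha256sum -c`; whether `rs-bgverify integrity` reads this exact format is an assumption, so treat it as a starting point.

```bash
# Work in a scratch directory so the example is self-contained.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'first\n'  > one.txt
printf 'second\n' > two.txt

# Record a baseline: one "hash  filename" line per file.
sha256sum one.txt two.txt > checksums.sha256

# Later: confirm the files still match the baseline (exit status 0 on success).
sha256sum -c --quiet checksums.sha256 && echo "all files match"
```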

### Index Verification

```bash
# Verify index consistency
rs-bgverify index --index index.bgi --check-files

# Validate index structure
rs-bgverify index --index corpus.index --validate-structure

# Check index file coverage
rs-bgverify index --index large_index.bgi --check-coverage
```

### Batch Verification

```bash
# Verify directory of files
rs-bgverify batch --input /verification/set/ --patterns "malware" "backdoor"

# Parallel verification
rs-bgverify batch --input /data/ --parallel --threads 16

# Verify with reporting
rs-bgverify batch --input /files/ \
  --patterns "suspicious" \
  --report verification_report.json \
  --fail-on-match
```

### Spot Checking

```bash
# Random sampling verification
rs-bgverify spot-check --files /data/ --samples 1000 --patterns "pattern1"

# Statistical verification
rs-bgverify spot-check --index index.bgi --statistical --confidence 0.95

# Time-based sampling
rs-bgverify spot-check --files /logs/ --time-window "2024-01-01:2024-12-31"
```

### Verification Reports

```bash
# Generate detailed verification report
rs-bgverify --input test_files/ \
  --patterns "pattern" \
  --report detailed_report.json \
  --verbose

# Quick verification summary
rs-bgverify --input verification_set/ --patterns "key" --summary

# Export results to multiple formats
rs-bgverify --input data/ \
  --patterns "credentials" \
  --report json:results.json \
  --report csv:results.csv \
  --report text:results.txt
```

## rs-bgextractfile - File Extractor

Extract files from archives and containers.

### ZIP Archive Extraction

```bash
# Extract ZIP archive to directory
rs-bgextractfile zip --input archive.zip --output extracted/

# Extract specific files from ZIP (quote globs so the shell does not expand them)
rs-bgextractfile zip --input archive.zip --output files/ "*.txt" "*.log"

# Extract with pattern matching
rs-bgextractfile zip --input archive.zip \
  --output extracted/ \
  --pattern "*.conf" \
  --pattern "*.log"
```

### TAR Archive Handling

```bash
# Extract TAR archive
rs-bgextractfile tar --input backup.tar --output restored/

# Extract TAR.GZ compressed archive
rs-bgextractfile tar --input backup.tar.gz --output extracted/

# Extract TAR.BZ2 compressed archive
rs-bgextractfile tar --input archive.tar.bz2 --output extracted/
```

### Archive Listing

```bash
# List ZIP archive contents
rs-bgextractfile list --input archive.zip

# List TAR archive contents with details
rs-bgextractfile list --input backup.tar --verbose

# Filter listing by patterns
rs-bgextractfile list --input archive.zip --pattern "*.txt"
```

### Pattern-Based Extraction

```bash
# Extract files matching patterns
rs-bgextractfile pattern --input large.zip \
  --output extracted/ \
  "*.txt" \
  "*.log" \
  "config*"

# Extract with regex patterns
rs-bgextractfile pattern --input archive.zip \
  --output selected/ \
  --regex "^user_" \
  --regex "backup.*\\.dat$"
```

### Batch Archive Processing

```bash
# Process multiple archives
rs-bgextractfile batch --input /archives/ \
  --output /extracted/ \
  --pattern "*.txt"

# Process archives in parallel
rs-bgextractfile batch --input /data/*.zip \
  --output /output/ \
  --parallel --threads 8

# Filter by archive type
rs-bgextractfile batch --input /archives/ \
  --type zip --output zip_extracted/ \
  --type tar --output tar_extracted/
```

### Permission and Metadata Preservation

```bash
# Extract with permissions preserved
rs-bgextractfile zip --input archive.zip --output extracted/ --preserve-permissions

# Extract with timestamps preserved
rs-bgextractfile tar --input backup.tar --output restored/ --preserve-timestamps

# Extract with full metadata
rs-bgextractfile zip --input archive.zip --output extracted/ --full-metadata
```

### Nested Archive Handling

```bash
# Extract nested archives recursively
rs-bgextractfile nested --input outer.zip --output deep_extracted/ --recursive

# Limit extraction depth
rs-bgextractfile nested --input archive.zip --output limited/ --max-depth 3

# Extract with content verification
rs-bgextractfile nested --input complex.zip --output verified/ --verify-content
```

## Complete Workflows

### Forensics Investigation Workflow

```bash
# 1. Index suspect files
find /evidence -type f | rs-bgindex -p suspect_index -v

# 2. Search for suspicious patterns
rs-bgsearch -a "password" -d /evidence_indexes \
  -f "size>=1024" \
  -f "timestamp>=1640995200" \
  -o json > suspicious_files.json

# 3. Verify findings
rs-bgverify --input suspicious_files.json \
  --patterns "password" "key" "secret" \
  --report investigation_report.json

# 4. Extract relevant files
rs-bgextractfile pattern --input /evidence \
  --output /extracted_evidence/ \
  "*.conf" \
  "*.log" \
  "*.key"

# 5. Parse extracted configurations
rs-bgparse batch --input /extracted_evidence \
  --format auto --output /analyzed_evidence/
```

### Security Audit Workflow

```bash
# 1. Build comprehensive index
find /production -type f | rs-bgindex -p prod_audit -v

# 2. Search for security indicators
rs-bgsearch -a "API_KEY\|SECRET\|PASSWORD" \
  -d /production_indexes \
  -v \
  -o json > security_findings.json

# 3. Verify with security rules
rs-bgsearch -a "malware\|backdoor\|trojan" \
  -d /production_indexes \
  -y security_rules.yar \
  -o json > security_report.json

# 4. Extract and parse suspicious files
rs-bgparse batch --input security_findings.json \
  --format json --output audit_analysis/
```

### Malware Analysis Workflow

```bash
# 1. Index malware samples
find /malware_corpus -type f | rs-bgindex -p malware_db -v

# 2. Search for known signatures
rs-bgsearch -b "4d5a9000030000000400" \
  -d /malware_indexes \
  -y yara_rules.yar \
  -v \
  -o json > detected_malware.json

# 3. Extract embedded files
rs-bgextractfile pattern --input /malware_samples \
  --output extracted_samples/ \
  --regex "\\.exe$" \
  --regex "\\.dll$"

# 4. Analyze extracted files
rs-bgparse batch --input extracted_samples/ \
  --format auto --output malware_analysis/
```

### System Administration Workflow

```bash
# 1. Build system file index
find /etc /var/log /usr -type f | rs-bgindex -p system_files -v

# 2. Search for configuration issues
rs-bgsearch -a "debug\|trace\|verbose" \
  -d /system_indexes \
  -f "path=/etc/*" \
  -o json > debug_configs.json

# 3. Verify configuration files
rs-bgverify integrity --files /etc/ --checksum-file known_configs.sha256

# 4. Extract log files for analysis
rs-bgextractfile pattern --input /var/log \
  --output /tmp/logs/ \
  "*.log" \
  "error*" \
  "debug*"
```

### Data Analysis Workflow

```bash
# 1. Index dataset files
find /dataset -type f | rs-bgindex -p dataset_index -v

# 2. Search for data patterns
rs-bgsearch -a "error\|fail\|exception" \
  -d /dataset_indexes \
  -o json > data_quality_issues.json

# 3. Parse structured data
rs-bgparse batch --input /dataset \
  --format json --include "*.json" \
  --format xml --include "*.xml" \
  --output parsed_data/

# 4. Verify data integrity
rs-bgverify batch --input parsed_data/ \
  --patterns "required_field" \
  --report data_validation_report.json
```

## Performance Tips

### Indexing Optimization

```bash
# Use SSD storage for indexes
find /data -type f | rs-bgindex -p /ssd/index -v

# Increase thread count for CPU-bound workloads
find /data -type f | rs-bgindex -p optimized_index \
  -S $(nproc) \
  -C $(nproc)

# Use lockfree queues for high-throughput scenarios
find /large/dataset -type f | rs-bgindex -p high_throughput \
  -L \
  -v
```

### Search Optimization

```bash
# Match thread count to CPU cores
rs-bgsearch -a "pattern" -d /indexes -n $(nproc)

# Use metadata filtering to reduce candidate set
rs-bgsearch -a "query" -d /indexes \
  -f "size>=1024" \
  -f "size<=1048576" \
  -n 16

# Shuffle index order for better load distribution
rs-bgsearch -a "pattern" -d /indexes --index-order shuffle
```

### Memory Management

```bash
# For large files, use streaming instead of memory mapping
rs-bgindex -p stream_index --use-streaming

# Limit verification to prevent memory overflow
rs-bgsearch -a "pattern" -d /indexes -v -l 5000

# Use partial processing for massive datasets
rs-bgparse batch --input /huge/dataset --chunk-size 1000 --output chunks/
```

### Parallel Processing

```bash
# Enable parallel processing for all tools
rs-bgparse batch --input /data --parallel --threads $(nproc)

rs-bgverify batch --input /files --parallel --threads $(nproc)

rs-bgextractfile batch --input /archives --parallel --threads $(nproc)
```
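`nproc` is part of GNU coreutils and may be missing on other systems. A hedged sketch of a portable fallback, where the `threads` variable name is purely illustrative:

```bash
# Prefer nproc, fall back to getconf, and default to 1 if neither works.
threads=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "using $threads threads"
```

The variable can then replace `$(nproc)` in the commands above, for example `--threads "$threads"`.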

## Troubleshooting

### Common Issues

#### Index Build Failures

```bash
# Check available disk space
df -h

# Verify file permissions
ls -la /path/to/files

# Enable verbose logging
rs-bgindex -p test_index -v -d -l build.log

# Check memory usage
free -h
```

#### Search Performance Issues

```bash
# Verify index files exist
ls -la /path/to/indexes/*.bgi

# Check index file integrity
rs-bgverify index --index /path/to/index.bgi

# Reduce search scope with filters
rs-bgsearch -a "pattern" -d /indexes -f "size>=1024" -f "type=executable"

# Monitor CPU usage
top
```

#### Memory Issues

```bash
# Check memory usage
free -m
ps aux --sort=-%mem | head

# Use streaming mode for large files
rs-bgindex -p stream_index --use-streaming

# Limit verification candidates
rs-bgsearch -a "pattern" -d /indexes -v -l 1000
```

### Debug Commands

```bash
# Enable debug logging
export RUST_LOG=debug
rs-bgsearch -a "pattern" -d /indexes -D

# Check index structure
rs-bgverify index --index /path/to/index.bgi --validate-structure

# Test pattern matching
rs-bgverify --input test_file.txt --patterns "pattern" --verbose

# Monitor file access
strace -e openat rs-bgsearch -a "pattern" -d /indexes
```

### Performance Monitoring

```bash
# Monitor I/O performance
iostat -x 1

# Check CPU utilization
mpstat -P ALL 1

# Monitor memory usage
vmstat 1

# Profile specific tool
perf record -g rs-bgindex -p test_index
perf report
```

For more detailed information, see the individual tool documentation and the main README.md file.