# rs-bgsearch - BigGrep Search Orchestrator

`rs-bgsearch` is the Rust implementation of the BigGrep search orchestrator. It provides a command-line interface for searching BigGrep indexes, with parallel execution across index files, metadata filtering, and optional verification of candidate results.

## Features

- **Multi-format pattern support**: ASCII strings, hexadecimal patterns, and Unicode text
- **Index discovery**: Recursive directory scanning with a configurable search order (`alpha` or `shuffle`)
- **Metadata filtering**: Python-style filter expressions with comparison operators
- **Parallel search**: Multi-threaded search across multiple index files
- **Verification integration**: Support for bgverify and YARA rule-based verification
- **Output formats**: Text, JSON, and CSV
- **Configuration file support**: TOML-based configuration, with command-line options overriding file values
- **Performance monitoring**: Per-directory timing metrics via `--metrics`

## Installation

Build from source using Cargo:

```bash
cargo build --release
```

## Usage Examples

### Basic ASCII search

```bash
rs-bgsearch -a "Hello World" -d /path/to/indexes
```

### Binary pattern search

```bash
rs-bgsearch -b "48656c6c6f" -d /path/to/indexes
```

### Unicode pattern search

```bash
rs-bgsearch -u "café" -d /path/to/indexes
```

### Recursive directory search with verification

```bash
rs-bgsearch -a "password" -d /path/to/indexes -r -v
```

### Metadata filtering

```bash
rs-bgsearch -a "config" -d /path/to/indexes -f "size>=1024" -f "arch=x86_64"
```

### YARA rule verification

```bash
rs-bgsearch -a "suspicious_pattern" -d /path/to/indexes -y rules.yar
```

### JSON output with a candidate verification limit

```bash
rs-bgsearch -b "504b0304" -d /path/to/indexes -v -o json -l 100
```

## Command-Line Options

### Search Pattern Options
- `-a, --ascii`: ASCII string pattern
- `-b, --binary`: Hexadecimal binary pattern  
- `-u, --unicode`: Unicode string pattern
- `-p, --patterns`: Direct pattern specification (auto-detect type)
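
Each explicit pattern type ultimately becomes a byte sequence to search for. The sketch below shows one plausible conversion. `PatternKind` and `pattern_to_bytes` are hypothetical names, and encoding `-u` patterns as UTF-16LE (the usual wide-string encoding in Windows binaries) is an assumption, not documented behavior. Auto-detection for `-p` is not modeled.

```rust
/// Which flag supplied the pattern: -a, -b, or -u (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum PatternKind {
    Ascii,
    Binary,
    Unicode,
}

/// Convert a pattern into the raw bytes to search for (hypothetical helper).
fn pattern_to_bytes(kind: PatternKind, pattern: &str) -> Result<Vec<u8>, String> {
    match kind {
        // -a: the string's own bytes; non-ASCII input is rejected.
        PatternKind::Ascii if pattern.is_ascii() => Ok(pattern.as_bytes().to_vec()),
        PatternKind::Ascii => Err(format!("not an ASCII pattern: {pattern:?}")),
        // -b: pairs of hex digits, e.g. "48656c6c6f" -> b"Hello".
        PatternKind::Binary => {
            if pattern.len() % 2 != 0 || !pattern.bytes().all(|b| b.is_ascii_hexdigit()) {
                return Err(format!("invalid hex pattern: {pattern:?}"));
            }
            Ok((0..pattern.len())
                .step_by(2)
                // Safe to unwrap: every character was validated as a hex digit above.
                .map(|i| u8::from_str_radix(&pattern[i..i + 2], 16).unwrap())
                .collect())
        }
        // -u: assumed UTF-16LE, so "café" becomes 63 00 61 00 66 00 e9 00.
        PatternKind::Unicode => Ok(pattern
            .encode_utf16()
            .flat_map(|unit| unit.to_le_bytes())
            .collect()),
    }
}

fn main() {
    for (kind, pattern) in [
        (PatternKind::Ascii, "Hello World"),
        (PatternKind::Binary, "48656c6c6f"),
        (PatternKind::Unicode, "café"),
    ] {
        println!("{kind:?} {pattern:?} -> {:02x?}", pattern_to_bytes(kind, pattern));
    }
}
```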

### Index Discovery Options
- `-d, --directory`: Directory containing `.bgi` index files
- `-r, --recursive`: Recurse into subdirectories
- `--index-order`: Index search order (`alpha` or `shuffle`)
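
Index discovery amounts to collecting `.bgi` files, recursing only when `-r` is given, and then ordering them. A minimal standard-library sketch follows; `find_indexes`, `order_indexes`, and the xorshift-driven shuffle are illustrative stand-ins rather than the tool's code (a real implementation might use an RNG crate instead).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Mirrors the two `--index-order` values.
#[derive(Debug, Clone, Copy)]
enum IndexOrder {
    Alpha,
    Shuffle,
}

/// Collect `.bgi` files in `dir`, descending into subdirectories only when
/// `recursive` is set (as with `-r`).
fn find_indexes(dir: &Path, recursive: bool, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            if recursive {
                find_indexes(&path, recursive, out)?;
            }
        } else if path.extension().map_or(false, |ext| ext == "bgi") {
            out.push(path);
        }
    }
    Ok(())
}

/// Sort alphabetically, then optionally apply a Fisher-Yates shuffle driven by
/// a xorshift64 generator (a std-only stand-in for a real RNG).
fn order_indexes(paths: &mut [PathBuf], order: IndexOrder, seed: u64) {
    paths.sort();
    if let IndexOrder::Shuffle = order {
        let mut state = seed | 1; // xorshift state must be non-zero
        for i in (1..paths.len()).rev() {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            let j = (state % (i as u64 + 1)) as usize;
            paths.swap(i, j);
        }
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let mut indexes = Vec::new();
    find_indexes(Path::new(&dir), true, &mut indexes)?;
    // Seed from the clock so each run gets a different order.
    let seed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(1);
    order_indexes(&mut indexes, IndexOrder::Shuffle, seed);
    for path in &indexes {
        println!("{}", path.display());
    }
    Ok(())
}
```

Sorting before shuffling makes the shuffled order reproducible for a fixed seed, which is convenient when testing.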

### Output Options
- `-o, --output-format`: Output format (`text`, `json`, `csv`)
- `-M, --no-metadata`: Hide metadata in output
- `--banner`: Display the given text file as a message of the day (MOTD)

### Verification Options
- `-v, --verify`: Enable verification using bgverify
- `-y, --yara`: Use YARA rules file for verification
- `-l, --limit`: Candidate verification limit (0 disables, default 15000)

### Filtering Options
- `-f, --filter`: Metadata filter criterion in the form `<field><operator><value>`; repeat `-f` to add more filters
  - Supported operators: `=`, `!=`, `<`, `>`, `<=`, `>=`
  - Examples: `size>=1024`, `arch=x86_64`, `os!=Windows`
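
A filter string can be parsed by splitting it at the first operator character, checking two-character operators before one-character ones. A minimal sketch; `Op`, `Filter`, and `parse_filter` are hypothetical names, and tolerating spaces around the operator is an assumption.

```rust
/// The six comparison operators listed for `-f`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Eq,
    Ne,
    Lt,
    Gt,
    Le,
    Ge,
}

/// A parsed filter such as `size>=1024` (hypothetical representation).
#[derive(Debug, PartialEq)]
struct Filter {
    field: String,
    op: Op,
    value: String,
}

/// Split a filter at its first operator character. Two-character operators
/// are checked first so `size>=1024` is not read as `size` `>` `=1024`.
fn parse_filter(spec: &str) -> Result<Filter, String> {
    let pos = spec
        .find(|c: char| matches!(c, '=' | '!' | '<' | '>'))
        .ok_or_else(|| format!("no operator in filter {spec:?}"))?;
    let rest = &spec[pos..];
    let (op, len) = if rest.starts_with(">=") {
        (Op::Ge, 2)
    } else if rest.starts_with("<=") {
        (Op::Le, 2)
    } else if rest.starts_with("!=") {
        (Op::Ne, 2)
    } else if rest.starts_with('=') {
        (Op::Eq, 1)
    } else if rest.starts_with('<') {
        (Op::Lt, 1)
    } else if rest.starts_with('>') {
        (Op::Gt, 1)
    } else {
        // A lone '!' is not an operator.
        return Err(format!("unknown operator in filter {spec:?}"));
    };
    // Assumption: surrounding spaces are tolerated, e.g. `size >= 1024`.
    let field = spec[..pos].trim();
    let value = spec[pos + len..].trim();
    if field.is_empty() || value.is_empty() {
        return Err(format!("filter {spec:?} needs both a field and a value"));
    }
    Ok(Filter { field: field.to_string(), op, value: value.to_string() })
}

fn main() {
    for spec in ["size>=1024", "arch=x86_64", "os!=Windows", "size"] {
        println!("{spec:?} -> {:?}", parse_filter(spec));
    }
}
```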

### Performance Options
- `-n, --numprocs`: Number of parallel searches (default 12)
- `-t, --throttle`: Buffer threshold for throttling (default 10000)
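
`-n` bounds how many index searches run at once. The sketch below shows one way to enforce that bound with scoped threads that claim indexes from a shared counter. `parallel_search` and the `search_one` closure (a stand-in for the per-index search) are hypothetical, and the `-t` throttling behavior is not modeled.

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Run `search_one` over every index with at most `numprocs` threads.
/// `search_one` is a hypothetical stand-in that returns one index's hits.
fn parallel_search<F>(indexes: &[PathBuf], numprocs: usize, search_one: F) -> Vec<String>
where
    F: Fn(&Path) -> Vec<String> + Sync,
{
    let next = AtomicUsize::new(0); // position of the next unclaimed index
    let results = Mutex::new(Vec::new());
    let workers = numprocs.max(1).min(indexes.len());
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Each worker claims the next index until none are left.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(index) = indexes.get(i) else { break };
                let hits = search_one(index.as_path());
                results.lock().unwrap().extend(hits);
            });
        }
    });
    results.into_inner().unwrap()
}

fn main() {
    let indexes: Vec<PathBuf> = (0..8)
        .map(|i| PathBuf::from(format!("/path/to/indexes/{i}.bgi")))
        .collect();
    // Placeholder search: pretend each index yields one hit.
    let hits = parallel_search(&indexes, 12, |index| vec![index.display().to_string()]);
    println!("{} hits", hits.len());
}
```

Pulling work from a counter, rather than pre-splitting the list, keeps all workers busy when some indexes take much longer than others. Hits arrive in completion order, not index order.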

### Logging Options
- `-V, --verbose`: Verbose output (INFO level)
- `-D, --debug`: Diagnostic output (DEBUG level)
- `--syslog`: Log to syslog
- `--metrics`: Display per-directory timing metrics

### Configuration Options
- `--config`: Configuration file path
- `-h, --help`: Show help message

## Metadata Filtering

Each `-f` filter compares one metadata field against a value using a comparison operator:

### Numeric Filters
- `size>=1024` - Files of at least 1 KiB (1024 bytes)
- `size<65536` - Files smaller than 64 KiB (65536 bytes)
- `timestamp>=1640995200` - Files whose `timestamp` is at or after 2022-01-01 00:00:00 UTC (Unix time 1640995200)

### String Filters  
- `arch=x86_64` - Architecture equals x86_64
- `os!=Windows` - Operating system is not Windows
- `type=executable` - File type is executable

### Combining Filters
Multiple filters are combined with AND logic: a result is kept only if it satisfies every filter.

```bash
rs-bgsearch -a "config" -f "size>=1024" -f "arch=x86_64" -f "os!=Windows"
```
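
Evaluating filters and combining them with AND might look like the sketch below. The `Filter` struct and function names are hypothetical, and so are two behaviors this README does not specify: values are compared as integers when both sides parse as `i64` and as strings otherwise, and a file that lacks the filtered field fails the filter.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy)]
enum Op {
    Eq,
    Ne,
    Lt,
    Gt,
    Le,
    Ge,
}

/// One already-parsed `-f` filter (hypothetical representation).
struct Filter<'a> {
    field: &'a str,
    op: Op,
    value: &'a str,
}

/// Assumption: compare as integers when both sides parse as i64
/// (so "900" < "1024"), otherwise compare as plain strings.
fn compare(actual: &str, expected: &str) -> Ordering {
    match (actual.parse::<i64>(), expected.parse::<i64>()) {
        (Ok(a), Ok(b)) => a.cmp(&b),
        _ => actual.cmp(expected),
    }
}

/// Assumption: a file without the field fails the filter, even for `!=`.
fn filter_matches(meta: &HashMap<&str, &str>, filter: &Filter) -> bool {
    let Some(actual) = meta.get(filter.field) else {
        return false;
    };
    let ord = compare(actual, filter.value);
    match filter.op {
        Op::Eq => ord == Ordering::Equal,
        Op::Ne => ord != Ordering::Equal,
        Op::Lt => ord == Ordering::Less,
        Op::Gt => ord == Ordering::Greater,
        Op::Le => ord != Ordering::Greater,
        Op::Ge => ord != Ordering::Less,
    }
}

/// Multiple filters are ANDed: every one must match.
fn matches_all(meta: &HashMap<&str, &str>, filters: &[Filter]) -> bool {
    filters.iter().all(|f| filter_matches(meta, f))
}

fn main() {
    let meta = HashMap::from([("size", "2048"), ("arch", "x86_64"), ("os", "Linux")]);
    let filters = [
        Filter { field: "size", op: Op::Ge, value: "1024" },
        Filter { field: "arch", op: Op::Eq, value: "x86_64" },
        Filter { field: "os", op: Op::Ne, value: "Windows" },
    ];
    println!("match: {}", matches_all(&meta, &filters));
}
```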

## Configuration File

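Persistent settings can be stored in a TOML configuration file such as `bgsearch.conf` and loaded with `--config`; command-line options override values from the file: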

```toml
[search]
directories = ["/path/to/indexes"]
recursive = true
verify = false
numprocs = 12
index_order = "shuffle"
candidate_limit = 100000

[logging]
level = "info"
syslog = true

[output]
format = "json"
show_metadata = true
```
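
The precedence between command line, configuration file, and built-in defaults can be expressed as a layered merge. A minimal sketch with a hypothetical, flattened `Settings` struct covering a few of the keys above; parsing the TOML itself (for example with the `toml` and `serde` crates) is not shown, and the `recursive = false` default is an assumption based on `-r` being an opt-in flag.

```rust
/// Settings that may come from the command line, the config file, or
/// built-in defaults. `None` means "not set at this layer" (hypothetical).
#[derive(Debug, Default, Clone, PartialEq)]
struct Settings {
    numprocs: Option<usize>,
    recursive: Option<bool>,
    candidate_limit: Option<u64>,
    output_format: Option<String>,
}

impl Settings {
    /// Built-in defaults from the option list (-n 12, -l 15000, text output).
    fn defaults() -> Settings {
        Settings {
            numprocs: Some(12),
            recursive: Some(false),
            candidate_limit: Some(15000),
            output_format: Some("text".to_string()),
        }
    }

    /// Keep every value set in `self`; fill the gaps from `lower`.
    fn fill_from(self, lower: Settings) -> Settings {
        Settings {
            numprocs: self.numprocs.or(lower.numprocs),
            recursive: self.recursive.or(lower.recursive),
            candidate_limit: self.candidate_limit.or(lower.candidate_limit),
            output_format: self.output_format.or(lower.output_format),
        }
    }
}

fn main() {
    // Values from the example bgsearch.conf above.
    let file = Settings {
        numprocs: Some(12),
        recursive: Some(true),
        candidate_limit: Some(100000),
        output_format: Some("json".to_string()),
    };
    // e.g. `-n 4` on the command line.
    let cli = Settings { numprocs: Some(4), ..Settings::default() };
    let effective = cli.fill_from(file).fill_from(Settings::defaults());
    println!("{effective:?}");
}
```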

## Integration with Other Tools

`rs-bgsearch` works alongside the other BigGrep tools:

- **bgparse**: Searches a single `.bgi` index file for candidate files
- **bgindex**: Builds the `.bgi` index files that `rs-bgsearch` searches
- **bgverify**: Optional verification of search results
- **YARA**: Rule-based pattern verification

## Performance Considerations

- **Parallel execution**: `-n` defaults to 12 parallel searches; set it according to the number of available CPU cores
- **Memory usage**: Uses memory-mapped I/O for efficient index access
- **Verification overhead**: Verification can be expensive; use `-l` to cap the number of candidates that are verified
- **Index ordering**: `--index-order shuffle` randomizes the order in which indexes are searched, which helps spread load when several searches run concurrently

## Output Formats

### Text Format (default)
```
/path/to/file1.txt:1024
/path/to/file2.txt [verified]
/path/to/file3.txt:2048 [YARA: rule1, rule2]
```
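
The text lines above can be produced by appending optional parts to the file path. A minimal sketch with a hypothetical `Hit` struct whose field names follow the JSON example; whether `[verified]` and a YARA tag can appear on the same line is not stated, so this sketch prints each tag independently.

```rust
/// One search hit; field names follow the JSON example (hypothetical struct).
struct Hit {
    file_path: String,
    offset: Option<u64>,
    verified: bool,
    yara_matches: Vec<String>,
}

/// Render a hit as `path[:offset][ [verified]][ [YARA: r1, r2]]`.
fn format_text(hit: &Hit) -> String {
    let mut line = hit.file_path.clone();
    if let Some(offset) = hit.offset {
        line.push_str(&format!(":{offset}"));
    }
    if hit.verified {
        line.push_str(" [verified]");
    }
    if !hit.yara_matches.is_empty() {
        line.push_str(&format!(" [YARA: {}]", hit.yara_matches.join(", ")));
    }
    line
}

fn main() {
    let hit = Hit {
        file_path: "/path/to/file3.txt".to_string(),
        offset: Some(2048),
        verified: false,
        yara_matches: vec!["rule1".to_string(), "rule2".to_string()],
    };
    println!("{}", format_text(&hit)); // /path/to/file3.txt:2048 [YARA: rule1, rule2]
}
```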

### JSON Format
```json
[
  {
    "file_path": "/path/to/file1.txt",
    "offset": 1024,
    "metadata": {
      "size": "2048",
      "arch": "x86_64"
    },
    "verified": true,
    "yara_matches": ["malware_signature"]
  }
]
```

### CSV Format
```csv
file_path,offset,verified,yara_matches
/path/to/file1.txt,1024,true,"rule1;rule2"
/path/to/file2.txt,,false,
```
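
The CSV rows can be built with RFC 4180-style quoting. A minimal sketch; `csv_field` and `csv_row` are hypothetical helpers. Because the sample output quotes the `;`-joined YARA list even though it contains no comma, this sketch always quotes a non-empty list; an absent offset becomes an empty field.

```rust
/// Quote a field RFC 4180-style when it contains a comma, quote, or line break.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// One row in the column order file_path,offset,verified,yara_matches.
fn csv_row(file_path: &str, offset: Option<u64>, verified: bool, yara_matches: &[&str]) -> String {
    let offset = offset.map(|o| o.to_string()).unwrap_or_default();
    // Non-empty YARA lists are always quoted, matching the sample output.
    let yara = if yara_matches.is_empty() {
        String::new()
    } else {
        format!("\"{}\"", yara_matches.join(";").replace('"', "\"\""))
    };
    format!("{},{},{},{}", csv_field(file_path), offset, verified, yara)
}

fn main() {
    println!("file_path,offset,verified,yara_matches");
    println!("{}", csv_row("/path/to/file1.txt", Some(1024), true, &["rule1", "rule2"]));
    println!("{}", csv_row("/path/to/file2.txt", None, false, &[]));
}
```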

## Exit Codes

- `0`: Successful search with results
- `1`: No results found or search failed
- `2`: Command-line usage error
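
Mapping a run's outcome onto these codes is a small classification step. A minimal sketch; `Outcome`, `classify`, and its inputs are hypothetical, and only the numeric codes come from the list above.

```rust
use std::process::ExitCode;

/// The three documented outcomes.
#[derive(Debug, PartialEq)]
enum Outcome {
    Matches,            // exit code 0
    NoMatchesOrFailure, // exit code 1
    UsageError,         // exit code 2
}

/// Classify a run from hypothetical inputs: whether the arguments were
/// valid, and either the number of hits or an error message.
fn classify(args_valid: bool, search: Result<usize, String>) -> Outcome {
    if !args_valid {
        return Outcome::UsageError;
    }
    match search {
        Ok(hits) if hits > 0 => Outcome::Matches,
        // Zero hits and a failed search share exit code 1.
        Ok(_) | Err(_) => Outcome::NoMatchesOrFailure,
    }
}

fn exit_code(outcome: &Outcome) -> u8 {
    match outcome {
        Outcome::Matches => 0,
        Outcome::NoMatchesOrFailure => 1,
        Outcome::UsageError => 2,
    }
}

fn main() -> ExitCode {
    // Placeholder run: valid arguments and a search that found one hit.
    let outcome = classify(true, Ok(1));
    ExitCode::from(exit_code(&outcome))
}
```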

## Architecture

The tool follows BigGrep's architecture:

1. **Pattern conversion**: ASCII/hex/Unicode → N-grams
2. **Index discovery**: Find and order .bgi files
3. **Parallel search**: Search across multiple indexes concurrently
4. **Result filtering**: Apply metadata filters
5. **Verification**: Optional bgverify or YARA verification
6. **Output formatting**: Render results in the selected output format (text, JSON, or CSV)
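
Steps 1, 3, and 5 can be illustrated with a toy in-memory n-gram index: a file is a candidate only if it contains every n-gram of the pattern, and verification then discards files where those n-grams occur but not as one contiguous run. The gram size `n`, the `HashMap` index, and all function names are assumptions for illustration; they do not describe the `.bgi` format or the tool's internals.

```rust
use std::collections::{BTreeSet, HashMap};

/// Overlapping n-grams of `data`; n is a parameter because the gram size
/// the real indexes use is not stated here.
fn ngrams(data: &[u8], n: usize) -> Vec<&[u8]> {
    if n == 0 {
        return Vec::new(); // windows(0) would panic
    }
    data.windows(n).collect()
}

/// Toy in-memory index: n-gram -> ids of files containing it.
fn build_index(files: &[&[u8]], n: usize) -> HashMap<Vec<u8>, BTreeSet<u32>> {
    let mut index: HashMap<Vec<u8>, BTreeSet<u32>> = HashMap::new();
    for (id, data) in files.iter().enumerate() {
        for gram in ngrams(data, n) {
            index.entry(gram.to_vec()).or_default().insert(id as u32);
        }
    }
    index
}

/// Candidates are files containing every n-gram of the pattern. A pattern
/// shorter than n has no n-grams, and this sketch then returns nothing.
fn candidates(index: &HashMap<Vec<u8>, BTreeSet<u32>>, pattern: &[u8], n: usize) -> BTreeSet<u32> {
    let mut result: Option<BTreeSet<u32>> = None;
    for gram in ngrams(pattern, n) {
        let files = index.get(gram).cloned().unwrap_or_default();
        result = Some(match result {
            None => files,
            Some(acc) => acc.intersection(&files).copied().collect(),
        });
    }
    result.unwrap_or_default()
}

/// Verification: confirm the full byte sequence really occurs.
fn verify(data: &[u8], pattern: &[u8]) -> bool {
    !pattern.is_empty() && data.windows(pattern.len()).any(|w| w == pattern)
}

fn main() {
    let files: [&[u8]; 3] = [b"Hello World", b"Help me", b"Hel ell llo"];
    let index = build_index(&files, 3);
    let pattern = b"Hello";
    for id in candidates(&index, pattern, 3) {
        println!("candidate {id}: verified = {}", verify(files[id as usize], pattern));
    }
}
```

Here `Hel ell llo` contains all three 3-grams of `Hello` and is therefore a candidate, but verification rejects it; this false-positive filtering is why the verification step exists.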

## License

MIT License - see LICENSE file for details.
