Streaming
O(1) memory validation for large JSON arrays and NDJSON streams
valrs provides streaming validation for processing large files with constant memory usage. Instead of loading entire files into memory, items are validated one at a time as they arrive from the stream.
Why Streaming?
Traditional validation requires loading the entire dataset into memory:
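For instance, a non-streaming sketch might look like the following. The import style and the v.object builder names are assumptions made for illustration, not confirmed API:

```ts
import { v } from "valrs"; // import style assumed for this sketch

const userSchema = v.object({ id: v.number(), name: v.string() }); // builder names assumed

// The entire payload is downloaded and parsed before validation begins,
// so memory grows with the size of the file.
const text = await fetch("https://example.com/users.json").then((r) => r.text());
const users = v.array(userSchema).parse(JSON.parse(text));
console.log(users.length);
```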
With streaming validation, memory usage is constant:
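A streaming sketch of the same task, assuming stream() is exposed as a schema method that yields validated items as an async iterable:

```ts
import { v } from "valrs"; // import style assumed

const userSchema = v.object({ id: v.number(), name: v.string() }); // builder names assumed

const response = await fetch("https://example.com/users.json");

// Only one validated item is held in memory at a time.
for await (const user of userSchema.stream(response.body!)) {
  console.log(user.name);
}
```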
This enables processing files of any size, limited only by disk space or network bandwidth.
Streaming JSON Arrays
Use stream() to validate JSON arrays incrementally:
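A minimal sketch, assuming the schema-method call shape used throughout this page:

```ts
import { v } from "valrs"; // assumed import style

const productSchema = v.object({
  sku: v.string(),
  price: v.number(),
}); // builder names assumed

const response = await fetch("https://example.com/products.json"); // hypothetical endpoint

// Each element of the JSON array is parsed, validated, and yielded one at a time.
for await (const product of productSchema.stream(response.body!)) {
  console.log(product.sku, product.price);
}
```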
The input must be a valid JSON array (e.g., [{...}, {...}, {...}]). The parser handles arrays split across multiple chunks correctly.
Streaming NDJSON
Use streamLines() for newline-delimited JSON (NDJSON), where each line is a separate JSON object:
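A sketch along the same lines, assuming streamLines() mirrors stream() but treats each input line as a standalone JSON document:

```ts
import { createReadStream } from "node:fs";
import { v } from "valrs"; // assumed import style

const eventSchema = v.object({
  type: v.string(),
  timestamp: v.number(),
}); // builder names assumed

// Each line of the file is parsed as a separate JSON object and validated.
for await (const event of eventSchema.streamLines(createReadStream("events.ndjson"))) {
  console.log(event.type);
}
```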
NDJSON format (one JSON object per line):
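For example, three illustrative lines of NDJSON:

```
{"type":"signup","timestamp":1700000000}
{"type":"login","timestamp":1700000042}
{"type":"logout","timestamp":1700000100}
```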
Stream Options
Both stream() and streamLines() accept an options object:
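For instance (option names come from the reference table below; the string formats and exact call shape are assumptions):

```ts
import { v } from "valrs"; // assumed import style

const itemSchema = v.object({ id: v.number() }); // builder names assumed
const source = (await fetch("https://example.com/items.json")).body!;

for await (const item of itemSchema.stream(source, {
  maxItems: 1_000_000,  // stop after one million items
  maxBytes: "1GB",      // abort once roughly 1 GB has been read (string format assumed)
  timeout: "5m",        // abort after five minutes (duration format assumed)
  onError: "skip",      // drop invalid items instead of throwing
  highWaterMark: 32,    // allow up to 32 pending items before applying backpressure
})) {
  console.log(item.id);
}
```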
Options Reference
| Option | Type | Default | Description |
|---|---|---|---|
| maxItems | number | Infinity | Maximum items to process |
| maxBytes | number \| string | Infinity | Maximum bytes to process |
| timeout | number \| string | Infinity | Processing timeout |
| onError | 'throw' \| 'skip' \| 'collect' | 'throw' | Error handling strategy |
| highWaterMark | number | 16 | Backpressure threshold |
Size and Duration Formats
maxBytes accepts numbers (bytes) or human-readable strings:
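For example, both of the following express roughly the same limit; the exact suffixes valrs understands are not listed in this excerpt:

```ts
const byNumber = { maxBytes: 100 * 1024 * 1024 }; // 100 MiB expressed in bytes
const byString = { maxBytes: "100MB" };           // the same limit as a human-readable string
```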
timeout accepts numbers (milliseconds) or duration strings:
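Similarly (the duration-string syntax shown is an assumption):

```ts
const byMilliseconds = { timeout: 30_000 }; // 30 seconds in milliseconds
const byDuration = { timeout: "30s" };      // the same timeout as a duration string
```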
Error Handling Modes
The onError option controls how validation failures are handled:
throw (default)
Stops processing and throws on the first validation error:
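A sketch of the default mode, using the same assumed import and builder names as the earlier examples:

```ts
import { v } from "valrs"; // assumed import style

const userSchema = v.object({ name: v.string() }); // builder names assumed
const source = (await fetch("https://example.com/users.json")).body!;

try {
  for await (const user of userSchema.stream(source, { onError: "throw" })) {
    console.log(user.name);
  }
} catch (err) {
  // The first invalid item aborts the stream and surfaces here.
  console.error("validation failed, stream aborted:", err);
}
```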
skip
Silently skips invalid items and continues processing:
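A sketch of skip mode, with the same assumed setup:

```ts
import { v } from "valrs"; // assumed import style

const userSchema = v.object({ name: v.string() }); // builder names assumed
const source = (await fetch("https://example.com/users.json")).body!;

for await (const user of userSchema.stream(source, { onError: "skip" })) {
  console.log(user.name); // only items that passed validation reach this point
}
```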
collect
Continues processing but collects all errors for later inspection:
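A sketch of collect mode. How the collected errors are exposed is not shown in this excerpt; the errors property below is an assumption:

```ts
import { v } from "valrs"; // assumed import style

const userSchema = v.object({ name: v.string() }); // builder names assumed
const source = (await fetch("https://example.com/users.json")).body!;

// Assumption: with onError: 'collect', the stream object records failures
// and makes them available after iteration finishes.
const stream = userSchema.stream(source, { onError: "collect" });

for await (const user of stream) {
  console.log(user.name);
}

for (const failure of stream.errors) {
  console.warn(`item ${failure.index} failed:`, failure.error, failure.rawValue);
}
```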
Each collected error contains:
- index: The position of the failed item
- error: The ValError with validation details
- rawValue: The original parsed value (if available)
Result Methods
toArray()
Collects all validated items into an array:
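For example (same assumed setup as the sketches above):

```ts
import { v } from "valrs"; // assumed import style

const userSchema = v.object({ name: v.string() }); // builder names assumed
const source = (await fetch("https://example.com/users.json")).body!;

// Drains the stream and returns every validated item at once.
const users = await userSchema.stream(source).toArray();
console.log(`loaded ${users.length} users`);
```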
Using toArray() loads all items into memory, negating the streaming benefit. Use it only when you need all items at once and know the dataset fits in memory.
pipeTo()
Pipes validated items directly to a writable stream:
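A sketch, assuming pipeTo() accepts a web-standard WritableStream; saveUser is a hypothetical persistence helper:

```ts
import { v } from "valrs"; // assumed import style

declare function saveUser(user: unknown): Promise<void>; // hypothetical persistence helper

const userSchema = v.object({ id: v.number(), name: v.string() }); // builder names assumed
const source = (await fetch("https://example.com/users.json")).body!;

// A WritableStream standing in for a real database or file sink.
const sink = new WritableStream({
  async write(user) {
    await saveUser(user);
  },
});

await userSchema.stream(source).pipeTo(sink);
```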
This is useful for ETL pipelines where you want to write validated data to a database or file without holding everything in memory.
Real-World Examples
Streaming from Fetch Response
Process a large API response without loading it entirely into memory:
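A sketch, with a hypothetical endpoint and the assumed import and builder names used throughout this page:

```ts
import { v } from "valrs"; // assumed import style

const userSchema = v.object({
  id: v.number(),
  email: v.string(),
}); // builder names assumed

const response = await fetch("https://api.example.com/v1/users"); // hypothetical endpoint
if (!response.ok || !response.body) throw new Error(`request failed: ${response.status}`);

let count = 0;
for await (const user of userSchema.stream(response.body)) {
  count += 1; // handle each user here without buffering the whole response
}
console.log(`validated ${count} users`);
```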
Streaming from File (Node.js)
Process a large JSON file from disk:
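A sketch, assuming Node.js readable streams are among the accepted input types; the file path and schema are illustrative:

```ts
import { createReadStream } from "node:fs";
import { v } from "valrs"; // assumed import style

const recordSchema = v.object({ id: v.number(), value: v.string() }); // builder names assumed

for await (const record of recordSchema.stream(createReadStream("data/records.json"))) {
  console.log(record.id);
}
```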
Processing NDJSON Logs
Stream and analyze log files in NDJSON format:
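A sketch with an illustrative log schema and file path:

```ts
import { createReadStream } from "node:fs";
import { v } from "valrs"; // assumed import style

const logSchema = v.object({
  level: v.string(),
  message: v.string(),
}); // field names are illustrative

let errorCount = 0;
for await (const entry of logSchema.streamLines(createReadStream("app.ndjson"), { onError: "skip" })) {
  if (entry.level === "error") errorCount += 1;
}
console.log(`${errorCount} error entries found`);
```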
ETL Pipeline with Validation
Build an extract-transform-load pipeline with streaming validation:
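A sketch in which insertOrder stands in for a real database loader and the endpoint is hypothetical:

```ts
import { v } from "valrs"; // assumed import style

declare function insertOrder(row: object): Promise<void>; // hypothetical database loader

const orderSchema = v.object({ id: v.number(), total: v.number() }); // builder names assumed

// Extract: stream the API response. Validate: the schema checks each order.
// Transform: add an import timestamp. Load: write each row as it arrives.
const response = await fetch("https://api.example.com/orders"); // hypothetical endpoint

for await (const order of orderSchema.stream(response.body!, { onError: "skip" })) {
  await insertOrder({ ...order, importedAt: new Date().toISOString() });
}
```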
Supported Input Types
Both stream() and streamLines() accept multiple input types:
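The full list is not reproduced in this excerpt; the examples on this page rely on the two kinds of source shown below, and others may be supported:

```ts
import { createReadStream } from "node:fs";

// A web ReadableStream, such as a fetch Response body:
const fromFetch = (await fetch("https://example.com/users.json")).body;

// A Node.js readable stream, such as a file opened from disk:
const fromFile = createReadStream("users.json");
```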
Memory Efficiency
Streaming validation maintains O(1) memory usage by:
- Incremental parsing: JSON is parsed character-by-character, extracting complete items as they appear
- Immediate validation: Each item is validated and yielded before the next is parsed
- No buffering: Validated items are not stored; they flow through to your processing logic
- Backpressure support: The stream respects consumer speed via highWaterMark
This means you can process a 10GB file with the same memory footprint as a 10KB file.
When to Use Streaming
| Scenario | Recommended Approach |
|---|---|
| File < 10MB | Regular parse() is fine |
| File 10MB - 100MB | Consider streaming |
| File > 100MB | Always use streaming |
| Unknown size (API response) | Use streaming for safety |
| Real-time data (WebSocket, SSE) | Use streaming |
Testing with Mock Streams
For testing, use the provided helper functions:
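The helper names are not shown in this excerpt; as a fallback, a mock source can be built with the web-standard ReadableStream constructor:

```ts
import { v } from "valrs"; // assumed import style

const userSchema = v.object({ id: v.number(), name: v.string() }); // builder names assumed

// Build a mock source that emits the given chunks and then closes.
function mockSource(chunks: string[]): ReadableStream<string> {
  return new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
}

const source = mockSource(['[{"id":1,"name":"Ada"},', '{"id":2,"name":"Alan"}]']);

for await (const user of userSchema.stream(source)) {
  // assert on each validated item here
  console.log(user.name);
}
```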
Next Steps
- Getting Started - Basic valrs usage
- Custom Schemas - Create custom validation logic
- Standard Schema - Interoperability with other tools