Warnings and Exceptions¶
SplitterMR exposes a small, explicit exception hierarchy so you can reliably handle errors coming from Readers (document ingestion/conversion) and Splitters (chunking/segmentation).
Exceptions live in:
from splitter_mr.schema.exceptions import *
Why a custom exception hierarchy?¶
- Stable contracts: you can catch ReaderException or SplitterException to handle “library-level” failures without depending on implementation details.
- More precise handling: configuration errors are kept separate from conversion/runtime errors.
- Wrapped backends: some readers wrap upstream library errors into SplitterMR-specific exceptions.
Tip
Recommended practice
- Catch specific exceptions when you can (e.g., ReaderConfigException).
- Fall back to the base class for a broad handler (e.g., ReaderException / SplitterException).
- Avoid catching Exception unless you are at an application boundary.
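The handler ordering recommended above can be sketched as follows. This is a minimal illustration, not library code: the two exception classes are local stand-ins that mirror the documented bases so the snippet runs without SplitterMR installed, and `classify`, `bad_config`, and `other_failure` are hypothetical helpers invented for the example.

```python
# Stand-in classes mirroring the documented bases (assumption: in real code
# you would import these from splitter_mr.schema.exceptions instead).
class ReaderException(Exception):
    """Base exception for reader-related errors."""


class ReaderConfigException(ReaderException, ValueError):
    """Invalid parameters passed to the Reader configuration."""


def classify(action):
    """Run `action` (a hypothetical callable) and report which handler fired."""
    try:
        action()
    except ReaderConfigException as exc:
        # Most specific handler first: the caller can fix its arguments.
        return f"config: {exc}"
    except ReaderException as exc:
        # Broad library-level fallback for any other reader failure.
        return f"reader: {exc}"
    return "ok"


def bad_config():
    raise ReaderConfigException("unsupported file extension")


def other_failure():
    raise ReaderException("conversion failed")


print(classify(bad_config))     # config: unsupported file extension
print(classify(other_failure))  # reader: conversion failed
print(classify(lambda: None))   # ok
```

The order of the `except` clauses matters: if the base class came first, it would also capture configuration errors and the specific handler would never run.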
Reader exceptions¶
Readers raise ReaderException (or subclasses) when the library cannot read/convert/validate inputs.
Hierarchy
`ReaderException` (`Exception`)
├── `ReaderOutputException`
├── `HtmlConversionError`
├── `ReaderConfigException` (also `ValueError`)
├── `VanillaReaderException` (also `RuntimeError`)
├── `MarkItDownReaderException` (also `RuntimeError`)
└── `DoclingReaderException` (also `RuntimeError`)
General¶
ReaderException¶
Bases: Exception
Base exception for reader-related errors.
Source code in src/splitter_mr/schema/exceptions.py
I/O and Configuration¶
ReaderOutputException¶
Bases: ReaderException
Raised when ReaderOutput does not have a valid structure.
Source code in src/splitter_mr/schema/exceptions.py
ReaderConfigException¶
Bases: ReaderException, ValueError
Raised when invalid parameters are passed to the Reader configuration.
Source code in src/splitter_mr/schema/exceptions.py
Typical cases:
- Unsupported file extension or mode.
- Mutually incompatible flags (e.g., page-splitting options on formats that don’t support it).
- Invalid values (negative sizes, unknown enum-like strings, etc.).
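Because ReaderConfigException also derives from ValueError, code that already catches ValueError should keep working. The sketch below illustrates this with stand-in classes that mirror the documented bases; `validate_size` is a hypothetical helper and not part of SplitterMR.

```python
# Stand-in classes mirroring the documented bases (assumption: the real
# ones live in splitter_mr.schema.exceptions).
class ReaderException(Exception):
    """Base exception for reader-related errors."""


class ReaderConfigException(ReaderException, ValueError):
    """Invalid parameters passed to the Reader configuration."""


def validate_size(size):
    """Hypothetical validator: reject negative sizes as a config error."""
    if size < 0:
        raise ReaderConfigException(f"size must be non-negative, got {size}")
    return size


caught = None
try:
    validate_size(-1)
except ValueError as exc:  # a pre-existing, generic handler
    caught = exc

# The same object satisfies both the library and the built-in contract.
print(isinstance(caught, ReaderConfigException))  # True
print(isinstance(caught, ValueError))             # True
```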
Readers¶
VanillaReaderException¶
Bases: ReaderException, RuntimeError
Raised when VanillaReader–based document conversion fails. Wraps exceptions coming from vanilla_reader.exceptions.VanillaReaderError.
Source code in src/splitter_mr/schema/exceptions.py
Typical cases:
- A subprocess/tool invocation fails (if used internally)
- Conversion/parse errors for JSON/XML/YAML/CSV/Parquet, etc.
- Filesystem/temporary directory issues during conversion
MarkItDownReaderException¶
Bases: ReaderException, RuntimeError
Raised when MarkItDown–based document conversion fails in MarkItDownReader. Wraps exceptions coming from markitdown.exceptions.MarkItDownError.
Source code in src/splitter_mr/schema/exceptions.py
Typical cases:
- Backend conversion fails for a supported document type
- External dependency misconfiguration (where applicable)
DoclingReaderException¶
Bases: ReaderException, RuntimeError
Raised when IBM Docling–based document conversion fails in DoclingReader. Wraps exceptions coming from docling.exceptions.BaseError.
Source code in src/splitter_mr/schema/exceptions.py
Typical cases:
- Docling pipeline errors while parsing PDFs or documents
- Model/runtime errors in the Docling stack
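One common way to wrap an upstream error is Python's `raise ... from ...`, which keeps the original exception reachable through `__cause__`. The sketch below shows that pattern under stated assumptions: `UpstreamError`, `backend_convert`, and `convert` are hypothetical stand-ins, and whether SplitterMR's readers set `__cause__` in exactly this way is not confirmed by this page.

```python
# Stand-in classes mirroring the documented bases (assumption: the real
# ones live in splitter_mr.schema.exceptions).
class ReaderException(Exception):
    """Base exception for reader-related errors."""


class DoclingReaderException(ReaderException, RuntimeError):
    """Docling-based conversion failed."""


class UpstreamError(Exception):
    """Hypothetical stand-in for an error raised by a backend library."""


def backend_convert(path):
    """Hypothetical backend call that always fails, for demonstration."""
    raise UpstreamError(f"cannot parse {path}")


def convert(path):
    """Translate backend failures into the library-level exception."""
    try:
        return backend_convert(path)
    except UpstreamError as exc:
        # `from exc` chains the original error onto the wrapper.
        raise DoclingReaderException(f"conversion failed for {path}") from exc


wrapped = None
try:
    convert("report.pdf")
except ReaderException as exc:
    wrapped = exc

print(type(wrapped).__name__)            # DoclingReaderException
print(type(wrapped.__cause__).__name__)  # UpstreamError
```

With chaining in place, callers handle one stable exception type while the full upstream traceback still appears in logs.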
Splitter exceptions¶
Splitters raise SplitterException (or subclasses) when the library cannot construct chunks or validate splitter configuration/output.
Hierarchy
`SplitterException` (`Exception`)
├── `InvalidChunkException` (also `ValueError`)
├── `SplitterConfigException` (also `ValueError`)
│ ├── `InvalidHeaderNameError`
│ └── `HeaderLevelOutOfRangeError`
└── `SplitterOutputException` (also `TypeError`)
General¶
SplitterException¶
Bases: Exception
Base exception for splitter-related errors.
Source code in src/splitter_mr/schema/exceptions.py
I/O and Configuration¶
InvalidChunkException¶
Bases: SplitterException, ValueError
Raised when chunks cannot be constructed correctly.
Source code in src/splitter_mr/schema/exceptions.py
Typical cases:
- Chunk boundaries cannot be computed
- Empty/invalid intermediate structures prevent chunk creation
SplitterConfigException¶
Bases: SplitterException, ValueError
Raised when invalid parameters are passed to the Splitter configuration.
Source code in src/splitter_mr/schema/exceptions.py
Typical cases:
- Missing required parameters
- Invalid ranges (e.g., chunk sizes)
- Unsupported strategy options
SplitterOutputException¶
Bases: SplitterException, TypeError
Raised when SplitterOutput cannot be built or validated.
Source code in src/splitter_mr/schema/exceptions.py
Typical cases:
- Output validation fails
- Inconsistent internal fields (e.g., missing chunks, wrong metadata types)
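Separating configuration errors from per-document failures allows different recovery policies in a batch job. The following sketch assumes one such policy: fail fast on configuration errors, skip documents that fail for any other splitter reason. The exception classes are stand-ins mirroring the documented bases, and `split_all` and `toy_split` are hypothetical helpers.

```python
# Stand-in classes mirroring the documented bases (assumption: the real
# ones live in splitter_mr.schema.exceptions).
class SplitterException(Exception):
    """Base exception for splitter-related errors."""


class InvalidChunkException(SplitterException, ValueError):
    """Chunks cannot be constructed correctly."""


class SplitterConfigException(SplitterException, ValueError):
    """Invalid splitter configuration."""


def split_all(documents, split):
    """Split every document; skip per-document failures, fail fast on config."""
    results, skipped = {}, []
    for name, text in documents.items():
        try:
            results[name] = split(text)
        except SplitterConfigException:
            # The same configuration would fail for every document.
            raise
        except SplitterException:
            # Any other splitter failure only affects this document.
            skipped.append(name)
    return results, skipped


def toy_split(text):
    """Hypothetical splitter: whitespace chunks, rejects empty input."""
    if not text:
        raise InvalidChunkException("empty document")
    return text.split()


results, skipped = split_all({"a": "one two", "b": ""}, toy_split)
print(results)  # {'a': ['one', 'two']}
print(skipped)  # ['b']
```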
Splitter-specific exceptions¶
HeaderSplitter¶
NormalizationError
Bases: ReaderException, TypeError
Raised when Setext→ATX normalization can't be safely applied.
Source code in src/splitter_mr/schema/exceptions.py
HeaderLevelOutOfRangeError
Bases: SplitterConfigException
Raised when the parsed header level is outside 1..6.
Source code in src/splitter_mr/schema/exceptions.py
InvalidHeaderNameError
Bases: SplitterConfigException
Raised when a header string isn't of the expected 'Header N' form.
Source code in src/splitter_mr/schema/exceptions.py
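To make the two header errors concrete, here is a minimal sketch of a validator that raises them. It is an illustration only: `parse_header_level` is a hypothetical function, the regular expression is an assumption about what the 'Header N' form means, and the classes are stand-ins mirroring the documented bases rather than the library's own code.

```python
import re


# Stand-in classes mirroring the documented bases (assumption: the real
# ones live in splitter_mr.schema.exceptions).
class SplitterException(Exception):
    """Base exception for splitter-related errors."""


class SplitterConfigException(SplitterException, ValueError):
    """Invalid splitter configuration."""


class InvalidHeaderNameError(SplitterConfigException):
    """Header string isn't of the expected 'Header N' form."""


class HeaderLevelOutOfRangeError(SplitterConfigException):
    """Parsed header level is outside 1..6."""


_HEADER_PATTERN = re.compile(r"Header (\d+)")


def parse_header_level(name):
    """Hypothetical parser: return N for a 'Header N' string with 1 <= N <= 6."""
    match = _HEADER_PATTERN.fullmatch(name)
    if match is None:
        raise InvalidHeaderNameError(f"expected 'Header N', got {name!r}")
    level = int(match.group(1))
    if not 1 <= level <= 6:
        raise HeaderLevelOutOfRangeError(f"header level {level} is outside 1..6")
    return level


print(parse_header_level("Header 2"))  # 2

for bad in ("H2", "Header 7"):
    try:
        parse_header_level(bad)
    except SplitterConfigException as exc:
        # One handler covers both subclasses.
        print(type(exc).__name__)
```

Since both errors derive from SplitterConfigException, a single configuration handler covers them, while callers that care can still tell a malformed name from an out-of-range level.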
HtmlTagSplitter¶
HtmlConversionError
Bases: ReaderException
Raised when HTML→Markdown conversion fails.
Source code in src/splitter_mr/schema/exceptions.py
Reference table¶
| Area | Exception | Type | Description |
|---|---|---|---|
| Reader | ReaderException | Exception | Base reader error |
| Reader | ReaderOutputException | ReaderException | Invalid ReaderOutput structure |
| Reader | HtmlConversionError | ReaderException | HTML → Markdown conversion failed |
| Reader | ReaderConfigException | ValueError | Invalid reader configuration |
| Reader | VanillaReaderException | RuntimeError | Vanilla conversion failed (wrapped) |
| Reader | MarkItDownReaderException | RuntimeError | MarkItDown conversion failed (wrapped) |
| Reader | DoclingReaderException | RuntimeError | Docling conversion failed (wrapped) |
| Splitter | SplitterException | Exception | Base splitter error |
| Splitter | InvalidChunkException | ValueError | Chunks cannot be constructed |
| Splitter | SplitterConfigException | ValueError | Invalid splitter configuration |
| Splitter | SplitterOutputException | TypeError | Invalid SplitterOutput |
| Header Splitter | InvalidHeaderNameError | SplitterConfigException | Bad "Header N" format |
| Header Splitter | HeaderLevelOutOfRangeError | SplitterConfigException | Header level not in 1..6 |
| Header Splitter | NormalizationError | ReaderException | Setext → ATX normalization failed |
| HTML Tag Splitter | InvalidHtmlTagError | ReaderException | Invalid/missing HTML tag |