Warnings and Exceptions

SplitterMR exposes a small, explicit exception hierarchy so you can reliably handle errors coming from Readers (document ingestion/conversion) and Splitters (chunking/segmentation).

Exceptions live in:

from splitter_mr.schema.exceptions import *

Why a custom exception hierarchy?

  • Stable contracts: you can catch ReaderException or SplitterException to handle “library-level” failures without depending on implementation details.
  • More precise handling: configuration errors vs conversion/runtime errors are separated.
  • Wrapped backends: some readers wrap upstream library errors into SplitterMR-specific exceptions.

Tip

Recommended practice

  • Catch specific exceptions when you can (e.g., ReaderConfigException).
  • Fall back to the base class for a broad handler (e.g., ReaderException / SplitterException).
  • Avoid catching Exception unless you are at an application boundary.
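
The handler ordering recommended above can be sketched as follows. This is a minimal illustration, not SplitterMR code: the two exception classes are local stand-ins that mirror the documented bases so the snippet runs without the library installed, and `read_document` and `safe_read` are hypothetical helpers. In real code you would import the exceptions from `splitter_mr.schema.exceptions`.

```python
# Local stand-ins mirroring the documented bases (assumption: real code would
# import these from splitter_mr.schema.exceptions instead).
class ReaderException(Exception):
    """Base exception for reader-related errors."""


class ReaderConfigException(ReaderException, ValueError):
    """Raised when invalid parameters are passed to the Reader configuration."""


def read_document(path: str) -> str:
    # Hypothetical reader call, used only to trigger the two failure modes.
    if not path.endswith(".txt"):
        raise ReaderConfigException(f"Unsupported file extension: {path}")
    if "missing" in path:
        raise ReaderException(f"Could not read: {path}")
    return f"contents of {path}"


def safe_read(path: str) -> str:
    try:
        return read_document(path)
    except ReaderConfigException as exc:
        # Most specific handler first: a caller mistake that can be reported directly.
        return f"config error: {exc}"
    except ReaderException as exc:
        # Broad library-level fallback for any other reader failure.
        return f"reader error: {exc}"
```

The specific handler must come first: if `except ReaderException` were listed before `except ReaderConfigException`, the broad clause would catch both cases and the specific one would never run.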

Reader exceptions

Readers raise ReaderException (or subclasses) when the library cannot read/convert/validate inputs.

Hierarchy

`ReaderException` (`Exception`)
├── `ReaderOutputException`
├── `HtmlConversionError`
├── `ReaderConfigException` (also `ValueError`)
├── `VanillaReaderException` (also `RuntimeError`)
├── `MarkItDownReaderException` (also `RuntimeError`)
└── `DoclingReaderException` (also `RuntimeError`)
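
The "(also ...)" annotations mean that those classes inherit from a built-in exception as well, so existing `except ValueError` or `except RuntimeError` handlers will also catch them. The sketch below shows this with local stand-in classes that copy the documented bases; `classify` is a hypothetical helper written for this example only.

```python
# Stand-ins with the same bases as in the hierarchy above (assumption: the
# real classes come from splitter_mr.schema.exceptions).
class ReaderException(Exception):
    pass


class ReaderConfigException(ReaderException, ValueError):
    pass


class DoclingReaderException(ReaderException, RuntimeError):
    pass


def classify(exc: BaseException) -> list[str]:
    """Return the names of the handler types that would catch `exc`."""
    candidates = (ReaderException, ValueError, RuntimeError)
    return [c.__name__ for c in candidates if isinstance(exc, c)]
```

With these definitions, `classify(ReaderConfigException("bad mode"))` reports both `ReaderException` and `ValueError`, while a `DoclingReaderException` reports `ReaderException` and `RuntimeError`.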

General

ReaderException

Bases: Exception

Base exception for reader-related errors.

Source code in src/splitter_mr/schema/exceptions.py
class ReaderException(Exception):
    """Base exception for reader-related errors."""

    pass

I/O and Configuration

ReaderOutputException

Bases: ReaderException

Raised when ReaderOutput does not have a valid structure.

Source code in src/splitter_mr/schema/exceptions.py
class ReaderOutputException(ReaderException):
    """Raised when ReaderOutput has not a valid structure."""

ReaderConfigException

Bases: ReaderException, ValueError

Raised when invalid parameters are passed to the Reader configuration.

Source code in src/splitter_mr/schema/exceptions.py
class ReaderConfigException(ReaderException, ValueError):
    """
    Raised when invalid parameters are passed to the Reader configuration.
    """

Typical cases:

  • Unsupported file extension or mode.
  • Mutually incompatible flags (e.g., page-splitting options on formats that don’t support it).
  • Invalid values (negative sizes, unknown enum-like strings, etc.).

Readers

VanillaReaderException

Bases: ReaderException, RuntimeError

Raised when VanillaReader–based document conversion fails. Wraps exceptions coming from vanilla_reader.exceptions.VanillaReaderError.

Source code in src/splitter_mr/schema/exceptions.py
class VanillaReaderException(ReaderException, RuntimeError):
    """
    Raised when VanillaReader–based document conversion fails.
    Wraps exceptions coming from vanilla_reader.exceptions.VanillaReaderError.
    """

Typical cases:

  • A subprocess/tool invocation fails (if used internally)
  • Conversion/parse errors for JSON/XML/YAML/CSV/Parquet, etc.
  • Filesystem/temporary directory issues during conversion

MarkItDownReaderException

Bases: ReaderException, RuntimeError

Raised when MarkItDown–based document conversion fails in MarkItDownReader. Wraps exceptions coming from markitdown.exceptions.MarkItDownError.

Source code in src/splitter_mr/schema/exceptions.py
class MarkItDownReaderException(ReaderException, RuntimeError):
    """
    Raised when MarkItDown–based document conversion fails in MarkItDownReader.
    Wraps exceptions coming from markitdown.exceptions.MarkItDownError.
    """

Typical cases:

  • Backend conversion fails for a supported document type
  • External dependency misconfiguration (where applicable)

DoclingReaderException

Bases: ReaderException, RuntimeError

Raised when IBM Docling–based document conversion fails in DoclingReader. Wraps exceptions coming from docling.exceptions.BaseError.

Source code in src/splitter_mr/schema/exceptions.py
class DoclingReaderException(ReaderException, RuntimeError):
    """
    Raised when IBM Docling–based document conversion fails in DoclingReader.
    Wraps exceptions coming from docling.exceptions.BaseError.
    """

Typical cases:

  • Docling pipeline errors while parsing PDFs or documents
  • Model/runtime errors in the Docling stack
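
One common way to wrap a backend error is exception chaining with `raise ... from`, which keeps the upstream exception available on `__cause__`. The sketch below illustrates that pattern only. Whether SplitterMR chains its exceptions this way is an assumption, so treat `__cause__` as possibly `None` in real code. `BackendError`, `backend_convert`, `convert`, and `describe_failure` are hypothetical names, and the exception classes are local stand-ins.

```python
class ReaderException(Exception):
    pass


class DoclingReaderException(ReaderException, RuntimeError):
    pass


class BackendError(Exception):
    """Hypothetical stand-in for an upstream error such as docling.exceptions.BaseError."""


def backend_convert(path: str) -> str:
    # Simulates a failing backend pipeline.
    raise BackendError(f"pipeline failed for {path}")


def convert(path: str) -> str:
    try:
        return backend_convert(path)
    except BackendError as exc:
        # Chain the upstream error so callers can still inspect it.
        raise DoclingReaderException(f"Docling conversion failed: {path}") from exc


def describe_failure(path: str) -> str:
    try:
        convert(path)
    except ReaderException as exc:
        cause = exc.__cause__  # None if the exception was not chained
        return f"{type(exc).__name__} <- {type(cause).__name__}"
    return "ok"
```

Callers only need to know about `ReaderException`; the backend-specific type stays an implementation detail that is still reachable for logging.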

Splitter exceptions

Splitters raise SplitterException (or subclasses) when the library cannot construct chunks or validate splitter configuration/output.

Hierarchy

`SplitterException` (`Exception`)
├── `InvalidChunkException` (also `ValueError`)
├── `SplitterConfigException` (also `ValueError`)
│   ├── `InvalidHeaderNameError`
│   └── `HeaderLevelOutOfRangeError`
└── `SplitterOutputException` (also `TypeError`)

General

SplitterException

Bases: Exception

Base exception for splitter-related errors.

Source code in src/splitter_mr/schema/exceptions.py
class SplitterException(Exception):
    """Base exception for splitter-related errors."""

    pass

I/O and Configuration

InvalidChunkException

Bases: SplitterException, ValueError

Raised when chunks cannot be constructed correctly.

Source code in src/splitter_mr/schema/exceptions.py
class InvalidChunkException(SplitterException, ValueError):
    """Raised when chunks cannot be constructed correctly."""

Typical cases:

  • Chunk boundaries cannot be computed
  • Empty/invalid intermediate structures prevent chunk creation

SplitterConfigException

Bases: SplitterException, ValueError

Raised when invalid parameters are passed to the Splitter configuration.

Typical cases:

  • Missing required parameters
  • Invalid ranges (e.g., chunk sizes)
  • Unsupported strategy options

SplitterOutputException

Bases: SplitterException, TypeError

Raised when SplitterOutput cannot be built or validated.

Source code in src/splitter_mr/schema/exceptions.py
class SplitterOutputException(SplitterException, TypeError):
    """Raised when SplitterOutput cannot be built or validated."""

Typical cases:

  • Output validation fails
  • Inconsistent internal fields (e.g., missing chunks, wrong metadata types)

Splitter-specific exceptions

HeaderSplitter

NormalizationError

Bases: ReaderException, TypeError

Raised when Setext→ATX normalization can't be safely applied.

Source code in src/splitter_mr/schema/exceptions.py
class NormalizationError(ReaderException, TypeError):
    """Raised when Setext→ATX normalization can't be safely applied."""

HeaderLevelOutOfRangeError

Bases: SplitterConfigException

Raised when the parsed header level is outside 1..6.

Source code in src/splitter_mr/schema/exceptions.py
class HeaderLevelOutOfRangeError(SplitterConfigException):
    """Raised when the parsed header level is outside 1..6."""

InvalidHeaderNameError

Bases: SplitterConfigException

Raised when a header string isn't of the expected 'Header N' form.

Source code in src/splitter_mr/schema/exceptions.py
class InvalidHeaderNameError(SplitterConfigException):
    """Raised when a header string isn't of the expected 'Header N' form."""

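To make the two header errors concrete, here is a minimal sketch of a parser for the 'Header N' form that raises each of them. `parse_header_level` is a hypothetical helper written for this example, not the HeaderSplitter's actual parsing code, and the exception classes are local stand-ins with the documented bases.

```python
import re


class SplitterException(Exception):
    pass


class SplitterConfigException(SplitterException, ValueError):
    pass


class InvalidHeaderNameError(SplitterConfigException):
    pass


class HeaderLevelOutOfRangeError(SplitterConfigException):
    pass


def parse_header_level(name: str) -> int:
    """Hypothetical parser: 'Header 2' -> 2."""
    match = re.fullmatch(r"Header (\d+)", name)
    if match is None:
        raise InvalidHeaderNameError(f"Expected 'Header N', got {name!r}")
    level = int(match.group(1))
    if not 1 <= level <= 6:
        raise HeaderLevelOutOfRangeError(f"Header level {level} is outside 1..6")
    return level
```

Because both errors derive from `SplitterConfigException`, a single `except SplitterConfigException` clause handles either one, and `except ValueError` works too.
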
HtmlTagSplitter

HtmlConversionError

Bases: ReaderException

Raised when HTML→Markdown conversion fails.

Source code in src/splitter_mr/schema/exceptions.py
class HtmlConversionError(ReaderException):
    """Raised when HTML→Markdown conversion fails."""

Reference table

| Area | Exception | Type | Description |
|---|---|---|---|
| Reader | ReaderException | Exception | Base reader error |
| Reader | ReaderOutputException | ReaderException | Invalid ReaderOutput structure |
| Reader | HtmlConversionError | ReaderException | HTML → Markdown conversion failed |
| Reader | ReaderConfigException | ValueError | Invalid reader configuration |
| Reader | VanillaReaderException | RuntimeError | Vanilla conversion failed (wrapped) |
| Reader | MarkItDownReaderException | RuntimeError | MarkItDown conversion failed (wrapped) |
| Reader | DoclingReaderException | RuntimeError | Docling conversion failed (wrapped) |
| Splitter | SplitterException | Exception | Base splitter error |
| Splitter | InvalidChunkException | ValueError | Chunks cannot be constructed |
| Splitter | SplitterConfigException | ValueError | Invalid splitter configuration |
| Splitter | SplitterOutputException | TypeError | Invalid SplitterOutput |
| Header Splitter | InvalidHeaderNameError | SplitterConfigException | Bad "Header N" format |
| Header Splitter | HeaderLevelOutOfRangeError | SplitterConfigException | Header level not in 1..6 |
| Header Splitter | NormalizationError | ReaderException | Setext → ATX normalization failed |
| HTML Tag Splitter | InvalidHtmlTagError | ReaderException | Invalid/missing HTML tag |