Skip to content

Textum Documentation

Architecture

View Source: textum/app

Architecture¶

Technology Stack¶

PySide6 - Cross-platform GUI
qfluentwidgets - Modern Qt Theming Framework
Transformers - Model loading and inference
Ultralytics / YOLOv9 – Region and line segmentation
PyMuPDF (fitz) – PDF rasterization
OpenCV / scikit-image / Pillow – Image preprocessing
python-docx / Qt PDF – Export to DOCX and PDF
SQLAlchemy / SQLite - Job Database
Nuitka – Compilation to native standalone executable

Recognition Pipeline¶

Pipeline architecture diagram — Textum OCR/HTR processing pipeline architecture

Internal Data Representation¶

Jobs and their statuses are saved in an SQLite database.
Each job's content is saved as a JSON file in the working directory, including the recognized text with bounding boxes.
Both the handwritten and printed pipeline produce the same JSON format, to make post-processing easier.
The processed images are also saved in the working directory.

Job JSON Representation¶

[
  {
    "page_id": "x" | "x.1" | "x.2",
    "text": ["line 1", "line 2", ...],
    "with_regions": [
      {"text": "line 1", "bbox": [x0, y0, x1, y1]},
      {"text": "line 2", "bbox": [x0, y0, x1, y1]},
      ...
    ],
    "image_path": "<working dir>/processed_images/<job_i>/PAGE1.jpg",
    "error": null | "error string"
  },
  ...
]