zeroshotGPU

Self-hosted agentic document parser. Upload a single document, multiple documents, or a .zip of documents (PDF / Markdown / plaintext / HTML). Each parse emits canonical markdown, structured JSON, retrieval-ready chunks (multi-strategy), a quality report with GT-comparison metrics where applicable, and a SHA-256-checksummed artifact manifest. Per-file caps: 50 MB / 200 pages. Batch cap: 20 docs per request. See the [Help] tab for full instructions.
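The checksummed manifest can be verified client-side. The sketch below assumes a JSON manifest mapping relative file paths to hex SHA-256 digests — the real artifact schema may differ, so treat the format and filename as assumptions:

```python
import hashlib
import json
from pathlib import Path

def verify_manifest(artifact_dir: str, manifest_name: str = "manifest.json") -> list[str]:
    """Return the files whose SHA-256 digest does not match the manifest.

    Assumes the manifest is a JSON object mapping relative paths to
    hex-encoded SHA-256 digests (an assumption, not the documented schema).
    """
    root = Path(artifact_dir)
    manifest = json.loads((root / manifest_name).read_text())
    mismatches = []
    for rel_path, expected in manifest.items():
        digest = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            mismatches.append(rel_path)
    return mismatches
```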

Source on Hugging Face

Pipeline
Docling + PyMuPDF (the default) runs both parsers to produce the parser-disagreement signal. Default lightweight runs text extraction + PyMuPDF only. Live GPU repair enables repair.execute_gpu_escalations=true and dispatches malformed-table, OCR, figure, and reading-order issues to Qwen2.5-VL.
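The GPU-repair flag is given above as a dotted config path; if the YAML mirrors that nesting (an assumption — check configs/live_gpu_repair.yaml for the actual layout), the override would look like:

```yaml
# Hypothetical sketch — verify the key layout against configs/live_gpu_repair.yaml.
repair:
  execute_gpu_escalations: true
```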

What this is

zeroshotGPU is an agentic document-parsing control plane. It does not rely on a single extraction engine — it profiles each document, routes pages to the best parser expert (Docling, PyMuPDF, optionally Marker / MinerU / olmOCR / PaddleOCR / Unstructured), normalizes outputs into a canonical schema, verifies quality, repairs weak regions through a bounded verify/repair loop (with optional GPU escalation), and emits retrieval-ready chunks with provenance.
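The bounded verify/repair loop described above can be sketched roughly as follows. The function names, issue representation, and escalation rules are illustrative assumptions, not the project's actual API:

```python
def repair_deterministic(issue: str) -> bool:
    # Stand-in for e.g. the markdown-table normalizer.
    return issue.startswith("table")

def repair_gpu(issue: str) -> bool:
    # Stand-in for dispatching the weak region to a vision-language model.
    return True

def repair_loop(issues: list[str], gpu_escalation: bool = False,
                max_rounds: int = 3) -> list[str]:
    """Bounded verify/repair: deterministic fix first, GPU only as fallback."""
    for _ in range(max_rounds):
        if not issues:
            break
        remaining = []
        for issue in issues:
            if repair_deterministic(issue):
                continue
            if gpu_escalation and repair_gpu(issue):
                continue
            remaining.append(issue)
        if remaining == issues:  # no progress: stop instead of looping forever
            break
        issues = remaining
    return issues  # whatever is still unresolved after the bounded loop
```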

How to use this Space

1. Pick a pipeline mode.

  • Docling + PyMuPDF — Default. Runs both parsers so the parser-disagreement metric has a comparison surface. Good for general-purpose parsing.
  • Default lightweight — Text + PyMuPDF only. Fastest. Use when you just need clean text extraction.
  • Live GPU repair — Enables repair.execute_gpu_escalations=true. Verification failures (invalid tables, OCR coverage gaps, reading-order issues, missing figure captions) are dispatched to Qwen2.5-VL-3B on the GPU. Slower; requires the GPU path to actually be hit (deterministic repair handles markdown tables before this fires).

2. Upload one or more documents. Accepts .pdf, .md, .txt, .html, or a .zip of any of those. Multi-file selection works. Per-file cap: 50 MB / 200 pages. Batch cap: 20 docs per request.

3. Click Parse. Watch the progress bar; first call may take longer if a model has to download.
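The documented caps can be pre-checked client-side before uploading. This is a hypothetical helper, not part of the Space's API; the page-count cap is omitted because checking it would require a PDF library:

```python
import os

# Mirrors the documented caps: 50 MB per file, 20 docs per batch.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024
MAX_BATCH_DOCS = 20
ALLOWED_EXTS = {".pdf", ".md", ".txt", ".html", ".zip"}

def precheck_batch(paths: list[str]) -> list[str]:
    """Return a list of human-readable violations; empty means OK."""
    errors = []
    if len(paths) > MAX_BATCH_DOCS:
        errors.append(f"batch has {len(paths)} docs; cap is {MAX_BATCH_DOCS}")
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in ALLOWED_EXTS:
            errors.append(f"{path}: unsupported type {ext!r}")
        elif os.path.getsize(path) > MAX_UPLOAD_BYTES:
            errors.append(f"{path}: exceeds 50 MB cap")
    return errors
```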

What each tab shows

  • Markdown — canonical reconstruction of the parsed document. For batch uploads, this shows the first document; the full set is in the artifacts zip.
  • Run — summary, quality report, parser metrics, and artifact manifest validation. For batch uploads, Summary.batch lists every document parsed in the request with its headline metrics + an aggregate block.
  • Chunks — per-strategy chunk breakdown: total / parent / child / table-linked / figure-linked / visual-context counts, plus per-strategy blocks with token count distribution (min/median/max) and 3 sample chunks per strategy with 240-char previews.
  • Artifacts — each top-level artifact (parsed_document.json, chunks.jsonl, quality_report.json, etc.) downloadable individually. Nested asset crops (page renders, table images) stay bundled in the zip above.
  • Runtime — detected GPU runtime, planned GPU tasks, preflight report.
  • Smokes — runs the project's smoke validation suite in-Space; reports per-smoke pass/fail/skip + detail. API: /gradio_api/call/run_smokes_in_space.
  • Benchmark — two modes: against committed regression fixtures, OR against an uploaded corpus you supply. Returns headline metrics (quality score, retrieval recall, repair resolution rate, etc.) plus a per-doc breakdown. API: /gradio_api/call/run_benchmark_in_space and /gradio_api/call/run_benchmark_on_upload.
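The per-strategy token distribution shown in the Chunks tab can be recomputed from chunks.jsonl. The sketch below assumes each row carries "strategy" and "token_count" fields; the real field names may differ:

```python
import json
from statistics import median

def token_stats(jsonl_path: str) -> dict:
    """min/median/max token counts per chunking strategy (field names assumed)."""
    by_strategy: dict[str, list[int]] = {}
    with open(jsonl_path) as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            by_strategy.setdefault(rec["strategy"], []).append(rec["token_count"])
    return {
        strategy: {"min": min(counts), "median": median(counts), "max": max(counts)}
        for strategy, counts in by_strategy.items()
    }
```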

API surface

Every button is also a Gradio API endpoint, so AI agents and downstream tooling can invoke them programmatically. Discovery: agents.md at the Space root returns the calling instructions; /gradio_api/info returns the full schema.

# Parse a doc:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/parse_uploaded_document \
  -H "Content-Type: application/json" \
  -d '{"data": [{file_data}, "Default lightweight"]}'

# Run smokes:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/run_smokes_in_space \
  -H "Content-Type: application/json" -d '{"data": []}'

# Benchmark:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/run_benchmark_in_space \
  -H "Content-Type: application/json" -d '{"data": []}'
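Note that with Gradio's curl API these POSTs typically return only an event identifier; the actual output is streamed from a follow-up GET at the same path plus the event_id. A minimal helper to build that URL, assuming this Space exposes the standard Gradio curl protocol (confirm via /gradio_api/info):

```python
import json

BASE = "https://arjun10g-zeroshotgpu.hf.space/gradio_api/call"

def result_url(endpoint: str, post_response_body: str) -> str:
    """Build the follow-up GET URL from the POST response body.

    Gradio's two-step curl API: the POST returns {"event_id": ...} and the
    result is then streamed (server-sent events) from GET <BASE>/<endpoint>/<event_id>.
    """
    event_id = json.loads(post_response_body)["event_id"]
    return f"{BASE}/{endpoint}/{event_id}"
```

The result can then be fetched with `curl -N "<url>"`.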

Configuration

Defaults work out of the box. To change behavior, set Space variables:

  • ZSGDP_CONFIG_PATH — point at one of configs/default.yaml, configs/docling.yaml, configs/live_gpu_repair.yaml, or your own committed YAML.
  • ZSGDP_LOG_LEVEL — INFO (default on Spaces), DEBUG, WARNING, etc.
  • ZSGDP_LOG_JSON — 1 (default on Spaces) emits one-line JSON log records.
  • ZSGDP_MAX_UPLOAD_BYTES / ZSGDP_MAX_PAGE_COUNT / ZSGDP_MAX_BATCH_DOCS — abuse guards.
  • HF_TOKEN — required for gated models (jina-embeddings-v3 may need it).
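The abuse-guard variables would plausibly be read like this; the fallback values shown mirror the documented caps (50 MB / 200 pages / 20 docs), but the actual in-code defaults and parsing are assumptions:

```python
import os

def load_limits(env=os.environ) -> dict:
    """Read the abuse-guard Space variables with documented-cap fallbacks (assumed)."""
    return {
        "max_upload_bytes": int(env.get("ZSGDP_MAX_UPLOAD_BYTES", 50 * 1024 * 1024)),
        "max_page_count": int(env.get("ZSGDP_MAX_PAGE_COUNT", 200)),
        "max_batch_docs": int(env.get("ZSGDP_MAX_BATCH_DOCS", 20)),
    }
```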

Known limits

  • ZeroGPU duration cap. Each @spaces.GPU-decorated call runs in a 60s GPU slot. First-call cold-start for big models (Qwen2.5-VL-3B is ~6 GB) exceeds this on a clean cache. Subsequent calls reuse the cached weights and fit comfortably.
  • Live GPU repair only fires when the deterministic repair path can't resolve an issue. For markdown tables, the deterministic normalizer handles most malformations before GPU dispatch is needed.
  • GT-comparison metrics (layout F1, table structure score, formula CER) require labelled datasets (omnidocbench, doclaynet). Uploaded custom corpora produce all GT-free metrics; those three are omitted.
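For reference, CER (character error rate) is conventionally the Levenshtein distance between prediction and ground truth, normalized by ground-truth length. Whether the project computes it exactly this way is an assumption; the standard definition is:

```python
def cer(prediction: str, ground_truth: str) -> float:
    """Levenshtein(prediction, ground_truth) / len(ground_truth)."""
    prev = list(range(len(ground_truth) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, g in enumerate(ground_truth, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (p != g),  # substitution
            ))
        prev = curr
    return prev[-1] / max(len(ground_truth), 1)
```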

Source

The full project source — including the multi-step spec, contributor docs, and 250+ unit tests — is at the link above. The Files tab on the Space page shows the live deploy.