# zeroshotGPU
Self-hosted agentic document parser. Upload a single document, multiple documents, or a .zip of documents (PDF / Markdown / plaintext / HTML). Each parse emits canonical markdown, structured JSON, retrieval-ready chunks (multi-strategy), a quality report with GT-comparison metrics where applicable, and a SHA-256-checksummed artifact manifest. Per-file caps: 50 MB / 200 pages. Batch cap: 20 docs per request. See the [Help] tab for full instructions.
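The SHA-256-checksummed manifest mentioned above can be verified client-side. A minimal sketch, assuming the manifest maps artifact filenames to hex digests (an assumption about the layout, not the Space's documented schema):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large artifacts never load fully into RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()

def verify_manifest(root: Path, manifest: dict) -> list[str]:
    """Return names of artifacts whose on-disk digest mismatches the manifest.

    `manifest` is assumed to map artifact filename -> expected hex digest.
    """
    return [
        name for name, expected in manifest.items()
        if sha256_of(root / name) != expected
    ]
```

An empty return value means every listed artifact matched its recorded digest.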
## What this is
zeroshotGPU is an agentic document-parsing control plane. It does not rely on a single extraction engine — it profiles each document, routes pages to the best parser expert (Docling, PyMuPDF, optionally Marker / MinerU / olmOCR / PaddleOCR / Unstructured), normalizes outputs into a canonical schema, verifies quality, repairs weak regions through a bounded verify/repair loop (with optional GPU escalation), and emits retrieval-ready chunks with provenance.
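The bounded verify/repair loop described above can be sketched abstractly. Every name below (the `Issue` shape, the repairer callables, the three-pass budget) is illustrative, not the project's actual interface:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Issue:
    kind: str        # e.g. "malformed_table", "ocr_gap"
    region: str      # page/region identifier

def repair_loop(
    doc: dict,
    verify: Callable[[dict], list[Issue]],
    deterministic_fix: Callable[[dict, Issue], bool],
    gpu_fix: Callable[[dict, Issue], bool],
    max_passes: int = 3,
    gpu_escalation: bool = False,
) -> list[Issue]:
    """Verify, repair what we can, re-verify; stop when the pass budget runs out.

    Deterministic repair is tried first; GPU escalation only fires when it is
    enabled and the cheap path failed. Returns the issues still open.
    """
    issues = verify(doc)
    for _ in range(max_passes):
        if not issues:
            break
        for issue in issues:
            if deterministic_fix(doc, issue):
                continue
            if gpu_escalation:
                gpu_fix(doc, issue)
        issues = verify(doc)
    return issues
```

With escalation disabled, issues the deterministic path cannot fix simply survive to the final report, which matches the behavior described for the non-GPU modes.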
## How to use this Space
1. Pick a pipeline mode.
| Mode | What it does |
|---|---|
| Docling + PyMuPDF | Default. Runs both parsers so the parser-disagreement metric has a comparison surface. Good for general-purpose parsing. |
| Default lightweight | Text + PyMuPDF only. Fastest. Use when you just need clean text extraction. |
| Live GPU repair | Enables `repair.execute_gpu_escalations=true`. Verification failures (invalid tables, OCR coverage gaps, reading-order issues, missing figure captions) are dispatched to Qwen2.5-VL-3B on the GPU. Slower; requires the GPU path to actually be hit (deterministic repair handles markdown tables before this fires). |
2. Upload one or more documents. Accepts `.pdf`, `.md`, `.txt`, `.html`, or a `.zip` of any of those. Multi-file selection works. Per-file cap: 50 MB / 200 pages. Batch cap: 20 docs per request.
3. Click Parse. Watch the progress bar; first call may take longer if a model has to download.
## What each tab shows
- Markdown — canonical reconstruction of the parsed document. For batch uploads, this shows the first document; the full set is in the artifacts zip.
- Run — summary, quality report, parser metrics, and artifact manifest validation. For batch uploads, `Summary.batch` lists every document parsed in the request with its headline metrics plus an aggregate block.
- Chunks — per-strategy chunk breakdown: total / parent / child / table-linked / figure-linked / visual-context counts, plus per-strategy blocks with token-count distribution (min/median/max) and 3 sample chunks per strategy with 240-char previews.
- Artifacts — each top-level artifact (`parsed_document.json`, `chunks.jsonl`, `quality_report.json`, etc.) downloadable individually. Nested asset crops (page renders, table images) stay bundled in the zip above.
- Runtime — detected GPU runtime, planned GPU tasks, preflight report.
- Smokes — runs the project's smoke validation suite in-Space; reports per-smoke pass/fail/skip plus detail. API: `/gradio_api/call/run_smokes_in_space`.
- Benchmark — two modes: against committed regression fixtures, or against an uploaded corpus you supply. Returns headline metrics (quality score, retrieval recall, repair resolution rate, etc.) plus a per-doc breakdown. API: `/gradio_api/call/run_benchmark_in_space` and `/gradio_api/call/run_benchmark_on_upload`.
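The per-strategy figures in the Chunks tab are easy to recompute offline from a `chunks.jsonl`-style file. A sketch, assuming each JSONL record carries `strategy`, `token_count`, and `text` fields (an assumption about the chunk schema, not a documented contract):

```python
import json
from collections import defaultdict
from statistics import median

def chunk_stats(jsonl_lines: list[str]) -> dict:
    """Group chunks by strategy; summarise token counts and 240-char previews."""
    by_strategy: dict[str, list[dict]] = defaultdict(list)
    for line in jsonl_lines:
        chunk = json.loads(line)
        by_strategy[chunk["strategy"]].append(chunk)
    return {
        name: {
            "count": len(chunks),
            "tokens_min": min(c["token_count"] for c in chunks),
            "tokens_median": median(c["token_count"] for c in chunks),
            "tokens_max": max(c["token_count"] for c in chunks),
            "previews": [c["text"][:240] for c in chunks[:3]],
        }
        for name, chunks in by_strategy.items()
    }
```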
## API surface
Every button is also a Gradio API endpoint, so AI agents and downstream tooling can invoke them programmatically. Discovery: `agents.md` at the Space root returns the calling instructions; `/gradio_api/info` returns the full schema.
```bash
# Parse a doc:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/parse_uploaded_document \
  -H "Content-Type: application/json" \
  -d '{"data": [{file_data}, "Default lightweight"]}'

# Run smokes:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/run_smokes_in_space \
  -H "Content-Type: application/json" -d '{"data": []}'

# Benchmark:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/run_benchmark_in_space \
  -H "Content-Type: application/json" -d '{"data": []}'
```
## Configuration
Defaults work out of the box. To change behavior, set Space variables:
- `ZSGDP_CONFIG_PATH` — point at one of `configs/default.yaml`, `configs/docling.yaml`, `configs/live_gpu_repair.yaml`, or your own committed YAML.
- `ZSGDP_LOG_LEVEL` — `INFO` (default on Spaces), `DEBUG`, `WARNING`, etc.
- `ZSGDP_LOG_JSON` — `1` (default on Spaces) for one-line JSON log records.
- `ZSGDP_MAX_UPLOAD_BYTES` / `ZSGDP_MAX_PAGE_COUNT` / `ZSGDP_MAX_BATCH_DOCS` — abuse guards.
- `HF_TOKEN` — required for gated models (jina-embeddings-v3 may need it).
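A sketch of how a loader might consume these variables, using the documented caps as defaults (illustrative only; the project's actual config loader lives in the source):

```python
import os

def load_limits(env=os.environ) -> dict:
    """Read the abuse-guard and logging variables with the documented defaults."""
    return {
        "max_upload_bytes": int(env.get("ZSGDP_MAX_UPLOAD_BYTES", 50 * 1024 * 1024)),
        "max_page_count": int(env.get("ZSGDP_MAX_PAGE_COUNT", 200)),
        "max_batch_docs": int(env.get("ZSGDP_MAX_BATCH_DOCS", 20)),
        "log_level": env.get("ZSGDP_LOG_LEVEL", "INFO"),
        "log_json": env.get("ZSGDP_LOG_JSON", "1") == "1",
    }
```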
## Known limits
- ZeroGPU duration cap. Each `@spaces.GPU`-decorated call runs in a 60s GPU slot. First-call cold start for big models (Qwen2.5-VL-3B is ~6 GB) exceeds this on a clean cache. Subsequent calls reuse the cached weights and fit comfortably.
- Live GPU repair only fires when the deterministic repair path can't resolve an issue. For markdown tables, the deterministic normalizer handles most malformations before GPU dispatch is needed.
- GT-comparison metrics (layout F1, table structure score, formula CER) require labelled datasets (`omnidocbench`, `doclaynet`). Uploaded custom corpora produce all the GT-free metrics but those three.
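For reference, formula CER is the character error rate: Levenshtein distance between the predicted string and the ground truth, divided by ground-truth length. A minimal sketch (not the project's implementation):

```python
def cer(pred: str, truth: str) -> float:
    """Character error rate = Levenshtein(pred, truth) / len(truth)."""
    if not truth:
        return 0.0 if not pred else 1.0
    prev = list(range(len(truth) + 1))  # edit distances for 0 chars of pred
    for i, p in enumerate(pred, 1):
        cur = [i]
        for j, t in enumerate(truth, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (p != t),  # substitution
            ))
        prev = cur
    return prev[-1] / len(truth)
```

A score of 0.0 means an exact character-level match; scores above 1.0 are possible when the prediction is much longer than the ground truth.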
## Source
The full project source — including the multi-step spec, contributor docs,
and 250+ unit tests — is at the link above. The Files tab on the Space
page shows the live deploy.