zeroshotGPU

Self-hosted agentic document parser. Upload a single document, multiple documents, or a .zip of documents (PDF / Markdown / plaintext / HTML). Each parse emits canonical markdown, structured JSON, retrieval-ready chunks (multi-strategy), a quality report with GT-comparison metrics where applicable, and a SHA-256-checksummed artifact manifest. Per-file caps: 50 MB / 200 pages. Batch cap: 20 docs per request. See the [Help] tab for full instructions.
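The checksummed manifest can be verified client-side. The sketch below assumes a JSON manifest mapping relative file paths to hex SHA-256 digests — the real artifact schema may differ, so treat the format and filename as assumptions:

```python
import hashlib
import json
from pathlib import Path

def verify_manifest(artifact_dir: str, manifest_name: str = "manifest.json") -> list[str]:
    """Return the files whose SHA-256 digest does not match the manifest.

    Assumes the manifest is a JSON object mapping relative paths to
    hex-encoded SHA-256 digests (an assumption, not the documented schema).
    """
    root = Path(artifact_dir)
    manifest = json.loads((root / manifest_name).read_text())
    mismatches = []
    for rel_path, expected in manifest.items():
        digest = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            mismatches.append(rel_path)
    return mismatches
```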

Source on Hugging Face

Pipeline
Docling + PyMuPDF (the default) runs both parsers to produce the parser-disagreement signal. Default lightweight runs text extraction + PyMuPDF only. Live GPU repair enables repair.execute_gpu_escalations=true and dispatches malformed-table, OCR, figure, and reading-order issues to Qwen2.5-VL.
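The GPU-repair flag is given above as a dotted config path; if the YAML mirrors that nesting (an assumption — check configs/live_gpu_repair.yaml for the actual layout), the override would look like:

```yaml
# Hypothetical sketch — verify the key layout against configs/live_gpu_repair.yaml.
repair:
  execute_gpu_escalations: true
```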

What this is

zeroshotGPU is an agentic document-parsing control plane. It does not rely on a single extraction engine — it profiles each document, routes pages to the best parser expert (Docling, PyMuPDF, optionally Marker / MinerU / olmOCR / PaddleOCR / Unstructured), normalizes outputs into a canonical schema, verifies quality, repairs weak regions through a bounded verify/repair loop (with optional GPU escalation), and emits retrieval-ready chunks with provenance.
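The bounded verify/repair loop described above can be sketched roughly as follows. The function names, issue representation, and escalation rules are illustrative assumptions, not the project's actual API:

```python
def repair_deterministic(issue: str) -> bool:
    # Stand-in for e.g. the markdown-table normalizer.
    return issue.startswith("table")

def repair_gpu(issue: str) -> bool:
    # Stand-in for dispatching the weak region to a vision-language model.
    return True

def repair_loop(issues: list[str], gpu_escalation: bool = False,
                max_rounds: int = 3) -> list[str]:
    """Bounded verify/repair: deterministic fix first, GPU only as fallback."""
    for _ in range(max_rounds):
        if not issues:
            break
        remaining = []
        for issue in issues:
            if repair_deterministic(issue):
                continue
            if gpu_escalation and repair_gpu(issue):
                continue
            remaining.append(issue)
        if remaining == issues:  # no progress: stop instead of looping forever
            break
        issues = remaining
    return issues  # whatever is still unresolved after the bounded loop
```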

How to use this Space

1. Pick a pipeline mode.

  • Docling + PyMuPDF — Default. Runs both parsers so the parser-disagreement metric has a comparison surface. Good for general-purpose parsing.
  • Default lightweight — Text + PyMuPDF only. Fastest. Use when you just need clean text extraction.
  • Live GPU repair — Enables repair.execute_gpu_escalations=true. Verification failures (invalid tables, OCR coverage gaps, reading-order issues, missing figure captions) are dispatched to Qwen2.5-VL-3B on the GPU. Slower; requires the GPU path to actually be hit (deterministic repair handles markdown tables before this fires).

2. Upload one or more documents. Accepts .pdf, .md, .txt, .html, or a .zip of any of those. Multi-file selection works. Per-file cap: 50 MB / 200 pages. Batch cap: 20 docs per request.

3. Click Parse. Watch the progress bar; first call may take longer if a model has to download.
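The documented caps can be pre-checked client-side before uploading. This is a hypothetical helper, not part of the Space's API; the page-count cap is omitted because checking it would require a PDF library:

```python
import os

# Mirrors the documented caps: 50 MB per file, 20 docs per batch.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024
MAX_BATCH_DOCS = 20
ALLOWED_EXTS = {".pdf", ".md", ".txt", ".html", ".zip"}

def precheck_batch(paths: list[str]) -> list[str]:
    """Return a list of human-readable violations; empty means OK."""
    errors = []
    if len(paths) > MAX_BATCH_DOCS:
        errors.append(f"batch has {len(paths)} docs; cap is {MAX_BATCH_DOCS}")
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in ALLOWED_EXTS:
            errors.append(f"{path}: unsupported type {ext!r}")
        elif os.path.getsize(path) > MAX_UPLOAD_BYTES:
            errors.append(f"{path}: exceeds 50 MB cap")
    return errors
```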

What each tab shows

  • Markdown — canonical reconstruction of the parsed document. For batch uploads, this shows the first document; the full set is in the artifacts zip.
  • Run — summary, quality report, parser metrics, and artifact manifest validation. For batch uploads, Summary.batch lists every document parsed in the request with its headline metrics + an aggregate block.
  • Chunks — per-strategy chunk breakdown: total / parent / child / table-linked / figure-linked / visual-context counts, plus per-strategy blocks with token count distribution (min/median/max) and 3 sample chunks per strategy with 240-char previews.
  • Artifacts — each top-level artifact (parsed_document.json, chunks.jsonl, quality_report.json, etc.) downloadable individually. Nested asset crops (page renders, table images) stay bundled in the zip above.
  • Runtime — detected GPU runtime, planned GPU tasks, preflight report.
  • Smokes — runs the project's smoke validation suite in-Space; reports per-smoke pass/fail/skip + detail. API: /gradio_api/call/run_smokes_in_space.
  • Benchmark — two modes: against committed regression fixtures, OR against an uploaded corpus you supply. Returns headline metrics (quality score, retrieval recall, repair resolution rate, etc.) plus a per-doc breakdown. API: /gradio_api/call/run_benchmark_in_space and /gradio_api/call/run_benchmark_on_upload.
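The per-strategy token distribution shown in the Chunks tab can be recomputed from chunks.jsonl. The sketch below assumes each row carries "strategy" and "token_count" fields; the real field names may differ:

```python
import json
from statistics import median

def token_stats(jsonl_path: str) -> dict:
    """min/median/max token counts per chunking strategy (field names assumed)."""
    by_strategy: dict[str, list[int]] = {}
    with open(jsonl_path) as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            by_strategy.setdefault(rec["strategy"], []).append(rec["token_count"])
    return {
        strategy: {"min": min(counts), "median": median(counts), "max": max(counts)}
        for strategy, counts in by_strategy.items()
    }
```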

API surface

Every button is also a Gradio API endpoint, so AI agents and downstream tooling can invoke them programmatically. Discovery: agents.md at the Space root returns the calling instructions; /gradio_api/info returns the full schema.

# Parse a doc:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/parse_uploaded_document \
  -H "Content-Type: application/json" \
  -d '{"data": [{file_data}, "Default lightweight"]}'

# Run smokes:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/run_smokes_in_space \
  -H "Content-Type: application/json" -d '{"data": []}'

# Benchmark:
curl -X POST https://arjun10g-zeroshotgpu.hf.space/gradio_api/call/run_benchmark_in_space \
  -H "Content-Type: application/json" -d '{"data": []}'
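Note that with Gradio's curl API these POSTs typically return only an event identifier; the actual output is streamed from a follow-up GET at the same path plus the event_id. A minimal helper to build that URL, assuming this Space exposes the standard Gradio curl protocol (confirm via /gradio_api/info):

```python
import json

BASE = "https://arjun10g-zeroshotgpu.hf.space/gradio_api/call"

def result_url(endpoint: str, post_response_body: str) -> str:
    """Build the follow-up GET URL from the POST response body.

    Gradio's two-step curl API: the POST returns {"event_id": ...} and the
    result is then streamed (server-sent events) from GET <BASE>/<endpoint>/<event_id>.
    """
    event_id = json.loads(post_response_body)["event_id"]
    return f"{BASE}/{endpoint}/{event_id}"
```

The result can then be fetched with `curl -N "<url>"`.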

Configuration

Defaults work out of the box. To change behavior, set Space variables:

  • ZSGDP_CONFIG_PATH — point at one of configs/default.yaml, configs/docling.yaml, configs/live_gpu_repair.yaml, or your own committed YAML.
  • ZSGDP_LOG_LEVEL — INFO (default on Spaces), DEBUG, WARNING, etc.
  • ZSGDP_LOG_JSON — 1 (default on Spaces) emits one-line JSON log records.
  • ZSGDP_MAX_UPLOAD_BYTES / ZSGDP_MAX_PAGE_COUNT / ZSGDP_MAX_BATCH_DOCS — abuse guards.
  • HF_TOKEN — required for gated models (jina-embeddings-v3 may need it).
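The abuse-guard variables would plausibly be read like this; the fallback values shown mirror the documented caps (50 MB / 200 pages / 20 docs), but the actual in-code defaults and parsing are assumptions:

```python
import os

def load_limits(env=os.environ) -> dict:
    """Read the abuse-guard Space variables with documented-cap fallbacks (assumed)."""
    return {
        "max_upload_bytes": int(env.get("ZSGDP_MAX_UPLOAD_BYTES", 50 * 1024 * 1024)),
        "max_page_count": int(env.get("ZSGDP_MAX_PAGE_COUNT", 200)),
        "max_batch_docs": int(env.get("ZSGDP_MAX_BATCH_DOCS", 20)),
    }
```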

Known limits

  • ZeroGPU duration cap. Each @spaces.GPU-decorated call runs in a 60s GPU slot. First-call cold-start for big models (Qwen2.5-VL-3B is ~6 GB) exceeds this on a clean cache. Subsequent calls reuse the cached weights and fit comfortably.
  • Live GPU repair only fires when the deterministic repair path can't resolve an issue. For markdown tables, the deterministic normalizer handles most malformations before GPU dispatch is needed.
  • GT-comparison metrics (layout F1, table structure score, formula CER) require labelled datasets (omnidocbench, doclaynet). Uploaded custom corpora produce all GT-free metrics; those three are omitted.
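For reference, CER (character error rate) is conventionally the Levenshtein distance between prediction and ground truth, normalized by ground-truth length. Whether the project computes it exactly this way is an assumption; the standard definition is:

```python
def cer(prediction: str, ground_truth: str) -> float:
    """Levenshtein(prediction, ground_truth) / len(ground_truth)."""
    prev = list(range(len(ground_truth) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, g in enumerate(ground_truth, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (p != g),  # substitution
            ))
        prev = curr
    return prev[-1] / max(len(ground_truth), 1)
```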

Source

The full project source — including the multi-step spec, contributor docs, and 250+ unit tests — is at the link above. The Files tab on the Space page shows the live deploy.