[ From the team behind LlamaParse ]
Parse Any Document.
Locally. Fast.
Open-source document parsing from the team behind LlamaParse. Parsed text from PDFs, Office docs, and images. No cloud, no LLM tokens, no limits.
npm i -g @llamaindex/liteparse
lit parse anything.pdf Fully open-source
Fast local processing
All major formats
Bounding box output
How it works
How LiteParse works?
01
Input
Drop in any document: PDF, DOCX, PPTX, XLSX, or image. LiteParse auto-detects the format and selects the right parsing strategy.
02
Text Parsing + OCR
A hybrid approach: structure embedded text from files, fall back to traditional OCR for scanned regions. Both run locally, no API calls, no data leaving your machine.
03
AI Ready Output
Get clean JSON with every text element tagged by position, bounding boxes included. Ready for AI agents, citations, or downstream tasks.
One tool, every format
Stop juggling different parsers
One command handles PDFs, Office documents, images, and more. Same interface, same structured output, every time.
Precise spatial output
Know exactly where every element lives
Every parsed element comes with precise bounding box coordinates. Titles, paragraphs, tables, figures are all tagged with their exact position on the page.
-
Citations — Point users to exact locations in source documents.
-
Multimodal pipelines — Pair extracted text with visual screenshots for richer LLM inputs.
Built for agents
Runs anywhere. Integrates with any workflow.
A fast CLI and Python package designed for automation. No API keys, no cloud dependency. Parse documents in CI/CD pipelines, agent workflows, or local scripts.
-
Pipe output directly to LLMs or vector stores
-
Batch-process entire directories in seconds
-
JSON output for programmatic workflows
-
Zero configuration, just install and run
Comparison
LiteParse vs LlamaParse
Features | LiteParse | LlamaParse |
|---|---|---|
| Spatial Text Output |
|
|
| Text bounding boxes |
|
|
| Screenshot Image Capture |
|
|
| Local-Only |
|
– |
| Markdown Output |
– |
|
| Figure/Chart Understanding |
– |
|
| Scalable | Scales to number of computer cores | Cloud scaling |
| Embedded Image Extraction |
– |
|
| Image Captioning |
– |
|
| Layout Detection |
– |
|
| SOTA OCR for Scanned Docs |
– |
|
Dive Deeper
Resources