LiteParse

[ From the team behind LlamaParse ]

Parse Any Document.
Locally. Fast.

Open-source document parsing from the team behind LlamaParse. Extract text from PDFs, Office docs, and images. No cloud, no LLM tokens, no limits.

npm i -g @llamaindex/liteparse
lit parse anything.pdf

Fully open-source

Fast local processing

All major formats

Bounding box output

How it works

How LiteParse works

01

Input

Drop in any document: PDF, DOCX, PPTX, XLSX, or image. LiteParse auto-detects the format and selects the right parsing strategy.
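As an illustration of what format auto-detection can look like, here is a minimal Python sketch that inspects a file's leading bytes. It is not LiteParse's actual detection logic: the `detect_format` function and the strategy labels it returns are hypothetical, and only the file signatures themselves are standard.

```python
from pathlib import Path

# Well-known file signatures ("magic bytes"). The labels on the right are
# hypothetical strategy names, not LiteParse identifiers.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "office-zip"),    # DOCX, PPTX, and XLSX are ZIP containers
    (b"\x89PNG\r\n\x1a\n", "image"),  # PNG
    (b"\xff\xd8\xff", "image"),       # JPEG
]

OFFICE_EXTENSIONS = {".docx", ".pptx", ".xlsx"}


def detect_format(name: str, head: bytes) -> str:
    """Guess a parsing strategy from a file name and its first bytes."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            if kind == "office-zip":
                # A ZIP signature alone is ambiguous, so consult the extension.
                ext = Path(name).suffix.lower()
                return ext.lstrip(".") if ext in OFFICE_EXTENSIONS else "unknown"
            return kind
    return "unknown"


print(detect_format("report.pdf", b"%PDF-1.7\n"))
print(detect_format("slides.pptx", b"PK\x03\x04\x14\x00"))
```

Checking the content first and the extension second means a renamed or mislabeled file is still routed by what it actually contains.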

02

Text Parsing + OCR

A hybrid approach: extract and structure the text embedded in files, and fall back to traditional OCR for scanned regions. Both run locally, with no API calls and no data leaving your machine.
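The fallback idea can be sketched in a few lines of Python. This is a simplified, page-level illustration rather than LiteParse's implementation: the `extract_page_text` function, the `MIN_CHARS` threshold, and the `fake_ocr` stand-in are all assumptions made for the example.

```python
from typing import Callable

MIN_CHARS = 20  # assumed threshold; shorter pages are treated as scanned


def extract_page_text(
    embedded_text: str,
    run_ocr: Callable[[], str],
    min_chars: int = MIN_CHARS,
) -> tuple[str, str]:
    """Return (text, source), preferring embedded text over OCR."""
    cleaned = embedded_text.strip()
    if len(cleaned) >= min_chars:
        return cleaned, "embedded"
    # Little or no embedded text: assume a scanned page and run local OCR.
    return run_ocr().strip(), "ocr"


def fake_ocr() -> str:
    """Hypothetical stand-in for a local OCR engine."""
    return "Text recognized from the scanned page image."


print(extract_page_text("Quarterly revenue grew 12% year over year.", fake_ocr))
print(extract_page_text("", fake_ocr))
```

Passing OCR as a callable keeps the expensive step lazy, so it only runs for pages that actually need it.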

03

AI-Ready Output

Get clean JSON with every text element tagged by position, bounding boxes included. Ready for AI agents, citations, or downstream tasks.
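To show how position-tagged JSON might be consumed downstream, the Python sketch below sorts elements into reading order using their bounding boxes. The JSON shape is an assumption for illustration only: the `pages`, `elements`, `text`, and `bbox` field names, and the `[x0, y0, x1, y1]` top-left-origin convention, are hypothetical and may differ from LiteParse's real output.

```python
import json

# Hypothetical output shape; the field names are assumptions for illustration.
SAMPLE = """
{"pages": [{"page": 1, "elements": [
  {"text": "Body paragraph.", "bbox": [72, 140, 540, 180]},
  {"text": "Report Title", "bbox": [72, 60, 300, 90]}
]}]}
"""


def reading_order(doc: dict) -> list[str]:
    """Flatten elements into top-to-bottom, left-to-right text lines."""
    lines = []
    for page in sorted(doc["pages"], key=lambda p: p["page"]):
        # bbox is assumed to be [x0, y0, x1, y1] with the origin at top-left.
        elements = sorted(
            page["elements"], key=lambda e: (e["bbox"][1], e["bbox"][0])
        )
        lines.extend(e["text"] for e in elements)
    return lines


print(reading_order(json.loads(SAMPLE)))
```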

One tool, every format

Stop juggling different parsers

One command handles PDFs, Office documents, images, and more. Same interface, same structured output, every time.

Precise spatial output

Know exactly where every element lives

Every parsed element comes with precise bounding box coordinates. Titles, paragraphs, tables, and figures are all tagged with their exact position on the page.

Citations — Point users to exact locations in source documents.
Multimodal pipelines — Pair extracted text with visual screenshots for richer LLM inputs.
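The citation use case can be sketched as a lookup from a quoted phrase to the page and bounding box that contain it. This Python example assumes a flat list of elements with `page`, `bbox`, and `text` keys; those names, the `Citation` class, and the `cite` function are hypothetical and not part of a documented schema.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    page: int
    bbox: tuple[float, float, float, float]
    text: str


def cite(elements: list[dict], quote: str) -> list[Citation]:
    """Find every element whose text contains the quote, case-insensitively."""
    needle = quote.casefold()
    return [
        Citation(e["page"], tuple(e["bbox"]), e["text"])
        for e in elements
        if needle in e["text"].casefold()
    ]


# Hypothetical parsed elements; the keys are assumptions for illustration.
ELEMENTS = [
    {"page": 1, "bbox": [72, 60, 300, 90], "text": "Annual Report"},
    {"page": 3, "bbox": [72, 410, 540, 450], "text": "Revenue grew 12% in 2024."},
]

for c in cite(ELEMENTS, "revenue grew"):
    print(f"page {c.page}, box {c.bbox}: {c.text}")
```

A viewer could then highlight the returned box on the matching page so that a reader can verify the quoted claim against the source.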

Built for agents

Runs anywhere. Integrates with any workflow.

A fast CLI and Python package designed for automation. No API keys, no cloud dependency. Parse documents in CI/CD pipelines, agent workflows, or local scripts.

Pipe output directly to LLMs or vector stores
Batch-process entire directories in seconds
JSON output for programmatic workflows
Zero configuration, just install and run
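For scripted use, one option is a thin Python wrapper around the CLI. The sketch below relies only on the `lit parse <file>` invocation shown earlier. The extension list, the helper names, and the assumption that parsed output is written to standard output are mine, so check the CLI's own help before relying on them.

```python
import subprocess
from pathlib import Path

# Assumed set of extensions to pick up; adjust to the formats you use.
SUPPORTED = {".pdf", ".docx", ".pptx", ".xlsx", ".png", ".jpg", ".jpeg"}


def build_commands(directory: Path) -> list[list[str]]:
    """Return one `lit parse <file>` command per supported file, sorted by path."""
    files = sorted(p for p in directory.iterdir() if p.suffix.lower() in SUPPORTED)
    return [["lit", "parse", str(p)] for p in files]


def run_batch(directory: Path) -> dict[str, str]:
    """Run each command and collect its stdout (requires `lit` on PATH)."""
    results = {}
    for cmd in build_commands(directory):
        # Assumes the parsed output is printed to standard output.
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results[cmd[-1]] = done.stdout
    return results


# Preview the commands for the current directory without running them.
for command in build_commands(Path(".")):
    print(" ".join(command))
```

Separating command construction from execution makes the batch easy to dry-run in CI before any document is parsed.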

Comparison

LiteParse vs LlamaParse

Features	LiteParse	LlamaParse
Spatial Text Output	✓	✓
Text Bounding Boxes	✓	✓
Screenshot Image Capture	✓	✓
Local-Only	✓	–
Markdown Output	–	✓
Figure/Chart Understanding	–	✓
Scalable	Scales with the number of CPU cores	Cloud scaling
Embedded Image Extraction	–	✓
Image Captioning	–	✓
Layout Detection	–	✓
SOTA OCR for Scanned Docs	–	✓

Get started in seconds

Parse your documents with a fast, open-source solution by the LlamaIndex team. No API keys required.
