Nov 25, 2025

Announcing LlamaSheets: Turn Messy Spreadsheets into AI-Ready Data (Beta)

By

Logan Markewich

Today, we are announcing the first of our dedicated APIs for handling spreadsheets, available now in beta for free!

Introducing LlamaSheets

Spreadsheets are everywhere. From financial models to product catalogs and operational reports, they come in a wide range of formats and levels of organization.

Unlike typical unstructured documents, spreadsheets contain highly structured numerical data, complex formatting, and visual hierarchies that traditional text parsing cannot capture. LLMs and agents need to understand not just the raw cell values, but the semantic relationships, formatting patterns, and hierarchical structure encoded in these documents.

The challenge is that "messy" spreadsheets often use visual formatting (bold headers, colored cells, merged regions) to convey meaning rather than explicit data structures. Before any AI automation can happen, a normalization step is critical: extracting structured data while preserving semantic context. That is why we’re so excited to announce the newest addition to LlamaCloud, LlamaSheets!

LlamaSheets is a new LlamaCloud API that automatically structures complex spreadsheets into AI-ready data using semantic understanding. The input is any .xlsx file, and the output is a set of Parquet files that can be used in any agent or downstream application.

Technical Approach

Our processing algorithm is a four-stage pipeline:

Feature Extraction & Clustering - 40+ features per cell (position, formatting, etc.) are extracted and used to cluster cells
Intelligent Region Classification - Clusters are then classified into specific types of regions using a combination of traditional ML techniques and agent-based processing
Adaptive Table Segmentation - A scoring system evaluates boundary quality between regions and iteratively refines boundaries
Hierarchical Structure Preservation - Extraction within each table preserves multi-level headers and complex table structures, and preserves types where possible (dates, numbers, booleans, text)
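
To make the idea of grouping cells into regions more concrete, here is a deliberately simplified sketch. It is not the LlamaSheets algorithm: it swaps the feature-based clustering described above for a plain flood fill over non-empty cells, and the feature names in `cell_features` are hypothetical. It only illustrates how per-cell information can be turned into candidate table boundaries.

```python
from collections import deque

# A toy sheet: None marks an empty cell. Two small tables are separated
# by a blank row and blank columns.
GRID = [
    ["Region", "Q1", None, None],
    ["East", 10, None, None],
    [None, None, None, None],
    [None, None, "Item", "Cost"],
    [None, None, "Rent", 1200],
]


def cell_features(value, row, col):
    """Return a few illustrative per-cell features (hypothetical names)."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return {
        "row": row,
        "col": col,
        "is_empty": value is None,
        "is_numeric": is_number,
        "is_text": isinstance(value, str),
    }


def find_regions(grid):
    """Group non-empty cells into 4-connected regions.

    Returns one bounding box per region as
    (first_row, first_col, last_row, last_col), in scan order.
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    seen = set()
    regions = []
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] is None or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from this cell.
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if inside and grid[nr][nc] is not None and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            rows = [p[0] for p in cells]
            cols = [p[1] for p in cells]
            regions.append((min(rows), min(cols), max(rows), max(cols)))
    return regions


print(find_regions(GRID))
```

A real pipeline would cluster on many more signals than emptiness (formatting, data types, alignment), which is what lets it separate tables that touch each other or split a single block that contains both a title and a table.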

Output Contents and Format

LlamaSheets produces multiple types of outputs, mostly as Parquet files:

Table data: Clean, typed DataFrames with preserved data types (dates, numbers, strings, booleans). Column names are intelligently extracted from header rows.
Extra data: Data in your spreadsheet that doesn't explicitly belong in a structured data table (notes, titles, etc.)
Cell metadata: 40+ features per cell including formatting (font_bold, background_color_rgb), position (row_number, coordinate), data types (is_date_like, is_percentage), and layout (is_merged_cell, horizontal_alignment)
Sheet context: Optional LLM-generated titles and descriptions for each worksheet and extracted table region
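
As a rough illustration of how the cell metadata might be used, the sketch below finds rows in which every cell is bold, a common signal for header rows. It works on plain dictionaries (for example, what pandas returns from `DataFrame.to_dict("records")`) and uses only the `row_number` and `font_bold` columns named above. The assumption that `font_bold` is boolean-like and `row_number` is an integer is ours, and the sample records are hand-written rather than real LlamaSheets output.

```python
from collections import defaultdict


def fully_bold_rows(cell_records):
    """Return the sorted row numbers in which every cell is bold."""
    bold_by_row = defaultdict(list)
    for cell in cell_records:
        bold_by_row[cell["row_number"]].append(bool(cell["font_bold"]))
    return sorted(row for row, flags in bold_by_row.items() if all(flags))


# Hand-written sample records; real metadata has many more columns per cell.
sample = [
    {"row_number": 1, "font_bold": True},
    {"row_number": 1, "font_bold": True},
    {"row_number": 2, "font_bold": False},
    {"row_number": 3, "font_bold": True},
    {"row_number": 3, "font_bold": False},
]

print(fully_bold_rows(sample))
```

The same grouping pattern extends to other metadata columns, such as collecting rows that share a `background_color_rgb` value.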

Use Cases

LlamaSheets enables AI automation across diverse spreadsheet workflows. Here are just a few examples:

Financial Analysis: Extract quarterly revenue tables from complex financial reports with merged headers and calculate KPIs automatically
Multi-Region Data Consolidation: Parse and combine sales data from dozens of regional spreadsheets with inconsistent formatting
Budget Parsing with Metadata: Use background colors and bold formatting to identify department groupings and category hierarchies in budget files
Automated Weekly Reports: Build end-to-end pipelines that extract, validate, analyze, and generate reports from recurring spreadsheet uploads
Custom Agent Integrations: Load extracted Parquet files into AI Agent frameworks (like LlamaIndex) for interactive data exploration, script generation, and more
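
For the multi-region consolidation case, most of the downstream work is reconciling column names once the tables have been extracted. The sketch below shows one possible approach on plain row dictionaries. The `ALIASES` mapping, the file names, and the column names are all hypothetical and would need to be adapted to your own data.

```python
import re

# Hypothetical mapping from normalized header variants to canonical names.
ALIASES = {
    "rev": "revenue",
    "revenue_usd": "revenue",
    "sales_region": "region",
}


def normalize_header(name):
    """Lowercase, trim, and collapse non-alphanumeric runs to underscores."""
    key = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")
    return ALIASES.get(key, key)


def consolidate(tables):
    """Merge per-file lists of row dicts into one list with canonical columns.

    Each output row also records which source file it came from.
    """
    merged = []
    for source, rows in tables.items():
        for row in rows:
            clean = {normalize_header(k): v for k, v in row.items()}
            clean["source"] = source
            merged.append(clean)
    return merged


# Two regional tables whose headers disagree on naming and spacing.
tables = {
    "east.xlsx": [{"Sales Region": "East", "Revenue (USD)": 100}],
    "west.xlsx": [{"region": "West", " Rev ": 250}],
}

combined = consolidate(tables)
print(combined)
```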

Example: Extract and Analyze in 5 Lines

python

import asyncio

from llama_cloud_services.beta.sheets import LlamaSheets


async def main():
    client = LlamaSheets(api_key="llx-...")
    results = await client.aextract_regions("budget.xlsx")
    region = results.regions[0]

    # Download the first extracted region as a pandas DataFrame
    df = await client.adownload_region_as_dataframe(
        results.job_id,
        region.region_id,
        result_type=region.region_type,
    )

    # Download the rich cell metadata for the same region
    metadata = await client.adownload_region_as_dataframe(
        results.job_id,
        region.region_id,
        result_type="cell_metadata",
    )
    return df, metadata


# The client methods are coroutines, so run them inside an event loop
df, metadata = asyncio.run(main())

Available now in beta

LlamaSheets is available today in beta through multiple interfaces:

🧩 Playground UI: Experiment with sample spreadsheets directly in the browser at cloud.llamaindex.ai
💻 Python SDK: The llama-cloud-services package provides async/sync methods for uploading files, creating extraction jobs, polling for completion, and downloading Parquet results as pandas DataFrames or raw bytes
🌐 REST API: Four-step workflow via /api/v1/beta/sheets/ endpoints: (1) Upload file → (2) Create job with parsing config → (3) Poll for completion → (4) Download Parquet files via presigned URLs
📘 Build Agents: Integrate with any agent framework (LlamaIndex, Claude Code, Cursor, etc.) by loading extracted Parquet files and cell metadata into your agent's context
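
If you call the REST API directly, step (3) of the workflow above is a polling loop. The sketch below is a generic helper for that step, not part of the LlamaSheets SDK. It takes any callable that returns the current job status, so the HTTP request itself is left to you. The status strings ("PENDING", "SUCCESS", "ERROR") are placeholders we made up; check the API reference for the actual values.

```python
import time


def poll_until_done(
    get_status,
    interval_s=2.0,
    timeout_s=300.0,
    done_states=("SUCCESS",),  # placeholder status names, not confirmed
    failed_states=("ERROR",),  # placeholder status names, not confirmed
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call get_status() until the job finishes, fails, or times out."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"job failed with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Demo with a fake status source standing in for the real HTTP call.
fake_statuses = iter(["PENDING", "PENDING", "SUCCESS"])
final = poll_until_done(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print(final)
```

Injecting `sleep` and `clock` keeps the helper easy to test without real delays. In production you would pass a `get_status` function that queries the job endpoint and returns its status field.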

What's Next

During the beta period, we're focused on performance optimization, enhanced accuracy, additional output formats, and future APIs that build on region and table extraction to provide more end-to-end experiences.

We encourage users to try the API, provide feedback on extraction quality, and help us prioritize features for the full release!
