DocAI - Coming Soon

Turn complex documents into trusted structured data

Docimoto DocAI is an end-to-end information extraction pipeline for PDF, Excel, Word, and email. DocAI has built-in data provenance, confidence scoring, and human-in-the-loop review.
About Docimoto

Docimoto builds accuracy-focused document intelligence systems, combining vision models, open LLMs, domain ontologies, and domain knowledge bases to extract data you can audit and operationalize. We also offer consulting and custom software to take pilots to production.

Provenance & traceability

Confidence-driven review (HITL)

Domain rules & ontology hooks

Cloud-agnostic deployment

Open-source core

End-to-end process support

Open Source & Research

We build open-source document intelligence systems that bridge machine learning research and production-grade engineering.

Our work focuses on structured information extraction from Excel, PDF, Word, and image documents, particularly Excel, an under-explored yet business-critical format.

TableSense2

TableSense2 is a visual model based on Excel structure. Its 47 cell features include formats, formulas, merges, comments, etc.

Tags: Computer Vision, CNN, Transformers, PyTorch, Python, Excel

TableSense2SimpleAnnotationUI

A simple browser-based UI for creating annotations to train TableSense2 models for Excel table detection.

Tags: JavaScript, Python

DocAI

A broader end-to-end DocAI reference platform is under active development and will be released incrementally as open source.

Tags: Python, Node.js, React, Kubernetes, LLM, Postgres, pgvector, Vision Models

GitHub Link: Coming soon

Features

Built from hands‑on experience across numerous ML and AI information extraction projects

DocAI elevates your team’s productivity

Visually traceable data extraction

Every extracted value links back to the exact region in the original document, with layout and formatting preserved.
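
One way to picture this kind of traceability is a record that never travels without a pointer to its source region. The sketch below is illustrative only: the class names, field names, and the bounding-box convention are assumptions made for this example, not DocAI's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRegion:
    """Hypothetical pointer to where a value was found in the source document."""
    document_id: str
    page: int  # 1-based page number (assumed convention)
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass(frozen=True)
class ExtractedValue:
    """Hypothetical extracted field that always carries its provenance."""
    field_name: str
    value: str
    confidence: float  # model confidence in [0.0, 1.0]
    source: SourceRegion


def describe(item: ExtractedValue) -> str:
    """Render a human-readable trace from the value back to its source region."""
    x0, y0, x1, y1 = item.source.bbox
    return (
        f"{item.field_name}={item.value!r} "
        f"from {item.source.document_id} page {item.source.page} "
        f"at ({x0}, {y0})-({x1}, {y1})"
    )


# Example data is made up for illustration.
invoice_total = ExtractedValue(
    field_name="invoice_total",
    value="1,250.00",
    confidence=0.97,
    source=SourceRegion("invoice-001.pdf", 2, (72.0, 540.5, 210.0, 556.0)),
)
print(describe(invoice_total))
```

Keeping the region on the value itself, rather than in a separate log, means a reviewer or an audit tool can jump from any field straight to the place in the document it came from.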

Confidence levels & exceptions

Low-confidence extractions trigger a shared human-review workflow, giving all stakeholders visibility into document progress and resolution.
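
As a rough illustration of confidence-driven review, the sketch below splits extractions into an auto-accepted list and a human-review queue. The 0.85 threshold, the `Extraction` class, and the `route` function are assumptions for this example; a real deployment would likely tune thresholds per field and per domain.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # assumed cutoff, chosen only for this example


@dataclass
class Extraction:
    """Hypothetical extraction result with a model confidence score."""
    field_name: str
    value: str
    confidence: float


def route(extractions, threshold=REVIEW_THRESHOLD):
    """Split extractions into auto-accepted items and a human-review queue."""
    accepted, review_queue = [], []
    for item in extractions:
        if item.confidence >= threshold:
            accepted.append(item)
        else:
            review_queue.append(item)
    return accepted, review_queue


# Example data is made up for illustration.
results = [
    Extraction("vendor_name", "Acme Corp", 0.98),
    Extraction("invoice_total", "1,250.00", 0.62),
    Extraction("due_date", "2024-07-01", 0.85),
]
accepted, review_queue = route(results)
print([e.field_name for e in accepted])
print([e.field_name for e in review_queue])
```

Here only `invoice_total` falls below the cutoff, so it is the one item a reviewer would see; values at or above the threshold pass through untouched.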

Built for any domain

Our platform gives you domain-independent, end-to-end workflows and plug-and-play hooks. You get a powerful foundation you can tailor, so you can bring your applications to life quickly and with less effort.

Production-ready platform

Batch processing, parallelism, and structured outputs, with built-in traceability and controls to support your SOC 2 and ISO-aligned certifications.

Open core, Enterprise-ready

A transparent, open-source pipeline, supported as a documented, stable platform that brings the best community features together with enterprise-grade support and releases.

Cloud-agnostic by design

Deploy on your existing cloud infrastructure of choice. All subsystems are open source and can run on any public or private cloud, with no vendor lock-in.

Build with the creators

Work directly with the creators of the platform to accelerate your project. We can train your team to get productive quickly or partner with you to design and build a custom end-to-end application.
