
DocAI - Coming Soon
Turn complex documents into trusted structured data
Docimoto DocAI is an end-to-end information extraction pipeline for PDF, Excel, Word, and email, with built-in data provenance, confidence scoring, and human-in-the-loop review.





About Docimoto
Docimoto builds accuracy-focused document intelligence systems that combine vision models, open LLMs, domain ontologies, and domain knowledge bases to extract data you can audit and operationalize. We also offer consulting and custom software development to take pilots to production.
Provenance & traceability
Confidence-driven review (HITL)
Domain rules & ontology hooks
Cloud-agnostic deployment
Open-source core
End-to-end process support
Open Source & Research
We build open-source document intelligence systems that bridge machine learning research and production-grade engineering.
Our work focuses on structured information extraction from Excel, PDF, Word, and image documents, with particular emphasis on Excel, an under-explored yet business-critical format.
TableSense2
TableSense2 is a visual model built on Excel sheet structure, using 47 cell features such as formats, formulas, merged cells, and comments.
Tags: Computer Vision, CNN, Transformers, PyTorch, Python, Excel
TableSense2SimpleAnnotationUI
A simple browser-based UI for creating annotations to train TableSense2 models for Excel table detection.
Tags: JavaScript, Python
DocAI
A broader end-to-end DocAI reference platform is under active development and will be released incrementally as open source.
Tags: Python, Node.js, React, Kubernetes, LLM, Postgres, pgvector, Vision Models
GitHub link: Coming soon
Features
Built from hands-on experience across numerous ML and AI information extraction projects.
DocAI elevates your team’s productivity
Visually traceable data extraction
Every extracted value links back to the exact region of the original document, with layout and formatting preserved.
Confidence levels & exceptions
Low-confidence extractions trigger a shared human-review workflow, giving all stakeholders visibility into each document's progress and resolution.
Built for any domain
Our platform gives you domain-independent, end-to-end workflows and plug-and-play hooks: a powerful foundation you can tailor to bring your applications to life quickly and with less effort.
Production-ready platform
Batch processing, parallelism, and structured outputs, with built-in traceability and controls that support your SOC 2 and ISO-aligned certifications.
Open core, enterprise-ready
A transparent, open-source pipeline, offered as a documented, stable platform that combines the best community features with enterprise-grade support and releases.
Cloud-agnostic by design
Deploy on your existing cloud infrastructure of choice. All subsystems are open source and can run on any public or private cloud, with no vendor lock-in.
Build with the creators
Work directly with the creators of the platform to accelerate your project. We can train your team to become productive quickly, or partner with you to design and build a custom end-to-end application.
