DOCWIRE


Extract text,
the way you like

Toss the papers and digitize your data extraction from all popular file formats. Extracting and processing both structured and unstructured data has never been so easy. It can even be automated!


HTML

EML

PDF

ODFXML

iWork

OOXML

ODT

ODF

PRF

PPT

XLSB

DOC

XLS

ODT

PAGES

KEYNOTE

PwC, Tausight, Harpo

Process data from all popular formats

Whether it's scanned reports or structured Excel sheets, the Docwire SDK helps you identify and extract the data you need.

Supported formats

pdf, doc, xls, ppt, odt, ods, odp, iWork, keynote; built-in OCR for scans (bmp, jpg, png, tiff); e-mails (ost, pst, eml); and more!


Secure processing,
done right

Docwire adds a layer of security by executing all extractions locally. This eliminates the dependency on third-party web services and reduces your exposure to attacks. By default, Docwire lowers the risk of data leakage compared with other text extraction solutions.

Local processing also lets you scan documents before they leave your custody. Scanning images and documents before they are sent, for example by email, helps ensure that no sensitive information leaks through human error.

Runs perfectly on any OS

We strive to help businesses' digital solutions thrive by providing a time-saving backbone for digital document processing.

Attached files?
No problem.

Scan entire email chains and HTML indexes in seconds, including attachments, and extract the necessary data. EML with an attached JPG? Inbox filled with thousands of invoices? The Docwire SDK extracts and structures it all for you. The best part? It can all be automated.
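To make the idea of walking an EML file and its attachments concrete, here is a minimal sketch. It uses only Python's standard `email` package and is not the Docwire SDK API; the helper names `extract_eml_parts` and `build_sample_eml` and the sample message are hypothetical, chosen for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_eml_parts(raw: bytes) -> dict:
    """Return subject, plain-text body and attachment metadata of one EML message."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    attachments = [
        {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            # Decoded size in bytes (base64 or quoted-printable removed).
            "size": len(part.get_payload(decode=True) or b""),
        }
        for part in msg.iter_attachments()
    ]
    return {
        "subject": msg["subject"],
        "text": body.get_content() if body is not None else "",
        "attachments": attachments,
    }


def build_sample_eml() -> bytes:
    """Build a tiny in-memory EML with one fake JPG attachment (hypothetical data)."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice 001"
    msg["From"] = "billing@example.com"
    msg["To"] = "accounts@example.com"
    msg.set_content("Please find the invoice attached.")
    msg.add_attachment(
        b"\xff\xd8\xff\xe0fake",  # placeholder bytes, not a real image
        maintype="image",
        subtype="jpeg",
        filename="invoice.jpg",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    print(extract_eml_parts(build_sample_eml()))
```

A real pipeline would hand each attachment on to a format-specific extractor (OCR for the JPG, for instance); this sketch stops at listing them.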

The Docwire SDK comes packed with our own bespoke text identifier, significantly decreasing the time it takes to identify and extract text from images and other unstructured document sources.

Embed it in your day-to-day automation

Many data processing services and ETL tools lack a well-developed solution for dealing with unstructured data. No matter how you process your data, be it through services such as Alteryx and Integromat or your own bespoke software, the Docwire extractor will fit the bill.

Use your favourite language,
or run it straight in the CLI

Execute functions, automate routine tasks, and run faster, all whilst saving on CPU processing time and operating costs.

Common use cases

Index and extract whole email inboxes

Scan entire email chains in seconds, including attachments, and extract the necessary data. EML with an attached JPG? Inbox filled with thousands of invoices? The Docwire DocToText SDK extracts and structures it all for you. The best part? It can all be automated.

Extract text from any kind of digital document

Scan images for text and extract data from graphical PDFs, TIFF and PNG files, and a whole lot more. Search for keywords and structure the results any way you like.

Simplify data extraction from every document format

Whether you work with scanned documents or tabular XML files, pull targeted data points into a single, structured data set. Streamline reports over any timeframe by flagging outliers and corroborating data sets to simplify your decision-making.

Index, scrape and extract data from any website

Crawl through any HTML document and extract information from it, including tables and attached files, using custom logic built for your needs.
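As a rough illustration of pulling tables out of an HTML document, the sketch below uses Python's standard `html.parser` module. It is not the Docwire SDK API; the `TableExtractor` class and `extract_tables` function are hypothetical names, and nested tables are deliberately left unhandled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> as a list of rows.

    Illustrative only: nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # row being filled, or None outside <tr>
        self._cell = None  # text chunks of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html: str) -> list:
    """Parse an HTML string and return its tables as nested lists of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Item</th><th>Price</th></tr>"
        "<tr><td>Paper</td><td> 4.50 </td></tr></table>"
    )
    print(extract_tables(sample))
```

Keeping the result as plain nested lists makes it easy to feed into whatever custom logic follows, such as writing CSV or filtering rows.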

Dive into office documents like never before

Grab data from Microsoft Office, LibreOffice, and Apple iWork documents, including embedded files, and transform it any way you see fit.

Explore how the Docwire SDK can assist you

Scratch the itch and dig into our GitHub for details. Don't hesitate to contact us if you have any further questions.

Documentation (Coming soon)