Introduction

Ultimate guide to automated data capture

Automated data capture, forms processing or even intelligent forms automation...

In this guide, we aim to simplify data capture and offer practical advice on how to successfully implement it as part of your organisation's digital transformation.

What is data capture?

Each day we complete a variety of paper forms – questionnaires, credit applications, evaluation forms and many others.

To collect the data from paper forms, organisations have two options: labour-intensive manual data entry, or automated data capture software that extracts, reads and captures data using OCR, OMR and ICR document recognition technology.

Manual data capture

Forrester Research estimates that 80% of all information processed is still paper-based, and if you search for ‘data entry jobs’ on Google, you’ll find thousands of job opportunities.

Many people prefer to fill out paper forms rather than web forms and some may not even have access to the internet. At the same time, many businesses have a legal obligation to maintain a physical copy or a requirement to capture an ink signature.

This means paper is still pervasive. And manual data capture is still the most common method for collecting information from paper questionnaires, surveys and forms.

With manual data capture, you transcribe data from a paper form into an electronic format e.g. Microsoft Excel. The paper form could be a completed survey returned by a study participant or an internal training evaluation form.

However, manual data entry is expensive and labour intensive. It is prone to human error and can be time-consuming when re-keying data or handling paper copies.

Automated data capture

With automated data capture software, you scan the paper survey on-site using a high-speed scanner rather than keying in data manually.

As documents are scanned, automatic data capture software reads the information and extracts the data using OMR, OCR, ICR, barcode, and signature recognition technology.

If the automated data capture software is unsure about any character or field, it is held in a verification system for you to confirm or correct the data.

Once verified, the data is exported to either an existing database or common format (e.g. Excel, CSV, XML or SPSS).

Many platforms, including TeleForm, now support paper and electronic versions of a form. This simplifies the process of data collection for organisations and allows users to provide more accurate data in a secure online environment. These platforms validate information in real-time and route data to back-end databases and/or individuals for review and approval.

Why do organisations adopt automated data capture software?

There are a number of issues caused by manual data capture that are driving adoption of automated forms processing software. These include:

The amount of time spent keying (and re-keying) data
The risk of human error (e.g. typographical mistakes) when manually transcribing data
The amount of paper stored by the business and time spent searching for paper copies and filing
The cost of outsourced paper storage and employing people to manually input data
The risk of compliance and audit breaches due to lost/missing paperwork
Employees who quickly become demotivated by the tedious, repetitive nature of manual data entry

What types of document can be scanned and captured?

The type of document suitable for automated data capture can be divided into three groups: structured, semi-structured and unstructured.

Structured documents

Each form is identical in appearance and layout, which makes it easier to extract the data for further processing.

Common examples of structured documents include:

  • questionnaires
  • surveys
  • exam answer sheets

Semi-structured documents

Semi-structured documents include the same fields but the layout of these fields is not fixed.

Common examples of semi-structured documents include:

  • invoices
  • purchase orders
  • sales orders

Unstructured documents

Unstructured documents vary in layout making pre-defining the location of the required information impractical or impossible. However, intelligent document recognition technology automatically locates information using pattern-matching techniques.

Common examples of unstructured documents include:

  • letters
  • contracts
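As a rough illustration of the pattern-matching idea described for unstructured documents, the sketch below searches free text for two kinds of information. The reference format `INV-2019-0042`, the pattern names and the `find_first` helper are all hypothetical, chosen for this example only; real intelligent document recognition products are likely to use considerably richer techniques.

```python
import re
from typing import Optional, Pattern

# Hypothetical patterns: an invoice reference such as "INV-2019-0042"
# and a date written as DD/MM/YYYY. Both formats are assumptions.
INVOICE_REF = re.compile(r"\bINV-\d{4}-\d{4}\b")
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def find_first(pattern: Pattern[str], text: str) -> Optional[str]:
    """Return the first match of pattern anywhere in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


letter = (
    "Dear Sir, I am writing with reference to invoice INV-2019-0042, "
    "dated 03/06/2019, which remains unpaid."
)
invoice_ref = find_first(INVOICE_REF, letter)
invoice_date = find_first(DATE, letter)
```

Because the patterns describe what the information looks like rather than where it sits on the page, the same code works regardless of the letter's layout.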

What are the main document recognition technologies?

Manually keying data from forms and documents into your business systems is time consuming, costly and error prone.

Our solutions use intelligent document recognition technology to extract index information from any file including semi-structured documents, such as invoices and sales orders, and unstructured documents, such as letters and contracts.

ICR (Intelligent Character Recognition)

With ICR technology, short handwritten text can be read and processed.

OMR (Optical Mark Recognition)

OMR technology identifies if a checkbox (e.g. yes/no or male/female) has been filled in by calculating the ink percentage in the field.
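The ink percentage calculation can be sketched in a few lines. This is a minimal illustration, not how any particular product implements OMR: it assumes the scanned image is a grid of greyscale pixel values (0 is black, 255 is white), and the `ink_threshold` and `min_fill` values are arbitrary assumptions that would need tuning.

```python
from typing import List

Image = List[List[int]]  # greyscale pixels: 0 = black ink, 255 = white paper


def fill_ratio(image: Image, top: int, left: int, height: int, width: int,
               ink_threshold: int = 128) -> float:
    """Fraction of pixels in the region darker than ink_threshold.

    Assumes the region lies entirely inside the image.
    """
    dark = 0
    for row in image[top:top + height]:
        for pixel in row[left:left + width]:
            if pixel < ink_threshold:
                dark += 1
    return dark / (height * width)


def is_marked(image: Image, top: int, left: int, height: int, width: int,
              min_fill: float = 0.3) -> bool:
    """Treat the checkbox as filled in when enough of it is covered by ink."""
    return fill_ratio(image, top, left, height, width) >= min_fill


# A tiny 4x4 "scan" with a dark 2x2 mark in the centre.
scan = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
]
```

Here `fill_ratio(scan, 1, 1, 2, 2)` covers only the dark centre, while `fill_ratio(scan, 0, 0, 4, 4)` covers the whole image, so the same mark produces very different ratios depending on how tightly the field is defined.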

OCR (Optical Character Recognition)

OCR technology reads machine-printed text, such as names and numbers, from constrained print fields (one character per box).

Barcode Recognition

Barcode recognition reads all standard barcode types, including 2D matrix barcodes.

Signature detection

Signature detection checks whether a form has been authorised by calculating the fill percentage of the signature field.

How does automated data capture software work?

Build your form

Form design is an important first step in the data capture process.

In order for OMR/ICR/OCR document recognition technologies to read, interpret and extract data from your survey, questionnaire or exam paper, you will need to create pre-defined form templates, either from your existing designs or as new forms.

Ideally, you will redesign your forms to improve the robustness of identification, hand-print recognition and processing speed. Examples include constrained print fields for handwritten text and choice fields for multiple choice questions.

However, if you prefer, you can scan an existing form and overlay the structure on top to allow the automated data capture software to capture your existing designs.


Scan your form

Automated data capture software allows you to capture data from scanners, web, mobile, email and fax.

Once paper surveys have been created, printed and filled in, they need to be converted from paper to an electronic image. This is achieved by scanning the paper surveys and forms using a high-speed scanner either on-site or from remote sites and branch networks.


Evaluate and extract your data

Once the images are captured, automated data capture software will identify each form by matching it to its original template.

At this stage, the image will undergo image pre-processing to confirm barcode readability, page size, presence of cornerstones etc.

After identifying a form, hand print (ICR), machine print (OCR) and checkbox (OMR) recognition technology extracts field data from a scanned document.

Simple rules such as alpha, numeric, dictionaries, date ranges, database lookups and mandatory fields will be checked at this stage with exceptions queued for human review. The entire process takes minutes and is more accurate than manual data entry.
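The rule checks described above might look something like the following sketch. Every name here (`FieldSpec`, `validate`, the individual rule functions) is hypothetical, and the sketch assumes dates are captured in ISO format (YYYY-MM-DD); it shows only the general idea of checking each field and collecting exceptions for human review.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Set

Rule = Callable[[str], bool]


@dataclass
class FieldSpec:
    """Hypothetical per-field settings on a form template."""
    checks: List[Rule] = field(default_factory=list)
    mandatory: bool = False


def is_alpha(value: str) -> bool:
    return re.fullmatch(r"[A-Za-z' -]+", value) is not None


def is_numeric(value: str) -> bool:
    return re.fullmatch(r"[0-9]+", value) is not None


def in_dictionary(allowed: Set[str]) -> Rule:
    return lambda value: value in allowed


def in_date_range(start: date, end: date) -> Rule:
    def check(value: str) -> bool:
        try:
            parsed = date.fromisoformat(value)  # assumes YYYY-MM-DD
        except ValueError:
            return False
        return start <= parsed <= end
    return check


def validate(record: Dict[str, str], template: Dict[str, FieldSpec]) -> List[str]:
    """Return the names of fields that should be queued for human review."""
    exceptions = []
    for name, spec in template.items():
        value = record.get(name, "").strip()
        if not value:
            if spec.mandatory:
                exceptions.append(name)
            continue  # optional and empty: nothing to check
        if not all(check(value) for check in spec.checks):
            exceptions.append(name)
    return exceptions


template = {
    "surname": FieldSpec([is_alpha], mandatory=True),
    "age": FieldSpec([is_numeric]),
    "title": FieldSpec([in_dictionary({"Mr", "Mrs", "Ms", "Dr"})]),
    "visit_date": FieldSpec(
        [in_date_range(date(2019, 1, 1), date(2019, 12, 31))], mandatory=True
    ),
}

# "4O" contains the letter O rather than a zero, a typical misread.
record = {"surname": "Jones", "age": "4O", "title": "Mr", "visit_date": "2019-06-03"}
queued = validate(record, template)
```

Only the `age` field fails here, so only that field would be shown to an operator. A database lookup rule could be added in the same way as `in_dictionary`, as another function that returns true or false for a value.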


Verify your data

If human verification is required, scanned images of any field or form needing review are displayed, with operators able to confirm or correct misread characters.

Confidence thresholds for the verification stage can be set for each form template. In a typical production system, a character needs a confidence of around 85% to bypass human verification.

Only invalid or unrecognisable data is highlighted to an operator. Once misread characters have been corrected, the data can be exported or archived.

Is this Jones or Tones?
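The "Jones or Tones" case can be sketched as a simple threshold check. This is an illustration only: it assumes the recognition engine returns a confidence between 0 and 1 for each character, and the function names and the sample confidence values are invented for the example.

```python
from typing import List, Tuple

CharResult = Tuple[str, float]  # (recognised character, confidence from 0 to 1)


def needs_review(chars: List[CharResult], threshold: float = 0.85) -> List[int]:
    """Positions of characters whose confidence falls below the threshold."""
    return [i for i, (_, confidence) in enumerate(chars) if confidence < threshold]


def render_for_operator(chars: List[CharResult], threshold: float = 0.85) -> str:
    """Show confident characters as read and uncertain ones as '?'."""
    return "".join(ch if confidence >= threshold else "?" for ch, confidence in chars)


# Invented confidences: the engine is unsure whether the first letter is J or T.
surname = [("J", 0.55), ("o", 0.97), ("n", 0.96), ("e", 0.98), ("s", 0.93)]
flagged = needs_review(surname)
shown = render_for_operator(surname)
```

With the default threshold of 0.85, only the first character is flagged, so the operator confirms a single letter rather than re-keying the whole field.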

Export your data

Once the process is complete, each form image, along with all verified data, is exported to third-party business systems and repositories via a powerful API or to different searchable text formats including CSV, XML, SQL, Excel etc.
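As a minimal sketch of the export step, the code below writes a list of verified records to CSV and XML using only the Python standard library. The function names, field names and XML tag names are assumptions made for this example; a real product's export format will differ.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_csv(records: List[Dict[str, str]], fieldnames: List[str]) -> str:
    """Serialise verified records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records: List[Dict[str, str]], root_tag: str = "forms",
           record_tag: str = "form") -> str:
    """Serialise verified records as XML, one element per field.

    Assumes field names are valid XML element names.
    """
    root = ET.Element(root_tag)
    for record in records:
        node = ET.SubElement(root, record_tag)
        for name, value in record.items():
            ET.SubElement(node, name).text = value
    return ET.tostring(root, encoding="unicode")


verified = [{"surname": "Jones", "age": "40"}]
csv_text = to_csv(verified, ["surname", "age"])
xml_text = to_xml(verified)
```

Both functions return text rather than writing files, which keeps the sketch easy to test; in practice the output would be written to disk or passed to the receiving system.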


Archive your data

If you need to archive your data for compliance and audit purposes, verified data can be exported to records retention systems for indexing, storage and retrieval of paper and electronic forms.

Record retention systems enable auditors to perform keyword searches for files and retrieve specific documents, or parts of a document.


Electronic forms

Paper is still relevant in most environments today and distributing a paper questionnaire is still one of the best ways to reach certain demographics.

However, many organisations are undergoing a complete digital transformation and are looking for efficient ways to collect and process information. Consequently, more organisations are now seeking intelligent data capture software that supports both paper and electronic forms, allowing them to take on digital transformation at their own pace.

A good automated data capture system will include options for data capture from paper and web forms.

Key features of intelligent data capture software

Form creation and distribution

Design form templates optimised for accurate data capture and high capture rates. Typical form components include constrained print fields, choice fields, signature fields and many more.

Capture all data formats

Automated data capture software extracts all data formats e.g. handwritten text, checkboxes, machine printed text, barcodes and signatures.

Efficiently collect data

Capture important data and content from MFPs, scanners, fax, email, web and smartphones.

High volume forms processing

Capture thousands of structured/semi-structured/unstructured documents every day and scale up or down.

Web and mobile data capture

Capture documents from remote sites and/or branch networks to reduce processing delays and save money on logistics.

Supports paper/PDF forms/web forms/mobile forms

A good intelligent forms automation solution will cover the entire spectrum of forms capabilities to collect and process information using paper, PDF, electronic and mobile forms.

Accelerate digital transformation

Automated data capture software is most commonly associated with effective information collection where data is collected, verified and exported to a database. However, a good solution will support the efficient processing of the information collected by routing it for review and approval.

How to optimise paper forms for scanning

A well-designed form consisting of form elements (e.g. choice fields), cornerstones, barcodes and ID blocks improves the processing of the scanned form and reduces manual intervention.

Cornerstones

Cornerstones will assist with page alignment, de-skewing images and adjusting for print or scan size variations.
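To see why cornerstones help, consider a rough sketch of how skew and scale could be estimated from the detected positions of the two top cornerstones. It assumes the cornerstone centres have already been located as (x, y) pixel coordinates, with y increasing down the page; the function names and the `expected_width` parameter are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixels, y increasing downwards


def skew_angle_degrees(top_left: Point, top_right: Point) -> float:
    """Angle of the line joining the two top cornerstones.

    0 means the page is straight. With y increasing downwards, a positive
    angle means the right-hand side of the page sits lower in the image.
    """
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))


def scale_factor(top_left: Point, top_right: Point, expected_width: float) -> float:
    """Ratio of the measured cornerstone distance to the template distance."""
    measured = math.hypot(top_right[0] - top_left[0], top_right[1] - top_left[1])
    return measured / expected_width


straight = skew_angle_degrees((100, 100), (1100, 100))
tilted = skew_angle_degrees((100, 100), (1100, 135))
shrunk = scale_factor((0, 0), (300, 400), expected_width=1000)
```

Rotating the image by the negative of the skew angle and rescaling by the inverse of the scale factor would bring the scanned fields back into line with the positions defined on the template.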

Cornerstones

This will ensure that scanners are adjusted correctly to capture good quality images of potentially very light pencil-written responses.

Form identification block

Form identification blocks help with page rotation (where the sheet is scanned upside down) and answer sheet identification for reconciliation. They also help with version control should future versions of the form need to be used in parallel with older but very similar forms.

Image zones

Image zones can capture machine-print text, hand-print text, barcodes and pictures. This element is often used to capture written narrative responses such as comments with ICR (Intelligent Character Recognition) technology.

Choice fields

Add a choice field to your form to accept a single choice (e.g. yes/no) or multiple choice (e.g. select option A/B/C/D).

Choice Fields consist of a single column, or row, of choices and offer an array of different styles including bubbles, boxes, underlines, responses and brackets.

To improve OMR (Optical Mark Recognition) accuracy, it is recommended to use bubbles and boxes rather than responses and brackets.
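Once each bubble or box in a choice field has a fill percentage, the software still has to decide which option was chosen. The sketch below shows one simple way this could work for a single-choice field; the function name, the status labels and the `min_fill` value of 0.3 are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def resolve_single_choice(fills: Dict[str, float],
                          min_fill: float = 0.3) -> Tuple[Optional[str], str]:
    """Pick the marked option from per-option fill ratios.

    Returns (option, "ok") when exactly one option is marked, otherwise
    (None, "blank") or (None, "multiple") so the field can be reviewed.
    """
    marked = [option for option, ratio in fills.items() if ratio >= min_fill]
    if len(marked) == 1:
        return marked[0], "ok"
    if not marked:
        return None, "blank"
    return None, "multiple"


answer = resolve_single_choice({"A": 0.05, "B": 0.62, "C": 0.04, "D": 0.03})
unclear = resolve_single_choice({"A": 0.55, "B": 0.62, "C": 0.04, "D": 0.03})
```

A field that comes back as "blank" or "multiple" would typically be passed to an operator rather than guessed.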

Constrained print fields

Handwriting is most effectively read by ICR (Intelligent Character Recognition) technology when it is laid out in a constrained print field, with one character per box.

Form designers can add a constrained print field to gather data such as names, dates, and numeric figures. The boxes act as guides for the person filling out the form, with one dedicated space for each letter, number or character in the response.

Signature field

Add a signature field to check that forms have been authorised. The software will calculate the fill percentage of the field to determine whether it was signed or not. Signature fields will be extracted and stored as an image.

Key benefits of intelligent data capture software

Improve productivity

Allow your employees to focus on more productive tasks rather than time-consuming document sorting and manual data entry.

Improve data accuracy and lower risk

Intelligent OCR/OMR/ICR document recognition technology captures your data with 95% accuracy; the remaining 5% is flagged for human verification.

Reduce costs

Save up to 80% in individual departments or across the organisation by eliminating manual document sorting and reducing data entry. You will also save money on postage and transportation to scanning bureaus, and end reliance on expensive third-party storage contracts.

Eliminate human error

Reduce typographical errors from manual data entry by extracting data from forms using OCR, ICR, OMR, barcode, and logo recognition technologies.

No lost paperwork

Questionnaires and surveys are scanned on-site with a digital copy immediately available and stored in a records retention system.

Automatic export and archival

Once content is verified, it can be exported into business repositories or third-party records retention and storage systems for archiving. Auditors can access archived information, including time-stamped approval decisions, documents and comments.

Looking for automated data capture software?

Find out more or talk to us today to discuss your requirements.

© ePartner Consulting Ltd 2004-2019 | Company registration number: 05192543. | VAT number: GB842064740.

Registered address: St Ann's House, Guildford Road, Lightwater, Surrey, GU18 5RA, United Kingdom. Tel: +44 (0)3300 100 000.
