Home » Data on-boarding from unstructured sources: Bridging a critical gap in leveraging industry platforms

Data on-boarding from unstructured sources: Bridging a critical gap in leveraging industry platforms

Naina Sethi March 11 2021

Ingesting Unstructured data into other Platforms

Industry specific Products / Platforms like the ERP for specific functions and processes have contributed immensely to enhancing efficiency and productivity. SI partners and end-users have focused on integrating these platforms with existing workflows through a combination of customization/configuring of these platforms and re-engineering existing workflows. Data Onboarding is a critical activity however it has been restricted to integrating the platforms with the existing ecosystem. A key element that is very often ignored is integrating Unstructured Data sources in the Data Onboarding process.

Most enterprise-grade products and platforms require a comprehensive utility that can extract and process a wide set of unstructured documents, data sources and ingest the output into a defined set of fields spread across several internal and third-party applications on behalf of their clients. You are likely extracting and ingesting this data manually today, but an automated utility could be a key differentiator that reduces time, effort and errors from this extraction process.

Customers have often equated use of OCR technologies as solutions to these problems, however OCR suffers from quality and efficiency issues thereby requiring manual efforts. More importantly OCR extracts the entire document and not just the relevant Data Elements, thereby adding significant noise to the process. And finally, the task of ingesting this data into the relevant fields in the applications / platforms is still manual.

When it comes to widely used and “customizable” case management platforms for Fincrime applications, CRM platforms, or client on-boarding/KYC platforms, there is a vast universe of unstructured data that requires processing outside of the platform in order for the workflow to be useful. Automating manual extraction of critical data elements from unstructured sources with the help of an intelligent data ingestion utility enables users to repurpose critical resources tasked with repetitive offline data processing.

Your data ingestion utility can be a “bolt on” or a simple API that is exposed to your platform. While the document and data sets may vary, as long as there is a well-defined list of applications and fields that are required to be populated, there is a tremendous opportunity to accelerate every facet of client lifecycle management. There are several benefits to both “a point solution” which automates extraction of a well-defined document type/format as well as a more complex, machine learning based utility for a widely defined format of the same document type.

Implementing Data Ingestion

An intelligent pre and post processing data ingestion can be implemented in 4 stages, each stage increasing in complexity and value extracted from your enterprise platform:

Stage 1

Automate the extraction of standard templatized documents. This is beneficial for KYC and AML teams that are handling large volumes of standard identification documents or tax filings which do not vary significantly.

Stage 2

Manual identification and automated extraction of data elements. In this stage, end users of an enterprise platform can highlight and annotate critical data elements which an intelligent data extraction utility should be able to extract for ingestion into a target application or specified output format.

Stage 3

Automated identification and extraction as a point solution for specific document types and formats.

Stage 4

Using stage 1-3 as a foundation, your platform may benefit from a generic automated utility which uses machine learning to fully automate extraction and increase flexibility of handling changing document formats.

You may choose to trifurcate your unstructured document inputs into “simple, medium, and complex” tiers as you develop a cost-benefit analysis to test the outcomes of an automated extraction utility at each of the aforementioned stages.

Key considerations for an effective Data Ingestion Utility:

Your partner should have the domain expertise to help identify the critical data elements that would be helpful to your business and end users
Flexibility to handle new document types, add or subtract critical data elements and support your desired output formats in a cloud or on-premise environment of your choice
Scalability & Speed
Intelligent upfront classification of required documents that contain the critical data elements your end users are seeking
Thought leadership that supports you to consider the upstream and downstream connectivity of your business process