Has Digitalization Led To The Problem Of Unstructured Data?

Madhur Gulati, July 26, 2018

Before we get into the details of how digitalization has contributed to unstructured data, we need to understand what is meant by the terms digitization and unstructured data.

Digitization: the process of converting information into a digital format. In this format, information is organized into discrete units of data (called bits) that can be separately addressed (usually in multiple-bit groups called bytes).
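As a small illustration of that definition, the sketch below converts a short piece of text into bytes and then into the individual bits of each byte. The function name `to_bits` and the sample string are assumptions made for this example only.

```python
def to_bits(text, encoding="utf-8"):
    """Return one 8-character bit string per byte of the encoded text."""
    data = text.encode(encoding)  # text -> bytes (multiple-bit groups)
    return [format(byte, "08b") for byte in data]  # each byte -> its 8 bits


sample = "Hi"  # arbitrary sample text, chosen for illustration
print(to_bits(sample))  # ['01001000', '01101001']
```

Each element of the result is one byte, and each character inside it is one bit, which is the "separately addressable" unit the definition refers to.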

Unstructured Data: information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs, compared with data stored in fielded form in databases or annotated (semantically tagged) in documents. (Source: https://en.wikipedia.org/wiki/Unstructured_data)
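To make the contrast with fielded data concrete, here is a minimal sketch that holds the same fact once as a structured record and once as free text. The client name, the note, and the two regular expressions are all hypothetical and deliberately simple; real-world text would need far more robust handling.

```python
import re

# Fielded (structured) form: every value sits in a named slot.
structured = {"client": "Acme", "trade_date": "2018-07-26", "amount": 1500000}

# Free-text (unstructured) form: the same values are buried in prose.
note = "Spoke to Acme on 2018-07-26; they confirmed the trade for 1,500,000 USD."

# Hypothetical patterns: ISO dates and comma-grouped whole numbers only.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"\b\d{1,3}(?:,\d{3})+\b")


def extract_fields(text):
    """Pull ISO dates and comma-grouped numbers out of free text."""
    dates = DATE_RE.findall(text)
    amounts = [int(a.replace(",", "")) for a in AMOUNT_RE.findall(text)]
    return {"dates": dates, "amounts": amounts}


print(structured["trade_date"])  # direct lookup by field name
print(extract_fields(note))      # pattern matching needed to recover the values
```

The same date written as "26 July 2018" would be missed by these patterns, which is a small example of the irregularities and ambiguities mentioned above.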

To connect the two, I begin with one observation: technology evolves every day, and alongside it the desire to digitize everything around us is gaining momentum.

However, we have not stopped to ask whether this process will solve our problem or lead to a bigger one, common to all current verticals and to the new verticals of the future.

If we think deeply about this, we realize that instead of creating a solution for the digital world or the digitized economy, we have paved the way for data that is unstructured or, for that matter, semi- or quasi-structured, and this pile of unstructured data is growing day by day.

This raises a question: which factors are contributing to the unstructured data pile? Some of them are listed below:

  1. The rapid growth of the Internet, leading to a data explosion and massive information generation.
  2. Data that is digitized and given only some structure.
  3. Free availability of, and easy access to, various tools that help in the digitization of data.

The other crucial question about unstructured data is how we manage it.

Some insights and facts about the unstructured data problem that underline how serious it is:

  • According to projections from Gartner, white-collar workers will spend anywhere from 30 to 40 percent of their time this year managing documents, up from 20 percent of their time in 1997
  • Merrill Lynch estimates that more than 85 percent of all business information exists as unstructured data – commonly appearing in e-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations, and Web pages

(Source – http://soquelgroup.com/wp-content/uploads/2010/01/dmreview_0203_problem.pdf)

  • Nearly 80% of enterprises have very little visibility into what’s happening across their unstructured data, let alone how to manage it.

(Source – https://www.forbes.com/sites/forbestechcouncil/2017/06/05/the-big-unstructured-data-problem/2/#5d1cf31660e0)

Is there a solution to this?

To answer that question, I would say that data (information) is power in today’s world, and unstructured data is tremendous power because its potential is still untapped. When realized effectively and judiciously, it can turn fortunes for organizations.

Organizations and business houses that manage to extract meaning from this chaotic mess will be well positioned to gain a competitive advantage over their peers.

Areas to focus on when addressing the unstructured data problem are:

  1. Raise awareness of it.
  2. Identify and locate it in the organization.
  3. Ensure information is searchable.
  4. Make the content context- and search-friendly.
  5. Build intelligent content.
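One way to picture the third point, making information searchable, is a small inverted index that maps each word to the documents containing it. This is only a sketch under stated assumptions: the function names `build_index` and `search`, the document ids, and the sample texts are hypothetical, and production search tools do far more (ranking, stemming, access control).

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample documents of the kinds listed earlier (memos, e-mails, chats).
docs = {
    "memo-1": "Quarterly research report on bond markets",
    "email-7": "Call center notes: client asked about bond pricing",
    "chat-3": "Reminder to update the marketing presentation",
}
index = build_index(docs)
print(search(index, "bond"))          # ['email-7', 'memo-1']
print(search(index, "bond pricing"))  # ['email-7']
```

Once free text is indexed this way, locating content across memos, e-mails and chats becomes a lookup rather than a manual hunt.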

The good news is that we at Magic realized the scale of this challenge some time back and have designed a set of offerings specifically to solve the unstructured and semi-structured data problem for the financial services industry.

Magic FinServ focuses on four primary data entities that financial services firms regularly deal with:

Market Information – Research reports, news, business and financial journals, and websites providing market information generate massive amounts of unstructured data. Magic FinServ provides products and services that tag metadata and extract valuable, accurate information to help our clients make timely and informed decisions.

Trade – Trading generates structured data; however, there is huge potential to optimize operations and make automated decisions. Magic FinServ has created tools, using Machine Learning and NLP, to automate several process areas, such as trade reconciliations, to help improve the quality of decision making and reduce effort. We estimate that effort can be reduced by almost 33% in almost every business process in this space.

Reference data – Reference data is structured and standardized; however, it tends to generate several exceptions that require proactive management. Organizations spend millions every year to run reference data operations. Magic FinServ uses Machine Learning tools to help the operations team reduce the effort spent on exception management, improve the quality of decision making and create a clean audit trail.

Client/Employee data – Organizations often do not realize how much client-sensitive data resides on desktops and laptops. Recent regulations such as GDPR now make it mandatory to address this risk. Most of this data is semi-structured and resides in Excel spreadsheets, Word documents and PDFs. Magic FinServ offers products and services that help organizations identify the extent of this risk and then take remedial action.


Madhur Gulati

Managing Consultant


