I recently answered a client question regarding an application that could be based on either business intelligence (BI) or text analytics technology.
Question: “We receive quarterly and yearly account statements for our investments. Each statement will contain key pieces of information, such as total contributions, expenses, interest income, etc. However, each vendor may format their account statement differently. We’d like to build/buy a solution to extract the values from the account statements based on keywords.”
Here’s how I answered it:
- If:
- the reports are available in a digital format, such as PDF, HTML, etc.
- and the numbers you are looking to extract are mostly in a table/grid format
- and the total number of different report formats is manageable (you can manually go through and map tables/numbers to the metrics you need to extract)
- then most BI vendors can do the job. (See our latest research on who’s who here and here.)
- A vendor that specializes in so-called “report scraping” is Datawatch.
- However, if:
- you need to scan/OCR these reports (where the accuracy is never 100%)
- and/or there are hundreds of different report formats
- and/or the numbers are buried in freeform text (not in tables/grids)
- then you’ll need a text analytics platform. (See our latest research on who’s who here and here.)
- You will also need to “train” that system for each document type.
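The two branches above can be written down as a simple decision rule. This is only a sketch to make the logic explicit: the `ReportProfile` fields, the `recommend` function, and the `MANAGEABLE_FORMATS` cutoff are hypothetical names and values chosen for illustration, not part of any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class ReportProfile:
    needs_ocr: bool          # reports must be scanned/OCR'd instead of read digitally
    format_count: int        # number of distinct vendor statement layouts
    numbers_in_tables: bool  # values sit in tables/grids rather than freeform text


# Assumed cutoff for "manageable"; the only hint above is that "hundreds" is too many.
MANAGEABLE_FORMATS = 100


def recommend(profile: ReportProfile) -> str:
    """Return the technology family suggested by the checklist above."""
    if (
        not profile.needs_ocr
        and profile.numbers_in_tables
        and profile.format_count < MANAGEABLE_FORMATS
    ):
        return "BI / report scraping"
    # Any one of: OCR required, too many formats, numbers buried in freeform text.
    return "text analytics"


digital_and_tabular = ReportProfile(needs_ocr=False, format_count=12, numbers_in_tables=True)
scanned_statements = ReportProfile(needs_ocr=True, format_count=12, numbers_in_tables=True)
```

Note that a single "no" on any of the three conditions is enough to push the recommendation toward text analytics, which mirrors the "and/or" wording in the second list.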
In either case, a robotic process automation (RPA) tool can be used to automate the process.
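For the first scenario (digital reports, numbers in tables, a manageable number of formats), the manual mapping step might look something like the sketch below. It assumes the statement text has already been pulled out of the PDF or HTML file, and the vendor names, labels, and the `extract_metrics` helper are all made up for illustration. A real report-scraping product would handle layout far more robustly than a regular expression does.

```python
import re

# Hypothetical mapping: for each vendor, the label it prints for each metric we want.
VENDOR_LABELS = {
    "vendor_a": {
        "total_contributions": "Total Contributions",
        "expenses": "Expenses",
        "interest_income": "Interest Income",
    },
    "vendor_b": {
        "total_contributions": "Contributions (Total)",
        "expenses": "Fees and Expenses",
        "interest_income": "Interest Earned",
    },
}

# Optional colon, optional dollar sign, then a number with optional thousands separators.
AMOUNT_PATTERN = r"\s*:?\s*\$?\s*(-?\d[\d,]*(?:\.\d+)?)"


def extract_metrics(text: str, vendor: str) -> dict[str, float]:
    """Find each metric's value by searching for the vendor's label followed by an amount."""
    results = {}
    for metric, label in VENDOR_LABELS[vendor].items():
        match = re.search(re.escape(label) + AMOUNT_PATTERN, text, flags=re.IGNORECASE)
        if match:
            results[metric] = float(match.group(1).replace(",", ""))
    return results


# Invented sample statement text for the hypothetical "vendor_a" layout.
SAMPLE_A = """
Total Contributions      $12,500.00
Expenses                 $310.25
Interest Income          $1,045.10
"""

metrics_a = extract_metrics(SAMPLE_A, "vendor_a")
```

Each new vendor format means one more entry in the mapping, which is why this approach only stays practical while the number of formats is small.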