Automating the Document Process
The logistics industry is moving to digital communication, but much of the paperwork for each shipment is still on paper. Our client wanted to revamp how they handle documentation received from customers, carriers, and vendors, and to give their employees a better, faster way to file paperwork.
The Problem
All employees shared one email inbox for processing customer paperwork. Each employee had a list of customers that they were responsible for overseeing. When an email came in with PDF paperwork from a customer on their list, the employee would download the file from email to their computer, then re-upload it to the client's Logistics Platform web application.
The old interface for filing the documentation was cumbersome. Employees had to divide up the document by typing in page numbers to indicate where to split the PDF into sets of pages.
There was no automation to detect any of the document content. Employees inspected every document visually and copied information (such as invoice totals) into the Logistics Platform by hand during the filing process.
Once paperwork was filed, there were no preview images to easily see the page content attached to shipment paperwork. Users only had a list of links to PDF files which they could open in a new window to view.
“Since going live with the new document processing system in Spring 2020, our client has used the platform to process over 35,000 files representing more than 110,000 pages of paperwork.”
The Solution
We tackled this problem by building out a multipart document processing platform that integrates with their existing Logistics Platform:
A new application that uses SendGrid Inbound Parse to accept incoming emails. It validates and extracts the email content and attachments, replies to the sender with an automated confirmation or rejection email, saves the PDF, TIFF, and image files along with the email metadata, and starts the automatic text recognition and processing system.
A file preprocessing application that creates an image of each page of a PDF or TIFF document. This replaces the tedious page-splitting process with a simpler way to select, group, and categorize pages.
An integration with Amazon Textract's DetectDocumentText OCR operation to capture printed text and handwritten additions on scanned paperwork as well as digital files.
A new interface powered by React and GraphQL to give employees a faster and more user-friendly method of filing paperwork:
It shows all documents that have come through the system, with preview thumbnails, suggested reference numbers, and filters by date, reference number, page type, and filing status, so specific documents are easy to find.
Documents are pre-tagged with suggestions from the OCR system for reference numbers and page categories (e.g., invoices, shipment manifests, and confirmation and delivery receipt documents), so users can simply confirm information rather than retype it.
Widgets in a sidebar let employees view and update carrier and shipment information related to the documents without having to look it up separately in the Logistics Platform.
Automation that extracts key values from certain documents and compares those values and reference numbers with known data in the system. Invoice paperwork that does not need employee review is sent directly to the accounting department to start the payment process.
A system that learns as it goes, using Amazon Textract's AnalyzeExpense machine learning combined with employee input to recognize common page layouts. As it receives more training data, it uses that information to get more accurate values (such as invoice totals, dates, and reference numbers) from the document content.
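To make the invoice automation described above more concrete, here is a minimal sketch of the kind of check it could run: compare the values extracted from an invoice with what the system already knows, and send the invoice to accounting only when everything matches. This is an illustration and not the client's actual rules. The names `ExtractedInvoice`, `KnownShipment`, and `route_invoice`, the exact-match default tolerance, and the two routing labels are all assumptions made for this example.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional


@dataclass
class ExtractedInvoice:
    """Values pulled from an invoice by OCR (hypothetical shape)."""
    reference_number: Optional[str]
    total: Optional[Decimal]


@dataclass
class KnownShipment:
    """Data the Logistics Platform already holds for a shipment (hypothetical shape)."""
    reference_number: str
    expected_total: Decimal


def route_invoice(
    invoice: ExtractedInvoice,
    shipments: Dict[str, KnownShipment],
    tolerance: Decimal = Decimal("0.00"),  # assumed: totals must match exactly by default
) -> str:
    """Return "accounting" when the invoice matches known data, else "employee_review"."""
    # OCR could not find a required value: a person has to look at it.
    if invoice.reference_number is None or invoice.total is None:
        return "employee_review"

    # The reference number does not correspond to any known shipment.
    shipment = shipments.get(invoice.reference_number)
    if shipment is None:
        return "employee_review"

    # The billed total differs from the expected total by more than the tolerance.
    if abs(invoice.total - shipment.expected_total) > tolerance:
        return "employee_review"

    return "accounting"
```

With a known shipment `KnownShipment("REF-1001", Decimal("250.00"))`, an extracted invoice carrying the same reference number and total is routed to accounting, while a missing value, an unknown reference number, or a mismatched total falls back to employee review. Defaulting to review on any doubt keeps the automation conservative, which seems appropriate when the next step is a payment.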