Getting Our Act Together: Why Text Mining Matters
We are storing more digital information than we ever thought possible. Our in-house documents and data repositories have grown so large that effective research can easily consume days. Finding pertinent documents and information is challenging enough, but we also need to separate the correct from the incorrect and the relevant from the outdated. AI and modern search tools have given us a leg up, but AI still lags in its ability to summarize found content adequately. Several organizations and companies have been working on local Large Language Model applications to mine text, but they are not there yet. AI can be very disappointing when summarizing content; it struggles with context, particularly when the content is spread across many files. Plus, we cannot upload thousands of documents to a chatbot. Consider the difficulty facing a team of ten engineers, each with a unique assignment and focus, who must cull 2,400 documents and write content for 25 separate watershed reports requested by a client. How long will it take to get everyone up to speed? How much will the labor cost? What happens if someone misses something important? This presentation will highlight an approach to mining documents and finding pertinent information efficiently. It will address challenges in how we find, store, retrieve, and review documents, and show how we can shorten the research process without sacrificing accuracy or overlooking important information.
Author Bio
John Paine, PE, PH, CFM, has extensive experience in hydrology, hydraulics, and floodplain modeling. He has been active in teaching and publishing and has 40 years of progressive experience as a private-sector consultant, working for federal, state, and local government clients. He holds multiple registrations as a Professional Engineer, Hydrologist, and Certified Floodplain Manager. Mr. Paine manages GKY’s Hampton Roads office.

