
Auto-classification is much more than a security tool

by Stuart Robertson : Improve

Auto-classification is the ability to scan the contents of a document and automatically assign categories and keywords based on what it contains. If this seems an unnecessary extra feature for your content repository, consider what happens without it: either content is untagged and its purpose can only be inferred from its location or filename, or metadata is entered manually by users. However, users hate repetitive tasks and are very poor at entering metadata, which gives rise to amusing situations such as the organisation that found most of its documents were classified as relating to “asbestos” … it was the default value in the classification list!
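To make the idea concrete, here is a minimal sketch of keyword-driven auto-classification. It is an illustration only: the `TAXONOMY` categories, their keywords, the `classify` function and the `min_hits` parameter are all hypothetical, and real products use far richer techniques.

```python
import re
from collections import Counter

# Hypothetical taxonomy: category -> indicative keywords (illustrative only).
TAXONOMY = {
    "health-and-safety": {"asbestos", "hazard", "inspection"},
    "finance": {"invoice", "budget", "audit"},
}


def classify(text, min_hits=2):
    """Return {category: matched keywords} for each category whose
    keywords occur at least `min_hits` times in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    tags = {}
    for category, keywords in TAXONOMY.items():
        found = sorted(k for k in keywords if words[k] > 0)
        hits = sum(words[k] for k in found)
        if hits >= min_hits:
            tags[category] = found
    return tags


print(classify("The audit reviewed the budget and one invoice."))
```

Because the tags come from the text itself, nobody has to pick a value from a list, so a default such as “asbestos” never gets applied by accident.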

Uptake of auto-classification software has been largely driven by the need to comply with GDPR, but does this type of tool also provide business advantage, perhaps by resolving issues such as:

  • “I know it’s on the intranet but can I find it!”
  • Search requires a very specific term or it fails to find the document.
  • Users fail to enter metadata consistently.

With the advent of artificial intelligence, enhanced search is a prime candidate for AI in office systems. If we can relate expressions, or rather the concepts expressed in the document text, we can use those relationships to generate more relevant search results from our data.

We are used to internet search engines where you either get tens of thousands of results or none that are relevant! With auto-classification it is possible to generate more consistent and accurate search results. However, capability would still be rather limited if classification and search simply relied on combinations of automatically identified keywords.

What we need is a classification system that understands concepts and inter-related content. No human is going to spend the time and effort on such a task, so what is needed is an artificial intelligence classification system that can achieve this result in the background without placing excessive overheads and demands on people or IT infrastructure.

Once the classification process has been enhanced to relate similar concept expressions then we move into the realm of intelligent search rather than just combined keyword searching. In this way we can deliver results that support users and make the use of our information stores far more productive and reliable.

Is this possible today? The answer is yes!

A great example from our own experience is a global audit firm that had issues with identifying and repurposing existing work, and with the associated quality of information. Re-inventing work product is an obvious waste of effort.

By ‘concept’ tagging and classifying content, we improve search and automatically identify information that previously would not have been found. Intelligent content in context is now automatically retrieved at the point of need, increasing access to, and the repurposing and reuse of, high-value content.

Users can now find information based on concepts related to those they requested. The technique promotes the most relevant results while applying thresholds so that catch-all results, typically produced by simplistic search techniques, are demoted.

‘Compound term processing’ automatically identifies the concepts within documents. The ability to extract meaning enables high quality content in context to rise above the multitude. Enabling our users to more readily find the high value content for use in their own work stream brings considerable advantage.
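As a rough illustration of the difference between matching single keywords and matching multi-word concepts, the sketch below treats adjacent word pairs as compound terms and applies a relevance threshold. This is a deliberate simplification and not how any particular product implements compound term processing; the stopword list, the function names `compound_terms` and `search`, and the `threshold` value are all assumptions made for the example.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def compound_terms(text):
    """Return the set of adjacent two-word terms, skipping stopwords."""
    terms = set()
    # Split on punctuation so terms do not span sentence or clause breaks.
    for chunk in re.split(r"[^\w\s]+", text.lower()):
        words = chunk.split()
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                terms.add(f"{first} {second}")
    return terms


def search(query, documents, threshold=0.5):
    """Rank documents by the share of query compound terms they contain,
    leaving out anything that scores below the threshold."""
    wanted = compound_terms(query)
    if not wanted:
        return []
    results = []
    for name, text in documents.items():
        score = len(wanted & compound_terms(text)) / len(wanted)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))


docs = {
    "a": "Our risk assessment covers triage.",
    "b": "Assessment of risk is hard; triage nurse on call.",
}
print(search("risk assessment", docs))
```

Document "b" contains both words of the query, so a plain keyword search would return it, but it never uses them together as the concept “risk assessment” and so falls below the threshold.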

In this way, information security now serves a dual purpose: protecting against risk and increasing the efficiency of business processes.
