Managing your data with AI technology

by Becky Driscoll : Improve

If you are responsible for managing your organisation’s data/information/content then you’re probably facing a pretty daunting task nowadays. There’s an increasing demand to not only protect the data, but also to maximise its usefulness, regardless of which repository it is being stored in.

To step up and meet this demand you’re going to need some quality tools to help you out. One of the disciplines of artificial intelligence (AI) is machine learning (ML), which Google’s Yufeng Guo defines simply as:

“Using data to answer questions”

There’s a lot of buzz about AI and ML, and rightly so, but how can they be used to help you manage your data and meet these demands?

One of the common demands is simply to know:

“What is our content about?”

Meeting this demand can help you better:

- protect your content, and

- empower your end users to use your content for better collaboration.

Document management systems have attempted to meet this demand by allowing users to tag documents with metadata. But guess what… nobody likes doing this. At best the document is tagged inconsistently; at worst, it is simply not tagged at all.

But what if a software program could use data to answer the question: “what is my content about?” Well, that’s basically machine learning (according to our above definition) and we’ve been working with a technology recently that does just that… here’s the context:

Our customer has around 100 million documents on network file shares. They urgently need to know which documents contain personal data so they can take action to protect those documents in line with upcoming GDPR legislation. They also want to know more generally what the documents are about so that their staff’s search experience will be both richer and faster.

To help them we are using a technology called **‘ConceptSearching’**. One of the main questions we have had to answer is:

“What words or phrases should we look for in a document in order to decide if it should be tagged with a specific term [x]?”

In some cases this is simple. For example, if we’re looking to tag our document with the term “Passport Number”, we can specify that the tag is applied whenever a particular regular expression matches text in the document.
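To make the regular expression case concrete, here is a minimal Python sketch of rule-based tagging. It is an illustration only, not how ConceptSearching is configured: the `tags_for` helper and the `RULES` mapping are hypothetical names, and the pattern simply assumes a passport number is nine standalone digits, which will not hold for every country’s format.

```python
import re

# Assumption for illustration: a passport number is nine standalone digits.
# Real formats vary by country, so a production rule would need to be stricter.
PASSPORT_PATTERN = re.compile(r"\b\d{9}\b")

# Hypothetical rule table: tag term -> compiled regular expression.
RULES = {"Passport Number": PASSPORT_PATTERN}


def tags_for(text, rules):
    """Return every tag term whose pattern is found somewhere in the text."""
    return [term for term, pattern in rules.items() if pattern.search(text)]


if __name__ == "__main__":
    print(tags_for("Passport no. 123456789, issued in London.", RULES))
    print(tags_for("Minutes of the quarterly planning meeting.", RULES))
```

A pattern this loose would also match any other nine-digit number, so in practice you would expect to tighten it or combine it with nearby keywords.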

But other times it’s not as simple. For example, if our term is “Surgical” and we found a document with the phrase “triple heart bypass”, then we probably want to tag it. But we can’t afford for a human to sit down and make a list of all the possible words or phrases that could indicate we have a “Surgical” document. Even if someone did, they would miss things and make mistakes.

This is where ConceptSearching can help us…

The technology collects data about the frequency of words and phrases it finds across a batch of documents. Using this data, it suggests commonly found words or phrases, such as “triple heart bypass”, that would indicate the document should be tagged as “Surgical”. In short, the technology is using data to answer a question, and that’s machine learning.
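The general idea of counting phrases across a batch can be sketched in a few lines of Python. This is not ConceptSearching’s actual algorithm, which we are not describing here. It is a simplified stand-in that counts how many documents each one- to three-word phrase appears in and suggests the multi-word phrases shared by several documents. The function names and the `min_docs` and `min_words` thresholds are assumptions made for the example.

```python
import re
from collections import Counter


def ngrams(text, max_n=3):
    """Yield every run of 1 to max_n consecutive words, lower-cased."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def suggest_phrases(documents, min_docs=2, min_words=2):
    """Suggest phrases that appear in at least min_docs documents.

    Counts document frequency (each phrase is counted once per document),
    keeps phrases of at least min_words words, and sorts the most widely
    shared, longest phrases first.
    """
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(ngrams(doc)))
    candidates = [
        (phrase, count)
        for phrase, count in doc_freq.items()
        if count >= min_docs and len(phrase.split()) >= min_words
    ]
    return sorted(candidates, key=lambda pc: (-pc[1], -len(pc[0].split()), pc[0]))


if __name__ == "__main__":
    batch = [
        "Patient underwent a triple heart bypass last year.",
        "Recovery after triple heart bypass surgery.",
        "Invoice for office supplies.",
    ]
    for phrase, count in suggest_phrases(batch):
        print(count, phrase)
```

A human reviewer would still decide which suggested phrases really belong under a term like “Surgical”; the counting only narrows down what they need to look at.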

The benefit of all this is that we’re able to retrospectively tag our documents with rich metadata in a consistent, automated fashion, regardless of which repository they are stored in, on premises or in the cloud. Ultimately, this empowers the organisation to know “what is our content about?”, helping them to protect it and maximise its usefulness.