Data Discovery & Classification

With Data Discovery & Classification, you can identify, mark, and label sensitive and non-sensitive data in your databases. By classifying data according to the terms defined in the business glossary, it bridges the gap between your data catalog and your business glossary.

Overview

The Data Discovery & Classification module offers a suite of functionalities to classify the elements of the data catalog in accordance with terms defined in the business glossary.

With Blindata, you can identify and classify data of interest within a database, such as email addresses, tax codes, or other PII, based on rules tailored to your domain.

The output of the classification process is a comprehensive and accurate overview of the organization’s data, enabling informed decision-making and ensuring compliance with regulations.

Features

The Data Classification process starts with the extraction of metadata from various data sources, including databases, data warehouses, and cloud storage. Classification rules are then defined to assign labels to data. The process also includes the ability to define dictionaries that make it easier to identify data of interest, streamlining the classification process even further.

Automatically scan the target system’s metadata to identify the physical structures, including the names of tables, fields, and their types.
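As a rough illustration of what scanning physical metadata involves, the sketch below lists tables, columns, and declared types. It uses an in-memory SQLite database as a stand-in target system and does not show Blindata's connectors; the `scan_metadata` function and the `customers` table are hypothetical names chosen for the example.

```python
import sqlite3


def scan_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite database."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


# Hypothetical target system: a single table created only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, tax_code TEXT)")
catalog = scan_metadata(conn)
```

Other database engines expose the same information through their own system catalogs, so only the two queries would change.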

Define the set of all permissible values for a business glossary term, either by manually collecting a sample of data or by running a query on the target system.
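Building a dictionary from a query can be pictured as collecting the distinct values a query returns. The following is a minimal sketch under assumptions: SQLite stands in for the target system, and `build_dictionary`, the `countries` table, and its contents are all hypothetical.

```python
import sqlite3


def build_dictionary(conn, query):
    """Collect the distinct, non-null values returned by a single-column query."""
    return {row[0] for row in conn.execute(query) if row[0] is not None}


# Hypothetical reference table holding the permissible values of a glossary term.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (iso_code TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?)",
    [("IT",), ("FR",), ("DE",), ("IT",), (None,)],
)
country_codes = build_dictionary(conn, "SELECT iso_code FROM countries")
```

Storing the values in a set removes duplicates and keeps later membership checks cheap.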

Define a set of customizable criteria that are used to classify data within your target system based on specific business glossary terms.

You can use Regular Expressions or Dictionary comparisons to create rules that apply to both data and metadata.
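The two rule types can be sketched as simple predicates: a regular-expression rule that matches either data values or column names, and a dictionary rule that checks membership in a set of permissible values. The patterns, names, and dictionary below are illustrative assumptions and not Blindata's rule syntax; the email pattern in particular is deliberately loose.

```python
import re

# Assumed patterns for illustration only.
EMAIL_VALUE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # applied to data values
EMAIL_COLUMN = re.compile(r"e[-_]?mail", re.IGNORECASE)   # applied to column names
COUNTRY_CODES = {"IT", "FR", "DE"}                        # hypothetical dictionary


def regex_rule(pattern):
    """Build a rule that is true when the pattern is found in the text."""
    return lambda text: pattern.search(text) is not None


def dictionary_rule(values):
    """Build a rule that is true when the text is one of the permissible values."""
    return lambda text: text in values


matches_email_value = regex_rule(EMAIL_VALUE)
matches_email_column = regex_rule(EMAIL_COLUMN)
matches_country = dictionary_rule(COUNTRY_CODES)
```

Because both rule types share the same shape (text in, boolean out), the same rule can be evaluated against a column name or against sampled values.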

The Data Classification engine calculates the likelihood of a correct match by applying classification rules to the physical metadata or to a configurable sample of database records.

The Data Classification engine assigns a business glossary term to the physical structure based on the classification rule evaluations and gives a score that indicates how likely the match is to be correct.

You can check the score manually or use predefined score thresholds.
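One plausible way to picture scoring and thresholds is shown below. The source does not specify how the engine computes its score, so this sketch assumes the score is simply the fraction of sampled values that satisfy a rule; the function names and the threshold values 0.9 and 0.5 are hypothetical.

```python
def match_score(rule, sample):
    """Fraction of non-null sampled values that satisfy the rule (0.0 if none)."""
    values = [v for v in sample if v is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


def decide(score, accept_at=0.9, reject_below=0.5):
    """Map a score to an action; anything between the thresholds needs manual review."""
    if score >= accept_at:
        return "accept"
    if score < reject_below:
        return "reject"
    return "review"


# Hypothetical sample of an "email" column: 3 of the 4 non-null values match.
sample = ["a@x.io", "b@y.org", "n/a", None, "c@z.com"]
score = match_score(lambda v: "@" in v, sample)
decision = decide(score)
```

Here the score is 0.75, which falls between the two assumed thresholds, so the assignment is left for manual review.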

How to

Extract Data Catalog

With Blindata’s metadata connectors, you can extract information from databases, data warehouses, and cloud storage in real time, keeping your metadata accurate and up to date.

Define Dictionaries

Simply select the relevant data sources, set your query parameters, review the collected data, and select the permissible values. Save your dictionary and start classifying your data with accuracy and ease.

Validate Assignments

Review the evaluated assignments and their scores, then either intervene manually or define score thresholds to manage assignments automatically.

Connect Business Glossary to Data Catalog

Connect the business glossary to the data catalog either manually or automatically by using the Data Classification module.