metadata
mindmap
auto-discovery
- auto-discovery is the process of automatically discovering data assets in a data lake.
- metadata collection should be automated as much as possible, using crawlers over sources such as databases and files. the collected metadata feeds into the catalog.
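As a rough illustration of the crawler idea, the sketch below walks a directory tree and emits one catalog record per file. It assumes the data lake is reachable as a local file system; `AssetRecord` and `crawl` are hypothetical names chosen for this example, not part of any particular catalog product.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AssetRecord:
    """Hypothetical catalog entry holding basic technical metadata."""
    path: str        # path relative to the crawled root
    format: str      # file extension, lower-cased, or "unknown"
    size_bytes: int


def crawl(root: str) -> list[AssetRecord]:
    """Walk `root` recursively and return one record per file found."""
    records = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            records.append(
                AssetRecord(
                    path=str(p.relative_to(root)),
                    format=p.suffix.lstrip(".").lower() or "unknown",
                    size_bytes=p.stat().st_size,
                )
            )
    return records
```

A scheduler could call `crawl` periodically and upsert the returned records into the catalog. A database crawler would follow the same shape, reading table and column metadata instead of file attributes.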
auto-classification
- auto-classification is the process of automatically classifying data assets in a data lake.
- data assets should be classified automatically based on their content, which supports data governance and data security.
- using machine learning techniques to classify data assets is a good way to achieve auto-classification.
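To make content-based classification concrete, here is a minimal sketch that uses regular expressions rather than machine learning. It is a deliberately simpler stand-in for the ML approach mentioned above. The patterns, the labels, and the 0.8 threshold are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration; real deployments need far more care.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\- ]{6,14}\d$"),
}


def classify_column(values, threshold=0.8):
    """Return the first label whose pattern matches at least `threshold`
    of the non-empty sample values, else "unclassified"."""
    values = [v for v in values if v]
    if not values:
        return "unclassified"
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return label
    return "unclassified"
```

Calling `classify_column` on a sample of values from each column would let the catalog flag columns that look like personal data. A trained model could replace the pattern table without changing how the function is called.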
auto-tagging
- auto-tagging is the process of automatically tagging data assets in a data lake.
- identify commonly used terms, business glossary entries, etc., and apply them as tags to make assets easier to search.
- auto-tagging can be achieved using machine learning techniques.
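The glossary-driven part of auto-tagging can be sketched as a simple keyword lookup. This is not a machine learning approach. The `GLOSSARY` mapping and the `suggest_tags` function are hypothetical and only show how glossary terms might turn into tags.

```python
import re

# Hypothetical business glossary: tag -> terms that should trigger it.
GLOSSARY = {
    "customer": {"customer", "client", "cust"},
    "revenue": {"revenue", "sales", "income"},
}


def suggest_tags(name, description=""):
    """Suggest tags whose glossary terms appear in the asset name or description."""
    tokens = set(re.findall(r"[a-z]+", f"{name} {description}".lower()))
    return sorted(tag for tag, terms in GLOSSARY.items() if tokens & terms)
```

For example, an asset named `cust_orders` and described as "Monthly sales per client" would receive both the `customer` and `revenue` tags under this glossary.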
data-lineage
- data lineage is the process of tracking the data from its source to its destination.
- the data lineage should be automatically tracked and maintained in the metadata.
- this can be achieved by injecting a metadata-recording layer into the data processing and data storage tasks.
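One way to picture such a recording layer is a decorator that wraps each processing task and logs what it read and wrote. The sketch below keeps the log in an in-memory list as a stand-in for the metadata store. `track_lineage`, `LINEAGE_LOG`, `upstream_of`, and the asset paths are assumptions made for this example.

```python
import functools
from datetime import datetime, timezone

LINEAGE_LOG = []  # stand-in for the metadata store (assumption)


def track_lineage(inputs, outputs):
    """Decorator that records a lineage event each time the task completes."""
    def decorator(task):
        @functools.wraps(task)
        def wrapper(*args, **kwargs):
            result = task(*args, **kwargs)
            LINEAGE_LOG.append({
                "task": task.__name__,
                "inputs": list(inputs),
                "outputs": list(outputs),
                "finished_at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@track_lineage(inputs=["raw/orders.csv"], outputs=["curated/orders.parquet"])
def clean_orders():
    # Placeholder for the real transformation.
    return "ok"


def upstream_of(asset):
    """Return the direct sources recorded for `asset`."""
    return sorted({src for e in LINEAGE_LOG if asset in e["outputs"] for src in e["inputs"]})
```

Because the recording happens in the wrapper, task authors only declare inputs and outputs, and lineage is captured without changes to the task body.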
user-customization
- it's important to allow admins and users to tag and classify assets manually.
- manual input is also the key way to handle cases that auto-classification and auto-tagging can't.
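A possible way to combine manual and automatic labels is to store them separately and let manual input take precedence. The `AssetLabels` structure below is a hypothetical sketch of that precedence rule, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssetLabels:
    """Hypothetical per-asset label store keeping auto and manual input apart."""
    auto_tags: set = field(default_factory=set)
    manual_tags: set = field(default_factory=set)
    removed_tags: set = field(default_factory=set)  # auto tags rejected by a user
    auto_class: Optional[str] = None
    manual_class: Optional[str] = None

    def effective_tags(self):
        # Manual additions always win; rejected auto tags are hidden.
        return (self.auto_tags - self.removed_tags) | self.manual_tags

    def effective_class(self):
        # A manual classification overrides the automatic one.
        return self.manual_class or self.auto_class
```

Keeping the two sources apart means a later automatic run can refresh `auto_tags` and `auto_class` without overwriting corrections made by admins or users.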