data-lake

mindmap

what's a data lake

  • A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
  • It's a concept, like cloud computing, rather than a specific technology
  • It's an architectural approach that allows enterprises to consolidate large, heterogeneous data assets at scale and uncover actionable insights from the consolidated data through various types of analytics

ben.wangz | About 2 min | data-lake | big-data, data-lake
datahub

ingest metadata of datasets

  • ingest binary files from S3
  • ingest tables (Parquet, CSV) from S3
  • ingest custom datasets from S3
    • virtual files
    • files whose content contains special metadata
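
The three ingestion cases above could be sketched roughly as follows. This is only an illustration in plain Python, not DataHub's API: the names `DatasetMetadata`, `describe_s3_object`, and `TABLE_SUFFIXES`, the `kind` labels, and the rule that a caller passes `custom_properties` to mark a custom dataset are all assumptions made for this example. No S3 call is made; the object's bucket, key, and size are taken as given.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Assumption: tables are recognized purely by file extension.
TABLE_SUFFIXES = {".parquet", ".csv"}


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record for one S3 object."""
    uri: str
    kind: str  # "table", "binary", or "custom"
    properties: dict = field(default_factory=dict)


def describe_s3_object(bucket, key, size_bytes, custom_properties=None):
    """Build a metadata record for an S3 object without reading it.

    If custom_properties is given, the object is treated as a custom
    dataset (for example a virtual file, or a file whose metadata was
    extracted from its content by the caller).
    """
    uri = f"s3://{bucket}/{key}"
    if custom_properties is not None:
        return DatasetMetadata(uri, "custom", dict(custom_properties))
    suffix = PurePosixPath(key).suffix.lower()
    if suffix in TABLE_SUFFIXES:
        props = {"format": suffix.lstrip("."), "size_bytes": size_bytes}
        return DatasetMetadata(uri, "table", props)
    # Anything else is recorded as an opaque binary file.
    return DatasetMetadata(uri, "binary", {"size_bytes": size_bytes})


if __name__ == "__main__":
    for record in (
        describe_s3_object("lake", "raw/events.parquet", 1024),
        describe_s3_object("lake", "models/weights.bin", 2048),
        describe_s3_object("lake", "views/daily", 0, {"virtual": True}),
    ):
        print(record)
```

In a real pipeline the records would then be handed to whatever ingestion client the metadata platform provides; that step is left out here because it depends on the platform's actual interface.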

query metadata of datasets


ben.wangz | Less than 1 minute | data-lake | big-data, data-lake, datahub