Databricks Lakehouse Fundamentals — Summary, Review & Notes

Why • Who • What • How of your first step in Databricks’ Lakehouse

Aditya Bajaj
Product Person

--

Databricks Academy — Lakehouse Fundamentals — Certificate

Why do this:

Most of us, especially in the IT industry, deal with large volumes of data, and that requires us to get better at managing and navigating the data landscape. Data warehouses and similar tools are a great first step, but as the data grows we need better tools. A data lakehouse introduces us to better data management and, most importantly, to better use of that data. This course provides a very high-level overview of the challenges in existing data warehouse and data lake systems and of how the data lakehouse addresses them.

Who should do this:

If you deal with large volumes of data, you should take this course. Some common IT roles that can benefit from it:

  • Data Engineers
  • Data Scientists
  • Product Persons (Product Owner / Manager / Leader), especially Data/Cloud Product Persons
  • Business / Operations Users: if you work directly with the data and can or want to query it, build dashboards, and so on

What you get:

On completion of this course, you’ll get these three things:

  1. Valuable learning and the curiosity to learn more. You’ll learn about the fundamental differences between a data lake and a data warehouse, which gave birth to the idea of combining the benefits of both into the lakehouse.
  2. A Certificate of Completion: https://api.accredible.com/v1/frontend/credential_website_embed_image/certificate/68528748
  3. A badge to show off on your professional sites. It takes a couple of hours for the badge to arrive at your registered email address, so don’t panic; sit back and celebrate once you complete the course ;-)

What is in the course:

Videos included in this training: [Source]

  • Intro to Data Lakehouse
  • Intro to Databricks Lakehouse Platform
  • Intro to Databricks Lakehouse Platform Architecture and Security Fundamentals
  • Intro to Supported Workloads on the Databricks Lakehouse Platform

My Key Takeaway Topics:

  • Differences between, and shortcomings of, the Data Warehouse and the Data Lake
  • Why a Data Lakehouse (its benefits)
  • Concepts of: Delta Lake, Unity Catalog, Delta Sharing, Photon, …
  • Identity Management, Security, & Governance
  • Data Management
  • Compute in Data Lakehouse (Classic vs Serverless)
  • Data Engineering, Data Streaming
  • Data Science & Machine Learning capabilities of Databricks Lakehouse

Quick Note from the Website: [Source]

The Databricks Lakehouse Platform combines the best elements of data lakes and data warehouses to deliver the reliability, strong governance and performance of data warehouses with the openness, flexibility and machine learning support of data lakes.

This unified approach simplifies your modern data stack by eliminating the data silos that traditionally separate and complicate data engineering, analytics, BI, data science and machine learning. It’s built on open source and open standards to maximize flexibility. And, its common approach to data management, security and governance helps you operate more efficiently and innovate faster.

How to go about it:

Getting started is fairly simple. My recommendation is to create an account on the Databricks Academy rather than starting with the form on the course’s home page.

Start here: https://www.databricks.com/learn/training/login

Most likely you’ll fall under “Customers and prospects”.

I made two mistakes. First, I started with the home page and filled out the marketing form, after which I was still required to register, so you can skip that step. Second, I signed up as a Partner. I do not know what impact that has, but I believe some course content may differ.

You can browse this to get an idea of the course.

My Reviews:

Pros:

  1. This course is a very good technical & functional introduction to the concept.
  2. The course is easy to navigate and follow, and the diagrams help a lot in understanding the complex concepts.

Cons:

  1. This course doesn’t teach you how to use the platform. I wish there were a chapter where you could at least try it out.
  2. At some points, the course gets too technical and expects you to already understand the architecture.

Conclusion:

If you work with large data volumes, I highly recommend investing a few hours to familiarize yourself with the concept of the lakehouse and with the Databricks Lakehouse. This course will serve you especially well if you are a Data Product Person (Product Owner/Product Manager/Product Leader).
