BI & Data Warehousing with your Lakehouse
Lakeview Dashboards are in public preview!
Lakeview Dashboards offer a new dashboarding experience, optimized for ease of use, broad distribution, governance and security.
In addition to a brand-new UX that makes it easier to plot insights, Lakeview Dashboards can be shared with users outside of your organization.
Create your first Dashboard now (video)
Governance and Unity Catalog
Discover and Organize your data in your Lakehouse
Building your semantic layer is getting easier. AI-Generated table comments automatically describe your data assets.
This will improve the new semantic search capabilities, letting you ask questions about your lakehouse in plain text (e.g., "List all the tables related to football").
Track your compute resources: Clusters and node types available as System Tables
System tables offer more insight into your lakehouse usage in plain SQL. They are available for Audit logs, Table and Column lineage, Billable usage, Pricing, Cluster and node types, Marketplace listing access, and Predictive Optimization.
For more information: Databricks Documentation or install the System tables demo with dbdemos
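As an illustration, here is a minimal sketch of querying two of these tables from a notebook, assuming the corresponding system schemas are enabled on your Unity Catalog metastore; the column names follow the documented schemas but should be verified in your workspace.

```python
# Minimal sketch: querying system tables in plain SQL from a notebook
# (assumes the system.compute and system.billing schemas are enabled).

# Cluster inventory: the latest definition of each cluster and its node types.
clusters = spark.sql("""
    SELECT cluster_id, cluster_name, worker_node_type, dbr_version, change_time
    FROM system.compute.clusters
    ORDER BY change_time DESC
    LIMIT 20
""")
display(clusters)

# Billable usage: DBUs consumed per SKU over the last 30 days.
usage = spark.sql("""
    SELECT sku_name, SUM(usage_quantity) AS dbus_consumed
    FROM system.billing.usage
    WHERE usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY sku_name
    ORDER BY dbus_consumed DESC
""")
display(usage)
```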
Ingestion and performance
10x faster DML Delta queries with Deletion Vectors (update, delete, merge)
Deletion vectors are going GA! Updating content in your tables no longer requires the engine to rewrite entire data files (write amplification). Instead, Delta Lake records the deleted or updated rows as separate information, making these operations up to 10x faster!
Deletion vectors are part of Predictive I/O, bringing AI to your Lakehouse for faster queries: See Predictive I/O documentation.
Deletion vectors will be enabled by default starting in DBR 14! (The default behavior can be changed in your workspace settings.) For more information:
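Before DBR 14, you can opt an existing table into deletion vectors with the documented table property; a minimal sketch (the table and column names are hypothetical):

```python
# Minimal sketch: enable deletion vectors on an existing Delta table
# (the table name main.demo.customers is hypothetical).
spark.sql("""
    ALTER TABLE main.demo.customers
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")

# Subsequent DML no longer rewrites whole data files: deleted or updated rows
# are tracked in deletion vectors instead.
spark.sql("DELETE FROM main.demo.customers WHERE is_test_account = true")
```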
Predictive Optimization: Faster queries and cheaper storage
Predictive Optimization leverages Unity Catalog and Lakehouse AI to determine the best optimizations to perform on your data, and then runs those operations on purpose-built serverless infrastructure (VACUUM, OPTIMIZE…). This significantly simplifies your lakehouse journey, freeing up your time to focus on getting business value from your data.
In just a click, you’ll get the power of AI-optimized data layouts across your Unity Catalog managed tables, making your data faster and more cost-effective.
Note: Predictive Optimization metrics are available as system tables (e.g., which tables have been optimized recently).
For more information:
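Enabling it is a single statement at the catalog or schema level; a minimal sketch with hypothetical catalog and schema names (the feature must also be enabled for your account and applies to Unity Catalog managed tables):

```python
# Minimal sketch: enable Predictive Optimization on a catalog, or override
# the setting at the schema level (catalog/schema names are hypothetical).
spark.sql("ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION")
spark.sql("ALTER SCHEMA main.demo ENABLE PREDICTIVE OPTIMIZATION")
```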
ML & AI + LLMs
Foundation LLM models available in the Marketplace
Llama 2 foundation chat models are now available in the Databricks Marketplace for fine-tuning and deployment on private model serving endpoints.
Each model is wrapped in MLflow and saved within Unity Catalog, making it easy to use the MLflow evaluation in notebooks and to deploy with a single click on LLM-optimized GPU model serving endpoints.
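For instance, once the Marketplace share has installed the model into a catalog, it can be loaded like any other Unity Catalog model with MLflow; a minimal sketch (the catalog, schema, model name, and version are hypothetical, and the prediction input format depends on the model's signature):

```python
import mlflow

# Point the MLflow client at the Unity Catalog model registry.
mlflow.set_registry_uri("databricks-uc")

# Hypothetical location where the Marketplace share installed the model.
model_uri = "models:/marketplace_llm.models.llama_2_7b_chat_hf/1"

# Load the wrapped model and run a quick sanity check before deploying it to
# an LLM-optimized model serving endpoint. The input format depends on the
# model's signature.
llm = mlflow.pyfunc.load_model(model_uri)
print(llm.predict({"prompt": "What is a lakehouse?", "max_tokens": 64}))
```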
Deploy private LLMs using Databricks Model Serving
These endpoints are pre-configured with GPUs and accelerated to serve foundation models, providing the best cost/performance ratio. This allows you to build and deploy GenAI applications, from data ingestion and fine-tuning to model deployment and monitoring, all on a single platform. Watch the video.
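Once the endpoint is running, any application can query it over REST; a minimal sketch (the workspace URL, endpoint name, secret scope, and payload shape are hypothetical and depend on the deployed model's signature):

```python
import requests

# Hypothetical workspace URL, endpoint name, and token stored in a secret scope.
WORKSPACE_URL = "https://my-workspace.cloud.databricks.com"
ENDPOINT_NAME = "llama2-7b-chat"
TOKEN = dbutils.secrets.get(scope="demo", key="serving_token")

# The payload shape depends on the model's signature; this assumes a simple
# prompt/params interface.
response = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "inputs": {"prompt": ["Summarize the lakehouse architecture in one sentence."]},
        "params": {"max_tokens": 128},
    },
)
print(response.json())
```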
Other updates
Unity Catalog: UCX – Unity Catalog Upgrade Toolkit
Need some help to upgrade your data assets to Unity Catalog? Try the new Databricks Labs project. Explore the GitHub repo or get started with a video.
Unity Catalog: Workspace-Catalog binding in Unity Catalog
While a metastore can be shared across multiple workspaces, you can now bind a catalog to specific workspaces, preventing it from being read or written from other workspaces (e.g., the “Development” workspace can only READ the “prod” catalog).
Watch the recording to get started
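A rough sketch of doing the same through the Unity Catalog REST API (catalog name, workspace ID, URL, and token are hypothetical; check the exact endpoints and fields in the API reference):

```python
import requests

# Hypothetical values: workspace URL, admin token, catalog name, workspace ID.
WORKSPACE_URL = "https://my-workspace.cloud.databricks.com"
TOKEN = dbutils.secrets.get(scope="demo", key="admin_token")
CATALOG = "prod"
DEV_WORKSPACE_ID = 1234567890

headers = {"Authorization": f"Bearer {TOKEN}"}

# 1. Switch the catalog from OPEN to ISOLATED so workspace bindings are enforced.
requests.patch(
    f"{WORKSPACE_URL}/api/2.1/unity-catalog/catalogs/{CATALOG}",
    headers=headers,
    json={"isolation_mode": "ISOLATED"},
)

# 2. Bind the catalog to the chosen workspace(s) only.
requests.patch(
    f"{WORKSPACE_URL}/api/2.1/unity-catalog/workspace-bindings/catalogs/{CATALOG}",
    headers=headers,
    json={"assign_workspaces": [DEV_WORKSPACE_ID]},
)
```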
Compute: Libraries are now supported in compute policies
If you are a workspace admin, you can now add libraries to compute policies. Compute resources that use the policy automatically install the libraries, and users can’t install or uninstall compute-scoped libraries on that compute. Read the cluster policies Documentation.
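As a sketch, a policy that pins libraries could be created through the cluster policies API roughly like this (the policy name, library list, and especially the `libraries` payload field are assumptions to verify against the API reference):

```python
import json
import requests

# Hypothetical workspace URL and admin token.
WORKSPACE_URL = "https://my-workspace.cloud.databricks.com"
TOKEN = dbutils.secrets.get(scope="demo", key="admin_token")

policy = {
    "name": "shared-analytics-policy",  # hypothetical policy name
    "definition": json.dumps({
        "spark_version": {"type": "fixed", "value": "14.1.x-scala2.12"},
    }),
    # Assumed field: libraries listed on the policy are installed automatically
    # on any compute that uses it, and users cannot add or remove them.
    "libraries": [
        {"pypi": {"package": "great-expectations==0.17.19"}},
        {"maven": {"coordinates": "com.databricks:spark-xml_2.12:0.16.0"}},
    ],
}

requests.post(
    f"{WORKSPACE_URL}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=policy,
)
```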
Workflows: Pass parameters in Databricks jobs and if/else conditions
You can now add parameters to your Databricks jobs that are automatically passed to all job tasks that accept key-value pairs. Additionally, you can now use an expanded set of value references to pass context and state between job tasks. Read the Documentation for Parameters.
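Inside a notebook task, job parameters arrive as widgets, and task values (one of the value references) can carry state between tasks; a minimal sketch with hypothetical parameter, key, and task names:

```python
# Job-level parameters are surfaced to notebook tasks as widgets
# (the parameter name "run_date" is hypothetical).
run_date = dbutils.widgets.get("run_date")

# Publish a value for downstream tasks...
dbutils.jobs.taskValues.set(key="row_count", value=42)

# ...which a later task reads back by naming the upstream task. The same value
# is also addressable as {{tasks.ingest.values.row_count}} in task settings.
row_count = dbutils.jobs.taskValues.get(
    taskKey="ingest", key="row_count", default=0, debugValue=0
)
```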
DAB: Databricks Asset Bundles
Bundles, for short, facilitate the adoption of software engineering best practices, including source control, code review, testing, and continuous integration and delivery (CI/CD).
Demo Center: Databricks Asset Bundles Demo
In a nutshell
- Databricks Runtime 14.1 is GA: Link
- You can run selected cells in a notebook.
- Structured Streaming from Apache Pulsar on Databricks (see the sketch after this list): Link
- Declare temporary variables in a session, which can be set and then referred to from within queries (see the sketch after this list): Link
- Arguments can be explicitly assigned to parameters using the parameter names published by the function (see the sketch after this list): Link
- Feature Engineering (Feature Store) in Unity Catalog is GA: Link
- On-demand feature computation is GA. ML features can be computed on-demand at inference time: Link
- Structured Streaming can perform streaming reads from views registered with Unity Catalog: Link
- Databricks AutoML Generated Notebooks are now saved as ML Artifacts: Link
- Models in Unity Catalog is GA: Link
- You can now drop some table features for Delta tables. Current support includes dropping deletionVectors and v2Checkpoint: Link
- Partner Connect now supports Dataiku, RudderStack, and Monte Carlo.
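For the Apache Pulsar bullet above, a minimal sketch of a streaming read (the broker URL, topic, target table, and checkpoint path are hypothetical; the option and column names follow the documented connector but should be verified):

```python
# Minimal sketch: Structured Streaming read from Apache Pulsar into a Delta
# table (service URL, topic, checkpoint path, and table name are hypothetical).
df = (spark.readStream
      .format("pulsar")
      .option("service.url", "pulsar://pulsar-broker.example.com:6650")
      .option("topics", "orders")
      .load())

(df.selectExpr("CAST(value AS STRING) AS payload")
   .writeStream
   .option("checkpointLocation", "/Volumes/main/demo/checkpoints/pulsar_orders")
   .toTable("main.demo.orders_bronze"))
```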
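And for the SQL session variables and named arguments bullets, a minimal sketch (the table, column, and function names are hypothetical):

```python
# Session variables: declare once per session, set, then reference in queries.
spark.sql("DECLARE VARIABLE min_amount DOUBLE DEFAULT 0.0")
spark.sql("SET VARIABLE min_amount = 100.0")
spark.sql("""
    SELECT count(*) AS big_orders
    FROM main.demo.orders            -- hypothetical table
    WHERE amount > min_amount
""").show()

# Named arguments: parameters are assigned explicitly with `=>`, so the call
# no longer has to follow the function's positional order.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.demo.clip(x DOUBLE, lo DOUBLE, hi DOUBLE)
    RETURNS DOUBLE
    RETURN least(greatest(x, lo), hi)
""")
spark.sql("SELECT main.demo.clip(hi => 1.0, lo => 0.0, x => 3.7) AS clipped").show()
```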
Highlights of the Databricks Blog posts