Microsoft has positioned its new product, Microsoft Fabric, as the all-in-one data and analytics platform, covering everything from data movement and ETL processes to data science, real-time analytics, business intelligence, and visualization. Designed as Software as a Service (SaaS), Fabric combines these areas into one integrated analytical system, with OneLake as its foundational data layer. In this article, we introduce the OneLake technology, its latest features, and its use cases in analytics, with a particular focus on Mirroring and (multi-cloud) shortcuts.
OneLake as a central data hub
OneLake is the data layer on which all Fabric experiences are built. There is exactly one OneLake per Microsoft Entra tenant, so it can serve the data storage needs of the entire organization. It can be thought of as the organization's central data hub, comparable to OneDrive in Microsoft 365.
There are two central concepts:
One Format
One Copy
One Format means that the open Delta Parquet format (Parquet data files plus a Delta Lake transaction log) is used as the single common format for tabular data. Consequently, all Fabric compute engines are optimized to work with Delta Parquet as their native format.
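To make the One Format idea more concrete: a Delta table is a folder of Parquet data files plus a `_delta_log` directory of JSON commit files, and a reader reconstructs the current version of the table by replaying the `add` and `remove` actions of those commits in order. The following is a minimal, simplified sketch of that replay, assuming the commit files have already been read into strings. The file names are hypothetical, and real readers also handle checkpoints, metadata, and protocol actions, which are omitted here.

```python
import json


def active_files(commits: list[str]) -> set[str]:
    """Replay simplified Delta log commits and return the data files of the latest table version.

    Each commit is newline-delimited JSON, as in the _delta_log/*.json files.
    Only 'add' and 'remove' actions are handled; all other actions are ignored.
    """
    files: set[str] = set()
    for commit in commits:  # commits must be passed in version order (0, 1, 2, ...)
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


# Hypothetical history: version 0 writes two files, version 1 compacts them into one.
commit_0 = "\n".join([
    json.dumps({"add": {"path": "part-000.parquet", "dataChange": True}}),
    json.dumps({"add": {"path": "part-001.parquet", "dataChange": True}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "part-000.parquet", "dataChange": False}}),
    json.dumps({"remove": {"path": "part-001.parquet", "dataChange": False}}),
    json.dumps({"add": {"path": "part-002.parquet", "dataChange": False}}),
])

print(sorted(active_files([commit_0, commit_1])))  # ['part-002.parquet']
```

Because every engine reads the same log and the same Parquet files, none of them needs its own engine-specific copy of the data.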
One Copy describes the principle of unified data storage, which removes the need to duplicate or move data within the organization’s digital infrastructure. Each data entity exists as one physical copy that can be referenced through so-called shortcuts across capacities and workspaces within the lake. Once data is stored, it is accessible to all engines (e.g., T-SQL, Apache Spark, and KQL) without any further import or export. In addition, all data is organized in a hierarchical namespace in the OneLake data hub, similar to the folders in Windows File Explorer. The following picture gives an overview of these concepts, including the separation of storage and compute.
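The One Copy principle can be illustrated with a toy model: a namespace in which a shortcut is simply an entry pointing to another path, and reading a path follows shortcuts until it reaches the single physical location. The class `ToyLakeNamespace` and all paths below are hypothetical and only illustrate the idea; they are not part of any Fabric API.

```python
class ToyLakeNamespace:
    """Hypothetical in-memory model of a lake namespace with shortcuts (not a Fabric API)."""

    def __init__(self) -> None:
        self._data: dict[str, list[dict]] = {}  # physical path -> rows (the only copy)
        self._shortcuts: dict[str, str] = {}    # shortcut path -> target path

    def write(self, path: str, rows: list[dict]) -> None:
        self._data[path] = rows

    def create_shortcut(self, path: str, target: str) -> None:
        self._shortcuts[path] = target

    def resolve(self, path: str) -> str:
        """Follow shortcuts until a physical path is reached, rejecting cycles."""
        seen: set[str] = set()
        while path in self._shortcuts:
            if path in seen:
                raise ValueError(f"shortcut cycle at {path}")
            seen.add(path)
            path = self._shortcuts[path]
        return path

    def read(self, path: str) -> list[dict]:
        return self._data[self.resolve(path)]


lake = ToyLakeNamespace()
lake.write("finance_ws/sales_lh/Tables/orders", [{"id": 1, "amount": 42.0}])
lake.create_shortcut("marketing_ws/campaign_lh/Tables/orders", "finance_ws/sales_lh/Tables/orders")

# Both paths return the same list object: one physical copy, two logical locations.
assert lake.read("marketing_ws/campaign_lh/Tables/orders") is lake.read("finance_ws/sales_lh/Tables/orders")
```

A change written through the owning workspace is therefore immediately visible through every shortcut, without any copy job in between.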
Updates and new features
At the Fabric Community Conference 2024, Microsoft made many exciting announcements about the future development of Microsoft Fabric. Let’s take a closer look at two new features related to OneLake and at how they add substantial value to Fabric as an all-in-one analytics solution.
Company data is often spread across different domains or departments (e.g., finance, marketing, HR). These departments, however, may need access to the same or similar data sources. If data is copied between domains, it can become unclear who owns the “source of truth”, which in turn erodes trust in your data products and teams. Maintaining the pipelines responsible for these data movements can also be costly. Fabric addresses these issues with two features: Mirroring and shortcuts.
Multi-Cloud Shortcuts
As mentioned before, OneLake acts as a unified data lake: with shortcuts, data can live in different locations while still being logically represented within the same lake. Shortcuts are symbolic links that point to a particular storage location. This location may be internal, for instance a lakehouse in another workspace, or external, such as Azure Blob Storage, Amazon S3 buckets, or Google Cloud Storage buckets; shortcuts to Google Cloud Storage are now in public preview. Furthermore, Microsoft has announced for Q3 of this year that it will be possible to create shortcuts to data stored in the Apache Iceberg format, which will then be accessible as Delta Parquet in your OneLake. Cross-cloud shortcut caching is an important lever for reducing egress costs: files read through a shortcut to another cloud can be cached in OneLake, so repeated reads do not have to be fetched from the external provider again. Shortcuts can be created from lakehouses or KQL databases and managed programmatically through dedicated REST APIs. A related example of programmatic management is the dataAccessRoles API, which is now generally available and lets you programmatically create, update, and assign lakehouse data access roles to individuals or groups.
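Since shortcuts can be managed through REST APIs, here is a minimal sketch that only builds the URL and JSON body for creating a shortcut to an Amazon S3 location. It is based on our reading of the Fabric REST API's Create Shortcut operation (a POST to `https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{itemId}/shortcuts`); the field names `path`, `name`, `target`, `amazonS3`, `location`, `subpath`, and `connectionId` should be verified against the current API reference. The helper functions, bucket URL, shortcut name, and IDs are hypothetical placeholders, and actually sending the request with a Microsoft Entra access token is deliberately left out.

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def shortcut_url(workspace_id: str, item_id: str) -> str:
    """Hypothetical helper: endpoint for creating a shortcut in an item (e.g. a lakehouse)."""
    return f"{FABRIC_API}/workspaces/{workspace_id}/items/{item_id}/shortcuts"


def s3_shortcut_body(name: str, parent_path: str, bucket_url: str,
                     subpath: str, connection_id: str) -> dict:
    """Hypothetical helper: request body for a shortcut pointing to an Amazon S3 location.

    parent_path is where the shortcut appears in the lakehouse, e.g. 'Files' or 'Tables'.
    connection_id is assumed to reference an existing Fabric cloud connection to S3.
    """
    return {
        "path": parent_path,
        "name": name,
        "target": {
            "amazonS3": {
                "location": bucket_url,
                "subpath": subpath,
                "connectionId": connection_id,
            }
        },
    }


# Placeholder values for illustration only.
url = shortcut_url("<workspace-id>", "<lakehouse-id>")
body = s3_shortcut_body(
    name="partner_products",
    parent_path="Files",
    bucket_url="https://example-bucket.s3.us-east-1.amazonaws.com",
    subpath="/products",
    connection_id="<connection-id>",
)
print(url)
print(json.dumps(body, indent=2))
```

For other external targets, the `amazonS3` key would be replaced by the corresponding target type (for example `adlsGen2` or `googleCloudStorage`); again, confirm the exact names in the API reference before relying on them.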
The multi-cloud shortcuts feature is a big step towards distributed ownership of data in OneLake and helps unify your data landscape. The Fabric community is now looking forward to support for on-premises and network-restricted environments, which Microsoft has announced for the future.
Database Mirroring
Another exciting feature now available in public preview is database Mirroring. It keeps data synchronized between a source database and Fabric without the need to build complex ETL pipelines: an entire database, or a subset of its tables, is replicated to OneLake and kept up to date in near real time. Mirroring also incurs no additional storage costs. Under the hood, it reads changes from the transaction log of the source database and applies them to the replicated data in OneLake.
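The principle behind Mirroring, replaying a source database's change log against a replica, can be illustrated with a small sketch. This is a generic change-data-capture replay, not Fabric's implementation: the `Change` record, its fields, the operation names, and `apply_changes` are hypothetical, and the replica is a plain dictionary keyed by primary key rather than a Delta table in OneLake.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical change record, as it might be read from a source transaction log."""
    lsn: int                    # log sequence number: defines the order of changes
    op: str                     # 'insert', 'update' or 'delete'
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image (None for deletes)


def apply_changes(replica: dict[int, dict], changes: list[Change], last_lsn: int) -> int:
    """Apply changes newer than last_lsn in LSN order and return the new high-water mark."""
    for change in sorted(changes, key=lambda c: c.lsn):
        if change.lsn <= last_lsn:
            continue  # already applied in an earlier sync
        if change.op in ("insert", "update"):
            replica[change.key] = dict(change.row)
        elif change.op == "delete":
            replica.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
        last_lsn = change.lsn
    return last_lsn


replica: dict[int, dict] = {}
log = [
    Change(1, "insert", 10, {"id": 10, "status": "open"}),
    Change(2, "insert", 11, {"id": 11, "status": "open"}),
    Change(3, "update", 10, {"id": 10, "status": "shipped"}),
    Change(4, "delete", 11),
]
watermark = apply_changes(replica, log, last_lsn=0)
print(replica, watermark)  # {10: {'id': 10, 'status': 'shipped'}} 4
```

Tracking the last applied log position lets such a process resume after an interruption without applying a change twice, and only changed rows are touched instead of copying the full table again.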
Having source and replica databases is nothing new; it is a common pattern. However, the ease with which data from several different proprietary database systems can be mirrored into OneLake offers a lot of potential. So why is this so exciting, and what are the benefits? One main benefit is that Mirroring shields operational databases from analytical queries. With their many joins and aggregations, such queries can be quite demanding for operational databases, and running them against the mirror instead reduces that workload. Another advantage is seamless cross-database querying directly in Fabric. Mirroring also supports schema evolution of tables and compatible data type changes.
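Cross-database querying means that tables mirrored from different sources can be joined in a single query. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in: two attached in-memory databases play the role of two mirrored databases, so one query can join across both. In Fabric's SQL analytics endpoint, tables in other databases of the same workspace would instead be addressed with three-part names (`database.schema.table`); the database, table, and column names here are made up.

```python
import sqlite3

# One connection, two attached in-memory databases standing in for two mirrored sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS erp")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE erp.orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, segment TEXT)")
conn.executemany("INSERT INTO erp.orders VALUES (?, ?, ?)",
                 [(1, 100, 250.0), (2, 100, 75.0), (3, 200, 40.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(100, "enterprise"), (200, "smb")])

# Analytical join and aggregation across both databases, run against the
# replicas rather than against the operational systems.
rows = conn.execute("""
    SELECT c.segment, SUM(o.amount) AS revenue
    FROM erp.orders AS o
    JOIN crm.customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
""").fetchall()
print(rows)  # [('enterprise', 325.0), ('smb', 40.0)]
```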
Perhaps the most significant advantage is the potential of real-time analytics, combined with a shorter time to value and the ability to build powerful data products quickly. Power BI can access mirrored database tables in Direct Lake mode, which makes it easy to create dashboards and reports for near real-time monitoring. There are many use cases in which real-time analytics can improve business processes and increase customer satisfaction. Currently, Mirroring is available for Azure Cosmos DB, Azure SQL Database, and Snowflake, and Microsoft has announced support for further databases in the coming months.
Conclusion
As the heart of Microsoft Fabric and its foundational data layer, OneLake offers a unified and cohesive data lake for all Fabric experiences. Robust data governance nevertheless remains essential for organizational integrity. The two new features highlighted here add value to Fabric and OneLake in different ways: (multi-cloud) shortcuts make data more accessible and simplify the integration of new data sources while minimizing data redundancy, and database Mirroring enables near real-time analytics use cases and more centralized data management. By combining shortcuts and Mirroring, organizations can more easily establish virtualized lakes, break down data silos across entities, avoid waiting for IT to build and maintain pipelines, and reduce the resources and time spent on data migration. As a Fabric Featured Partner, Obungi actively participates in private previews of new Fabric features and is in constant dialogue with Microsoft’s product teams.