Data Virtualization and SnowflakeDB: A Powerful Combination

The star of SnowflakeDB keeps rising. This cloud-based, analytical SQL database server offers a set of powerful features, not in the least of which is high performance, end-to-end encryption, cloud platform independency, a self-driving engine, automatic scalability (up or down), and flexibility.

Some may wonder what would be the advantage in using data virtualization in combination with SnowflakeDB. Data virtualization can extract data from other data sources, it offers transformation and data integration power, and so on, but most of these features are supported by SnowflakeDB as well. So what would be the reason for combining them? Though the two do offer some overlapping features, data virtualization also offers some features that are not supported by SnowflakeDB, or are only supported in a limited way.

Distinguishing Features

First of all, it’s not very likely that all of an organization’s data is eventually stored in SnowflakeDB. Most enterprise data isn’t currently stored in SnowflakeDB and won’t be stored there for a long time; big data can be too big to move centrally and may have to remain at remote sites; some data may be stored in cloud applications; and other data sources are highly secure, in which case the data is not allowed to be copied to another database. Still, to analyze all the data, one interface for all the data would simplify the development of reports and dashboards. Data virtualization can help by making a heterogeneous set of data sources look like one logical database. One of those data sources can be SnowflakeDB.

SnowflakeDB can access data from other data sources through external tables. However, these data sources are restricted to files and specific Hadoop-related file formats. Data virtualization, on the other hand, can access data from many more sources, including SQL databases, many file formats, applications, spreadsheets, and service busses. This technology was primarily developed to address the inherent heterogeneity of current data processing systems.

Also, data virtualization enables stakeholders to centrally manage security across diverse data sources. Instead of having to define separate security specifications for different data sources using different specification languages, all security specifications can be handled in one uniform way using data virtualization.

Another difference relates to views. With data virtualization all the transformation, integration, aggregation, and filtering specifications are defined using views. SnowflakeDB is a SQL database server and supports views as well. By the way, this is understandably the main reason why some think that data virtualization is superfluous when SnowflakeDB is in the picture; both support views.

But there are key differences. First, data virtualization processes views that join multiple sources more efficiently. Its query optimizer is built to run distributed joins. Second, data virtualization allows for the definition of metadata; views and their columns can be described, defined, and tagged. This metadata is searchable by business users and professional IT developers.

The third difference regarding views relates to lineage and impact analysis. Data virtualization can show which views are dependent on which sources and/or other views. Lineage can be leveraged to trace the source of the data used, and also what type of operations have been applied to the data over time. Essentially, lineage offers business users more transparency. Impact analysis can be used to determine what will happen to other views when the definition of a view is modified. So, the impact of such a change can be understood beforehand.

The data stored in SnowflakeDB can only be accessed through SQL. Data virtualization allows data consumers to deploy other languages or APIs. Depending on the requirements of the data consumer, this can be SQL, but it can also be REST, OData, GraphQL, SOAP, and other formats.

The last feature I want to mention here is database server independency. If an organization needs to deploy another database technology for specific applications, data virtualization can hide this. Applications and reports do not have to be changed in order to use the database technology correctly. In other words, data virtualization hides the SQL dialect of the data source in use.

The Best of Both Worlds

The advantage of this combination of technologies is the streamlined data access of data virtualization and the speed, scalability, and flexibility of the SnowflakeDB engine. SnowflakeDB can also be used by data virtualization as sources for all the cached views.

The combination of data virtualization and SnowflakeDB is a powerful one. Data virtualization may feel superfluous when SnowflakeDB is used, but that’s not the case if you consider the entire data architecture responsible for data processing, storage, and analytics. The two clearly augment each other, enabling a flexible, scalable data architecture.

Originally published at http://www.datavirtualizationblog.com on January 23, 2020.

--

--

--

We do #DataVirtualization We care about #AgileBI, #BigData #Analytics, #Dataservices, #DataManagement, Logical #DataWarehouse Web, #SaaS and #Cloud integration.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Why I choose Linux over Windows

FREE TRANSACTIONS — SAVE MOWA

So what is an API and how can this 10x your business revenue ?

Selection Sort In Detail

Fun and Useful CLI Tools for Software Developers

MY GSOC 2021 ARTICLE #7

The Pros And Cons Of Using AWS Cognito For User Authentication

5 Tips to Solve a Programming Problem

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
denodo

denodo

We do #DataVirtualization We care about #AgileBI, #BigData #Analytics, #Dataservices, #DataManagement, Logical #DataWarehouse Web, #SaaS and #Cloud integration.

More from Medium

Redshift vs Snowflake: The Definitive Guide

Snowflake Micro-partitions, Data Clustering & Zero-copy Cloning

Part 1 : How to automate data loading from a Cloud datalake into Snowflake ?

Why are Data Warehouses evolving to Lake Houses? Part2 — Access raw data after ETL