In the fast-changing technological world, addressing tech debt is a significant challenge. Not only does it entail a learning curve, but it also affects feature release cycles. Moreover, with the emergence of Big Data and Machine Learning, it is imperative to keep your technology updated to benefit from performance and security enhancements. Further, with the rise of the cloud, updates arrive in rapid cycles, which makes it challenging to keep up.
Speaking of the cloud, Microsoft Azure is one of the leading service providers, with a variety of PaaS offerings for building big data and analytics applications; Azure Data Lake is one of them. Azure Data Lake Store is a widely used PaaS offering from Microsoft Azure for storing big data, i.e., data of varying volume, variety, and velocity. While Azure Data Lake Gen1 was an excellent service, Microsoft introduced Azure Data Lake Gen2 for additional security and performance.
Hence, it is imperative for organizations using ADLS Gen1 to migrate to ADLS Gen2. Let us look at certain aspects of migrating from Azure Data Lake Gen1 to Gen2:
The first aspect of this migration effort is the deployment and governance of Azure Data Lake Gen2. This is usually done using PowerShell. The article below details the governance aspect of Azure Data Lake Gen2 using PowerShell:
Managing Azure Data Lake Gen2 with Powershell
We commission a Data Lake store to perform analytics on the stored data. For this, Azure Databricks is the tool of choice. Please note that, unlike Azure Data Lake Gen1, Gen2 no longer supports U-SQL. Hence, converting existing U-SQL code to Azure Databricks is an additional effort.
In addition, there are a few very important changes in how Azure Databricks connects to ADLS Gen2. The following article covers them in detail:
Azure Data Lake Gen2 and Azure Databricks
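One change that typically surfaces when moving Databricks code from Gen1 to Gen2 is the storage path itself: Gen1 data is addressed with adl:// URIs, whereas Gen2 data is addressed with abfss:// URIs that also name a file system (container). The following is a minimal Python sketch of rewriting such paths. The helper name gen1_to_gen2_uri and the account and file system names are hypothetical, chosen for illustration only; this is not part of any Azure SDK.

```python
from urllib.parse import urlparse

# Host suffixes used by the two generations of the service.
GEN1_SUFFIX = ".azuredatalakestore.net"
GEN2_SUFFIX = ".dfs.core.windows.net"


def gen1_to_gen2_uri(gen1_uri: str, gen2_account: str, filesystem: str) -> str:
    """Rewrite an adl:// (Gen1) URI as an abfss:// (Gen2) URI.

    gen2_account and filesystem are supplied by the caller, because the
    Gen2 storage account and its file system (container) are created
    separately and need not share the Gen1 account's name.
    """
    parsed = urlparse(gen1_uri)
    if parsed.scheme != "adl" or not parsed.netloc.endswith(GEN1_SUFFIX):
        raise ValueError(f"not an ADLS Gen1 URI: {gen1_uri!r}")
    # Keep the folder/file path unchanged; only scheme and host differ.
    path = parsed.path.lstrip("/")
    return f"abfss://{filesystem}@{gen2_account}{GEN2_SUFFIX}/{path}"


if __name__ == "__main__":
    # Hypothetical account and file system names.
    print(gen1_to_gen2_uri(
        "adl://oldlake.azuredatalakestore.net/raw/sales/2020.csv",
        gen2_account="newlake",
        filesystem="data",
    ))
```

A helper like this can be run over the paths in existing notebooks or job configurations to list what needs to change. The authentication setup is a separate matter and is not covered by this sketch.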
The last service related to Azure Data Lake is Azure Data Factory. It has become the de facto ETL/ELT tool in the Azure data stack. Thus, it remains an essential tool, even with Azure Data Lake Gen2.
However, there are some exciting changes to the way it connects to Azure Data Lake Gen2, namely the use of Managed Identity. This is possible because Azure Data Factory is now a trusted service in Azure Storage (of which Azure Data Lake Gen2 is a flavor). Read this article for more details:
Managed Identity between Azure Data Factory and Azure storage
Please note that the previous article details Managed Identity using RBAC. However, that is not allowed in many organizations, since they prefer using Access Control Lists (ACLs). Please refer to the next article for more details on ACLs:
Azure Data Lake Gen2 Managed Identity using Access Control Lists
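To make the ACL approach more concrete: ADLS Gen2 ACLs are POSIX-style, and each entry is commonly written as a short string of the form [default:]type:id:permissions, with entries joined by commas. The following is a minimal Python sketch that only builds and validates such strings. The function names and the object ID are hypothetical placeholders; applying the resulting string to a folder would still be done with your tooling of choice.

```python
import re

# Allowed entry types and the rwx permission pattern of a POSIX-style ACL.
ENTRY_TYPES = {"user", "group", "mask", "other"}
PERMISSIONS = re.compile(r"^[r-][w-][x-]$")


def acl_entry(kind: str, permissions: str, object_id: str = "",
              default: bool = False) -> str:
    """Build one ACL entry, e.g. 'user:<object-id>:r-x'.

    An empty object_id refers to the owning user or owning group.
    default=True produces a default ACL entry, which new child items
    created under a folder inherit.
    """
    if kind not in ENTRY_TYPES:
        raise ValueError(f"unknown ACL entry type: {kind!r}")
    if not PERMISSIONS.match(permissions):
        raise ValueError(f"invalid permissions: {permissions!r}")
    scope = "default:" if default else ""
    return f"{scope}{kind}:{object_id}:{permissions}"


def build_acl(entries: list[str]) -> str:
    """Join individual entries into a single comma-separated ACL string."""
    return ",".join(entries)


if __name__ == "__main__":
    # Placeholder for the object ID of a Data Factory managed identity.
    adf_identity = "00000000-0000-0000-0000-000000000000"
    print(build_acl([
        acl_entry("user", "rwx", adf_identity),
        acl_entry("user", "rwx", adf_identity, default=True),
    ]))
```

Pairing an access entry with a matching default entry, as in the usage example, is one common pattern: the first governs the folder itself, and the second is inherited by items created under it later.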
I hope that this article was helpful. We guarantee neither its completeness nor its accuracy. Reader discretion is advised.