In a fast-changing technology landscape, addressing tech debt is a significant challenge. It entails a learning curve and slows down feature release cycles. With the emergence of Big Data and Machine Learning, keeping your technology stack current matters even more, because newer versions bring performance and security enhancements. And with the cloud, updates arrive in rapid cycles, which makes it hard to keep up.
Migrating from Azure Data Lake Gen1 to Gen2:
Speaking of the cloud, Microsoft Azure is one of the leading service providers, with a variety of PaaS offerings for building big data and analytics applications; Azure Data Lake is one of them. Azure Data Lake Store is a widely used PaaS offering from Microsoft Azure for storing big data, i.e., data of varying volume, variety, and velocity. While Azure Data Lake Gen1 was an excellent service, Microsoft introduced Azure Data Lake Gen2 for additional security and performance.
Hence, it is imperative for organizations using ADLS Gen1 to migrate to ADLS Gen2. Let us look at certain aspects of migrating from Azure Data Lake Gen1 to Gen2:
Deployment and Governance
The first aspect of this migration effort is the deployment of Azure Data Lake Gen2 and its governance. This is usually done using PowerShell. A separate article details the governance aspect of Azure Data Lake Gen2 using PowerShell.
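As a rough illustration of what a Gen2 deployment boils down to, the sketch below builds an ARM template resource for a storage account with the hierarchical namespace enabled, which is what makes a storage account an Azure Data Lake Gen2 store. It is written in Python rather than the PowerShell mentioned above, and the account name, region, SKU, and API version are assumptions chosen for the example, not values from this article.

```python
import json


def gen2_storage_account_resource(name: str, location: str) -> dict:
    """Build an ARM template resource describing an ADLS Gen2 account.

    The name, location, SKU and apiVersion are illustrative assumptions.
    """
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2021-04-01",  # assumed; use a version your tooling supports
        "name": name,
        "location": location,
        "kind": "StorageV2",  # Gen2 is built on general-purpose v2 storage
        "sku": {"name": "Standard_LRS"},  # assumed redundancy tier
        "properties": {
            # The hierarchical namespace is what turns the account into Gen2.
            "isHnsEnabled": True,
        },
    }


if __name__ == "__main__":
    # "examplelake" and "westeurope" are hypothetical values.
    resource = gen2_storage_account_resource("examplelake", "westeurope")
    print(json.dumps(resource, indent=2))
```

The resulting JSON could be placed in the `resources` array of an ARM template and deployed with whichever tooling your organization already uses.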
Azure Databricks for Azure Data Lake Gen1 to Gen2
We commission a Data Lake store in order to perform analytics on the stored data. For this, Azure Databricks is the tool of choice. Please note that, unlike Azure Data Lake Gen1, Gen2 no longer supports U-SQL. Hence, converting the existing U-SQL code to Azure Databricks is an additional effort.
In addition, there are a few very important changes in how Azure Databricks connects to ADLS Gen2. A separate article covers these in detail.
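To give a feel for the connection changes, here is a minimal sketch of two of them: the path scheme moves from `adl://` to `abfss://`, and OAuth with a service principal is configured through per-account Hadoop ABFS settings. The function names, account names, container name, and credential placeholders are hypothetical; the article does not say which authentication method it uses, so the service principal approach is an assumption.

```python
from urllib.parse import urlparse

GEN1_SUFFIX = ".azuredatalakestore.net"


def gen1_to_gen2_uri(adl_uri: str, container: str, gen2_account: str) -> str:
    """Rewrite an ADLS Gen1 path (adl://) as an ADLS Gen2 path (abfss://).

    Gen2 adds a container (file system) level, so the caller supplies it.
    """
    parsed = urlparse(adl_uri)
    if parsed.scheme != "adl" or not parsed.netloc.endswith(GEN1_SUFFIX):
        raise ValueError(f"not an ADLS Gen1 URI: {adl_uri}")
    return f"abfss://{container}@{gen2_account}.dfs.core.windows.net{parsed.path}"


def oauth_spark_conf(account: str, client_id: str, client_secret: str,
                     tenant_id: str) -> dict:
    """Return the Spark/Hadoop settings for service principal access to Gen2."""
    host = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


if __name__ == "__main__":
    # All names below are hypothetical.
    print(gen1_to_gen2_uri(
        "adl://oldlake.azuredatalakestore.net/raw/sales/2020.csv",
        container="data",
        gen2_account="newlake",
    ))
```

In a Databricks notebook, each key and value from `oauth_spark_conf` would typically be applied with `spark.conf.set(key, value)`, with the client secret read from a secret scope rather than written into the notebook.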
Azure Data Factory for Azure Data Lake Gen1 to Gen2
The last service related to Azure Data Lake is Azure Data Factory. It has become the de facto ETL/ELT tool in the Azure data stack. Thus, it remains an essential tool, even with Azure Data Lake Gen2.
However, there are some exciting changes to the way it connects to Azure Data Lake Gen2, i.e., using Managed Identity. This is possible because Azure Data Factory is now a trusted service in Azure Storage (of which Azure Data Lake Gen2 is a flavor). A separate article has more details.
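For orientation, the following sketch builds the JSON definition of a Data Factory linked service for ADLS Gen2 that relies on the factory's system-assigned managed identity. With that identity, the definition carries only the account URL and no keys or secrets. The linked service name and account name are hypothetical, and this is a simplified sketch rather than a complete definition.

```python
import json


def adls_gen2_linked_service(name: str, account: str) -> dict:
    """Build an ADF linked service definition for ADLS Gen2.

    No credential fields are set: with a system-assigned managed identity,
    Data Factory authenticates as itself, so only the endpoint is needed.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",  # the ADF connector type for ADLS Gen2
            "typeProperties": {
                "url": f"https://{account}.dfs.core.windows.net",
            },
        },
    }


if __name__ == "__main__":
    # "LS_ADLS_Gen2" and "newlake" are hypothetical names.
    print(json.dumps(adls_gen2_linked_service("LS_ADLS_Gen2", "newlake"), indent=2))
```

The factory's identity still needs permission on the storage account, whether through an RBAC role or through ACLs, before the linked service can read or write data.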
Please note that the article on Managed Identity uses RBAC. However, many organizations do not allow that, since they prefer Access Control Lists (ACLs). A follow-up article has more details on ACLs.
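To make the ACL option more concrete, the sketch below assembles POSIX-style ACL entries in the comma-separated short form used for ADLS Gen2 (`[default:]type:id:permissions`). The helper name and the object ID are hypothetical, and the validation covers only the basic `rwx` flags, so treat it as an illustration rather than a complete implementation.

```python
VALID_KINDS = {"user", "group", "mask", "other"}


def acl_entry(kind: str, permissions: str, object_id: str = "",
              default: bool = False) -> str:
    """Format one ACL entry, e.g. 'user:<object-id>:r-x'.

    An empty object_id addresses the owning user or group.
    default=True produces a default ACL entry, which new child items inherit.
    """
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown ACL entry type: {kind}")
    # Each position must be its flag (r, w, x in that order) or '-'.
    if len(permissions) != 3 or any(
        p not in (flag, "-") for p, flag in zip(permissions, "rwx")
    ):
        raise ValueError(f"invalid permissions: {permissions}")
    entry = f"{kind}:{object_id}:{permissions}"
    return f"default:{entry}" if default else entry


if __name__ == "__main__":
    # Hypothetical object ID of the Data Factory managed identity.
    adf_object_id = "00000000-0000-0000-0000-000000000000"
    acl = ",".join([
        acl_entry("user", "rwx"),
        acl_entry("group", "r-x"),
        acl_entry("other", "---"),
        acl_entry("user", "r-x", object_id=adf_object_id),
        acl_entry("user", "r-x", object_id=adf_object_id, default=True),
    ])
    print(acl)
```

Granting the identity both an access entry and a default entry on a directory means that existing contents remain governed by their own ACLs, while newly created children inherit the default entry.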
I hope that this article was helpful. We guarantee neither its completeness nor its accuracy. Reader discretion is advised.