Upgrading to Microsoft.Azure.Management.DataLake.Store 0.10.1-preview to Access Azure Data Lake Store Using C#

Introduction

Microsoft recently released a new nuget package to programmatically access the Azure Data Lake Store.  In a previous post Accessing Azure Data Lake Store from an Azure Data Factory Custom .Net Activity I am utilizing Microsoft.Azure.Management.DataLake.StoreFileSystem 0.9.6-preview to programmatically access the data lake using C#.  In this post I will go through what needs to be changed with my previous code to upgrade to the new nuget package.  I will also include a new version of the DataLakeHelper class which uses the updated sdk.

Upgrade Path

Since I already have a sample project utilizing the older sdk (Microsoft.Azure.Management.DataLake.StoreFileSystem 0.9.6-preview), I will use that as an example on what needs to be modified to use the updated nuget package (Microsoft.Azure.Management.DataLake.Store 0.10.1-preview).

The first step is to remove all packages which supported the obsolete sdk.  Here is the list of all packages that can be removed:

  • Hyak.Common
  • Microsoft.Azure.Common
  • Microsoft.Azure.Common.Dependencies
  • Microsoft.Azure.Management.DataLake.StoreFileSystem
  • Microsoft.Bcl
  • Microsoft.Bcl.Async
  • Microsoft.Bcl.Build
  • Microsoft.Net.Http

All of these dependencies are needed when using the DataLake.StoreFileSystem package.  In my previous sample I am also using Microsoft.Azure.Management.DataFactories in order to create a custom activity for Azure Data Factory, unfortunately this package has a dependency on all of the above packages as well.  Please be careful removing these packages as your own applications might have other dependencies on those listed above.  In order to show that these packages are no longer needed my new sample project is just a simple console application using the modified DataLakeHelper class, which can be found here on github.

Now let’s go through the few changes that need to be made to the DataLakeHelper class in order to use the new nuget package.  The following functions from the original DataLakeHelper class will need to be modified:

create_adls_client()
execute_create(string path, MemoryStream ms)
execute_append(string path, MemoryStream ms)

Here is the original code for create_adls_client():

 private void create_adls_client()
        {
            var authenticationContext = new AuthenticationContext($"https://login.windows.net/{tenant_id}");
            var credential = new ClientCredential(clientId: client_id, clientSecret: client_key);
            var result = authenticationContext.AcquireToken(resource: "https://management.core.windows.net/", clientCredential: credential);

            if (result == null)
            {
                throw new InvalidOperationException("Failed to obtain the JWT token");
            }

            string token = result.AccessToken;

            var _credentials = new TokenCloudCredentials(subscription_id, token);
            inner_client = new DataLakeStoreFileSystemManagementClient(_credentials);
        }

In order to upgrade to the new sdk, there are 2 changes that need to be made.

  1. The DataLakeStoreFileSystemManagementClient requires a ServiceClientCredentials object
  2. You must set the azure subscription id on the newly created client

The last 2 lines should now look like this:

var _credentials = new TokenCredentials(token);
inner_client = new DataLakeStoreFileSystemManagementClient(_credentials);
inner_client.SubscriptionId = subscription_id;

Now that we can successfully authenticate again with the Azure Data Lake Store, the next change is to the create and append methods.

Here is the original code for execute_create(string path, MemoryStream ms) and execute_append(string path, MemoryStream ms):

 private AzureOperationResponse execute_create(string path, MemoryStream ms)
        {
            var beginCreateResponse = inner_client.FileSystem.BeginCreate(path, adls_account_name, new FileCreateParameters());
            var createResponse = inner_client.FileSystem.Create(beginCreateResponse.Location, ms);
            Console.WriteLine("File Created");
            return createResponse;
        }

        private AzureOperationResponse execute_append(string path, MemoryStream ms)
        {
            var beginAppendResponse = inner_client.FileSystem.BeginAppend(path, adls_account_name, null);
            var appendResponse = inner_client.FileSystem.Append(beginAppendResponse.Location, ms);
            Console.WriteLine("Data Appended");
            return appendResponse;
        }

The change for both of these methods is pretty simple, the BeginCreate and BeginAppend methods are no longer available and the new Create and Append methods now take in the path and Azure Data Lake Store account name.

With the changes applied the new methods are as follows:

        private void execute_create(string path, MemoryStream ms)
        {
            inner_client.FileSystem.Create(path, adls_account_name, ms, false);
            Console.WriteLine("File Created");
        }

        private void execute_append(string path, MemoryStream ms)
        {
            inner_client.FileSystem.Append(path, ms, adls_account_name);
            Console.WriteLine("Data Appended");
        }

Conclusion

As you can see it was not difficult to upgrade to the new version of the sdk.  Unfortunately, since these are all preview bits changes like this can happen, hopefully this sdk has found its new home and it won’t go through too many more breaking changes for the end user.