Azure Data Factory: A Vignette – Part One of the Cortana Cadence Series

August 21st, 2015 | Azure, Azure Machine Learning (AML), Cortana Analytics

We can no longer afford to ignore the data available to us. All too often we collect it, glance at it, then shelve or destroy it. By doing this we are throwing away our competitive advantage. The good news is that as data becomes more ubiquitous, the tools available are becoming more sophisticated. Let me paint you a picture based on some of our experiences…

They had a lot of data: terabytes of rich metrics they’d been collecting over the past four years. They knew the data had value, which is why they were still collecting it, after all; they just didn’t know how to extract it. “Wow. That’s a lot of data you’ve got there,” I said, poring over the on-site database. “That’s why you’re here. We want to know if Azure is more cost-effective than hosting it all locally,” Brad replied in a dull Dragnet tone. “Blob storage is incredibly cheap. I’m sure it will be an improvement, but what are you doing with all this data?” I asked, mesmerized by the endless scroll of metrics washing over the screen. Brad commented flatly, “Nothing. Maybe we’ll do something with it someday, but we’re just collecting it for now.” I turned to look at him, mouth agape. “This isn’t right. What you’re doing here isn’t right,” I declared with increasing excitement. “Brad, this data is actionable! You’re worried about storage costs, but this is what’s going to drive your business forward. We’ve got to process this into something usable.” Brad nodded in vague agreement and said, “I know, but we just can’t get the resources to build something out right now.” I narrowed my eyes and said sharply, “Let’s talk Azure Data Factory.”

Azure was all new to Brad, so I gave him a high-level description of Azure Data Factory and how we could leverage it to get value from all that untapped data. I explained that Azure Data Factory is part of the Cortana Analytics Suite. It would allow us to create an end-to-end pipeline for his company’s data, resulting in easily consumable reports for the company’s business leaders. The pipeline would start by consuming the data: new and existing datasets would be dumped into Azure Blob Storage. Then we would use Azure’s implementation of Hadoop, called HDInsight, to combine, scrub, and manipulate the data. The formatted datasets would then be placed back into Blob Storage before automatically being picked up and analyzed by Azure Machine Learning. From there the results would exit the pipeline and be published as easy-to-read charts with Power BI.
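For readers who like to see the shape of a thing, here is a minimal Python sketch of the four stages I described to Brad. The stage names, field names, and dataset labels are all my own placeholders, and the layout is not the actual Azure Data Factory JSON schema; it only illustrates the idea that each stage picks up what the previous one produced.

```python
import json

# Hypothetical stage and dataset names, for illustration only.
# This is NOT the real Azure Data Factory JSON schema.
STAGES = [
    {"name": "ingest", "service": "Azure Blob Storage",
     "input": "on-site-metrics", "output": "raw-blobs"},
    {"name": "transform", "service": "HDInsight",
     "input": "raw-blobs", "output": "formatted-blobs"},
    {"name": "analyze", "service": "Azure Machine Learning",
     "input": "formatted-blobs", "output": "scored-results"},
    {"name": "publish", "service": "Power BI",
     "input": "scored-results", "output": "charts"},
]


def validate_chain(stages):
    """Return True if every stage consumes what the previous stage produced."""
    return all(prev["output"] == nxt["input"]
               for prev, nxt in zip(stages, stages[1:]))


def to_json(stages):
    """Serialize the pipeline description as JSON text."""
    return json.dumps({"pipeline": stages}, indent=2)


if __name__ == "__main__":
    # Fail loudly if the stages do not line up end to end.
    assert validate_chain(STAGES), "pipeline stages are not connected"
    print(to_json(STAGES))
```

Running the script prints the pipeline as JSON, which is roughly the spirit of describing each piece declaratively and letting the platform wire them together.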

“The business people upstairs would love to have those reports, but we just don’t have the resources to build and maintain something with that type of complexity,” Brad said in a slightly defeated tone. “Brad, I’m telling you, it’s not difficult. Each piece is straightforward to implement, it’s modularized, and it all connects through JSON templates. In fact, once you’re done you’ll be able to see a visual representation of your working pipeline in the Azure Portal,” I assured him. At this point Brad tried to interject something, but I knew what his next objection was going to be, so I continued, “It will be totally hands-off. You’ll get near real-time reports delivered automatically with no overhead.” I paused. Brad was speechless. “Brad,” I quietly confided, “once you get this Data Factory going you’ll look like a hero. This is the kind of stuff that increases profits and gets people promoted.” That’s when Brad’s eyes really lit up. “Let’s get started!” he trumpeted, leaning forward in his chair with his eyes fixed on the screen. With that, we were no longer worried about the savings each Azure service provides individually. We concerned ourselves with the larger business advantages Azure Data Factory supplies by capitalizing on multiple Azure services as a pipeline in the cloud.
