Build your Data Estate with Azure Databricks – Part 3 – IoT
February 23, 2019 | By DataGeek

"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change." ~ Charles Darwin

With increasing clock speeds and levels of sophistication, we have entered the era of the Internet of Things and real-time feeds, which brings us to the high-velocity side of Big Data. This real-time path of the Lambda Architecture powers a wide variety of critical applications, such as predictive maintenance and disaster prediction, where timely action can save assets as well as lives.

Also Read: Databricks Part 2 – Big Data Lambda Architecture and Batch Processing

In Azure, there are multiple ways to realize a real-time architecture and enable faster analytics. Broadly, they can be classified as the Infrastructure as a Service (IaaS) way or the Platform as a Service (PaaS) way. With IaaS, we can run Kafka in Azure to receive real-time feeds; this streaming data can then be fed into Storm (or a PaaS service like Databricks) for stream analytics. Although the IaaS way has its advantages, to realize the architecture in a serverless fashion we will go the PaaS way: the IoT Hub way.

Related: Part 1 – Build your Data Estate with Azure Databricks

IoT Hub is a bidirectional messaging PaaS for communicating with your devices, sensors, etc. Data from IoT Hub can be processed using two PaaS services in Azure, viz. Azure Stream Analytics and Azure Databricks. To be clear, Azure Stream Analytics is an excellent service that is widely used in the industry, and we encourage you to give it a try as well. However, in this article we will stick with Azure Databricks, for three reasons:

- It gives us an integrated platform for both the batch-processing and real-time analytics paths of the Lambda Architecture.
- It lets us leverage the power of Spark Structured Streaming under the hood.
- Its cluster autoscaling feature helps us save a lot of expense.

Roughly, the architecture looks like this:

Step 1: Device to IoT Hub

For demonstration purposes, we will use a Raspberry Pi simulator that pushes fabricated weather data to the IoT Hub. To achieve this, we need to register a device (the simulator, in this case) in the IoT Hub. Click the Add icon and enter the device name. After that, click on the registered device and retrieve the primary connection string from the device details. Paste the extracted connection string into the connection string field in the Raspberry Pi simulator. A sanity check here is the glowing LED in the simulator picture.

Step 2: IoT Hub to Databricks

Once the IoT Hub setup is ready, we need to read and process the streaming data. This is where services like Azure Stream Analytics and Databricks come into the picture. In Databricks, we leverage the power of Spark Structured Streaming to perform SQL-like manipulations on the streaming data. The first step is to establish a connection between the IoT Hub and Databricks. For that, we need to install the Azure Event Hubs connector for Spark (the org.apache.spark.eventhubs library) on the pertinent cluster.
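As a sketch, the connector can be installed on the cluster from its Maven coordinate. The version below is illustrative (an assumption on our part): pick the artifact whose Scala and Spark versions match your cluster runtime.

```
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13
```

In the Databricks workspace, this is done via Create Library, choosing Maven as the source and supplying the coordinate, then attaching the library to the cluster.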
After that, we need to write the below code (Scala):

```scala
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf, EventPosition}
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

// Build the connection string from the IoT Hub's Event Hub-compatible endpoint
val connectionString = ConnectionStringBuilder("<Event Hub-compatible endpoint of IoT Hub>")
  .setEventHubName("<IoT Hub name>")
  .build

// Start reading from the end of the stream, i.e. only new events
val eventHubsConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromEndOfStream)

val incomingStream = spark.readStream
  .format("eventhubs")
  .options(eventHubsConf.toMap)
  .option("eventhubs.partition.count", "4")
  .load()

incomingStream.printSchema
```

After establishing the connection, we need to define a JSON schema to match the structure of the incoming stream. It can be achieved using the below code (Scala):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

// Our JSON schema, matching the simulator's telemetry messages
val jsonSchema = new StructType()
  .add("messageId", StringType)
  .add("deviceId", StringType)
  .add("temperature", StringType)
  .add("humidity", StringType)

// The Event Hubs "body" column carries our message; decode its JSON payload
val messages = incomingStream
  // Parse the columns Event Hubs gives us (the data we are sending, plus metadata such as offset and enqueued time)
  .withColumn("Offset", $"offset".cast(LongType))
  .withColumn("Time (readable)", $"enqueuedTime".cast(TimestampType))
  .withColumn("Timestamp", $"enqueuedTime".cast(LongType))
  .withColumn("Body", $"body".cast(StringType))
  // Select them so we can work with them
  .select("Offset", "Time (readable)", "Timestamp", "Body")
  // Parse the "Body" column against the JSON schema defined above
  .select(from_json($"Body", jsonSchema) as "sensors")
  // Select the values from the JSON structure and cast them explicitly to avoid type problems
  .select(
    $"sensors.messageId".cast("string"),
    $"sensors.deviceId".cast("string"),
    $"sensors.temperature".cast("double") as "tempVal",
    $"sensors.humidity".cast("double") as "humVal"
  )

messages.printSchema()
```

Once the messages are received, we create a temporary view on top of the parsed stream, using createOrReplaceTempView, so that we can write SQL queries for advanced analytics:

```scala
messages.createOrReplaceTempView("dataStreamsView")
```

After this, your streaming data is ready for advanced analytics.

P.S. This is not a comprehensive guide to real-time analytics with Databricks. For instance, we can also persist the streaming data in Cosmos DB using its Spark connector and visualize it in Power BI.
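As a minimal sketch of such analytics, the snippet below computes a running average temperature and humidity per device from the dataStreamsView temporary view. It assumes a Databricks (or other Spark Structured Streaming) environment with the stream above already running; the aggregate query, the memory sink, and the table name avgByDevice are illustrative choices, not part of the original walkthrough.

```scala
// Hypothetical example: running averages per device over the streaming view
val avgByDevice = spark.sql("""
  SELECT deviceId,
         AVG(tempVal) AS avgTemp,
         AVG(humVal)  AS avgHum
  FROM dataStreamsView
  GROUP BY deviceId
""")

// Write the continuously updated aggregates to an in-memory table for inspection.
// "complete" output mode re-emits the full aggregate result on each trigger.
val query = avgByDevice.writeStream
  .outputMode("complete")
  .format("memory")
  .queryName("avgByDevice")
  .start()

// Inspect the latest aggregates at any time:
// spark.sql("SELECT * FROM avgByDevice").show()
```

The memory sink is only suitable for debugging in a notebook; for production you would write to a durable sink such as Delta tables or Cosmos DB instead.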