SOUTHWORKS Dev Team
November 27, 2020
As mentioned in a previous article, over the last couple of months here at SOUTHWORKS we have been involved in several Big Data projects: from building complete ETLs and processing pipelines, to implementing both batch and stream ingestion mechanisms, and, as the latest flavor, analyzing data and taking specific actions in near-real time.
As a small sample of what we have been doing, in this article we explore how to build a monitoring system over IoT devices in near-real time. In this case we simulate and monitor a printed circuit board manufacturing system, leveraging the Azure stack as the framework to bring this to life. To play with the platform we created a console application that emulates the behavior of the sensors and controllers of the printed circuit board manufacturing IoT devices.
This article describes the printed circuit board manufacturing scenario and the cloud infrastructure we put in place to monitor and control it.
Before we get into the details, I would like to thank the team that created this reference implementation: Facundo Hernán Costa, Pablo Costantini, Roy Crivolotti, Tomas Ignacio Escobar, Franco Bruno Lavayen & Abel Ricardo Lozano.
We all hope you enjoy it 😃
To show you how to leverage the set of Azure technologies that can be used to receive and process information from IoT devices, we came up with a simple scenario based on a Printed Circuit Board factory. The factory assembly line is composed of several stages, including:
We will focus on the SMD Pick and Place stage of this process. For the sake of the sample, our factory has two areas dedicated to the SMD Pick and Place stage of different assembly lines: Areas A and B. Additionally, each of these areas requires multiple engines to operate including:
To operate correctly, these engines must stay below a reasonable temperature (80 degrees Celsius) to prevent overheating and must consume power within a specified range (between 40 and 80 Watts). To monitor these parameters, the factory has installed smart sensors on the engines that measure their temperature and power consumption in real time. This information is continuously submitted to the cloud.
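To make these constraints concrete, below is a minimal TypeScript sketch of what a telemetry message and its safety check could look like. The field and constant names are our own illustration and are not taken from the reference implementation:

```typescript
// Hypothetical shape of the telemetry message sent by each engine sensor.
// Field names are illustrative, not necessarily the ones used in the implementation.
interface EngineTelemetry {
  deviceId: string;            // e.g. "area-a-conveyor-engine" (made-up identifier)
  factoryArea: "A" | "B";
  temperatureCelsius: number;  // should stay below 80 °C
  powerWatts: number;          // should stay between 40 W and 80 W
  timestamp: string;           // ISO-8601 timestamp of the measurement
}

// Safety thresholds described in the scenario.
const MAX_TEMPERATURE_C = 80;
const MIN_POWER_W = 40;
const MAX_POWER_W = 80;

// A measurement is "healthy" when both parameters are within range.
function isWithinSafetyRange(t: EngineTelemetry): boolean {
  return (
    t.temperatureCelsius < MAX_TEMPERATURE_C &&
    t.powerWatts >= MIN_POWER_W &&
    t.powerWatts <= MAX_POWER_W
  );
}
```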
Our goals are pretty simple:
As we already mentioned, we implemented the solution using Azure, leveraging different resources to build the complete workflow. All these resources can be deployed to an Azure subscription.
a) An Azure Function that dispatches the received alarm and parameters to an external HTTP callback. In a real scenario this ‘web-hook’ would send a request to an HTTP endpoint specified by an external subscriber interested in having a real-time monitoring solution for the factory’s devices. In this reference implementation, for the sake of the sample, we simply used a second Azure Function that logs the alarm event to the console.
b) An Azure Logic App will send the alarm information to two different resources:
c) The Alarm Event Hub will use Data Capture to save all the alarm event information to an Azure Blob storage. This information will be later processed by a Databricks notebook that generates CSV reports with historic alarm events analytics.
To emulate the engine sensor measurements, the Sensor emulator application was written in NodeJS using TypeScript and the MostJS library. Each factory engine is registered as a different device in the IoT Hub and generates independent temperature and power consumption measurements every few seconds, which are sent to the IoT Hub using the MQTT protocol.
To generate the random measurements in a realistic way, each sensor alternates between two different states:
The amount of time to switch between states is a random value selected within a pre-specified range. When operating in a specific state, the sensor generates measurements around a pre-specified value using a random-walk process that drifts toward the desired mean value.
The mean sample values and mean durations of each state can be defined on a per-sensor basis.
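As a rough illustration of this generation logic (the actual emulator drives it through MostJS streams, and the parameter names and values here are our own), a minimal TypeScript sketch could look like this:

```typescript
type SensorState = "stable" | "alarm";

interface SensorConfig {
  stableMean: number;               // target mean while in stable mode (e.g. 60 °C)
  alarmMean: number;                // target mean while in alarm mode (e.g. 90 °C)
  stepSize: number;                 // maximum random-walk step per sample
  minStateDurationSamples: number;  // minimum samples before switching state
  maxStateDurationSamples: number;  // maximum samples before switching state
}

class SensorSimulator {
  private state: SensorState = "stable";
  private samplesLeftInState: number;
  private value: number;

  constructor(private readonly config: SensorConfig) {
    this.value = config.stableMean;
    this.samplesLeftInState = this.randomDuration();
  }

  // Generates the next measurement: a random walk that drifts toward
  // the mean value of the current state.
  next(): number {
    if (--this.samplesLeftInState <= 0) {
      this.state = this.state === "stable" ? "alarm" : "stable";
      this.samplesLeftInState = this.randomDuration();
    }
    const mean =
      this.state === "stable" ? this.config.stableMean : this.config.alarmMean;
    const drift = Math.sign(mean - this.value) * Math.random() * this.config.stepSize;
    const noise = (Math.random() - 0.5) * this.config.stepSize;
    this.value += drift + noise;
    return this.value;
  }

  // Picks how many samples the sensor will stay in the current state.
  private randomDuration(): number {
    const { minStateDurationSamples: min, maxStateDurationSamples: max } = this.config;
    return min + Math.floor(Math.random() * (max - min + 1));
  }
}
```

Each simulated sensor then simply calls `next()` at its sampling interval to obtain the following measurement.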
Below you can find examples of measurements generated by our algorithm:
As you can see, the sensor starts in stable mode with measurements in the range of 40–80 degrees Celsius and enters alarm mode at sample #17 when values start increasing until they stabilize around a mean value of 90. Afterwards the sensor goes back to stable mode by decreasing sample values.
Let’s now see in detail how we leveraged each of the different components shown in the architecture diagram above.
Once the sensors are ready, we have to connect them to the rest of the architecture, meaning they must be able to submit their data to the cloud. To solve that we used Azure IoT Hub, which can be used to establish publisher/subscriber communication between the cloud and IoT devices.
This bidirectional communication supports:
It supports communication with the most common IoT protocols including:
For a device to be able to communicate with IoT Hub, it first needs to be registered in the IoT Hub and it needs to authenticate itself with its name and key, as specified here.
For our setup we send messages to the IoT Hub from the Sensor emulator via MQTT as we mentioned above.
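A minimal sketch of that device-to-cloud path, using the azure-iot-device SDK over MQTT, could look like the following. The connection string comes from the device registration in IoT Hub; the environment variable and payload field names are assumptions for this example, and the real emulator drives the sending loop from its MostJS streams rather than a plain setInterval:

```typescript
import { Client, Message } from "azure-iot-device";
import { Mqtt } from "azure-iot-device-mqtt";

// Device connection string obtained when registering the device in IoT Hub.
// The environment variable name is just an assumption for this sketch.
const connectionString = process.env.IOT_DEVICE_CONNECTION_STRING!;
const client = Client.fromConnectionString(connectionString, Mqtt);

function sendMeasurement(temperatureCelsius: number, powerWatts: number): void {
  const payload = JSON.stringify({
    temperatureCelsius,
    powerWatts,
    timestamp: new Date().toISOString(),
  });
  // Device-to-cloud message sent to the IoT Hub over MQTT.
  client.sendEvent(new Message(payload), (err) => {
    if (err) {
      console.error("Failed to send measurement:", err.message);
    } else {
      console.log("Measurement sent:", payload);
    }
  });
}

// Emit a (fake) measurement every few seconds for demonstration purposes.
setInterval(() => sendMeasurement(60 + Math.random() * 10, 60 + Math.random() * 5), 5000);
```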
To process the incoming measurements, an Azure Stream Analytics job subscribes to the events that arrive via the IoT Hub’s built-in, Event Hub-compatible endpoint. Azure Stream Analytics can be used for streaming pipeline processing of input events using SQL-like queries with very low latency. It can be configured to ingest input data from a wide array of input types, process it with pre-specified jobs and push the results to the next stage in the pipeline.
In our setup, this job reads the measurements received by the IoT Hub and generates alarms when it detects that multiple consecutive samples are outside the pre-specified safety range. Finally it sends these alarms to the Alarm Event Hub.
The Azure Stream Analytics Job receives the measurements and generates two different types of events:
These events are sent to the Alarm Event Hub and are later consumed by the rest of the processing pipeline.
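To illustrate the semantics of that detection (the actual job expresses it as a Stream Analytics SQL query), here is a rough TypeScript sketch of the state machine: several consecutive out-of-range samples turn an alarm on, and a return to normal values turns it off. The threshold of consecutive samples is an assumption for the example:

```typescript
// Assumption for illustration; the real query defines its own window size.
const CONSECUTIVE_SAMPLES_TO_ALARM = 3;

interface AlarmEvent {
  deviceId: string;
  measurementType: "temperature" | "power";
  kind: "alarm-on" | "alarm-off";
  timestamp: string;
}

class AlarmDetector {
  private outOfRangeCount = 0;
  private alarmActive = false;

  constructor(
    private readonly deviceId: string,
    private readonly measurementType: "temperature" | "power",
    private readonly isOutOfRange: (value: number) => boolean
  ) {}

  // Feed one sample; returns an alarm-on/alarm-off event when the state changes.
  process(value: number, timestamp: string): AlarmEvent | undefined {
    if (this.isOutOfRange(value)) {
      this.outOfRangeCount++;
      if (!this.alarmActive && this.outOfRangeCount >= CONSECUTIVE_SAMPLES_TO_ALARM) {
        this.alarmActive = true;
        return { deviceId: this.deviceId, measurementType: this.measurementType, kind: "alarm-on", timestamp };
      }
    } else {
      this.outOfRangeCount = 0;
      if (this.alarmActive) {
        this.alarmActive = false;
        return { deviceId: this.deviceId, measurementType: this.measurementType, kind: "alarm-off", timestamp };
      }
    }
    return undefined;
  }
}
```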
Below you can see the query flow:
Below you can see an example of alarms being sent by the Azure Stream Analytics Job:
Now, multiple parties (an Azure Function, an Azure Logic App and an Azure Blob storage container) need to receive the same events (the alarm events generated by the Stream Analytics job). Therefore, instead of sending the events to every receiver separately, they are sent to the Alarm Event Hub, from which all the interested parties can read directly by subscribing to it.
Azure Event Hub can be used for establishing communication between different Azure resources in the cloud. The advantages of using Azure Event Hub over other communication methods include:
To store the events that arrive to the Event Hub in an Azure Storage container (necessary for calculating analytics on historic alarm events), the Data Capture functionality of the Event Hub is enabled.
In our platform, the Azure Functions are triggered by alarm events received by the Event Hub. To tackle this need we deployed a function that receives the alarm event from the Event Hub service, gets the alarm information and sends an HTTP request to a pre-specified HTTP endpoint.
Azure Functions is a serverless compute service provided by Azure designed to execute event-triggered code without worrying about the application infrastructure. Azure Functions can be triggered by a wide variety of events such as HTTP requests, scheduled times and events from other Azure services such as Event Hubs.
To test that the alarm event is sent correctly, we implemented a second Azure Function as the user’s HTTP callback endpoint. This second function simply logs the received events to validate that the request was sent/received correctly completing the flow.
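As a rough sketch of what the dispatcher function could look like (assuming the Node.js programming model with an Event Hub trigger configured in function.json with cardinality "many", a Node 18+ runtime where a global fetch is available, and an app setting name we made up), the code might be:

```typescript
import { AzureFunction, Context } from "@azure/functions";

// The callback URL would be provided by the external subscriber; here it is read
// from an application setting whose name is an assumption for this sketch.
const CALLBACK_URL = process.env.ALARM_WEBHOOK_URL ?? "";

const alarmDispatcher: AzureFunction = async function (
  context: Context,
  eventHubMessages: unknown[]
): Promise<void> {
  for (const alarm of eventHubMessages) {
    context.log("Alarm received from Event Hub:", JSON.stringify(alarm));

    // Forward the alarm to the pre-specified HTTP endpoint (the "web-hook").
    const response = await fetch(CALLBACK_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(alarm),
    });

    if (!response.ok) {
      context.log.error(`Webhook call failed with status ${response.status}`);
    }
  }
};

export default alarmDispatcher;
```

The second ‘callback’ function would be an ordinary HTTP-triggered function that simply logs the body it receives.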
In the image below we can see the telemetry message that arrives at the Event Hub and the alarm event logged by the ‘callback’ Azure Function.
Azure Logic Apps can be used to automate business processes and workflows easily and quickly. They are intended to integrate applications, systems, and services, either in the cloud or on-premises. Logic Apps provide a lot of predefined, ready-to-use connectors that allow building applications that listen for events occurring in other resources and trigger actions to process, store and/or send the event information.
Within our platform we decided to send emails and provide a near-real-time dashboard with the alarms generated by the Event Hub. Therefore, we created a Logic App which uses three different connectors:
The Logic App can be scheduled to read incoming events periodically. In our case, we configured it to retrieve alarm events once per minute.
Below you can see the Logic App application architecture:
And the email sent by it:
As we wanted to monitor the solution and quickly check how many alarms were received in the last few minutes, we chose these two services to solve that:
We combined both services doing the following:
a) One to show the number of alarms that occurred within a predefined time window, grouped by factory area.
b) Another to show the number of alarms that occurred per hour.
Below you can see the Azure Monitor dashboard with the count of alarms received in the past 24 hours, grouped by factory area and also by time of day.
Databricks is built on Apache Spark and allows processing large quantities of information organized in data frames using a computing cluster. This can be done transparently, abstracting the user from the architecture of the computing cluster and the underlying storage.
It is possible to operate a Databricks cluster in two different modes:
Within this platform, we used Databricks batch processing to retrieve Alarm Events stored by the Alarm Event Hub in an Azure container (via the Data capture functionality mentioned before).
Our Databricks notebook connects to an Azure container and retrieves the Alarm Events stored by the Alarm Event Hub during a specified time range. For each pair of alarm-on/alarm-off events that occurred for the same measurement type on the same device, it calculates the alarm duration by subtracting the event timestamps.
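Conceptually, that pairing and duration computation works as in the TypeScript sketch below; this is only an illustration of the logic, since the actual notebook performs it with Spark data frames over the captured events:

```typescript
interface AlarmEvent {
  deviceId: string;
  measurementType: "temperature" | "power";
  kind: "alarm-on" | "alarm-off";
  timestamp: string; // ISO-8601
}

interface AlarmDuration {
  deviceId: string;
  measurementType: string;
  durationMs: number;
}

// Pairs each alarm-on event with the next alarm-off event for the same device
// and measurement type, and computes the alarm duration.
function computeAlarmDurations(events: AlarmEvent[]): AlarmDuration[] {
  const open = new Map<string, AlarmEvent>();
  const durations: AlarmDuration[] = [];

  const sorted = [...events].sort(
    (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp)
  );

  for (const event of sorted) {
    const key = `${event.deviceId}/${event.measurementType}`;
    if (event.kind === "alarm-on") {
      open.set(key, event);
    } else {
      const start = open.get(key);
      if (start) {
        durations.push({
          deviceId: event.deviceId,
          measurementType: event.measurementType,
          durationMs: Date.parse(event.timestamp) - Date.parse(start.timestamp),
        });
        open.delete(key);
      }
    }
  }
  return durations;
}
```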
Afterwards it generates two different reports:
Using these reports, the customer can track the frequency and average duration of the alarms, measure whether a specific machine type or factory area raises alarms more frequently than the rest, and see how the average time it takes factory operators to solve an issue affects productivity. Each of these reports is saved as a CSV file in an Azure Blob container for preservation.
Additionally, the Databricks notebook generates a plot showing the correlation between the amount of alarms and their duration for each device depending on the alarm type (temperature or power).
Below you can find a simple example of the data frames that are used to generate the CSV reports for four devices and the corresponding correlation plot.
In this article, we showed you how to build an IoT device monitoring system using the Azure data platform. Our solution saves the event stream coming from the IoT devices into a data lake and raises alerts when out-of-bounds conditions are detected. The implementation combines different Azure technologies to ingest the IoT events and send out alerts. Moreover, our approach could work as a base for any other IoT monitoring system, and it could be extended with different kinds of sensors to control different elements of the scenario.
Although this goes beyond this reference implementation, in an additional stage you could bring machine learning models into the picture to discover hidden patterns behind the failure of some components in the chain. Of course, for this you would need to introduce other sensors into the ecosystem and correlate their information to uncover interesting insights that cannot be seen at first sight. Factors such as the quality of the power supply in the area, ambient temperature, humidity and other variables might be affecting the factory’s devices without being noticed.
Lastly, by going into the details of the reference implementation shared in this repository, you can see that nowadays it is quite straightforward to build a near-real-time monitoring solution for physical-world devices just by integrating several cloud managed services (in this case, using Azure).
Originally published by Mauro Krikorian for SOUTHWORKS on Medium 27 November 2020