IoT has taken center stage over the past few years. From everyday appliances to automobiles to aircraft, the data that IoT devices send is taking user experience and analytics to the next level.
To start with the basics, IoT deals with any device or sensor that sends data. That data is then analyzed and processed, either to deliver a feature that reports what happened on that device or system, or to predict something about the system.
For example, consider telematics.
In the automotive industry, a device fitted to the vehicle collects data from it, including vehicle statistics as well as positional data (PVT data).
It can also sense driver behavior in the form of harsh acceleration, harsh braking, and rash turning. This data can also be sent to the system, which analyzes driver behavior based on the patterns it finds.
In a similar way, every IoT device sends data, in one way or another, to a system that needs to consume it, process it, and produce predictive or analytical output.
Taking the automotive example again, the device sends this data in the form of raw packets. A raw packet is simply a string that contains the values of various parameters and usually follows some protocol. Raw data is typically sent from the vehicle at a frequency of one packet every 5 seconds for normal positional data, and almost immediately when an emergency occurs or an alert is generated.
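To make the idea of a raw packet concrete, here is a minimal sketch of parsing one. The packet layout (`$` start marker, comma-separated fields, `#` end marker), the field names, and the `PositionPacket` type are all hypothetical and invented for illustration; real telematics devices follow their own vendor or regulatory protocols.

```python
from dataclasses import dataclass


@dataclass
class PositionPacket:
    device_id: str
    timestamp: int     # Unix epoch seconds
    latitude: float
    longitude: float
    speed_kmph: float
    alert: str         # empty string when no alert is raised


def parse_packet(raw: str) -> PositionPacket:
    """Parse one raw packet in the assumed (hypothetical) format:
    $<device_id>,<timestamp>,<lat>,<lon>,<speed>,<alert>#
    """
    if not (raw.startswith("$") and raw.endswith("#")):
        raise ValueError("packet must start with '$' and end with '#'")
    fields = raw[1:-1].split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    device_id, ts, lat, lon, speed, alert = fields
    return PositionPacket(device_id, int(ts), float(lat), float(lon), float(speed), alert)


# Example: an alert packet carrying a harsh-braking event.
packet = parse_packet("$DEV001,1700000000,12.9716,77.5946,42.5,HARSH_BRAKING#")
```

A normal positional packet would simply carry an empty alert field, for example `$DEV001,1700000005,12.9717,77.5947,40.0,#`.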
This may seem manageable, since each device sends data to the server at this frequency and the data is then processed. In reality, though, there are thousands of such vehicles on the road, each fitted with one device. Multiply the frequency by that many vehicles, and the amount of data that the microservices have to analyze and process is huge. For a large fleet, the data can run into terabytes for even a single day.
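The multiplication can be sketched as a quick back-of-the-envelope calculation. Only the 5-second interval comes from the example; the fleet size (100,000 vehicles) and the packet size (1,000 bytes including protocol overhead) are assumed values chosen for illustration, not figures from a real deployment.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(vehicles: int, interval_seconds: int, packet_bytes: int) -> int:
    """Bytes received per day when every vehicle sends one packet per interval."""
    packets_per_vehicle = SECONDS_PER_DAY // interval_seconds
    return vehicles * packets_per_vehicle * packet_bytes


# One packet every 5 seconds is 17,280 packets per vehicle per day.
packets_per_vehicle = SECONDS_PER_DAY // 5

# Assumed fleet size and packet size (hypothetical values).
volume = daily_volume_bytes(vehicles=100_000, interval_seconds=5, packet_bytes=1_000)
print(f"{packets_per_vehicle} packets per vehicle per day")
print(f"{volume / 1e12:.3f} TB per day")
```

Under these assumptions the total is roughly 1.7 TB per day, before counting alert packets or any images, audio, or video. A smaller fleet or smaller packets would bring this down into the gigabyte range, so the result is sensitive to both inputs.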
So far, this example has covered only devices sending raw packets. But IoT devices can also send images, recorded audio, and video, which consume much more space than raw data.
This is where storage plays a crucial role.
Storage is where the raw IoT data ends up, and where microservices or APIs then interact with it to serve the product's features (either predictive or analytical).
The things expected from storage in the IoT realm are:
- Cloud-based, as IoT devices can easily access the public cloud and send data
- Scalable, with massive capacity
- Fast access to the saved data, which is especially important for analytics. Edge storage means lower latency and real-time analysis
- Secure storage, because in most cases this data cannot be recreated
What type of storage is best suited and who are the players?
Object Storage in the cloud
Since the aim in IoT is to put and fetch data easily, object storage at the edge is considered a viable option, although cloud vendors also offer file and block storage.
In object storage, the data is directly accessible through APIs, without going through an operating system's file system. Object storage also helps because any kind of data, be it raw packets, videos, or images, can be stored as is, without loss.
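As a rough illustration of that access model, the sketch below imitates an object store in memory: a flat keyspace in which each key maps to opaque bytes plus a little metadata, reached only through put, get, and list calls. The class and method names are hypothetical and do not correspond to any vendor's API; a real service exposes similar operations over HTTP.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StoredObject:
    data: bytes
    content_type: str
    checksum: str  # SHA-256 hex digest of the payload, kept as an integrity tag


class InMemoryObjectStore:
    """Toy stand-in for an object store: a flat key -> object map."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes,
            content_type: str = "application/octet-stream") -> str:
        """Store the bytes under the key and return their checksum."""
        checksum = hashlib.sha256(data).hexdigest()
        self._objects[key] = StoredObject(data, content_type, checksum)
        return checksum

    def get(self, key: str) -> StoredObject:
        """Fetch an object by key; raises KeyError if the key is absent."""
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        """Keys are flat strings; a prefix only imitates a folder."""
        return sorted(k for k in self._objects if k.startswith(prefix))


store = InMemoryObjectStore()
# Raw packets and images sit side by side; the store does not care about the type.
store.put("vehicles/DEV001/packets/1700000000.txt",
          b"$DEV001,1700000000,12.9716,77.5946,42.5,#", "text/plain")
store.put("vehicles/DEV001/images/dashcam.jpg", b"\xff\xd8\xff\xe0", "image/jpeg")
keys = store.list_keys("vehicles/DEV001/")
```

The point of the sketch is that a packet string and a JPEG header are stored and retrieved in exactly the same way, as bytes under a key.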
The actual hardware in the data center is also shifting toward NVMe (Non-Volatile Memory Express) so that data is available to the application at top speed. NVMe is a newer interface technology designed for flash-based SSDs. It can handle more commands at a time and communicates directly with the system CPU. Overall, it delivers much better IOPS and better compatibility.
Microsoft Azure
Azure IoT Hub can send IoT data to storage containers, where it is kept in the form of blobs.
It also supports hot-path and cold-path analytics pipelines, which can be set up through the desired IoT Hub route.
AWS
AWS IoT Analytics uses a time-series data store to hold device data. Queries on this data are based on time intervals, which makes them faster.
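Why a time-interval query can be fast is easy to see in a generic sketch: if records are kept sorted by timestamp, both ends of the interval can be found by binary search and the matching records read as one contiguous slice. This is only an illustration of the general idea, not a description of how AWS IoT Analytics is implemented, and the `TimeSeriesStore` class is hypothetical.

```python
from bisect import bisect_left, bisect_right


class TimeSeriesStore:
    """Keeps readings sorted by timestamp so range queries use binary search."""

    def __init__(self) -> None:
        self._timestamps: list[int] = []
        self._values: list[float] = []

    def append(self, timestamp: int, value: float) -> None:
        # Device data normally arrives in time order; reject anything older.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("timestamps must be appended in non-decreasing order")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def query(self, start: int, end: int) -> list[float]:
        """Return values with start <= timestamp <= end."""
        lo = bisect_left(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        return self._values[lo:hi]


speeds = TimeSeriesStore()
# One speed reading every 5 seconds, matching the packet interval in the example.
for ts, speed in [(0, 40.0), (5, 42.5), (10, 45.0), (15, 43.0), (20, 38.5)]:
    speeds.append(ts, speed)

window = speeds.query(5, 15)  # readings taken between t=5 and t=15 inclusive
```

Locating the interval takes a number of steps logarithmic in the size of the store, so the cost of a query depends mostly on how many readings fall inside the window rather than on how much history has accumulated.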
Google offers Cloud Datastore and Firebase Realtime Database. Both are NoSQL databases that scale automatically to handle the load.
Meanwhile on the data center side:
Dell EMC
Dell EMC ECS is an object storage-based solution that economically stores and manages unstructured data for any length of time.
NetApp
NetApp provides a Data Fabric solution that enables data to move quickly in and out of the cloud. Its foundation is the storage OS, ONTAP, which enables agile data movement.
Other players, including HPE, IBM, and Pure Storage, are not far behind in IoT storage solutions.