True Story: Big Data vs Old Data

By Drew Johnson, VP-Engineering, Aeris Communications, Inc

According to Gartner, in 2014 the Internet of Things (IoT) took over from Big Data as the most hyped technology. As this technology progresses, it will affect nearly every enterprise and consumer. Our company provides Connectivity as a Service (CaaS) and Platform as a Service (PaaS) for the IoT. We are lucky enough to have customers in many IoT domains: Automotive, Fleet Trucking, Healthcare, and Smart Energy, to name a few. In the last few years we have been busy bringing Big Data technology to our customers, and we are starting to really see the value we can provide them.

Connected Car

In particular, we provide the connectivity for the connected car programs of several automotive customers. These programs tell the Original Equipment Manufacturer (OEM) how its vehicles actually perform as a group, and they give vehicle owners information and convenience features for their vehicles. As connectivity has spread across brands, the number of connected vehicles has grown steadily. We provide connectivity for many millions of vehicles. Each vehicle starts and stops connectivity sessions several times per day, so over the course of a month the fleet generates hundreds of millions of session records. Some variations of usage generate another several hundred million records. Over the course of a few months this adds up to billions of records.
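As a rough check on those orders of magnitude, the arithmetic can be sketched in a few lines. The fleet size and session rate below are illustrative assumptions chosen to match "many millions" and "several times per day"; they are not actual figures from our network.

```python
# Back-of-envelope estimate of session-record volume.
# All three inputs are illustrative assumptions, not real fleet figures.
vehicles = 5_000_000                 # assumed: "many millions of vehicles"
sessions_per_vehicle_per_day = 4     # assumed: "several times per day"
days_per_month = 30

# One record per connectivity session.
records_per_month = vehicles * sessions_per_vehicle_per_day * days_per_month
records_per_quarter = records_per_month * 3

print(f"{records_per_month:,} session records per month")
print(f"{records_per_quarter:,} session records over three months")
```

Under these assumptions, session records alone reach the hundreds of millions per month and pass a billion within a quarter, before counting the other usage records.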

Old Data

To contrast older and newer ways of processing data, we analyzed connectivity for a tier-one automotive OEM. Their existing systems analyzed connectivity across the entire fleet using older approaches. They had a good sense of how much connectivity the vehicles used as a group each month, and from that they could easily determine the average utilization per vehicle. This gave them the view that connectivity utilization was relatively flat and generally the same across the entire fleet.

The constraint of the old-data approach is that, with highly structured data stores, one really needs to know the questions to be asked of the data before the data is stored.

Big Data

On the other hand, we were able to use our unstructured Data Lake, which we built on the Hadoop Distributed File System (HDFS), together with our Impala analytics engine to analyze all of the billions of records and get a much finer-grained view of how groups of cars were using data over time. What we found was that the cars were using data connectivity in very different ways and that usage was starting to change. In September of 2014, 98% of the vehicles were using less than 20MB of data each month, although the distribution ranged from only about 1MB each month all the way up to vehicles using hundreds of MB each month. That distribution shifted over the next 3 months, such that by December of 2014 there was a substantial surge in vehicles using more than 20MB of data. We were then able to correlate the shift against potentially interesting events such as holidays, weather, and code changes. It could have been that holidays or weather were driving more utilization of the connected services. In the end, however, we found that the usage profile seemed to be affected mostly by a code change that was not expected to have such an impact.

Most importantly, the ability to analyze unstructured data effectively means that we can ask questions of the data which we did not think of before the data was stored.
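The difference between a fleet-wide average and a per-vehicle distribution can be illustrated with a minimal Python sketch. This is not the Impala analysis we ran. The record layout, the vehicle IDs, and every usage number below are synthetic and deliberately contrived so that the average stays flat while the share of vehicles above the 20MB mark changes.

```python
from collections import defaultdict

THRESHOLD_MB = 20  # the 20MB mark discussed in the article


def monthly_usage_per_vehicle(records):
    """Sum session data per (month, vehicle).

    records: iterable of (vehicle_id, month, megabytes) tuples.
    This layout is a hypothetical simplification of a session record.
    """
    totals = defaultdict(float)
    for vehicle_id, month, megabytes in records:
        totals[(month, vehicle_id)] += megabytes
    return totals


def summarize(records, threshold_mb=THRESHOLD_MB):
    """Return {month: (fleet_average_mb, share_of_vehicles_above_threshold)}."""
    per_month = defaultdict(list)
    for (month, _vehicle_id), total in monthly_usage_per_vehicle(records).items():
        per_month[month].append(total)

    summary = {}
    for month, usages in per_month.items():
        average = sum(usages) / len(usages)
        share_above = sum(1 for u in usages if u > threshold_mb) / len(usages)
        summary[month] = (average, share_above)
    return summary


# Synthetic session records, for illustration only.
records = [
    ("car-1", "2014-09", 40.0), ("car-1", "2014-09", 22.0),
    ("car-2", "2014-09", 2.0),
    ("car-3", "2014-09", 2.0),
    ("car-4", "2014-09", 2.0),
    ("car-1", "2014-12", 22.0),
    ("car-2", "2014-12", 12.0), ("car-2", "2014-12", 10.0),
    ("car-3", "2014-12", 22.0),
    ("car-4", "2014-12", 2.0),
]

summary = summarize(records)
for month in sorted(summary):
    average, share_above = summary[month]
    print(f"{month}: average {average:.1f} MB, "
          f"{share_above:.0%} of vehicles above {THRESHOLD_MB} MB")
```

In this toy data set the fleet average is the same in both months, so an average-only report would look flat, while the share of vehicles above the threshold triples. Keeping the raw session records is what makes the second question answerable after the fact.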

Value … The Most Important V

Big Data started with the three V’s: ‘Volume’, ‘Velocity’, and ‘Variety’. Companies are racing to add more V’s to that list. We’ve seen as many as seven, adding ‘Variability’, ‘Veracity’, ‘Visualization’, and ‘Value’. Most of us can agree that the most important one is really ‘Value’. In this case, we were able to identify that a code change could affect data usage in a way that could potentially cost millions of dollars. That is true value.

Original Source – https://ciostory.com/bigdata/true-story-big-data-vs-old-data