Analyzing Big Data more quickly and energy-efficiently

November 14, 2018

Digging into Big Data takes a lot of computing power and involves a lot of moving parts. One issue researchers face is input/output (I/O) bottlenecks, which arise as information is transferred from storage devices to processing units. Dr. Nihat Altiparmak, the recipient of a new grant of more than $478K from the National Science Foundation, aims to help resolve just that.

“By Big Data, we generally mean that the amount of data is so large that it can’t fit into a single drive, it is commonly unstructured, and it generally arrives at a very high pace. As a result, it takes very long to process and extract knowledge out of it using traditional techniques,” said Altiparmak.

To put that into perspective, the amount of data that Altiparmak and his collaborators are dealing with far exceeds a petabyte, which is approximately the content of the largest library in the world multiplied by 50.

Extracting knowledge from that much data requires two operations: frequently accessing the storage units to bring the data to the processing units, and then processing it. While much research has gone into processing techniques, Altiparmak is part of a team that seeks to lower data access latency, saving time and energy across the entire Big Data analysis process.

Economy of Resources

“My part in this research is to make Big Data analysis more efficient in terms of performance and energy,” said Altiparmak. “I have two goals: one is making automatic storage system optimizations leading to self-optimizing Big Data analysis platforms. The other one is making this infrastructure more energy efficient.”

Last year, Altiparmak received a prestigious NSF Pre-CAREER award to help self-optimize high-performance data storage systems. This current funding will allow him to extend that investigation to a large-scale distributed Big Data analysis platform.

“Data mining and machine learning come in there, with my research,” said Altiparmak. “I need to predict what datasets will be frequently used in the future, so that I can optimize for them.”

Currently, the University of Louisville utilizes an internal system called the Cardinal Research Cluster. This NSF grant may also allow that technology to be updated, an upgrade that will benefit researchers campus-wide. The funding will cover increased storage and processing power as well as additions like power meters, which quantify the amount of power used. Identifying how much power is consumed will provide a valuable resource in Altiparmak's efforts to improve the energy efficiency of the infrastructure.

Additionally, the available storage devices and their data-transfer interfaces will be enhanced to include newer technologies, including non-volatile memory devices. Altiparmak believes that updating the data storage technology is a critical step toward satisfying his and his collaborators’ research goals.

Data Equality

Not all data is created equal. Altiparmak is part of a team that includes researchers from a variety of fields, including biomedicine, metagenomics, public health, and multimedia. The datasets associated with each are very large, and they vary in type, adding extra complexity to the process.

“Some data can fit in a certain structure or table, but most of our datasets cannot. For example, Big Biomedical Data includes various unstructured data such as biomedical images, genomic information, chemical test results (e.g., pathology report), and observational patient data. It is very challenging to integrate these and extract knowledge in a timely manner for improved healthcare services,” said Altiparmak. “The common point to everyone involved in this project is that they have a lot of complex data to analyze.”

Altiparmak is the principal investigator for this grant, which houses five projects under the Big Data umbrella: (i) Efficient Management and Analysis of Big Data; (ii) Big Multimedia Data Analysis; (iii) Big Biomedical Data Analysis; (iv) Big Metagenomics Data Analysis; and (v) Big Public Health Data Analysis. He is joined by Drs. Ayman El-Baz and Robert Keynton from the Department of Bioengineering; Olfa Nasraoui and Hichem Frigui from the Department of Computer Engineering and Computer Sciences; Nejat Egilmez from the School of Medicine; and Bert Little, Richard Kerber, and Karunarathna Kulasekera from the School of Public Health and Information Sciences.

“Each of my colleagues needs this infrastructure so that we can perform interdisciplinary collaboration, enable new technological innovations in Big Data integration, analysis, and interpretation, and accelerate the innovation process in the fields of bioengineering, medicine, public health, and computer science,” said Altiparmak. “We all work in Big Data, which is one of the 2020 strategic missions of the University.”

Their grant was ranked #1 for funding in the NSF panel among all proposals submitted to the same division. It begins on October 1, 2018, and ends on September 30, 2020.