HDFS I/O Operations Performance Optimization


Purnachandra Rao Bobbepalli and Dr. Nagamalleswara Rao Nallamothu
Abstract

The Hadoop Distributed File System (HDFS) is the mechanism used to process large datasets across a cluster of computers using distributed programming techniques. Online transactions are heavily in use, and the data generated by these transactions must be handled properly so that the operations can be tracked; HDFS is a file system designed for such scenarios. An HDFS cluster contains a NameNode, which manages the metadata, and DataNodes, which store the data. The data is managed as a number of replicated copies: each DataNode holds the data blocks into which the actual data is copied, and the replication can be customized programmatically. HDFS supports common file system operations such as reading and writing files and creating and deleting directories. In this paper we address the memory access timings involved in interacting with HDFS, using memory organization techniques together with write-operation pipeline formation based on a fully connected graph. Cache memory is used to store frequently accessed data, and through it the memory access time can be reduced. We show that the memory access timings are reduced by using a set-associative cache memory and by employing a fully connected graph while forming the pipeline for the write operation.
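The set-associative caching idea referred to above can be sketched as a minimal simulation (this is an illustrative model, not the authors' implementation; the set count, associativity, and LRU replacement policy are assumptions):

```python
class SetAssociativeCache:
    """Toy set-associative cache model with per-set LRU replacement.

    A block maps to a set by (block_id % num_sets); within a set,
    up to `ways` blocks are kept, and the least-recently-used block
    is evicted on a miss when the set is full.
    """

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is an ordered list of block ids, most recently used last.
        self.sets = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block_id):
        s = self.sets[block_id % self.num_sets]
        if block_id in s:
            self.hits += 1
            s.remove(block_id)   # move to most-recently-used position
            s.append(block_id)
            return True
        self.misses += 1
        if len(s) == self.ways:  # set full: evict least-recently-used block
            s.pop(0)
        s.append(block_id)
        return False


# Blocks 0, 4, 8 all map to set 0 (num_sets=4), so they compete for 2 ways.
cache = SetAssociativeCache(num_sets=4, ways=2)
for b in [0, 4, 0, 8, 0, 4]:
    cache.access(b)
print(cache.hits, cache.misses)  # → 2 4
```

Because two ways are available per set, the repeatedly accessed block 0 survives the interleaved accesses to blocks 4 and 8 and is served from the cache, which is the mechanism by which the average memory access time drops.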

Volume 11 | 01-Special Issue

Pages: 824-836