Apache Hadoop is an open-source software framework used for distributed storage and processing of very large data sets. It runs on computer clusters built from commodity hardware. It supports various security standards and features, such as SSL/TLS, Kerberos authentication, encryption at rest, and role-based authorization, ensuring that enterprise data is stored securely and accessed only by permitted users.
The base framework consists of the following modules:
– Hadoop Common – contains libraries and utilities needed by the other Hadoop modules
– Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing high aggregate bandwidth across the cluster
– Hadoop YARN – a resource-management platform responsible for managing computing resources in clusters and scheduling users’ applications on them
– Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing.
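To illustrate the MapReduce programming model mentioned above, here is a minimal single-machine sketch of its three conceptual phases (map, shuffle, reduce) using the classic word-count example. This is a simplified Python illustration of the model, not Hadoop's actual Java API; in a real cluster the shuffle happens across machines and the functions would be written against Hadoop's Mapper/Reducer interfaces.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    # Apply the user-supplied map function to each input record,
    # producing a flat list of intermediate (key, value) pairs.
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs

def shuffle_phase(pairs):
    # Group intermediate values by key; Hadoop performs this
    # step across the cluster between the map and reduce stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped, reduce_fn):
    # Apply the user-supplied reduce function to each key's values.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

# Word count: emit (word, 1) per word, then sum the counts per word.
def word_count_map(line):
    return [(word, 1) for word in line.split()]

def word_count_reduce(word, counts):
    return sum(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
result = reduce_phase(shuffle_phase(map_phase(lines, word_count_map)),
                      word_count_reduce)
print(result["the"])  # 3
print(result["fox"])  # 2
```

Because each map call depends only on its own record and each reduce call only on one key's values, both phases parallelize naturally across a cluster, which is what makes the model a good fit for Hadoop's distributed execution.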