Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple, flexible architecture based on streaming data flows, and it is robust and fault-tolerant, with tunable reliability and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.
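As an illustrative sketch of that data flow, the snippet below uses Flume NG's embedded-agent API to push application events into a memory channel drained by an Avro sink. The agent name, collector host, port, and channel capacity are hypothetical values chosen for the example, not defaults.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.agent.embedded.EmbeddedAgent;
import org.apache.flume.event.EventBuilder;

public class LogShipper {
  public static void main(String[] args) throws EventDeliveryException {
    // In-process agent: events flow from the application into a memory
    // channel and on to an Avro sink feeding a downstream Flume agent.
    Map<String, String> conf = new HashMap<>();
    conf.put("channel.type", "memory");
    conf.put("channel.capacity", "10000");                  // buffers transient spikes
    conf.put("sinks", "avroSink");
    conf.put("avroSink.type", "avro");
    conf.put("avroSink.hostname", "collector.example.com"); // hypothetical collector
    conf.put("avroSink.port", "4141");
    conf.put("processor.type", "default");

    EmbeddedAgent agent = new EmbeddedAgent("appAgent");    // hypothetical agent name
    agent.configure(conf);
    agent.start();
    try {
      // A log record travels through Flume as an Event: headers plus a byte[] body.
      Event event = EventBuilder.withBody("application started", StandardCharsets.UTF_8);
      agent.put(event); // throws EventDeliveryException if the channel rejects it
    } finally {
      agent.stop();
    }
  }
}
```

The embedded agent needs the flume-ng-embedded-agent artifact on the classpath; in a standalone deployment the same source-channel-sink topology would normally be declared in the agent's properties file rather than in application code.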
Flume lets Hadoop users ingest high-volume streaming data into HDFS:
– Ingest streaming data from multiple sources into Hadoop for storage and analysis; typical examples of such data are application logs, sensor and machine data, and geo-location data.
– Insulate the storage platform from transient spikes: when the rate of incoming data exceeds the rate at which it can be written to the destination, events are buffered in the channel.
– Flume NG uses channel-based transactions to guarantee reliable message delivery. When an event moves from one agent to another, two transactions are started: one on the agent that delivers the event and one on the agent that receives it. The delivering agent commits its transaction only after the receiving agent has committed its own, so an event is never removed from a channel until it has been safely handed off, as the sink sketch after this list illustrates.
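A minimal sketch of that transactional contract, assuming a custom sink that "delivers" events to stdout (the class name and destination are placeholders for a real target): the event is removed from the channel only when the transaction commits, and a rollback leaves it in place for redelivery.

```java
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class StdoutSink extends AbstractSink implements Configurable {

  @Override
  public void configure(Context context) {
    // Read sink properties here if needed; none for this sketch.
  }

  @Override
  public Status process() throws EventDeliveryException {
    Channel channel = getChannel();
    Transaction tx = channel.getTransaction();
    tx.begin();
    try {
      Event event = channel.take();
      if (event == null) {
        // Channel is empty: commit the empty transaction and back off.
        tx.commit();
        return Status.BACKOFF;
      }
      // "Deliver" the event; a real sink would write to its destination here.
      System.out.println(new String(event.getBody()));
      tx.commit();   // only now is the event removed from the channel
      return Status.READY;
    } catch (Throwable t) {
      tx.rollback(); // the event stays in the channel for redelivery
      throw new EventDeliveryException("Failed to deliver event", t);
    } finally {
      tx.close();
    }
  }
}
```

A file channel makes this guarantee durable across agent restarts, whereas a memory channel trades durability for throughput.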
Behaim’s two years of experience include Flume installation, setup, configuration, and production deployment, as well as the implementation of Flume components (sources, channels, sinks, agents, etc.) and integration with other applications.