How to calculate a Hadoop cluster growth plan based on storage?
This calculation is for a small 3-node Hadoop cluster, assuming an average daily ingest rate of 10 GB for the cluster as a whole.
Average daily ingest rate | 10 GB |
Replication factor | 3 (copies of each block) |
Daily raw consumption | 30 GB (Ingest × replication) |
Node raw storage | 600 GB (2 × 300 GB SATA II HDD) |
MapReduce temp space reserve | 25% (for intermediate MapReduce data) |
Node-usable raw storage | 450 GB (Node raw storage – MapReduce reserve) |
1 year (flat growth) | 25 nodes (Ingest × replication × 365 / node-usable raw storage = 10 GB × 3 × 365 / 450 GB ≈ 24.3, rounded up to whole nodes) |
1 year (5% growth per month) | |
1 year (10% growth per month) | |
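
The arithmetic in the table can be sketched in a few lines of Python. The input values come from the table. The growth model is an assumption, since the table does not say how monthly growth is applied: ingest is held constant within each month and compounds once per month, starting from the base rate in the first month. The function names are illustrative only. The exact flat-growth quotient is about 24.3, and the sketch rounds up to whole nodes so that the cluster can actually hold a full year of data.

```python
import math

# Inputs taken from the table above.
DAILY_INGEST_GB = 10.0   # average daily ingest for the cluster
REPLICATION = 3          # copies of each block
NODE_RAW_GB = 600.0      # 2 x 300 GB disks per node
MR_RESERVE = 0.25        # share reserved for intermediate MapReduce data
DAYS_PER_YEAR = 365


def usable_gb_per_node():
    """Raw storage left on a node after the MapReduce reserve."""
    return NODE_RAW_GB * (1.0 - MR_RESERVE)


def raw_gb_after_one_year(monthly_growth):
    """Replicated storage consumed in one year.

    ASSUMPTION (not from the table): ingest is constant within a month
    and grows by `monthly_growth` at the start of each following month.
    """
    days_per_month = DAYS_PER_YEAR / 12
    total = 0.0
    for month in range(12):
        daily = DAILY_INGEST_GB * (1.0 + monthly_growth) ** month
        total += daily * days_per_month * REPLICATION
    return total


def nodes_needed(monthly_growth):
    """Whole nodes required, rounded up so capacity is never short."""
    return math.ceil(raw_gb_after_one_year(monthly_growth) / usable_gb_per_node())


if __name__ == "__main__":
    for growth in (0.0, 0.05, 0.10):
        print(f"{growth:.0%} monthly growth: {nodes_needed(growth)} nodes")
```

Under this assumed compounding model the sketch yields 25 nodes for flat growth, 33 nodes for 5% monthly growth, and 44 nodes for 10% monthly growth. The last two figures depend on the assumed model and are not values from the original table. A different growth model, for example one that compounds daily, would shift them slightly.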