Running a Big Data Infrastructure: Five Areas That Need Your Attention
1. Scalability and Elasticity
Scalability generally refers to the ability of a technology to support increasing demand. Typically, infrastructure is scaled by adding more
machines. Elasticity refers to the ability of a technology to flex up or down depending on demand at any given time. Working on an
elastic substrate allows a company to run many kinds of workloads, not just Big Data, on the same pool of resources, adding capacity only when
demand calls for it so that costs stay within budget.
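To make the idea of flexing up and down concrete, here is a minimal Python sketch of an elastic sizing rule. It assumes, purely for illustration, that demand is measured as a count of pending tasks and that each node runs a fixed number of tasks at once; the function `desired_nodes` and its parameters are hypothetical and not part of any particular platform's API. The floor keeps some capacity always on, and the ceiling acts as a budget cap.

```python
import math


def desired_nodes(pending_tasks: int, tasks_per_node: int,
                  min_nodes: int, max_nodes: int) -> int:
    """Hypothetical elastic sizing rule: nodes needed for current demand,
    clamped to a floor (always-on capacity) and a ceiling (budget cap)."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    # Enough nodes to run every pending task at once, rounded up.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Flex up or down, but never below the floor or above the ceiling.
    return max(min_nodes, min(needed, max_nodes))


if __name__ == "__main__":
    # Illustrative demand over a day (pending tasks per interval).
    demand = [0, 40, 400, 1000, 80]
    sizes = [desired_nodes(d, tasks_per_node=8, min_nodes=2, max_nodes=50)
             for d in demand]
    print(sizes)  # [2, 5, 50, 50, 10]
```

A production autoscaler would typically add cooldown periods and smoothing on top of a rule like this so the cluster does not resize on every small fluctuation.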
2. Reliability
Business users count on data being available at regular intervals to run the company efficiently. Reliability begins with how effectively data
enters the infrastructure and ends with predictable data delivery. Companies such as Facebook use internal service level agreements
(SLAs) to ensure data availability across teams.
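An internal data SLA can be as simple as a delivery deadline per dataset. The Python sketch below shows one way to check such deadlines; it is an illustrative assumption, not Facebook's or any vendor's actual mechanism, and the `DatasetSLA` record, the `sla_breaches` function, and the dataset names are all hypothetical. A dataset counts as a breach if it arrived after its deadline, or if it has not arrived and the deadline has already passed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DatasetSLA:
    name: str
    deadline: datetime                 # when consuming teams expect the data
    delivered_at: Optional[datetime]   # None if the data has not landed yet


def sla_breaches(slas: List[DatasetSLA], now: datetime) -> List[str]:
    """Return the names of datasets that have missed their delivery deadline."""
    breaches = []
    for sla in slas:
        if sla.delivered_at is None:
            # Still missing: counts as a breach only once the deadline has passed.
            if now > sla.deadline:
                breaches.append(sla.name)
        elif sla.delivered_at > sla.deadline:
            # Delivered, but late.
            breaches.append(sla.name)
    return breaches


if __name__ == "__main__":
    day = datetime(2024, 1, 15)
    slas = [
        DatasetSLA("daily_sales", day.replace(hour=6), day.replace(hour=5, minute=30)),
        DatasetSLA("ad_clicks", day.replace(hour=6), day.replace(hour=7)),
        DatasetSLA("user_signups", day.replace(hour=9), None),
    ]
    print(sla_breaches(slas, now=day.replace(hour=8)))   # ['ad_clicks']
    print(sla_breaches(slas, now=day.replace(hour=10)))  # ['ad_clicks', 'user_signups']
```

Checking missing data against the clock, rather than waiting for it to arrive, lets a team learn about a late delivery while downstream users can still be warned.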
3. Self-Service Tools
Business analysts need data to do their jobs, yet without a deep technical skill
set, quickly and easily accessing that data can be practically impossible. In the past,
companies resolved this issue by hiring teams to act as liaisons between the data engineers and
business users. Today, self-service tools provide ready access to data through a user-friendly
interface. Productivity increases across the board: more people in the
organization can make data-driven decisions, and data teams can support
ever-larger numbers of business users.
4. Monitoring
Active monitoring of the infrastructure minimizes issues and maximizes availability. How much
monitoring? The more, the better. Because self-service tools put more users in direct contact with the data, inefficiencies and other
problems affect more people. Monitoring provides insight into how the system is operating and quickly identifies minor glitches so they can be
corrected before they develop into major problems.
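One common way to catch minor glitches early is two-level threshold alerting: a warning level that fires well before the critical level at which users are affected. The Python sketch below assumes hypothetical metric names and threshold values chosen only for illustration; in practice the metrics would come from whatever monitoring system the cluster already uses.

```python
from typing import Dict, List, Tuple

# Hypothetical (warning, critical) thresholds per metric.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "disk_used_pct": (80.0, 95.0),
    "query_queue_wait_s": (30.0, 120.0),
    "failed_jobs_last_hour": (3.0, 10.0),
}


def evaluate_metrics(metrics: Dict[str, float],
                     thresholds: Dict[str, Tuple[float, float]]) -> List[Tuple[str, str, float]]:
    """Return (metric, level, value) alerts, sorted by metric name."""
    alerts = []
    for name, value in sorted(metrics.items()):
        if name not in thresholds:
            continue  # no rule defined for this metric
        warn, crit = thresholds[name]
        if value >= crit:
            alerts.append((name, "CRITICAL", value))
        elif value >= warn:
            # Early signal: a minor issue worth fixing before it becomes critical.
            alerts.append((name, "WARNING", value))
    return alerts


if __name__ == "__main__":
    snapshot = {"disk_used_pct": 82.0, "query_queue_wait_s": 12.0, "failed_jobs_last_hour": 6}
    for alert in evaluate_metrics(snapshot, THRESHOLDS):
        print(alert)
    # ('disk_used_pct', 'WARNING', 82.0)
    # ('failed_jobs_last_hour', 'WARNING', 6)
```

Separating warning and critical levels is what lets a team act on a filling disk or a rising failure count while it is still a minor glitch.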
5. Open Source
Open source technology evolves rapidly, and new features, such as faster queries, can increase
productivity, improve cluster availability, or even provide a competitive advantage in the
marketplace. Data engineers must stay on top of the latest versions to ensure that the
infrastructure runs at peak performance. Working with a knowledgeable vendor can help data
engineers reduce the risk of missing important updates.