Parallel File Systems Have Evolved - And What That Means for Enterprise IT
Special Contribution for CIOReview – Big Data Special July 2016
By Daniel Chow, COO, CTO, Silicon Mechanics

The transition of HPC into the enterprise continues to face challenges, but not for the reasons you might suspect. A legacy computing mindset, cost structures, and infrastructure issues have shouldered most of the blame for the slow adoption of HPC in the enterprise, but the reality is that the network itself is the bottleneck. For HPC to grow into an enterprise-ready play, existing Ethernet-based network-attached storage needs an adjustment, and I’d argue that the adjustment is already available: parallel file systems.

The problem right now is that parallel file systems, especially Lustre, are largely misunderstood in the enterprise, and as a result adoption is very low. According to Intersect360 Research industry analyst Addison Snell, “Lustre is still deployed by only about one-tenth of companies with high-performance or big data workloads.”

I believe that will change, and quickly. Enterprise customers increasingly need storage platforms that optimize traffic from applications to the storage layer and that offer interoperability and support for complex, multi-protocol networks, especially with the growth of Big Data. They also need hardware and customized software-defined storage platforms that can help them scale quickly. In fact, I believe that Big Data initiatives and the need for scalability will be the driving forces behind enterprises fully adopting Lustre. A parallel file system easily performs the feats mentioned above, and Lustre’s list of benefits includes the following:
  • Highly scalable storage platform that can absorb almost any type of data influx
  • Stores application data persistently
  • Provides global shared namespace (files, directories)
  • Designed for high performance over high-speed networks (IB, Myrinet, Portals), with an I/O path optimized for maximum bandwidth
  • Ease of scalability and modularity that very few storage platforms can match, due to how Lustre manages metadata and how it stores the actual bits and bytes (the payload)
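To make the last point concrete, here is a minimal sketch of round-robin (RAID-0 style) striping, the general technique a parallel file system uses to spread one file's payload across several storage targets while the layout itself is kept as metadata. It is a toy model and not Lustre code: the function names `locate` and `split_read` and the stripe size and stripe count values are assumptions chosen for illustration.

```python
# Toy model of round-robin file striping across storage targets.
# Illustrative only; names and values here are hypothetical, not a Lustre API.

MIB = 1024 * 1024


def locate(offset, stripe_size, stripe_count):
    """Map a byte offset in a file to (target index, offset inside that target's object)."""
    stripe_number = offset // stripe_size
    target = stripe_number % stripe_count
    # Each target holds every stripe_count-th stripe, packed back to back.
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return target, object_offset


def split_read(offset, length, stripe_size, stripe_count):
    """Break one contiguous read into per-target chunks that could be issued in parallel."""
    chunks = []
    end = offset + length
    while offset < end:
        target, object_offset = locate(offset, stripe_size, stripe_count)
        # Do not read past the end of the current stripe.
        room_in_stripe = stripe_size - offset % stripe_size
        n = min(room_in_stripe, end - offset)
        chunks.append((target, object_offset, n))
        offset += n
    return chunks


if __name__ == "__main__":
    # A 4 MiB read on a file striped 1 MiB wide over 4 targets touches each target once.
    for chunk in split_read(0, 4 * MIB, MIB, 4):
        print(chunk)
```

In this model, adding targets raises the number of chunks that can be served at once, which is the intuition behind scaling bandwidth by adding storage servers rather than by replacing them.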
While I think there could be immediate benefits for an enterprise enabling Lustre, I also think CTOs should take their time in making sure an HPC environment is right for their specific enterprise. If the company is not operating in a traditional HPC environment, that doesn’t mean Lustre is entirely out of the conversation. Here are some factors a CTO might want to weigh when evaluating a Lustre install:
  • Cost: since Lustre is a software-defined reference architecture, its upfront costs are lower than those of most generally available parallel file systems, which are appliance based.
  • Scalability: because you can build a customized storage platform, it can be “right sized” for performance and capacity. There won’t be a need to overprovision. You can achieve scalability with a scale-out architecture that adapts and keeps pace with demand.
  • Consolidation: a Lustre install should lead to lower TCO by decreasing the need for additional staff to coordinate various maintenance windows during required software updates between vendors and the business units. Imagine having three separate silos!
  • Easy data migration: IT administrators won’t need to set up complex replication or file sync policies between Lustre and ZFS. Backup and replication of data, especially with larger datasets, has historically been a challenge and often a very time-consuming process. Lustre has built-in backup and recovery mechanisms via snapshotting. In addition, following best practices like backing up the OST configuration will mitigate hardware failures, unlike other traditional turn-key platforms that are built on basic RAID protection.
  • Plug-in advancements (Hadoop, ZFS): these allow administrators to avoid data silo issues and help them manage and monitor usage (capacity planning) from a single pane of glass. In the past, an admin would have to monitor the three platforms independently: one for ZFS, one for Hadoop, and another for Intel Lustre.
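The “right sized” argument in the scalability point above comes down to simple arithmetic, sketched here under loud assumptions: throughput and capacity are taken to scale linearly with the number of storage servers (an idealization), and every figure in the example is hypothetical rather than a measured Lustre number. The function name `servers_needed` is likewise invented for illustration.

```python
import math


def servers_needed(target_gb_per_s, target_tb, per_server_gb_per_s, per_server_tb):
    """Return the smallest server count that meets both the bandwidth and capacity targets.

    Assumes idealized linear scaling; real deployments should be benchmarked.
    """
    for_bandwidth = math.ceil(target_gb_per_s / per_server_gb_per_s)
    for_capacity = math.ceil(target_tb / per_server_tb)
    # Whichever requirement is harder to meet decides the size.
    return max(for_bandwidth, for_capacity)


if __name__ == "__main__":
    # Hypothetical building block: 3 GB/s and 200 TB per storage server.
    print(servers_needed(20, 500, 3, 200))   # bandwidth-bound
    print(servers_needed(20, 2000, 3, 200))  # capacity-bound after data growth
```

With these made-up numbers the first case needs 7 servers and the second needs 10, so growth is handled by adding three building blocks instead of overprovisioning on day one.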
The biggest question facing Lustre adoption, in my opinion, is whether it has evolved enough from its origins in the education and HPC community to be enterprise ready. Lustre wasn’t considered ready for the enterprise until the most recent 2.5 release, which added some key features related to reliability and availability, including changes in the way it handles metadata and hierarchical storage management (HSM).

I’m convinced that with the new features in 2.5, Lustre has made the necessary evolution and is now ready for implementation as a parallel file system at the enterprise level. From a feeds-and-speeds perspective, Lustre has the ability to make any CTO drool. The hard part is deploying it. Start by implementing strict management controls, then slowly decommission the legacy NAS, SAN, and GPFS solutions you are using now and migrate to Lustre. Doing so will future-proof your datasets, give you more reliability and availability, and help you avoid those dreaded fork-lift upgrades.