We are proud to announce that ELEMENTS has achieved the highest result in the SPEC SFS VDA storage performance benchmark. This was accomplished together with our new technology partner ThinkparQ and their BeeGFS file system, which is well established in High Performance Computing, AI and Deep Learning, and Science. After introducing this file system to the Media and Entertainment industry, it was vital to prove its capabilities in video production workflows. In this article, we explain how we achieved the high score, examine and compare the benchmark outcomes in detail and discuss what these results really mean.
The goal was to showcase our new partner file system technology BeeGFS, to underline its future potential for the Media & Entertainment industry and to compare it to some well-established technologies in this segment. The Video Data Acquisition (VDA) workload of the SPEC SFS benchmark is designed to simulate data acquisition from a volatile video source, measuring the number of video streams at roughly 36 Mbit/s each.
The VDA workload was executed on an ELEMENTS BOLT based storage environment running the BeeGFS file system. The test environment consisted of an amount of hardware comparable to that of the environments behind the three highest test scores. This was a conscious decision to enable a more meaningful and fair comparison of the performance. A summary of the ELEMENTS test results:
- Highest stream count (highest throughput) – 11000 streams (50708 MB/s), a 14.58% improvement over the previous high score.
- Highest stream count per storage device – an average of 76.39 streams per NVMe device.
- Best overall latency curve – the latency remains both low and stable throughout the entire test run.
- Highest storage CPU efficiency – 23% more concurrent streams per CPU core than the highest-scoring competitor.
- Highest client CPU efficiency – 107% more concurrent streams per CPU core than the highest-scoring competitor.
- Highest RAM efficiency – 68% more concurrent streams per Gigabyte of RAM used than the highest-scoring competitor.
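As a quick plausibility check, the headline numbers above can be reproduced from one another. The following is a back-of-the-envelope sketch (not part of the official benchmark tooling); the figures are taken from the published results quoted above.

```python
# Back-of-the-envelope check of the published headline numbers.
MAX_STREAMS = 11_000        # highest stream count achieved
THROUGHPUT_MB_S = 50_708    # corresponding throughput in MB/s
NVME_DEVICES = 144          # total NVMe devices in the environment

# Effective per-stream bitrate: should land near the nominal ~36 Mbit/s
per_stream_mbit = THROUGHPUT_MB_S / MAX_STREAMS * 8
print(f"per-stream bitrate: {per_stream_mbit:.2f} Mbit/s")

# Average stream count per NVMe device
streams_per_device = MAX_STREAMS / NVME_DEVICES
print(f"streams per device: {streams_per_device:.2f}")

# Implied previous high score from the stated 14.58% improvement
previous_mb_s = THROUGHPUT_MB_S / 1.1458
print(f"implied previous record: {previous_mb_s:.0f} MB/s")
```

The per-stream and per-device figures fall out directly from the published stream count, throughput and device total.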
The benchmark results were officially approved by the SPEC SFS committee and published on their website in September 2021.
SPEC SFS performance benchmarking
Standards Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse standardised benchmarks and tools to evaluate performance and energy efficiency for the newest generation of computing systems. SPEC develops benchmark suites and reviews and publishes submitted results.
The SPEC SFS’ Video Data Acquisition (VDA) benchmark is a well-established benchmark workload, used to evaluate performance by measuring the maximum sustainable throughput that a storage solution can deliver. At the same time, the required response time is tracked. The benchmark runs on a group of workstations, also known as clients, and measures the performance of the storage solution that is providing files to the clients’ application layer. In the VDA benchmark, the workload simulates applications that store data acquired from a volatile video source; the benchmark can therefore best be described as an ingest workflow performance test. The business metric is the number of video streams, with each stream corresponding to a bitrate of roughly 36 Mbit/s.
BeeGFS is a parallel file system that started life as a research project within the German Fraunhofer Center for High Performance Computing, initiated to support performance-oriented use cases, including HPC, AI and Deep Learning. It has played a central role in impressive projects such as the first-ever black hole visualisation in April 2019 and forms an important part of the organisational workflows of NASA, Shell and the Max Planck Institute. It is also the file system of choice for a number of TOP500 supercomputers. This innovative file system achieves high performance by transparently striping user data across multiple storage nodes. Furthermore, it can distribute file system metadata across multiple metadata servers. In other words, BeeGFS is specifically designed for concurrent access and cluster applications, and delivers high robustness and performance under heavy I/O loads and demanding access patterns. Its flexible architecture allows for “on the fly” scaling of capacity and performance from small clusters up to enterprise-class systems with thousands of nodes.
BeeGFS is designed for Ethernet from the ground up and offers the highest performance with native RDMA support and a native Linux client. Several other useful features, such as storage quotas and user data / metadata mirroring, are offered as well. One characteristic of BeeGFS that we find particularly useful is that it supports hybrid storage environments more easily than other solutions. This will enable us to build performant yet efficient environments by mixing different storage media, and to build future-proof hybrid cloud solutions.
BeeGFS and Cloud
Another area in which BeeGFS truly excels is cloud workflows. Besides allowing for easy integration of cloud instances such as AWS and Microsoft Azure into a hybrid storage environment, this file system offers one particularly exciting innovation.
BeeOND (BeeGFS On Demand) is an on-demand deployment of BeeGFS that lets you spin up a file system on any number of machines with a single command. Such BeeOND instances provide a very fast and easy-to-use temporary buffer while keeping I/O load away from your primary storage cluster. In other words, it will soon be possible to flexibly expand the capacity and performance of your file system and enjoy all the benefits of cloud storage. As soon as your needs are met, the BeeOND file system instances can be removed just as quickly – a truly efficient cloud on-demand solution.
The performance of BeeOND was even tested by the Azure HPC team, demonstrating the first-ever cloud-based parallel file system to exceed one terabyte per second – 1.46 TB/s of read performance and 456 GB/s of write performance, to be exact.
Our test setup
In a test in which performance is the only relevant metric, a typical approach to setting a new high score would be to simply use more hardware. We, however, decided to build an environment with an amount of hardware comparable to that of the environments behind the three highest test scores. This allows for a more meaningful and fair comparison of the test results.
Twelve units of the all-NVMe ELEMENTS BOLT are each only half populated with NVMe devices – 12 Micron 9300 devices per ELEMENTS BOLT (instead of 24) with a grand total of 144 NVMe devices used in the whole environment. Each ELEMENTS BOLT is running the BeeGFS file system.
Each storage node is connected via a 100Gbit link to a 100Gbit Mellanox switch. The same switch has 20 load-generating clients (ELEMENTS GATEWAY) connected via 50Gbit connections. Also connected via a 50Gbit connection is the prime client node (ELEMENTS WORKER), on which the benchmark application runs. For administrative and management access, all storage and client nodes are connected to a 1Gbit house network.
Who are the three best-performing competitors?
WekaIO, a US-based private company specialising in high-performance storage, uses the SPEC SFS benchmark to showcase WekaIO Matrix, their flash-native, parallel, distributed, scale-out file system. The file system runs on six Supermicro BigTwin chassis, each consisting of four nodes, populated with 138 Micron 9200 NVMe devices.
Quantum Corporation is a public data storage and management company that is well known in the Media and Entertainment industry for several innovations, particularly the StorNext file system. Quantum is also a long-standing and highly valued technology partner of ELEMENTS. For the test, ten F1000 Storage Nodes with a total of 100 Micron 9300 NVMe devices were used on the Xcellis based StorNext7 (v7.01) platform.
CeresData is a company based in Beijing which specialises in storage technology. Their test environment was built on their ten-node storage cluster D-Fusion 5000 SOC, running the CeresData Prodigy OS.
While using an amount of hardware comparable to the three best-performing competitors, ELEMENTS and BeeGFS achieved a higher number of streams, and thereby a higher throughput, than any other environment tested in the SPEC SFS VDA benchmark. The new high score is 14.58% higher than the previous one, which translates to 6.25 Gigabytes per second. Setting the maximum stream count in relation to the number of storage devices used yields similar results: the ELEMENTS environment displays the most efficient utilisation, with an average of 76.39 streams per NVMe device.
While the straightforward metric of maximum stream count is interesting and easy to digest, more valuable insights emerge upon deeper inspection of the details.
Using a comparable amount of hardware as the other solutions allows us to examine how efficient the ELEMENTS BOLT running BeeGFS really is in delivering performance. One interesting aspect to analyse is of course the CPU utilisation. When broken down to its bare essence, the role of the CPU in a storage system is primarily one of data processing, making sure information is interpreted and the instructions executed in the shortest possible time. It is a key component, and its efficient utilisation can have a large impact on the overall performance. Modern CPUs can have a varying number of cores. For this reason, in chart number 4, we have set the maximum number of streams in relation to the overall number of CPU cores that the environment has employed. The ELEMENTS BOLT environment running BeeGFS has achieved the highest number of streams (throughput) per storage CPU core. This doesn’t come as a huge surprise when considering that BeeGFS was first developed for High-Performance Computing use cases.
A factor which is just as important for achieving the highest possible benchmark results is the performance of the clients running the test application. Chart number 5 shows that ELEMENTS has achieved the highest number of streams (throughput) per client CPU core. In other words, the impressive benchmark results are in no way inflated by the use of excessively powerful clients.
Another important metric for media workflows is latency. Latency refers to the time delay, usually measured in milliseconds, between the initial data request and its eventual delivery. In media workflows it is important for the latency to be both low and stable. In other words, upon pressing play, video playback should start as fast as possible and always after roughly the same delay. In the case of the VDA workload, this means the video is captured without missing any frames of the volatile video source.
Chart number 6 displays the measured latency during the test runs of all four environments. As the number of streams rises, the latency is expected to rise as well; this is the result of an increased load on every component within the environment. The most desirable outcome is a steadily growing line whose overall rise along the Y-axis is as small as possible as the stream count increases.
It is reasonable to claim that the latency line of the ELEMENTS BOLT environment running BeeGFS depicts the best results on average of all four contenders, as the latency remains both low and stable throughout the whole test. The overall significantly higher latency of WekaIO and CeresData can possibly be explained by the different use cases these companies focus on, while the drastic latency increase during Quantum’s last test runs is most probably a consequence of the environment running at its performance limit. Other than that, the very low latency of their lower stream count runs highlights the strength of the iSER / RDMA based block-level access of the StorNext SAN clients.
Memory (RAM) efficiency
Over the years, memory has developed into one of the most important components supporting storage performance. RAM is the fastest memory at a computer’s disposal and is used to hold data gathered from the system’s non-volatile storage devices (HDD, SSD, NVMe) for the CPU’s immediate access. Besides caching processes of the operating system and running applications, memory is also used for caching file system metadata and database queries. By replacing a portion of metadata reads with reads from the cache, applications can remove the latency that arises from frequent accesses. This means that increasing memory can be an easy way to increase the overall performance of a storage solution.
An interesting fact to point out is that ELEMENTS & BeeGFS have achieved the best performance while at the same time using the lowest amount of total memory.
- WekaIO: 11712 GB
- CeresData: 10624 GB
- Quantum: 3392 GB
- ELEMENTS: 2976 GB
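From the totals above and the 11000-stream ELEMENTS result, the RAM efficiency of the winning setup can be sketched. This is an illustrative snippet, not benchmark tooling; competitor stream counts are not repeated here, so only the ELEMENTS streams-per-GB figure and the ratios of the listed RAM totals are computed.

```python
# RAM totals (GB) as listed above
ram_gb = {"WekaIO": 11_712, "CeresData": 10_624, "Quantum": 3_392, "ELEMENTS": 2_976}

ELEMENTS_STREAMS = 11_000  # ELEMENTS maximum stream count from the benchmark

# Streams delivered per Gigabyte of RAM for the ELEMENTS environment
elements_streams_per_gb = ELEMENTS_STREAMS / ram_gb["ELEMENTS"]
print(f"ELEMENTS: {elements_streams_per_gb:.2f} streams per GB of RAM")

# How much more RAM each competitor used compared to ELEMENTS
for name, gb in ram_gb.items():
    print(f"{name}: {gb / ram_gb['ELEMENTS']:.2f}x the ELEMENTS RAM total")
```

Notably, the top-throughput ELEMENTS run used roughly a quarter of the RAM of the two largest competing configurations.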
While using exorbitant amounts of memory is in no way against the rules of the benchmark, one might ask how efficient it is to use 11 Terabytes of RAM on a 342.73 Terabyte file system. 😀
Chart number 7 illustrates just how memory-efficient these different environments truly are by displaying the maximum stream count per Gigabyte of RAM used. This is one more metric in which ELEMENTS BOLT and BeeGFS easily come out on top.
ELEMENTS & BeeGFS
BeeGFS is an innovative file system that is highly performant, very flexible and has proven itself in a number of performance-demanding use cases. Now, after extensive testing in the ELEMENTS ecosystem, we are very happy to introduce it to the Media and Entertainment landscape. This cooperation will allow our clients to enjoy extremely high-performance Ethernet, easy on-the-fly expansion and, in the near future, revolutionary cloud integrations and on-demand workflows.
The SPEC SFS VDA workload is a very well-designed test that allows its participants to give their all to showcase performance in a fair and comparable manner. Being a particularly write-heavy workload, it simulates an ingest workflow most accurately. However, post-production workflows generally tend to be more read-centric due to real-time playback requirements. Currently, we are working on releasing a custom benchmark workload that can be used together with the new SPEC Storage Benchmark 2020 to gather metrics that match the requirements of video editing and playback more closely than a focus on ingest alone.
When it comes to benchmarking in general, focusing only on the headline metrics makes it easy to win by simply utilising more hardware. A standardisation of the hardware environment would allow for a more meaningful comparison of the solutions.
A second takeaway is that a much more detailed look at the benchmark particularities is needed to truly gain valuable insights. For instance, it is apparent that when the results are compared to the amount of hardware contained within the environment being tested, Quantum StorNext (overall third place) overtakes CeresData Prodigy (second place) in just about every metric but the maximum stream count.
For these reasons, when the next benchmark high score is announced, we would like to invite you to inspect the details rather than concentrate on the narrowly defined main metrics.
What do you think about the benchmark and its results? We would love to hear your opinion.
For more detailed information about the hardware, settings and test results visit the SPEC SFS website.