The enterprise storage market is huge. The top 5 players in the space (EMC, NetApp, IBM, HP, Hitachi) sell about $20 billion a year of storage systems. Although that revenue isn't going anywhere quickly, it's interesting to look at some recent trends that are sure to impact this segment over time.
For many applications, the days of spinning hard drives are numbered. The MacBook Air I'm running on doesn't have a spinning drive. Neither does your phone. It's all solid-state storage. Server hardware is going in that direction as well, with various permutations of flash/solid-state memory. With falling prices per GB, increasing capacities, and higher reliability, the case for solid-state storage is increasingly easy to make for applications that demand high-throughput I/O or low-latency access to data. But it's not just a raw cost-per-GB vs. performance argument. Along with the performance gains, solid-state storage also offers big power consumption and space advantages - which at scale make a huge difference. Wired just published an article about this trend, which is worth a read.
In the component market, "commodity" SSDs are showing up increasingly often. Then you have "high end" solid-state component players such as Fusion-io and Virident. Vendors such as Violin Memory are building out storage arrays built only on (and optimized for) flash memory. And of course, traditional storage vendors such as EMC are incorporating flash into their disk-based storage arrays, coming out with various flash-based products, and acquiring vendors in this area.
The one place where flash adoption has lagged is in large public clouds - most likely because of cost and reliability concerns. That's changing, however: Rackspace, HP, and Microsoft Azure have all announced SSD-based storage options in the last few months. And it's a good guess that we'll see solid-state options in most clouds over this year.
Bring processing closer to the data
Starting in the 90's, the trend has been to centralize storage on appliances - and companies such as NetApp and EMC have benefited greatly from this trend. From an administrative standpoint, this makes it easier to manage, back up, and provide high service levels around data. However, with data volumes growing faster than network capacity, and the need for more real-time data processing, this trend is undergoing a small (but growing) reversal. Systems such as Hadoop closely couple compute and storage as an explicit part of their design. In most cases, this means moving storage back into the server (where the compute lives), rather than trying to centralize it in an appliance over the network. Some flash component vendors, such as Fusion-io, are pushing this approach as well - with specialized APIs that allow applications to treat solid-state devices as an extension of RAM.
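The "move compute to the data" idea can be sketched as a toy scheduler: given a map of which nodes hold a replica of each data block, prefer running a task where its input already lives, and only ship data over the network as a fallback. This is an illustration of the general pattern, not how Hadoop itself is implemented; all names here are hypothetical.

```python
# Hypothetical map of data block -> nodes holding a replica of that block.
block_locations = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-c", "node-a"],
}

def schedule(block, busy_nodes):
    """Prefer a node with a local copy, so compute moves to the data."""
    for node in block_locations.get(block, []):
        if node not in busy_nodes:
            return node, "local"
    # Fallback: no node with a local replica is free, so the data
    # must travel over the network to whichever node runs the task.
    return "node-a", "remote"

node, locality = schedule("block-2", busy_nodes={"node-b"})
print(node, locality)  # node-c local
```

In a real system the scheduler also weighs rack locality and load, but the core trade-off is the same: a local read off direct-attached disk (or flash) avoids the network entirely.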
Scale-out architectures & commodity hardware
Most database and file systems designed in the past few years have moved to a model that eschews complex, high-end hardware and instead runs on (possibly virtualized) commodity servers. HDFS and GlusterFS are examples in the filesystem world, and MongoDB, Cassandra, and Riak are examples of this trend in the database market. Unlike systems like Oracle RAC, which require shared storage, most of these systems work equally well (and sometimes better) on standard server hardware with direct-attached (solid-state or regular) disks. Instead of relying on a single high-end server and a single centralized piece of storage hardware, these systems put the 'smarts' for things like high availability into the software layer, so that a database or filesystem spread across a large number of servers appears as a single unit to an application. This gives a theoretically infinite amount of scalability, as you can increase storage (and processing) capacity by adding additional servers to the cluster. This model is a very good fit for cloud-like environments, where the emphasis is on standardized virtual machines rather than specialized hardware.
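One concrete example of that software-layer 'smarts' is consistent hashing, a common way (used in various forms by systems like Cassandra and Riak) to decide which server owns which key, so the cluster looks like one unit to the application. Below is a minimal sketch with hypothetical server names - an illustration of the technique, not any particular database's implementation.

```python
import bisect
import hashlib

def _h(key):
    """Hash a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each server gets many "virtual nodes" on the ring so that
        # keys spread evenly and adding a server moves only a fraction.
        self._ring = sorted((_h(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key):
        """Walk clockwise to the first virtual node at or after the key."""
        i = bisect.bisect(self._hashes, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring = Ring(["server1", "server2", "server3"])
print(ring.node_for("user:42"))  # deterministically one of the three servers
```

The point is that placement is pure computation - any client can locate a key's owner without asking a central coordinator, which is part of how these systems avoid a single high-end bottleneck.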
So what does this all mean? As solid-state storage becomes the default, I think we'll see a lot more software optimized specifically for this mode of storage. With spinning disks, there is a huge difference between random and sequential I/O (for each random I/O, the disk needs to physically seek to the right spot). With solid-state storage, however, this disparity is effectively eliminated. There are other differences as well: for example, erases take an order of magnitude longer than writes on most SSDs. So there is going to be a lot of innovation as software developers adapt to solid-state storage becoming the default. On the hardware side, incorporating more flash-based options into traditional storage appliances will definitely extend their capabilities. However, the trend towards coupling storage and compute on commodity hardware will pose more of a challenge to enterprise storage vendors. We are still a ways away from having distributed databases or filesystems that run on commodity hardware with all the bells, whistles, tools, and management capabilities that the current generation of storage appliances have - but this area is rapidly evolving.
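The random-vs-sequential gap is easy to observe yourself. The rough sketch below times sequential and random 4 KB reads against a scratch file on whatever storage backs your temp directory - on a spinning disk the random pass is typically far slower, while on solid-state storage the two come out close. (Treat the numbers as illustrative only: the OS page cache and a small file size both flatter random I/O.)

```python
import os
import random
import tempfile
import time

BLOCK, BLOCKS = 4096, 2048  # 4 KB blocks, 8 MB scratch file

# Create a scratch file filled with random bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(BLOCK * BLOCKS))
    path = f.name

def timed_read(offsets):
    """Read one 4 KB block at each offset and return elapsed seconds."""
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
        return time.perf_counter() - start

all_offsets = list(range(0, BLOCK * BLOCKS, BLOCK))
seq = timed_read(all_offsets)                            # in-order pass
rnd = timed_read(random.sample(all_offsets, BLOCKS))     # shuffled pass
print(f"sequential: {seq:.4f}s  random: {rnd:.4f}s")
os.unlink(path)
```

Software written for spinning disks goes to great lengths to turn random I/O into sequential I/O (log-structured storage, elevator scheduling); once the gap disappears, a lot of that machinery becomes unnecessary overhead.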
What is your take on these trends?