7 Impacts the Explosion in Genomic Data Has Had on IT

by Daniel Chow

The New York Times published a story about the explosion in genomic sequencing data over the past decade, and offered this dramatic statement: “DNA sequencing is becoming faster and cheaper at a pace far outstripping Moore’s law. . .”

For the unaware, Moore’s law was formulated in 1965 by Gordon Moore, co-founder of Intel, who observed that “the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented, and that this trend would continue for the foreseeable future.”

In essence, Moore’s law described the rate at which computing becomes faster and cheaper. And when he wrote it, I seriously doubt Moore had genomic data in mind, considering the first human genome was not fully sequenced until 2003, for a mere $3 billion. The cost of sequencing a human genome — all three billion bases of DNA in a set of human chromosomes — has since plunged to roughly $2,000 today, depending upon your information source.

That is a decline by a factor of roughly 1.5 million over a decade! By contrast, Moore’s law over the same span implies computing costs dropping by a factor of perhaps 30 to 100, so in essence sequencing is getting cheaper, and generating data, far faster than the computing and storage infrastructure meant to handle it. What impact has this rapid growth had on IT departments at universities and medical organizations?
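The back-of-the-envelope comparison above can be checked directly. The figures below are the article's rough numbers ($3 billion in 2003, $2,000 today), not precise measurements:

```python
# Compare the decline in sequencing cost against Moore's law over a decade.

initial_cost = 3_000_000_000  # ~$3 billion for the first human genome (2003)
current_cost = 2_000          # ~$2,000 today, depending on the source

sequencing_factor = initial_cost / current_cost
print(f"Sequencing cost fell by a factor of {sequencing_factor:,.0f}")
# prints: Sequencing cost fell by a factor of 1,500,000

# Moore's law: transistor density doubles roughly every 18-24 months.
years = 10
for months_per_doubling in (18, 24):
    doublings = years * 12 / months_per_doubling
    print(f"Doubling every {months_per_doubling} months for {years} years: "
          f"a factor of about {2 ** doublings:,.0f}")
```

Even under the most generous reading of Moore's law, computing improves by a few orders of magnitude per decade, while sequencing costs fell by six.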

1) Data archiving needs are growing exponentially, spawning a new demand for commodity storage: what a system provider like Silicon Mechanics would call ‘off-the-shelf’ hardware

2) Storage subsystem development still lags and cannot readily meet the demands created by genomic sequencing data at an appropriate price point

3) Networking is becoming increasingly complex in the effort to support these large volumes of transient data

4) There has been a movement of late to reduce costs by having scientists and researchers build and manage traditional IT architecture themselves, but the problem with this trend is that most scientists and researchers lack the expertise for that type of effort

5) The desire to share large datasets to promote collaboration is often overlooked in the architecture phase and not addressed until the execution or post-implementation phase

6) Data integrity (bit rot) is not being accounted for up front; it is typically addressed only after data corruption occurs, which is too late

7) There is a growing need in IT departments for experts in data analysis and network security

As genome sequencing continues to drop in price and the data continues to grow, IT departments, especially those in the research arena, will continue to struggle to find cost-effective solutions that allow for both commoditization and customization. One researcher, however, has found a working solution that has saved his organization not only a considerable amount of money, but also the headaches that come with the forklift upgrades popular with traditional hardware makers. Please take a moment to review the attached video detailing this installation.