FIRE WIRE
There is a Lot More Data Out There...
Several years ago I wrote a column for Law Practice Today about the volume of data the world was creating. It was based on a 2002 study released by UC Berkeley’s School of Information Management and Systems. As I noted then, the study reached some startling conclusions:
- The world produced as much as 5 exabytes of new information in 2002.
- That volume had doubled in the preceding three years.
- Almost 18 exabytes of new information flowed through electronic channels in 2003.
How big is an exabyte? A billion gigabytes, or a trillion megabytes. The progression goes from megabytes to gigabytes to terabytes to petabytes to exabytes in multiples of 1,000, so an exabyte is ten to the power of 18 bytes.
To put things in perspective, it would take over 12 million desktop computers to hold an exabyte of data (if we assumed that each computer had an 80 gigabyte hard drive, which is pretty generous sizing even today).
Put another way, if you scanned all of the books in the Library of Congress you would come up with approximately 20 terabytes of digital data. Using that as a reference point, the world created enough data to fill approximately 250,000 Libraries of Congress in 2002. The figure simply boggles the mind.
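For readers who want to check the comparisons above, here is a minimal sketch of the arithmetic. It assumes decimal units (multiples of 1,000), the 80 gigabyte drive, and the 20 terabyte Library of Congress estimate quoted in the text; the variable names are only illustrative.

```python
# Decimal (SI) units, matching the 1,000-fold progression described above.
GB = 10**9
TB = 10**12
EB = 10**18

# Assumption from the text: each desktop computer has an 80 GB hard drive.
desktops_per_exabyte = EB // (80 * GB)

# Assumptions from the text: 5 EB created in 2002, and a scanned
# Library of Congress comes to roughly 20 TB.
libraries_in_2002 = (5 * EB) // (20 * TB)

print(f"{desktops_per_exabyte:,} desktops to hold one exabyte")
print(f"{libraries_in_2002:,} Libraries of Congress created in 2002")
```

The first figure comes to 12.5 million desktops, consistent with "over 12 million," and the second to 250,000 libraries.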
IDC’s 2006 Study
It gets worse. Last month IDC released the results of a 2006 study focusing on the same topic. They estimate that the world created 161 exabytes of information in 2006. That reflects a 32-fold increase over 2002. If my math is right, spreading that increase evenly over four years equates to roughly an 800% annual rate; measured as compound growth, it is closer to 140% a year. Either way, that really is a lot of data.
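The annual growth figure depends on how the 32-fold increase is spread across the four years. A rough sketch, assuming the 5 and 161 exabyte endpoints quoted above (the variable names are illustrative), compares a simple average with a compound rate:

```python
volume_2002 = 5      # exabytes, upper estimate from the Berkeley study
volume_2006 = 161    # exabytes, IDC estimate
years = 2006 - 2002

# Overall multiple between the two studies (about 32).
fold_increase = volume_2006 / volume_2002

# Simple average: divide the overall multiple evenly across the years.
simple_multiple_per_year = fold_increase / years

# Compound rate: the yearly multiple that, applied four times, gives the total.
compound_multiple_per_year = fold_increase ** (1 / years)
compound_growth_percent = (compound_multiple_per_year - 1) * 100

print(f"Overall increase: {fold_increase:.1f}x")
print(f"Simple average: {simple_multiple_per_year:.2f}x per year")
print(f"Compound growth: {compound_growth_percent:.0f}% per year")
```

The simple average lands at about 8x per year, which is the "800%" reading; the compound rate works out to a little under 2.4x per year.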
By way of analogy, if you could print 161 exabytes, you would end up with 12 stacks of paper extending from the Earth to the sun. That would come to about 6 tons of paper for every person on the planet. It would be enough paper to wrap the Earth four times over.
The authors reckon that the trend will continue, with the amount of data being created in 2010 growing to 988 exabytes. That represents another sixfold increase. This time the resulting stack of paper could go from the sun to Pluto and back.
What is causing this growth? The Internet is certainly at the center of it. Companies like YouTube didn’t exist a few years ago. Today YouTube hosts 100 million video streams a day. But corporate data is growing too. According to the CIO of Chevron, his company is accumulating data at the rate of about 2 terabytes a day. And Wal-Mart, another example cited by the authors of the IDC study, has a database of customer transactions that had grown to 500 terabytes by 2004.
Why is this important? Because a lot of that data could be discoverable. The Radicati Group estimates that the average corporate email user sends and receives a total of 133 messages, or about 16.4 megabytes of data, per day. In a different report, IDC estimated that the volume of business email sent annually worldwide now exceeds 3.5 exabytes, a figure that doubled in the last two years. I have also seen reports suggesting that daily email traffic is reaching 60 billion messages. Even if you cut any of these figures by half, you still have a lot of email out there that might need to be collected, reviewed and produced. This doesn’t come as a surprise to any corporate IT executive.
IDC suggested that 20% of the data being created is subject to compliance rules and standards such as Sarbanes-Oxley, HIPAA and various SEC and other government regulations. That means corporate legal departments have a substantial amount of data to worry about, whether for litigation holds or simply for implementing a records retention program. It also means that the market for electronic discovery providers has not yet peaked.
Litigation 2.0
Welcome to Litigation 2.0, a term Dennis Kennedy recently coined in his blog.
Litigation 2.0 is a necessary outgrowth of the Internet Age and the resulting explosion of electronic content. It requires new tools and methods to handle discovery and compliance activities. Gone are the days when the lead trial lawyer could review the file before trial and be familiar with every document in the case. Likewise, gone are the days when a couple of paralegals could man the war room and flag hot docs with yellow stickies. They don’t make enough stickies for this job even if you could field the team to stick them on the pages.
Instead, Litigation 2.0 requires database and full text search tools and teams of reviewers that may be spread across the country (or around the world). Concept search, visual analytics, and near de-duplication are all part of a new approach that accepts the fundamental tenet that nobody has the time, nor can they afford, to look at every document produced in a case. It is about litigation that may occur in multiple locations, with documents and data of multiple types and in every possible language.
We are just beginning to explore the contours of Litigation 2.0, and I don’t intend to take it any further in this column. But with 161 exabytes looming in the wings (or even a lot less than 161 exabytes), you know the journey will be interesting. As Yogi Berra once said: “The future ain’t what it used to be.”
The IDC study is presented by EMC
This helpful chart came from the Berkeley study:
| Table 1.1: How Big is an Exabyte? | |
|---|---|
| Kilobyte (KB) | 1,000 bytes OR 10^3 bytes |
| Megabyte (MB) | 1,000,000 bytes OR 10^6 bytes |
| Gigabyte (GB) | 1,000,000,000 bytes OR 10^9 bytes |
| Terabyte (TB) | 1,000,000,000,000 bytes OR 10^12 bytes |
| Petabyte (PB) | 1,000,000,000,000,000 bytes OR 10^15 bytes |
| Exabyte (EB) | 1,000,000,000,000,000,000 bytes OR 10^18 bytes |
Original Source: Many of these examples were taken from Roy Williams’ “Data Powers of Ten” web page at Caltech.
About the Author
John C Tredennick Jr
Editor in Chief
John Tredennick spent more than 20 years as a nationally recognized trial lawyer and litigation partner with Holland & Hart in Denver, Colorado. One of the early pioneers in litigation technology, John published the ABA bestselling books Winning with Computers, Volumes 1 and 2, in 1990 and 1991. Since then he has authored two other books on litigation technology along with scores of articles and columns for the leading legal publications. He also regularly speaks at legal technology conferences around the world.
In 2000, John founded Catalyst Repository Systems (formerly CaseShare Systems). Catalyst provides secure, online repository systems to help professional teams manage large volumes of electronic documents and work together on complex legal, financial and insurance matters. A pioneer in the industry, Catalyst is used by many of the largest corporations and law firms in the world.