Discovering electronic evidence, particularly from
a large organization, can be more time-consuming and
expensive than a litigator might imagine. Unless a litigator
understands the many places where electronic evidence
may be found, and how information flows within an organization,
electronic discovery will likely be an unproductive,
but expensive, hit or miss affair. This article proposes
some thoughts about making discovery of electronic evidence
a systematic and increasingly efficient process. Given
that the information to be discovered is, by definition,
already electronically searchable and relatively easy
to organize before trial, systematic electronic
discovery has a high payoff.
Although the nature and flow of information are frequently
idiosyncratic, varying greatly from organization to
organization, it's safe to generalize that the flow
of information between different components of any organization
is its life blood. When confronted with major litigation
that demands sophisticated discovery, whether electronic
discovery or traditional paper document discovery, identifying
and modeling the manner in which information moves within
your target organization is a key aspect of knowing
what documents and information to look for, who the
major actors are, where to look for that evidence, and
how to secure it. Even if, by some quirk, your own client
has unlimited finances and the ability to sift through
every single record in any organization, there are some
obvious drawbacks. Firstly, this would take years, and
your case might never get to trial or you would run up
against discovery deadlines. Secondly, it would be so
expensive as to make the cost disproportionate to almost
any litigation advantage. Thirdly, of course, a court
would realistically enter a protective order barring
that sort of discovery. Finally, even if you could surmount
these problems, it's most unlikely that you would be
able to sort through the masses of essentially irrelevant
information to effectively find and use the few gems
that will make or break your case before a jury. Intelligence
agencies have a similar problem - sorting through mountains
of contradictory, secondary and marginal data in order
to find the few gems that show what's really happening.
It's fair to say that gaining an early, effective,
and systematic approach to your electronic discovery
efforts can make or break your case. Achieving that
focus requires that you understand how information flows
within your client's organization and your opponent's
organization and understand some basic theoretical concepts
about information.
Understanding how information flows within a particular
organization is useful because it helps us identify:
- The types of data that are collected and the form
and method by which such data is collected.
- The types of formal computer data structures and
paper records that might be retained.
- Where we are most likely to find pertinent records
and data.
- The identity of the principal actors, not all of
whom may be obvious from an organizational chart.
- The types of informal records we might expect to
find that are maintained idiosyncratically by involved
individuals.
- The most likely areas of concentration, which I'll
call "nodes," where particularly rich concentrations
of useful documents might be found, which in turn
may help us further focus our efforts in a more productive,
cost-effective manner that in turn sharpens our litigation
strategy.
- Where potential breakdowns in communication are
most likely to occur either in our client's organization
or in the target organization, which assists us in
understanding what may have gone wrong, or conversely
what did not go wrong - a crucial part of any plaintiff's
or defendant's case.
- Which participants seem to have the greatest affinity
for particular types of records and who is talking
to whom about the issue at hand, helping us to home
in on particularly rich lodes of electronic discovery.
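As a rough illustration of the last two points, the sketch below counts who exchanges messages with whom in order to surface likely "nodes." The sample data, names, and function names are hypothetical and are not drawn from any particular discovery product; it assumes you have already extracted sender and recipient pairs from the produced email.

```python
from collections import Counter

# Hypothetical (sender, recipient) pairs extracted from produced emails.
messages = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("dave", "alice"),
    ("bob", "alice"),
]

def rank_nodes(pairs):
    """Count how many messages each person sent or received."""
    traffic = Counter()
    for sender, recipient in pairs:
        traffic[sender] += 1
        traffic[recipient] += 1
    return traffic.most_common()

def rank_links(pairs):
    """Count messages per unordered pair of correspondents."""
    links = Counter(tuple(sorted(pair)) for pair in pairs)
    return links.most_common()

busiest_people = rank_nodes(messages)
busiest_links = rank_links(messages)
```

In this toy data, the person with the most traffic and the most active pair of correspondents are the natural first targets for more focused requests. A real matter would also weigh document type, date ranges, and subject matter.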
Organizing and Using the Fruit of Electronic Evidence
Discovery
In order to track, analyze, and use this sort of data,
a litigation database of some sort is almost mandatory.
For cases in which you might expect to find tens or
hundreds of thousands of documents, Summation is usually
the program of choice. For small to medium cases, though,
I personally prefer CaseMap 4, along with its associated
time line program TimeMap. These programs are particularly
flexible means of organizing discovery, understanding
significant time lines, and using the resulting data
at trial. You can download 30 day evaluation copies
from www.casesoft.com.
After 30 days, the full-featured evaluation version
won't work unless you purchase a permanent activation
code from CaseSoft. CaseSoft charges $495 for one user,
with discounts for multiple-user purchases, a cost that
I consider worthwhile given the power and value that
this program brings to small to mid-sized litigation.
If you have scanned and imaged your electronic and paper
discovery using a standard program like Adobe Acrobat,
then you'll be able to associate a PDF file of each
discovered document directly with its CaseMap or TimeMap
entry and call up the imaged document with a single
click within CaseMap - a fast and neat way to work with
the discovered documents.
You'll need the full Acrobat program, not the limited
feature data reader available over the Internet. Plan
on spending about $270 for Acrobat.
If you're dealing with many possible actors and large
quantities of information, then an industrial strength
management system such as IBM's Lotus Discovery Server
2.0 will greatly reduce the manual effort otherwise
needed to identify and model information flow within
a large entity. Programs like Discovery Server 2.0 automatically
sort through the discovery target's electronic systems
for relevant information, identify potentially rich
"nodes," which may be particular authors,
recipients, departments, or document types, and then
map the relationships between the potentially most profitable
targets for more focused discovery. The resources to
do this sort of highly automated discovery will be expensive
and will require counsel to formulate reasonable ground
rules, probably incorporated into a discovery order
using a neutral third party to conduct such discovery
and protect privileged documents. If you're not using
an integrated knowledge management program like Discovery
Server, then you'll also need an advanced indexed search
program with highly specific Boolean search functions,
such as Concordance or dtSearch.
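For readers unfamiliar with how an indexed Boolean search works, here is a minimal sketch. It is not how any commercial product is implemented. The documents, identifiers, and function names are hypothetical, and the index is a plain in-memory dictionary.

```python
import re
from collections import defaultdict

# Hypothetical documents standing in for produced discovery.
documents = {
    "DOC-001": "The brake defect was reported to engineering in March.",
    "DOC-002": "Marketing approved the brochure for the spring launch.",
    "DOC-003": "Engineering proposed a recall to correct the brake defect.",
}

def build_index(docs):
    """Map each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

index = build_index(documents)

def docs_with(term):
    """Return the set of documents containing a single term."""
    return index.get(term.lower(), set())

# Boolean operators map directly onto set operations:
# AND is intersection, OR is union, NOT is difference.
brake_and_defect = docs_with("brake") & docs_with("defect")
recall_or_launch = docs_with("recall") | docs_with("launch")
defect_not_recall = docs_with("defect") - docs_with("recall")
```

Because the index is built once and then consulted for every query, each additional search is cheap, which is why indexed tools scale to large document sets far better than rereading every file.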
Data mining software, often but not always used on
mainframe computers, is also potentially useful in making
sense out of large masses of otherwise undigested data.
Some well-established data mining software, such as
SPSS (now in version 11.5), works statistically with
numerical data. Other knowledge management programs,
such as Lotus Discovery Server, discover and group related
textual documents, explicitly mapping the links between
specific authors and various clusters of potentially
interesting documents. The ability of Discovery Server
to map the relationships between individuals and clusters
of pertinent documents potentially makes it a very powerful
electronic discovery tool in highly complex corporate
and organizational situations, but setting up and using
this tool requires experience and technical savvy that's
beyond most attorneys.
Concept searching may be considered another form of
textual data mining. Some very basic concept searching
is employed with Internet meta-search engines. You'll
be able to find a somewhat more complex and effective
example of concept searching at the National Criminal
Justice Reference Service (NCJRS) web site.
Using NCJRS's mainframe-based Internet search engine,
I did a quick concept search looking for studies that
measured the accuracy of psychological evaluations in
predicting future violence, a fairly complex concept,
and was very impressed by the precision of the weighted
search results, which were exactly on point. Copernic
Enterprise Server (www.copernic.com) has similar capabilities:
it's designed to work with
all common file formats and across an entire company.
Copernic Enterprise Server has real potential as a low
cost and relatively simple electronic evidence concept
search tool.
Why Study Your Opponent's Information Patterns
The purpose of understanding how information flows
within an organization is to make your discovery a deft
and swift scalpel rather than a blunt instrument. Because
data and the manner in which information flows may vary
widely from organization to organization, you will need
to have a good understanding of the organization's informational
content, the people who create it, the types of documents
that are used, the numerous categories and subcategories
that define the information kept by an organization,
how and where the information flows, and where the organization's
information is stored, indexed, backed up, or otherwise
maintained. It's also important to be able to quickly
spot the most significant and important information
(and significant gaps in that information), rather than
be bogged down by reviewing, and possibly being distracted
by, numerous documents and records that have only a
tangential relationship to the issue at hand. In addition,
you'll also need to understand the relationship between
people who create the information, the documents and
information they create, and the end users of those
documents and information. You'll need to understand
who is an internal expert, or at least highly knowledgeable,
about topics critical to your search, and this may not
always be obvious from organizational tables. You'll
want to identify who is most often handling certain
types of information. Finally, you'll also need to understand
when and how data errors and distortions occur within
a particular organization's internal data flow, because
not all communication within an organization is clear,
concise, and accurate.
Getting Some Litigation Guidance From Information
Theory
Information theory is a theoretical mathematical description
that models how electronic communication works. Although
primarily applicable to assessing electronic communication
systems, information theory and related cybernetics
theories include several theoretical concepts that provide
useful analogies to the discovery process, such as the
"noise" inherent to any communication process
and "feedback loops." Information theory concepts
are now used much more broadly - indeed, even the National
Institutes of Health funds entire laboratories devoted
to applying information theory to the biology of living
organisms.
As analogies, information theory and cybernetic concepts
have several important lessons for litigators struggling
to undertake very large electronic discovery efforts
in a systematic and productive manner. Luckily, because
electronic evidence is typically already in a searchable
format that lets us zero in on, and get closer to, the
original data, information theory concepts are particularly
applicable to electronic data discovery and have the
potential to greatly sharpen our discovery efforts.
Here are some crude, but real world, examples of how
our thinking can be focused and our discovery efforts
sharpened. These concepts apply whether you are asserting
discovery demands or attempting to comply with reasonable
discovery and disclosure while protecting your client
from over-reaching. When we talk of "information"
within an organization, we refer to any records and
human communication, whether or not that communication
contains factually accurate data. Indeed, there's a
lot of inaccurate information floating around.
Because the author typically represents plaintiffs against
larger entities and corporations, these concepts are
phrased from the point of view of the party seeking
discovery.
- As information is repeated within an organization,
its meaning often becomes more and more diffuse until
much of the original information becomes highly uncertain
and very possibly lost. This is not some esoteric
scientific concept - it is simply the underlying basis
for the hearsay rule, stated more specifically and
theoretically.
- It is not possible to readily reverse the noise
process and to filter out any "noise" in
the communications process. That means that we cannot
work backwards and arrive at a completely accurate
and certain knowledge of the original information.
That's due to the human imprecision and uncertainty
typically introduced into information as it is repeated
throughout an organization. Thus, the most accurate
course by far is to obtain documents that have been
written directly by the person whose actions are being
questioned or who recorded the original data.
- As information is repeated, it tends to become distorted,
and it becomes more difficult to ascertain the basis
upon which an organization did or did not act. The
information that ultimately motivates and controls an
act or omission becomes increasingly uncertain as we
rely upon diffused data and secondary sources. Thus,
again, documents are by far the most accurate and powerful
when they have been written personally by the actor
whose acts are being questioned. Bill Gates's emails
come to mind.
- Even when the information is transmitted in a relatively
certain and unambiguous manner, the recipient may
still misunderstand its content because of his or
her own experiences, biases and idiosyncratic use
of language. Thus, extra weight should be placed upon
clear declarations of intent and knowledge when made
by directly involved actors, particularly when the
document's recipients then act in conformance with
clear directives and statements. An estoppel situation
may have occurred.
- As "noise" and blurring increase, the
resulting imprecision tends to drown out the true
message, the "signal." In electronic communications,
such as long distance radio or telephone relays, communications
engineers use a concept of a signal to noise ratio.
When the signal is high and the noise is low, then
there is a great deal of certainty about the true
data state. When noise predominates, then the signal
to noise ratio drops and there is much more uncertainty
about what is in fact being said and done. Thus, when
only one or two persons are speaking for an organization,
particularly when such people have been conferring,
much higher reliability may be placed upon their statements
as reflecting the true state of organizational intent
and action. We have a high signal to noise ratio.
On the other hand, when there are many different actors
and persons influencing the process, and they are
all producing documents that may be pertinent to a
questioned transaction, or when they are issuing contradictory
statements about what has been happening and why,
then the organization's signal to noise ratio becomes
very low. As a result, it's harder to understand what's
important, what's not, what really happened, and why.
- As uncertainty increases, our ability to discern
what is true and important in proving what actually
occurred (the crux of any litigation) decreases. Thus,
we should look, for example, for a series of documents
written by a single major actor which either state a
consistent theme throughout or initially state one
consistent theme or position and then sharply deviate
to a new direction, corporate position, and theme.
In the latter case, the actual reason for the
sudden change may be very interesting and probative.
- There will always be some uncertainty about finding
specific information, although the amount of uncertainty
can be considerably narrowed, indeed calculated with
a fair degree of precision, as we get more experience
searching through large amounts of data. Standard
information theory equations can help you decide when
further discovery efforts or efforts to find data
and comply with discovery requests will likely be
unproductive, or, conversely, when further discovery
will probably be money well-spent. Particularly when
defending against accusations that discovery compliance
is inadequate, uncertainty calculations showing the
low probability of finding any more discoverable data,
even with massive and costly efforts, may be very
useful in establishing a well-founded basis for a
protective order or resisting further discovery attempts.
- Look for feedback loops. Communications and documents
do not exist in a vacuum. They're usually made for
a specific purpose and will typically elicit one or
more rounds of responses and comments by recipients
and other interested persons. The concept of feedback
has been corrupted in the popular mind to something
akin to interpersonal communication, but it's much
more. The American Heritage Dictionary defines
a feedback loop as "The section of a control
system that allows for feedback and self-correction
and that adjusts its operation according to differences
between the actual output and the desired output."
Although that definition may seem most akin to the
thermostats that keep our homes at a constant temperature
or an aircraft's autopilot, the concept of feedback
loops has surprisingly strong analogies in everyday
communication. Look, for example, at email message
threads and replies - these are verbal feedback loops
where the initial authors and recipients regularly
exchange roles, clarify concepts and intents, and
expand or narrow a topic of discussion. Consider the
documents and counter-documents drafted and circulated
back and forth within a business that's trying to
decide some issue crucial to your litigation - for
example, whether to correct a known product defect.
For that matter, look at the sequence of summary judgment
motion practice: Motion, opposition, reply, oral argument,
decision, appeal. Built into these procedures is an
implicit feedback mechanism designed to ensure that
faulty evidence, arguments, and decisions are ultimately
corrected. Particularly in larger corporations and
other entities, finding these feedback loops identifies
the important actors and their relationship to each
other and to the issue at hand, helping you focus
your discovery efforts. The concept of feedback is
useful to litigators in other, more direct, ways as
well. For example, if you're trying to prove deliberate
intent, what stronger evidence than plotting out a
time line showing the incriminating documents and
responses forming a feedback loop? Again, certain
knowledge management programs like Lotus Discovery
Server are specifically tuned to map out these relationships.
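The feedback loops described in the last point above can be approximated from email metadata alone. The sketch below is only an illustration under stated assumptions: the field names (`id`, `reply_to`, `sender`, `date`) and the sample messages are hypothetical, dates are ISO-formatted strings, and reply chains contain no cycles. It groups messages into threads and keeps only those threads in which more than one person took part.

```python
from collections import defaultdict

# Hypothetical email metadata; the field names are assumptions for this sketch.
emails = [
    {"id": "m1", "reply_to": None, "sender": "engineer", "date": "2002-03-01"},
    {"id": "m2", "reply_to": "m1", "sender": "manager", "date": "2002-03-02"},
    {"id": "m3", "reply_to": "m2", "sender": "engineer", "date": "2002-03-03"},
    {"id": "m4", "reply_to": None, "sender": "counsel", "date": "2002-03-04"},
]

def find_root(message_id, parents):
    """Follow reply links back to the message that started the thread."""
    while parents.get(message_id) is not None:
        message_id = parents[message_id]
    return message_id

def build_threads(messages):
    """Group messages by thread root and order each thread by date."""
    parents = {m["id"]: m["reply_to"] for m in messages}
    threads = defaultdict(list)
    for m in messages:
        threads[find_root(m["id"], parents)].append(m)
    for thread in threads.values():
        thread.sort(key=lambda m: m["date"])
    return dict(threads)

def feedback_loops(threads):
    """Keep threads in which at least two people exchanged messages."""
    return {
        root: [m["sender"] for m in thread]
        for root, thread in threads.items()
        if len({m["sender"] for m in thread}) > 1
    }

loops = feedback_loops(build_threads(emails))
```

The ordered list of senders in each surviving thread is the raw material for a time line showing who responded to whom, and in what sequence.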
The Problems Inherent In Language Usage
Communication problems are not unique to the litigation
process: they arise in all human endeavor and all human
interaction. Even in physics, the hardest of the "hard
sciences," discerning which scientific experiments
should be relied upon and which contradictory data should
be discarded has always been a fundamental challenge.
One of the best approaches to reducing the noise inherent
to the litigation discovery process is the use of a
software program which acts as a filter to help us sharply
focus upon what we have in our case while reducing noise,
that is, redundant or distracting documents and information.
Filtering out discovery "noise" was hard work
for anyone doing traditional manual discovery. If you
had a few hundred thousand documents to review, the
process might take years and it is still highly likely
that you would either overlook the most important documents
or perhaps miss their significance and relationship
to other discovery. Modern electronic discovery tools,
using indexed search programs with highly specific Boolean
search functions and thesaurus-based searching, tremendously
speed our search process while assuring a much more
comprehensive search. Similarly, advanced knowledge
management programs, such as Lotus Discovery Server
2.0, discern the central themes to any organization's
information, organize it into appropriate subcategories
by content, ascertain which people have particular affinities
for what sorts of information and documents, determine
which documents are the most important or frequently
used within an organization, and generally relate people
and discoverable documents. Although complex to set
up, an automated knowledge management tool like Lotus
Discovery Server can be the filter that brings your
discovery into razor-sharp focus.
Of course, using a crude electronic filter to sort
through electronic discovery has its own drawbacks.
Sometimes, such a filter is too selective, reducing
the serendipitous but crucial discoveries. And, it's
highly probable that searching for specific words or
phrases will miss some very important evidence because
human beings are not entirely predictable in their use
of language and idioms. Our written and spoken words
are imprecise from a computer's almost inhumanly precise
point of view. For example, while you or I would understand
from its context an ungrammatical conversation or document,
or one filled with idioms, pronouns and synonyms, searching
such materials electronically would probably miss many
important documents and concepts. Indexed search engines
can partially overcome this problem by searching with
common synonyms as well as the original search term.
For example, if you were deposing an expert witness in
an airplane crash case, the witness might refer to the "aircraft,"
a vocabulary term that might not be found by a relatively
simple search engine looking for "airplane"
or "plane." Yet, the witness's phrasing is
entirely understandable. A good synonym-based search
program would build a thesaurus of synonymous terms
such as "airplane," "plane," "aircraft,"
"Boeing," "727", or "B727"
or "airliner", all of which would realistically
relate to the same concept, a Boeing 727 aircraft that
crashed.
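A minimal sketch of that kind of thesaurus-based search follows. The thesaurus entries mirror the synonym list above; the transcript excerpts, identifiers, and function names are hypothetical, and matching is on whole words only.

```python
import re

# Hypothetical thesaurus; a real one would be built from the case vocabulary.
THESAURUS = {
    "airplane": {"airplane", "plane", "aircraft", "airliner", "boeing", "727", "b727"},
}

# Hypothetical deposition excerpts.
transcripts = {
    "DEPO-1": "The aircraft lost hydraulic pressure shortly after takeoff.",
    "DEPO-2": "I inspected the B727 the night before.",
    "DEPO-3": "The weather report was delivered at noon.",
}

def tokenize(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(term, docs, thesaurus):
    """Return IDs of documents containing the term or any listed synonym."""
    wanted = thesaurus.get(term.lower(), {term.lower()})
    return sorted(doc_id for doc_id, text in docs.items() if tokenize(text) & wanted)

literal_hits = search("airplane", transcripts, {})
expanded_hits = search("airplane", transcripts, THESAURUS)
```

With an empty thesaurus, the literal search for "airplane" finds nothing in these excerpts, while the expanded search picks up both the "aircraft" and "B727" testimony.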
As a result, although a good search program can partially
correct vocabulary variations and grammatical imprecision,
there will be inevitable noise that causes at least
some imprecision in the original document or transcript
and also in any later searches. Further, you, the searcher,
are also human and have your own implicit search terms
and concepts in mind, which arise at least in part from
the culture in which you were raised and educated and
which may or may not match the words used by the original
author or witness or by anyone who has previously prepared
the litigation database or indexed any documents in
it. One solution to this potential mismatch is to first
review the raw discovery product using a concordance
program and then, before actually indexing the discovery
documents, work out a very carefully controlled indexing
and litigation database vocabulary, carefully training
all indexers and users.
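To show what a concordance pass over raw discovery might produce, here is a small sketch. It is not a description of any commercial concordance product. The sample text, stopword list, and function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical raw discovery text.
raw_documents = [
    "The aircraft was grounded after the inspection.",
    "Our plane was inspected and the plane was cleared.",
    "Inspection of the airliner found no fault.",
]

# Hypothetical list of common words to ignore when counting.
STOPWORDS = {"the", "was", "and", "of", "our", "no", "after"}

def word_frequencies(docs):
    """Count every non-stopword across all documents."""
    counts = Counter()
    for text in docs:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

def keyword_in_context(docs, keyword, width=2):
    """List each occurrence of keyword with `width` words on either side."""
    hits = []
    for text in docs:
        words = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(words):
            if word == keyword:
                hits.append(" ".join(words[max(0, i - width): i + width + 1]))
    return hits

frequencies = word_frequencies(raw_documents)
plane_contexts = keyword_in_context(raw_documents, "plane")
```

Reading the frequency list side by side with the contexts shows which variant terms the authors actually used ("aircraft," "plane," "airliner"), which is the starting point for a controlled indexing vocabulary.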
Even then, the electronic litigator must be prepared
to cope with human imprecision and linguistic variations.
Some years ago, I did an experiment where I used a number
of different legal research tools to look for leading
Alaska Supreme Court decisions relating to slip and
fall accidents. I already knew two leading cases but
wanted to see how readily legal research tools would
help a novice find the two leading cases. Searching
through a legal research database seemed to be the tightest
possible test - generally, West's attorneys, practicing
lawyers and Alaska Supreme Court Justices seem much
more likely to use and reuse the same learned vocabulary
and concepts to describe similar situations, at least
compared to how an average corporate officer might phrase
documents.
One would think that a generally precise and consistent
database of this sort would produce nearly identical
search results no matter how the electronic search is
conducted or phrased and yet, despite my own expectations,
each different search method (full text, key number,
searching for specific words or phrases, natural language
searching, and Boolean searching) produced very different
results. Each method either missed the leading case or
returned so many hundreds of cases as to lose the desired
case in the background noise. Had I been a litigator
who did not already know that leading case, I would
not have found it and my briefing would have been far
more precarious.
Thus, it would seem that even relatively seasoned lawyers
and State Supreme Court Justices do not always use identical
words and phrasings and therein lie lessons for the
electronic litigator. Noise is an inevitable concomitant
of human communication. As a result, a highly automated,
narrowly focused brute force electronic search will
likely not find all critical documents and data - the
breadth or narrowness of your electronic search will
inevitably balance the convenience and speed of a tightly
focused search against the increasingly greater manual
effort inherent to increasing the probability of finding
every critical document. Even when a search is broadened
greatly, a litigator cannot be assured of finding everything
- he or she can only be assured of finding more data
to review and consider. Thesaurus-based weighted searches
using concept searching and lists of search term synonyms
seem to be the most effective single search method but
multiple searches using varying search phrasing and
differing search methods increase your chance of finding
the smoking gun or leading case.
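The trade-off between narrow and broad searches can be put in numbers using two standard information retrieval measures, precision and recall. These terms come from that field, not from the discussion above. The sketch assumes a hypothetical sample in which a reviewer has already marked which documents are relevant; all document IDs and result sets are invented for illustration.

```python
# Hypothetical review set: IDs of documents a reviewer has judged relevant.
relevant = {"D1", "D2", "D3", "D4"}

# Hypothetical result sets from three different searches.
narrow_search = {"D1", "D2"}
synonym_search = {"D2", "D3", "D7"}
broad_search = {"D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"}

def precision(results, relevant_docs):
    """Share of retrieved documents that are relevant."""
    return len(results & relevant_docs) / len(results) if results else 0.0

def recall(results, relevant_docs):
    """Share of relevant documents that were retrieved."""
    return len(results & relevant_docs) / len(relevant_docs) if relevant_docs else 0.0

# Running several differently phrased searches and pooling the results.
combined = narrow_search | synonym_search
```

In this toy data, the narrow search returns only relevant documents but misses half of them. The broad search finds everything but buries it among as many irrelevant documents. Pooling two differently phrased searches lands in between, which is the practical case for running multiple searches.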
Most estimates of business recordkeeping suggest that
more than 90% of all primary data now resides in computer
systems rather than on paper. Because technology now
makes creating and replicating that data very quick
and easy, the amount of potentially discoverable data
has increased dramatically, probably exponentially,
over the past twenty years or so. As a result, our discovery
burdens are potentially much greater, whether we are
asserting discovery or seeking to comply with reasonable
demands.
Effective technology, used in a systematic and well
thought out manner, is the only way to deal effectively
with today's information overload and the need to find
the critical items that make or break our case.
Joseph Kashi is an attorney
and litigator living in Soldotna, Alaska, who is active
in the Law Practice Management Section and a technology
editor for Law Practice Today. He has written regularly
on legal technology for the Law Practice Management
Section, Law Office Computing Magazine and other
publications since 1990. He received his B.S. and M.S.
degrees from MIT in 1973 and his J.D. from Georgetown
University in 1976, and is admitted to practice in Alaska
and Pennsylvania and before the Ninth Circuit and the
U.S. Supreme Court.