Failing to Understand Your Data Before You Begin Review Is Perhaps the Costliest “Gotcha” of All


As we discussed last time, not understanding what “done” looks like can be the biggest “gotcha” associated with document review. Without defining “done,” you may never get there. But there is another “gotcha” that can sometimes be even costlier: not understanding your data before you begin document review. With business data doubling every 1.2 years, understanding the data within the collection (especially the portion that doesn’t need to be reviewed) is more important than ever for keeping review costs manageable.

While understanding your data in the era of “Big Data” may seem like a daunting task, the good news is that the right combination of expertise and technology today can help you understand your data as early as possible in the investigation to minimize the number of documents that actually need to be reviewed. The key word in that sentence is “early.”

The Importance of Early Case Assessment

Early case assessment (ECA) is about estimating the risk (i.e., the cost in time and money) of prosecuting or defending a legal case, and a big part of that risk assessment today has to do with the electronically stored information (ESI) associated with the case. ECA has evolved to not only evaluate that risk but also reduce it, by helping the review team understand the collection and make key decisions about which documents are: 1) likely to be important to the case, 2) potentially important, and 3) clearly not important.

The ability to stratify your collection into these three categories early enables the review team to focus the review on the documents most likely to be responsive, while eliminating other potentially large groups of documents from review altogether. That saves on review costs and makes review deadlines easier to meet.

Three Components of ECA Analysis

Just as there are three categories of documents to be classified, there are at least three components of ECA analysis to categorize those documents. They are: 1) Culling Unwanted Documents, 2) Identifying Key People and 3) Identifying Important Topics.

Culling Unwanted Documents

During ECA, many documents can be removed from the potential review population without any review at all, based solely on the metadata associated with them. Here are examples of metadata fields that can be used to cull unwanted documents:

Date Range: Documents outside the relevant date range should obviously be culled from review.
De-Duplication: Generating a hash value (a digital fingerprint) for each file and then automatically excluding additional documents with the same hash value is one of the best methods to cull a considerable number of unwanted documents.
Sender Domain: Domain categorization to identify emails from non-responsive domains is another way to cull unwanted documents quickly and effectively.
File Type: Depending on the issues of the case, certain file types can also be excluded. For example, technical file types like AutoCAD files could be safely culled if there are no computer drawings at issue.
Key Terms: The presence or absence of key search terms could be the final method of culling unwanted documents, but it’s also the least predictable for a variety of reasons, so it’s important to test both what the terms retrieve and what they leave behind to ensure that the terms are appropriately scoped.
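To make the metadata-based culling described above more concrete, here is a minimal Python sketch. It is an illustration only, not how any particular ECA tool works: the `Document` record, the `cull` function, the sample addresses, and the choice of SHA-256 as the hash are all assumptions made for the example. It applies the date range, sender domain, and file type filters, then de-duplicates whatever survives by hash value.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Hypothetical minimal record; real ECA platforms track far more metadata."""
    doc_id: str
    sent: date
    sender: str
    filename: str
    content: bytes


def cull(docs, start, end, excluded_domains, excluded_extensions):
    """Return (kept, culled), where culled maps doc_id to the reason it was removed."""
    kept, culled, seen_hashes = [], {}, set()
    for doc in docs:
        domain = doc.sender.rsplit("@", 1)[-1].lower()
        extension = doc.filename.rsplit(".", 1)[-1].lower() if "." in doc.filename else ""
        # SHA-256 is an assumption here; the article only says "hash value".
        digest = hashlib.sha256(doc.content).hexdigest()
        if not (start <= doc.sent <= end):
            culled[doc.doc_id] = "outside date range"
        elif domain in excluded_domains:
            culled[doc.doc_id] = "non-responsive sender domain"
        elif extension in excluded_extensions:
            culled[doc.doc_id] = "excluded file type"
        elif digest in seen_hashes:
            culled[doc.doc_id] = "duplicate"
        else:
            # First surviving copy of this content: keep it and remember its hash.
            seen_hashes.add(digest)
            kept.append(doc)
    return kept, culled


# Made-up sample collection for illustration.
docs = [
    Document("A1", date(2015, 3, 2), "ceo@example.com", "memo.docx", b"Q1 plan"),
    Document("A2", date(2015, 3, 2), "cfo@example.com", "memo.docx", b"Q1 plan"),
    Document("A3", date(2009, 1, 5), "ceo@example.com", "old.docx", b"ancient"),
    Document("A4", date(2015, 4, 1), "deals@newsletter.example", "promo.html", b"sale"),
    Document("A5", date(2015, 5, 9), "eng@example.com", "plan.dwg", b"drawing"),
]
kept, culled = cull(
    docs,
    start=date(2014, 1, 1),
    end=date(2016, 12, 31),
    excluded_domains={"newsletter.example"},
    excluded_extensions={"dwg"},
)
for doc_id, reason in culled.items():
    print(doc_id, "culled:", reason)
```

Recording a reason for every culled document, rather than silently dropping it, is a deliberate choice in this sketch: it keeps the culling decisions auditable if they are later questioned.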

Identifying Key People

While you may know some of the important people involved in the case, others may not be readily apparent until you look at the communication patterns of the custodians you already know about. A communications analysis widget within an ECA tool can reveal patterns that point to additional key custodians who might otherwise be missed. Drilling into the communications between specific people also lets you analyze them quickly and classify whole groups of documents at once.
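The idea behind a communications analysis can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method any specific ECA widget uses: the function name `frequent_contacts`, the `(sender, recipients)` message format, and the `min_messages` threshold are all hypothetical. It counts how many messages involving a known custodian also involve each outside address, then surfaces the most frequent contacts as candidates for a closer look.

```python
from collections import Counter


def frequent_contacts(messages, known_custodians, min_messages=2):
    """List addresses outside the known custodians that appear on at least
    min_messages messages involving a known custodian, most frequent first."""
    known = {address.lower() for address in known_custodians}
    counts = Counter()
    for sender, recipients in messages:
        participants = {sender.lower(), *(r.lower() for r in recipients)}
        if participants & known:
            # Count each outside participant once per message.
            for address in participants - known:
                counts[address] += 1
    return [(address, n) for address, n in counts.most_common() if n >= min_messages]


# Made-up sample data: (sender, [recipients]) pairs.
messages = [
    ("alice@example.com", ["bob@example.com", "dana@example.com"]),
    ("dana@example.com", ["alice@example.com"]),
    ("dana@example.com", ["bob@example.com"]),
    ("erin@example.com", ["alice@example.com"]),
    ("frank@example.com", ["erin@example.com"]),
]
candidates = frequent_contacts(messages, {"alice@example.com", "bob@example.com"})
print(candidates)
```

In this sample, the address that keeps showing up alongside the two known custodians is the one flagged for follow-up, while one-off contacts and messages between unrelated parties fall below the threshold.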

Identifying Important Topics

Conceptual clustering is another way of identifying important topics that lead to documents likely to be important to the case. A cluster wheel within an ECA tool can quickly identify additional concepts that may also be important to your case (and which may have been missed when identifying key search terms). Your ability to drill into the cluster wheel enables you to quickly mark groups of documents that are clearly responsive or non-responsive, saving them from a linear review process.


ECA technology today provides terrific tools to quickly cull unwanted documents and prioritize what’s left, but without the expertise to use those tools effectively and to build repeatable templates for managing workflows, you can only go so far. Leveraging that expertise to create templates that streamline the ECA-to-review workflow is the key to being able to understand your data.

Early case assessment involves leveraging technology and expertise to minimize the risk and cost associated with document review by understanding your data before you start review – but only if you assess your data early! Data may double every 1.2 years, but budgets don’t!

For more information about Sandline’s Managed Review services, click here.