Traditional document review has often involved the manual review of boxes of paper documents in the lawyer’s office. The advent of information technology has resulted in an exponentially increased volume of electronic (as opposed to paper) documents. While information technology has brought a great many benefits, it has also compounded the burden of discovery by increasing the number of documents subject to review in a typical discovery process. Thus the traditional method of discovery and document review finds little applicability in the new age of information technology.
With the gradual proliferation of electronic documents, document review has also taken on an electronic format involving keyword search. Keyword search has been (and still is) the primary method used for searching electronic documents in e-discovery document review. Keyword search incorporates the use of connectors and wildcards to locate documents containing relevant search terms. For example, a keyword search using “auto*” will locate terms such as auto, automatic, automobile, automate, autonomy, automatism, etc. Thus the use of keyword search in litigation document review requires the lawyer to craft search terms that will assist in locating documents relevant to the litigation.
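As a rough illustration of the wildcard behaviour described above, the following Python sketch translates a search term such as “auto*” into a regular expression and applies it to a short word list. The function name wildcard_to_regex and the word list are assumptions made for this example only; commercial review platforms have their own search syntax and may treat wildcards differently.

```python
import re

def wildcard_to_regex(term):
    """Turn a search term with '*' wildcards into a whole-word, case-insensitive regex."""
    # Escape each literal piece, then let '*' stand for any run of word characters.
    pieces = [re.escape(piece) for piece in term.split("*")]
    return re.compile(r"\b" + r"\w*".join(pieces) + r"\b", re.IGNORECASE)

# Hypothetical word list used only for illustration.
words = ["auto", "automatic", "automobile", "autonomy", "car", "semiautomatic"]

pattern = wildcard_to_regex("auto*")
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

In this sketch the term matches every word that begins with “auto”, whatever its meaning, and nothing else.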
While keyword search is cost-effective and efficient where the set of documents to be reviewed is not large, there are problems associated with it. First is the problem of under- or over-inclusiveness, which may result in the search capturing very few relevant documents or a large set of irrelevant documents. Keyword search is suitable where the search is for specific word(s) in a document irrespective of the context in which the word is used. Thus the use of the keyword “auto*” for a search in the context of automobiles will inevitably capture words such as autonomy or automatism, which are not necessarily related to automobiles. This will result in over-inclusiveness. Also, the use of the keyword “car” may not capture “automobile”, thus resulting in under-inclusiveness.
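The two failure modes can be made concrete with a toy example. The sketch below runs the equivalents of the keywords “auto*” and “car” over four invented documents, each tagged by hand as relevant or not to an automobile dispute. The documents, the relevance tags and the helper name search are all hypothetical; the aim is only to show where each keyword goes wrong.

```python
import re

# Hypothetical mini-collection: (text, is the document really about automobiles?)
documents = [
    ("The automobile was serviced in March.", True),
    ("Her car failed the emissions test.", True),
    ("The board discussed regional autonomy.", False),
    ("The defence pleaded automatism.", False),
]

def search(pattern):
    """Return the indexes of documents whose text matches the regex pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {i for i, (text, _) in enumerate(documents) if regex.search(text)}

relevant = {i for i, (_, is_relevant) in enumerate(documents) if is_relevant}

auto_hits = search(r"\bauto\w*")  # equivalent of the keyword "auto*"
car_hits = search(r"\bcar\b")     # equivalent of the keyword "car"

over_included = auto_hits - relevant   # irrelevant documents captured by "auto*"
under_included = relevant - car_hits   # relevant documents missed by "car"

print("Captured but irrelevant:", sorted(over_included))
print("Relevant but missed:", sorted(under_included))
```

Here “auto*” pulls in the autonomy and automatism documents, while “car” misses the document that says “automobile”.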
Another problem with keyword search is that it is ineffective where the document set is very large. In this regard, it has been noted that:
Data volumes are quickly becoming such that even with the best keyword search terms and an army of reviewers, it could still take months or years to sift through all the data and there would still be no guarantee of satisfactory results. (Lemieux and Baron).
The problems with keyword search are now advancing the push for the application of predictive coding, an AI technology, in e-discovery document review. Predictive coding is based on concept searching and has been described as the “next generation of technology for electronic discovery.”
How predictive coding works
Perhaps the easiest way to explain predictive coding to a layperson is to use the idea of a sniffer dog. Training a dog to identify a substance by smell requires exposing the dog to the smell of the substance. Upon such exposure, the dog becomes capable of identifying the substance even among a large collection of dissimilar substances. Similarly, predictive coding entails ‘exposing’ the computer software to the ‘smell’ of documents it should ‘sniff out’ from a larger document set.
The first step in the use of predictive coding for document review is to develop a “seed set”. This is a set of documents judgmentally selected as a sample from the entire document set to be reviewed. A person very knowledgeable about the litigation (usually a senior lawyer) then reviews each of the documents in the seed set and codes them accordingly. The coded documents from the seed set are then fed into the predictive coding software to “train” the software. The software analyses the seed set for common concepts. From this analysis, it develops an internal formula for future prediction.
The software is then made to apply this formula in coding documents from the full document set. Samples from the computer-coded documents are then reviewed by the lawyers, corrected and fed back into the system. The “training” of the software continues with further coding and feeding of documents until the software “learns” sufficiently to achieve a desired or acceptable rate of accuracy. The software is then made to apply the formula to the entire document set, coding documents and classifying them accordingly.
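The train, review, correct and retrain cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any commercial predictive coding product works: a small naive Bayes word-count classifier stands in for the software, the documents and their relevance codes are invented, the lawyer's review is simulated by looking up the known code, and the whole toy set is reviewed each round rather than a sample. The names NaiveBayesCoder, TARGET_ACCURACY and MAX_ROUNDS are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayesCoder:
    """Tiny naive Bayes classifier standing in for the predictive coding software."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, coded_docs):
        # Each item is (text, relevant?) as coded by the reviewing lawyer.
        for text, relevant in coded_docs:
            self.doc_counts[relevant] += 1
            self.word_counts[relevant].update(tokenize(text))

    def predict(self, text):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (True, False):
            # Log prior and log word likelihoods, both with add-one smoothing.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return scores[True] > scores[False]

def accuracy(coder, docs):
    return sum(coder.predict(text) == label for text, label in docs) / len(docs)

# Hypothetical seed set coded by a senior lawyer (True = relevant).
seed_set = [
    ("brake defect reported in vehicle recall", True),
    ("lunch menu for the office party", False),
]
# Hypothetical documents to be coded; the label simulates the lawyer's review.
review_sample = [
    ("office party memo on brake failure", True),
    ("vehicle recall notice draft", True),
    ("office party parking arrangements", False),
    ("holiday lunch schedule", False),
    ("brake defect complaint", True),
]

TARGET_ACCURACY = 0.95  # assumed "acceptable rate of accuracy"
MAX_ROUNDS = 10         # safety limit so the loop always ends

coder = NaiveBayesCoder()
coder.train(seed_set)

rounds = 0
while accuracy(coder, review_sample) < TARGET_ACCURACY and rounds < MAX_ROUNDS:
    # The lawyer corrects the documents the software coded wrongly,
    # and the corrections are fed back in as further training.
    corrections = [(t, l) for t, l in review_sample if coder.predict(t) != l]
    coder.train(corrections)
    rounds += 1

final_accuracy = accuracy(coder, review_sample)
print(rounds, final_accuracy)
```

In this toy run the seed set alone miscodes the memo that mentions both an office party and a brake failure; once the lawyer's correction is fed back, the classifier codes all five documents as the lawyer would. Real document sets are vastly larger, and the accuracy target and sampling method would be matters for the parties and the court.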
Predictive coding is now being touted as a veritable tool for advancing electronic discovery reform. One advantage of predictive coding lies in its ability to “filter out large swathes of documents that are likely to be irrelevant so that the attorney does not have to waste limited cognitive resources analyzing them.” Hence, predictive coding is more likely to return consistent and accurate results than keyword search and linear review. It has also been argued that the use of predictive coding allows for significant cost reduction in the document review process, especially where the document set is extremely large, though there is scant empirical evidence to support this claim.
In the face of technology’s changing landscape, as well as the increasing number of electronic documents in the civil litigation discovery process, it is now important for the legal community to embrace this aspect of AI technology in law practice, especially in the area of e-discovery document review. While this new technology holds a great deal of promise for civil discovery reform, its adoption in civil litigation discovery has been so slow that sometimes it has to be figuratively forced down the throats of unwilling litigants via a court order. Factors responsible for the slow adoption of this AI technology in e-discovery include “lack of adequate technical understanding by lawyers, lack of transparency of the process, and concern about accuracy of results and … ‘the uncertainty of judicial acceptance.’”
NB: The next edition of this blog will examine the judicial approach in various jurisdictions to the use of predictive coding for document review by litigants in civil litigation.