Predictive coding in the disclosure process. The first round of document training might be called the seed set. The value of predictive coding in internal investigations: the task of collecting, identifying and properly using key documents is the foundation of a successful internal investigation. Built in close collaboration with e-discovery leaders in both corporations and law firms, Ringtail's predictive coding delivers a practical and polished workflow. Use small, statistically sound samples to get started quickly.
Predictive coding starts by training software with a seed set of data. Consilio first took a random sample of the entire corpus, about 1,400 documents, to create the control set for the predictive coding software. Reviewers code (label) each document in the seed set as responsive or unresponsive and input those results into the predictive coding software. Reviewers then test the results to verify accuracy. Federal court declines to require disclosure of seed set for predictive coding. Opinion highlights questions surrounding proper predictive coding protocols. Some courts and commentators have taken the view that counsel should identify the seed set. Seed set: training data, also known as training documents, tagging or tagging data.
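The random-sampling step described above can be sketched in Python. This is an illustration only: the function names `sample_size` and `draw_control_set` are hypothetical, and the 95% confidence level and 5% margin of error are assumed defaults, not figures taken from any vendor's workflow or from the case mentioned above.

```python
import math
import random


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's sample-size formula with a finite population correction.

    z=1.96 corresponds to roughly 95% confidence; p=0.5 is the most
    conservative assumption about the share of responsive documents.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def draw_control_set(doc_ids, size, seed=0):
    """Draw a simple random sample of document IDs without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced later
    return rng.sample(doc_ids, size)


if __name__ == "__main__":
    corpus = list(range(100_000))  # stand-in document IDs
    n = sample_size(len(corpus))
    control = draw_control_set(corpus, n)
    print(n, len(control))
```

Fixing the random seed is a design choice that makes the draw repeatable, which may help if the sampling method later has to be explained or defended.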
A seed set is a sample of documents pulled from the entire group of documents that needs to be reviewed. That subset of the collection is often referred to as the seed set of documents. The documents in the seed set may be selected based on random sampling or judgmental sampling. Attorneys most familiar with the substantive aspects of the litigation code each document in the seed set as responsive or nonresponsive, as appropriate. The AI or machine learning software analyzes the seed set and creates an algorithm for predicting the responsiveness of future documents. The plaintiffs asked the defendants to disclose the documents that were part of the seed set. Predictive coding: electronic discovery best practices.
Predictive coding software analyzes whole documents in a dataset, not just keywords, and uses advanced mathematics. Reviewers code each document as relevant or not relevant to the case and input this information into the predictive coding software. After the seed set is created, it will be used to train the software and the true predictive coding process will begin. Predictive formula generated: the software analyzes the seed set and creates an internal algorithm (formula) for predicting the responsiveness of future documents. Based on that, coding decisions are suggested for the remaining document universe, so you can immediately begin QC to refine the system's understanding. Sample and refine: users sample the results of the algorithm on additional documents and refine the algorithm by continually coding. Since there is limited research on seed set selection, an important component of predictive coding, the authors of this paper set out to identify strategies that consistently perform well.
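As a rough illustration of how a coded seed set can yield a formula for scoring uncoded documents, here is a minimal sketch using a multinomial naive Bayes classifier written with the Python standard library. The choice of naive Bayes is an assumption made for illustration; the source does not say which algorithm any predictive coding product uses, and the names `train` and `responsiveness_score` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(seed_set):
    """Build word counts per label from (text, is_responsive) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def responsiveness_score(model, text):
    """Estimate P(responsive | text) with Laplace-smoothed naive Bayes."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    log_post = {}
    for label in (True, False):
        log_p = math.log((model["doc_counts"][label] + 1) / (total_docs + 2))
        total_words = sum(model["word_counts"][label].values())
        for tok in tokenize(text):
            if tok in model["vocab"]:  # words never seen in training are ignored
                count = model["word_counts"][label][tok]
                log_p += math.log((count + 1) / (total_words + vocab_size))
        log_post[label] = log_p
    # Normalize in log space to avoid underflow on long documents.
    top = max(log_post.values())
    pos = math.exp(log_post[True] - top)
    neg = math.exp(log_post[False] - top)
    return pos / (pos + neg)


if __name__ == "__main__":
    seed = [
        ("merger pricing agreement with supplier", True),
        ("pricing terms for the merger", True),
        ("lunch menu for friday", False),
        ("office party on friday", False),
    ]
    model = train(seed)
    for doc in ("supplier pricing memo", "friday lunch plans"):
        print(doc, round(responsiveness_score(model, doc), 3))
```

In the sample-and-refine step, newly coded documents would be appended to the seed set and `train` called again, so the scores shift as reviewers continue coding.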
A beginner's guide to predictive coding. Visual predictive coding: Ringtail e-discovery software. Federal court approves the use of predictive coding. Foschio dismissed without prejudice the plaintiffs' motion to compel the defendant to meet and confer to establish an agreed protocol for implementing the use of predictive coding software. Seed set creation and cost: one of the critical issues in predictive coding is how the technology identifies the sample set that attorneys will use to train the technology for use with the remaining majority of the data population. The attorneys code seed sets on an iterative basis to further teach the predictive coding software.
Because predictive coding is still a nascent practice, it may be necessary to defend any use of the technology in litigation. A repeatable, defensible workflow means everyday use of predictive coding. RelativityOne leverages a seed set of human-coded documents for training. This seed set of documents would then be used to train the predictive coding software. That said, it is not a magic bullet on its own, according to one of the predictive coding users. Seed set, defined: the initial training set provided to the learning algorithm in an active learning process.
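The active learning process mentioned in the definition above can be sketched as a selection step: after each training round, the system picks the documents it is least sure about and sends them to reviewers. The sketch below uses uncertainty sampling, which is one common active learning strategy and is assumed here for illustration; the source does not say which strategy any given tool applies. The function name `select_for_review` and the example scores are hypothetical.

```python
def select_for_review(scores, batch_size):
    """Pick the documents whose predicted responsiveness is closest to 0.5.

    scores maps a document ID to the model's predicted probability that
    the document is responsive. Ties are broken by document ID so the
    result is deterministic.
    """
    ranked = sorted(scores, key=lambda doc_id: (abs(scores[doc_id] - 0.5), doc_id))
    return ranked[:batch_size]


if __name__ == "__main__":
    predicted = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45}
    print(select_for_review(predicted, 2))
```

Documents scored near 0 or 1 are ones the model already classifies with confidence, so human coding of the borderline documents tends to add more information per review decision.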
How predictive coding makes e-discovery more efficient. Predictive coding, a dimension of e-discovery, is a process whereby attorneys train computer programs to identify potentially relevant documents within a large body of documents. Predictive coding workflow: attorneys with substantive knowledge of the matter create a test set of documents, used to inform the predictive coding engine on the baseline of responsiveness. In this step, the predictive coding software analyzes all of the document categorizations made in step three for the initial run, the seed set. This seed set is then fed into the predictive coding software, which trains the software to determine which documents are relevant, while suggesting other documents that may also be relevant. The seed sets are then each applied to the relevant category, which begins the software training process. In re Biomet M2a Magnum Hip Implant Products Liability Litigation. Technology shifts threatening to upend the legal industry. Where the predictive coding cognoscenti and courts disagree is whether a lawyer's selection of seed set documents is protected from discovery by the attorney work product doctrine. The rationale for disclosing seed sets seems to be that the seed set is the input to the predictive coding system that determines which documents will be produced, so it is reasonable to ask for it to be disclosed so the requesting party can be assured that it will get what it wanted, similar to asking for a keyword search query to be disclosed. Empirical evaluations of seed set selection strategies for predictive coding.
Predictive coding, seed sets, and the work product doctrine. Essentially, you will define a model using the available ratings, codes, and document attributes in your project. Do you understand predictive coding software and how it helps legal teams?