E-Discovery Basics for IT Teams

This article explains the basics of e-discovery for IT teams. It is not professional compliance advice. Consult with professionals for guidance specific to your situation.


E-discovery is the process of identifying, collecting, reviewing, and producing electronically stored information in litigation. For IT, it's where compliance meets obligation — where the routine operations of email, file storage, and data management become evidence in legal disputes. If your firm hasn't been through significant litigation, e-discovery feels abstract. If you've been through it, you understand that e-discovery is one of the most expensive and operationally disruptive parts of modern litigation, and IT's role in making discovery feasible or infeasible is more important than most executives realize.

The scale of the problem comes from the sheer volume of electronically stored information modern organizations generate. A lawsuit over a product dispute might require reviewing millions of emails, documents, instant messages, and database records. The cost of reviewing all of that information for relevance and privilege can exceed the value of the dispute. This is why e-discovery has become one of the central issues in modern litigation: how do you handle the volume? How do you keep costs from being prohibitive? What technology can help? The answers to these questions determine not just the cost of defending the litigation, but whether settlement is even possible.

E-Discovery as a Formal Process

E-discovery follows a structured sequence that starts long before documents are produced to the other side. The sequence is preservation, identification, collection, processing, review, and production. Each phase has distinct IT requirements, and each phase is an opportunity for things to go wrong: information accidentally destroyed, the wrong information collected, critical data missed, privileged information produced.

Preservation is the foundation, discussed in detail in the article on legal hold. Without preservation, information disappears and discovery is incomplete. The moment litigation is anticipated, the duty to preserve begins, and IT must stop normal data destruction practices.

Once preservation is in place, the next phase is identification — figuring out what information exists that's potentially relevant to the dispute. This requires understanding the scope of the dispute, identifying the custodians whose information might be relevant, and identifying the systems and data sources that contain that information. The litigation team works with IT to map out where the relevant information lives. Employee email. File servers. Databases. Instant messaging systems. Phones. Cloud storage. Backup systems. Each of these is a potential source of discoverable information.

Data Collection and the Completeness Challenge

After identification comes collection — actually extracting the potentially relevant data from the systems that contain it. This is where IT work becomes tangible. Collection is never as simple as "give us all the emails." It's extracting email from specific mailboxes for a specific date range, exporting files from specific file servers, querying databases for records matching specific criteria, extracting deleted items from backup tapes.
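As a rough illustration of scoping a collection by custodian and date range, the sketch below filters an in-memory list of messages. The Message type, its fields, and the select_messages function are hypothetical names made up for this example; a real collection would run against the mail platform's own export tooling rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Message:
    custodian: str       # mailbox owner (hypothetical identifier)
    sent_at: datetime    # timezone-aware send time
    subject: str


def select_messages(messages, custodians, start, end):
    """Return messages from the named custodians sent in [start, end)."""
    wanted = {c.lower() for c in custodians}
    return [
        m for m in messages
        if m.custodian.lower() in wanted and start <= m.sent_at < end
    ]


if __name__ == "__main__":
    utc = timezone.utc
    inbox = [
        Message("asmith", datetime(2021, 3, 1, tzinfo=utc), "Spec change"),
        Message("bjones", datetime(2021, 3, 2, tzinfo=utc), "Lunch"),
        Message("asmith", datetime(2023, 1, 5, tzinfo=utc), "Out of range"),
    ]
    hits = select_messages(
        inbox, ["ASmith"],
        datetime(2021, 1, 1, tzinfo=utc), datetime(2022, 1, 1, tzinfo=utc),
    )
    for m in hits:
        print(m.custodian, m.sent_at.date(), m.subject)
```

The half-open date range (start included, end excluded) is a design choice in this sketch. Whatever convention is used, it should be written down, because the boundary dates are exactly the kind of detail that gets questioned later.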

The challenge in collection is completeness combined with defensibility. You need to collect all relevant information so the other side gets everything they're entitled to, but you need to be able to document what you collected, how you collected it, and prove that the collection process didn't alter or corrupt the information. This is where metadata preservation becomes critical. When you export information from email systems or file servers, you need to preserve all the metadata — dates, creators, modification history, access logs — that establishes authenticity and completeness.
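One common way to document that collected files were not altered is to record a cryptographic hash for each file at collection time. The sketch below is a minimal example of that idea, assuming the collected data has already been exported to a local folder. The function names and the manifest fields are illustrative, and a file hash covers only the exported file itself, not the source system's own metadata or access logs.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record path, size, modified time, and SHA-256 for every file under root."""
    root = Path(root)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        info = path.stat()
        entries.append({
            "relative_path": path.relative_to(root).as_posix(),
            "size_bytes": info.st_size,
            "modified_utc": datetime.fromtimestamp(
                info.st_mtime, tz=timezone.utc).isoformat(),
            "sha256": sha256_of(path),
        })
    return entries
```

Re-running build_manifest on a later copy and comparing the sha256 values is a simple way to show that the copy still matches what was collected.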

Many organizations use specialized e-discovery vendors for collection. These vendors have tools designed to extract information from email systems, databases, and file servers while preserving metadata and maintaining chain of custody. They understand the quirks of various platforms — how to properly export from Exchange, Outlook, Gmail, Slack, and dozens of other systems in ways that preserve all relevant data. They can handle the scale of modern data volumes and can document the collection process in ways that hold up to legal scrutiny.

The collection process creates the e-discovery dataset, typically measured in gigabytes or terabytes of extracted information that now needs to be reviewed. The size of this dataset is one of the main cost drivers in discovery. A matter in which ten custodians generated 500 gigabytes of potentially relevant information might require reviewing all 500 gigabytes. At typical review rates, that can cost hundreds of thousands of dollars in attorney time. Scale this to hundreds of custodians and terabytes of data, and review costs can reach millions.
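The arithmetic behind a review estimate can be sketched in a few lines. Only the 500-gigabyte figure comes from the example above; the documents-per-gigabyte count, review speed, and hourly rate are placeholder assumptions, not industry benchmarks, and should be replaced with numbers from your own data and your own review team.

```python
def estimate_review_cost(dataset_gb, docs_per_gb, docs_per_hour, hourly_rate):
    """Return (documents, review_hours, cost) for a document-by-document review."""
    documents = dataset_gb * docs_per_gb
    review_hours = documents / docs_per_hour
    return documents, review_hours, review_hours * hourly_rate


docs, hours, cost = estimate_review_cost(
    dataset_gb=500,      # from the example above
    docs_per_gb=1_000,   # placeholder assumption
    docs_per_hour=50,    # placeholder assumption
    hourly_rate=40.0,    # placeholder assumption, in dollars
)
print(f"{docs:,} documents, {hours:,.0f} hours, ${cost:,.0f}")
# prints: 500,000 documents, 10,000 hours, $400,000
```

Because every input multiplies through, a modest change in any one assumption moves the total a long way, which is why the estimate is worth redoing with measured values.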

Processing and Early Analysis

Before human review begins, the data usually goes through a processing phase where it's normalized into formats suitable for review and analysis. Processing includes deduplication (removing duplicate copies of the same email or document), encoding (converting documents into formats that review software can handle), and metadata extraction (pulling out creation dates, authors, recipients, and other metadata for indexing and searching).

Processing reduces the volume of data that requires human review. If an email was sent to ten people and all ten recipients have their copies in the collection, deduplication eliminates the nine duplicate copies, leaving one representative copy. This can reduce review volume by 30 to 50 percent without losing information. The processing also prepares the data for search and analysis — applying optical character recognition to scanned documents, extracting text from databases, creating searchable indices.
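A minimal sketch of hash-based deduplication follows, assuming each collected email is a dictionary with custodian, sender, sent_at, subject, and body fields. Those field names, and the choice of which fields define "the same email", are assumptions for illustration; processing tools differ in how they normalize and hash messages.

```python
import hashlib


def content_hash(doc):
    """Hash the fields that define 'the same email' for this sketch."""
    parts = [doc["sender"], doc["sent_at"], doc["subject"], doc["body"]]
    joined = "\x1f".join(p.strip() for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(docs):
    """Keep the first copy of each document; remember who else held it."""
    unique = {}
    for doc in docs:
        key = content_hash(doc)
        if key not in unique:
            unique[key] = {**doc, "all_custodians": [doc["custodian"]]}
        elif doc["custodian"] not in unique[key]["all_custodians"]:
            unique[key]["all_custodians"].append(doc["custodian"])
    return list(unique.values())
```

Keeping the list of every custodian who held a copy is what lets the duplicates be dropped without losing the fact that each of those people received the message.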

This phase also applies early analytical tools. Conceptual search and machine learning can identify documents that are likely relevant or likely privileged before human review begins. This can further reduce review volume by identifying categories of documents that clearly don't require individual review. Calendar invitations that don't mention the disputed topic, routine administrative emails, duplicate copies of already-reviewed documents — these can be filtered out, again reducing cost.

The filtering and analysis happen under attorney supervision. The goal is to reduce the volume of documents requiring human review without accidentally filtering out relevant information. Getting this balance right is partly technology and partly attorney judgment about what matters to the dispute.

Review and Privilege Protection

The most expensive phase of e-discovery is review — lawyers reading through documents to assess whether each one is relevant to the dispute and whether it's privileged. A typical review team consists of senior attorneys making privilege decisions and junior attorneys or contract reviewers assessing relevance. The review process requires reading each document, understanding its content, making judgments about its relevance to the dispute, and flagging any privileged information.

Relevance is a legal determination. Information is relevant if it has any tendency to make a fact that matters to the outcome of the case more or less probable. This sounds clear until you're actually reviewing and you encounter a document that's tangentially related to the dispute, or that might become relevant depending on how the case develops, or that's part of a chain of communications some of which are relevant and some of which aren't. The judgment calls add up, and different reviewers sometimes make different calls about the same documents.

Privilege is similarly complex in review. Is a document attorney work product? Is it a confidential communication with the client for the purpose of getting legal advice? Or is it business information that happens to have been shared with a lawyer but is discoverable? Inadvertently producing privileged information is a serious mistake — it can waive privilege in that information and in related communications. It can also expose client strategies and advice to the opposing side. The careful review process is designed to catch privileged documents before they're produced.

Many organizations use privilege logs to document what information was withheld from production as privileged. The privilege log lists withheld documents, describes them in a way that doesn't disclose their content, and states the basis for the privilege claim. The opposing side can then challenge the privilege claim if they believe the withholding is improper. The privilege log is the record that shows the firm took privilege assertions seriously and made thoughtful decisions about what to withhold.

Technology assistance in review includes privilege filters that identify documents containing terms commonly associated with legal advice, and workflows that route potentially privileged documents to senior attorneys for privilege decisions. But technology cannot replace attorney judgment. Someone with legal training needs to make final determinations about privilege.
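A term-based privilege screen of the kind described here might look like the sketch below. The term list, the queue names, and both function names are hypothetical. The filter only routes documents; it makes no privilege determination.

```python
import re

# Hypothetical screening terms; a real list would come from the legal team.
PRIVILEGE_TERMS = ["attorney-client", "privileged", "legal advice", "work product"]


def privilege_hits(text, terms=PRIVILEGE_TERMS):
    """Return the screening terms found in text, ignoring case."""
    found = []
    for term in terms:
        # Lookarounds keep "privileged" from matching inside "unprivileged".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


def route(doc_text):
    """Send any hit to senior review; the filter never decides privilege."""
    if privilege_hits(doc_text):
        return "senior_attorney_queue"
    return "standard_review_queue"
```

A screen like this will both over-flag (a routine email with a "privileged and confidential" footer) and under-flag (legal advice that uses none of the terms), which is why the final call stays with an attorney.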

Production and the Format Question

Once documents have been reviewed and privilege has been asserted, the next phase is production — providing the non-privileged documents to the opposing side. Production happens in a specified format and with specified metadata. The format matters because different formats preserve different information. A PDF of an email removes metadata that the native email would preserve. A paper copy of a document loses all digital information. The production format is often specified in a discovery agreement or by court order.

Many discovery agreements specify that documents be produced in native format — the format the document was originally in — with metadata preserved. Native production of email includes all email headers and metadata. Native production of databases includes the structure and the data within it. This preserves maximum information but is also the most complex to produce and for the receiving side to process.

Other times, documents are produced in image format — TIFF or PDF — with the metadata provided separately in a CSV file or database. This is less information-rich but is often more manageable for the producing side and receiving side to handle. The tradeoff is that some metadata is separated from the documents and some information is lost.
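To make the "metadata in a separate CSV file" idea concrete, here is a small sketch that writes one metadata row per produced document. The column names (including the Bates-style document numbers) and the sample values are illustrative assumptions; the actual field list and file layout come from the discovery agreement or court order.

```python
import csv
import io

# Illustrative column names; the real field list comes from the agreement.
FIELDS = ["BegBates", "EndBates", "Custodian", "DateSent", "Author", "ImagePath"]


def write_load_file(records, out):
    """Write one metadata row per produced document."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


buffer = io.StringIO()
write_load_file([{
    "BegBates": "ABC0000001", "EndBates": "ABC0000003", "Custodian": "asmith",
    "DateSent": "2021-03-01T14:05:00Z", "Author": "A. Smith",
    "ImagePath": "IMAGES/001/ABC0000001.tif",
}], buffer)
print(buffer.getvalue())
```

The ImagePath column is what ties each metadata row back to its image file, so a mismatch between the two is the first thing to check before a production goes out.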

The production process itself is where IT infrastructure comes into play. Large productions involve transferring terabytes of information. This might happen on encrypted external drives that are physically shipped, through secure file transfer, or through specialized discovery platforms. The production needs to be defensible — there needs to be documentation showing what was produced, when, and in what format. Chain of custody needs to be maintained so the receiving side can trust that the information hasn't been altered.

Cost and Proportionality Challenges

E-discovery cost is one of the biggest stressors in modern litigation. The expense of collecting, processing, reviewing, and producing millions of documents can exceed the amount in controversy in the case. This creates pressure to find ways to reduce costs without sacrificing adequacy of discovery.

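Federal Rule of Civil Procedure 26(b)(1) requires that discovery be proportional to the needs of the case, considering the importance of the issues at stake, the amount in controversy, the parties' relative access to relevant information, the parties' resources, the importance of the discovery in resolving the issues, and whether the burden or expense of the proposed discovery outweighs its likely benefit. Rule 26(b)(2)(B) addresses electronically stored information specifically: a party need not produce electronically stored information from sources it identifies as not reasonably accessible because of undue burden or cost, although a court may still order that discovery if the requesting party shows good cause.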

This creates a framework where parties can challenge discovery requests as disproportionate when the cost of producing the information exceeds the benefit of having it. An email search across ten years of archive tapes might find a handful of relevant emails at a cost of hundreds of thousands of dollars. A court might find that discovery disproportionate and either deny it or require the requesting party to pay the costs.

The practical implication is that early in litigation, parties negotiate the scope of discovery: what custodians will be searched, what date ranges will be covered, what search terms will be used, what systems will be searched, and what kind of limitations will apply. These negotiations can dramatically affect the cost and burden of discovery. A production covering three custodians and a two-year date range is very different in scope and cost from a production covering fifty custodians and ten years of data.

IT's role in this negotiation is to provide realistic assessments of the volume of data that exists, the effort required to collect and process it, and the feasibility of various approaches. If the opposing side wants to search every backup tape going back ten years, IT needs to explain what that means — the number of tapes, the effort to restore them, the resulting volume of data, the cost. This information shapes the proportionality arguments and the scope of discovery that parties actually agree to.
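A back-of-the-envelope volume comparison is often the first number IT is asked for. The sketch below compares a scope of three custodians over two years with one of fifty custodians over ten years. The per-custodian figure is a placeholder assumption and should be measured from real mailboxes and file shares; the function name is made up for this example.

```python
def estimate_volume_gb(custodians, years, gb_per_custodian_year):
    """Rough data volume for a proposed scope, before any culling."""
    return custodians * years * gb_per_custodian_year


GB_PER_CUSTODIAN_YEAR = 5  # placeholder; measure this from real systems

narrow = estimate_volume_gb(3, 2, GB_PER_CUSTODIAN_YEAR)
broad = estimate_volume_gb(50, 10, GB_PER_CUSTODIAN_YEAR)
print(f"narrow: {narrow} GB, broad: {broad} GB, ratio: {broad / narrow:.0f}x")
# prints: narrow: 30 GB, broad: 2500 GB, ratio: 83x
```

The ratio does not depend on the placeholder at all: under this simple model the broad scope is roughly 83 times the narrow one whatever each custodian generates per year.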

Vendor Involvement and Third-Party Considerations

Most organizations use specialized e-discovery vendors for at least part of the discovery process. E-discovery vendors provide technology platforms that are designed for managing large volumes of data, searching, de-duplicating, processing, and creating production sets. They maintain chain of custody documentation. They handle formats and metadata preservation. They often provide staffing for review teams or manage the review logistics. Their involvement creates additional layers in the discovery process but also creates structured processes that are more defensible than ad-hoc internal approaches.

The vendor relationship creates its own set of obligations. Vendor agreements need to specify what data the vendor will have access to, what the vendor is allowed to do with that data, what security measures the vendor will maintain, and what happens to the data after discovery concludes. Vendors are often handling sensitive information including privileged materials, so the agreement needs to specify that the vendor will maintain confidentiality and will restrict access to the information to personnel who need it for the specific discovery task.

If the vendor loses information or breaches confidentiality, the law firm may be responsible to the client and the court for the vendor's failures. Vendor selection and vendor management are therefore important parts of the discovery process. This doesn't mean you need the fanciest platform or the most expensive vendor. It means you need a vendor with adequate technical capabilities, adequate security practices, and adequate documentation practices to support the discovery process defensibly.

Technology and the Defensibility of Process

The technical choices made during discovery affect not just the efficiency of the process but the defensibility of the results. If you use a search tool to identify relevant documents but the search tool makes mistakes, you might accidentally produce information you should have withheld or fail to produce information you should have produced. If you use deduplication and the deduplication process is flawed, you might have gaps in your production. If you use machine learning to filter documents, you need to validate that the machine learning is working correctly.

This is why discovery platforms keep detailed logs and documentation. They document what search terms were used, what documents matched, what was deduplicated and why, what was filtered and why, what was reviewed and what privilege determinations were made. These logs and documentation become part of the record if the discovery process is challenged. They show that the process was systematic and defensible.
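As an illustration of what a defensible processing log could record, the sketch below appends entries to an in-memory list and chains each entry to the previous one with a hash, so that later edits are detectable. Hash chaining is one possible technique chosen for this example, not a description of how any particular discovery platform works, and the field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def append_entry(log, action, details):
    """Append an entry whose hash covers its content and the previous hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "details": details,
        "prev_hash": log[-1]["hash"] if log else "",
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify(log):
    """Return True if the chain of hashes is intact from first entry to last."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A call such as append_entry(log, "search", {"terms": ["widget"], "hits": 12}) records what was run and what it returned. Note that this scheme detects edits and reordering but not removal of the most recent entries, so the latest hash would need to be stored somewhere separate.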

Technology also plays a role in managing the ongoing discovery obligation. In litigation, discovery isn't just a one-time production. As new documents are created, as new custodians are identified, as the scope of discovery evolves, there are supplemental productions and updated disclosures. This requires ongoing integration between IT systems and discovery — new emails need to be searched, new files need to be scanned, new databases need to be queried. Some organizations integrate their e-discovery platform with their IT systems so that as new data arrives, it automatically flows into the discovery process.

The alternative — handling discovery as a series of separate collections and productions — is inefficient and error-prone. New data gets overlooked. Previous searches need to be re-run. Privilege determinations need to be made again for new data. A systematic approach where technology manages the ongoing integration of new data into discovery is far more defensible and more manageable.
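One simple way to keep supplemental collections from reprocessing or overlooking data is to track the hashes of everything already collected and pick up only what is new. The sketch below assumes each item carries a sha256 field computed at collection time; the function and field names are illustrative.

```python
def find_new_items(already_collected, current_items):
    """Return items whose hash has not been collected before."""
    seen = set(already_collected)
    new_items = []
    for item in current_items:
        if item["sha256"] not in seen:
            seen.add(item["sha256"])  # also skips repeats within this batch
            new_items.append(item)
    return new_items
```

For example, if already_collected holds the hashes from the first production, passing the current state of a mailbox returns only the messages that still need processing, review, and privilege determinations.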

Building for Discoverability

The practical lesson for IT is that discovery readiness should be built into your data management practices long before litigation arrives. This means email systems that support searching and exporting by custodian and date range. It means file systems that track creation dates and authors. It means backup systems that can restore specific data ranges. It means databases that can be queried for specific records. It means processes for identifying custodians and understanding the systems they use. It means procedures for responding to discovery requests.

Some organizations take this further and implement information governance programs that classify information by sensitivity and potential litigation relevance, establish retention schedules, and build technical controls that support eventual discovery. These programs make the discovery process far more manageable because the organization understands its information, has controls in place, and can respond to discovery requests efficiently.

The organizations that handle discovery most smoothly are the ones that have thought about discovery in advance, have tested their ability to search and export their data, and have trained their teams on discovery procedures. When litigation arrives, they can respond to discovery requests methodically and defensibly. The organizations that struggle are the ones that are figuring out discovery for the first time when litigation arrives, discovering that their systems can't be searched the way discovery requires, that data is missing or fragmented, and that responding to discovery requests is chaotic and uncertain.


Fully Compliance provides educational content about IT compliance and cybersecurity. This article reflects general information about e-discovery processes as of its publication date. E-discovery rules and practices continue to evolve. Consult with qualified legal counsel for guidance specific to your litigation.