E-Discovery Basics for IT Teams

Reviewed by Fully Compliance editorial staff

E-discovery is the process of identifying, collecting, reviewing, and producing electronically stored information in litigation. IT's role is critical at every phase: preserving data when litigation is anticipated, collecting from email and file systems with metadata intact, and supporting the technology platforms that process and review millions of documents. The cost of e-discovery can exceed the value of the underlying dispute, so IT's ability to provide realistic volume assessments and efficient collection is often the difference between manageable litigation and financial disaster.

IT Determines Whether Discovery Is Feasible or Financially Ruinous

E-discovery is the point at which the routine operations of email, file storage, and data management become evidence in legal disputes. If your firm hasn't been through significant litigation, e-discovery feels abstract. If you have, you know that it is one of the most expensive and operationally disruptive parts of modern litigation, and that IT's role in determining whether discovery is feasible matters more than most executives realize.

The scale of the problem comes from the sheer volume of electronically stored information modern organizations generate. Litigation over a single product dispute can require reviewing millions of emails, documents, instant messages, and database records. According to the ABA's 2023 Litigation Survey, e-discovery costs now account for an estimated 20-50% of total litigation spend in complex commercial cases. The Ponemon Institute found that the average cost per gigabyte of data processed through e-discovery ranges from $1,500 to $3,000 when factoring in collection, processing, and review. A production covering ten custodians and 500 gigabytes of data can run hundreds of thousands of dollars in attorney review time alone. Scale this to hundreds of custodians and terabytes of data, and review costs reach millions.

The Formal E-Discovery Process

E-discovery follows a structured sequence: preservation, identification, collection, processing, review, and production. Each phase has distinct IT requirements, and each phase is an opportunity for things to go wrong — to accidentally destroy information, to collect the wrong information, to miss critical data, to produce privileged information.

Preservation is the foundation. Without preservation, information disappears and discovery is incomplete. The moment litigation is anticipated, the duty to preserve begins, and IT must stop normal data destruction practices.
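One way to picture "stopping normal data destruction" is a guard inside the retention job that refuses to delete anything owned by a custodian under hold. The sketch below is illustrative only: the `LegalHold` record and the `is_deletable` function are hypothetical names, and it assumes the retention job can look up each item's owner and creation date.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class LegalHold:
    """Hypothetical hold record: a matter name plus the custodians it covers."""
    matter: str
    custodians: set = field(default_factory=set)


def is_deletable(owner, created, today, retention_days, holds):
    """Return True only if the item is past retention AND its owner is not on hold."""
    on_hold = any(owner in hold.custodians for hold in holds)
    expired = (today - created) > timedelta(days=retention_days)
    return expired and not on_hold


holds = [LegalHold("Example matter", {"alice@example.com"})]
today = date(2024, 6, 1)
old = date(2020, 1, 1)

print(is_deletable("alice@example.com", old, today, 365, holds))  # False: owner is on hold
print(is_deletable("bob@example.com", old, today, 365, holds))    # True: expired, no hold
```

In practice the hold usually has to reach beyond the retention job, for example to mailbox auto-purge settings and backup rotation, which this sketch does not model.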

Once preservation is in place, the next phase is identification — figuring out what information exists that's potentially relevant to the dispute. This requires understanding the scope of the dispute, identifying the custodians whose information is relevant, and identifying the systems and data sources that contain that information. The litigation team works with IT to map out where the relevant information lives — employee email, file servers, databases, instant messaging systems, phones, cloud storage, backup systems. Each of these is a potential source of discoverable information.

Data Collection and Metadata Preservation

After identification comes collection — actually extracting the potentially relevant data from the systems that contain it. This is where IT work becomes tangible. Collection is never as simple as "give us all the emails." It's extracting email from specific mailboxes for a specific date range, exporting files from specific file servers, querying databases for records matching specific criteria, extracting deleted items from backup tapes.
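The scoping described above (specific mailboxes, specific date range) can be sketched as a simple filter. The `Message` record and `in_scope` function are hypothetical and stand in for whatever export tooling the mail platform actually provides; real exports carry far more fields.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    """Hypothetical minimal message record used only for this illustration."""
    mailbox: str
    sent: date
    subject: str


def in_scope(msg, custodians, start, end):
    """True if the message belongs to a named custodian and falls within [start, end]."""
    return msg.mailbox in custodians and start <= msg.sent <= end


messages = [
    Message("alice@example.com", date(2022, 3, 1), "Q1 pricing"),
    Message("alice@example.com", date(2019, 3, 1), "Old thread"),
    Message("carol@example.com", date(2022, 3, 2), "Lunch"),
]
scope = {"alice@example.com"}
collected = [m for m in messages
             if in_scope(m, scope, date(2021, 1, 1), date(2023, 12, 31))]

print([m.subject for m in collected])  # ['Q1 pricing']
```

Writing the scope down as explicit parameters (custodian list, start date, end date) also gives the legal team something concrete to record in the collection documentation.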

The challenge in collection is completeness combined with defensibility. You need to collect all relevant information so the other side gets everything they're entitled to. You also need to document what you collected and how you collected it, and be able to show that the collection process didn't alter or corrupt the information. Metadata preservation is critical: when you export information from email systems or file servers, you need to preserve all the metadata (dates, creators, modification history, access logs) that establishes authenticity and completeness.
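A common way to show that a collected file has not changed is to record a cryptographic hash alongside its file system metadata at the moment of collection. The following is a minimal sketch using only the Python standard library; the `manifest_entry` function and the field names in its output are assumptions for illustration, not a standard manifest format.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def manifest_entry(path, collected_by):
    """Hash a file and record basic file system metadata without writing to it."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:  # opened read-only, so contents are not modified
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            sha256.update(chunk)
    stat = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "sha256": sha256.hexdigest(),
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "collected_by": collected_by,
        "collected_utc": datetime.now(timezone.utc).isoformat(),
    }


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "memo.txt")
    with open(sample, "wb") as fh:
        fh.write(b"hello")
    entry = manifest_entry(sample, collected_by="it-analyst")

print(entry["sha256"], entry["size_bytes"])
```

Re-hashing the file later and comparing against the recorded value demonstrates that the content is unchanged. Note that simply reading a file can update its last-access time on some file systems, which is one reason dedicated collection tools are often preferred.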

Many organizations use specialized e-discovery vendors for collection. These vendors have tools designed to extract information from email systems, databases, and file servers while preserving metadata and maintaining chain of custody. They understand the quirks of various platforms — how to properly export from Exchange, Outlook, Gmail, Slack, and dozens of other systems in ways that preserve all relevant data. They handle the scale of modern data volumes and document the collection process in ways that hold up to legal scrutiny.

Processing Reduces Volume Before Human Review

Before human review begins, the data goes through a processing phase where it's normalized into formats suitable for review and analysis. Processing includes deduplication (removing duplicate copies of the same email or document), encoding (converting documents into formats that review software can handle), and metadata extraction (pulling out creation dates, authors, recipients, and other metadata for indexing and searching).

Processing reduces the volume of data that requires human review. If an email was sent to ten people and all ten recipients have their copies in the collection, deduplication eliminates the nine duplicate copies. This can reduce review volume by 30 to 50 percent without losing information. The processing also prepares the data for search and analysis — applying optical character recognition to scanned documents, extracting text from databases, creating searchable indices.
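The ten-recipient example can be sketched with a content hash. This is a simplified illustration: the choice of fields fed into `dedup_key` is an assumption, and commercial processing tools typically hash a richer set of message fields and attachments.

```python
import hashlib


def dedup_key(sender, subject, body):
    """Build a content hash; identical emails in different mailboxes share a key."""
    normalized = "\n".join([sender.strip().lower(), subject.strip(), body.strip()])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(items):
    """Keep one copy of each email and record every custodian who held it."""
    unique = {}
    for custodian, sender, subject, body in items:
        key = dedup_key(sender, subject, body)
        entry = unique.setdefault(key, {"subject": subject, "custodians": []})
        entry["custodians"].append(custodian)
    return list(unique.values())


# One email collected from ten mailboxes, plus one unrelated message.
collection = [(f"user{i}", "ceo@example.com", "Launch plan", "See attached.")
              for i in range(10)]
collection.append(("user0", "hr@example.com", "Benefits", "Open enrollment."))

result = deduplicate(collection)
print(len(collection), "->", len(result))  # 11 -> 2
```

Keeping the list of custodians on the surviving copy is what lets deduplication cut volume without losing the fact that each recipient held the message.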

This phase also applies early analytical tools. Conceptual search and machine learning can identify documents that are likely relevant or likely privileged before human review begins, further reducing review volume by filtering out calendar invitations that don't mention the disputed topic, routine administrative emails, and duplicate copies of already-reviewed documents. The filtering and analysis happen under attorney supervision, balancing volume reduction against the risk of accidentally filtering out relevant information.
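One way attorneys can check whether a filter is discarding relevant material is to label a sample by hand and compare it with what the filter kept. The sketch below computes recall and precision from such a sample; the function name and the sample data are hypothetical, and real validation protocols involve sampling decisions that are outside the scope of this example.

```python
def recall_and_precision(labels, predictions):
    """Compare attorney labels (True = relevant) with the filter's keep decisions."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for label, kept in pairs if label and kept)
    false_neg = sum(1 for label, kept in pairs if label and not kept)
    false_pos = sum(1 for label, kept in pairs if not label and kept)
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    return recall, precision


# Hypothetical eight-document sample reviewed by an attorney.
attorney_labels = [True, True, True, False, False, True, False, False]
filter_kept = [True, True, False, False, True, True, False, False]

recall, precision = recall_and_precision(attorney_labels, filter_kept)
print(recall, precision)  # 0.75 0.75
```

Recall is the figure that speaks to the risk named above: a recall of 0.75 means a quarter of the relevant documents in the sample were filtered out.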

Review, Privilege Protection, and Production

The most expensive phase of e-discovery is review — lawyers reading through documents to assess whether each one is relevant to the dispute and whether it's privileged. A typical review team consists of senior attorneys making privilege decisions and junior attorneys or contract reviewers assessing relevance. Relevance sounds clear until you're actually reviewing and you encounter documents that are tangentially related to the dispute, or that are part of a chain of communications where some are relevant and some aren't.

Privilege is similarly complex. Is a document attorney work product? Is it a confidential communication with the client for the purpose of getting legal advice? Or is it business information that happens to have been shared with a lawyer but is discoverable? Inadvertently producing privileged information is a serious mistake: it can waive privilege in that information and expose client strategies to the opposing side. Many organizations use privilege logs to document what information was withheld and why. Technology can assist with privilege filters that flag documents containing terms commonly associated with legal advice, and with workflows that route potentially privileged documents to senior attorneys.
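A term-based privilege screen of the kind mentioned here can be sketched in a few lines. The term list and queue names below are illustrative assumptions; in practice the legal team builds and tunes the list, and a term hit only flags a document for attorney judgment rather than deciding privilege.

```python
import re

# Illustrative terms only; a real list is built and maintained by the legal team.
PRIVILEGE_TERMS = ["attorney-client", "privileged", "legal advice", "work product"]
PRIVILEGE_PATTERN = re.compile(
    "|".join(re.escape(term) for term in PRIVILEGE_TERMS), re.IGNORECASE
)


def route(text):
    """Return 'senior_review' if any screening term appears, else 'first_pass'."""
    if PRIVILEGE_PATTERN.search(text):
        return "senior_review"
    return "first_pass"


print(route("Please keep this privileged and confidential."))  # senior_review
print(route("Lunch on Friday?"))                               # first_pass
```

A screen like this over-flags by design, since the cost of routing an ordinary email to a senior attorney is much lower than the cost of producing a privileged one.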

Once documents have been reviewed and privilege has been asserted, production follows — providing the non-privileged documents to the opposing side in a specified format. Many discovery agreements specify that documents be produced in native format with metadata preserved. Others specify image format (TIFF or PDF) with metadata provided separately. Large productions involve transferring terabytes of information on encrypted external drives, through secure file transfer, or through specialized discovery platforms. The production needs to be defensible with documentation showing what was produced, when, and in what format.

Cost, Proportionality, and IT's Role in Negotiation

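E-discovery cost is one of the biggest stressors in modern litigation. Federal Rule of Civil Procedure 26(b)(1) limits discovery to what is proportional to the needs of the case, considering the importance of the issues at stake, the amount in controversy, the parties' relative access to relevant information, the parties' resources, the importance of the discovery in resolving the issues, and whether the burden or expense of the proposed discovery outweighs its likely benefit. Rule 26(b)(2)(B) addresses electronically stored information specifically: a party need not produce it from sources that are not reasonably accessible because of undue burden or cost, although a court may still order that discovery if the requesting party shows good cause, and may specify conditions for it.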

This creates a framework where parties can challenge discovery requests as disproportionate. An email search across ten years of archive tapes that would cost hundreds of thousands of dollars and turn up only a handful of relevant emails is the kind of discovery a court may deny or require the requesting party to pay for. Early in litigation, parties negotiate the scope of discovery: what custodians will be searched, what date ranges will be covered, what search terms will be used, what systems will be searched, and what limitations will apply. A production covering three custodians and a two-year date range is very different in scope and cost from fifty custodians and ten years of data.

IT's role in this negotiation is to provide realistic assessments of the volume of data that exists, the effort required to collect and process it, and the feasibility of various approaches. If the opposing side wants to search every backup tape going back ten years, IT needs to explain what that means — the number of tapes, the effort to restore them, the resulting volume of data, the cost. This information shapes the proportionality arguments and the scope of discovery that parties actually agree to.
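The backup tape assessment comes down to multiplying out a handful of site-specific inputs. The sketch below shows the shape of that calculation; the function name and every input value are placeholders chosen for illustration, not benchmarks, and IT would substitute its own figures.

```python
from dataclasses import dataclass


@dataclass
class TapeRestoreEstimate:
    tapes: int
    restore_hours: float
    restored_tb: float
    labor_cost: float


def estimate_tape_restore(years, tapes_per_year, hours_per_tape, tb_per_tape, hourly_rate):
    """Multiply out the inputs IT supplies; every input is a site-specific assumption."""
    tapes = years * tapes_per_year
    hours = tapes * hours_per_tape
    return TapeRestoreEstimate(
        tapes=tapes,
        restore_hours=hours,
        restored_tb=tapes * tb_per_tape,
        labor_cost=hours * hourly_rate,
    )


# Placeholder inputs for illustration only.
est = estimate_tape_restore(years=10, tapes_per_year=52, hours_per_tape=4,
                            tb_per_tape=1.5, hourly_rate=100)
print(est)
```

Note that this covers restore labor only. The restored volume would still need to be processed and reviewed, which is usually where most of the cost lies.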

Vendor Management and Building for Discoverability

Most organizations use specialized e-discovery vendors for at least part of the discovery process. These vendors provide technology platforms designed for managing large volumes of data, searching, deduplicating, processing, and creating production sets. They maintain chain of custody documentation and handle formats and metadata preservation. The vendor relationship creates its own obligations — vendor agreements need to specify what data the vendor will access, what they can do with it, what security measures they maintain, and what happens to the data after discovery concludes. If the vendor loses information or breaches confidentiality, the law firm is responsible to the client and the court for the vendor's failures.

The practical lesson for IT is that discovery readiness should be built into your data management practices long before litigation arrives. This means email systems that support searching and exporting by custodian and date range, file systems that track creation dates and authors, backup systems that can restore specific data ranges, databases that can be queried for specific records, processes for identifying custodians and understanding the systems they use, and procedures for responding to discovery requests.

Some organizations take this further and implement information governance programs that classify information by sensitivity and potential litigation relevance, establish retention schedules, and build technical controls that support eventual discovery. These programs make the discovery process far more manageable because the organization understands its information, has controls in place, and can respond to discovery requests efficiently. The organizations that handle discovery most smoothly are the ones that have thought about discovery in advance, have tested their ability to search and export their data, and have trained their teams on discovery procedures. The organizations that struggle are the ones figuring out discovery for the first time when litigation arrives, discovering that their systems can't be searched the way discovery requires.


Frequently Asked Questions

When does the duty to preserve data for e-discovery begin?
The duty to preserve begins when litigation is reasonably anticipated — not when a lawsuit is filed, but when an event occurs that makes litigation likely. This could be receiving a demand letter, learning of a government investigation, or becoming aware of facts that will likely lead to a claim. IT must stop routine data destruction for relevant custodians and systems from that point forward.

How much does e-discovery cost?
Costs vary enormously based on scope. Ponemon Institute research estimates $1,500 to $3,000 per gigabyte when factoring in collection, processing, and attorney review. A moderately complex case with ten custodians and 500 gigabytes of data can cost $200,000 to $500,000 or more in review alone. The largest cost driver is attorney review time, which is why early volume reduction through deduplication and analytical filtering is critical.

What is metadata and why does it matter in e-discovery?
Metadata is data about data — creation dates, authors, modification history, recipients, file paths, and access logs embedded in documents and emails. Metadata establishes when a document was created, who created it, when it was modified, and who received it. Preserving metadata during collection is essential because it establishes authenticity and completeness. Collection methods that strip metadata can render documents inadmissible or create disputes about their origin.

What is a privilege log?
A privilege log is a document listing materials withheld from production because they are protected by attorney-client privilege or work product doctrine. Each entry describes the document without disclosing its content and states the basis for the privilege claim. The opposing side can challenge privilege assertions on the log. Preparing the privilege log is time-consuming but essential for demonstrating that privilege was asserted thoughtfully.

Can machine learning or AI reduce e-discovery costs?
Yes. Technology-assisted review (TAR) and predictive coding use machine learning to identify relevant and privileged documents, reducing the volume that requires human review. Studies have shown that TAR can achieve accuracy comparable to or better than human-only review at a fraction of the cost. Courts have increasingly accepted TAR as a defensible review methodology. The technology works under attorney supervision, with attorneys training the model and validating results.

What should IT do to prepare for e-discovery before litigation?
Build email and file systems that support searching and exporting by custodian and date range. Ensure backup systems can restore specific data ranges. Document your data retention policies and stick to them consistently. Maintain an understanding of where data lives across the organization. Establish a process for responding to legal hold notices. The organizations that handle discovery best are those that treat discovery readiness as an ongoing IT discipline rather than a crisis response.