
Saturday, September 27, 2025

The Ultimate Guide to Semiconductor Patent Analysis Using LLMs for In-House Counsel


 

Blogging_CS (Expert Contribution) · Approx. 15 min read

Beyond speculation to scientific evidence: Unlocking a new paradigm in patent infringement analysis with AI.

Semiconductor patent litigation must be fought with evidence, not intuition. Reverse engineering (RE) a complex semiconductor chip is a costly and time-consuming process. But what if you could revolutionize it using Large Language Models (LLMs)? This guide presents a step-by-step analysis methodology and LLM prompt strategies that in-house patent teams can use to build a robust evidentiary framework for the courtroom.

 

Introduction: The Strategic Importance of Reverse Engineering in Patent Litigation

Patent litigation is a legally demanding process that consumes significant time and resources. Before filing a lawsuit, a plaintiff is obligated to present a ‘reasonable basis’ for believing their patent is being infringed upon by a defendant's product. At this stage, reverse engineering becomes the most powerful tool for demonstrating a concrete possibility of infringement based on scientific analysis, rather than mere speculation. This is especially true before the discovery phase, where direct evidence from the defendant's confidential materials is not yet available; one must often rely solely on RE.

The initial findings from RE are crucial for establishing the validity of a lawsuit, formulating a litigation strategy, and even encouraging an early settlement. A lawsuit initiated without solid RE faces a high risk of dismissal due to insufficient evidence, which can lead to substantial financial losses.

⚠️ Legal Disclaimer
This document is for informational and educational purposes only. The content herein does not constitute legal advice, and you must consult with an independent legal professional before taking any legal action.

Overview of the Complete Reverse Engineering Workflow

Semiconductor reverse engineering is not random disassembly; it is a highly controlled and systematic forensic investigation. The process generally follows a ‘funnel’ workflow, where the precision, cost, and level of destructiveness gradually increase. Each step is organically linked, using information from the previous stage to define the objectives and methods for the next.

  • Non-destructive Analysis: The initial reconnaissance phase to understand the internal structure of the chip in its packaged state without causing damage.
  • Sample Preparation: The process of exposing the target die and precisely sectioning a specific area for analysis.
  • Structural & Compositional Analysis: The core phase of observing micro-structures with microscopes and analyzing the materials of each component.
  • Specialized Analysis: Analyzing properties not visible with standard microscopy, such as doping concentrations or crystal structures.

The ultimate goal of this entire process is to complete a Claim Chart, a document that provides a clear, one-to-one comparison between the patent claims and the analytical results. The claim chart is the final deliverable that translates all scientific evidence gathered during RE into a legal argument.

Step 1: Strategic Analysis Planning and LLM Utilization

Before beginning the analysis, it is essential to review legal risks and design the most efficient analysis roadmap tailored to the patent claims. An LLM can serve as an excellent strategist in this process.

🤖 LLM Prompt Example: Legal Risk Assessment


# Role: Intellectual Property Legal Expert
# Task: Assess legal risks of semiconductor RE analysis

Please assess the legal risks for the following analysis plan and propose necessary preliminary measures:
- Target of Analysis: [Competitor's Semiconductor Product Name]
- Proposed Analysis Methods: Decapsulation, FIB-SEM, TEM, SIMS
- Jurisdiction: South Korea, USA, Japan

# Output Format:
{
  "legal_risks": ["List of risk factors"],
  "required_actions": ["Mandatory preliminary steps"],
  "documentation": ["List of necessary documents"],
  "approval_timeline": "Estimated approval timeframe"
}
        
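In practice, prompts like these can be issued programmatically so the structured JSON lands directly in your workflow. The following is a minimal sketch, assuming the OpenAI Python client and an illustrative model name (any chat-completion API can be swapped in), that sends the risk-assessment prompt and verifies the reply contains the fields the prompt demands:

# A minimal sketch: send the risk-assessment prompt above to an LLM and
# validate that the reply contains the required JSON keys.
# Assumes the `openai` package and an API key in the environment;
# the model name is illustrative.
import json
from openai import OpenAI

REQUIRED_KEYS = {"legal_risks", "required_actions",
                 "documentation", "approval_timeline"}

def assess_legal_risk(prompt: str, model: str = "gpt-4o") -> dict:
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request strict JSON
    )
    result = json.loads(response.choices[0].message.content)
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"LLM omitted required fields: {missing}")
    return result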

🤖 LLM Prompt Example: Creating an Analysis Roadmap


# Role: Semiconductor Analysis Strategist
# Task: Create an efficient RE analysis roadmap

# Patent Claim:
[Insert the full text of the patent claim to be analyzed here]

# Competitor Product Information:
- Product Name: [Product Name]
- Publicly Available Technical Specs: [Specifications]
- Estimated Manufacturing Process: [Process Node]

# Requirements:
1. Set analysis priorities for each limitation of the claim.
2. Propose a cost-effective analysis sequence (from non-destructive to destructive).
3. Evaluate the probability of securing evidence at each stage.
4. Develop a risk-mitigation plan for the analysis.

# Output: A detailed analysis roadmap in JSON format.
        

Step 2: Non-Destructive Analysis - Chip Reconnaissance

This initial stage is crucial for understanding the overall architecture of the device, identifying potential manufacturing defects, and strategically planning the subsequent destructive analysis phases. The information gathered here forms the basis for managing risks and maximizing efficiency throughout the entire project.

2.1 SAM (Scanning Acoustic Microscopy) Analysis

  • Purpose: To verify the physical integrity of the product and detect internal defects (e.g., gaps between the chip and its package) to ensure the reliability of subsequent analyses.
  • Principle: Uses ultrasound waves that are directed at a sample. The acoustic waves reflected from internal interfaces or defects are detected to create an image of the internal structure. The C-Scan mode, which provides a planar image at a specific depth, is commonly used.
  • Results Interpretation: Dark or irregular patterns in the image indicate internal defects like voids or delamination. This information serves as a critical warning for areas to be cautious of during subsequent processes like decapsulation.

🤖 LLM Prompt Example: SAM Image Analysis


# Role: SAM Image Analysis Expert
# Input: [Upload SAM C-Scan Image]

# Task:
1. Classify the defect patterns visible in the image and mark their locations.
2. Determine whether each defect is likely a manufacturing issue or damage from the analysis process.
3. Suggest areas to avoid during the subsequent FIB analysis.
4. Evaluate the impact of the defect density on product quality.

# Output Format:
{
  "defect_classification": {...},
  "analysis_safe_zones": [],
  "quality_assessment": "..."
}
        

2.2 3D X-ray CT (Computed Tomography) Analysis

  • Purpose: To understand the 3D architecture of the chip package (e.g., die stacking, TSV arrays) and to set precise coordinates for subsequent high-precision analysis.
  • Principle: A 3D volumetric dataset is generated by computationally reconstructing numerous 2D X-ray transmission images taken from multiple angles as the sample is rotated 360 degrees.
  • Results Interpretation: The reconstructed 3D model allows for a direct comparison between the patent drawings and the actual product's structure. For instance, if a patent claims an 'eight-layer stacked memory die,' the CT image can verify if eight dies are indeed stacked. This 3D data serves as a crucial navigation map for FIB processing.

🤖 LLM Prompt Example: Comparing 3D Structure to Patent Drawings


# Role: 3D CT Data Analysis Expert
# Input: [A series of slice images from the 3D volume data]

# Analysis Requirements:
1. Identify and count the Through-Silicon Via (TSV) structures.
2. Analyze the die stack structure (number of layers, thickness, spacing).
3. Analyze the wire bonding/flip-chip bump pattern.
4. Compare the structural similarity with the patent drawings.
(Specifically, reference drawing: [Attach Patent Drawing])

# Target Structures:
- "8-layer stacked memory die"
- "Vertical through-electrode structure"
- "Symmetrical bonding pad layout"

Describe the analysis results in connection with the patent claims.
        

Step 3: Precision Sample Preparation - A Nanoscale Surgery

To directly observe the micro-circuitry inside the chip, the outer protective layers must be removed and the specific area of interest precisely exposed. Every action in this stage is irreversible, making it a high-stakes procedure akin to delicate surgery where evidence preservation is the top priority.

💡 A Note on Evidence Integrity
Every step of the analysis must be conducted with the expectation of court submission. Adopting the concept of a Minimal Viable Evidence (MVE) package is critical. An MVE should include:
  • Original Sample Information: Photos of the original chip, serial numbers, and the SHA-256 hash if it's a file.
  • Chain of Custody Log: Model names of all equipment, software versions, and the exact commands and settings used.
  • Data Integrity: Hash values (SHA-256) of all raw data (images, logs, pcap files) must be recorded with UTC timestamps to prove they have not been altered.
  • Analyst's Declaration: A signed affidavit from the analyst attesting that all procedures were followed correctly.
This rigorous documentation ensures the credibility and reproducibility of the evidence.
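To make the Data Integrity item concrete, here is a minimal sketch (the file names are illustrative) that computes the SHA-256 hash of each raw data file and appends it, with a UTC timestamp, to a chain-of-custody log:

# A minimal sketch of MVE-style integrity logging: hash each raw data
# file with SHA-256 and record it alongside a UTC timestamp.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def log_evidence(paths, log_file="chain_of_custody.jsonl"):
    with open(log_file, "a", encoding="utf-8") as log:
        for path in map(Path, paths):
            entry = {
                "file": str(path),
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                "recorded_utc": datetime.now(timezone.utc).isoformat(),
            }
            log.write(json.dumps(entry) + "\n")

log_evidence(["sem_scan_001.tif", "fib_session.log"])  # illustrative files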

3.1 Decapsulation

  • Purpose: To cleanly and safely expose the surface of the silicon die for analysis.
  • Principle: The Epoxy Molding Compound (EMC) protecting the chip is removed using methods such as chemical etching, laser ablation, or plasma etching. The best method is chosen based on the chip's characteristics.

🤖 LLM Prompt Example: Determining Optimal Process Conditions


# Role: Semiconductor Packaging Process Expert
# Task: Select a decapsulation method that minimizes damage

# Product Information:
- Package Type: [BGA/QFN/etc.]
- Wire Material: Pd-coated Cu wire (assumed)
- EMC Material: Epoxy Molding Compound
- Target Analysis Area: Metal interconnect layers on the die surface

# Technical Literature Search Request:
1. Find chemical decapsulation conditions that are non-corrosive to Cu wires.
2. Compare the pros and cons of plasma etching vs. chemical etching.
3. Recommend relevant process parameters (temperature, time, concentration).
4. For each method, assess the expected level of damage and its impact on analysis reliability.

Please provide answers based on the latest academic papers and technical notes.
        

3.2 FIB (Focused Ion Beam) Precision Cross-Sectioning

  • Purpose: To obtain a clean, flat cross-section suitable for SEM or TEM analysis, enabling accurate examination of material interfaces, cracks, metal layer thicknesses, and more.
  • Principle: This technique uses a highly focused beam of heavy ions, such as Gallium (Ga+), accelerated at high energy to mill away material from a specific point on the sample, atom by atom (a process called sputtering).
  • Results Interpretation: FIB is essential when a patent claim specifies a feature in a microscopic area, such as the ‘spacer structure between the gate and source/drain of a FinFET.’ It allows for the precise isolation and preparation of that exact location for analysis.

🤖 LLM Prompt Example: Drafting a FIB Milling Script


# Role: FIB Processing Optimization Expert
# Input: 3D CT coordinate data + target transistor location

# Task:
Draft a FIB milling script that meets the following conditions:
- Target Coordinates: X=1250 µm, Y=890 µm, Z=15 µm (relative to die surface)
- Target Structure: Gate cross-section of a FinFET transistor
- Required Resolution: <5 nm
- Milling Depth: Approx. 2 µm

# Script Requirements:
1. A multi-step approach for coarse and fine milling.
2. Optimized ion beam voltage/current conditions.
3. Logic for real-time SEM image feedback during milling.
4. Final polishing conditions to achieve atomic-level surface flatness.

# Output: A script for the FIB machine with detailed comments for each step.
        

Step 4: High-Resolution Structural & Compositional Analysis

This is the core of the reverse engineering process, where the prepared sample's cross-section is examined under high-magnification microscopes to directly verify the physical structures and material compositions specified in the patent claims. The images and data obtained here become the most direct and powerful evidence in the claim chart.

4.1 SEM/EDS Analysis

  • Purpose: To visually confirm nanoscale microstructures, measure critical dimensions like circuit line widths and thin-film thicknesses, and simultaneously analyze the elemental composition.
  • Principle: A SEM (Scanning Electron Microscope) scans the sample surface with an electron beam and detects secondary electrons to generate a high-resolution image with strong topographic contrast. An EDS (Energy Dispersive X-ray Spectroscopy) detector, often attached to the SEM, analyzes the characteristic X-rays emitted from the sample when struck by the electron beam to identify the elements present and their relative amounts.
  • Results Interpretation: SEM images can be used to measure the fin height or gate length of a FinFET. EDS results are typically presented as a spectrum, which identifies elements by their characteristic energy peaks, and an elemental map, which visualizes the distribution of each element with different colors. For example, if a map of a gate structure shows a concentration of Hafnium (Hf) and Oxygen (O) in a specific layer, it provides strong evidence that the layer is HfO₂.

🤖 LLM Prompt Example: Comprehensive SEM/EDS Data Analysis


# Role: SEM/EDS Data Analyst
# Input: [SEM image + EDS elemental mapping data]

# Analysis Task:
1. Identify each layer of the High-K Metal Gate structure.
   - Measure the thickness of the gate dielectric (HfO₂).
   - Confirm the presence of the barrier metal layer (TiN).
   - Analyze the structure of the gate electrode (W).
2. Differentiate materials based on the Backscattered Electron (BSE) image contrast.
3. Interpret the quantitative results from the EDS analysis.
4. Evaluate the consistency with the patent claim.

# Patent Claim: "A transistor structure comprising a High-K dielectric layer with a thickness of 2-3nm and a metal gate electrode."

Objectively evaluate for potential infringement based on the measured values.
        

🤖 LLM Prompt Example: Automated Analysis of Large Image Sets


# Role: Pattern Recognition and Statistical Analysis Expert
# Input: [Folder containing 2000 SEM images]

# Automated Analysis Request:
1. Automatically identify FinFET patterns in each image.
2. Automatically measure the Gate Pitch and Fin Width for each identified FinFET.
3. Calculate the statistical distribution of the measured values (mean, standard deviation, min/max).
4. Detect and classify any anomalous patterns (defects).

# Target Accuracy: >95%
# Output: A Python pandas DataFrame and visualization charts.

Evaluate the results in relation to the patent claim for a "regular array of fin structures."
        
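For reference, the statistics step requested above is straightforward once the measurements exist. A minimal sketch, assuming the per-image measurements have already been extracted into a CSV with a hypothetical `fin_width_nm` column:

# A minimal sketch of the statistical summary; the image-processing step
# that produces "fin_measurements.csv" is assumed to have run already.
import pandas as pd

df = pd.read_csv("fin_measurements.csv")
stats = df["fin_width_nm"].agg(["mean", "std", "min", "max"])
share_leq_7nm = (df["fin_width_nm"] <= 7.0).mean() * 100

print(stats)
print(f"Fins at or below 7 nm: {share_leq_7nm:.1f}%")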

4.2 TEM Analysis

  • Purpose: To precisely measure the thickness of ultra-thin films at the atomic layer level, analyze the interface structure between different materials, and determine the material's crystalline structure (crystalline/amorphous).
  • Principle: Unlike SEM, a TEM (Transmission Electron Microscope) obtains an image by passing an electron beam *through* an extremely thin sample (typically under 100nm). The contrast in the resulting image is determined by the sample's density, thickness, and the degree of electron scattering and diffraction by its crystal structure.
  • Results Interpretation: TEM offers the highest spatial resolution, allowing direct observation of atomic columns. It can provide irrefutable proof for claims such as "a 2nm thick hafnium oxide layer formed on a silicon substrate." Furthermore, if features characteristic of a specific deposition method, like the excellent thickness uniformity and conformal coverage of Atomic Layer Deposition (ALD), are observed, it strongly supports the argument that said process was used.

🤖 LLM Prompt Example: TEM Lattice Image Analysis


# Role: TEM Lattice Fringe Analysis Expert
# Input: [High-Resolution TEM Image]

# Task:
1. Measure the lattice fringe spacing and identify the crystal structure via FFT analysis.
2. Analyze the characteristics of the interface between different materials.
3. Check for evidence of an Atomic Layer Deposition (ALD) process.
4. Differentiate between crystalline and amorphous regions.

# Analysis Tools:
- Fast Fourier Transform (FFT) analysis
- Lattice spacing measurement algorithm
- Interface roughness quantification

# Patent Relevance:
Substantiate the claim of a "uniform thin-film interface formed by atomic layer deposition" with evidence from the TEM image.

# Output: Image annotations + measurement data + interpretation report
        
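As an illustration of the FFT step this prompt requests, the following minimal sketch estimates the dominant lattice fringe spacing from a high-resolution TEM image. The file name and pixel size are illustrative; the pixel size must come from the microscope calibration:

# A minimal sketch: measure the dominant lattice fringe spacing via a
# 2-D FFT of a high-resolution TEM image.
import numpy as np
from PIL import Image

PIXEL_SIZE_NM = 0.02  # nm per pixel (illustrative calibration value)

img = np.asarray(Image.open("hrtem_lattice.tif").convert("L"), dtype=float)
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))

# Suppress the DC peak at the centre, then find the strongest reflection.
cy, cx = np.array(spectrum.shape) // 2
spectrum[cy - 5:cy + 6, cx - 5:cx + 6] = 0
py, px = np.unravel_index(np.argmax(spectrum), spectrum.shape)

# Convert the peak's spatial frequency (cycles/nm) to a spacing (nm).
freq_y = (py - cy) / (img.shape[0] * PIXEL_SIZE_NM)
freq_x = (px - cx) / (img.shape[1] * PIXEL_SIZE_NM)
print(f"Dominant fringe spacing: {1.0 / np.hypot(freq_x, freq_y):.3f} nm")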

Step 5: Specialized Analysis - Measuring the Invisible

This step analyzes the 'unseen' factors that determine the core electrical properties of a semiconductor, which cannot be observed with conventional electron microscopy. This provides direct evidence of 'how a device was designed to operate.'

5.1 SIMS (Secondary Ion Mass Spectrometry) Analysis

  • Purpose: To quantitatively measure the depth profile of dopants (e.g., Boron, Phosphorus), which are key elements determining the device's performance.
  • Principle: A primary ion beam continuously sputters the sample surface. The ejected secondary ions are then guided into a mass spectrometer, which separates and detects them to analyze elemental concentration by depth, down to the parts-per-billion (ppb) level.
  • Results Interpretation: The output is a log-linear graph with depth on the x-axis and concentration on the y-axis. This allows for precise determination of peak concentration, junction depth, and the overall shape of the doping profile. A patent claim for a "Lightly Doped Drain (LDD) structure" can be proven by showing a SIMS profile with a specific graded concentration near the source/drain regions.
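The junction location and concentration gradient described above can be estimated numerically from the exported profile. A minimal sketch, assuming the data has been exported to a CSV with hypothetical column names:

# A minimal sketch of SIMS profile interpretation: find where the net
# doping changes sign (the junction) and the local dopant gradient.
import numpy as np
import pandas as pd

df = pd.read_csv("sims_profile.csv")  # columns: depth_nm, boron_cm3, phos_cm3
depth = df["depth_nm"].to_numpy()
boron = df["boron_cm3"].to_numpy()
phos = df["phos_cm3"].to_numpy()

net = boron - phos
j = np.argmax(np.signbit(net[:-1]) != np.signbit(net[1:]))  # first sign change
print(f"Estimated junction depth: ~{depth[j]:.1f} nm")

gradient = np.gradient(boron, depth)  # atoms/cm^3 per nm
print(f"Boron gradient near the junction: {gradient[j]:.3e} atoms/cm^3/nm")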

🤖 LLM Prompt Example: Interpreting SIMS Data


# Role: SIMS Data Interpretation Specialist
# Input: [SIMS depth profile graph]

# Analysis Requirements:
1. Accurately identify the p-type/n-type doping junction location.
2. Determine if a Lightly Doped Drain (LDD) structure exists.
3. Calculate the dopant concentration gradient.
4. Assess the need for matrix effect correction.

# Patent Claim: "A transistor comprising a lightly doped region between the source/drain and the channel."

# From the graph analysis, determine:
- Dopant concentration in the LDD region: ___ atoms/cm³
- Length of the LDD: ___ nm
- Concentration gradient: ___ atoms/cm³/nm

Provide a comprehensive assessment, including measurement uncertainty and correction methods.
        

5.2 EBSD (Electron Backscatter Diffraction) Analysis

  • Purpose: To analyze the microstructure of polycrystalline materials like metal interconnects, determining the size, shape, and orientation distribution of crystal grains.
  • Principle: Performed within an SEM, an electron beam hits a crystalline sample, causing electrons to diffract off the atomic lattice. Some of these backscattered electrons form a distinct geometric pattern known as a Kikuchi pattern, which contains unique information about the crystal structure and orientation at that point.
  • Results Interpretation: The primary output is a crystal Orientation Map, where each grain is colored according to its crystallographic orientation. If most grains share a similar color, it indicates the film has a preferred orientation or texture. This can be used to prove a claim like "a copper interconnect with a preferred (111) orientation for enhanced electrical reliability."

🤖 LLM Prompt Example: Generating an EBSD Data Analysis Script


# Role: EBSD Data Processing and Visualization Expert
# Task: Write a script for statistical analysis of crystal orientation.

# Requirements:
1. Extract crystal grains with (111) orientation from raw EBSD data.
2. Calculate the percentage of the total area occupied by (111) oriented grains.
3. Generate a histogram of grain size distribution.
4. Visualize the orientation map.

# Input Data: EBSD file in .ang format
# Target Output:
- Statistical report (PDF)
- High-resolution orientation map image
- Analysis results in a CSV file

# Patent Relevance: Provide quantitative data to substantiate the claim of "(111) preferred orientation of copper interconnects."

Write a complete Python script and add comments to major functions.
        
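For reference, the core of the requested script, the (111) area-fraction calculation, can be sketched as below. This is a minimal sketch assuming a TSL-format .ang file whose first three columns are Bunge Euler angles in radians, and an illustrative 15° tolerance; a dedicated texture library such as orix offers a more rigorous treatment:

# A minimal sketch of the (111) texture-fraction calculation.
import numpy as np

data = np.loadtxt("scan.ang", comments="#")
phi1, Phi, phi2 = data[:, 0], data[:, 1], data[:, 2]

# Bunge rotation matrices (sample -> crystal), vectorised over all points.
c1, s1 = np.cos(phi1), np.sin(phi1)
c, s = np.cos(Phi), np.sin(Phi)
c2, s2 = np.cos(phi2), np.sin(phi2)
g = np.empty((len(phi1), 3, 3))
g[:, 0, 0] = c1 * c2 - s1 * s2 * c
g[:, 0, 1] = s1 * c2 + c1 * s2 * c
g[:, 0, 2] = s2 * s
g[:, 1, 0] = -c1 * s2 - s1 * c2 * c
g[:, 1, 1] = -s1 * s2 + c1 * c2 * c
g[:, 1, 2] = c2 * s
g[:, 2, 0] = s1 * s
g[:, 2, 1] = -c1 * s
g[:, 2, 2] = c

# Sample normal [001] expressed in crystal coordinates (third column of g).
n_crystal = g[:, :, 2]

# Angle to the nearest member of the <111> family.
family = np.array([[1, 1, 1], [-1, 1, 1], [1, -1, 1], [1, 1, -1]]) / np.sqrt(3)
cosines = np.abs(n_crystal @ family.T).max(axis=1)
misorientation = np.degrees(np.arccos(np.clip(cosines, -1, 1)))

TOLERANCE_DEG = 15.0  # common texture-component cutoff (illustrative)
fraction = (misorientation < TOLERANCE_DEG).mean() * 100
print(f"(111)-oriented points: {fraction:.1f}% of the scanned area")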

Step 6: LLM-Powered Claim Chart Drafting Strategy

All reverse engineering efforts culminate in the creation of a legally persuasive claim chart. A well-crafted claim chart translates complex technical data into a clear, logical argument that a judge or jury can understand.

💡 Key Strategies for a Strong Claim Chart
  • Select the Best Evidence: Use the most direct and irrefutable data to prove each claim element (e.g., TEM images for nanometer-scale thickness, EDS data for material composition).
  • Clear Annotation: Use arrows, labels, and scale bars on analytical images to explicitly show where the claim elements are met. Leave no room for interpretation.
  • Objective and Factual Narration: Describe the evidence factually, such as, "The TEM image shows a layer with a thickness of 2.1 nm." Avoid subjective or conclusive language like, "The TEM image clearly proves infringement." Argumentation is the attorney's role; the claim chart is the collection of facts supporting that argument.

🤖 LLM Prompt Example 6.1: Automating Evidence-to-Claim Mapping


# Role: Patent Claim Chart Specialist
# Task: Convert technical evidence into legal document format.

# Input Data:
- Patent Claim: "A transistor having a plurality of fin structures formed on a substrate, wherein each fin has a width of 7nm or less."
- Analytical Evidence:
  - SEM Measurements: Average fin width of 6.2 nm ± 0.3 nm (n=500).
  - Statistical Distribution: 99.2% of fins are 7nm or less.
  - Image Evidence: [SEM Image A, B, C]

# Requirements:
1. Use objective, fact-based language.
2. Include measurement uncertainty.
3. Specify statistical confidence.
4. Adhere to a formal legal tone and style.

# Output Format:
"The accused product meets the 'fin width of 7nm or less' element of the claim as follows: [Evidence-based description]"

Exclude any emotional or speculative language; state only the pure facts.
        

🤖 LLM Prompt Example 6.2: Auto-generating Image Annotations and Descriptions


# Role: Technical Image Annotation Specialist
# Input: [SEM-EDS Elemental Mapping Image]

# Task:
Identify the distribution areas of the following elements and link them to the patented structure:
- Hf (Hafnium): Gate dielectric
- Ti (Titanium): Barrier metal layer
- W (Tungsten): Gate electrode
- O (Oxygen): Oxide layer

# Output Requirements:
1. Color-coded annotations for each elemental region.
2. Indication lines for measuring layer thickness.
3. Explanation of the structural correspondence with the patent drawings.
4. A high-quality image layout suitable for court submission.

# Image Caption: "Confirmation of High-K Metal Gate structure via EDS elemental mapping. Physical evidence for claim element (c) of the patent."
        

Step 7: Expert Verification and Legal Validation

Any output generated by an LLM must be verified by a human expert. Furthermore, systematic evidence management is essential to ensure the credibility of the entire analysis process.

7.1 Cross-Verifying LLM Outputs

It's crucial not to rely on a single LLM. Using multiple models (e.g., Claude, ChatGPT, Gemini) to cross-verify results can help filter out biases or errors specific to one model.

🤖 LLM Prompt Example: Cross-Verification Request


# Role: Analysis Results Cross-Verifier
# Task: Verify the technical accuracy of results generated by another LLM.

# Targets for Verification:
1. Draft of a claim chart written by Claude.
2. SEM image interpretation analyzed by ChatGPT.
3. Image annotations generated by Gemini.

# Cross-Verification Method:
- Confirm consistency between interpretation and raw data.
- Perform an independent re-analysis using a different LLM.
- Detect technical errors and logical fallacies.
- Review the accuracy of legal terminology.

# Output: Verification report + recommended revisions.
        

7.2 Assembling the MVE (Minimal Viable Evidence) Package

In litigation, the integrity and chain of custody of evidence are paramount. The Minimal Viable Evidence (MVE) package is a systematic collection of documents that records and preserves every step of the analysis to establish its legal admissibility. An LLM can be used to generate and manage a tailored MVE checklist for each project.

🤖 LLM Prompt Example: Generating an MVE Checklist


# Role: Forensic Evidence Management Specialist
# Task: Generate a checklist of MVE components.

# Analysis Project Information:
- Project Name: [Project Name]
- Analysis Period: [Start Date] to [End Date]
- Primary Analysis Methods: SAM, CT, FIB-SEM, TEM, SIMS, EBSD

# Requirements:
Generate a detailed MVE checklist including the items below, and specify the required documents and retention period for each.
- Original sample information and hash values
- Calibration certificates for all analysis equipment
- Raw data files and backup locations
- Full LLM interaction logs (prompts and responses)
- Analyst identity verification
- Record of analysis environment and conditions (temperature, humidity, etc.)
- Certificate of compliance with quality management standards
        

Frequently Asked Questions (FAQ)

Q: Is there a risk of the LLM misinterpreting analysis results?
A: Absolutely. LLMs can be prone to ‘hallucinations’ or may miss subtle technical nuances. Therefore, any LLM-generated response must be cross-verified by a human expert against the original data (e.g., SEM/TEM images, numerical data). It's critical to remember that the LLM is a tool to assist the analyst, not the final decision-maker.
Q: How much does semiconductor reverse engineering typically cost?
A: Depending on the depth and scope of the analysis, costs can range from tens of thousands to hundreds of thousands of dollars. Atomic-level analyses like TEM and SIMS are particularly expensive due to the required equipment and specialized personnel. Therefore, it's vital to assess the likelihood of finding a ‘smoking gun’ with preliminary, less expensive methods (like non-destructive and SEM analysis) and to plan the analysis based on a cost-benefit evaluation.
Q: Our company doesn't have the necessary equipment. How can we conduct RE?
A: Most companies outsource semiconductor RE to specialized third-party labs. The key is to clearly define, manage, and oversee the analysis: what to analyze, in what order, and under what conditions. The workflow and LLM strategies in this guide can be invaluable for defining technical requirements and effectively reviewing the results when collaborating with external labs.
Q: If the chip is damaged during analysis, does the evidence lose its validity?
A: This is a critical point. It's precisely why a Minimal Viable Evidence (MVE) package and meticulous documentation are necessary. Before analysis, the state of the original sample should be documented with photos and videos. Every step of the analysis must be recorded, and all outputs (images, data) should be timestamped and hashed to prove the chain of custody. This process ensures that even destructive analysis can be accepted as admissible evidence in court.
Q: How can I write the most effective LLM prompts?
A: Great prompts have three key elements: a clearly defined 'role,' specific 'context,' and a request for a 'structured output format.' For instance, instead of just saying, “Analyze this image,” a more effective prompt would be, “You are a materials science Ph.D. Analyze this SEM image to measure the gate length of the FinFET. Report the result to two decimal places and mark the measurement location on the image.” Being specific is always better.

Conclusion: The Optimal Synergy of Human Experts and AI

Leveraging LLMs for semiconductor reverse engineering is an innovative methodology that goes beyond simple efficiency improvements to achieve a quantum leap in analytical quality and the strength of legal evidence. However, the most important principle to remember is that the ultimate responsibility for all technical interpretations and legal judgments still rests with human experts.

Core Principles for Successful LLM Integration
  1. Clear Division of Labor: LLMs handle data processing and drafting; humans handle verification and final judgment.
  2. Multi-Model Approach: Strategically use different LLMs based on their strengths for specific tasks.
  3. Rigorous Verification: Always cross-reference LLM outputs with the original source data.
  4. Legal Safeguards: Ensure evidence integrity by compiling a comprehensive MVE.

Ultimately, the success of this process depends on close collaboration between technical and legal experts. The legal team must clearly define the key elements of the patent claims, and the technical team must present analytical results as clear, objective data linked to those legal issues. When scientific evidence and legal logic are combined in this way, data from the lab can become the most powerful and persuasive weapon in the courtroom. If you have any questions, feel free to ask in the comments! 😊

Must-Read for In-House Patent Teams: Proving Software Patent Infringement Without Source Code – A Practical A-to-Z Guide to AI-Assisted Software Analysis

 

Software Patent Infringement: How Do You Prove It? This guide combines the latest reverse engineering techniques with Large Language Models (LLMs) to uncover crucial evidence within unseen code and create legally sound claim charts, all from the perspective of in-house patent experts.

 

Hello, patent professionals! Have you ever felt stuck, suspecting a competitor’s software infringes on your patent but having no way to prove it without the source code? Software patent infringement analysis is often compared to an investigation without a crime scene. You have to trace back the technical secrets using only one clue: the executable file distributed to the market.

Traditionally, this process required a massive amount of time and a high level of expertise. But now, Large Language Models (LLMs) are changing the game. LLMs are more than just assistants; they can be expert analytical partners with their own strengths—Claude for structuring vast documents, Gemini for multimodal analysis, and ChatGPT for drafting logical arguments.

This guide isn’t about turning patent attorneys or in-house counsel into reverse engineers. Instead, the goal is to provide a deep understanding of the process, enabling you to communicate effectively with technical experts and manage the quality of evidence that will ultimately decide the outcome of a lawsuit. So, shall we dive into the world of patent infringement analysis with AI? 😊

Notice: Guidance and Disclaimers
  • This guide is for educational purposes only and does not constitute legal advice. Before beginning any analysis, you must consult with an intellectual property attorney in your jurisdiction.
  • The legality of reverse engineering varies by country and is subject to laws and contractual agreements (like EULAs). Always confirm the applicable regulations with your legal team in writing beforehand.
  • Do not send confidential code or assets to external LLM services. If unavoidable, proceed only after implementing safeguards like on-premise solutions, Data Loss Prevention (DLP), access controls, and a Data Processing Agreement (DPA).
  • LLM outputs may contain errors or hallucinations. Treat any reasoning from a model as unverified information until it has been independently confirmed by an expert and corroborated with technical evidence.

 

Analysis Scenario: A Hypothetical Patent Infringement Case

To illustrate the process, let’s set up a fictional patent and an accused product.

Case Overview

  • Fictional Patent: U.S. Patent No. 15/987,654, “Method for Data Processing and Transmission for Efficient File Synchronization.”
  • Core Technology: A sequential process that ① detects file changes in real-time, ② compresses the data, ③ encrypts it with AES-256, and then ④ transmits it to a server.
  • Target for Analysis: The Windows client for a cloud service called ‘SyncSphere,’ `SyncSphere.exe`.

 

Step 1: Legal & Forensic Pre-flight

Before any technical analysis begins, it’s crucial to establish the legal and procedural legitimacy of the entire process. The credibility of the evidence gathered in this stage will determine the direction of the entire case.

⚖️ Legal Pre-flight: Essential Checklist
  • Authorization: Review the software’s End User License Agreement (EULA) to assess the validity and legal risks associated with any clauses prohibiting reverse engineering. (Must verify against local laws like the DMCA in the U.S. or the Copyright Act in South Korea).
  • Attorney-Client Privilege: Clearly establish that the analysis is being conducted as part of legal counsel in anticipation of litigation. This helps protect the materials generated during the analysis.
  • Counsel Sign-off: Obtain written approval from legal counsel before conducting legally sensitive actions, such as network traffic interception or memory dumps, which may be subject to communication privacy laws.
  • Data Privacy: Evaluate the risk of collecting Personally Identifiable Information (PII) during dynamic analysis and establish measures to minimize or anonymize it in compliance with regulations like GDPR or PIPA.

Once the legal review is complete, begin the ‘Chain of Custody’ procedures, a fundamental principle of forensics. Calculate the SHA-256 hash of the `SyncSphere.exe` file to secure its “digital fingerprint” and meticulously document the versions of all analysis tools and the OS environment. All this information is recorded in a ‘Forensic Manifest’, which is the first step in ensuring the integrity and reproducibility of your evidence.
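A minimal sketch of that first step might look like the following; the tool versions shown are illustrative placeholders, and the resulting record becomes the opening entry of the Forensic Manifest:

# A minimal sketch: fingerprint the target binary and record the
# analysis environment. Tool versions shown are illustrative.
import hashlib
import json
import platform
from datetime import datetime, timezone

with open("SyncSphere.exe", "rb") as f:
    sha256 = hashlib.sha256(f.read()).hexdigest()

manifest = {
    "target": "SyncSphere.exe",
    "sha256": sha256,
    "acquired_utc": datetime.now(timezone.utc).isoformat(),
    "os_environment": platform.platform(),
    "tools": {"ghidra": "11.0", "x64dbg": "2024-03", "wireshark": "4.2.3"},
}
with open("forensic_manifest.json", "w", encoding="utf-8") as f:
    json.dump(manifest, f, indent=2)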

 

Step 2: Static Analysis – Uncovering the Code’s Blueprint

Static analysis involves dissecting the program’s internal structure without actually running it. This step helps verify if the program has the ‘capability’ to perform the patented technology and to form an infringement hypothesis.

Initial Reconnaissance

Before diving into the code, we use three reconnaissance techniques to set the direction of our analysis.

  1. String Extraction: Use the command strings -a SyncSphere.exe > strings.txt to extract all hardcoded text from the file. Keywords like “zlib”, “AES”, and “OpenSSL” are strong initial clues that suggest the presence of compression and encryption functionalities.
  2. PE Structure Analysis (PE-bear): Open `SyncSphere.exe` with a PE analysis tool to inspect the Import Address Table (IAT). The IAT is a list of external function dependencies, showing what functions the program borrows from Windows. File APIs from `kernel32.dll` (e.g., `CreateFileW`) indicate a capability for file detection (claim element a), while crypto APIs from `advapi32.dll` (e.g., `CryptEncrypt`) suggest an encryption capability (claim element c). A scripted version of this inspection is sketched after this list.
  3. Library Signature Scanning (signsrch): If libraries like zlib or OpenSSL were statically linked (i.e., included directly in the code), they won’t appear in the IAT. A tool like signsrch can identify them by scanning for their unique code patterns (signatures).
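The IAT inspection in item 2 can also be automated. A minimal sketch using the well-known pefile library (assuming it is installed) lists every imported DLL and function so they can be mapped to claim elements:

# A minimal sketch: dump the import table of SyncSphere.exe with pefile.
import pefile

pe = pefile.PE("SyncSphere.exe")
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    print(entry.dll.decode())           # e.g., kernel32.dll, advapi32.dll
    for imp in entry.imports:
        if imp.name:                    # skip ordinal-only imports
            print("   ", imp.name.decode())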

๐Ÿ“ Note: Advanced Use of LLMs in the Reconnaissance Phase

Initial static analysis (reconnaissance and hypothesis formation) is about gathering clues to direct the analysis before deep-diving into decompilation. This process includes string extraction, PE structure analysis, and library signature scanning.

LLMs can be used here to efficiently organize vast amounts of output data. For instance, the `strings.txt` file can contain tens of thousands to millions of lines. An LLM can automatically summarize this, extracting only the keywords and surrounding context directly related to claim elements (b) and (c), such as compression, encryption, and server communication.

Additionally, an LLM can normalize and deduplicate the imported APIs from PE-bear/DumpPE outputs, categorize them into functional groups like file I/O and cryptography, and map each item to a claim element. For example, `CreateFileW`, `ReadFile`, and `WriteFile` can be linked to (a) ‘file change detection capability,’ while `CryptEncrypt` or bcrypt-family functions can be linked to (c) ‘encryption capability.’ The LLM can then draft concise statements for each element and also note uncertainties, such as, “The presence of an import does not confirm its use at runtime,” and suggest what further evidence is needed.

Similarly, an LLM can normalize the results from Signsrch, remove duplicate signatures, and map each signature to its presumed library and version. This helps in describing whether static linking is present and connecting the detected libraries to claim (b) for compression (zlib) and (c) for encryption (OpenSSL/LibreSSL/AES).

*For the sake of readability, specific prompt examples for these tasks have been omitted from the main text.
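Although the prompts themselves are omitted, the pre-filtering step that makes them practical is easy to sketch. Assuming the `strings.txt` output from the earlier command and an illustrative keyword list, a few lines of Python can reduce the file to the patent-relevant lines before any LLM sees it:

# A minimal sketch: keep only strings that mention patent-relevant
# keywords (illustrative list), with line numbers for traceability.
KEYWORDS = ("zlib", "compress", "aes", "openssl", "encrypt", "https")

with open("strings.txt", encoding="utf-8", errors="ignore") as f:
    lines = f.readlines()

hits = [
    f"{i}: {line.strip()}"
    for i, line in enumerate(lines, start=1)
    if any(k in line.lower() for k in KEYWORDS)
]

with open("strings_filtered.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(hits))
print(f"{len(hits)} of {len(lines)} lines kept for LLM review")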

Deep Dive with Ghidra & LLM

Using the clues from reconnaissance, we analyze the actual code logic with a decompiler like Ghidra or IDA Pro. By cross-referencing strings like ‘AES’, we can locate the core function containing the encryption logic (e.g., `process_file_for_upload`). We then examine the decompiled pseudo-code to see if the output of a `compress_data` function is passed directly as input to an `encrypt_data` function. This data flow is the key evidence that proves the sequential steps of the patent.

LLM Prompt Example: Code Logic Analysis and Structured Output

You can ask an LLM to translate complex pseudo-code into clear language that a legal professional can understand and to lay the groundwork for an analysis report.


# Role
You are a C++ reverse engineering expert supporting patent analysis.

# Input
Below is the pseudo-code for the `process_file_for_upload` function, decompiled using Ghidra.
[Paste Ghidra pseudo-code here]

# Task
1.  Describe the data processing pipeline (steps, order) of this function accurately.
2.  Confirm if the result of the compression is used as a direct input for the encryption and specify the variables that support this.
3.  State a clear conclusion on whether this function follows a 'compress-then-encrypt' architecture.
4.  Output the results in the JSON format below.

# Output Format (JSON)
{
  "finding_text": "A clear description of the function's behavior",
  "evidence_refs": ["Quote the lines of code that serve as evidence"],
  "confidence_score": 0.9,
  "uncertainties": ["What cannot be determined from the code alone"]
}
        

➡️ Expected LLM Output (JSON):


{
  "finding_text": "This function compresses the input data using zlib, then uses the compressed result directly as input for the AES encryption function, and finally sends it to the server. The compression clearly precedes the encryption, and the data flow is directly linked.",
  "evidence_refs": [
    "compressed_result = compress_data_with_zlib(original_data);",
    "final_payload = encrypt_data_with_aes(compressed_result->data, compressed_result->size);"
  ],
  "confidence_score": 0.9,
  "uncertainties": [
    "The AES mode of operation (e.g., CBC/GCM) and the key's origin cannot be determined from this code alone."
  ]
}
        
Heads up! The Limitations of Static Analysis
The findings from static analysis are merely a ‘hypothesis’ that must be proven with dynamic testing. The existence of a certain function in the code doesn’t guarantee it’s used at runtime in a manner that infringes the patent. Furthermore, if techniques like code obfuscation or packing are used, it can be extremely difficult to understand the true logic through static analysis alone.

 

Step 3: Dynamic Analysis – Capturing the Action

Dynamic analysis is the stage where you prove that the hypotheses formed during static analysis are actually put into ‘action’ at runtime, using objective logs and data. It is crucial that this process is conducted in a controlled environment (like a virtual machine or a rooted physical device).

  1. Verifying Real-time Detection (Process Monitor): Use ProcMon to monitor file system access by `SyncSphere.exe`. Confirm with timestamps that as soon as a file is saved in the sync folder, `SyncSphere.exe` immediately triggers a related file event. This log becomes direct evidence of ‘real-time detection.’
  2. Verifying Sequence and Data Flow (x64dbg): Attach a debugger (like x64dbg) to the running `SyncSphere.exe` process and set breakpoints at the memory addresses of the compression and encryption functions found in Step 2. When you sync a file, confirm the order in which the breakpoints are hit: ① the compression function should be hit first, followed by ② the encryption function. Crucially, verify that the memory address and size of the output buffer returned by the compression function exactly match the input buffer for the encryption function. This is the ‘smoking gun’ evidence that proves ‘compress-then-encrypt.’
  3. Verifying Post-Encryption Transmission (Wireshark & Burp Suite): Capture the network traffic generated by the program with Wireshark. Analyze the entropy of the transmitted data. Well-encrypted data is close to random, so its entropy will approach the theoretical maximum of 8.0. High entropy is strong circumstantial evidence that the data was transmitted after encryption.
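The entropy check in item 3 is simple to reproduce. Below is a minimal sketch (the captured payload file name is illustrative) that computes Shannon entropy in bits per byte; well-encrypted data approaches the 8.0 maximum:

# A minimal sketch: Shannon entropy of a captured payload, in bits/byte.
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

with open("captured_payload.bin", "rb") as f:
    payload = f.read()
print(f"Payload entropy: {shannon_entropy(payload):.2f} bits/byte")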

LLM Prompt Example: Correlating Multiple Logs

You can ask an LLM to synthesize disparate logs from ProcMon, x64dbg, and Wireshark into a single, coherent timeline of events.


# Role
You are a digital forensics expert.

# Input
[Paste combined, timestamped logs from ProcMon, x64dbg, and Wireshark here]

# Task
1.  Reconstruct a timeline by ordering all logs chronologically.
2.  Analyze whether a causal relationship exists for the sequence: "File Save → Compression Function Call → Encryption Function Call → Network Transmission."
3.  Confirm from the x64dbg log that the output buffer of the compression function matches the input buffer of the encryption function.
4.  Based on the above analysis, write a concluding statement that supports the patent infringement hypothesis.
        
Heads up! Real-World Hurdles in Dynamic Analysis
Commercial software employs various security measures to thwart analysis. SSL Pinning, for instance, hardcodes a specific server certificate into the app, causing the connection to fail if a man-in-the-middle (MITM) attack is attempted to intercept packets. Therefore, simply capturing packets is not enough to see the plaintext data.

A dynamic instrumentation tool like Frida can be used to observe or manipulate function calls within the app, allowing you to see data before it’s encrypted. However, many commercial apps also include anti-debugging and anti-hooking techniques to detect and block these tools. For example, the app might terminate or branch to a different execution path if a debugger is detected, or it might block hooking attempts, rendering the analysis futile. Bypassing SSL pinning and other anti-analysis techniques requires a high degree of expertise and adherence to legal procedures.

 

Step 4: Creating a Claim Chart – Translating Evidence into a Legal Argument

The claim chart is the most critical legal document in a patent lawsuit. It’s an evidence comparison table that clearly maps the collected technical evidence to each element of the patent’s claims, acting as a bridge to help non-experts like judges and juries easily understand the infringement.

LLM Prompt Example: Drafting the Claim Chart Narrative

By providing the facts collected by the analyst, an LLM can be prompted to structure them into the prose suitable for a legal document.


# Persona and Mission
You are a technical expert in a patent litigation case. Using the provided evidence, draft the 'Evidence of Infringement' section of a claim chart. Your writing must be objective and fact-based. Each piece of evidence must be clearly cited with its corresponding label (e.g., [Evidence A]).

# Context
- Patent Number: U.S. 15/987,654
- Claim 1(c): ...a step of encrypting the compressed data using an AES-256 encryption algorithm and then transmitting it to a remote server...

# Input Data (Minimum Viable Evidence package)
- [Evidence B (Ghidra)]: `encrypt_data(compressed_result->data, ...)`
- [Evidence C (x64dbg)]: Input buffer: `0xDCBA0000`, size: 150 for `AES_256_encrypt`
- [Evidence D (Wireshark)]: Payload entropy: 7.98 bits/byte

# Task
For claim element (c), write a paragraph starting with "SyncSphere performs this step by..." and support your assertion with the provided evidence.
        

Final Claim Chart (Example)

Claim 1 Element of U.S. Patent No. 15/987,654 vs. Corresponding Element and Evidence in the Accused Product (‘SyncSphere’ Client v2.5.1)

Element (a): a step of detecting, in real-time, the creation or modification of a file within a designated local folder.
Corresponding evidence: SyncSphere performs this step using an OS-level file system monitoring feature. When a user modifies a file in the designated ‘SyncSphere’ folder, the action is immediately detected, triggering the subsequent data processing procedures. [Evidence A: Process Monitor Log] clearly shows that the SyncSphere.exe process accessed the file immediately after the user modified it at timestamp 14:01:15.123.

Element (b): a step of first applying a data compression algorithm to the detected file before transmitting it to a remote server.
Corresponding evidence: SyncSphere performs this step using a zlib-based compression library. [Evidence B: Ghidra Decompiled Code] shows that the `compress_data_with_zlib` function is called as the first step in the file processing function. [Evidence C: x64dbg Debugger Log] directly proves the actual execution order of this code: according to the log, the compression function (zlib.dll!compress) was clearly called before the encryption function.

Element (c): a step of encrypting said compressed data by applying an AES-256 encryption algorithm, and then transmitting it to a remote server.
Corresponding evidence: SyncSphere performs this step by directly passing the output of the compression step as input to the AES-256 encryption function. [Evidence B: Ghidra Decompiled Code] shows the data flow where the return value of the `compress_data_with_zlib` function is passed directly as an argument to the `encrypt_data_with_aes` function. [Evidence C: x64dbg Debugger Log] corroborates this data flow at the memory level: the output buffer address (e.g., 0xDCBA0000) and size (e.g., 150 bytes) from the compression function exactly matched the input buffer for the `libcrypto.dll!AES_256_cbc_encrypt` function. The subsequent transmission of the encrypted data is supported by [Evidence D: Wireshark Entropy Analysis], which revealed that the payload of data packets sent to the ‘SyncSphere’ server had a high entropy of 7.98 bits/byte, consistent with the statistical properties of AES-256 encrypted data.

 

Step 5: Expert Verification and Final Reporting – Giving Legal Weight to the Evidence

No matter how advanced AI becomes, it cannot assume legal responsibility. Every step of the analysis and all its outputs must be finally reviewed and signed off on by a human expert. All outputs generated by an LLM are merely ‘aids to interpretation,’ not evidence in themselves. This final step is what transforms the data organized by AI into powerful evidence with legal standing.

  • Cross-Verification of Facts: Meticulously verify that all analytical content generated by the LLM (code explanations, log summaries, etc.) matches the source data, correcting any technical errors or logical fallacies.
  • Integrity Assurance of the MVE Package: Finally, confirm the integrity of all items included in the Minimum Viable Evidence (MVE) package—from the hash value of the original file, to the versions of the tools used, all log records, and the records of interactions with the LLM.
  • Signing the Expert Declaration (Affidavit): As the analyst, sign a legal document affirming that all procedures were followed and that the analysis results represent your professional opinion.
💡 Components of a Minimum Viable Evidence (MVE) Package
A Minimum Viable Evidence (MVE) package should consist of Identification Metadata, Static Evidence, Dynamic Evidence, Network Evidence, and a Concise Statement. It is best practice to store and share this as an archive (e.g., an encrypted ZIP file) along with a machine-readable JSON index.

Ultimately, it is the human expert who must testify in court and answer to cross-examination. It is only through these rigorous procedures that the data organized by AI is transformed into robust evidence that can withstand challenges in a legal setting.

📋 Patent Infringement Analysis Workflow Summary

🔒 1. Legal/Forensic Prep: Secure authorization for analysis and calculate the original file hash to start the Minimum Viable Evidence (MVE) package.
🔎 2. Static Analysis: Analyze the executable itself to identify the presence and order of code related to ‘compression’ and ‘encryption,’ forming an infringement hypothesis.
⚡ 3. Dynamic Analysis: Run the program to observe file I/O, function call order, and network traffic to substantiate the hypothesis.
✍️ 4. Claim Chart Creation: Map the collected technical evidence (code, logs) to each claim element of the patent on a 1:1 basis.
👨‍⚖️ 5. Expert Verification: A human expert must finally verify all analysis results and LLM outputs, signing a legally binding declaration.

Conclusion: A Strategic Partnership Between Human Experts and AI

Using LLMs like ChatGPT, Gemini, and Claude in software patent infringement analysis is more than just a time-saver; it’s a strategic choice that elevates the depth and objectivity of the analysis. AI serves as a tireless partner, processing vast amounts of data and identifying patterns, while the human expert provides the creative insights and final legal judgment based on those findings.

Remember, the best tools shine brightest when they amplify the abilities of the person using them. We hope the forensics-based workflow presented in this guide will become a powerful and sharp weapon in defending your valuable intellectual property. If you have any further questions, feel free to leave a comment below!

 

Frequently Asked Questions (FAQ)

Q: Which LLM model is best to use?
A: There is no single ‘best’ model; there is only the ‘optimal’ model for each task. Claude might be better for summarizing and structuring long patent documents or logs, GPT-4o for complex code analysis and logical reasoning, and Gemini when visual materials like screenshots are involved. Understanding the strengths of each model and using them in combination is a key skill for an expert.
Q: Why is the ‘Minimum Viable Evidence (MVE) package’ so important?
A: The MVE is the core component that guarantees the ‘credibility’ and ‘reproducibility’ of the analysis results. During litigation, the opposing side will relentlessly attack with questions like, “How was this evidence created?” and “Can the results be trusted?” The MVE transparently documents the entire process—from the original file to the tools used, all logs, and the analyst’s signature—defending against such attacks and serving as a legal safeguard that allows the judge to admit the evidence.
Q: Can I submit the JSON or code explanations generated by an LLM as evidence directly?
A: No. The outputs generated by an LLM (like JSON or code explanations) can be included in the MVE as a ‘record of the analysis process,’ but they are not the core evidence submitted directly to the court. The core evidence consists of the original log files, captured data, and, synthesizing all of this, the ‘Claim Chart’ and ‘Expert Report,’ written and signed by an expert. The LLM’s results are an intermediate product and a powerful aid in creating this final report.

K-Robot Must Decide Now to Survive: The U.S. Humanoid Investment Frenzy and Three Urgent Recommendations for the Korean Government and Industry

  Korea ranks No. 1 in the world in robot density, but is it truly a robotics powerhouse? This post offers an in-depth analysis of the 2025 state of ‘AI-robot convergence’ in U.S. manufacturing and the characteristics of each cluster. Leading up to 2027, when the Korean government and industry must decide whether to ‘leap’...