
Saturday, September 27, 2025

Must-Read for In-House Patent Teams: Proving Software Patent Infringement Without Source Code – A Practical A-to-Z Guide to AI-Assisted Software Analysis

 

Software Patent Infringement: How Do You Prove It? This guide combines the latest reverse engineering techniques with Large Language Models (LLMs) to uncover crucial evidence within unseen code and create legally sound claim charts, all from the perspective of in-house patent experts.

 

Hello, patent professionals! Have you ever felt stuck, suspecting a competitor’s software infringes on your patent but having no way to prove it without the source code? Software patent infringement analysis is often compared to an investigation without a crime scene. You have to trace back the technical secrets using only one clue: the executable file distributed to the market.

Traditionally, this process required a massive amount of time and a high level of expertise. But now, Large Language Models (LLMs) are changing the game. LLMs are more than just assistants; they can be expert analytical partners with their own strengths—Claude for structuring vast documents, Gemini for multimodal analysis, and ChatGPT for drafting logical arguments.

This guide isn’t about turning patent attorneys or in-house counsel into reverse engineers. Instead, the goal is to provide a deep understanding of the process, enabling you to communicate effectively with technical experts and manage the quality of evidence that will ultimately decide the outcome of a lawsuit. So, shall we dive into the world of patent infringement analysis with AI? 😊

Notice: Guidance and Disclaimers
  • This guide is for educational purposes only and does not constitute legal advice. Before beginning any analysis, you must consult with an intellectual property attorney in your jurisdiction.
  • The legality of reverse engineering varies by country and is subject to laws and contractual agreements (like EULAs). Always confirm the applicable regulations with your legal team in writing beforehand.
  • Do not send confidential code or assets to external LLM services. If unavoidable, proceed only after implementing safeguards like on-premise solutions, Data Loss Prevention (DLP), access controls, and a Data Processing Agreement (DPA).
  • LLM outputs may contain errors or hallucinations. Treat any reasoning from a model as unverified information until it has been independently confirmed by an expert and corroborated with technical evidence.

 

Analysis Scenario: A Hypothetical Patent Infringement Case

To illustrate the process, let’s set up a fictional patent and an accused product.

Case Overview

  • Fictional Patent: U.S. Patent No. 15/987,654, “Method for Data Processing and Transmission for Efficient File Synchronization.”
  • Core Technology: A sequential process that ① detects file changes in real-time, ② compresses the data, ③ encrypts it with AES-256, and then ④ transmits it to a server.
  • Target for Analysis: The Windows client for a cloud service called ‘SyncSphere,’ `SyncSphere.exe`.

 

Step 1: Legal & Forensic Pre-flight

Before any technical analysis begins, it’s crucial to establish the legal and procedural legitimacy of the entire process. The credibility of the evidence gathered in this stage will determine the direction of the entire case.

⚖️ Legal Pre-flight: Essential Checklist
  • Authorization: Review the software’s End User License Agreement (EULA) to assess the validity and legal risks associated with any clauses prohibiting reverse engineering. (This must be verified against local laws such as the DMCA in the U.S. or the Copyright Act in South Korea.)
  • Attorney-Client Privilege: Clearly establish that the analysis is being conducted as part of legal counsel in anticipation of litigation. This helps protect the materials generated during the analysis.
  • Counsel Sign-off: Obtain written approval from legal counsel before conducting legally sensitive actions, such as network traffic interception or memory dumps, which may be subject to communication privacy laws.
  • Data Privacy: Evaluate the risk of collecting Personally Identifiable Information (PII) during dynamic analysis and establish measures to minimize or anonymize it in compliance with regulations like GDPR or PIPA.

Once the legal review is complete, begin the ‘Chain of Custody’ procedures, a fundamental principle of forensics. Calculate the SHA-256 hash of the `SyncSphere.exe` file to secure its “digital fingerprint” and meticulously document the versions of all analysis tools and the OS environment. All this information is recorded in a ‘Forensic Manifest’, which is the first step in ensuring the integrity and reproducibility of your evidence.
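
For readers who want to script this step, here is a minimal Python sketch of how the hash and manifest could be captured. The file names (`SyncSphere.exe`, `forensic_manifest.json`), tool versions, and analyst entry are placeholders you would replace with your own case data.

import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path

TARGET = Path("SyncSphere.exe")  # hypothetical accused binary

def sha256_of(path: Path) -> str:
    # Stream the file through SHA-256 so large binaries are not loaded into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

manifest = {
    "target_file": TARGET.name,
    "sha256": sha256_of(TARGET),
    "file_size_bytes": TARGET.stat().st_size,
    "acquired_utc": datetime.now(timezone.utc).isoformat(),
    "analysis_os": platform.platform(),
    # Tool versions are recorded manually; the values below are placeholders.
    "tools": {"ghidra": "<version>", "x64dbg": "<version>", "wireshark": "<version>"},
    "analyst": "<name, acting under direction of counsel>",
}

Path("forensic_manifest.json").write_text(json.dumps(manifest, indent=2))
print(json.dumps(manifest, indent=2))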

 

Step 2: Static Analysis – Uncovering the Code’s Blueprint

Static analysis involves dissecting the program’s internal structure without actually running it. This step helps verify if the program has the ‘capability’ to perform the patented technology and to form an infringement hypothesis.

Initial Reconnaissance

Before diving into the code, we use three reconnaissance techniques to set the direction of our analysis.

  1. String Extraction: Use the command strings -a SyncSphere.exe > strings.txt to extract all hardcoded text from the file. Keywords like “zlib”, “AES”, and “OpenSSL” are strong initial clues that suggest the presence of compression and encryption functionalities.
  2. PE Structure Analysis (PE-bear): Open `SyncSphere.exe` with a PE analysis tool to inspect the Import Address Table (IAT). The IAT is a list of external function dependencies, showing what functions the program borrows from Windows. File APIs from `kernel32.dll` (e.g., `CreateFileW`) indicate a capability for file detection (claim element a), while crypto APIs from `advapi32.dll` (e.g., `CryptEncrypt`) suggest an encryption capability (claim element c). (A minimal script for this IAT triage is sketched just after this list.)
  3. Library Signature Scanning (signsrch): If libraries like zlib or OpenSSL were statically linked (i.e., included directly in the code), they won’t appear in the IAT. A tool like signsrch can identify them by scanning for their unique code patterns (signatures).
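
To make item 2 concrete, here is a minimal Python sketch of how the imported APIs could be triaged automatically using the open-source `pefile` package. The API-to-claim mapping is an illustrative assumption for the fictional SyncSphere case, not a definitive rule set.

# pip install pefile  (open-source PE parser)
import pefile

# Illustrative mapping from imported API names to claim elements; extend per case.
CLAIM_HINTS = {
    "(a) file change detection": {
        "CreateFileW", "ReadFile", "WriteFile",
        "ReadDirectoryChangesW", "FindFirstChangeNotificationW",
    },
    "(c) encryption": {
        "CryptEncrypt", "CryptAcquireContextW",
        "BCryptEncrypt", "BCryptOpenAlgorithmProvider",
    },
}

pe = pefile.PE("SyncSphere.exe")  # hypothetical accused binary

for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
    dll = entry.dll.decode(errors="replace")
    for imp in entry.imports:
        if imp.name is None:  # skip imports by ordinal
            continue
        name = imp.name.decode(errors="replace")
        for element, apis in CLAIM_HINTS.items():
            if name in apis:
                print(f"{dll}: {name} -> candidate for claim element {element}")

Imports listed in the IAT only show capability; whether they are actually exercised at runtime is confirmed in Step 3.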

๐Ÿ“ Note: Advanced Use of LLMs in the Reconnaissance Phase

Initial static analysis (reconnaissance and hypothesis formation) is about gathering clues to direct the analysis before deep-diving into decompilation. This process includes string extraction, PE structure analysis, and library signature scanning.

LLMs can be used here to efficiently organize vast amounts of output data. For instance, a `strings_output.txt` file can contain tens of thousands to millions of lines. An LLM can automatically summarize this, extracting only the keywords and surrounding context directly related to the patent claims (b) and (c), such as compression, encryption, and server communication.

Additionally, an LLM can normalize and deduplicate the imported APIs from PE-bear/DumpPE outputs, categorize them into functional groups like file I/O and cryptography, and map each item to a claim element. For example, `CreateFileW`, `ReadFile`, and `WriteFile` can be linked to (a) ‘file change detection capability,’ while `CryptEncrypt` or bcrypt-family functions can be linked to (c) ‘encryption capability.’ The LLM can then draft concise statements for each element and also note uncertainties, such as, “The presence of an import does not confirm its use at runtime,” and suggest what further evidence is needed.

Similarly, an LLM can normalize the results from Signsrch, remove duplicate signatures, and map each signature to its presumed library and version. This helps in describing whether static linking is present and connecting the detected libraries to claim (b) for compression (zlib) and (c) for encryption (OpenSSL/LibreSSL/AES).

*For the sake of readability, specific prompt examples for these tasks have been omitted from the main text.
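
One piece of this preprocessing can still be illustrated without a prompt: trimming the raw `strings.txt` output to claim-relevant excerpts before anything is sent to a (properly safeguarded) LLM. The sketch below assumes a hypothetical keyword list and file names.

from pathlib import Path

# Hypothetical keyword list derived from claim elements (b) and (c).
KEYWORDS = ("zlib", "deflate", "aes", "openssl", "encrypt", "upload", "https://")
CONTEXT = 2  # lines of surrounding context to keep around each hit

lines = Path("strings.txt").read_text(errors="replace").splitlines()
keep: set[int] = set()
for i, line in enumerate(lines):
    if any(keyword in line.lower() for keyword in KEYWORDS):
        keep.update(range(max(0, i - CONTEXT), min(len(lines), i + CONTEXT + 1)))

Path("strings_excerpt.txt").write_text("\n".join(lines[i] for i in sorted(keep)))
print(f"Reduced {len(lines)} lines to {len(keep)} claim-relevant lines.")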

Deep Dive with Ghidra & LLM

Using the clues from reconnaissance, we analyze the actual code logic with a decompiler like Ghidra or IDA Pro. By cross-referencing strings like ‘AES’, we can locate the core function containing the encryption logic (e.g., `process_file_for_upload`). We then examine the decompiled pseudo-code to see if the output of a `compress_data` function is passed directly as input to an `encrypt_data` function. This data flow is the key evidence that proves the sequential steps of the patent.

LLM Prompt Example: Code Logic Analysis and Structured Output

You can ask an LLM to translate complex pseudo-code into clear language that a legal professional can understand and to lay the groundwork for an analysis report.


# Role
You are a C++ reverse engineering expert supporting patent analysis.

# Input
Below is the pseudo-code for the `process_file_for_upload` function, decompiled using Ghidra.
[Paste Ghidra pseudo-code here]

# Task
1.  Describe the data processing pipeline (steps, order) of this function accurately.
2.  Confirm if the result of the compression is used as a direct input for the encryption and specify the variables that support this.
3.  State a clear conclusion on whether this function follows a 'compress-then-encrypt' architecture.
4.  Output the results in the JSON format below.

# Output Format (JSON)
{
  "finding_text": "A clear description of the function's behavior",
  "evidence_refs": ["Quote the lines of code that serve as evidence"],
  "confidence_score": 0.9,
  "uncertainties": ["What cannot be determined from the code alone"]
}
        

➡️ Expected LLM Output (JSON):


{
  "finding_text": "This function compresses the input data using zlib, then uses the compressed result directly as input for the AES encryption function, and finally sends it to the server. The compression clearly precedes the encryption, and the data flow is directly linked.",
  "evidence_refs": [
    "compressed_result = compress_data_with_zlib(original_data);",
    "final_payload = encrypt_data_with_aes(compressed_result->data, compressed_result->size);"
  ],
  "confidence_score": 0.9,
  "uncertainties": [
    "The AES mode of operation (e.g., CBC/GCM) and the key's origin cannot be determined from this code alone."
  ]
}
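
Because this JSON is unverified model output, it is worth checking mechanically that every quoted `evidence_refs` line actually exists in the Ghidra export before an expert reviews the finding. The following minimal sketch assumes the finding was saved as `llm_finding.json` and the pseudo-code as `process_file_for_upload.c` (both hypothetical file names).

import json
from pathlib import Path

finding = json.loads(Path("llm_finding.json").read_text())   # saved LLM output
pseudocode = Path("process_file_for_upload.c").read_text()   # Ghidra pseudo-code export

# Normalize whitespace so cosmetic differences do not produce false mismatches.
normalized = " ".join(pseudocode.split())
for ref in finding["evidence_refs"]:
    found = " ".join(ref.split()) in normalized
    print(f"[{'FOUND' if found else 'NOT FOUND - flag for expert review'}] {ref}")
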
        
Heads up! The Limitations of Static Analysis
The findings from static analysis are merely a ‘hypothesis’ that must be proven with dynamic testing. The existence of a certain function in the code doesn’t guarantee it’s used at runtime in a manner that infringes the patent. Furthermore, if techniques like code obfuscation or packing are used, it can be extremely difficult to understand the true logic through static analysis alone.

 

Step 3: Dynamic Analysis – Capturing the Action

Dynamic analysis is the stage where you prove that the hypotheses formed during static analysis are actually put into ‘action’ at runtime, using objective logs and data. It is crucial that this process is conducted in a controlled environment (like a virtual machine or a rooted physical device).

  1. Verifying Real-time Detection (Process Monitor): Use ProcMon to monitor file system access by `SyncSphere.exe`. Confirm with timestamps that as soon as a file is saved in the sync folder, `SyncSphere.exe` immediately triggers a related file event. This log becomes direct evidence of ‘real-time detection.’
  2. Verifying Sequence and Data Flow (x64dbg): Attach a debugger (like x64dbg) to the running `SyncSphere.exe` process and set breakpoints at the memory addresses of the compression and encryption functions found in Step 2. When you sync a file, confirm the order in which the breakpoints are hit: ① the compression function should be hit first, followed by ② the encryption function. Crucially, verify that the memory address and size of the output buffer returned by the compression function exactly match the input buffer for the encryption function. This is the ‘smoking gun’ evidence that proves ‘compress-then-encrypt.’
  3. Verifying Post-Encryption Transmission (Wireshark & Burp Suite): Capture the network traffic generated by the program with Wireshark. Analyze the entropy of the transmitted data. Well-encrypted data is close to random, so its entropy will approach the theoretical maximum of 8.0 bits/byte. High entropy is strong circumstantial evidence that the data was transmitted after encryption. (A minimal entropy calculation is sketched just below this list.)
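
As a rough illustration of item 3, the Shannon entropy of a captured payload can be computed with a few lines of Python. This sketch assumes the TCP payload has already been exported from Wireshark to a raw file (the name `payload.bin` is hypothetical).

import math
from collections import Counter
from pathlib import Path

# Payload exported from Wireshark (e.g., via "Export Packet Bytes") to a raw binary file.
data = Path("payload.bin").read_bytes()

counts = Counter(data)
entropy = -sum((n / len(data)) * math.log2(n / len(data)) for n in counts.values())
print(f"Shannon entropy: {entropy:.2f} bits/byte (theoretical maximum: 8.00)")

An entropy near 8.0 bits/byte is consistent with encrypted (or already-compressed) data; on its own it is circumstantial, which is why it is combined with the debugger evidence above.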

LLM Prompt Example: Correlating Multiple Logs

You can ask an LLM to synthesize disparate logs from ProcMon, x64dbg, and Wireshark into a single, coherent timeline of events.


# Role
You are a digital forensics expert.

# Input
[Paste combined, timestamped logs from ProcMon, x64dbg, and Wireshark here]

# Task
1.  Reconstruct a timeline by ordering all logs chronologically.
2.  Analyze whether a causal relationship exists for the sequence: "File Save → Compression Function Call → Encryption Function Call → Network Transmission."
3.  Confirm from the x64dbg log that the output buffer of the compression function matches the input buffer of the encryption function.
4.  Based on the above analysis, write a concluding statement that supports the patent infringement hypothesis.
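
Before any logs reach an LLM, it can help to merge them into a single chronological timeline so the model reasons over one consistent input. The sketch below assumes each tool’s log has already been exported to CSV with `timestamp` and `detail` columns; the file names and column names are illustrative, not the tools’ native formats.

import csv
from pathlib import Path

SOURCES = {  # hypothetical pre-exported CSV logs, each with 'timestamp' and 'detail' columns
    "ProcMon": "procmon.csv",
    "x64dbg": "x64dbg_breakpoints.csv",
    "Wireshark": "wireshark.csv",
}

events = []
for tool, filename in SOURCES.items():
    with Path(filename).open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            events.append((row["timestamp"], tool, row["detail"]))

# ISO-8601 timestamps sort correctly as plain strings; adjust parsing for other formats.
for timestamp, tool, detail in sorted(events):
    print(f"{timestamp}  [{tool:9}] {detail}")
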
        
Heads up! Real-World Hurdles in Dynamic Analysis
Commercial software employs various measures to thwart analysis. SSL pinning, for instance, hard-codes a specific server certificate into the app, so the connection fails if a man-in-the-middle (MITM) proxy is inserted to intercept packets; simply capturing traffic is therefore not enough to see the plaintext. A dynamic instrumentation tool like Frida can observe or manipulate function calls inside the app, letting you inspect data before it is encrypted. However, many commercial apps also include anti-debugging and anti-hooking techniques that detect and block these tools: the app may terminate or branch to a different execution path when a debugger is detected, or block hooking attempts, rendering the analysis futile. Bypassing SSL pinning and other anti-analysis measures requires a high degree of expertise and strict adherence to legal procedures.

 

Step 4: Creating a Claim Chart – Translating Evidence into a Legal Argument

The claim chart is the most critical legal document in a patent lawsuit. It’s an evidence comparison table that clearly maps the collected technical evidence to each element of the patent’s claims, acting as a bridge to help non-experts like judges and juries easily understand the infringement.

LLM Prompt Example: Drafting the Claim Chart Narrative

By providing the facts collected by the analyst, you can prompt an LLM to structure them into prose suitable for a legal document.


# Persona and Mission
You are a technical expert in a patent litigation case. Using the provided evidence, draft the 'Evidence of Infringement' section of a claim chart. Your writing must be objective and fact-based. Each piece of evidence must be clearly cited with its corresponding label (e.g., [Evidence A]).

# Context
- Patent Number: U.S. 15/987,654
- Claim 1(c): ...a step of encrypting the compressed data using an AES-256 encryption algorithm and then transmitting it to a remote server...

# Input Data (Minimum Viable Evidence package)
- [Evidence B (Ghidra)]: `encrypt_data(compressed_result->data, ...)`
- [Evidence C (x64dbg)]: Input buffer: `0xDCBA0000`, size: 150 for `AES_256_encrypt`
- [Evidence D (Wireshark)]: Payload entropy: 7.98 bits/byte

# Task
For claim element (c), write a paragraph starting with "SyncSphere performs this step by..." and support your assertion with the provided evidence.
        

Final Claim Chart (Example)

The chart pairs each element of Claim 1 of U.S. Patent No. 15/987,654 with the corresponding element and evidence in the accused product (‘SyncSphere’ Client v2.5.1).

Claim element (a): a step of detecting, in real-time, the creation or modification of a file within a designated local folder.
Corresponding element and evidence: SyncSphere performs this step using an OS-level file system monitoring feature. When a user modifies a file in the designated ‘SyncSphere’ folder, the action is immediately detected, triggering the subsequent data processing procedures.

[Evidence A: Process Monitor Log] clearly shows that the SyncSphere.exe process accessed the file immediately after the user modified it at timestamp 14:01:15.123.

Claim element (b): a step of first applying a data compression algorithm to the detected file before transmitting it to a remote server.
Corresponding element and evidence: SyncSphere performs this step using a zlib-based compression library.

[Evidence B: Ghidra Decompiled Code] shows that the `compress_data_with_zlib` function is called as the first step in the file processing function.

[Evidence C: x64dbg Debugger Log] directly proves the actual execution order of this code. According to the log, the compression function (zlib.dll!compress) was clearly called before the encryption function.

Claim element (c): a step of encrypting said compressed data by applying an AES-256 encryption algorithm, and then transmitting it to a remote server.
Corresponding element and evidence: SyncSphere performs this step by passing the output of the compression step directly as input to the AES-256 encryption function.

[Evidence B: Ghidra Decompiled Code] shows the data flow where the return value of the `compress_data_with_zlib` function is passed directly as an argument to the `encrypt_data_with_aes` function.

[Evidence C: x64dbg Debugger Log] corroborates this data flow at the memory level. The output buffer address (e.g., 0xDCBA0000) and size (e.g., 150 bytes) from the compression function exactly matched the input buffer for the `libcrypto.dll!AES_256_cbc_encrypt` function.

The subsequent transmission of the encrypted data is supported by [Evidence D: Wireshark Entropy Analysis], which found that the payload of data packets sent to the ‘SyncSphere’ server had a high entropy of 7.98 bits/byte, consistent with the statistical properties of AES-256-encrypted data.

 

Step 5: Expert Verification and Final Reporting – Giving Legal Weight to the Evidence

No matter how advanced AI becomes, it cannot assume legal responsibility. Every step of the analysis and all its outputs must be finally reviewed and signed off on by a human expert. All outputs generated by an LLM are merely ‘aids to interpretation,’ not evidence in themselves. This final step is what transforms the data organized by AI into powerful evidence with legal standing.

  • Cross-Verification of Facts: Meticulously verify that all analytical content generated by the LLM (code explanations, log summaries, etc.) matches the source data, correcting any technical errors or logical fallacies.
  • Integrity Assurance of the MVE Package: Finally, confirm the integrity of all items included in the Minimum Viable Evidence (MVE) package—from the hash value of the original file, to the versions of the tools used, all log records, and the records of interactions with the LLM.
  • Signing the Expert Declaration (Affidavit): As the analyst, sign a legal document affirming that all procedures were followed and that the analysis results represent your professional opinion.
💡 Components of a Minimum Viable Evidence (MVE) Package
A Minimum Viable Evidence (MVE) package should consist of Identification Metadata, Static Evidence, Dynamic Evidence, Network Evidence, and a Concise Statement. It is best practice to store and share this as an archive (e.g., an encrypted ZIP file) together with a machine-readable JSON manifest.
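
As one possible way to script this, the sketch below hashes every file in a hypothetical `mve_package` directory (one sub-folder per MVE category) and writes a JSON manifest; encrypting the resulting archive would be handled separately with your organization’s approved tooling.

import hashlib
import json
from pathlib import Path

# Hypothetical layout: one sub-directory per MVE category under ./mve_package
ROOT = Path("mve_package")
CATEGORIES = ["identification", "static", "dynamic", "network", "statement"]

manifest = {}
for category in CATEGORIES:
    manifest[category] = [
        {
            "file": str(path.relative_to(ROOT)),
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        }
        for path in sorted((ROOT / category).glob("**/*"))
        if path.is_file()
    ]

(ROOT / "mve_manifest.json").write_text(json.dumps(manifest, indent=2))
print(json.dumps(manifest, indent=2))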

Ultimately, it is the human expert who must testify in court and answer to cross-examination. It is only through these rigorous procedures that the data organized by AI is transformed into robust evidence that can withstand challenges in a legal setting.

📋 Patent Infringement Analysis Workflow Summary

🔒 1. Legal/Forensic Prep: Secure authorization for analysis and calculate the original file hash to start the Minimum Viable Evidence (MVE) package.
🔎 2. Static Analysis: Analyze the executable itself to identify the presence and order of code related to ‘compression’ and ‘encryption,’ forming an infringement hypothesis.
⚡ 3. Dynamic Analysis: Run the program to observe file I/O, function call order, and network traffic to substantiate the hypothesis.
✍️ 4. Claim Chart Creation: Map the collected technical evidence (code, logs) to each claim element of the patent on a 1:1 basis.
👨‍⚖️ 5. Expert Verification: A human expert must finally verify all analysis results and LLM outputs, signing a legally binding declaration.

Conclusion: A Strategic Partnership Between Human Experts and AI

Using LLMs like ChatGPT, Gemini, and Claude in software patent infringement analysis is more than just a time-saver; it’s a strategic choice that elevates the depth and objectivity of the analysis. AI serves as a tireless partner, processing vast amounts of data and identifying patterns, while the human expert provides the creative insights and final legal judgment based on those findings.

Remember, the best tools shine brightest when they amplify the abilities of the person using them. We hope the forensics-based workflow presented in this guide will become a powerful and sharp weapon in defending your valuable intellectual property. If you have any further questions, feel free to leave a comment below!

 

Frequently Asked Questions (FAQ)

Q: Which LLM model is best to use?
A: There is no single ‘best’ model; there is only the ‘optimal’ model for each task. Claude might be better for summarizing and structuring long patent documents or logs, GPT-4o for complex code analysis and logical reasoning, and Gemini when visual materials like screenshots are involved. Understanding the strengths of each model and using them in combination is a key skill for an expert.
Q: Why is the ‘Minimum Viable Evidence (MVE) package’ so important?
A: The MVE is the core component that guarantees the ‘credibility’ and ‘reproducibility’ of the analysis results. During litigation, the opposing side will relentlessly attack with questions like, “How was this evidence created?” and “Can the results be trusted?” The MVE transparently documents the entire process, from the original file and the tools used to all logs and the analyst’s signature, defending against such attacks and serving as a legal safeguard that allows the judge to admit the evidence.
Q: Can I submit the JSON or code explanations generated by an LLM as evidence directly?
A: No. The outputs generated by an LLM (like JSON or code explanations) can be included in the MVE as a ‘record of the analysis process,’ but they are not the core evidence submitted directly to the court. The core evidence consists of the original log files, captured data, and, synthesizing all of this, the ‘Claim Chart’ and ‘Expert Report,’ written and signed by an expert. The LLM’s results are an intermediate product and a powerful aid in creating this final report.
