feat: save duplicates

AlterWare - WSL 2025-05-21 22:58:45 +02:00
parent f9043dd988
commit 888949b142
3 changed files with 1923 additions and 1 deletions


@@ -1,6 +1,6 @@
# IWD Archive Lister
This script scans the `main/` and `iw4x/` folders under a specified root directory for `.iwd` files (which are ZIP archives). For each `.iwd` file found, it extracts the list of files inside the archive using `7z` and writes the output to a `.txt` file in a folder called `out/`.
The [list-iwd.sh](list-iwd.sh) script scans the `main/` and `iw4x/` folders under a specified root directory for `.iwd` files (which are ZIP archives). For each `.iwd` file found, it extracts the list of files inside the archive using `7z` and writes the output to a `.txt` file in a folder called `out/`.
Each `.iwd` file gets its own `.txt` file in the `out/` directory, with the same base name (e.g., `iw_00.iwd` -> `out/iw_00.iwd.txt`).
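The naming rule above can be sketched in Python. This is only an illustration of the mapping, not part of `list-iwd.sh`; the helper name `listing_path` and its `out_dir` parameter are hypothetical.

```python
from pathlib import Path


def listing_path(iwd_path, out_dir="out"):
    """Return the listing path for an archive: out/<archive name>.txt.

    Hypothetical helper: it keeps the full archive file name (including
    the .iwd extension) and appends .txt, as in the README example.
    """
    return Path(out_dir) / (Path(iwd_path).name + ".txt")


# The source folder (main/ or iw4x/) is dropped; only the file name is kept.
print(listing_path("main/iw_00.iwd").as_posix())
```

Because only the base name is kept, two archives with the same name in `main/` and `iw4x/` would map to the same listing file under this sketch.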
@@ -22,3 +22,21 @@ sudo apt install p7zip-full
```
Where `<root_directory>` is the path that contains both `main/` and `iw4x/` subfolders.
# IWD Archive Duplicate Finder
The [show-duplicates.py](show-duplicates.py) Python script scans all `.txt` files inside the `out/` directory, which were previously generated by extracting the contents of `.iwd` archives, and identifies duplicate file entries that appear in **more than one archive**.
It prints the results to the console and saves a full report to `out/duplicates/result.txt`.
## What It Does
- Reads every `.txt` file in the `out/` folder.
- Detects which filenames appear in **multiple** `.txt` files (i.e. shared between archives).
- Writes a detailed list of these duplicates to `out/duplicates/result.txt`.
Each duplicate line includes the filename and a list of `.txt` files (archives) it appears in.
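
The detection step can be illustrated on in-memory data. The following is a minimal sketch, assuming each listing is a plain sequence of entry lines; the function name `find_duplicates` and the sample entries are hypothetical and only mirror the behaviour described above.

```python
from collections import defaultdict


def find_duplicates(listings):
    """Map each entry to the sorted listing names that contain it.

    `listings` is a dict of listing name -> iterable of entry lines
    (a stand-in for the .txt files in out/). Only entries that occur
    in more than one listing are returned.
    """
    entry_to_listings = defaultdict(set)
    for listing_name, entries in listings.items():
        for entry in entries:
            entry = entry.strip()
            if entry:  # skip blank lines
                entry_to_listings[entry].add(listing_name)
    return {
        entry: sorted(names)
        for entry, names in sorted(entry_to_listings.items())
        if len(names) > 1
    }


# Hypothetical sample data, not taken from real archives.
sample = {
    "iw_00.iwd.txt": ["images/a.iwi", "sound/b.wav"],
    "iw_01.iwd.txt": ["images/a.iwi", "images/c.iwi"],
}

for entry, names in find_duplicates(sample).items():
    print(f"{entry} -> in: {', '.join(names)}")
```

Entries are compared as whole lines after stripping surrounding whitespace, so two lines count as duplicates only when they match exactly.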
## Requirements
- Python 3.x

1870
out/duplicates/result.txt Normal file

File diff suppressed because it is too large.

34
show-duplicates.py Normal file

@@ -0,0 +1,34 @@
import os
from collections import defaultdict

# Folder containing the .txt files
out_folder = "out"
duplicates_folder = os.path.join(out_folder, "duplicates")
result_file_path = os.path.join(duplicates_folder, "result.txt")
os.makedirs(duplicates_folder, exist_ok=True)

# Map each line to a set of files that contain it
line_to_files = defaultdict(set)

# Iterate over all .txt files in the out folder
for filename in os.listdir(out_folder):
    if filename.endswith(".txt") and filename != "result.txt":
        path = os.path.join(out_folder, filename)
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    line_to_files[line].add(filename)

# Open result file for writing
with open(result_file_path, "w", encoding="utf-8") as result_file:
    print("Duplicate lines found in multiple files:\n")
    result_file.write("Duplicate lines found in multiple files:\n\n")
    for line, files in sorted(line_to_files.items()):
        if len(files) > 1:
            info = f"{line} -> in: {', '.join(sorted(files))}"
            print(info)
            result_file.write(info + "\n")

print(f"\nResults saved to: {result_file_path}")