feat: save duplicates
parent f9043dd988
commit 888949b142
20
README.md
@ -1,6 +1,6 @@
# IWD Archive Lister

-This script scans the `main/` and `iw4x/` folders under a specified root directory for `.iwd` files (which are ZIP archives). For each `.iwd` file found, it extracts the list of files inside the archive using `7z` and writes the output to a `.txt` file in a folder called `out/`.
+The [list-iwd.sh](list-iwd.sh) script scans the `main/` and `iw4x/` folders under a specified root directory for `.iwd` files (which are ZIP archives). For each `.iwd` file found, it extracts the list of files inside the archive using `7z` and writes the output to a `.txt` file in a folder called `out/`.

Each `.iwd` file gets its own `.txt` file in the `out/` directory, with the same base name (e.g., `iw_00.iwd` -> `out/iw_00.iwd.txt`).
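
The listing step described above can be sketched in Python. This is only an illustration under stated assumptions: `list-iwd.sh` itself is not part of this diff and uses `7z`, whereas the sketch swaps in the standard-library `zipfile` module, so the contents of the generated `.txt` files will likely differ from the real `7z` output. The function name `list_iwd_archives` and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path


def list_iwd_archives(root: Path, out_dir: Path) -> list[Path]:
    """Write one listing file per .iwd archive found in root/main and root/iw4x."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for sub in ("main", "iw4x"):
        folder = root / sub
        if not folder.is_dir():
            continue  # skip a missing subfolder instead of failing
        for archive in sorted(folder.glob("*.iwd")):
            # .iwd files are ZIP archives, so zipfile can read them directly
            with zipfile.ZipFile(archive) as zf:
                names = zf.namelist()
            # Same naming rule as the README: iw_00.iwd -> out/iw_00.iwd.txt
            target = out_dir / f"{archive.name}.txt"
            target.write_text("\n".join(names) + "\n", encoding="utf-8")
            written.append(target)
    return written
```

Calling `list_iwd_archives(Path("<root_directory>"), Path("out"))` would produce one listing per archive, with one archive member per line.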
@ -22,3 +22,21 @@ sudo apt install p7zip-full
```

Where `<root_directory>` is the path that contains both `main/` and `iw4x/` subfolders.

# IWD Archive Duplicate Finder

The [show-duplicates.py](show-duplicates.py) Python script scans all `.txt` files inside the `out/` directory, which were previously generated by extracting the contents of `.iwd` archives, and identifies duplicate file entries that appear in **more than one archive**.

It prints the results to the console and saves a full report to `out/duplicates/result.txt`.

## What It Does

- Reads every `.txt` file in the `out/` folder.
- Detects which filenames appear in **multiple** `.txt` files (i.e. shared between archives).
- Writes a detailed list of these duplicates to: out/duplicates/result.txt

Each duplicate line includes the filename and a list of `.txt` files (archives) it appears in.
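
As a rough illustration of the detection and of the report line format, here is a minimal in-memory sketch. It follows the same idea as `show-duplicates.py` (map each line to the set of listing files that contain it), but the function name `find_duplicates` and the sample listings are hypothetical and not taken from the repository.

```python
from collections import defaultdict


def find_duplicates(listings: dict[str, list[str]]) -> list[str]:
    """Return report lines for entries that occur in more than one listing."""
    entry_to_files = defaultdict(set)
    for listing_name, entries in listings.items():
        for entry in entries:
            entry = entry.strip()
            if entry:
                entry_to_files[entry].add(listing_name)
    return [
        f"{entry} -> in: {', '.join(sorted(files))}"
        for entry, files in sorted(entry_to_files.items())
        if len(files) > 1
    ]


# Hypothetical listings; the archive and entry names are made up for illustration.
listings = {
    "iw_00.iwd.txt": ["images/a.iwi", "sound/b.wav"],
    "iw_01.iwd.txt": ["images/a.iwi", "video/c.bik"],
}
print(find_duplicates(listings))
# ['images/a.iwi -> in: iw_00.iwd.txt, iw_01.iwd.txt']
```

Note that whole lines are compared. If the listing files contain more than the bare path on each line (for example size or date columns from `7z`), two entries only count as duplicates when the entire line is identical.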

## Requirements

- Python 3.x
1870
out/duplicates/result.txt
Normal file
File diff suppressed because it is too large
34
show-duplicates.py
Normal file
@ -0,0 +1,34 @@
import os
from collections import defaultdict

# Folder containing the .txt files
out_folder = "out"
duplicates_folder = os.path.join(out_folder, "duplicates")
result_file_path = os.path.join(duplicates_folder, "result.txt")

os.makedirs(duplicates_folder, exist_ok=True)

# Map each line to a set of files that contain it
line_to_files = defaultdict(set)

# Iterate over all .txt files in the out folder
for filename in os.listdir(out_folder):
    if filename.endswith(".txt") and filename != "result.txt":
        path = os.path.join(out_folder, filename)
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    line_to_files[line].add(filename)

# Open result file for writing
with open(result_file_path, "w", encoding="utf-8") as result_file:
    print("Duplicate lines found in multiple files:\n")
    result_file.write("Duplicate lines found in multiple files:\n\n")
    for line, files in sorted(line_to_files.items()):
        if len(files) > 1:
            info = f"{line} -> in: {', '.join(sorted(files))}"
            print(info)
            result_file.write(info + "\n")

print(f"\nResults saved to: {result_file_path}")
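
The report lines written above follow the pattern `<line> -> in: <file>, <file>, ...`. For anyone who wants to post-process `out/duplicates/result.txt`, a small parsing sketch could look like the following. The helper `parse_report_line` is hypothetical and not part of this commit, and it assumes that listing file names do not themselves contain `", "`.

```python
def parse_report_line(line: str) -> tuple[str, list[str]]:
    """Split 'entry -> in: a.txt, b.txt' into the entry and its listing files."""
    # rpartition splits on the last separator, so an entry that happens to
    # contain ' -> in: ' itself is still kept intact.
    entry, sep, files = line.rstrip("\n").rpartition(" -> in: ")
    if not sep:
        raise ValueError(f"not a duplicate report line: {line!r}")
    return entry, files.split(", ")
```

The header line and the blank line at the top of the report do not match the pattern, so a caller would need to skip them or catch the `ValueError`.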