Compare commits
10 commits: 861b29a142 ... 9409295f39

Commits (SHA1, newest first):
9409295f39
4588205273
e2c21e2701
b9e3495bec
884ed824e3
bbfab783b5
55d27d0402
7ce50dde93
dc2d4fe4cf
ae0bdf820f
.gitignore (vendored; 3 changed lines)
@@ -1,6 +1,9 @@
 *.part
 *.pyc
 *.log
+
+# output
 set.yaml
 tag.json
 output.csv
+torrents
LICENSE (new file, 121 lines)
@@ -0,0 +1,121 @@
Creative Commons Legal Code

CC0 1.0 Universal

CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
HEREUNDER.

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator
and subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for
the purpose of contributing to a commons of creative, cultural and
scientific works ("Commons") that the public can reliably and without fear
of later claims of infringement build upon, modify, incorporate in other
works, reuse and redistribute as freely as possible in any form whatsoever
and for any purposes, including without limitation commercial purposes.
These owners may contribute to the Commons to promote the ideal of a free
culture and the further production of creative, cultural and scientific
works, or to gain reputation or greater distribution for their Work in
part through the use and efforts of others.

For these and/or other purposes and motivations, and without any
expectation of additional consideration or compensation, the person
associating CC0 with a Work (the "Affirmer"), to the extent that he or she
is an owner of Copyright and Related Rights in the Work, voluntarily
elects to apply CC0 to the Work and publicly distribute the Work under its
terms, with knowledge of his or her Copyright and Related Rights in the
Work and the meaning and intended legal effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not
limited to, the following:

i. the right to reproduce, adapt, distribute, perform, display,
communicate, and translate a Work;
ii. moral rights retained by the original author(s) and/or performer(s);
iii. publicity and privacy rights pertaining to a person's image or
likeness depicted in a Work;
iv. rights protecting against unfair competition in regards to a Work,
subject to the limitations in paragraph 4(a), below;
v. rights protecting the extraction, dissemination, use and reuse of data
in a Work;
vi. database rights (such as those arising under Directive 96/9/EC of the
European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, and under any national implementation
thereof, including any amended or successor version of such
directive); and
vii. other similar, equivalent or corresponding rights throughout the
world based on applicable law or treaty, and any national
implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention
of, applicable law, Affirmer hereby overtly, fully, permanently,
irrevocably and unconditionally waives, abandons, and surrenders all of
Affirmer's Copyright and Related Rights and associated claims and causes
of action, whether now known or unknown (including existing as well as
future claims and causes of action), in the Work (i) in all territories
worldwide, (ii) for the maximum duration provided by applicable law or
treaty (including future time extensions), (iii) in any current or future
medium and for any number of copies, and (iv) for any purpose whatsoever,
including without limitation commercial, advertising or promotional
purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
member of the public at large and to the detriment of Affirmer's heirs and
successors, fully intending that such Waiver shall not be subject to
revocation, rescission, cancellation, termination, or any other legal or
equitable action to disrupt the quiet enjoyment of the Work by the public
as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason
be judged legally invalid or ineffective under applicable law, then the
Waiver shall be preserved to the maximum extent permitted taking into
account Affirmer's express Statement of Purpose. In addition, to the
extent the Waiver is so judged Affirmer hereby grants to each affected
person a royalty-free, non transferable, non sublicensable, non exclusive,
irrevocable and unconditional license to exercise Affirmer's Copyright and
Related Rights in the Work (i) in all territories worldwide, (ii) for the
maximum duration provided by applicable law or treaty (including future
time extensions), (iii) in any current or future medium and for any number
of copies, and (iv) for any purpose whatsoever, including without
limitation commercial, advertising or promotional purposes (the
"License"). The License shall be deemed effective as of the date CC0 was
applied by Affirmer to the Work. Should any part of the License for any
reason be judged legally invalid or ineffective under applicable law, such
partial invalidity or ineffectiveness shall not invalidate the remainder
of the License, and in such case Affirmer hereby affirms that he or she
will not (i) exercise any of his or her remaining Copyright and Related
Rights in the Work or (ii) assert any associated claims and causes of
action with respect to the Work, in either case contrary to Affirmer's
express Statement of Purpose.

4. Limitations and Disclaimers.

a. No trademark or patent rights held by Affirmer are waived, abandoned,
surrendered, licensed or otherwise affected by this document.
b. Affirmer offers the Work as-is and makes no representations or
warranties of any kind concerning the Work, express, implied,
statutory or otherwise, including without limitation warranties of
title, merchantability, fitness for a particular purpose, non
infringement, or the absence of latent or other defects, accuracy, or
the present or absence of errors, whether or not discoverable, all to
the greatest extent permissible under applicable law.
c. Affirmer disclaims responsibility for clearing rights of other persons
that may apply to the Work or any use thereof, including without
limitation any person's Copyright and Related Rights in the Work.
Further, Affirmer disclaims responsibility for obtaining any necessary
consents, permissions or other rights required for any use of the
Work.
d. Affirmer understands and acknowledges that Creative Commons is not a
party to this document and has no duty or obligation with respect to
this CC0 or use of the Work.
README.md (30 changed lines)
@@ -1,21 +1,29 @@
 # nhentai-favorites
 
-### how to use?
+Zǎoshang hǎo zhōngguó xiànzài wǒ yǒu BING CHILLING 🥶🍦 wǒ hěn xǐhuān BING CHILLING 🥶🍦 dànshì sùdù yǔ jīqíng 9 bǐ BING CHILLING 🥶🍦 sùdù yǔ jīqíng sùdù yǔ jīqíng 9 wǒ zuì xǐhuān suǒyǐ…xiànzài shì yīnyuè shíjiān zhǔnbèi 1 2 3 liǎng gè lǐbài yǐhòu sùdù yǔ jīqíng 9 ×3 bùyào wàngjì bùyào cuòguò jìdé qù diànyǐngyuàn kàn sùdù yǔ jīqíng 9 yīn wéi fēicháng hǎo diànyǐng dòngzuò fēicháng hǎo chàbùduō yīyàng BING CHILLING 🥶🍦zàijiàn 🥶🍦
+
+This project is a meme but it works until you have too many favorites to scrape and you get rate limited, or so I was told by a friend, not that I would know.
+
+## how to use?
 
 `pip install -r ".\requirements.txt"`
+open `nfavorites.py` and it will close and generate set.yaml
+open `set.yaml` and enter your cookie and useragent
+final open nfavorites.py again and it will generate csv file if everything is ok
 
-Rename_me_set.yaml (rename)-> set.yaml
-enter your cookie
-open nfavorites.py
+## how to get my cookie?
 
-### how to get my cookie?
+open <https://nhentai.net/favorites/>
+open developer tools (F12)
+switch to network tab
+find favorites/ and click it
+find cookie and useragent in request headers
 
-open https://nhentai.net/favorites/
-press F12
-switch to network menu
-find favorites/
-copy cookie to set.yaml
+## if something goes wrong in gettags
+
+rename `example_tag.json` to `tag.json`
+rerun `nfavorites.py`
 
 
 
 
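
For reference, the settings file both scripts read is tiny: nfavorites.py bootstraps it with yaml.dump({"cookid": "", "useragent": ""}, f), so a filled-in set.yaml looks like the sketch below (placeholder values, and note the key really is spelled cookid in the code):

# set.yaml — placeholders, not real credentials
cookid: "<the Cookie request header copied from your browser>"
useragent: "<the User-Agent request header copied from your browser>"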
Rename_me_set.yaml (deleted; filename not captured, inferred from the old README)
@@ -1 +0,0 @@
-cookid: ""
example_tag.json (new file, 1 line)
File diff suppressed because one or more lines are too long
gettags.py (50 changed lines; indentation below reconstructed, as the capture stripped leading whitespace)
@@ -1,23 +1,42 @@
-import gevent.monkey
-gevent.monkey.patch_all()
-import json
-
-import fake_useragent
-import requests
 from bs4 import BeautifulSoup
+import requests
+import json
+import yaml
+
+
+URL = "https://nhentai.net/tags/"
 
 
-url = "https://nhentai.net/tags/"
-
+def wtfcloudflare(url, method="get", useragent=None, cookie=None, data=None):
+    session = requests.Session()
+    session.headers = {
+        'Referer': "https://nhentai.net/login/",
+        'User-Agent': useragent,
+        'Cookie': cookie,
+        'Accept-Language': 'en-US,en;q=0.9',
+        'Accept-Encoding': 'gzip, deflate, br',
+    }
+    if method == "get":
+        r = session.get(url)
+    elif method == "post":
+        r = session.post(url, data=data)
+    return r
+
+
 def get_tags():
+    with open('set.yaml', 'r') as f:
+        data = yaml.load(f, Loader=yaml.CLoader)
+    cookie = data["cookid"]
+    useragent = data["useragent"]
+    if cookie == "":
+        print("Please edit set.yaml")
+        exit()
     now = 1
     tagjson = {}
 
     while True:
-        ua = fake_useragent.UserAgent()
-        useragent = ua.random
-        headers = {
-            'user-agent': useragent
-        }
-        data = requests.get(f"{url}?page={now}", headers=headers)
+        data = wtfcloudflare(f"{URL}?page={now}",
+                             useragent=useragent, cookie=cookie)
         soup = BeautifulSoup(data.text, 'html.parser')
         tags = soup.find_all("a", class_='tag')
         if tags == []:
@@ -30,9 +49,14 @@ def get_tags():
             tagnumber.append(fixnum)
         for i in enumerate(tagnumber):
             tagjson[i[1]] = tagnames[i[0]]
+        print(f"page {now} done")
         now += 1
+    if tagjson == {}:
+        print("something wrong with your cookie or useragent")
+        exit()
     with open('tag.json', 'w') as f:
         json.dump(tagjson, f)
+    print("tag.json saved")
     return
 
 
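The tag.json written here is a flat tag-number-to-name mapping (tagjson[i[1]] = tagnames[i[0]]). A minimal sketch of reading it back, with hypothetical values:

import json

# tag.json maps tag IDs to tag names, e.g. {"12345": "full color"} (hypothetical pair)
with open('tag.json', 'r') as f:
    tagjson = json.load(f)
print(f"{len(tagjson)} tags loaded")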
image/nhentai_cookie_anduseranegt.png (new binary file, 27 KiB; not shown)
nfavorites.py (153 changed lines; indentation and ASCII-banner spacing reconstructed, as the capture stripped whitespace)
@@ -3,23 +3,28 @@ from progress.spinner import PixelSpinner
 from bs4 import BeautifulSoup
 import yaml
 import requests
-import fake_useragent
-import time
-import threading
-import random
-import queue
+import locale
 import os
 import json
 import csv
-import gevent.monkey
-gevent.monkey.patch_all()
 
 
+if not os.path.isfile("set.yaml"):
+    with open('set.yaml', 'w') as f:
+        yaml.dump({"cookid": "", "useragent": ""}, f)
+    print("Please edit set.yaml")
+    exit()
+
 with open('set.yaml', 'r') as f:
-    cookie = yaml.load(f, Loader=yaml.CLoader)["cookid"]
+    data = yaml.load(f, Loader=yaml.CLoader)
+cookie = data["cookid"]
+useragent = data["useragent"]
+if cookie == "":
+    print("Please edit set.yaml")
+    exit()
 # setting
-url = "https://nhentai.net/favorites/"
-apiurl = "https://nhentai.net/api/gallery/"
+URL = "https://nhentai.net/favorites/"
+APIURL = "https://nhentai.net/api/gallery/"
 table = [
     ["id", "name", "tags"]
 ]
@@ -27,74 +32,79 @@ now = 1
 allnumbers = []
 allnames = []
 alltags = []
-ua = fake_useragent.UserAgent()
-useragent = ua.random
+locate = locale.getdefaultlocale()[0]
+if locate == "zh_TW":
+    language = {
+        "nodata": "沒有發現離線資料 抓取中請稍後...",
+        "nodata2": "抓取完畢",
+        "usedata": "使用離線資料",
+        "getdata": "抓取資料中...",
+        "403": "403 錯誤,可能被 cloudflare 阻擋,請檢查 cookie 是否正確",
+        "nologin": "未登入,請先登入",
+        "done": "完成"
+    }
+else:
+    language = {
+        "nodata": "No offline data found, please wait a moment...",
+        "nodata2": "Done",
+        "usedata": "Use offline data",
+        "getdata": "Getting data...",
+        "403": "403 error, maby block by cloudflare , please check if the cookie is correct",
+        "nologin": "Not login, please login first",
+        "done": "Done"
+    }
+
+
+def banner():
+    data = r" _ _ _ ___ _ \
+ _ __ ___| |__ _ __ | |_ __ _(_) / __\/_\/\ /\ \
+ | '_ \ / _ \ '_ \| '_ \| __/ _` | |_____ / _\ //_\\ \ / / \
+ | | | | __/ | | | | | | || (_| | |_____/ / / _ \ V / \
+ |_| |_|\___|_| |_|_| |_|\__\__,_|_| \/ \_/ \_/\_/ \
+ "
+    print(data)
+
+
 # request
-def wtfcloudflare(url,method="get",data=None):
+def wtfcloudflare(url, method="get", data=None):
     session = requests.Session()
     session.headers = {
         'Referer': "https://nhentai.net/login/",
-        'User-Agent': "",
+        'User-Agent': useragent,
         'Cookie': cookie,
-        'Accept-Language': 'en-US,en;q=0.9',
-        'Accept-Encoding': 'gzip, deflate, br',
+        'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7,zh-CN;q=0.6',
+        'Accept-Encoding': 'gzip, deflate',
     }
     if method == "get":
         r = session.get(url)
     elif method == "post":
-        r = session.post(url,data=data)
+        r = session.post(url, data=data)
+    r.encoding = 'utf-8'
     return r
 
 
-class gettagonline(threading.Thread):
-    def __init__(self, queue, number):
-        threading.Thread.__init__(self)
-        self.number = number
-        self.queue = queue
-
-    # def run(self):
-    #     while self.queue.qsize() > 0:
-    #         num = self.queue.get()
-    #         # print("get %d: %s" % (self.number, num))
-    #         ua = fake_useragent.UserAgent()
-    #         useragent = ua.random
-    #         headers = {
-    #             'user-agent': useragent
-    #         }
-    #         r = requests.get(apiurl + num, headers=headers)
-    #         data = r.json()
-    #         ctag = []
-    #         for i in enumerate(data['tags']):
-    #             ctag.append(i[1]['name'])
-    #         alltags.append(ctag)
-    #         time.sleep(random.uniform(0.5, 1))
-
-
-set1 = input("請問要使用離線資料嗎?(y/n)(默認為否)")
-if set1 == "y".lower() or set1 == "yes".lower():
-    if not os.path.isfile("tag.json"):
-        print("沒有發現離線資料 抓取中請稍後...")
-        get_tags()
-        print("抓取完畢")
-    print("使用離線資料")
-else:
-    print("使用線上資料")
-    threadscount = input("請輸入要使用幾個線程(默認為5 不可超過10)")
-    if threadscount == "":
-        threadscount = 5
-    else:
-        try:
-            threadscount = int(threadscount)
-            if threadscount > 10:
-                threadscount = 10
-        except:
-            threadscount = 5
-
-spinner = PixelSpinner('抓取資料中...')
+def check_pass():
+    res = wtfcloudflare("https://nhentai.net/")
+    if res.status_code == 403:
+        print(language["403"])
+        exit()
+
+
+# --- main ---
+banner()
+check_pass()
+if not os.path.isfile("tag.json"):
+    print(language["nodata"])
+    get_tags()
+    print(language["nodata2"])
+print(language["usedata"])
+spinner = PixelSpinner(language["getdata"])
 while True:
-    data = wtfcloudflare(f"{url}?page={now}")
+    data = wtfcloudflare(f"{URL}?page={now}")
+    if "Abandon all hope, ye who enter here" in data.text:
+        print(language["nologin"])
+        exit()
     soup = BeautifulSoup(data.text, 'html.parser')
     book = soup.find_all("div", class_='gallery-favorite')
     if book == []:
@@ -113,33 +123,18 @@ while True:
     spinner.next()
 
 
-if set1 == "y".lower() or set1 == "yes".lower():
-    with open('tag.json', 'r') as f:
-        tagjson = json.load(f)
-    for i in enumerate(allnumbers):
-        tagstr = ""
-        for j in alltags[i[0]]:
-            if j in tagjson:
-                tagstr += tagjson[j] + ", "
-        table.append([i[1], allnames[i[0]], tagstr])
-else:
-    alltags = []  # 清空
-    get_tags_queue = queue.Queue()
-    threads = []
-    for i in allnumbers:
-        get_tags_queue.put(i)
-    for i in range(threadscount):
-        t = gettagonline(get_tags_queue, i)
-        t.start()
-        threads.append(t)
-    for t in threads:
-        t.join()
-    for i in enumerate(allnumbers):
-        table.append([i[1], allnames[i[0]], alltags[i[0]]])
+with open('tag.json', 'r') as f:
+    tagjson = json.load(f)
+for i in enumerate(allnumbers):
+    tagstr = ""
+    for j in alltags[i[0]]:
+        if j in tagjson:
+            tagstr += tagjson[j] + ", "
+
+    table.append([i[1], allnames[i[0]], tagstr])
 
 with open('output.csv', 'w', newline='', encoding="utf_8_sig") as csvfile:
     writer = csv.writer(csvfile)
     writer.writerows(table)
+print(language["done"])
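
With the header row ["id", "name", "tags"] and the tagstr accumulation above, output.csv comes out shaped like the following hypothetical excerpt (note the trailing ", " that tagstr never strips):

id,name,tags
123456,Some Example Title,"tag one, tag two, "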
requirements.txt (filename not captured; inferred from the README and package list)
@@ -1,6 +1,6 @@
-PyYAML == 5.4.1
-bs4
-fake_useragent == 0.1.11
-gevent == 21.1.2
-progress == 1.6
-requests == 2.27.1
+# Automatically generated by https://github.com/damnever/pigar.
+
+beautifulsoup4
+progress
+PyYAML
+requests
scraper.py (new file, 163 lines; indentation and ASCII-banner spacing reconstructed, as the capture stripped whitespace)
@@ -0,0 +1,163 @@
from progress.spinner import PixelSpinner
from bs4 import BeautifulSoup
import yaml
import requests
import locale
import os
import json
import csv


if not os.path.isfile("set.yaml"):
    with open('set.yaml', 'w') as f:
        yaml.dump({"cookid": "", "useragent": ""}, f)
    print("Please edit set.yaml")
    exit()

with open('set.yaml', 'r') as f:
    data = yaml.load(f, Loader=yaml.CLoader)
cookie = data["cookid"]
useragent = data["useragent"]
if cookie == "":
    print("Please edit set.yaml")
    exit()
# setting
URL = "https://nhentai.net/favorites/"
APIURL = "https://nhentai.net/api/gallery/"
table = [
    ["id", "name", "tags"]
]
now = 1
allnumbers = []
allnames = []
alltags = []
locate = locale.getdefaultlocale()[0]
if locate == "zh_TW":
    language = {
        "nodata": "沒有發現離線資料 抓取中請稍後...",
        "nodata2": "抓取完畢",
        "usedata": "使用離線資料",
        "getdata": "抓取資料中...",
        "403": "403 錯誤,可能被 cloudflare 阻擋,請檢查 cookie 是否正確",
        "nologin": "未登入,請先登入",
        "done": "完成"
    }
else:
    language = {
        "nodata": "No offline data found, please wait a moment...",
        "nodata2": "Done",
        "usedata": "Use offline data",
        "getdata": "Getting data...",
        "403": "403 error, maby block by cloudflare , please check if the cookie is correct",
        "nologin": "Not login, please login first",
        "done": "Done"
    }


def banner():
    data = r" _ _ _ ___ _ \
 _ __ ___| |__ _ __ | |_ __ _(_) / __\/_\/\ /\ \
 | '_ \ / _ \ '_ \| '_ \| __/ _` | |_____ / _\ //_\\ \ / / \
 | | | | __/ | | | | | | || (_| | |_____/ / / _ \ V / \
 |_| |_|\___|_| |_|_| |_|\__\__,_|_| \/ \_/ \_/\_/ \
 "
    print(data)


def wtfcloudflare(url, method="get", data=None):
    session = requests.Session()
    session.headers = {
        'Referer': "https://nhentai.net/login/",
        'User-Agent': useragent,
        'Cookie': cookie,
        'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7,zh-CN;q=0.6',
        'Accept-Encoding': 'gzip, deflate',
    }
    if method == "get":
        r = session.get(url)
    elif method == "post":
        r = session.post(url, data=data)
    r.encoding = 'utf-8'
    return r


def wtfcloudflare_t(url, method="get", data=None, useragent=None, cookie=None):
    session = requests.Session()
    session.headers = {
        'Referer': "https://nhentai.net/login/",
        'User-Agent': useragent,
        'Cookie': cookie,
        'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7,zh-CN;q=0.6',
        'Accept-Encoding': 'gzip, deflate',
    }

    if method == "get":
        r = session.get(url, stream=True)  # Add stream=True for large/binary files
    elif method == "post":
        r = session.post(url, data=data, stream=True)  # stream=True for binary data

    r.raise_for_status()  # Check for request errors
    return r


def check_pass():
    res = wtfcloudflare("https://nhentai.net/")
    if res.status_code == 403:
        print(language["403"])
        exit()


url_list = []


def build_id_list():
    # Open and read the CSV file
    with open('output.csv', 'r', encoding='utf-8-sig') as file:
        reader = csv.DictReader(file)

        # Print out the headers to debug the issue
        print(reader.fieldnames)  # This will show the exact header names

        # Iterate over each row in the CSV
        for row in reader:
            # Check if 'id' exists in the row, and if not, print the row for debugging
            if 'id' in row:
                formatted_url = f"https://nhentai.net/g/{row['id']}/download"
                url_list.append(formatted_url)
            else:
                print(f"Row without 'id': {row}")


banner()
check_pass()
build_id_list()


def get_torrents():
    with open('set.yaml', 'r') as f:
        data = yaml.load(f, Loader=yaml.CLoader)
    cookie = data["cookid"]
    useragent = data["useragent"]
    if cookie == "":
        print("Please edit set.yaml")
        exit()

    for url in url_list:
        torrent_url = url

        # Call wtfcloudflare to download the torrent file
        response = wtfcloudflare_t(torrent_url, useragent=useragent, cookie=cookie)

        # Extract the ID from the URL for naming the file
        torrent_id = url.split('/')[4]  # The ID is in the 4th segment of the URL

        # Define the output directory and file name
        output_dir = "torrents"
        os.makedirs(output_dir, exist_ok=True)  # Create the directory if it doesn't exist
        torrent_path = os.path.join(output_dir, f"{torrent_id}.torrent")

        # Save the torrent file to disk
        with open(torrent_path, 'wb') as torrent_file:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:  # Filter out keep-alive chunks
                    torrent_file.write(chunk)

        print(f"Downloaded torrent: {torrent_path}")


if __name__ == '__main__':
    get_torrents()
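
A quick sanity check on the ID extraction in get_torrents(): splitting the download URL on '/' yields six pieces, and the gallery ID sits at index 4 of the resulting list, which is exactly what url.split('/')[4] picks out (hypothetical ID):

>>> "https://nhentai.net/g/123456/download".split('/')
['https:', '', 'nhentai.net', 'g', '123456', 'download']
>>> "https://nhentai.net/g/123456/download".split('/')[4]
'123456'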