To build a good forum base for Xrumer or similar software, it is enough to find one topic where someone posts their own topics (advertisements) and links to the same topics on other forums to reinforce them.
With this script, you can collect that database of forums.
Requirements:
Install the necessary libraries using:
pip install requests beautifulsoup4
Script:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urljoin


def is_external(url, base_url):
    # A link is external if its host differs from the page it was found on.
    return urlparse(url).netloc != urlparse(base_url).netloc


def get_links(url):
    # Fetch a page and return every href found in its <a> tags.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/58.0.3029.110 Safari/537.36'
    }
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        return [a.get('href') for a in soup.find_all('a', href=True)]
    except requests.RequestException as e:
        print(f"Failed to fetch {url}: {e}")
        return []


def scrape_forums(starting_urls, max_depth=2):
    visited = set()
    external_links = set()

    def scrape(url, depth):
        if url in visited or depth > max_depth:
            return
        print(f"Scraping {url} at depth {depth}")
        visited.add(url)
        for link in get_links(url):
            # Resolve relative links against the current page.
            full_url = urljoin(url, link)
            if is_external(full_url, url):
                external_links.add(full_url)
            else:
                scrape(full_url, depth + 1)

    for url in starting_urls:
        scrape(url, 1)
    return external_links


def save_links_to_file(links, filename):
    # Write the collected links to a text file, one per line.
    with open(filename, 'w') as f:
        for link in links:
            f.write(link + '\n')


if __name__ == '__main__':
    starting_urls = [
        # Add your starting forum URLs here
        'http://example-forum.com/topic1',
        'http://example-forum.com/topic2',
    ]
    filename = 'external_links.txt'
    external_links = scrape_forums(starting_urls)
    save_links_to_file(external_links, filename)
    print(f"Collected {len(external_links)} external links. Saved to {filename}.")
How the script works:
- Function get_links: sends a request to the given URL, parses the HTML, and collects all the links on the page.
- Function is_external: checks whether a link points to a different host than the page it was found on.
- Function scrape_forums: recursively scrapes forums starting from the given URLs, up to max_depth, and collects all external links.
- Function save_links_to_file: saves all collected external links to a text file, one per line.
- Main part of the script: sets the initial forum URLs, starts the scraping process, and saves the collected links to a file.
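To illustrate the internal/external split: is_external only compares hostnames, which is why each link is first resolved with urljoin. A minimal check (the forum and site URLs here are just placeholders):

```python
from urllib.parse import urlparse, urljoin

def is_external(url, base_url):
    # Same logic as in the script: compare the two hosts.
    return urlparse(url).netloc != urlparse(base_url).netloc

base = 'http://example-forum.com/topic1'
# A relative link resolves to the same host, so it is internal.
print(is_external(urljoin(base, '/topic2'), base))                  # False
# An absolute link to another host is external.
print(is_external(urljoin(base, 'http://other-site.com/x'), base))  # True
```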
Instructions for use:
- Insert the initial forum URLs into the starting_urls list.
- Run the script:
python script_name.py
- The collected links will be saved to the external_links.txt file.
This script can be improved and adapted to specific needs, such as more complex parsing rules or error handling.
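One possible improvement (a sketch, not part of the script above; dedupe_by_domain and the sample URLs are hypothetical): collapse the collected links to one entry per domain before saving, since for base building each forum is usually only needed once.

```python
from urllib.parse import urlparse

def dedupe_by_domain(links):
    # Keep only the first (alphabetically) link per host,
    # so each site appears once in the final list.
    per_domain = {}
    for link in sorted(links):
        domain = urlparse(link).netloc
        if domain and domain not in per_domain:
            per_domain[domain] = link
    return list(per_domain.values())

sample = {
    'http://site-a.com/thread/1',
    'http://site-a.com/thread/2',
    'http://site-b.com/board',
}
print(dedupe_by_domain(sample))
# ['http://site-a.com/thread/1', 'http://site-b.com/board']
```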