siterender

siterender is a Node.js application that renders web pages listed in a sitemap and saves the rendered HTML content to a specified output directory. This tool is particularly useful for static site generation, web scraping, and ensuring content is pre-rendered for SEO and social media sharing purposes.

The application is unusual as all the code was "written" by ChatGPT 4o. For more about the concept, please take a look at: Maximal Instruction Prompting: a strategy for software development with LLMs (2024-08-06).

Features

  • Fetches and parses sitemaps from URLs or local files.
  • Supports sitemaps and sitemap indexes (nested sitemaps).
  • Replaces URL prefixes based on specified rules.
  • Renders pages in parallel using Puppeteer to maximize throughput.
  • Saves rendered HTML content to a specified output directory.
  • Retries page rendering and browser launch/close operations when they fail.

License

The software is released under a BSD 3-Clause license.

Installation

Before using siterender, ensure you have Node.js installed. You can install the dependencies by running:

npm install

Usage

The script can be executed from the command line with various options:

node siterender.js [options]

Options

  • --sitemap-file <path> - Path to the local sitemap file (conflicts with --sitemap-url)
  • --sitemap-url <url> - URL of the sitemap file (conflicts with --sitemap-file)
  • --replace-url <new=old> - Replace the URL prefix "old" with "new", given in the form "new=old"
  • --output <path> - Output directory (required)
  • --parallel-renders <number> - Number of parallel renders (default is the number of CPU cores)
  • --max-retries <number> - Max retries for rendering a page (default is 3)
  • -h, --help - Show this message

Examples

Render from a Sitemap URL:

node siterender.js --sitemap-url https://example.com/sitemap.xml --output ./output

Render from a Local Sitemap File:

node siterender.js --sitemap-file ./sitemap.xml --output ./output

Replace URL Prefix:

node siterender.js --sitemap-url https://example.com/sitemap.xml --replace-url "https://newdomain.com=https://olddomain.com" --output ./output
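
To illustrate what a "new=old" rule does to each URL, here is a minimal sketch. The helper name replaceUrlPrefix is hypothetical and is not taken from the siterender source. It assumes the rule is split at the first "=" and that URLs not starting with the old prefix are left unchanged.

```javascript
// Hypothetical helper: applies a rule given in the documented "new=old" form.
function replaceUrlPrefix(url, rule) {
  // Assumption: split at the first "=" so the old prefix may itself contain "=".
  const separator = rule.indexOf('=');
  if (separator === -1) {
    throw new Error(`Invalid replace rule "${rule}": expected "new=old"`);
  }
  const newPrefix = rule.slice(0, separator);
  const oldPrefix = rule.slice(separator + 1);

  // Only rewrite URLs that start with the old prefix; leave others untouched.
  return url.startsWith(oldPrefix)
    ? newPrefix + url.slice(oldPrefix.length)
    : url;
}

const rule = 'https://newdomain.com=https://olddomain.com';
console.log(replaceUrlPrefix('https://olddomain.com/blog/post', rule));
// prints "https://newdomain.com/blog/post"
```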

Specify Parallel Renders and Max Retries:

node siterender.js --sitemap-url https://example.com/sitemap.xml --output ./output --parallel-renders 4 --max-retries 5
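
The interaction between --parallel-renders and --max-retries can be pictured with the sketch below. It is not the siterender implementation: renderAll, withRetries, and the stand-in fakeRender function are hypothetical names, and the sketch assumes that the retry count is the number of extra attempts made after the first failure. A real renderer would drive Puppeteer where fakeRender is used.

```javascript
const os = require('node:os');

// Run an async operation, retrying on failure.
// Assumption: maxRetries counts retries after the first attempt.
async function withRetries(operation, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Render every URL with at most `parallelRenders` renders in flight at once.
async function renderAll(urls, renderPage, options = {}) {
  const { parallelRenders = os.cpus().length, maxRetries = 3 } = options;
  const results = new Array(urls.length);
  let next = 0;

  // Each worker repeatedly claims the next unrendered URL until none remain.
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      const url = urls[index];
      try {
        const html = await withRetries(() => renderPage(url), maxRetries);
        results[index] = { url, html };
      } catch (error) {
        // Record the failure instead of aborting the whole run.
        results[index] = { url, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(parallelRenders, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Stand-in for a Puppeteer-based renderer so the sketch runs on its own.
const fakeRender = async (url) => `<html><body>${url}</body></html>`;

renderAll(['https://example.com/', 'https://example.com/about'], fakeRender, {
  parallelRenders: 4,
  maxRetries: 5,
}).then((results) => console.log(results.map((result) => result.url)));
```

A fixed pool of workers pulling from a shared index keeps the number of simultaneous renders bounded without splitting the URL list into fixed batches, so one slow page does not hold up the others.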

How it works

  1. Fetch Sitemap: The sitemap is fetched from a URL or read from a local file.
  2. Parse Sitemap: The XML content of the sitemap is parsed to extract URLs.
  3. URL Replacement: If a replace rule is provided, URLs are modified accordingly.
  4. Render Pages: Each URL is rendered with Puppeteer, with up to the specified number of pages rendering in parallel.
  5. Save Content: The rendered HTML content is saved to the specified output directory, maintaining the directory structure of the URLs.
  6. Retry Mechanism: The script includes retry logic for rendering pages and launching/closing the browser to handle transient errors.
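
As an illustration of step 5, the sketch below maps a page URL to a file beneath the output directory and writes the HTML there. The function names outputPathForUrl and savePage are hypothetical, and the mapping rules are assumptions rather than documented behaviour: the URL path is mirrored as directories, and a path with no file extension (or a trailing slash) is saved as index.html inside its own directory.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Hypothetical mapping from a page URL to a file under the output directory.
function outputPathForUrl(pageUrl, outputDir) {
  // The URL parser resolves "." and ".." segments, so the result stays
  // inside outputDir. Query strings and fragments are ignored.
  const { pathname } = new URL(pageUrl);
  const segments = pathname.split('/').filter(Boolean);

  // Assumption: paths without a file extension are saved as <path>/index.html.
  const last = segments[segments.length - 1];
  if (!last || !path.extname(last)) {
    segments.push('index.html');
  }
  return path.join(outputDir, ...segments);
}

// Create any missing directories, then write the rendered HTML.
async function savePage(pageUrl, html, outputDir) {
  const filePath = outputPathForUrl(pageUrl, outputDir);
  await fs.mkdir(path.dirname(filePath), { recursive: true });
  await fs.writeFile(filePath, html, 'utf8');
  return filePath;
}

console.log(outputPathForUrl('https://example.com/blog/post/', 'output'));
console.log(outputPathForUrl('https://example.com/about.html', 'output'));
```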

Building the software

The software is built using make. Yes, this is a little unusual for the JavaScript world, but it's not going to change!

To build:

make

Testing

The core logic of the application is covered by tests, implemented using Jest.

To run the tests:

make test

Contributing

The aim of the project is to see how far we can go with having ChatGPT build the software.

Please feel free to submit PRs against the prompt, or to suggest prompts that will generate a code change you would like to see. You can also submit PRs against the code, but these won't be merged directly; instead, they can help improve the prompt used to evolve the code.

Source code

The source code can be found on GitHub: https://github.com/dave-hudson/siterender