Can my tools build tools? Pre-rendering web pages with help from ChatGPT

We're all trying to work out the implications of generative AI. We've all seen examples of it building websites and coding a version of the snake game. These seem impressive until we realize there are millions of websites and thousands of versions of snake, so it's not too surprising that tools like ChatGPT can regurgitate them. After all, it's easy to replicate something that's already well understood.

How about a more interesting challenge? How well can these tools build something new? I decided to give this a try. I needed a tool to use with my blog site, davehudson.io. How well would ChatGPT be able to help?

A quick aside

Humans have a long history of building tools. About 2.6 million years ago, in the early Stone Age, our ancestors began creating simple stone implements to aid in hunting, gathering, and daily survival. I daresay stone engineering was quite satisfying at the time, but these days our most prolific tool builders are software engineers.

Most software takes the form of tooling. It's intended to be created once and used many times. We could almost say software engineering is the pursuit of finding ways to build ever better, more reliable, and less expensive tools.

The problem statement

I’ve been rebuilding my blog site (formerly hashingit.com), and this time I decided to do it from first principles. I like to do slightly mad low-level things sometimes!

The blog site is a single page app (SPA) and is 99% TypeScript code (we'll save discussions on why for another time), but I realized that doesn't play nicely with web crawlers or social media sharing. As I'd like people to find the things I write, and potentially share them, I needed to come up with a way to serve up an initial static version of any page on the site before the TypeScript code could take over.

First steps

As I’ve increasingly done for the last 6 months, I asked my sidekick, ChatGPT, how to solve this problem. It suggested I needed to pre-render these pages.

It also recommended I follow the crowd and serve the pre-rendered pages to crawlers and bots, but serve the dynamic version to humans. While apparently common practice, that seemed a bit clunky, so I decided I'd serve the static version to everyone and then let the dynamic code take over after that first page load (thus the first page is the only page you ever “load”).

Next, it helpfully identified some projects and commercial services that might do what I needed, but my intuition was I could build something local, fast, and free (except for my time). Being able to work offline was a bonus, because I spend a lot of time on trains and planes, and Internet access is awful on both.

30 minutes and code

Here's my initial prompt in a new session:

How do I use google chrome to pre-render my single page app website?

And the first reply:

Pre-rendering a single-page application (SPA) using Google Chrome involves generating static HTML content that can be served to search engines and users to improve SEO and performance. Here are the steps to pre-render your SPA:

1. Use a Pre-rendering Service

Pre-rendering services can handle the task for you. These services visit your SPA, render it using a headless browser, and save the resulting HTML. Popular services include Prerender.io, Rendertron, and others.

2. Use Headless Chrome with Puppeteer

Puppeteer is a Node.js library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Here's a basic example of using Puppeteer to pre-render your SPA:

...goes on to give some code fragments.
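The fragments were along these lines: launch a headless browser, load a page, and capture the rendered HTML (a minimal sketch rather than the exact code from the chat):

import puppeteer from 'puppeteer';

(async () => {
    // Launch headless Chrome and open a new page.
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // 'networkidle0' waits until the SPA has finished loading and rendering.
    await page.goto('http://localhost:3000/', {waitUntil: 'networkidle0'});

    // Capture the fully rendered HTML.
    console.log(await page.content());

    await browser.close();
})();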

I like to build things - option 2 was the only way to go!

The chat went on for about half an hour with me asking questions and suggesting things I'd like to see, and with ChatGPT writing all the code. We ended up with this version:

import puppeteer from 'puppeteer';
import fs from 'fs';
import path from 'path';
import {fileURLToPath} from 'url';
import {dirname} from 'path';
import {parseStringPromise} from 'xml2js';

// Get the current module path
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

// Define the path to your sitemap.xml file
const sitemapPath = path.join(__dirname, 'sitemap.xml');
const outputDir = path.join(__dirname, 'prerendered');
const localBaseUrl = 'http://localhost:3000';

// Utility function to create the parent directory for a file, recursively
const ensureDirectoryExistence = (filePath) => {
    fs.mkdirSync(path.dirname(filePath), {recursive: true});
};

(async () => {
    // Read and parse the sitemap.xml file
    const sitemapData = fs.readFileSync(sitemapPath, 'utf8');
    const sitemap = await parseStringPromise(sitemapData);

    // Extract URLs from the sitemap
    const urls = sitemap.urlset.url.map(entry => entry.loc[0]);

    // Launch Puppeteer
    const browser = await puppeteer.launch();

    for (const url of urls) {
        // Replace the base URL with the local base URL
        const localUrl = url.replace(/^https?:\/\/[^\/]+/, localBaseUrl);

        const page = await browser.newPage();
        await page.goto(localUrl, {waitUntil: 'networkidle0'});

        // Get the content of the page
        const html = await page.content();

        // Generate the file path based on the original URL path
        const urlPath = new URL(url).pathname;
        const filePath = path.join(outputDir, urlPath, 'index.html');

        // Ensure the directory exists
        ensureDirectoryExistence(filePath);

        // Save the HTML content to the file
        fs.writeFileSync(filePath, html);

        // Close the page to free up resources
        await page.close();
    }

    // Close the browser
    await browser.close();
})();
Initial software developed using ChatGPT 4o

It has a few rather quirky features because I wanted to render my site using a local Express server and not the one that hosts my live blog, so it rewrites the base of each URL found in the sitemap.xml file to http://localhost:3000 to find the correct page to render.
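For reference, the local server doesn't need to be anything elaborate. A minimal Express sketch like this would do, assuming the built site lives in a dist/ directory (the directory name is illustrative; the port matches the script above):

import express from 'express';

const app = express();

// Serve the compiled SPA assets (JS bundles, CSS, images).
app.use(express.static('dist'));

// Fall back to index.html for every other route so the SPA's router can take over.
app.use((req, res) => {
    res.sendFile('index.html', {root: 'dist'});
});

app.listen(3000, () => {
    console.log('Local server running at http://localhost:3000');
});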

Let's make this more interesting!

We could almost declare victory at this point. Rendering 29 pages took 47 seconds on my M1 MacBook Air.

I'm fairly impatient though, and hate waiting for builds. So, I upped the ante a little and asked for a parallelized version. A little back and forth and we ended up with this:

import puppeteer from 'puppeteer';
import fs from 'fs';
import path from 'path';
import {fileURLToPath} from 'url';
import {dirname} from 'path';
import {parseStringPromise} from 'xml2js';

// Get the current module path
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

// Define the path to your sitemap.xml file
const sitemapPath = path.join(__dirname, 'sitemap.xml');
const outputDir = path.join(__dirname, 'prerendered');
const localBaseUrl = 'http://localhost:3000';
const maxConcurrentRenders = 4;

// Utility function to create the parent directory for a file, recursively
const ensureDirectoryExistence = (filePath) => {
    fs.mkdirSync(path.dirname(filePath), {recursive: true});
};

(async () => {
    // Read and parse the sitemap.xml file
    const sitemapData = fs.readFileSync(sitemapPath, 'utf8');
    const sitemap = await parseStringPromise(sitemapData);

    // Extract URLs from the sitemap
    const urls = sitemap.urlset.url.map(entry => entry.loc[0]);

    // Launch Puppeteer
    const browser = await puppeteer.launch();

    // Helper function to render a single page
    const renderPage = async (url) => {
        const localUrl = url.replace(/^https?:\/\/[^\/]+/, localBaseUrl);
        const page = await browser.newPage();
        await page.goto(localUrl, {waitUntil: 'networkidle0'});
        const html = await page.content();
        const urlPath = new URL(url).pathname;
        const filePath = path.join(outputDir, urlPath, 'index.html');
        ensureDirectoryExistence(filePath);
        fs.writeFileSync(filePath, html);
        await page.close();
    };

    // Process URLs in batches of maxConcurrentRenders
    for (let i = 0; i < urls.length; i += maxConcurrentRenders) {
        const batch = urls.slice(i, i + maxConcurrentRenders).map(url => renderPage(url));
        await Promise.all(batch);
    }

    // Close the browser
    await browser.close();
})();
Final software developed using ChatGPT 4o

This one completed in about 12 seconds. Now we were getting close to what I wanted! A manual tweak to run 16 in parallel brought rendering to just under 6 seconds. I'll take that as a huge win!
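The tweak is just the constant at the top of the file:

const maxConcurrentRenders = 16;

One caveat with the batching approach: each batch waits for its slowest page, so one sluggish render leaves the other slots idle. For my 29 pages that didn't matter, but a worker-pool variant keeps every slot busy instead. Here's a sketch (not from the chat) that reuses the renderPage helper above:

// Worker pool: each worker repeatedly pulls the next URL until none remain.
// Assumes the renderPage helper from the final version above is in scope.
const renderAll = async (urls, concurrency) => {
    const queue = [...urls];
    const worker = async () => {
        for (let url = queue.shift(); url !== undefined; url = queue.shift()) {
            await renderPage(url);
        }
    };
    await Promise.all(Array.from({length: concurrency}, () => worker()));
};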

Total elapsed time - about 2 hours.

A sign of things to come

Just like a human engineer, ChatGPT made mistakes. It needed dialogue and questioning to keep it on the right path, and it worked better when I could give it a way to assess whether it was doing the right things (that's a subject for another time).

As the code got longer, it also became a little irritating in its insistence on giving me all the code every time, when I only wanted a small section changed.

However, I got an effective solution to my problem in 2 hours rather than 2 days. Left unassisted, I'd probably still have been reading docs at the 2 hour point.

This was seriously impressive.

I've been building software for a very long time. That includes many years building compilers and code generators, and it's very clear we're somewhere we've not been before.

I read Fred Brooks' “The Mythical Man-Month” almost 30 years ago. “No Silver Bullet - Essence and Accidents of Software Engineering” is the chapter that has resonated with me for all that time. It argued that no single development, in either technology or management technique, would by itself deliver a tenfold improvement in software productivity, reliability, or simplicity within a decade. That has remained true ever since the essay appeared in 1986, almost 40 years ago. I have a feeling that 10x barrier may finally be about to break.