Skip to content

Instantly share code, notes, and snippets.

@simonw
Created May 22, 2023 20:16
Show Gist options
  • Star 15 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save simonw/ac671a9bdbda9f29dc5256511497bc1d to your computer and use it in GitHub Desktop.
Save simonw/ac671a9bdbda9f29dc5256511497bc1d to your computer and use it in GitHub Desktop.

HTML Table to JSON

May 22, 2023 at 12:34:10 PM

system:

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. Knowledge cutoff: 2021-09 Current date: 2023-05-22


Prompt:

<table>
<thead>
<tr>
<th align="left">ISO 693-3</th>
<th align="left">Language Name</th>
<th align="left">Language Family</th>
<th align="left">Subgrouping</th>
<th align="left">ASR</th>
<th align="left">TTS</th>
<th align="left">LID</th>
<th align="left"></th>
</tr>
</thead>
<tbody><tr>
<td align="left">aaa</td>
<td align="left">Ghotuo</td>
<td align="left">Niger-Congo</td>
<td align="left">Atlantic-Congo</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">✔️</td>
<td align="left"><a href="https://www.ethnologue.com/language/aaa">Ethnologue</a></td>
</tr>
<tr>
<td align="left">aab</td>
<td align="left">Arum</td>
<td align="left">Niger-Congo</td>
<td align="left">Atlantic-Congo</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">✔️</td>
<td align="left"><a href="https://www.ethnologue.com/language/aab">Ethnologue</a></td>
</tr>

Write JavaScript to turn this table into newline-delimited JSON objects


ChatGPT:

Here's a JavaScript function which can parse HTML tables into newline-delimited JSON objects. This script uses the browser's built-in DOMParser to parse the HTML, and then it walks the DOM to build up the JSON objects:

function tableToJson(table) {
    var parser = new DOMParser();
    var doc = parser.parseFromString(table, "text/html");
    var rows = doc.querySelectorAll("tr");
    var result = [];

    // Use the header row to get the keys
    var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());

    // Iterate over the rest of the rows
    for (var i = 1; i < rows.length; i++) {
        var obj = {};
        var cols = rows[i].querySelectorAll("td");

        // Map each cell in the row to a key from the header row
        for (var j = 0; j < cols.length; j++) {
            if (cols[j].textContent.trim() === '✔️') {
                obj[keys[j]] = true;
            } else if (cols[j].querySelector('a')) {
                obj[keys[j]] = cols[j].querySelector('a').href;
            } else {
                obj[keys[j]] = cols[j].textContent.trim() || null;
            }
        }

        result.push(obj);
    }

    // Convert the result array to newline-delimited JSON
    return result.map(JSON.stringify).join('\n');
}

You can use this function like this:

var table = `...`;  // Paste your table here
console.log(tableToJson(table));

Replace '...' with your table's HTML. This script will output a string where each line is a separate JSON object. Each JSON object corresponds to a row in the table. Note that this script assumes that the first row of the table contains the headers.


Prompt:

ASR and TTS and LID should always be boolean false or true


ChatGPT:

Okay, let's adjust the JavaScript function to ensure that ASR, TTS, and LID fields are always either true or false. The presence of the ✔️ character will be translated to true and its absence will be translated to false.

function tableToJson(table) {
    var parser = new DOMParser();
    var doc = parser.parseFromString(table, "text/html");
    var rows = doc.querySelectorAll("tr");
    var result = [];

    // Use the header row to get the keys
    var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());

    // Iterate over the rest of the rows
    for (var i = 1; i < rows.length; i++) {
        var obj = {};
        var cols = rows[i].querySelectorAll("td");

        // Map each cell in the row to a key from the header row
        for (var j = 0; j < cols.length; j++) {
            switch(keys[j]) {
                case 'ASR':
                case 'TTS':
                case 'LID':
                    obj[keys[j]] = cols[j].textContent.trim() === '✔️';
                    break;
                case '':
                    obj[keys[j]] = cols[j].querySelector('a').href;
                    break;
                default:
                    obj[keys[j]] = cols[j].textContent.trim() || null;
                    break;
            }
        }

        result.push(obj);
    }

    // Convert the result array to newline-delimited JSON
    return result.map(JSON.stringify).join('\n');
}

Now, the columns labeled 'ASR', 'TTS', and 'LID' are guaranteed to be boolean values. If the cell contains a ✔️, the corresponding field in the JSON object will be true, otherwise false.


Prompt:

Make the input to that function a DOM table object


ChatGPT:

You're right, the input would be more appropriately a DOM table object, not an HTML string. Here's a revised version of the function:

function tableToJson(tableElement) {
    var rows = tableElement.querySelectorAll("tr");
    var result = [];

    // Use the header row to get the keys
    var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());

    // Iterate over the rest of the rows
    for (var i = 1; i < rows.length; i++) {
        var obj = {};
        var cols = rows[i].querySelectorAll("td");

        // Map each cell in the row to a key from the header row
        for (var j = 0; j < cols.length; j++) {
            switch(keys[j]) {
                case 'ASR':
                case 'TTS':
                case 'LID':
                    obj[keys[j]] = cols[j].textContent.trim() === '✔️';
                    break;
                case '':
                    obj[keys[j]] = cols[j].querySelector('a').href;
                    break;
                default:
                    obj[keys[j]] = cols[j].textContent.trim() || null;
                    break;
            }
        }

        result.push(obj);
    }

    // Convert the result array to newline-delimited JSON
    return result.map(JSON.stringify).join('\n');
}

You can use this function like this:

var tableElement = document.querySelector("table"); // Select the table in your DOM
console.log(tableToJson(tableElement));

This script will output a string where each line is a separate JSON object. Each JSON object corresponds to a row in the table. Note that this script assumes that the first row of the table contains the headers.


Prompt:

Uncaught TypeError: cols[j].querySelector(...) is null


ChatGPT:

I apologize for the error, it seems that there is an issue when the table cell does not contain a link. Let's revise the code to handle this case:

function tableToJson(tableElement) {
    var rows = tableElement.querySelectorAll("tr");
    var result = [];

    // Use the header row to get the keys
    var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());

    // Iterate over the rest of the rows
    for (var i = 1; i < rows.length; i++) {
        var obj = {};
        var cols = rows[i].querySelectorAll("td");

        // Map each cell in the row to a key from the header row
        for (var j = 0; j < cols.length; j++) {
            switch(keys[j]) {
                case 'ASR':
                case 'TTS':
                case 'LID':
                    obj[keys[j]] = cols[j].textContent.trim() === '✔️';
                    break;
                case '':
                    var link = cols[j].querySelector('a');
                    obj[keys[j]] = link ? link.href : null;
                    break;
                default:
                    obj[keys[j]] = cols[j].textContent.trim() || null;
                    break;
            }
        }

        result.push(obj);
    }

    // Convert the result array to newline-delimited JSON
    return result.map(JSON.stringify).join('\n');
}

With this adjustment, the function now checks whether a link exists in the last column before attempting to access its href attribute. If no link exists, it assigns null to the corresponding field in the JSON object.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment