@davestevens
Created February 19, 2021 17:01
Fetching all comments from a Reddit post

Reddit API

I could not find a simple example showing how to consume the Reddit API to get the comments for a post. This gist shows, step by step, how to authenticate and which endpoints to request.

Setup

You need to create an app to get an app id and secret. Go to https://www.reddit.com/prefs/apps and, at the bottom of the page under "developed applications", create a new app or view an existing one.

If you click "edit" to open the app view you will be able to see the app id and secret; both are required for authentication. NB: the id is displayed under the app title/description, and it is not obvious that this is the id you need.

Authentication

For simple script authentication you can get an access_token by including your username and password in the request. The id and secret must be Base64 encoded and sent as the basic auth credentials.

Docs: https://github.com/reddit-archive/reddit/wiki/OAuth2

Request

const id = "app-id";
const secret = "app-secret";
const basicAuth = Buffer.from(`${id}:${secret}`).toString("base64");
const username = "username";
const password = "password";

const params = new URLSearchParams();
params.append("grant_type", "password");
params.append("username", username);
params.append("password", password);

fetch("https://www.reddit.com/api/v1/access_token", {
    method: "POST",
    headers: {
        Authorization: `Basic ${basicAuth}`
    },
    body: params
})

Response

{
    "access_token": "some-access-token",
    "token_type": "bearer",
    "expires_in": 3600,
    "scope": "*"
}

You then use access_token for all subsequent requests.

{
    headers: {
        Authorization: `Bearer ${access_token}`
    }
}
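
The response above includes expires_in, so the token will stop working after a while. The following is a minimal sketch of one way to track that, assuming expires_in is a number of seconds as is usual for OAuth2. The helper names (storeToken, isExpired, authHeaders) and the 60 second safety margin are illustrative choices, not part of the Reddit API.

```javascript
// Hypothetical helper: pairs the access token with an absolute expiry time.
const storeToken = (tokenResponse, now = Date.now()) => ({
    accessToken: tokenResponse.access_token,
    // expires_in is in seconds, Date.now() is in milliseconds
    expiresAt: now + tokenResponse.expires_in * 1000,
});

// Hypothetical helper: true once the token is within marginMs of expiring.
// The default margin of 60 seconds is an arbitrary assumption.
const isExpired = (stored, now = Date.now(), marginMs = 60 * 1000) =>
    now >= stored.expiresAt - marginMs;

// Builds the same headers object as shown above.
const authHeaders = (stored) => ({
    Authorization: `Bearer ${stored.accessToken}`,
});
```

When isExpired returns true, requesting a new token with the same authentication request is the simplest recovery for a script.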

Get Comments

Getting comments for a post is done using the post id. There are different options which can be used for sorting/limiting.

For example, with this post: https://www.reddit.com/r/Showerthoughts/comments/lmj453/people_who_jog_on_the_roads_in_the_dark_wearing/ the id is lmj453.
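
If you start from a post URL rather than an id, the id can be pulled out of the path. This is a small sketch that assumes the URL has the same ".../comments/<id>/..." shape as the example; extractPostId is a made-up helper name.

```javascript
// Hypothetical helper: returns the post id from a Reddit post URL,
// or null when the path has no "/comments/<id>" segment.
// Note: new URL() throws if the input is not a valid absolute URL.
const extractPostId = (postUrl) => {
    const { pathname } = new URL(postUrl);
    const match = pathname.match(/\/comments\/([a-z0-9]+)/i);
    return match ? match[1] : null;
};

console.log(
    extractPostId(
        "https://www.reddit.com/r/Showerthoughts/comments/lmj453/people_who_jog_on_the_roads_in_the_dark_wearing/"
    )
); // lmj453
```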

Docs: https://www.reddit.com/dev/api/#GET_comments_{article}

Request

const postId = "lmj453";
const sort = "old";
const threaded = false;

fetch(`https://oauth.reddit.com/comments/${postId}?sort=${sort}&threaded=${threaded}`, {
    method: "GET",
    headers: {
        Authorization: `Bearer ${access_token}`
    }
})

Response

Most of the returned fields are omitted here; the focus is on the structure.

[
    {
        "kind": "Listing",
        "data": {
            /// This is the Post information
        }
    },
    {
        "kind": "Listing",
        "data": {
            "children": [
                {
                    "kind": "t1",
                    "data": {
                        "author": "Comment Author",
                        "body": "Comment Body",
                        "id": "Comment Id",
                        "parent_id": "Parent Id",
                        "...": "..." /// More info here
                    }
                },
                {
                    "kind": "more",
                    "data": {
                        "count": 1234,
                        "children": [
                            "child-id-1",
                            "child-id-2",
                            "child-id-3"
                        ],
                        "...": "..." /// More info here
                    }
                }
            ]
        }
    }
]
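
Because the request uses threaded=false, the comments come back as a flat list, and parent_id is what ties a reply to its parent. The sketch below groups comment ids by parent_id. It assumes parent_id carries a type prefix (t3_ for the post, t1_ for another comment); the sample ids and the groupByParent name are made up for illustration.

```javascript
// Sample shaped like the "children" array above; the ids are invented.
const children = [
    { kind: "t1", data: { id: "aaa", parent_id: "t3_lmj453", body: "Top-level comment" } },
    { kind: "t1", data: { id: "bbb", parent_id: "t1_aaa", body: "Reply to aaa" } },
    { kind: "more", data: { count: 2, children: ["ccc", "ddd"] } },
];

// Hypothetical helper: maps each parent_id to the ids of its direct replies.
const groupByParent = (items) => {
    const byParent = new Map();
    for (const item of items) {
        if (item.kind !== "t1") continue; // skip "more" placeholders
        const replies = byParent.get(item.data.parent_id) || [];
        replies.push(item.data.id);
        byParent.set(item.data.parent_id, replies);
    }
    return byParent;
};

const grouped = groupByParent(children);
console.log(grouped.get("t3_lmj453")); // [ 'aaa' ]
console.log(grouped.get("t1_aaa")); // [ 'bbb' ]
```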

Get All Comments

To get all of the comments for a post, call the get comments endpoint above and then use the morechildren API to fetch the rest. The { "kind": "more" } entries in the response above hold the ids required to fetch the remaining comments.

Docs: https://www.reddit.com/dev/api/#GET_api_morechildren

Request

const linkId = `t3_${postId}`; // The post id from the original request; the t3_ prefix signifies a post
const children = "child-id-1,child-id-2,child-id-3"; // A comma separated list of child ids from the "more" data. NB: at most 100 ids per request

fetch(`https://oauth.reddit.com/api/morechildren?link_id=${linkId}&children=${children}&api_type=json`, {
    method: "GET",
    headers: {
        Authorization: `Bearer ${access_token}`
    }
})
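
Since only 100 child ids can be sent per request, a long list from a "more" entry has to be split up first. This is one possible sketch; toBatches is a hypothetical helper, and the default size of 100 is taken from the note above.

```javascript
// Hypothetical helper: splits a list of child ids into comma separated
// strings of at most `size` ids each, ready for the children parameter.
const toBatches = (ids, size = 100) => {
    const batches = [];
    for (let i = 0; i < ids.length; i += size) {
        batches.push(ids.slice(i, i + size).join(","));
    }
    return batches;
};

// 250 made-up ids become three requests' worth of children values.
const ids = Array.from({ length: 250 }, (_, i) => `id${i}`);
console.log(toBatches(ids).map((batch) => batch.split(",").length)); // [ 100, 100, 50 ]
```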

Response

{
    "json": {
        "errors": [],
        "data": {
            "things": [
                {
                    "kind": "t1",
                    "data": {
                        "author": "Comment Author",
                        "body": "Comment Body",
                        "id": "Comment Id",
                        "parent_id": "Parent Id",
                        "...": "..." /// More info here
                    }
                }
            ]
        }
    }
}

const fetch = require("node-fetch"); // node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const { URLSearchParams } = require("url");

// Add credentials here
const id = "";
const secret = "";
const username = "";
const password = "";

// Sorts a listing child into either the comments list or the "more" queue
const extractComments = (child, comments, more) => {
    switch (child.kind) {
        case "t1":
            comments.push({
                body: child.data.body,
                author: child.data.author,
                id: child.data.id,
                parent_id: child.data.parent_id,
            });
            break;
        case "more":
            if (child.data.count > 0) {
                more.push(child.data.children);
            }
            break;
    }
};

// Requests an access token using the password grant
const auth = async () => {
    const basicAuth = Buffer.from(`${id}:${secret}`).toString("base64");
    const params = new URLSearchParams();
    params.append("grant_type", "password");
    params.append("username", username);
    params.append("password", password);
    const res = await fetch("https://www.reddit.com/api/v1/access_token", {
        method: "POST",
        headers: {
            Authorization: `Basic ${basicAuth}`,
        },
        body: params,
    });
    const body = await res.json();
    return body.access_token;
};

// Fetches the first page of comments for a post
const getPost = async (postId, access_token) => {
    const sort = "old";
    const threaded = false;
    const res = await fetch(
        `https://oauth.reddit.com/comments/${postId}?sort=${sort}&threaded=${threaded}`,
        {
            method: "GET",
            headers: {
                Authorization: `Bearer ${access_token}`,
            },
        }
    );
    const body = await res.json();
    const comments = [];
    const more = [];
    // body[0] is the post itself, body[1] is the comment listing
    body[1].data.children.forEach((child) => {
        extractComments(child, comments, more);
    });
    return { comments, more };
};

// Fetches the comments behind a comma separated list of "more" child ids
const getMoreChildren = async (linkId, children, access_token) => {
    const res = await fetch(
        `https://oauth.reddit.com/api/morechildren?link_id=${linkId}&children=${children}&api_type=json`,
        {
            method: "GET",
            headers: {
                Authorization: `Bearer ${access_token}`,
            },
        }
    );
    const body = await res.json();
    const comments = [];
    const more = [];
    body.json.data.things.forEach((thing) => {
        extractComments(thing, comments, more);
    });
    return { comments, more };
};

const getAllComments = async (postId) => {
    const token = await auth();
    const { comments, more } = await getPost(postId, token);

    // "more" is a queue of id lists; keep going until every list is drained
    while (more.length) {
        const current = more.shift();
        const selection = current.splice(0, 100); // NOTE: We can only query 100 at a time
        if (current.length) {
            more.push(current);
        }
        const { comments: moreComments, more: moreMore } = await getMoreChildren(
            `t3_${postId}`,
            selection.join(","),
            token
        );
        comments.push(...moreComments);
        // moreMore is itself a list of id lists, so spread it to keep the queue flat
        more.push(...moreMore);
    }
    return comments;
};

getAllComments("lmj453").then((comments) => {
    console.log(comments);
});