@Kas-tle
Last active February 16, 2023 05:20
Scraping an Enjin Site via the Enjin API

Authentication

Obtain a Session ID

First, obtain a Session ID by logging in to the site with your email and password. The Session ID is returned in the session_id field of the result object in the response.

Request

Set the FORUM_DOMAIN variable to the domain of the forum you wish to scrape.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "email": "XXX@XXX.XXX",
        "password": "XXX"
	},
	"method":"User.login"
}

Response

The Session ID (herein {{$SessionID}}) is the value of the session_id field in the result object.

{
  "result": {
    "session_id": "{{$SessionID}}",
    //...
    },
    "id": "12345",
    "jsonrpc": "2.0"
}
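
The login exchange above can be sketched in Python using only the standard library. This is a minimal sketch rather than a definitive client: the helper names build_request, rpc_call, and login are hypothetical, and the check for an error member assumes the server follows the JSON-RPC 2.0 convention for failures, which the responses shown here do not confirm.

```python
import json
import urllib.request


def build_request(method, params, request_id="12345"):
    """Build the JSON-RPC 2.0 envelope used by every request in this guide."""
    return {"jsonrpc": "2.0", "id": request_id, "params": params, "method": method}


def rpc_call(api_url, method, params):
    """POST one JSON-RPC request and return the `result` member of the reply."""
    body = json.dumps(build_request(method, params)).encode("utf-8")
    request = urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumption: failures are reported in a JSON-RPC 2.0 style `error` member.
    if reply.get("error"):
        raise RuntimeError(f"{method} failed: {reply['error']}")
    return reply["result"]


def login(api_url, email, password):
    """Call User.login and return the session ID from the result object."""
    result = rpc_call(api_url, "User.login", {"email": email, "password": password})
    return result["session_id"]
```

With these helpers, login("https://www.FORUM_DOMAIN.com/api/v1/api.php", "XXX@XXX.XXX", "XXX") would return the value to use as {{$SessionID}} in later requests.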

Forums

Obtain Forum IDs

Before obtaining individual forum IDs, you must obtain the Module ID for the forum module (herein {{$ForumModuleID}}) you wish to scrape. This can be obtained in the admin panel of your site under "Modules". Using the left side panel, you can filter to the type "Forum Board". Make a list of the Module IDs you wish to scrape. You can then follow this process for each module.

Request

Send a request to the Forum.getCategoriesAndForums method. The preset_id field is the Module ID of the forum module you wish to scrape. The session_id field is the Session ID you obtained in the previous step.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "preset_id": "{{$ForumModuleID}}",
        "session_id": "{{$SessionID}}"
	},
	"method":"Forum.getCategoriesAndForums"
}

Response

Forum IDs are contained in the forum_id field of each forum object, and forum objects appear in two places in the result object. The categories object is keyed by category ID; each category is itself an object keyed by forum ID, whose values are forum objects. The subforums object is keyed by forum ID; each of its values is a list of forum objects.

Taken together, these should comprise all Forum IDs for the module.

{
  "result": {
    "subforums": {
      "4511155": [
        {
            //...
            "forum_id": "0000000"
            //...
        }
      ]
    },
    "categories": {
      "0000000": {
        "0000001": {
            //...
            "forum_id": "0000001"
        },
        "0000002": {
            //...
            "forum_id": "0000002"
            //...
        }
      },
      "0000001": {
        "0000003": {
            //...
            "forum_id": "0000003"
            //...
        }
      }
    }
  },
  "id": "12345",
  "jsonrpc": "2.0"
}
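
Walking the two nested structures described above might look like the following sketch. The function name collect_forum_ids is hypothetical, and the `or {}` fallback assumes an empty collection may arrive as an empty list rather than an empty object, which the example response does not show.

```python
def collect_forum_ids(result):
    """Return the sorted, de-duplicated forum IDs found in the `result`
    object of a Forum.getCategoriesAndForums response."""
    forum_ids = set()
    # categories: {category_id: {forum_id: forum_object}}
    for forums in (result.get("categories") or {}).values():
        for forum in forums.values():
            forum_ids.add(forum["forum_id"])
    # subforums: {forum_id: [forum_object, ...]}
    for children in (result.get("subforums") or {}).values():
        for forum in children:
            forum_ids.add(forum["forum_id"])
    return sorted(forum_ids)
```

A set is used because nothing in the response rules out the same forum being listed in both places.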

Obtain Thread IDs

Once you have scraped the Forum IDs, you can use them to scrape the Thread IDs. From this point in the API onward, responses are paginated. Luckily, each response includes a pages field, which gives the total number of pages. Store this value from each response and keep requesting the next page for as long as the page number you are about to request does not exceed it, scraping the Thread IDs from each page as you go.

Request

Send a request to the Forum.getForum method. The forum_id field is the Forum ID you wish to scrape. The session_id field is the Session ID you obtained in the previous step. The page field is the page number you wish to scrape.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "session_id": "{{$SessionID}}",
        "forum_id": "{{$ForumID}}",
        "page": "1"
	},
	"method":"Forum.getForum"
}

Response

Thread IDs are contained in the thread_id field of each thread object. The result object holds two lists of thread objects: threads, which contains the regular threads, and sticky, which contains the sticky threads.

Be sure to continue to scrape the next page until the pages field is equal to the page field.

{
  "result": {
    "sticky": [
      {
        "thread_id": "0000000"
        //...
      }
    ],
    "threads": [
      {
        "thread_id": "0000001"
        //...
      },
      {
        "thread_id": "0000002"
        //...
      }
    ],
    "page": "1",
    "pages": 1
  },
  "id": "12345",
  "jsonrpc": "2.0"
}
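
The page loop for one forum can be sketched as follows. Here fetch_page is a hypothetical callable that performs the Forum.getForum request for a given page number and returns the result object; the de-duplication assumes sticky threads may be repeated on more than one page, which this guide does not confirm.

```python
def collect_thread_ids(fetch_page):
    """Collect thread IDs from every page of one forum.

    `fetch_page(page)` is a hypothetical callable that returns the
    `result` object of a Forum.getForum response for that page.
    """
    thread_ids = []
    seen = set()
    page = 1
    while True:
        result = fetch_page(page)
        threads = (result.get("sticky") or []) + (result.get("threads") or [])
        for thread in threads:
            # Assumption: sticky threads may repeat across pages, so skip repeats.
            if thread["thread_id"] not in seen:
                seen.add(thread["thread_id"])
                thread_ids.append(thread["thread_id"])
        # `pages` is a number in the example response while `page` is a
        # string, so compare as integers.
        if page >= int(result["pages"]):
            break
        page += 1
    return thread_ids
```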

Obtain Threads

Once you have scraped the Thread IDs, these can be used to scrape the Thread data. This process will need to be repeated for each obtained Thread ID.

These responses will also be paginated. Again, the API provides a pages field in the response, which tells us how many pages there are. For each response, we can store this value and test if it is greater than the page value we are about to request. We can then loop through each page, and scrape the Thread data from each page.

Request

Send a request to the Forum.getThread method. The thread_id field is the Thread ID you wish to scrape. The session_id field is the Session ID you obtained in the previous step. The page field is the page number you wish to scrape.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "session_id": "{{$SessionID}}",
        "thread_id": "{{$ThreadID}}",
        "page": "1"
	},
	"method":"Forum.getThread"
}

Response

Be sure to continue to scrape the next page until the page number you requested is equal to the pages field.

It would likely be advisable to download any assets contained in the post as well, as those hosted on Enjin servers will likely be deleted when the site goes down.

{
  "result": {
    "thread": {
      //...
    },
    "posts": [
      {
        "post_id": "000000001",
        "post_time": "1500574914",
        "post_content": "...",
        "post_content_html": "...",
        "post_content_clean": "...",
        "post_user_id": "00000001",
        "show_signature": "0",
        "last_edit_time": "0",
        "post_votes": "0",
        "post_unhidden": "0",
        "post_admin_hidden": "0",
        "post_locked": "0",
        "last_edit_user": "0",
        "votes": null,
        "post_username": "...",
        "avatar": "https:\/\/assets-cloud.enjin.com\/users\/00000001\/avatar\/medium.00000001.jpeg",
        "user_online": false,
        "user_votes": "0",
        "user_posts": "8",
        "url": "https:\/\/www.enjin.com\/ajax.php?s=redirect&cmd=forum-post&mobile=1&preset=00000001&id=00000001"
      }
    ],
    "total_items": "2",
    "pages": 1
  },
  "id": "12345",
  "jsonrpc": "2.0"
}
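
To find the assets worth downloading, one option is to pull the avatar URL and any image sources out of each post. The sketch below assumes post_content_html holds ordinary HTML with img tags (the example response elides its contents), and the names _ImageSources and post_asset_urls are hypothetical.

```python
from html.parser import HTMLParser


class _ImageSources(HTMLParser):
    """Collect the `src` attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def post_asset_urls(post):
    """Return the avatar URL and embedded image URLs for one entry of `posts`."""
    urls = []
    if post.get("avatar"):
        urls.append(post["avatar"])
    parser = _ImageSources()
    # Assumption: post_content_html is HTML; a missing or null value is skipped.
    parser.feed(post.get("post_content_html") or "")
    parser.close()
    urls.extend(parser.sources)
    return urls
```

The returned URLs can then be fetched and stored alongside the saved responses. Attachments referenced in other ways (for example plain links) would need additional handling.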

News

Obtain News Posts

Before obtaining news posts, you must obtain the Module ID for the news module (herein {{$NewsModuleID}}) you wish to scrape. This can be obtained in the admin panel of your site under "Modules". Using the left side panel, you can filter to the type "News / Blog". Make a list of the Module IDs you wish to scrape. You can then follow this process for each module.

These responses will also be paginated. The responses here, however, do not contain a page value. Instead, for each response, we must test the length of result[]. If this is of length 0, we have reached the end of the news posts. We should then delete the most recent response since it contains no useful data. If it is greater than 0, we can continue to scrape the next page and save the current response.

Request

Send a request to the News.getNews method. The preset_id field is the Module ID you wish to scrape. The session_id field is the Session ID you obtained in the previous step. The page field is the page number you wish to scrape.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "session_id": "{{$SessionID}}",
        "preset_id": "{{$NewsModuleID}}",
        "page": "1"
	},
	"method":"News.getNews"
}

Response

Be sure to continue to scrape the next page until the result[] field is of length 0.

{
  "result": [
    {
      "preset_id": "{{$NewsModuleID}}",
      "article_id": "0000001",
      "user_id": "0000001",
      "num_comments": "0",
      "timestamp": "1347150882",
      "status": "1",
      "title": "...",
      "content": "...",
      "commenting_mode": "0",
      "ordering": "101",
      "sticky": "0",
      "last_updated": null,
      "username": "",
      "displayname": "..."
    }
  ],
  "id": "12345",
  "jsonrpc": "2.0"
}
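
Because there is no page count here, the loop stops on the first empty result. A minimal sketch, assuming a hypothetical fetch_page callable that performs the News.getNews request and returns the result list; the max_pages guard is an added safety limit, not something the API defines.

```python
def collect_news(fetch_page, max_pages=10000):
    """Collect news articles page by page until an empty page is returned.

    `fetch_page(page)` is a hypothetical callable that returns the
    `result` list of a News.getNews response for that page.
    """
    articles = []
    for page in range(1, max_pages + 1):
        result = fetch_page(page)
        if not result:
            # An empty page marks the end; it is discarded rather than saved.
            break
        articles.extend(result)
    return articles
```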

Tickets

Obtain Modules

For whatever reason, unlike all other module types, the Enjin API provides a method to obtain all ticket modules for a given site. This makes our job slightly easier, as we do not have to manually obtain these IDs from the admin panel. We can instead obtain them from the endpoint, and then obtain the tickets for each module.

Obtaining the tickets can only be done with a site API key (herein {{$APIKey}}). Per Enjin's documentation, this is found in the same place in which the API is enabled.

To enable your API, visit your admin panel / settings / API area. The content on this page includes your base API URL, your secret API key, and the API mode. Ensure that the API mode is set to "Public".

Request

Send a request to the Tickets.getModules method. The api_key field is the API key you obtained in the previous step.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "api_key": "{{$APIKey}}"
	},
	"method":"Tickets.getModules"
}

Response

The Ticket Module IDs are listed twice, once as the keys of the result field, and once as the preset_id of each member of ...questions[].

{
  "result": {
    "{{$TicketModuleID}}": {
      "module_name": "...",
      "questions": [
        {
          "id": "0001",
          "site_id": "{{$SiteID}}",
          "preset_id": "{{$TicketModuleID}}",
          "type": "text",
          "label": "...",
          "required": "1",
          "bold": "0",
          "help_text": "...",
          "order": "0",
          "other_options": {
            "bbcode": "0",
            "lines": "4",
            "min": "1",
            "max": "100"
          },
          "options": null,
          "conditions": null,
          "condition_qualify": "all_true",
          "system": "1"
        }
      ]
    }
  },
  "id": "12345",
  "jsonrpc": "2.0"
}
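
Since the module IDs are the keys of the result object, extracting them is a small step. A sketch, with the hypothetical name ticket_module_names and the assumption that a site without ticket modules may return an empty list instead of an object:

```python
def ticket_module_names(result):
    """Map each ticket module ID to its module_name, given the `result`
    object of a Tickets.getModules response."""
    if not result:
        # Assumption: no modules may be reported as an empty list or object.
        return {}
    return {module_id: module["module_name"] for module_id, module in result.items()}
```

The keys of the returned dictionary are the values to use as {{$TicketModuleID}}.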

Obtain Tickets

Using each obtained Ticket Module ID, the tickets for each module can be obtained. These responses will also be paginated. The responses here do contain a page value. We can simply continue until our page value is greater than the total number of pages.

Request

Send a request to the Tickets.getTickets method. The preset_id field is the Module ID you wish to scrape. The session_id field is the Session ID you obtained in the previous step. The page field is the page number you wish to scrape.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "session_id": "{{$SessionID}}",
        "preset_id": "{{$TicketModuleID}}",
        "status": "all",
        "page": "1"
	},
	"method":"Tickets.getTickets"
}

Response

Be sure to continue until result.pagination.last_page is less than the page you are about to scrape.

{
  "result": {
    "results": [
      {
        "id": "0000001",
        "code": "a3d65661",
        "site_id": "{{$SiteID}}",
        "preset_id": "{{$TicketModuleID}}",
        "subject": "...",
        "created": "1673888558",
        "status": "open",
        "assignee": "00000001",
        "updated": "1673888558",
        "requester": "00000002",
        "priority": "1",
        "extra_questions": "...",
        "status_change": "1673888558",
        "email": null,
        "viewers": false,
        "createdHTML": "8 hours ago",
        "updatedHTML": "8 hours ago",
        "requesterHTML": "...",
        "assigneeText": "...",
        "assigneeHTML": "...",
        "priority_name": "Low",
        "replies_count": 0,
        "private_reply_count": 0
      }
    ],
    "pagination": {
      "page": "1",
      "nr_pages": 4,
      "nr_results": "92",
      "first_page": 1,
      "last_page": 4
    }
  },
  "id": "12345",
  "jsonrpc": "2.0"
}
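
For tickets, the stopping condition comes from result.pagination.last_page. A minimal sketch, assuming a hypothetical fetch_page callable that performs the Tickets.getTickets request for one module and returns the result object:

```python
def collect_tickets(fetch_page):
    """Collect the tickets of one module across all pages.

    `fetch_page(page)` is a hypothetical callable that returns the
    `result` object of a Tickets.getTickets response for that page.
    """
    tickets = []
    page = 1
    while True:
        result = fetch_page(page)
        tickets.extend(result.get("results") or [])
        # Stop once the page just fetched is the last one reported.
        if page >= int(result["pagination"]["last_page"]):
            break
        page += 1
    return tickets
```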

Applications

Obtain Application Types

Before obtaining applications, you must obtain the possible application types. This is the only way to ensure all Application IDs are later obtained, as the Applications.getList endpoint used to list them requires a type as input.

Request

Send a request to the Applications.getTypes method. No parameters are needed for this endpoint.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
	},
	"method":"Applications.getTypes"
}

Response

The types used later will be the keys listed under result, not the values.

{
  "result": {
    "open": "Open",
    "approved": "Approved",
    "rejected": "Rejected",
    "general": "General",
    "archive": "Archive",
    "trash": "Trash",
    "my-applications": "My Apps"
  },
  "id": "12345",
  "jsonrpc": "2.0"
}

Obtain Site ID

The Site ID (herein {{$SiteID}}) is not required for the Applications.getList endpoint, but it will ensure that we only obtain applications associated with the site we are scraping.

Request

Send a request to the Site.getStats method. No parameters are needed for this endpoint.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
	},
	"method":"Site.getStats"
}

Response

The Site ID is listed under result.latest_user.site_id.

{
  "result": {
    "total_users": "1",
    "latest_user": {
      "site_id": "{{$SiteID}}",
      "user_id": "0000001",
      "access": "2",
      "datejoined": "1673893006",
      "lastseen": "1673897992",
      "post_count": "0",
      "forum_votes": "0",
      "forum_up_votes": "0",
      "forum_down_votes": "0",
      "banned_date": "0",
      "banned_expiration": "0",
      "allow_issue_warnings": null,
      "allow_issue_punishments": null,
      "banned_by": "system",
      "banned_by_id": "0",
      "banned_reason": ""
    }
  },
  "id": "12345",
  "jsonrpc": "2.0"
}

Obtain Application IDs

Once you have the Site ID and the Application Types, you can obtain the Application IDs. To ensure all Application IDs are obtained, this process must be repeated for each application type.

Furthermore, the Applications.getList endpoint is paginated. To ensure all results are obtained, this process must be repeated for each page. The endpoint does not return the total number of pages, so the process must be repeated until result.items[] is of length 0. This final response of length 0 should then be deleted, and the process should be repeated for the next application type.

Request

Send a request to the Applications.getList method. The session_id is the Session ID obtained earlier. The type is the Application Type obtained earlier. The site_id is the Site ID obtained earlier. The page is the page number, starting at 1.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "session_id": "{{$SessionID}}",
        "type": "{{$type}}",
        "site_id": "{{$SiteID}}",
        "page": "1"
	},
	"method":"Applications.getList"
}

Response

The Application ID is listed at the beginning of each application under result.items[].application_id.

{
  "result": {
    "items": [
      {
        "application_id": "0000001",
        "site_id": "{{$SiteID}}",
        "preset_id": "0000001",
        "title": "...",
        "user_ip": "xxx.xxx.xxx.xxx",
        "is_mine": false,
        "can_manage": true,
        "created": "1668401890",
        "updated": "1671750293",
        "read": true,
        "comments": 0,
        "read_comments": null,
        "app_comments": "1",
        "admin_comments": "1",
        "site_name": "...",
        "user_id": "0000001",
        "is_online": false,
        "admin_online": false,
        "username": "Belle!",
        "avatar": "https:\/\/s3.amazonaws.com\/files.enjin.com\/{{$SiteID}}\/site_logo\/medium.png",
        "admin_user_id": "0000002",
        "admin_username": "...",
        "admin_avatar": "https:\/\/cravatar.eu\/helmavatar\/autumn_carrots\/74.png",
        "site_logo": "https:\/\/s3.amazonaws.com\/files.enjin.com\/{{$SiteID}}\/site_logo\/medium.png"
      }
    ],
    "total": "1"
  },
  "id": "12345",
  "jsonrpc": "2.0"
}
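
The nested loop over application types and pages can be sketched as below. Here type_keys would be the keys of the Applications.getTypes result, and fetch_page is a hypothetical callable that performs the Applications.getList request and returns the result object. The de-duplication assumes one application can be listed under more than one type, which this guide does not confirm.

```python
def collect_application_ids(type_keys, fetch_page):
    """Collect application IDs for every type, paging until `items` is empty.

    `fetch_page(app_type, page)` is a hypothetical callable that returns
    the `result` object of an Applications.getList response.
    """
    application_ids = []
    seen = set()
    for app_type in type_keys:
        page = 1
        while True:
            items = fetch_page(app_type, page).get("items") or []
            if not items:
                # An empty page ends this type; move on to the next one.
                break
            for item in items:
                # Assumption: an application may appear under several types.
                if item["application_id"] not in seen:
                    seen.add(item["application_id"])
                    application_ids.append(item["application_id"])
            page += 1
    return application_ids
```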

Obtain Applications

Using the Application IDs obtained from the previous endpoint, you can obtain the applications themselves. To ensure all applications are obtained, this process must be repeated for each application ID.

Request

Send a request to the Applications.getApplication method. The session_id is the Session ID obtained earlier. The application_id is the Application ID obtained earlier.

POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
	"jsonrpc":"2.0",
	"id":"12345",
	"params":{
        "session_id": "{{$SessionID}}",
        "application_id": "{{$ApplicationID}}"
	},
	"method":"Applications.getApplication"
}

Response

An example response is given below. It should be noted that the fields in user_data correspond to fields in the form. The raw HTML of any form pages should be saved before Enjin goes offline to ensure that the original questions can be correlated with these responses.

{
  "result": {
    "application_id": "{{$ApplicationID}}",
    "site_id": "{{$SiteID}}",
    "preset_id": "0000001",
    "title": "...",
    "user_ip": "xxx.xxx.xxx.xxx",
    "is_mine": false,
    "can_manage": true,
    "created": "1668401890",
    "updated": "1671750293",
    "read": true,
    "comments": 0,
    "app_comments": "1",
    "admin_comments": "1",
    "site_name": "...",
    "user_id": "0000001",
    "is_online": false,
    "username": "...",
    "avatar": "https:\/\/s3.amazonaws.com\/files.enjin.com\/{{$SiteID}}\/site_logo\/medium.png",
    "admin_user_id": "0000002",
    "admin_online": false,
    "admin_username": "...",
    "admin_avatar": "...",
    "site_logo": "https:\/\/s3.amazonaws.com\/files.enjin.com\/{{$SiteID}}\/site_logo\/medium.png",
    "user_data": {
      "8szicgnohx": [
        "..."
      ],
      "xy8oih250y": "...",
      "51md9eq5q5": 19,
      "nicd7towfu": "...",
      "aiaypctyj6": [
        "..."
      ],
      "io0odjma55": "...",
      "aor8go2of7": "...",
      "q62x889j70": "...",
      "guysviozjo": [
        "..."
      ],
      "5osw7rr977": "...",
      "vuxwi9zygo": [
        "..."
      ],
      "27s0622qqw": "..."
    },
    "is_archived": false,
    "is_trashed": false,
    "allow_app_comments": "1",
    "post_app_comments": true,
    "allow_admin_comments": true
  },
  "id": "12345",
  "jsonrpc": "2.0"
}
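
The values in user_data come in mixed shapes in the example above (lists, strings, and numbers), so it may help to normalize them before exporting. A small sketch under that assumption; flatten_user_data is a hypothetical name, and the field keys remain the opaque form keys until they are matched against the saved form HTML.

```python
def flatten_user_data(user_data):
    """Turn each user_data value into a single display string, keyed by
    the original form field key."""
    flat = {}
    for field_key, value in user_data.items():
        if isinstance(value, list):
            # Multi-value answers are joined into one string.
            flat[field_key] = ", ".join(str(entry) for entry in value)
        elif value is None:
            flat[field_key] = ""
        else:
            flat[field_key] = str(value)
    return flat
```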