@syntaxi
Last active May 6, 2020 22:16
Deciphering google doc revision data

General

The URLs below were found by loading a Google Doc, opening the Chrome inspector, and recording network traffic,
then going to File -> Version History -> See Version History.

Revision List

URL form is https://docs.google.com/document/d/<DOCUMENT ID>/revisions/tiles?id=<DOCUMENT ID>&start=1&showDetailedRevisions=false&filterNamed=false&token=<TOKEN>&includes_info_params=true

Obtained by recording network traffic whilst loading the revision history. It returns JSON of the form seen in "List.json".

Expanding a revision requested https://docs.google.com/document/d/1namGVTADAlbFkF2QdrcHdr1ilxmOt_6dYJ8Wx5PBsTg/revisions/tiles?id=1namGVTADAlbFkF2QdrcHdr1ilxmOt_6dYJ8Wx5PBsTg&start=3&end=26&showDetailedRevisions=true&filterNamed=false&token=AC4w5VjCEgQV5tpwQNS5gYEEJ3__xGqcQA%3A1535553491010&includes_info_params=true and returned the content in "Expanded.json".
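
As a minimal sketch, the tiles URL form above can be assembled like this. The function name and its parameters are hypothetical, and it only builds the string: the token still has to be copied from a recorded request, and nothing here is confirmed as a supported API.

```python
from typing import Optional
from urllib.parse import urlencode


def build_tiles_url(doc_id: str, token: str, start: int = 1,
                    end: Optional[int] = None, detailed: bool = False) -> str:
    """Build a revisions/tiles URL in the form observed in the network log."""
    params = {"id": doc_id, "start": start}
    if end is not None:
        # "end" was only seen on the request made when expanding a revision.
        params["end"] = end
    params.update({
        "showDetailedRevisions": str(detailed).lower(),
        "filterNamed": "false",
        "token": token,
        "includes_info_params": "true",
    })
    # urlencode percent-encodes the ":" in the token, matching the %3A seen above.
    return f"https://docs.google.com/document/d/{doc_id}/revisions/tiles?{urlencode(params)}"
```

Calling build_tiles_url(doc_id, token, start=3, end=26, detailed=True) reproduces the shape of the expanded request.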

Revision data

URL form is https://docs.google.com/document/u/0/d/<DOCUMENT ID>/showrevision?id=<DOCUMENT ID>&end=<END REVISION NUM>&start=<START REVISION NUM>

Obtained by monitoring the network and selecting a revision in the revision history. It returns JSON of the form seen in "Revisions.json". The response body also appears to always start with )]}' which is not valid JSON and has to be stripped before parsing.
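
A small sketch of handling the )]}' prefix before parsing. The helper name is made up for illustration, and treating the prefix as optional is an assumption, since the notes only say it "appears to" always be there.

```python
import json

# Leading guard observed on showrevision responses.
GUARD_PREFIX = ")]}'"


def parse_revision_response(body: str) -> dict:
    """Strip the leading )]}' guard, if present, and parse the rest as JSON."""
    text = body.lstrip()
    if text.startswith(GUARD_PREFIX):
        text = text[len(GUARD_PREFIX):]
    return json.loads(text)
```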

Data in chunkedSnapshot is split up into chunks. Each chunk consists of a list of entries. The possible entries are detailed below.

For each 'entry':

  • ty indicates the type
    • as means metadata
    • is means content data
  • st indicates the type of metadata. Only appears on metadata entries
    • document is for information about the document as a whole
    • headings is for information about the headings
    • language is for information about the document language
    • paragraph is for information about a specific paragraph
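    • text is for information about a specific section of text. Possibly comparable to a div?
    • revision_diff is for information about which author changed a range of text (seen in the sample data, detailed below)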
  • si indicates the character this entry starts at (Not applicable for all metadata)
  • ei indicates the character this entry ends at (Not applicable for all metadata)
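
To make the entry layout above concrete, here is a rough sketch that walks chunkedSnapshot and separates content entries from metadata entries, grouping the metadata by st. The function name is hypothetical, and the code only relies on the ty and st keys described above.

```python
from collections import defaultdict


def group_entries(chunked_snapshot):
    """Return (content entries, {st: [metadata entries]}) for a chunkedSnapshot."""
    content = []
    metadata = defaultdict(list)
    for chunk in chunked_snapshot:      # each chunk is a list of entries
        for entry in chunk:
            if entry.get("ty") == "is":     # content data
                content.append(entry)
            elif entry.get("ty") == "as":   # metadata, keyed by its "st" type
                metadata[entry.get("st")].append(entry)
    return content, dict(metadata)
```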

For the content data the following keys have been seen:

  • ibi seems to indicate the starting index of that chunk of content
  • s holds the text of that chunk (seen in the sample data)
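
A sketch of how si and ei might be mapped back onto the content text. It assumes, based only on the sample data, that the text lives in the s key, that indices are inclusive, and that they count from ibi (1 in the sample). The function name is made up, and a document with several content entries is not handled.

```python
def entry_text(content_entry, si, ei):
    """Return the text covered by the inclusive range si..ei.

    Assumption: positions are offsets from the entry's "ibi" value, so
    position ibi is the first character of "s".
    """
    ibi = content_entry["ibi"]
    text = content_entry["s"]
    start = max(si - ibi, 0)   # metadata with si=0 starts before the first character
    return text[start:ei - ibi + 1]
```

With the sample data, the range 1 to 19 gives the first line and 20 to 42 gives the newline plus the second line, which lines up with the two revision_diff entries.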

For each of these metadata types, the keys indicated are assumed to be part of sm unless otherwise specified:

  • language data types

    • lgs_l seems to indicate the language code, eg 'en'
  • revision_diff

    • revdiff_aid is the key of the author that made the revision. In the sample data it matches a key in userInfo. A null editor is given by ""
    • revdiff_dt sometimes matches revdiff_aid. Possibly indicates addition/removal.
    • Further testing has shown no way to link the user ID from this API to that from the official API. Additionally, the colour assigned to each editor is the only unique constant between the revisions themselves and the revision list.
  • text

    • Not all elements may be present.
    • If a key has _i appended, eg ts_fgc_i, it may indicate whether the value should be inherited. It's not clear where the value would be inherited from. Possibly this would be from the prior entry, with a fallback to the given value if that is not available.
    • ts_fgc is the color of the text. Given as a hex code, eg #000000
    • ts_bgc is the color of the background surrounding the text (ie, highlight color). Given as a hex code, eg #000000
    • ts_fs is the font size
    • ts_ff is the font face. Given as a font name, eg Arial
    • ts_un appears to be a flag for if the text is underlined
    • ts_it appears to be a flag for if the text is italicised
    • ts_bd appears to be a flag for if the text is bolded
    • ts_st appears to be a flag for if the text is struckthrough
  • headings

    • The top level key here appears to be the level of the heading the following styles apply to.
      • hs_h1, hs_h2, hs_h3, hs_h4, hs_h5, & hs_h6 appear to be heading levels 1 -> 6
      • hs_t appears to be the title
      • hs_nt appears to be the normal text style
      • hs_st appears to be the subtitle style
    • sdef_ts seems to indicate data about the text style
      • ts_fgc is the color of the text. Given as a hex code, eg #000000
      • ts_bgc is the color of the background surrounding the text (ie, highlight color). Given as a hex code, eg #000000
      • ts_fs is the font size
      • ts_ff is the font face. Given as a font name, eg Arial
      • ts_un appears to be a flag for if the text is underlined
      • ts_it appears to be a flag for if the text is italicised
      • ts_bd appears to be a flag for if the text is bolded
      • ts_st appears to be a flag for if the text is struckthrough
    • sdef_ps appears to indicate data about the paragraph style (its keys share the ps_ prefix seen on paragraph entries)
      • ps_hdid appears to be the id for that type of heading. Observed values include h.4wm2lu96oxp8, h.cfkguxvzv5jl & h.pli4mhndnqfr

    More info can probably be gleaned from prettyJson.json
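
As a rough sketch of the author linking described under revision_diff: the first helper resolves each revdiff_aid through userInfo, and the second matches userInfo entries to the revision list's userMap by colour, since that is the only shared value noted above. Both function names are hypothetical, and the colour match assumes no two editors share a colour.

```python
def attribute_ranges(entries, user_info):
    """Return (si, ei, author name, colour) for each revision_diff entry."""
    rows = []
    for entry in entries:
        if entry.get("ty") == "as" and entry.get("st") == "revision_diff":
            aid = entry["sm"].get("revdiff_aid", "")   # "" means a null editor
            user = user_info.get(aid, {})
            rows.append((entry["si"], entry["ei"], user.get("name"), user.get("color")))
    return rows


def link_by_color(user_info, user_map):
    """Map userInfo keys to userMap IDs by matching colours (assumed unique)."""
    color_to_id = {user["color"]: uid for uid, user in user_map.items()}
    return {aid: color_to_id.get(user["color"]) for aid, user in user_info.items()}
```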

Expanded.json (tiles response for the expanded revision, start=3&end=26)

{
"tileInfo": [
{
"start": 3,
"end": 25,
"endMillis": 1535542604486,
"users": [
"06981672446100480936",
"03685920490996254104"
],
"systemRevs": [
],
"expandable": false,
"revisionMac": "Nk5moysGFPv2Jw"
},
{
"start": 26,
"end": 26,
"endMillis": 1535542751265,
"users": [
"03685920490996254104"
],
"systemRevs": [
],
"expandable": false,
"revisionMac": "VfyNdfRG8ucNTg"
}
],
"userMap": {
"06981672446100480936": {
"name": "Quinn Roberts",
"photo": "\/\/lh5.googleusercontent.com\/-xmINQGIqmyc\/AAAAAAAAAAI\/AAAAAAAAFKE\/Zlc10fkL0WE\/s50-c-k-no\/photo.jpg",
"color": "#26A69A",
"anonymous": false
},
"03685920490996254104": {
"name": "Quinn Roberts",
"photo": "\/\/lh4.googleusercontent.com\/-ZBDtBUdxeqo\/AAAAAAAAAAI\/AAAAAAAAAAw\/xcapKP39Xbg\/s50-c-k-no\/photo.jpg",
"color": "#673AB7",
"anonymous": false
}
},
"firstRev": 3
}
List.json (tiles response for the full revision list)

{
"tileInfo": [
{
"start": 1,
"end": 1,
"endMillis": 1535542533275,
"users": [
"06981672446100480936"
],
"systemRevs": [
],
"expandable": false,
"revisionMac": "Vnq5tSlhVxxmxA"
},
{
"start": 3,
"end": 26,
"endMillis": 1535542751265,
"users": [
"06981672446100480936",
"03685920490996254104"
],
"systemRevs": [
],
"expandable": true,
"revisionMac": "VfyNdfRG8ucNTg"
},
{
"start": 27,
"end": 31,
"endMillis": 1535552963461,
"users": [
"03685920490996254104"
],
"systemRevs": [
],
"expandable": false,
"revisionMac": "azH8Ly7haPkMXA"
}
],
"userMap": {
"06981672446100480936": {
"name": "Quinn Roberts",
"photo": "\/\/lh5.googleusercontent.com\/-xmINQGIqmyc\/AAAAAAAAAAI\/AAAAAAAAFKE\/Zlc10fkL0WE\/s50-c-k-no\/photo.jpg",
"color": "#26A69A",
"anonymous": false
},
"03685920490996254104": {
"name": "Quinn Roberts",
"photo": "\/\/lh4.googleusercontent.com\/-ZBDtBUdxeqo\/AAAAAAAAAAI\/AAAAAAAAAAw\/xcapKP39Xbg\/s50-c-k-no\/photo.jpg",
"color": "#673AB7",
"anonymous": false
}
},
"firstRev": 1
}
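
A short sketch for reading a tiles response such as the ones above. It assumes endMillis is a Unix timestamp in milliseconds, which the notes do not state but which fits the sample values, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def summarize_tiles(tiles_response):
    """Return (start, end, end time in UTC, user names, expandable) per tile."""
    user_map = tiles_response.get("userMap", {})
    rows = []
    for tile in tiles_response.get("tileInfo", []):
        # Assumption: endMillis counts milliseconds since the Unix epoch.
        ended = datetime.fromtimestamp(tile["endMillis"] / 1000, tz=timezone.utc)
        # Fall back to the raw ID when a user is missing from userMap.
        names = [user_map.get(uid, {}).get("name", uid) for uid in tile["users"]]
        rows.append((tile["start"], tile["end"], ended, names, tile["expandable"]))
    return rows
```

Tiles with expandable set to true are the ones that produced a follow-up request with start and end when expanded.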
Revision data (showrevision response)

{
"chunkedSnapshot": [
[
{
"ty": "is",
"ibi": 1,
"s": "This is a test line\nThis is my second test"
},
{
"ty": "as",
"st": "document",
"si": 0,
"ei": 0,
"sm": {
"ds_ulhfl": false
}
},
{
"ty": "as",
"st": "headings",
"si": 0,
"ei": 0,
"sm": {
"hs_h3": {
"sdef_ps": {
"ps_sb_i": false,
"ps_sb": 16
},
"sdef_ts": {
"ts_fgc": "#434343",
"ts_fgc_i": false,
"ts_bd_i": false,
"ts_bd": false
}
},
"hs_t": {
"sdef_ps": {
"ps_hdid": "h.4wm2lu96oxp8",
"ps_sb_i": true,
"ps_sa_i": false,
"ps_sm": 1,
"ps_sm_i": true,
"ps_hd": 100,
"ps_sa": 3,
"ps_sb": 0
},
"sdef_ts": {
"ts_fgc": "#b01513",
"ts_fgc_i": false,
"ts_bd_i": true,
"ts_fs": 26,
"ts_bd": false,
"ts_fs_i": false
}
},
"hs_h2": {
"sdef_ps": {
"ps_hdid": "h.cfkguxvzv5jl",
"ps_sa_i": false,
"ps_sm": 1,
"ps_sm_i": true,
"ps_hd": 2,
"ps_sa": 6
},
"sdef_ts": {
"ts_fgc": "#b01513",
"ts_fgc_i": false,
"ts_bd_i": true,
"ts_fs": 16,
"ts_bd": false,
"ts_fs_i": false
}
},
"hs_h1": {
"sdef_ps": {
"ps_sb_i": false,
"ps_sb": 20
},
"sdef_ts": {
"ts_bd_i": true,
"ts_fs": 20,
"ts_bd": false,
"ts_fs_i": false
}
},
"hs_nt": {
"sdef_ts": {
"ts_ff_i": false,
"ts_ff": "Cambria"
}
},
"hs_st": {
"sdef_ps": {
"ps_hdid": "h.pli4mhndnqfr",
"ps_sb_i": true,
"ps_ls": 1,
"ps_sa_i": true,
"ps_sm": 1,
"ps_sm_i": true,
"ps_hd": 101,
"ps_ls_i": false,
"ps_sa": 0,
"ps_sb": 0
},
"sdef_ts": {
"ts_ff_i": true,
"ts_fgc": "#595959",
"ts_fgc_i": false,
"ts_fs": 14,
"ts_ff": "Cambria",
"ts_fs_i": false
}
},
"hs_h6": {
"sdef_ps": {
"ps_sb_i": false,
"ps_sa_i": false,
"ps_sa": 4,
"ps_sb": 12
},
"sdef_ts": {
"ts_fgc": "#666666",
"ts_fgc_i": false,
"ts_it": true,
"ts_bd_i": true,
"ts_fs": 11,
"ts_it_i": false,
"ts_bd": false,
"ts_fs_i": false
}
},
"hs_h5": {
"sdef_ps": {
"ps_sb_i": false,
"ps_sa_i": false,
"ps_sa": 4,
"ps_sb": 12
},
"sdef_ts": {
"ts_fgc": "#666666",
"ts_fgc_i": false,
"ts_bd_i": true,
"ts_bd": false
}
},
"hs_h4": {
"sdef_ps": {
"ps_sb_i": false,
"ps_sa_i": false,
"ps_sa": 4,
"ps_sb": 14
},
"sdef_ts": {
"ts_fgc": "#666666",
"ts_fgc_i": false,
"ts_bd_i": true,
"ts_bd": false
}
}
}
},
{
"ty": "as",
"st": "language",
"si": 0,
"ei": 0,
"sm": {
"lgs_l": "en"
}
},
{
"ty": "as",
"st": "paragraph",
"si": 20,
"ei": 20,
"sm": {
"ps_klt_i": true,
"ps_awao_i": true,
"ps_sm_i": true,
"ps_ls_i": true,
"ps_il_i": true,
"ps_ir_i": true,
"ps_al_i": true,
"ps_bl_i": true,
"ps_sd_i": true,
"ps_sb_i": true,
"ps_sa_i": true,
"ps_br_i": true,
"ps_bbtw_i": true,
"ps_kwn_i": true,
"ps_bt_i": true,
"ps_ifl_i": true,
"ps_bb_i": true
}
},
{
"ty": "as",
"st": "paragraph",
"si": 43,
"ei": 43,
"sm": {
"ps_klt_i": true,
"ps_awao_i": true,
"ps_sm_i": true,
"ps_ls_i": true,
"ps_il_i": true,
"ps_ir_i": true,
"ps_bgc_i": true,
"ps_al_i": true,
"ps_bl_i": true,
"ps_sd_i": true,
"ps_sb_i": true,
"ps_sa_i": true,
"ps_br_i": true,
"ps_bbtw_i": true,
"ps_kwn_i": true,
"ps_bt_i": true,
"ps_ifl_i": true,
"ps_bb_i": true
}
},
{
"ty": "as",
"st": "revision_diff",
"si": 1,
"ei": 19,
"sm": {
"revdiff_aid": "0",
"revdiff_dt": 1
}
},
{
"ty": "as",
"st": "revision_diff",
"si": 20,
"ei": 42,
"sm": {
"revdiff_aid": "1",
"revdiff_dt": 1
}
},
{
"ty": "as",
"st": "text",
"si": 0,
"ei": 43,
"sm": {
"ts_un": false,
"ts_un_i": true,
"ts_sc": false,
"ts_st_i": true,
"ts_bgc": null,
"ts_fs_i": true,
"ts_bgc_i": true,
"ts_ff_i": true,
"ts_bd_i": true,
"ts_va_i": true,
"ts_fs": 11,
"ts_ff": "Arial",
"ts_bd": false,
"ts_tw": 400,
"ts_it_i": true,
"ts_fgc": "#000000",
"ts_fgc_i": true,
"ts_it": false,
"ts_va": "nor",
"ts_st": false,
"ts_sc_i": true
}
},
{
"ty": "as",
"st": "text",
"si": 1,
"ei": 19,
"sm": {
"ts_fgc": "#00796b",
"ts_fgc_i": false,
"ts_st": false,
"ts_st_i": false
}
},
{
"ty": "as",
"st": "text",
"si": 20,
"ei": 42,
"sm": {
"ts_fgc": "#512da8",
"ts_fgc_i": false,
"ts_st": false,
"ts_st_i": false
}
}
]
],
"userInfo": {
"0": {
"name": "Quinn Roberts",
"color": "#26A69A",
"anonymous": false
},
"1": {
"name": "Quinn Roberts",
"color": "#673AB7",
"anonymous": false
}
},
"suggestionColors": {
}
}
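
One speculative reading of the _i flags on the text entries above is that a true flag marks an inherited or default value and a false flag marks an explicit override. The sketch below merges text entries under that assumption only. It is a guess about the format, not confirmed behaviour, and the function name is made up.

```python
def effective_text_style(entries, index):
    """Merge the "text" entries covering index into one style dict.

    Assumption: a key whose "<key>_i" flag is true is only a fallback, while
    a key whose flag is false (or missing) overrides earlier entries.
    """
    style = {}
    for entry in entries:
        if entry.get("ty") != "as" or entry.get("st") != "text":
            continue
        if not (entry["si"] <= index <= entry["ei"]):
            continue
        sm = entry["sm"]
        for key, value in sm.items():
            if key.endswith("_i"):
                continue  # flags are consulted below, not copied into the result
            inherited = sm.get(key + "_i", False)
            if not inherited or key not in style:
                style[key] = value
    return style
```

Under this reading, a character inside the range 1 to 19 of the sample ends up with ts_fgc #00796b on top of the document-wide Arial 11 defaults.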
@youngblood

Any idea how one might infer the page number that a particular character number falls on? That is, if I know the si and ei for a word in the text, is there a way to infer the page number from that?
