URLs were found by loading a Google Doc, opening the Chrome inspector, and recording network traffic, then going to File -> Version History -> See Version History.
URL form is https://docs.google.com/document/d/<DOCUMENT ID>/revisions/tiles?id=<DOCUMENT ID>&start=1&showDetailedRevisions=false&filterNamed=false&token=<TOKEN>&includes_info_params=true
Obtained by recording the network whilst loading revision history.
Returns JSON of the form seen in "List.json".
Expanding a revision gave: https://docs.google.com/document/d/1namGVTADAlbFkF2QdrcHdr1ilxmOt_6dYJ8Wx5PBsTg/revisions/tiles?id=1namGVTADAlbFkF2QdrcHdr1ilxmOt_6dYJ8Wx5PBsTg&start=3&end=26&showDetailedRevisions=true&filterNamed=false&token=AC4w5VjCEgQV5tpwQNS5gYEEJ3__xGqcQA%3A1535553491010&includes_info_params=true
with content in "Expanded.json"
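The two tiles URL forms above differ only in the optional `end` parameter and the `showDetailedRevisions` flag, so they can be assembled by one helper. This is a minimal sketch: `build_tiles_url` and its argument names are hypothetical, and it assumes the token is passed in decoded form (the recorded URL shows it percent-encoded, with `%3A` for `:`).

```python
from urllib.parse import urlencode


def build_tiles_url(doc_id, token, start=1, end=None, detailed=False):
    """Build a revisions/tiles URL in the form recorded from the network tab.

    Hypothetical helper: the parameter names come from the observed URLs,
    but nothing here is an official API.
    """
    params = [("id", doc_id), ("start", start)]
    if end is not None:
        # `end` was only seen when expanding a revision.
        params.append(("end", end))
    params += [
        ("showDetailedRevisions", "true" if detailed else "false"),
        ("filterNamed", "false"),
        ("token", token),  # urlencode percent-encodes the ':' in the token
        ("includes_info_params", "true"),
    ]
    base = f"https://docs.google.com/document/d/{doc_id}/revisions/tiles"
    return f"{base}?{urlencode(params)}"
```

Calling it with only a document ID and token gives the list form; passing `start`, `end` and `detailed=True` gives the expanded form.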
URL form is https://docs.google.com/document/u/0/d/<DOCUMENT ID>/showrevision?id=<DOCUMENT ID>&end=<END REVISION NUM>&start=<START REVISION NUM>
Obtained by monitoring the network and selecting a revision in the revision history.
Returns JSON of the form seen in "Revisions.json". This JSON also appears to always start with )]}'
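Because of the leading )]}' the body cannot be handed to a JSON parser directly. A minimal sketch of stripping it first; `parse_revision_response` is a hypothetical name, and the function assumes the prefix is exactly those four characters, optionally followed by whitespace.

```python
import json

GUARD_PREFIX = ")]}'"


def parse_revision_response(body: str):
    """Drop the leading )]}' (if present) and parse the rest as JSON."""
    text = body.lstrip()
    if text.startswith(GUARD_PREFIX):
        text = text[len(GUARD_PREFIX):]
    # json.loads tolerates the newline that may follow the prefix.
    return json.loads(text)
```

The same function also accepts a body without the prefix, so it can be used on the tiles responses as well if they turn out not to carry it.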
Data in `chunkedSnapshot` is split up into chunks. Each chunk consists of a list of entries. Possible entries are detailed below.
For each 'entry':
`ty` indicates the type:
- `as` being metadata
- `is` being content data
`st` indicates the type of metadata. It only appears on metadata entries:
- `document` is for information about the document as a whole
- `headings` is for information about the headings
- `language` is for information about the document language
- `paragraph` is for information about a specific paragraph
- `text` is for information about a specific section of text. Comparable to div?
`si` indicates the character this entry starts at (not applicable for all metadata).
`ei` indicates the character this entry ends at (not applicable for all metadata).
For the content data the following keys have been seen:
- `ibi` seems to indicate the starting index of that chunk of content
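The entry layout described above can be walked with a couple of small helpers. This is a sketch under assumptions: it takes `chunkedSnapshot` to be a list of chunks, each chunk a list of dict entries, and relies only on the `ty` and `st` keys noted here. `split_entries` and `group_metadata` are hypothetical names.

```python
def split_entries(chunked_snapshot):
    """Separate metadata entries (ty == "as") from content entries (ty == "is")."""
    metadata, content = [], []
    for chunk in chunked_snapshot:
        for entry in chunk:
            kind = entry.get("ty")
            if kind == "as":
                metadata.append(entry)
            elif kind == "is":
                content.append(entry)
            # Any other `ty` value has not been observed and is skipped.
    return metadata, content


def group_metadata(metadata):
    """Group metadata entries by their `st` value (document, text, ...)."""
    grouped = {}
    for entry in metadata:
        grouped.setdefault(entry.get("st"), []).append(entry)
    return grouped
```

With the parsed response in `data`, `split_entries(data["chunkedSnapshot"])` would give the two lists, assuming the snapshot sits under that key at the top level.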
For each of these metadata types, the keys indicated are assumed to be part of `sm` unless otherwise specified:
- `language` data types
  - `lgs_l` seems to indicate the language code, eg `en`
- `revision_diff`
  - `revdiff_aid` is the key of the author that made the revision. A null editor is given by `""`
  - `revdiff_dt` sometimes matches `revdiff_aid`. Possibly indicates addition/removal.
  - Further testing has shown no way to link the user ID from this API to that from the official API. Additionally, the colour assigned to each editor is the only unique constant between the revisions themselves and the revision list.
- `text`
  - Not all elements may be present.
  - If the element is appended with `_i`, eg `ts_fgc_i`, it may indicate if the value should be inherited. It's not clear where the value would be inherited from. Possibly this would be from the prior entry, with a fallback to the given value if that is not available.
  - `ts_fgc` is the color of the text. Given as a hex code, eg `#000000`
  - `ts_bgc` is the color of the background surrounding the text (ie, highlight color). Given as a hex code, eg `#000000`
  - `ts_fs` is the font size
  - `ts_ff` is the font face. Given as a font name, eg `Arial`
  - `ts_un` appears to be a flag for if the text is underlined
  - `ts_it` appears to be a flag for if the text is italicised
  - `ts_bt` appears to be a flag for if the text is bolded
  - `ts_st` appears to be a flag for if the text is struck through
- `headings`
  - The top level key here appears to be the level of the heading the following styles apply to.
    - `hs_h1`, `hs_h2`, `hs_h3`, `hs_h4`, `hs_h5`, & `hs_h6` appear to be heading levels 1 -> 6
    - `hs_t` appears to be the title
    - `hs_nt` appears to be the normal text style
    - `hs_st` appears to be the subtitle style
  - `sdef_ts` seems to indicate data about the text style
    - `ts_fgc` is the color of the text. Given as a hex code, eg `#000000`
    - `ts_bgc` is the color of the background surrounding the text (ie, highlight color). Given as a hex code, eg `#000000`
    - `ts_fs` is the font size
    - `ts_ff` is the font face. Given as a font name, eg `Arial`
    - `ts_un` appears to be a flag for if the text is underlined
    - `ts_it` appears to be a flag for if the text is italicised
    - `ts_bt` appears to be a flag for if the text is bolded
    - `ts_st` appears to be a flag for if the text is struck through
  - `sdef_ps` appears to hold the remaining, non-text-style data
    - `ps_hdid` appears to be the id for that type of heading. Observed values include `h.4wm2lu96oxp8`, `h.cfkguxvzv5jl` & `h.pli4mhndnqfr`
  - More info can probably be gleaned from "prettyJson.json"
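Two of the guesses in the notes can be written down as code to make them easier to test against real data: the `_i` inheritance idea for text styles, and the per-level layout of the headings metadata. This is only a sketch of those hypotheses. `resolve_text_style` and `summarize_heading_styles` are hypothetical names, the key names are the ones listed in these notes, and the inputs are assumed to be the `sm` dicts of the relevant entries.

```python
STYLE_KEYS = ("ts_fgc", "ts_bgc", "ts_fs", "ts_ff",
              "ts_un", "ts_it", "ts_bt", "ts_st")

HEADING_KEYS = {
    "hs_t": "title", "hs_st": "subtitle", "hs_nt": "normal text",
    "hs_h1": "heading 1", "hs_h2": "heading 2", "hs_h3": "heading 3",
    "hs_h4": "heading 4", "hs_h5": "heading 5", "hs_h6": "heading 6",
}


def resolve_text_style(sm, previous=None):
    """Apply the guessed rule: if `<key>_i` is truthy, take the value from
    the prior entry's resolved style, falling back to the given value."""
    previous = previous or {}
    resolved = {}
    for key in STYLE_KEYS:
        if sm.get(key + "_i") and key in previous:
            resolved[key] = previous[key]
        elif key in sm:
            resolved[key] = sm[key]
    return resolved


def summarize_heading_styles(sm):
    """Map each heading-level key that is present to its text style and id."""
    summary = {}
    for key, label in HEADING_KEYS.items():
        style = sm.get(key)
        if not isinstance(style, dict):
            continue  # level not present in this entry
        summary[label] = {
            "text_style": style.get("sdef_ts", {}),
            "heading_id": style.get("sdef_ps", {}).get("ps_hdid"),
        }
    return summary
```

If the resolved styles disagree with what the document actually shows, that would be evidence against the inheritance guess rather than a bug in the data.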
Any idea how one might infer the page number for a particular character number? That is, if I know the `si` and `ei` for a word in the text, is there a way to infer the page number from that?