@alexjc
Last active July 27, 2023 22:34
Domain Name Opt-Out Images
1) cdn.shopify.com YES 134,989,900
✓ Loaded data request for https://www.shopify.com/legal/terms from 20f7352cc22027d97700844b4368df8e.pkl []
✓ Retrieved Terms Of Service for https://www.shopify.com/legal/terms with 234,135 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 248 in Terms Of Service. ['highlight', 'paragraph']
❝You agree not to access the Services or monitor any material or
information from the Services using any robot, spider, scraper, or
other automated means.❞
2) i.pinimg.com YES 85,190,559
✓ Loaded data request for https://policy.pinterest.com/en-gb/community-guidelines from 1a38e3617a85185299bf584307e44cba.pkl []
✓ Retrieved Terms Of Service for https://policy.pinterest.com/en-gb/community-guidelines with 120,322 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 192 in Terms Of Service. ['highlight', 'paragraph']
❝Don’t use any undocumented or unsupported method to access, search,
scrape, download or change any part of Pinterest.❞
3) i.ebayimg.com YES 37,013,317
✓ Loaded data request for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 from 27f1e5726ad6ee491fab37c89bcbfd15.pkl []
✓ Retrieved Terms Of Service for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 with 305,622 bytes of text. []
✓ Found a total of 8 matching paragraphs out of 321 in Terms Of Service. ['highlight', 'paragraph']
❝use any robot, spider, scraper, data mining tools, data gathering
and extraction tools, or other automated means to access our
Services for any purpose, except with the prior express permission
of eBay;❞
4) images-na.ssl-images-amazon.com YES 28,936,548
✓ Loaded data request for https://www.amazon.com/gp/help/customer/display.html%3FnodeId%3DGLSBYFE9MGKKQXXM from 9eef0e9202b1101ef60dc0b9928b0f7a.pkl []
✓ Retrieved Terms Of Service for https://www.amazon.com/gp/help/customer/display.html%3FnodeId%3DGLSBYFE9MGKKQXXM with 309,540 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 84 in Terms Of Service. ['highlight', 'paragraph']
❝or any use of data mining, robots, or similar data gathering and
extraction tools.❞
5) thumbs.dreamstime.com YES 24,189,088
✓ Loaded data request for https://www.dreamstime.com/terms from 6a238ed6699afcbd3b4eb66428a463a0.pkl []
✓ Retrieved Terms Of Service for https://www.dreamstime.com/terms with 130,607 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 211 in Terms Of Service. ['highlight', 'paragraph']
❝Additionally, we do not allow the use of automated software or other
crawling techniques for searching our web site and/or retrieve
Media or related information.❞
6) www.specsserver.com ERROR 23,162,113
𐄂 Loaded failed request for https://www.specsserver.com from 4fa8364a397cfb208b57845746fc0e33.pkl []
𐄂 Loaded failed request for https://specsserver.com from 982f0f45ac5cc16eab86ebd024c3a50a.pkl []
7) i0.wp.com MAYBE 23,078,056
✓ Loaded data request for https://wordpress.com/tos/ from 22c13cc537acb524be3915d562660171.pkl []
✓ Retrieved Terms Of Service for https://wordpress.com/tos/ with 113,541 bytes of text. []
𐄂 No direct matches found in 178 paragraphs found at https://wordpress.com/tos/. []
[alexjc] NOTE: WordPress is the best test case for an overly broad interpretation of Opt-Out.
8) render.fineartamerica.com YES 22,986,509
✓ Loaded data request for https://fineartamerica.com/termsofuse.html from b86720da0e390886aef1c40a54383807.pkl []
✓ Retrieved Terms Of Service for https://fineartamerica.com/termsofuse.html with 92,919 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 95 in Terms Of Service. ['highlight', 'paragraph']
❝You may not use any computer program tools, including, but not
limited to, web spiders, bots, indexers, robots, crawlers,
harvesters, or any other automatic device, program, algorithm, or
methodology, or any similar equivalent process❞
9) i.ytimg.com YES 20,182,019
✓ Loaded data request for https://www.youtube.com/t/terms from b81d777b9d0781b3cb533166ad7825a9.pkl []
✓ Retrieved Terms Of Service for https://www.youtube.com/t/terms with 37,899 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 157 in Terms Of Service. ['highlight', 'paragraph']
❝such as robots, botnets or scrapers❞
10) images.slideplayer.com YES 18,485,777
✓ Loaded data request for https://slideplayer.com/support/terms/ from 1752de33251020795a08286aa1d4e703.pkl []
✓ Retrieved Terms Of Service for https://slideplayer.com/support/terms/ with 52,416 bytes of text. []
✓ Found total of 1 paragraphs matching non-commercial activities in Terms Of Service. ['highlight']
❝Except as otherwise provided, the Content published on this Website
may be reproduced or distributed in unmodified form for personal
non-commercial use only.❞
11) us.123rf.com YES 18,283,590
✓ Loaded data request for https://www.123rf.com/terms/ from 21e8f5509d8d49cbc77ef31732333c8e.pkl []
✓ Retrieved Terms Of Service for https://www.123rf.com/terms/ with 94,596 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 138 in Terms Of Service. ['highlight', 'paragraph']
❝terminate your account immediately if we detect you are using, or
are trying to use any automated means to download the Content;❞
12) i2.wp.com MAYBE 17,905,324
✓ Loaded data request for https://wordpress.com/tos/ from 22c13cc537acb524be3915d562660171.pkl []
✓ Retrieved Terms Of Service for https://wordpress.com/tos/ with 113,541 bytes of text. []
𐄂 No direct matches found in 178 paragraphs found at https://wordpress.com/tos/. []
13) i1.wp.com MAYBE 17,775,546
✓ Loaded data request for https://wordpress.com/tos/ from 22c13cc537acb524be3915d562660171.pkl []
✓ Retrieved Terms Of Service for https://wordpress.com/tos/ with 113,541 bytes of text. []
𐄂 No direct matches found in 178 paragraphs found at https://wordpress.com/tos/. []
14) t0.gstatic.com YES 17,336,011
✓ The HTTP request (code 200) from https://policies.google.com/terms?hl=en-us returned 290,994 bytes. ['headers']
✓ Retrieved Terms Of Service for https://policies.google.com/terms?hl=en-us with 290,994 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
15) ae01.alicdn.com YES 16,168,646
✓ Loaded data request for https://rulechannel.alibaba.com/icbu?spm=a1zaa.8161610.0.0.23eb61ecCCJxWd&type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 from bc8844c287d47c690adc02a507f8c1a8.pkl []
✓ Retrieved Terms Of Service for https://rulechannel.alibaba.com/icbu?spm=a1zaa.8161610.0.0.23eb61ecCCJxWd&type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 with 726,574 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 366 in Terms Of Service. ['highlight', 'paragraph']
❝whether through robots, spiders, automatic devices or manual
processes❞
16) www.picclickimg.com YES 13,171,157
✓ Loaded data request for https://picclick.com/pages/terms.html from f2afbd693a1a1de4d2e986b8b1576d47.pkl []
✓ Retrieved Terms Of Service for https://picclick.com/pages/terms.html with 98,102 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 116 in Terms Of Service. ['highlight', 'paragraph']
❝copying, distributing, or disclosing any part of the Site or
Services in any medium, including without limitation by any
automated or non-automated “scraping”;❞
17) media.gettyimages.com YES 12,310,094
✓ Loaded data request for https://www.gettyimages.com/company/terms?language=en-us from f2f79f18d4c5c4f06922ea5b7cee3772.pkl []
✓ Retrieved Terms Of Service for https://www.gettyimages.com/company/terms?language=en-us with 108,322 bytes of text. []
𐄂 Found a possible ToS page but language is 'DE' at https://www.gettyimages.at/company/terms?language=en-us. []
[alexjc] NOTE: The page automatically redirects to a non-English version for me based on my IP,
but the English ToS are correctly parsed.
18) m.media-amazon.com YES 11,849,449
✓ Loaded data request for https://www.amazon.com/gp/help/customer/display.html%3FnodeId%3DGLSBYFE9MGKKQXXM from 9eef0e9202b1101ef60dc0b9928b0f7a.pkl []
✓ Retrieved Terms Of Service for https://www.amazon.com/gp/help/customer/display.html%3FnodeId%3DGLSBYFE9MGKKQXXM with 309,540 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 84 in Terms Of Service. ['highlight', 'paragraph']
❝or any use of data mining, robots, or similar data gathering and
extraction tools.❞
19) images.squarespace-cdn.com YES 10,319,840
✓ Loaded data request for https://www.squarespace.com/acceptable-use-policy/ from 0b4764e36b2d6b216256062e82835f8c.pkl []
✓ Retrieved Terms Of Service for https://www.squarespace.com/acceptable-use-policy/ with 131,824 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 50 in Terms Of Service. ['highlight', 'paragraph']
❝for example, scraping, spidering or crawling❞
20) rlv.zcache.com YES 10,213,734
✓ Loaded data request for https://www.zazzle.com/mk/policy/user_agreement from 1c21c5334b1af9f0be5bb39d41578359.pkl []
✓ Retrieved Terms Of Service for https://www.zazzle.com/mk/policy/user_agreement with 271,424 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 23 in Terms Of Service. ['highlight', 'paragraph']
❝use a manual or automatic device or process to retrieve, index,
"data mine" or in any way reproduce or circumvent the navigational
structure or presentation of the Service;❞
21) static.wixstatic.com YES 9,303,678
✓ Loaded data request for https://www.wix.com/about/terms-of-use from d8dd93974a8aae72c14f65c072588487.pkl []
✓ Retrieved Terms Of Service for https://www.wix.com/about/terms-of-use with 626,561 bytes of text. []
✓ Found a total of 8 matching paragraphs out of 488 in Terms Of Service. ['highlight', 'paragraph']
❝publish and/or make any use of the Wix Services or Licensed Content
on any website, media, network or system other than those provided
by Wix, and/or frame, “deep link”, “page scrape”, mirror and/or
create a browser or border environment around any of the Wix
Services, Licensed Content and/or User Platform❞
22) ssl.c.photoshelter.com YES 9,277,563
✓ Loaded data request for https://company.photoshelter.com/terms/ from be631aff45f59cbe7c0a9059cd72b20a.pkl []
✓ Retrieved Terms Of Service for https://company.photoshelter.com/terms/ with 240,455 bytes of text. []
✓ Found a total of 6 matching paragraphs out of 498 in Terms Of Service. ['highlight', 'paragraph']
❝harvesting or scraping any Content from the Services;❞
23) thumbs.ebaystatic.com YES 9,123,373
✓ Loaded data request for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 from 27f1e5726ad6ee491fab37c89bcbfd15.pkl []
✓ Retrieved Terms Of Service for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 with 305,622 bytes of text. []
✓ Found a total of 8 matching paragraphs out of 321 in Terms Of Service. ['highlight', 'paragraph']
❝use any robot, spider, scraper, data mining tools, data gathering
and extraction tools, or other automated means to access our
Services for any purpose, except with the prior express permission
of eBay;❞
24) photos.smugmug.com YES 8,896,018
✓ Loaded data request for https://www.smugmug.com/about/terms from 29c86d6e6e1b71122b5a3ada51ff185f.pkl []
✓ Retrieved Terms Of Service for https://www.smugmug.com/about/terms with 122,426 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 119 in Terms Of Service. ['highlight', 'paragraph']
❝Scraping or otherwise using any data mining, robots or similar data
gathering or extraction methods on or in connection with the
Services;❞
25) s3.amazonaws.com YES 8,486,239
✓ Loaded data request for https://s3.amazonaws.com from ebd054b47a8c34de5a2bbcf813167752.pkl []
✓ Found 2 obvious ToS link(s) from 3 href matches and 2 text matches. []
✓ Loaded data request for https://aws.amazon.com/legal/?nc1=f_cc from 4e132a7ef45e24a27b1a49734199c749.pkl []
✓ Retrieved Terms Of Service for https://aws.amazon.com/legal/?nc1=f_cc with 369,893 bytes of text. []
𐄂 The ToS page does not appear to contain a legal text at https://aws.amazon.com/legal/?nc1=f_cc. []
✓ Found 7 obvious ToS link(s) from 66 href matches and 9 text matches. []
✓ Loaded data request for https://aws.amazon.com/terms/?nc1=f_pr from a104d097fcba7f4aa3ce3d39f0a45358.pkl []
✓ Retrieved Terms Of Service for https://aws.amazon.com/terms/?nc1=f_pr with 58,317 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 23 in Terms Of Service. ['highlight', 'paragraph']
❝or any use of data mining, robots, or similar data gathering and
extraction tools.❞
[alexjc] NOTE: For S3 domains, the tool finds AWS Site Terms. The Opt-Out should have been checked
on the originating domain instead.
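A minimal sketch of the check this note asks for, assuming the origin page of each image is known. The host list and the function names `needs_origin_check` and `domain_to_check` are hypothetical and are not part of the tool that produced this log.

```python
from typing import Optional

# Hypothetical list of generic storage hosts (assumption); extend as needed.
SHARED_STORAGE_HOSTS = {"s3.amazonaws.com", "storage.googleapis.com"}


def needs_origin_check(image_host: str) -> bool:
    """True when the image host is generic storage, so the Terms of the
    host itself say little about the site that published the image."""
    host = image_host.lower().rstrip(".")
    return any(host == h or host.endswith("." + h) for h in SHARED_STORAGE_HOSTS)


def domain_to_check(image_host: str, origin_host: Optional[str]) -> Optional[str]:
    """Pick the domain whose Terms should be searched for an Opt-Out.

    Returns None when the image sits on shared storage and the
    originating domain was not recorded.
    """
    if needs_origin_check(image_host):
        return origin_host
    return image_host


print(domain_to_check("s3.amazonaws.com", "example.org"))
print(domain_to_check("cdn.shopify.com", "example.org"))
```

Here `example.org` is a placeholder origin. When the origin is missing, the sketch returns `None` rather than falling back to the storage provider's own Terms.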
26) image.made-in-china.com YES 8,156,285
𐄂 Loaded failed request for https://image.made-in-china.com from 937d45cf31ec6b66526b629326b928e7.pkl []
✓ Loaded data request for https://made-in-china.com from f3675802fe0bb4cbae244d4588546a8a.pkl []
✓ Found 1 obvious ToS link(s) from 3 href matches and 1 text matches. []
✓ Loaded data request for https://www.made-in-china.com/help/terms/ from 8e80e3ac4d446b5baec9e4327c6ce91a.pkl []
✓ Retrieved Terms Of Service for https://www.made-in-china.com/help/terms/ with 53,130 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 71 in Terms Of Service. ['highlight', 'paragraph']
❝Collecting information about other users❞
[alexjc] NOTE: The correct section for these terms would be the following, but it doesn't match the current rules:
❝Unless mandatorily provided by law, without a written consent of
Focus Tech., no entity or individual shall copy, reproduce, cite,
link, grasp or otherwise use the aforementioned information or
property in whole or in part in any way, otherwise, Focus Tech.
reserves the right to pursue its legal responsibilities.❞
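To illustrate why a paragraph like this can slip past keyword rules, here is a rough sketch. The pattern is an assumption built from terms that appear in the highlights in this log; it is not the tool's actual rule set, and `AUTOMATION_RULE` and `matches_rule` are hypothetical names.

```python
import re

# Hypothetical keyword rule (assumption): terms taken from highlights in this log.
AUTOMATION_RULE = re.compile(
    r"\b(robots?|spiders?|scrap(?:e|er|ers|ing)|crawl(?:er|ers|ing)?|"
    r"data[ -]min(?:e|ing)|automated)\b",
    re.IGNORECASE,
)


def matches_rule(paragraph: str) -> bool:
    """True if the paragraph mentions any automation-related keyword."""
    return AUTOMATION_RULE.search(paragraph) is not None


ebay = (
    "use any robot, spider, scraper, data mining tools, data gathering "
    "and extraction tools, or other automated means to access our "
    "Services for any purpose, except with the prior express permission "
    "of eBay;"
)
focus_tech = (
    "Unless mandatorily provided by law, without a written consent of "
    "Focus Tech., no entity or individual shall copy, reproduce, cite, "
    "link, grasp or otherwise use the aforementioned information or "
    "property in whole or in part in any way, otherwise, Focus Tech. "
    "reserves the right to pursue its legal responsibilities."
)

print(matches_rule(ebay))
print(matches_rule(focus_tech))
```

The eBay clause matches because it names robots and scrapers. The Focus Tech clause reserves rights in general terms ("copy, reproduce, cite, link, grasp") and contains none of the keywords, so a rule of this kind misses it.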
27) slideplayer.com YES 7,900,772
✓ Loaded data request for https://slideplayer.com/support/terms/ from 1752de33251020795a08286aa1d4e703.pkl []
✓ Retrieved Terms Of Service for https://slideplayer.com/support/terms/ with 52,416 bytes of text. []
✓ Found total of 1 paragraphs matching non-commercial activities in Terms Of Service. ['highlight']
❝Except as otherwise provided, the Content published on this Website
may be reproduced or distributed in unmodified form for personal
non-commercial use only.❞
28) p.keepcalms.com YES 7,623,626
✓ Loaded data request for https://keepcalms.com/privacy/ from 302cb73a61496f832b3d0bdd2d3b685f.pkl []
✓ Retrieved Terms Of Service for https://keepcalms.com/privacy/ with 32,935 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 35 in Terms Of Service. ['highlight', 'paragraph']
❝In addition to log data, we may also collect information about the
device you’re using,including what type of device it is, what
operating system you’re using, device settings, unique device
identifiers, and crash data.❞
[alexjc] NOTE: This appears to be a false positive. The other matching paragraph asks site users
to take precautions, but it's unclear whether it counts as a reservation:
❝Please make sure that you respect copyrights, trademarks and other
legal rights when using our website❞
29) ecx.images-amazon.com YES 7,491,200
✓ Loaded data request for https://www.amazon.com/gp/help/customer/display.html%3FnodeId%3DGLSBYFE9MGKKQXXM from 9eef0e9202b1101ef60dc0b9928b0f7a.pkl []
✓ Retrieved Terms Of Service for https://www.amazon.com/gp/help/customer/display.html%3FnodeId%3DGLSBYFE9MGKKQXXM with 309,540 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 84 in Terms Of Service. ['highlight', 'paragraph']
❝or any use of data mining, robots, or similar data gathering and
extraction tools.❞
30) i3.cpcache.com YES 7,361,988
✓ Loaded data request for https://www.cafepress.com/p/terms-conditions from f6114c509032554714f3c2942f47e186.pkl []
✓ Retrieved Terms Of Service for https://www.cafepress.com/p/terms-conditions with 352,765 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 137 in Terms Of Service. ['highlight', 'paragraph']
❝You may not use any data mining, robots, scraping or similar data
gathering or extraction methods in connection with your use of the
Services unless provided by us as part of the Services.❞
31) encrypted-tbn0.gstatic.com YES 7,300,508
✓ Loaded data request for https://policies.google.com/terms?hl=en-us from 1bfd317a96a80e2f9dec834a5d2e2ac0.pkl []
✓ Retrieved Terms Of Service for https://policies.google.com/terms?hl=en-us with 290,994 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
32) cdn11.bigcommerce.com YES 6,948,007
✓ Loaded data request for https://www.bigcommerce.com/terms/acceptable-use-policy/ from 9beb89525b0743e645994d46febaaa24.pkl []
✓ Retrieved Terms Of Service for https://www.bigcommerce.com/terms/acceptable-use-policy/ with 130,205 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 13 in Terms Of Service. ['highlight', 'paragraph']
❝You will not access or copy any portion of the Website or Services
through any automated viewing, downloading or crawling systems.❞
33) thumbs4.ebaystatic.com YES 6,776,366
✓ Loaded data request for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 from 27f1e5726ad6ee491fab37c89bcbfd15.pkl []
✓ Retrieved Terms Of Service for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 with 305,622 bytes of text. []
✓ Found a total of 8 matching paragraphs out of 321 in Terms Of Service. ['highlight', 'paragraph']
❝use any robot, spider, scraper, data mining tools, data gathering
and extraction tools, or other automated means to access our
Services for any purpose, except with the prior express permission
of eBay;❞
34) image.shutterstock.com YES 6,490,103
✓ Loaded data request for https://www.shutterstock.com/terms?language=en-US from 232dec04b7ec701660922bb221d69be7.pkl []
✓ Retrieved Terms Of Service for https://www.shutterstock.com/terms?language=en-US with 174,714 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 47 in Terms Of Service. ['highlight', 'paragraph']
❝In addition, you agree not to use any data mining, robots or similar
data and/or image gathering and extraction methods in connection
with the Site or Shutterstock Content.❞
35) ctl.s6img.com YES 6,138,016
✓ Loaded data request for https://society6.com/terms from fd1a466be238ed9a41f74f7e1df7ca94.pkl []
✓ Retrieved Terms Of Service for https://society6.com/terms with 284,548 bytes of text. []
✓ Found a total of 4 matching paragraphs out of 217 in Terms Of Service. ['highlight', 'paragraph']
❝Use any robot, spider, scraper, or other automated means to access
the Services for any purpose without our express written
permission;❞
36) cdn.vectorstock.com YES 6,109,783
✓ Loaded data request for https://www.vectorstock.com/faq/members/terms-of-use from 553226aacd92262f1b2f95558fe95350.pkl []
✓ Retrieved Terms Of Service for https://www.vectorstock.com/faq/members/terms-of-use with 66,191 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 56 in Terms Of Service. ['highlight', 'paragraph']
❝compile or extract Materials or Content from this Website for the
purpose of making such Materials or Content available to others❞
37) tse1.mm.bing.net YES 6,101,341
✓ Loaded data request for https://www.microsoft.com/en-us/legal/terms-of-use from 40f8d3e5f6fba31bece3f862d0e963c7.pkl []
✓ Retrieved Terms Of Service for https://www.microsoft.com/en-us/legal/terms-of-use with 212,173 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 54 in Terms Of Service. ['highlight', 'paragraph']
❝You may not use web scraping, web harvesting, or web data extraction
methods to extract data from the AI services.❞
38) cdn.xxl.thumbs.canstockphoto.com YES 6,082,900
✓ Loaded data request for https://www.canstockphoto.com/terms_conditions/ from adaef877ca223243ff0617b47b69c5da.pkl []
✓ Retrieved Terms Of Service for https://www.canstockphoto.com/terms_conditions/ with 50,364 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 54 in Terms Of Service. ['highlight', 'paragraph']
❝Use any data mining, robots or similar data and/or image gathering
or extraction technology or algorithms to crawl, scrape or monitor
the Site or seek information on Site Visitor’s or Company’s
customers;❞
39) tse2.mm.bing.net YES 5,814,565
✓ Loaded data request for https://www.microsoft.com/en-us/legal/terms-of-use from 40f8d3e5f6fba31bece3f862d0e963c7.pkl []
✓ Retrieved Terms Of Service for https://www.microsoft.com/en-us/legal/terms-of-use with 212,173 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 54 in Terms Of Service. ['highlight', 'paragraph']
❝You may not use web scraping, web harvesting, or web data extraction
methods to extract data from the AI services.❞
40) tse3.mm.bing.net YES 5,763,449
✓ Loaded data request for https://www.microsoft.com/en-us/legal/terms-of-use from 40f8d3e5f6fba31bece3f862d0e963c7.pkl []
✓ Retrieved Terms Of Service for https://www.microsoft.com/en-us/legal/terms-of-use with 212,173 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 54 in Terms Of Service. ['highlight', 'paragraph']
❝You may not use web scraping, web harvesting, or web data extraction
methods to extract data from the AI services.❞
41) tse4.mm.bing.net YES 5,758,678
✓ Loaded data request for https://www.microsoft.com/en-us/legal/terms-of-use from 40f8d3e5f6fba31bece3f862d0e963c7.pkl []
✓ Retrieved Terms Of Service for https://www.microsoft.com/en-us/legal/terms-of-use with 212,173 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 54 in Terms Of Service. ['highlight', 'paragraph']
❝You may not use web scraping, web harvesting, or web data extraction
methods to extract data from the AI services.❞
42) ecdn.teacherspayteachers.com YES 5,694,468
✓ Loaded data request for https://www.teacherspayteachers.com/Terms-of-Service from 54eae5a2c927ab89ed22e44d8894a7a5.pkl []
✓ Retrieved Terms Of Service for https://www.teacherspayteachers.com/Terms-of-Service with 477,639 bytes of text. []
✓ Found total of 12 paragraphs matching non-commercial activities in Terms Of Service. ['highlight']
❝Digital Users may use Digital Resources and Digitized Content for
Personal Use in accordance with available Digital Services
functionality and integrations, including to share or assign
Digital Resources and/or Digitized Content to enable student access
and use.❞
[alexjc] NOTE: There is a better paragraph that explains non-commercial reservation of rights more clearly:
❝You may not use the Resource, in part or in whole, for commercial
purposes. This means you can’t sell it, use it for advertising or
marketing purposes, or use it in any other way in connection with a
business or profit making activity.❞
43) ih1.redbubble.net YES 5,636,940
✓ Loaded data request for https://www.redbubble.com/agreement from a8dcada66cb9729948f2843b941138a2.pkl []
✓ Retrieved Terms Of Service for https://www.redbubble.com/agreement with 131,186 bytes of text. []
✓ Found total of 1 paragraphs matching non-commercial activities in Terms Of Service. ['highlight']
❝non-commercial use❞
[alexjc] NOTE: The code that picks the highlighted text stops around brackets, but if the result
is too short it should expand the highlight further to provide more context.
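One possible way to do the expansion described in this note, as a sketch only. The function name `expand_highlight`, the `min_chars` threshold and the sample paragraph are assumptions for illustration and do not come from the tool.

```python
def expand_highlight(paragraph: str, start: int, end: int, min_chars: int = 40) -> str:
    """Return paragraph[start:end], widened to the enclosing sentence
    when the highlighted span is shorter than min_chars."""
    if end - start >= min_chars:
        return paragraph[start:end].strip()

    # Walk left to the nearest sentence terminator, or the paragraph start.
    left = max(paragraph.rfind(ch, 0, start) for ch in ".;!?")
    left = left + 1 if left >= 0 else 0

    # Walk right to the nearest sentence terminator, or the paragraph end.
    ends = [i for i in (paragraph.find(ch, end) for ch in ".;!?") if i >= 0]
    right = min(ends) + 1 if ends else len(paragraph)

    return paragraph[left:right].strip()


# Made-up sample text, not taken from any Terms of Service.
text = (
    "You are granted a licence (for non-commercial use) to view the content. "
    "Other rights are reserved."
)
start = text.index("non-commercial use")
print(expand_highlight(text, start, start + len("non-commercial use")))
```

With the sample text, the 18-character match grows to the full first sentence. Splitting on `.` is naive and will cut at abbreviations, so a real implementation would probably want a proper sentence splitter.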
44) media-cdn.tripadvisor.com YES 5,538,481
✓ Loaded data request for https://tripadvisor.mediaroom.com/us-terms-of-use from e15e6035848ed720a4be1b6bf7d7e7da.pkl []
✓ Retrieved Terms Of Service for https://tripadvisor.mediaroom.com/us-terms-of-use with 120,993 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 177 in Terms Of Service. ['highlight', 'paragraph']
❝access, monitor, reproduce, distribute, transmit, broadcast,
display, sell, license, copy or otherwise exploit any Content of
the Services, including but not limited to, user profiles and
photos, using any robot, spider, scraper or other automated means
or any manual process for any purpose not in accordance with this
Agreement or without our express written permission;❞
45) i.dailymail.co.uk YES 5,459,929
✓ Loaded data request for https://www.dailymail.co.uk/home/article-1388146/Terms.html from c835ad5b88e3b9f97754dfd989e6f4b0.pkl []
✓ Retrieved Terms Of Service for https://www.dailymail.co.uk/home/article-1388146/Terms.html with 445,050 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 160 in Terms Of Service. ['highlight', 'paragraph']
❝any content, files or data from the Site to make or populate a
database or publication of any kind whatsoever.❞
46) m.psecn.photoshelter.com YES 5,296,951
✓ Loaded data request for https://www.photoshelter.com/support/terms from b198f4e4710be691b5bc0351442128e4.pkl []
✓ Retrieved Terms Of Service for https://www.photoshelter.com/support/terms with 96,696 bytes of text. []
✓ Found a total of 17 matching paragraphs out of 123 in Terms Of Service. ['highlight', 'paragraph']
❝harvesting or scraping any Content from the Services;❞
47) ih0.redbubble.net YES 5,228,487
✓ Loaded data request for https://www.redbubble.com/agreement from a8dcada66cb9729948f2843b941138a2.pkl []
✓ Retrieved Terms Of Service for https://www.redbubble.com/agreement with 131,186 bytes of text. []
✓ Found total of 1 paragraphs matching non-commercial activities in Terms Of Service. ['highlight']
❝non-commercial use❞
48) static1.bigstockphoto.com YES 5,220,604
✓ Loaded data request for https://www.bigstockphoto.com/usage.html from 9d7b5fbb094c6b54311e8f36d16ad611.pkl []
✓ Retrieved Terms Of Service for https://www.bigstockphoto.com/usage.html with 87,645 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 77 in Terms Of Service. ['highlight', 'paragraph']
❝com website or any content thereon for any purpose, including, by
way of example only, downloading Content, indexing, scraping or
caching any content on the website.❞
49) rlv.zcache.co.uk YES 5,113,229
✓ Loaded data request for https://www.zazzle.com/mk/policy/user_agreement from 1c21c5334b1af9f0be5bb39d41578359.pkl []
✓ Retrieved Terms Of Service for https://www.zazzle.com/mk/policy/user_agreement with 271,424 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 23 in Terms Of Service. ['highlight', 'paragraph']
❝use a manual or automatic device or process to retrieve, index,
"data mine" or in any way reproduce or circumvent the navigational
structure or presentation of the Service;❞
50) sc01.alicdn.com YES 4,881,201
✓ Loaded data request for https://rulechannel.alibaba.com/icbu?spm=a1zaa.8161610.0.0.23eb61ecCCJxWd&type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 from bc8844c287d47c690adc02a507f8c1a8.pkl []
✓ Retrieved Terms Of Service for https://rulechannel.alibaba.com/icbu?spm=a1zaa.8161610.0.0.23eb61ecCCJxWd&type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 with 726,574 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 366 in Terms Of Service. ['highlight', 'paragraph']
❝whether through robots, spiders, automatic devices or manual
processes❞
51) images-eu.ssl-images-amazon.com YES 4,741,685
✓ Loaded data request for https://www.amazon.com/gp/help/customer/display.html%3FnodeId%3DGLSBYFE9MGKKQXXM from 9eef0e9202b1101ef60dc0b9928b0f7a.pkl []
✓ Retrieved Terms Of Service for https://www.amazon.com/gp/help/customer/display.html%3FnodeId%3DGLSBYFE9MGKKQXXM with 309,540 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 84 in Terms Of Service. ['highlight', 'paragraph']
❝or any use of data mining, robots, or similar data gathering and
extraction tools.❞
52) media.istockphoto.com YES 4,588,375
✓ Loaded data request for https://www.istockphoto.com/en/legal/terms-of-use from 46c3e51118d84adb7f4449795eeb91ba.pkl []
✓ Retrieved Terms Of Service for https://www.istockphoto.com/en/legal/terms-of-use with 191,667 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 90 in Terms Of Service. ['highlight', 'paragraph']
❝using any data mining, robots or similar data gathering or
extraction methods;❞
53) img.shopstyle-cdn.com YES 4,527,840
✓ Loaded data request for https://www.shopstyle.com/tos from 70d7f148a7be60c09845657997c0810f.pkl []
✓ Retrieved Terms Of Service for https://www.shopstyle.com/tos with 188,595 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 73 in Terms Of Service. ['highlight', 'paragraph']
❝use the manual or automated software, devices, or other processes to
“scrape“, “crawl“ or “spider“ or index any page or Content of the
Services;❞
54) 1.bp.blogspot.com YES 4,489,825
✓ Loaded data request for https://policies.google.com/u/0/terms from e9327c12176a3c7a2ea02462392d8851.pkl []
✓ Retrieved Terms Of Service for https://policies.google.com/u/0/terms with 289,517 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
55) lid.zoocdn.com YES 4,444,825
✓ Loaded data request for https://www.zoopla.co.uk/terms/ from de8b27abb399de24e147cbb05eb39258.pkl []
✓ Retrieved Terms Of Service for https://www.zoopla.co.uk/terms/ with 233,835 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 66 in Terms Of Service. ['highlight', 'paragraph']
❝including, without limitation, any web-crawling or screen-scraping
software or any equivalent technology or techniques❞
56) storage.googleapis.com YES 4,426,841
✓ Loaded data request for https://policies.google.com/terms?hl=en from fcb9ad48b61d46da7bece1e62bc0681f.pkl []
✓ Retrieved Terms Of Service for https://policies.google.com/terms?hl=en with 285,596 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
[alexjc] NOTE: For this domain, the originating domain for each file should be checked for Opt-Out.
Unfortunately, this information was never checked by the dataset creator and is now lost.
57) s.yimg.com YES 4,356,764
✓ Loaded data request for https://legal.yahoo.com/us/en/yahoo/terms/otos/index.html from 0d047fe64c7b266285f896f813ac29ac.pkl []
✓ Retrieved Terms Of Service for https://legal.yahoo.com/us/en/yahoo/terms/otos/index.html with 106,877 bytes of text. []
✓ Found a total of 5 matching paragraphs out of 556 in Terms Of Service. ['highlight', 'paragraph']
❝You may not access or collect data, or attempt to access or collect
data, from our Services using any automated means, devices,
programs, algorithms or methodologies, including but not limited to
robots, spiders, scrapers, data mining tools, or data gathering or
extraction tools, for any purpose without our express, prior
permission;❞
58) img0.etsystatic.com YES 4,340,301
✓ Loaded data request for https://www.etsy.com/legal/terms-of-use?ref=ftr from afdd55d84befdb1de8270f979e4033a0.pkl []
✓ Retrieved Terms Of Service for https://www.etsy.com/legal/terms-of-use?ref=ftr with 184,268 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 104 in Terms Of Service. ['highlight', 'paragraph']
❝You agree not to crawl, scrape, or spider any page of the Services
or to reverse engineer or attempt to obtain the source code of the
Services❞
59) img1.etsystatic.com YES 4,327,461
✓ Loaded data request for https://www.etsy.com/legal/terms-of-use?ref=ftr from afdd55d84befdb1de8270f979e4033a0.pkl []
✓ Retrieved Terms Of Service for https://www.etsy.com/legal/terms-of-use?ref=ftr with 184,268 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 104 in Terms Of Service. ['highlight', 'paragraph']
❝You agree not to crawl, scrape, or spider any page of the Services
or to reverse engineer or attempt to obtain the source code of the
Services❞
60) static2.bigstockphoto.com YES 4,293,261
✓ Loaded data request for https://www.bigstockphoto.com/usage.html from 9d7b5fbb094c6b54311e8f36d16ad611.pkl []
✓ Retrieved Terms Of Service for https://www.bigstockphoto.com/usage.html with 87,645 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 77 in Terms Of Service. ['highlight', 'paragraph']
❝com website or any content thereon for any purpose, including, by
way of example only, downloading Content, indexing, scraping or
caching any content on the website.❞
61) lh3.googleusercontent.com YES 4,288,953
✓ Loaded data request for https://policies.google.com/terms?hl=en from fcb9ad48b61d46da7bece1e62bc0681f.pkl []
✓ Retrieved Terms Of Service for https://policies.google.com/terms?hl=en with 285,596 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
62) upload.wikimedia.org YES 4,055,419
✓ Loaded data request for https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use from 6f71f6ce2ae2bfffc2d73bab684a81dd.pkl []
✓ Retrieved Terms Of Service for https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use with 128,842 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 201 in Terms Of Service. ['highlight', 'paragraph']
❝By using our APIs, you agree to abide by all applicable policies
governing the use of the APIs, which include but are not limited to
the User-Agent Policy , the Robot Policy , and the API:Etiquette❞
63) www.dhresource.com MAYBE 3,955,179
✓ Loaded data request for https://www.dhgate.com/helpbuyer/11111111111111111111111111111111-en-0/444c48545da84a0e9ad179106ca19d09.html from b5ac7d8ea9ca376e54629e9db222eaaa.pkl []
✓ Retrieved Terms Of Service for https://www.dhgate.com/helpbuyer/11111111111111111111111111111111-en-0/444c48545da84a0e9ad179106ca19d09.html with 87,771 bytes of text. []
𐄂 No direct matches found in 125 paragraphs found at https://www.dhgate.com/helpbuyer/11111111111111111111111111111111-en-0/444c48545da84a0e9ad179106ca19d09.html. []
64) static3.bigstockphoto.com YES 3,943,912
✓ Loaded data request for https://www.bigstockphoto.com/usage.html from 9d7b5fbb094c6b54311e8f36d16ad611.pkl []
✓ Retrieved Terms Of Service for https://www.bigstockphoto.com/usage.html with 87,645 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 77 in Terms Of Service. ['highlight', 'paragraph']
❝com website or any content thereon for any purpose, including, by
way of example only, downloading Content, indexing, scraping or
caching any content on the website.❞
65) images.pond5.com YES 3,925,528
✓ Loaded data request for https://www.pond5.com/legal/terms from 2840b91f9f3b965c28b896d5d4952d74.pkl []
✓ Retrieved Terms Of Service for https://www.pond5.com/legal/terms with 225,657 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 84 in Terms Of Service. ['highlight', 'paragraph']
❝use any data mining, robots or similar data gathering and extraction
tools on or at the Website or use any other automated means to
access the Website;❞
66) ak1.ostkcdn.com YES 3,866,642
✓ Loaded data request for https://help.overstock.com/help/s/article/TERMS-AND-CONDITIONS from 9948e39f59f8f4737e2c58ab3250ba76.pkl []
✓ Retrieved Terms Of Service for https://help.overstock.com/help/s/article/TERMS-AND-CONDITIONS with 303,217 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 84 in Terms Of Service. ['highlight', 'paragraph']
❝and any use of data mining, screen-scraping, robots, or similar data
gathering and extraction tools.❞
67) cdn3.volusion.com YES 3,554,449
✓ Loaded data request for https://volusion.com/terms-of-service/ from 6d0c65e1e01b82414f1fb14a5cdaff17.pkl []
✓ Retrieved Terms Of Service for https://volusion.com/terms-of-service/ with 106,793 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 171 in Terms Of Service. ['highlight', 'paragraph']
❝access the Services through automated means, including, without
limitation, through “robots”, “spiders” or “offline readers”❞
68) data.whicdn.com YES 3,553,977
✓ Loaded data request for https://prezi.com/legal/terms-of-service/ from d1bbe7e620260f576ded447c64a59b95.pkl []
✓ Retrieved Terms Of Service for https://prezi.com/legal/terms-of-service/ with 188,281 bytes of text. []
✓ Found a total of 4 matching paragraphs out of 253 in Terms Of Service. ['highlight', 'paragraph']
❝including, without limitation, any robot, spider or offline reader❞
69) pictures.abebooks.com YES 3,485,233
✓ Loaded data request for https://www.abebooks.com/docs/legal/termsandconditions.shtml?cm_sp=Ftr-_-Home-_-legal from 4e4dd5d4990567a308a2a7edc50be4bd.pkl []
✓ Retrieved Terms Of Service for https://www.abebooks.com/docs/legal/termsandconditions.shtml?cm_sp=Ftr-_-Home-_-legal with 113,165 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 58 in Terms Of Service. ['highlight', 'paragraph']
❝or any use of data mining, robots, or similar data gathering and
extraction tools.❞
70) thumbs3.ebaystatic.com YES 3,394,978
✓ Loaded data request for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 from 27f1e5726ad6ee491fab37c89bcbfd15.pkl []
✓ Retrieved Terms Of Service for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 with 305,622 bytes of text. []
✓ Found a total of 8 matching paragraphs out of 321 in Terms Of Service. ['highlight', 'paragraph']
❝use any robot, spider, scraper, data mining tools, data gathering
and extraction tools, or other automated means to access our
Services for any purpose, except with the prior express permission
of eBay;❞
71) st.depositphotos.com YES 3,328,332
✓ Loaded data request for https://depositphotos.com/terms-of-use.html from 4599a6bfa7c7327b409398c4cac383f3.pkl []
✓ Retrieved Terms Of Service for https://depositphotos.com/terms-of-use.html with 79,637 bytes of text. []
✓ Found total of 1 paragraphs matching non-commercial activities in Terms Of Service. ['highlight']
❝Files available at the Site may be copied and used for private use
by the User solely for non-commercial or educational purposes only.❞
72) thumbs1.ebaystatic.com YES 3,318,784
✓ Loaded data request for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 from 27f1e5726ad6ee491fab37c89bcbfd15.pkl []
✓ Retrieved Terms Of Service for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 with 305,622 bytes of text. []
✓ Found a total of 8 matching paragraphs out of 321 in Terms Of Service. ['highlight', 'paragraph']
❝use any robot, spider, scraper, data mining tools, data gathering
and extraction tools, or other automated means to access our
Services for any purpose, except with the prior express permission
of eBay;❞
73) di2ponv0v5otw.cloudfront.net YES 3,317,474
✓ Loaded data request for https://poshmark.com/terms from 5f9d0d37daaeee3ddfb583eac739bef6.pkl []
✓ Retrieved Terms Of Service for https://poshmark.com/terms with 342,066 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 128 in Terms Of Service. ['highlight', 'paragraph']
❝You agree you will not copy, modify, scrape, distribute, create
derivative works, or the like, or do or perform any other action
with the Service Content or Poshmarks intellectual property that
you are exposed to through our Service not explicitly authorized by
this Agreement.❞
74) image.slidesharecdn.com YES 3,311,646
✓ Loaded data request for https://support.scribd.com/hc/en-us/articles/210129326-General-Terms-of-Use from 611ebdea8dae81715460f7b093f90bf0.pkl []
✓ Retrieved Terms Of Service for https://support.scribd.com/hc/en-us/articles/210129326-General-Terms-of-Use with 151,021 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 77 in Terms Of Service. ['highlight', 'paragraph']
❝15 use any robot, spider, scraper, or other automated means to
access Scribd, or copy, print, access, store, transfer, or share
any content accessible through Scribd, for any purpose or to bypass
any measures Scribd may use to prevent or restrict access, or the
ability to copy, print, access, store, transfer, or share content;❞
75) 2.bp.blogspot.com YES 3,283,455
✓ Loaded data request for https://policies.google.com/u/0/terms from e9327c12176a3c7a2ea02462392d8851.pkl []
✓ Retrieved Terms Of Service for https://policies.google.com/u/0/terms with 289,517 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
76) rlv.zcache.ca YES 3,271,715
✓ Loaded data request for https://www.zazzle.com/mk/policy/user_agreement from 1c21c5334b1af9f0be5bb39d41578359.pkl []
✓ Retrieved Terms Of Service for https://www.zazzle.com/mk/policy/user_agreement with 271,424 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 23 in Terms Of Service. ['highlight', 'paragraph']
❝use a manual or automatic device or process to retrieve, index,
"data mine" or in any way reproduce or circumvent the navigational
structure or presentation of the Service;❞
77) img.youtube.com YES 3,244,258
✓ Loaded data request for https://www.youtube.com/t/terms from b81d777b9d0781b3cb533166ad7825a9.pkl []
✓ Retrieved Terms Of Service for https://www.youtube.com/t/terms with 37,899 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 157 in Terms Of Service. ['highlight', 'paragraph']
❝such as robots, botnets or scrapers❞
78) thumb7.shutterstock.com YES 3,218,314
✓ Loaded data request for https://www.shutterstock.com/terms?language=en-US from 232dec04b7ec701660922bb221d69be7.pkl []
✓ Retrieved Terms Of Service for https://www.shutterstock.com/terms?language=en-US with 174,714 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 47 in Terms Of Service. ['highlight', 'paragraph']
❝In addition, you agree not to use any data mining, robots or similar
data and/or image gathering and extraction methods in connection
with the Site or Shutterstock Content.❞
79) thumbs2.ebaystatic.com YES 3,216,555
✓ Loaded data request for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 from 27f1e5726ad6ee491fab37c89bcbfd15.pkl []
✓ Retrieved Terms Of Service for https://www.ebay.com/help/policies/member-behaviour-policies/user-agreement?id=4259 with 305,622 bytes of text. []
✓ Found a total of 8 matching paragraphs out of 321 in Terms Of Service. ['highlight', 'paragraph']
❝use any robot, spider, scraper, data mining tools, data gathering
and extraction tools, or other automated means to access our
Services for any purpose, except with the prior express permission
of eBay;❞
80) thumbs.slideserve.com YES 3,212,294
✓ Loaded data request for https://www.slideserve.com/terms from 5c9273d29a4be38dc06f12bc191692b7.pkl []
✓ Retrieved Terms Of Service for https://www.slideserve.com/terms with 38,686 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 77 in Terms Of Service. ['highlight', 'paragraph']
❝You agree not to use or launch any automated system, including
without limitation, "robots," "spiders," "offline readers," etc.❞
81) sc02.alicdn.com YES 3,204,697
✓ Loaded data request for https://rulechannel.alibaba.com/icbu?spm=a1zaa.8161610.0.0.23eb61ecCCJxWd&type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 from bc8844c287d47c690adc02a507f8c1a8.pkl []
✓ Retrieved Terms Of Service for https://rulechannel.alibaba.com/icbu?spm=a1zaa.8161610.0.0.23eb61ecCCJxWd&type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 with 726,574 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 366 in Terms Of Service. ['highlight', 'paragraph']
❝whether through robots, spiders, automatic devices or manual
processes❞
82) books.google.com YES 3,198,061
✓ Loaded data request for https://policies.google.com/terms?hl=en from fcb9ad48b61d46da7bece1e62bc0681f.pkl []
✓ Retrieved Terms Of Service for https://policies.google.com/terms?hl=en with 285,596 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
83) res.cloudinary.com YES 3,176,409
✓ Loaded data request for https://cloudinary.com/tou from f1b60ea61edbaaf620cfc42e1d3f290b.pkl []
✓ Retrieved Terms Of Service for https://cloudinary.com/tou with 105,541 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 70 in Terms Of Service. ['highlight', 'paragraph']
❝Some features or capabilities of the Services integrate with or use
generative Artificial Intelligence tools and machine learning
abilities.❞
84) www.gannett-cdn.com ERROR 3,159,971
𐄂 Loaded failed request for https://www.gannett-cdn.com from a2950ef20bd384d6836f0cef95d45a8d.pkl []
𐄂 Loaded failed request for https://gannett-cdn.com from 8d3b4559540678e44ee6d840d5c2014d.pkl []
85) rlv.zcache.com.au YES 3,152,758
✓ Loaded data request for https://www.zazzle.com/mk/policy/user_agreement from 1c21c5334b1af9f0be5bb39d41578359.pkl []
✓ Retrieved Terms Of Service for https://www.zazzle.com/mk/policy/user_agreement with 271,424 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 23 in Terms Of Service. ['highlight', 'paragraph']
❝use a manual or automatic device or process to retrieve, index,
"data mine" or in any way reproduce or circumvent the navigational
structure or presentation of the Service;❞
86) i.etsystatic.com YES 3,069,860
✓ Loaded data request for https://www.etsy.com/legal/terms-of-use?ref=ftr from afdd55d84befdb1de8270f979e4033a0.pkl []
✓ Retrieved Terms Of Service for https://www.etsy.com/legal/terms-of-use?ref=ftr with 184,268 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 104 in Terms Of Service. ['highlight', 'paragraph']
❝You agree not to crawl, scrape, or spider any page of the Services
or to reverse engineer or attempt to obtain the source code of the
Services❞
87) 3.bp.blogspot.com YES 3,059,071
✓ Loaded data request for https://policies.google.com/u/0/terms from e9327c12176a3c7a2ea02462392d8851.pkl []
✓ Retrieved Terms Of Service for https://policies.google.com/u/0/terms with 289,517 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
88) www.cowcow.com MAYBE 3,048,788
✓ Loaded data request for https://www.cowcow.com/useragreement.html from 7b2125527c4a43d85ef5956bd1558266.pkl []
✓ Retrieved Terms Of Service for https://www.cowcow.com/useragreement.html with 79,689 bytes of text. []
𐄂 No direct matches found in 93 paragraphs found at https://www.cowcow.com/useragreement.html. []
[alexjc] NOTE: These terms were last updated in 2011! A more recent update would likely disallow robots and spiders.
89) thumb1.shutterstock.com YES 3,048,750
✓ Loaded data request for https://www.shutterstock.com/terms?language=en-US from 232dec04b7ec701660922bb221d69be7.pkl []
✓ Retrieved Terms Of Service for https://www.shutterstock.com/terms?language=en-US with 174,714 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 47 in Terms Of Service. ['highlight', 'paragraph']
❝In addition, you agree not to use any data mining, robots or similar
data and/or image gathering and extraction methods in connection
with the Site or Shutterstock Content.❞
90) cdp.azureedge.net YES 3,029,208
✓ Loaded data request for https://kenssports.com/Terms from 9b3e845ae18150597f64779009ed4bfc.pkl []
✓ Retrieved Terms Of Service for https://kenssports.com/Terms with 74,027 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 59 in Terms Of Service. ['highlight', 'paragraph']
❝including but not limited to spiders, robots, crawlers, scrapers,
deep-links, data-mining, data-gathering or extraction tools and the
like❞
91) 4.bp.blogspot.com YES 3,016,343
✓ Loaded data request for https://policies.google.com/u/0/terms from e9327c12176a3c7a2ea02462392d8851.pkl []
✓ Retrieved Terms Of Service for https://policies.google.com/u/0/terms with 289,517 bytes of text. []
✓ Found a total of 3 matching paragraphs out of 176 in Terms Of Service. ['highlight', 'paragraph']
❝we reasonably believe that your conduct causes harm or liability to
a user, third party, or Google — for example, by hacking, phishing,
harassing, spamming, misleading others, or scraping content that
doesn’t belong to you❞
92) i.oodleimg.com YES 2,998,674
✓ Loaded data request for https://www.oodle.com/info/terms/ from fc86d9e6917b16684cb8a623baa10f36.pkl []
✓ Retrieved Terms Of Service for https://www.oodle.com/info/terms/ with 80,811 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 66 in Terms Of Service. ['highlight', 'paragraph']
❝Additionally, you agree not to: Contact anyone who has asked not to
be contacted "Stalk" or otherwise harass anyone Delete or revise
any material posted by any other user Impersonate any person or
entity or falsely state or misrepresent your affiliation with
another person or entity Use automated means, including spiders,
robots, crawlers, data mining tools, or the like to download data
from the Site Attempt to gain unauthorized access Oodle computer
systems or networks to engage in any activity that disrupts,
diminishes the quality of, interferes with the performance of, or
impairs the functionality of, the Service or the Site❞
[alexjc] NOTE: The highlight is displayed in a confusing way and should ideally start half-way through.
93) bt-photos.global.ssl.fastly.net YES 2,969,078
✓ Loaded data request for https://boomtownroi.com/terms-of-use/ from 4cfe2a5d0529386d6a243f77397e90d3.pkl []
✓ Retrieved Terms Of Service for https://boomtownroi.com/terms-of-use/ with 87,741 bytes of text. []
✓ Found a total of 1 matching paragraphs out of 22 in Terms Of Service. ['highlight', 'paragraph']
❝Use or launch any automated system, including without limitation,
“robots,” “spiders,” or “offline readers,” that accesses the Site
in a manner that sends more request messages to the BoomTown
servers in a given period of time than a human can reasonably
produce in the same period by using a conventional on-line web
browser;❞
94) i00.i.aliimg.com YES 2,939,722
✓ Loaded data request for https://rulechannel.alibaba.com/icbu?type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 from de1a85b59a9b13d923d8929fe653cfde.pkl []
✓ Retrieved Terms Of Service for https://rulechannel.alibaba.com/icbu?type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 with 742,855 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 366 in Terms Of Service. ['highlight', 'paragraph']
❝whether through robots, spiders, automatic devices or manual
processes❞
95) i01.i.aliimg.com YES 2,914,088
✓ Loaded data request for https://rulechannel.alibaba.com/icbu?type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 from de1a85b59a9b13d923d8929fe653cfde.pkl []
✓ Retrieved Terms Of Service for https://rulechannel.alibaba.com/icbu?type=detail&ruleId=2041&cId=1307#/rule/detail?cId=1307&ruleId=2041 with 742,855 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 366 in Terms Of Service. ['highlight', 'paragraph']
❝whether through robots, spiders, automatic devices or manual
processes❞
96) kbimages1-a.akamaihd.net YES 2,804,124
✓ Loaded data request for https://authorize.kobo.com/terms/termsofuse from 29275d3c4eb19f92e3e2ab6b9b761d9f.pkl []
✓ Retrieved Terms Of Service for https://authorize.kobo.com/terms/termsofuse with 112,378 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 70 in Terms Of Service. ['highlight', 'paragraph']
❝using any "deep-link", "page-scrape", "robot", "spider" or other
automatic device, program, algorithm or methodology, or any similar
or equivalent manual process, to access, acquire, copy or monitor
any portion of the Service or any Site Content, or in any way
reproduce or circumvent the navigational structure or presentation
of the Service or any Site Content, to obtain or attempt to obtain
any materials, documents or information through any means not
purposely made available through the Site;❞
97) cdn.shoplightspeed.com MAYBE 2,776,500
✓ Loaded data request for https://www.lightspeedhq.com/legal/lightspeed-service-agreement/ from 51975501f9b3a7a5290e6924e77a1bd5.pkl []
✓ Retrieved Terms Of Service for https://www.lightspeedhq.com/legal/lightspeed-service-agreement/ with 260,497 bytes of text. []
𐄂 No direct matches found in 193 paragraphs found at https://www.lightspeedhq.com/legal/lightspeed-service-agreement/. []
98) i.bosscdn.com MAYBE 2,710,303
✓ Loaded data request for https://www.bossgoo.com/rule/term.html from 03a7e150a057ee890c93747e031b31a4.pkl []
✓ Retrieved Terms Of Service for https://www.bossgoo.com/rule/term.html with 119,868 bytes of text. []
𐄂 No direct matches found in 87 paragraphs found at https://www.bossgoo.com/rule/term.html. []
99) s.ecrater.com MAYBE 2,580,056
✓ Loaded data request for https://www.ecrater.com/terms.php from 9d35521495c3ffcb065cdc5e3c06c5c1.pkl []
✓ Retrieved Terms Of Service for https://www.ecrater.com/terms.php with 30,560 bytes of text. []
𐄂 No direct matches found in 104 paragraphs found at https://www.ecrater.com/terms.php. []
100) images.fineartamerica.com YES 2,538,450
✓ Loaded data request for https://fineartamerica.com/termsofuse.html from b86720da0e390886aef1c40a54383807.pkl []
✓ Retrieved Terms Of Service for https://fineartamerica.com/termsofuse.html with 92,919 bytes of text. []
✓ Found a total of 2 matching paragraphs out of 95 in Terms Of Service. ['highlight', 'paragraph']
❝You may not use any computer program tools, including, but not
limited to, web spiders, bots, indexers, robots, crawlers,
harvesters, or any other automatic device, program, algorithm, or
methodology, or any similar equivalent process❞
TOTAL 839,049,375 opted-out from 912,879,127. (UNAVAILABLE 26,322,084)
alexjc (Author) commented Jul 27, 2023

Please see inline comments prefixed with [alexjc] NOTE for thoughts that came up during manual review.

Version 3 of this .log file uses the Terms of the CDNs (e.g. Google, AWS, Wordpress) to determine the permissions of the content, which is not correct: the originating domain should be used to check Opt-Out. Unfortunately, that information was neither checked by the creators nor made available in the dataset.

This "CDN Assumption" means that an absolute 2% of images are marked as YES (when they could be MAYBE) and 7% of images are marked as MAYBE (when they could be YES). These assumptions are not as strict as they could be (due to mistakes by dataset creators) and likely would be seen as non-compliant.
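As a rough way to see how much weight rests on shared hosting domains, the per-domain header lines of the log can be tallied by status. The sketch below is not the script that produced this log; the sample lines are copied from the log, while the `ENTRY` pattern, the `tally` helper, and the `SHARED_HOST_SUFFIXES` list are assumptions made for illustration only.

```python
import re
from collections import Counter

# Header lines copied from the log: rank, domain, status, image count.
SAMPLE = """\
56) storage.googleapis.com YES 4,426,841
63) www.dhresource.com MAYBE 3,955,179
84) www.gannett-cdn.com ERROR 3,159,971
90) cdp.azureedge.net YES 3,029,208
"""

# Assumed layout of a header line, inferred from the log's appearance.
ENTRY = re.compile(r"^(\d+)\)\s+(\S+)\s+(YES|MAYBE|ERROR)\s+([\d,]+)$")

# Assumption: a hand-picked, illustrative list of shared-hosting suffixes.
SHARED_HOST_SUFFIXES = ("googleapis.com", "azureedge.net", "cloudfront.net")


def tally(log_text):
    """Sum image counts per status, overall and for shared hosts only."""
    by_status = Counter()
    shared = Counter()
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip quotes, notes, and request lines
        _rank, domain, status, count = match.groups()
        images = int(count.replace(",", ""))
        by_status[status] += images
        if domain.endswith(SHARED_HOST_SUFFIXES):
            shared[status] += images
    return by_status, shared


by_status, shared = tally(SAMPLE)
print(dict(by_status))
print(dict(shared))
```

Running the same function over the full log would show how many images sit on hosts whose status was inherited from the CDN's terms rather than from the originating site.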

As it stands, 91.91% of the content is Opted-Out, and a stricter, arguably more compliant interpretation of CDNs would give 98.34% Opted-Out for the top 100 domains of laion2B-en and 912M images.
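The 91.91% figure can be reproduced from the TOTAL line of the log. This is a minimal sketch: the `parse_counts` helper is hypothetical, and it assumes the three numbers on that line appear in the order opted-out, total, unavailable. The 98.34% figure is not recomputed here, since it depends on reclassifications that the log does not list.

```python
import re

# The summary line, copied verbatim from the end of the log.
TOTAL_LINE = "TOTAL 839,049,375 opted-out from 912,879,127. (UNAVAILABLE 26,322,084)"


def parse_counts(line):
    """Return every comma-grouped integer on the line, in order."""
    return [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", line)]


opted_out, total, unavailable = parse_counts(TOTAL_LINE)
share = 100 * opted_out / total
print(f"{share:.2f}% opted out")  # prints "91.91% opted out"
```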
