- Modify /etc/nginx/nginx.conf file
- Modify /etc/nginx/sites-available/site.conf file
- Create /etc/nginx/useragent.rule file
Where to find user agent strings?
https://explore.whatismybrowser.com/useragents/explore/software_name/facebook-bot/
Looking for same but for Apache2? Here: https://techexpert.tips/apache/apache-blocking-bad-bots-crawlers/
Test:
[rubin@reaper ~]$ curl -A "instagram" -I https://plrm.podcastalot.com
HTTP/2 418
server: nginx/1.18.0
date: Mon, 26 Jun 2023 06:07:25 GMT
content-type: text/html
content-length: 197
You miss the point. I don't want my content on those sites in any form and I don't want my content to feed their algorithms.
Using robot.txt assumes they will 'obey' it. But they may choose not to. Its not mandatory in anyway.