- Modify the /etc/nginx/nginx.conf file
- Modify the /etc/nginx/sites-available/site.conf file
- Create the /etc/nginx/useragent.rule file (all three steps are sketched below)
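A rough sketch of what this setup might look like, using the map-based approach described in the linked guides. The variable name $badagent, the example user agent patterns, and the 418 response code are assumptions; adjust them to your own block list:

```nginx
# /etc/nginx/useragent.rule
# Map the request's User-Agent header to a flag (1 = blocked).
# The variable name and the patterns are examples only.
map $http_user_agent $badagent {
    default      0;
    ~*instagram  1;
    ~*facebook   1;
    ~*whatsapp   1;
}

# /etc/nginx/nginx.conf -- inside the http { } block
include /etc/nginx/useragent.rule;

# /etc/nginx/sites-available/site.conf -- inside the server { } block
if ($badagent) {
    return 418;
}
```

After editing, test the configuration and reload nginx, e.g. `nginx -t && systemctl reload nginx`.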
Where to find user agent strings?
https://explore.whatismybrowser.com/useragents/explore/software_name/facebook-bot/
Looking for the same but for Apache2? See: https://techexpert.tips/apache/apache-blocking-bad-bots-crawlers/
Test:
[rubin@reaper ~]$ curl -A "instagram" -I https://plrm.podcastalot.com
HTTP/2 418
server: nginx/1.18.0
date: Mon, 26 Jun 2023 06:07:25 GMT
content-type: text/html
content-length: 197
Worth noting that most of these bots are 'good bots' (i.e. they obey robots.txt), so you can avoid the nginx resource usage entirely by adding suitable robots.txt entries.
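As an illustration, entries like these ask well-behaved crawlers to stay away (the bot names here are examples only; use the names of the crawlers you actually want to exclude):

```
# robots.txt -- illustrative entries
User-agent: facebookexternalhit
Disallow: /

User-agent: Twitterbot
Disallow: /
```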
I think blocking user agents like this in nginx could have negative effects on Open Graph metadata (including preview images), since these are often the same crawlers that fetch it for link previews.
If choosing this approach, however, I would probably respond with a 403 (Forbidden) code instead, as bots are more likely to keep making attempts if they think the server will come back online.
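That would be a minimal change to the server block from the sketch above (again assuming the hypothetical $badagent variable):

```nginx
# inside the server { } block -- reply 403 Forbidden instead of 418
if ($badagent) {
    return 403;
}
```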