Here is another site that ignores robots.txt. Its crawler identifies itself as a browser, but it is not human. It is a robot, and it searches your site with many requests at once. This spider attacks your site and pulls down every page and link, multiple times over. You can lose several gigs of bandwidth to these hogs in a few minutes. So who is Attributor?
Attributor is basically the brain police for the rich who own major content. They look at everything on your site to find any part, word, picture or anything else that may belong to one of their clients. We saw them come in and suck down 30 GB of bandwidth on a Drupal site with only 100 pages. How did they do that? They crawl every URL variable in the hope of finding something for a client on your pages, and they do it many times at once, looking for things that may be hidden.
While we firmly believe in the right of creators to earn from their works, we see this brute-force theft of bandwidth as out of step with the real world. Having a robot steal your bandwidth to check whether you have stolen a few lines or pictures is a contradiction. If you steal my bandwidth to find a stolen picture, you are a thief looking for a thief.
Attributor's site claims they are looking for copyrighted works, but stealing bandwidth to do it, with spiders that ignore robots.txt, is deplorable. They claim to scan “billions of websites,” so yours is likely one of them.
Here is the IP to look for in your server logs, and what to put in your .htaccess to stop them:
deny from 22.214.171.124
It does not matter whether you have any stolen photos for these idiots to find. Either way, they are sucking up your bandwidth with no return to you. Stop them now.
How do you find bandwidth hogs? All you have to do is look for large page and data requests in your site logs. Send them to us and we will determine their origin. Generally, the most hits should come from your staff and designers. After that, any heavy visitor that is not Google or another search engine is probably something like this.
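If you want a rough first pass at that log check yourself, here is a minimal Python sketch. It assumes your host writes Apache's common or combined log format; the function names `summarize` and `report` and the log path you pass in are hypothetical, not part of any existing tool. For each IP it totals bytes sent, counts hits, and records the busiest single second, which helps spot a crawler hitting your site many times at once.

```python
import re
from collections import Counter, defaultdict

# Assumes Apache common/combined log format:
# IP identd user [timestamp] "request" status bytes ...
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def summarize(lines):
    """Return per-IP byte totals, hit counts, and peak hits in one second."""
    bytes_by_ip = Counter()
    hits_by_ip = Counter()
    hits_per_second = defaultdict(Counter)  # ip -> {timestamp: hits}
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        ip = m.group("ip")
        size = m.group("bytes")
        # Apache logs "-" when no response body was sent
        bytes_by_ip[ip] += 0 if size == "-" else int(size)
        hits_by_ip[ip] += 1
        # Apache timestamps have one-second resolution, so identical
        # timestamps mean hits in the same second
        hits_per_second[ip][m.group("time")] += 1
    peak = {ip: max(c.values()) for ip, c in hits_per_second.items()}
    return bytes_by_ip, hits_by_ip, peak


def report(path, top=10):
    """Print the top IPs by bytes sent from the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        bytes_by_ip, hits_by_ip, peak = summarize(f)
    print(f"{'IP':<40}{'MB':>10}{'hits':>10}{'peak/s':>8}")
    for ip, total in bytes_by_ip.most_common(top):
        print(f"{ip:<40}{total / 1_000_000:>10.1f}{hits_by_ip[ip]:>10}{peak[ip]:>8}")
```

You would call it with something like `report("/path/to/access.log")`. An IP near the top by megabytes that is not your staff or a known search engine, especially one with a high peak per second, is a candidate for a `deny from` line.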
If you find a new IP they are using, let us know.