Often underestimated, .htaccess files have significant influence over server configurations, directly impacting website performance, security, and functionality. Whether you’re an experienced webmaster or a newcomer to web development, grasping the intricacies of .htaccess is essential for optimizing your online presence.
This is why we created this ultimate resource for unlocking the potential of .htaccess files. In this in-depth guide, we explore all the key aspects of .htaccess and empower you to harness its capabilities effectively.
Throughout this article, we explore the origins and core functions of .htaccess, so bookmark this guide and prepare to dig into all things .htaccess.
.htaccess configures the way a server deals with a variety of requests. It’s supported by Apache, which most commercial hosting providers tend to favor, and by Apache-compatible servers such as LiteSpeed. NGINX, by contrast, doesn’t read .htaccess files at all. .htaccess files work at directory level. The directives in one apply to the directory it sits in and every subdirectory below it, and they override settings made further up the directory tree, as far as the main server configuration (its AllowOverride setting) permits.
Why is it called .htaccess?
This type of file was originally used to limit user access to specific directories, and the name has just stuck. It accepts a subset of the directives available in Apache’s main configuration file, httpd.conf, including the ones that give a sysadmin control over who has access to each directory.
For that job, it looks to an associated .htpasswd file for the usernames and passwords of the people who are allowed in. .htaccess still performs this valuable function, but it has since grown far more versatile, and this article explains the basics and more.
Where will I find the .htaccess file?
You can put an .htaccess file in any folder (directory) on your server, but the one you’ll usually find is in the web root folder, the folder that holds all of your website’s files. It usually has a name like public_html or www. If you’ve got a directory containing several websites in subdirectories, you’ll typically find an .htaccess file in the main root ( public_html ) directory and one in each of the site subdirectories (/sitename) too.
Why can’t I find the .htaccess file?
On Linux, macOS and other Unix-like systems, files whose names start with a dot ( . ) are hidden, so by default you won’t be able to see them. However, you can still get to them. Your FTP client or file manager will likely have a setting to “show hidden files.” Its location depends on which program you use, but you’ll usually find it under “Preferences”, “Settings”, “Folder Options” or “View.”
What if I don’t have an .htaccess file?
The first thing to establish is that you definitely don’t have one. Check that you have set the system to “show hidden files” (or whatever it’s called on your system) so that you can be sure it really isn’t there. Many hosts and content management systems create an .htaccess file by default, so it’s always worth checking.
If you’ve looked everywhere and still can’t find one, don’t worry: creating one is easy. Open a text editor and create a new document. Save it with the name .htaccess, with no .txt or any other extension, as plain text (plain ASCII is the safest choice; avoid word-processor formats). Then transfer it to the right directory using FTP or your hosting control panel’s file manager.
Handling an error code
Setting up custom error documents is one of the simplest .htaccess tasks. When a server receives a request, it responds by sending a document, such as an HTML page, or by passing the request to an application that generates the response (as with Content Management Systems and other web apps).
If this process trips up, then the server reports an error and its corresponding code. Different types of errors have different error codes. And you’ve probably seen a 404 “Not Found” error quite a few times. It’s not the only one though.
Client Request Errors
- 400 — Bad Request
- 401 — Authorization Required
- 402 — Payment Required (not used yet)
- 403 — Forbidden
- 404 — Not Found
- 405 — Method Not Allowed
- 406 — Not Acceptable (encoding)
- 407 — Proxy Authentication Required
- 408 — Request Timed Out
- 409 — Conflicting Request
- 410 — Gone
- 411 — Content Length Required
- 412 — Precondition Failed
- 413 — Request Entity Too Large
- 414 — Request URI Too Long
- 415 — Unsupported Media Type
Server Errors
- 500 — Internal Server Error
- 501 — Not Implemented
- 502 — Bad Gateway
- 503 — Service Unavailable
- 504 — Gateway Timeout
- 505 — HTTP Version Not Supported
What Happens by Default?
If you haven’t specified how to handle errors, the server simply sends its default error message to the browser, which shows the user a generic page that isn’t especially helpful.
Creating Error Documents
To customize error handling, you’ll need an HTML document for each error code you want to cover. You can call them anything, but a descriptive name such as not-found.html, or simply 404.html, makes them easier to manage.
Then, in the .htaccess file, determine which document goes with which error type.
ErrorDocument 400 /errors/bad-request.html
ErrorDocument 401 /errors/auth-reqd.html
ErrorDocument 403 /errors/forbid.html
ErrorDocument 404 /errors/not-found.html
ErrorDocument 500 /errors/server-err.html
Just note that each one gets its own line – and you’re done.
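ErrorDocument isn’t limited to local pages: it also accepts a short text message in double quotes, or a full URL. A minimal sketch of both forms (www.example.com is a placeholder domain):

```apache
# Show a short text message instead of a page (the text starts with a double quote)
ErrorDocument 403 "Sorry, you don't have access to this page."
# Send visitors to a page on another server (Apache answers with a redirect)
ErrorDocument 404 https://www.example.com/not-found.html
```

Because a full URL makes Apache send a redirect, the browser never receives the original 404 status. For the same reason, Apache’s documentation warns against using a remote URL for a 401 error: the browser would never be prompted for a password. Using ErrorDocument in .htaccess also requires the server’s AllowOverride setting to include FileInfo.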
Alternatives to .htaccess for error handling
Most CMSs, such as WordPress and Drupal, and many web apps handle these error codes in their own way.
Password Protection With .htaccess
As we’ve said, .htaccess files were originally used to limit which users could get into certain directories. So let’s take a look at that in our .htaccess tutorial first.
.htpasswd – this file holds usernames and passwords for the .htaccess system
Each one sits on its own line like this:
username:encryptedpassword
for example:
jamesbrown:523xT67mD1
Note that the second part isn’t the actual password; it’s a cryptographic hash of the password. The password has been put through a one-way hashing algorithm, and this is what came out. Each time a user logs in, the password they type goes through the same algorithm, and if the result matches the stored hash, they get access.
This is a much safer way of storing passwords than keeping them in plain text. Even if someone gets into your .htpasswd file, all they see are hashes, not the real passwords. Because hashing is a one-way street, the hashes can’t simply be reversed. Weak passwords and weak algorithms can still be cracked by trial and error, though, which is why the algorithm you choose matters.
You can choose from a few different hashing algorithms:
- bcrypt: the most secure option, although the deliberately slow hashing that makes it secure also makes it a little slower. Apache supports it; NGINX supports it only where the operating system’s crypt() function does.
- MD5: current versions of Apache use their own MD5 variant (apr1) as the default hashing algorithm, and NGINX supports it too.
Insecure algorithms: these are best avoided.
- crypt(): was previously the default hashing function, but isn’t a secure option (it only uses the first 8 characters of the password).
- SHA and salted SHA.
.htaccess Guide to Adding Usernames and Passwords with CLI
You can use the command line or an SSH terminal to create an .htpasswd file and add username-password pairs to it directly.
htpasswd is the command for dealing with the .htpasswd file.
To create a new .htpasswd file, run the command with the -c option, followed by the path to the file (the actual path on the server, not the URL) and the username you want to add:
> htpasswd -c /usr/local/blah/.htpasswd jamesbrown
This creates a new .htpasswd file in the /usr/local/blah/ directory, containing a record for a user called jamesbrown. htpasswd then prompts you for the password, which it hashes before storing it (with Apache’s MD5 variant, the default on current versions).
Be careful with -c: if an .htpasswd file already exists at that location, it is overwritten. To add a user to an existing file, leave the option out (for example, htpasswd /usr/local/blah/.htpasswd janebrown). If you’d rather use the bcrypt hashing algorithm, add the -B option. Note the capital B; lowercase -b means the password is given on the command line instead of at a prompt.
Password hashing without the command line
If you’re only familiar with .htaccess basics, you may prefer not to use the command line or an SSH terminal. In that case, you can create the .htpasswd file in a text editor, fill everything in, and then upload it using FTP or a file manager.
Of course, that leaves you with the task of hashing the passwords. That shouldn’t be a problem, because there are plenty of password hashing tools online. Many .htaccess tutorials recommend the htpasswd generator at Aspirine.org, which lets you choose between several hashing algorithms. Bear in mind that you’re typing the password into someone else’s website. Once you’ve run it, copy the hashed password into your .htpasswd file.
You only need one .htpasswd file for all your .htaccess files; there’s no need for one per directory. A single file can serve the whole main server directory or web hosting account. But don’t put your .htpasswd file in a directory that anyone can access from the web, so not in public_html, www or any of their subdirectories. From a security standpoint, it’s safer to put it somewhere that is only accessible from within the server itself.
Quick .htaccess Tutorial: How to use .htpasswd with .htaccess
To protect a directory, put an .htaccess file in it that says which users may access it. Its settings also apply to the subdirectories beneath it. To grant access to everyone, do nothing, because that’s the default. If you want to limit who can get in, your .htaccess file should look like this:
AuthUserFile /usr/local/etc/.htpasswd
AuthName "Name of Secure Area"
AuthType Basic
<Limit GET POST>
require valid-user
</Limit>
Line one gives the location of the file that holds your usernames and passwords. Line two defines a name for the area you want to keep secure, and you can call it anything. Line three specifies “Basic” authentication, which is fine in most instances as long as your site uses HTTPS, because Basic authentication sends the password unencrypted.
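The <Limit> tag defines what is being limited: in this instance, the ability to GET or POST to any file in the directory. Between the opening and closing <Limit> tags is the rule for who is allowed to access the files. Be aware that methods not listed, such as PUT or DELETE, are not restricted at all, which is why Apache’s documentation advises against putting access control inside <Limit>.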
In this example, any valid user, meaning anyone listed in the .htpasswd file who enters the right password, can access the files. If you only want certain users to have access, you can name them:
AuthUserFile /usr/local/etc/.htpasswd
AuthName "Name of Secure Area"
AuthType Basic
<Limit GET POST>
require user janebrown
require user jamesbrown
</Limit>
You can also grant or deny access based on groups of users, which is a real time-saver. To do this, create a group file, give it a name such as .htpeople, and list each group followed by its members, like this:
admin: janebrown jamesbrown
staff: zappafrank agrenmorgen
Now it’s become something that you can refer to in your .htaccess file:
AuthUserFile /usr/local/etc/.htpasswd
AuthGroupFile /usr/local/etc/.htpeople
AuthName "Admin Area"
AuthType Basic
<Limit GET POST>
require group admin
</Limit>
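For comparison, here is the group example without the <Limit> wrapper, so the requirement applies to every request method. It’s a sketch that assumes Apache 2.4 with the mod_authz_groupfile module enabled (it provides AuthGroupFile and Require group), and it reuses the example paths above:

```apache
# Protect every request method, not just GET and POST
AuthType Basic
AuthName "Admin Area"
AuthUserFile /usr/local/etc/.htpasswd
AuthGroupFile /usr/local/etc/.htpeople
# Only members of the admin group in .htpeople get in
Require group admin
```

Several named users also fit on one line, for example Require user janebrown jamesbrown.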
The .htaccess guide for .htpasswd alternatives
Using .htaccess/.htpasswd to limit access to server files makes most sense when your site consists largely of static files. The approach dates from the early days of the web, when websites consisted mostly of HTML documents and other static resources. If you’re using WordPress, you’ll have a feature that lets you do this as part of the system.
Enabling Server Side Includes (SSI) – .htaccess Tutorial
SSI is a simple scripting language that you would mainly use to embed HTML documents into other HTML documents, so you can easily reuse frequently used elements like menus and headers:
<!--#include virtual="header.shtml" -->
It also has conditional directives (if, else, etc.) and variables, which make it a complete, if limited, scripting language. It gets hard to work with once your project needs more than one or two includes, and at that point a developer will usually reach for PHP or Perl instead.
Server Side Includes are enabled by default on some web hosting servers. If they aren’t on yours, you can use your .htaccess file to enable them, like this:
AddType text/html .shtml
AddHandler server-parsed .shtml
Options +Includes
This should enable SSI for all files that have the .shtml extension. You can tell SSI to parse .html files using a directive like this:
AddHandler server-parsed .html
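On Apache 2.x, mod_include’s own documentation enables SSI through an output filter rather than the older server-parsed handler, which Apache still accepts for backward compatibility. A sketch of the filter form, assuming your host lets .htaccess files set Options:

```apache
# Serve .shtml files as HTML and run them through the INCLUDES filter
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
# Allow SSI in this directory (needs AllowOverride Options or All)
Options +Includes
```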
Why bother?
Well, this lets you use SSI without revealing that you’re doing so. On top of that, if you change implementations later, you can keep your .html file extensions. The only fly in the ointment is that every .html file will then be parsed for SSI. If you have many .html files that don’t need SSI parsing, the server does needless extra work, which slows it down for no benefit.
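If you want to keep the .html extension without parsing every file, mod_include’s XBitHack directive offers a middle ground on Unix-like systems. Only HTML files whose owner-execute permission bit is set get parsed. A minimal sketch, again assuming your host allows Options in .htaccess:

```apache
# Allow SSI in this directory
Options +Includes
# Parse only text/html files that have the user-execute bit set
XBitHack on
```

You would then mark each page that needs SSI with a command such as chmod u+x about.html (about.html being a placeholder name), and every other .html file is served without parsing.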
SSI on your Index page
If you want SSI on your home page without parsing every single .html file, you need to say so in your .htaccess file. When the web server looks for a directory’s index page, it looks for index.html by default. If you aren’t parsing .html files, you must name your index page index.shtml for SSI to work, but then the server won’t find it automatically. To make that happen, just add:
DirectoryIndex index.shtml index.html
This tells the web server that index.shtml is the main file for the directory. The second parameter, index.html, is a fallback that gets used when index.shtml can’t be found.
IP Blacklisting and IP Whitelisting with .htaccess
If you’ve had problems with certain users or IP addresses, you can use .htaccess to blacklist (block) them. Alternatively, you can do the opposite and whitelist (approve) particular addresses, shutting out everybody else.
Blacklisting by IP
This will let you blacklist addresses (the addresses shown are examples from the ranges reserved for documentation):
order allow,deny
deny from 203.0.113.6
deny from 198.51.100.
allow from all
The first line tells Apache to evaluate the allow directives first and the deny directives second, with a matching deny taking precedence. Because allow from all matches everyone, everybody gets in except the addresses caught by the deny directives. If you switched it round to say deny,allow, the allow directives would be evaluated last and take precedence, so allow from all would let everybody in and override the deny statements.
Take note of the third line, which says deny from 198.51.100. This isn’t a complete IP address, but that’s fine: it denies every address in that block, that is, anything beginning with 198.51.100. You can include as many IP addresses as you like, one per line, each with its own deny from directive.
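The order, allow and deny directives come from Apache 2.2. Apache 2.4 still understands them through the mod_access_compat module, but they are deprecated in favor of Require, and mixing the two styles isn’t recommended. A rough Apache 2.4 equivalent of the blacklist, using example addresses from the ranges reserved for documentation:

```apache
<RequireAll>
    # Allow everyone by default
    Require all granted
    # ...except this single address
    Require not ip 203.0.113.6
    # ...and every address beginning with 198.51.100
    Require not ip 198.51.100
</RequireAll>
```

Require not can only narrow down what another rule grants, which is why it sits inside <RequireAll> next to Require all granted.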
Whitelisting by IP
The opposite of blacklisting is whitelisting: restricting everyone except those you specify. As you might suspect, the order directive has to be turned back to front, so that you deny access to everyone first and then allow certain addresses:
order deny,allow
deny from all
allow from 192.0.2.4
allow from 198.51.100.
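In Apache 2.4’s Require syntax, a whitelist fits on a single line. The addresses are placeholders from the ranges reserved for documentation, and a partial address such as 198.51.100 covers that whole block:

```apache
# Only these addresses get in; everyone else receives 403 Forbidden
Require ip 192.0.2.4 198.51.100
```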
Domain names instead of IP addresses
You can also block or allow users of a domain name. This is helpful if people are moving between IP addresses. But it won’t work against anyone who has control of their reverse-DNS IP address mapping.
order allow,deny
deny from forinstance.com
allow from all
This works for subdomains too. In the example above, you will also block visitors from abc.forinstance.com.
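With Apache 2.4’s Require syntax, the same domain block looks like this (forinstance.com is the placeholder domain used above). To check the host name, Apache performs a double reverse DNS lookup on the client’s IP address:

```apache
<RequireAll>
    Require all granted
    # Matches forinstance.com and any of its subdomains
    Require not host forinstance.com
</RequireAll>
```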
Block Users by Referrer – .htaccess Guide
If a website contains a link to your site and someone follows it, we call it a ‘referrer’. But this doesn’t only work for clickable hyperlinks to your website. Any page on the internet can link to your images directly. This is called hotlinking. It often steals your bandwidth, it can infringe on your copyright – and you don’t even get extra traffic from it. And it’s not just images either. A stranger can link to your other resources like CSS files and JS scripts, too.
This is bound to happen a little bit and most site owners tolerate it. But it’s the kind of thing that can easily escalate into something more abusive. And there are times when in-text clickable hyperlinks can cause you problems too. Like when they’re from troublesome or nefarious websites. These are just a few of the reasons why you may decide to deny requests that originate with particular referrers.
If you need to do this, you’ll have to activate the mod_rewrite module. Most web hosts enable it automatically. But if yours doesn’t, or you can’t tell if they have, you should get in touch and ask. If they’re reluctant to enable it – maybe think about getting a new host.
.htaccess basics – Directives that block a referrer depend on the mod_rewrite engine.
The code to block by referrer looks like this:
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^https?://.*forinstance\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*forinstance2\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*forinstance3\.com [NC]
RewriteRule .* - [F]
It’s slightly fiddly, so let’s go through it.
RewriteEngine on, on the first line, tells the parser that rewrite directives are on the way. Each of lines 2, 3 and 4 blocks a single referring domain: %{HTTP_REFERER} is the server variable holding the referring URL, and the pattern after it is compared against that URL. To adapt this for your own purposes, change the domain name part (forinstance) and the extension (.com).
The backslash in front of .com is an escape character. The pattern is a regular expression, in which a dot means “any character,” so a literal dot must be escaped with a backslash ( \ ). The s? after http matches both http:// and https:// referrers.
The NC in the brackets (“no case”) makes the match case-insensitive. OR means exactly that, and chains the condition to the next one: if the referrer matches this one, or this one, or this one, apply the rewrite rule.
The final line is the rewrite rule itself. The pattern .* matches any requested URL, the dash means “leave the URL unchanged,” and [F] stands for “Forbidden.” A request from any referrer on the list is blocked with a 403 Forbidden error.
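For image hotlinking, a common alternative is to flip the logic. Instead of listing the sites to block, allow requests from your own site and requests with no referrer at all, and refuse every other request for an image file. A sketch, where example.com is a placeholder for your own domain:

```apache
RewriteEngine on
# Let through requests with no Referer header (direct visits, some privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred by your own site
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Any other request for an image gets 403 Forbidden
RewriteRule \.(jpe?g|png|gif|webp)$ - [F,NC]
```

Conditions without OR are combined with AND, so the rule only fires when the referrer is present and isn’t your own site.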
An .htaccess Guide to Blocking Bots and Web Scrapers
Sometimes it isn’t even people trying to eat up your bandwidth, it’s robots. These programs come and lift your site information, typically to republish under some low-quality SEO outfit. There are genuine bots out there, such as the ones that come from the big search engines. But others are almost like cockroaches, scavenging and doing you no good whatsoever.
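To date, the industry has identified hundreds of bots. You’ll never be able to block them all, but you can block as many as possible. The SetEnvIfNoCase directives in the list that follows target 400+ known bots by their User-Agent header. There are two caveats. A bot can easily fake its User-Agent, so this only catches the ones that identify themselves honestly. The list is also aggressive and catches tools you may want to allow, such as curl, Python scripts, UptimeRobot monitoring, and Twitter and Slack link previews, so review it before you use it.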
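A SetEnvIfNoCase line only sets an environment variable (here named bot) when the User-Agent header matches its pattern. It doesn’t block anything by itself until another directive acts on that variable. A minimal sketch of that second step in Apache 2.4 syntax, with a single example pattern taken from the list:

```apache
# Tag requests whose User-Agent starts with "AhrefsBot" (case-insensitive)
SetEnvIfNoCase User-Agent "^AhrefsBot" bot
# Refuse tagged requests with 403 Forbidden; everyone else is allowed
<RequireAll>
    Require all granted
    Require not env bot
</RequireAll>
```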
<IfModule mod_setenvif.c>
<IfModule mod_headers.c>
SetEnvIfNoCase User-Agent "^ALittle Client" bot
SetEnvIfNoCase User-Agent "^Go-http-client/1.1" bot
SetEnvIfNoCase User-Agent "^TprAdsTxtCrawler" bot
SetEnvIfNoCase User-Agent "^Photon/1.0" bot
SetEnvIfNoCase User-Agent .*Twitterbot/1.0.* bot
SetEnvIfNoCase User-Agent ".*Screaming Frog SEO Spider.*" bot
SetEnvIfNoCase User-Agent .*SurdotlyBot.* bot
SetEnvIfNoCase User-Agent .*curl.* bot
SetEnvIfNoCase User-Agent .*PixelTools.* bot
SetEnvIfNoCase User-Agent .*DataForSeoBot.* bot
SetEnvIfNoCase User-Agent .*PetalBot.* bot
SetEnvIfNoCase User-Agent .*weborama.* bot
SetEnvIfNoCase User-Agent .*CFNetwork.* bot
SetEnvIfNoCase User-Agent .*Python.* bot
SetEnvIfNoCase User-Agent .*python-requests.* bot
SetEnvIfNoCase User-Agent .*UptimeRobot.* bot
SetEnvIfNoCase User-Agent .*TprAdsTxtCrawler.* bot
SetEnvIfNoCase User-Agent ".*Hybrid Advertising.*" bot
SetEnvIfNoCase User-Agent .*Crawler.* bot
SetEnvIfNoCase User-Agent "^LinksMasterRoBot" bot
SetEnvIfNoCase User-Agent "^wp_is_mobile" bot
SetEnvIfNoCase User-Agent "^LinkStats" bot
SetEnvIfNoCase User-Agent "^CNCat" bot
SetEnvIfNoCase User-Agent "^linkdexbot" bot
SetEnvIfNoCase User-Agent "^meanpathbot" bot
SetEnvIfNoCase User-Agent "^NetSeer" bot
SetEnvIfNoCase User-Agent "^statdom.ru" bot
SetEnvIfNoCase User-Agent "^StatOnlineRuBot" bot
SetEnvIfNoCase User-Agent "^WebArtexBot" bot
SetEnvIfNoCase User-Agent "^Miralinks Robot" bot
SetEnvIfNoCase User-Agent "^Web-Monitoring" bot
SetEnvIfNoCase User-Agent "^Runet-Research-Crawler" bot
SetEnvIfNoCase User-Agent "^pr-cy.ru" bot
SetEnvIfNoCase User-Agent "^SeopultContentAnalyzer" bot
SetEnvIfNoCase User-Agent "^Seopult" bot
SetEnvIfNoCase User-Agent "^uptimerobot" bot
SetEnvIfNoCase User-Agent "^spbot" bot
SetEnvIfNoCase User-Agent "^rogerbot" bot
SetEnvIfNoCase User-Agent "^sitebot" bot
SetEnvIfNoCase User-Agent "^dotbot" bot
SetEnvIfNoCase User-Agent "^Linux" bot
SetEnvIfNoCase User-Agent "^SemrushBot" bot
SetEnvIfNoCase User-Agent "^SemrushBot-SA" bot
SetEnvIfNoCase User-Agent "^SemrushBot-BA" bot
SetEnvIfNoCase User-Agent "^SemrushBot-SI" bot
SetEnvIfNoCase User-Agent "^SemrushBot-SWA" bot
SetEnvIfNoCase User-Agent "^SemrushBot-CT" bot
SetEnvIfNoCase User-Agent "^SemrushBot-BM" bot
SetEnvIfNoCase User-Agent "^SemrushBot-SEOAB" bot
SetEnvIfNoCase User-Agent "^MJ12bot" bot
SetEnvIfNoCase User-Agent "^Vivaldi" bot
SetEnvIfNoCase User-Agent "^ArchiveBot" bot
SetEnvIfNoCase User-Agent "^archive.org_bot" bot
SetEnvIfNoCase User-Agent "^ia_archiver" bot
SetEnvIfNoCase User-Agent "^ia_archiver-web.archive.org" bot
SetEnvIfNoCase User-Agent "^PaleMoon" bot
SetEnvIfNoCase User-Agent "^Pale Moon" bot
SetEnvIfNoCase User-Agent "Sovetnik" bot
SetEnvIfNoCase User-Agent "sovetnik" bot
SetEnvIfNoCase User-Agent "80legs" bot
SetEnvIfNoCase User-Agent "360Spider" bot
SetEnvIfNoCase User-Agent "^8484 Boston Project" bot
SetEnvIfNoCase User-Agent "Aboundex" bot
SetEnvIfNoCase User-Agent "^Alexibot" bot
SetEnvIfNoCase User-Agent "^asterias" bot
SetEnvIfNoCase User-Agent "^attach" bot
SetEnvIfNoCase User-Agent "^AIBOT" bot
SetEnvIfNoCase User-Agent "^Accelerator" bot
SetEnvIfNoCase User-Agent "^Ants" bot
SetEnvIfNoCase User-Agent "^AhrefsBot" bot
SetEnvIfNoCase User-Agent "^AhrefsSiteAudit" bot
SetEnvIfNoCase User-Agent "^Ask Jeeves" bot
SetEnvIfNoCase User-Agent "^Atomic_Email_Hunter" bot
SetEnvIfNoCase User-Agent "^atSpider" bot
SetEnvIfNoCase User-Agent "^autoemailspider" bot
SetEnvIfNoCase User-Agent "archive.org_bot" bot
SetEnvIfNoCase User-Agent "^a.pr-cy.ru" bot
SetEnvIfNoCase User-Agent "^BackDoorBot" bot
SetEnvIfNoCase User-Agent "^BackWeb" bot
SetEnvIfNoCase User-Agent "Bandit" bot
SetEnvIfNoCase User-Agent "^BatchFTP" bot
SetEnvIfNoCase User-Agent "^Bigfoot" bot
SetEnvIfNoCase User-Agent "^Black.Hole" bot
SetEnvIfNoCase User-Agent "^BlackWidow" bot
SetEnvIfNoCase User-Agent "^BlowFish" bot
SetEnvIfNoCase User-Agent "^BotALot" bot
SetEnvIfNoCase User-Agent "Buddy" bot
SetEnvIfNoCase User-Agent "^BuiltBotTough" bot
SetEnvIfNoCase User-Agent "^Bullseye" bot
SetEnvIfNoCase User-Agent "^BunnySlippers" bot
SetEnvIfNoCase User-Agent "^Baiduspider" bot
SetEnvIfNoCase User-Agent "^Bot\ mailto:" bot
SetEnvIfNoCase User-Agent "^Buddy" bot
SetEnvIfNoCase User-Agent "^bwh3_user_agent" bot
SetEnvIfNoCase User-Agent "BLEXBot" bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bot
SetEnvIfNoCase User-Agent "^CheeseBot" bot
SetEnvIfNoCase User-Agent "^CherryPicker" bot
SetEnvIfNoCase User-Agent "^ChinaClaw" bot
SetEnvIfNoCase User-Agent "Collector" bot
SetEnvIfNoCase User-Agent "Copier" bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bot
SetEnvIfNoCase User-Agent "^cosmos" bot
SetEnvIfNoCase User-Agent "^Crescent" bot
SetEnvIfNoCase User-Agent "^Custo" bot
SetEnvIfNoCase User-Agent "^Cogentbot" bot
SetEnvIfNoCase User-Agent "^China" bot
SetEnvIfNoCase User-Agent "^ContactBot" bot
SetEnvIfNoCase User-Agent "^ContentSmartz" bot
SetEnvIfNoCase User-Agent "^CCBot" bot
SetEnvIfNoCase User-Agent "^Cluuz" bot
SetEnvIfNoCase User-Agent "^DISCo" bot
SetEnvIfNoCase User-Agent "^DIIbot" bot
SetEnvIfNoCase User-Agent "^DittoSpyder" bot
SetEnvIfNoCase User-Agent "^Download\ Demon" bot
SetEnvIfNoCase User-Agent "^Download\ Devil" bot
SetEnvIfNoCase User-Agent "^Download\ Wonder" bot
SetEnvIfNoCase User-Agent "^dragonfly" bot
SetEnvIfNoCase User-Agent "^Drip" bot
SetEnvIfNoCase User-Agent "^DataCha0s" bot
SetEnvIfNoCase User-Agent "^DBrowse" bot
SetEnvIfNoCase User-Agent "^Demo Bot" bot
SetEnvIfNoCase User-Agent "^Dolphin" bot
SetEnvIfNoCase User-Agent "Download" bot
SetEnvIfNoCase User-Agent "^DSurf15" bot
SetEnvIfNoCase User-Agent "^eCatch" bot
SetEnvIfNoCase User-Agent "^EasyDL" bot
SetEnvIfNoCase User-Agent "^ebingbong" bot
SetEnvIfNoCase User-Agent "^EirGrabber" bot
SetEnvIfNoCase User-Agent "^EmailCollector" bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bot
SetEnvIfNoCase User-Agent "^EmailWolf" bot
SetEnvIfNoCase User-Agent "^EroCrawler" bot
SetEnvIfNoCase User-Agent "^Exabot" bot
SetEnvIfNoCase User-Agent "^Express\ WebPictures" bot
SetEnvIfNoCase User-Agent "Extractor" bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bot
SetEnvIfNoCase User-Agent "^EBrowse" bot
SetEnvIfNoCase User-Agent "^Educate Search VxB" bot
SetEnvIfNoCase User-Agent "EmailSpider" bot
SetEnvIfNoCase User-Agent "^ESurf15" bot
SetEnvIfNoCase User-Agent "ExtractorPro" bot
SetEnvIfNoCase User-Agent "^Foobot" bot
SetEnvIfNoCase User-Agent "^focusbot" bot
SetEnvIfNoCase User-Agent "^flunky" bot
SetEnvIfNoCase User-Agent "^FrontPage" bot
SetEnvIfNoCase User-Agent "^FileHound" bot
SetEnvIfNoCase User-Agent "^FlashGet" bot
SetEnvIfNoCase User-Agent "^Flexum" bot
SetEnvIfNoCase User-Agent "^Franklin Locator" bot
SetEnvIfNoCase User-Agent "^FSurf15" bot
SetEnvIfNoCase User-Agent "^Full Web Bot" bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bot
SetEnvIfNoCase User-Agent "^gotit" bot
SetEnvIfNoCase User-Agent "^GrabNet" bot
SetEnvIfNoCase User-Agent "^Grafula" bot
SetEnvIfNoCase User-Agent "^GetRight" bot
SetEnvIfNoCase User-Agent "^Gets" bot
SetEnvIfNoCase User-Agent "^GetWeb!" bot
SetEnvIfNoCase User-Agent "^Gigabot" bot
SetEnvIfNoCase User-Agent "^Go!Zilla" bot
SetEnvIfNoCase User-Agent "^GoZilla" bot
SetEnvIfNoCase User-Agent "^Grab.*Site" bot
SetEnvIfNoCase User-Agent "^Grabber" bot
SetEnvIfNoCase User-Agent "^grub-client" bot
SetEnvIfNoCase User-Agent "^gsa-crawler" bot
SetEnvIfNoCase User-Agent "^Guestbook Auto Submitter" bot
SetEnvIfNoCase User-Agent "^Gulliver" bot
SetEnvIfNoCase User-Agent "^Guzzle" bot
SetEnvIfNoCase User-Agent "^GuzzleHttp" bot
SetEnvIfNoCase User-Agent "^Harvest" bot
SetEnvIfNoCase User-Agent "^hloader" bot
SetEnvIfNoCase User-Agent "^HMView" bot
SetEnvIfNoCase User-Agent "^HTTrack" bot
SetEnvIfNoCase User-Agent "^humanlinks" bot
SetEnvIfNoCase User-Agent "HubSpot" bot
SetEnvIfNoCase User-Agent "^IlseBot" bot
SetEnvIfNoCase User-Agent "^Image\ Stripper" bot
SetEnvIfNoCase User-Agent "^Image\ Sucker" bot
SetEnvIfNoCase User-Agent "Indy\ Library" bot
SetEnvIfNoCase User-Agent "^InfoNavibot" bot
SetEnvIfNoCase User-Agent "^InfoTekies" bot
SetEnvIfNoCase User-Agent "^Intelliseek" bot
SetEnvIfNoCase User-Agent "^InterGET" bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bot
SetEnvIfNoCase User-Agent "^Iria" bot
SetEnvIfNoCase User-Agent "^IBrowse" bot
SetEnvIfNoCase User-Agent "^Industry Program" bot
SetEnvIfNoCase User-Agent "^inktomi\.com" bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bot
SetEnvIfNoCase User-Agent "^ISC Systems iRc Search" bot
SetEnvIfNoCase User-Agent "^IUPUI Research" bot
SetEnvIfNoCase User-Agent "^ia_archiver" bot
SetEnvIfNoCase User-Agent "^Jakarta" bot
SetEnvIfNoCase User-Agent "^JennyBot" bot
SetEnvIfNoCase User-Agent "^JetCar" bot
SetEnvIfNoCase User-Agent "^JOC" bot
SetEnvIfNoCase User-Agent "^JustView" bot
SetEnvIfNoCase User-Agent "^Jyxobot" bot
SetEnvIfNoCase User-Agent "^Java" bot
SetEnvIfNoCase User-Agent "^jetcar" bot
SetEnvIfNoCase User-Agent "^Kenjin.Spider" bot
SetEnvIfNoCase User-Agent "^Keyword.Density" bot
SetEnvIfNoCase User-Agent "^larbin" bot
SetEnvIfNoCase User-Agent "^LexiBot" bot
SetEnvIfNoCase User-Agent "^lftp" bot
SetEnvIfNoCase User-Agent "^libWeb/clsHTTP" bot
SetEnvIfNoCase User-Agent "^likse" bot
SetEnvIfNoCase User-Agent "^LinkextractorPro" bot
SetEnvIfNoCase User-Agent "^LinkScan/8.1a.Unix" bot
SetEnvIfNoCase User-Agent "^LNSpiderguy" bot
SetEnvIfNoCase User-Agent "^LinkWalker" bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bot
SetEnvIfNoCase User-Agent "^LWP::Simple" bot
SetEnvIfNoCase User-Agent "^LARBIN-EXPERIMENTAL" bot
SetEnvIfNoCase User-Agent "^leech" bot
SetEnvIfNoCase User-Agent "^LeechFTP" bot
SetEnvIfNoCase User-Agent "^LetsCrawl.com" bot
SetEnvIfNoCase User-Agent "^libwww-perl" bot
SetEnvIfNoCase User-Agent "^Lincoln State Web Browser" bot
SetEnvIfNoCase User-Agent "^LMQueueBot" bot
SetEnvIfNoCase User-Agent "^LinkpadBot" bot
SetEnvIfNoCase User-Agent "^Magnet" bot
SetEnvIfNoCase User-Agent "^MegaIndex.ru" bot
SetEnvIfNoCase User-Agent "^Mag-Net" bot
SetEnvIfNoCase User-Agent "^MarkWatch" bot
SetEnvIfNoCase User-Agent "^Mass\ Downloader" bot
SetEnvIfNoCase User-Agent "^Mata.Hari" bot
SetEnvIfNoCase User-Agent "^Memo" bot
SetEnvIfNoCase User-Agent "^Microsoft.URL" bot
SetEnvIfNoCase User-Agent "^Microsoft URL Control" bot
SetEnvIfNoCase User-Agent "^MIDown\ tool" bot
SetEnvIfNoCase User-Agent "^MIIxpc" bot
SetEnvIfNoCase User-Agent "^Mirror" bot
SetEnvIfNoCase User-Agent "^Missigua\ Locator" bot
SetEnvIfNoCase User-Agent "^Mister\ PiX" bot
SetEnvIfNoCase User-Agent "^moget" bot
SetEnvIfNoCase User-Agent "^Mac Finder" bot
SetEnvIfNoCase User-Agent "^MFC Foundation Class Library" bot
SetEnvIfNoCase User-Agent "^Missauga Loca" bot
SetEnvIfNoCase User-Agent "^Missouri College Browse" bot
SetEnvIfNoCase User-Agent "^Mizzu Labs" bot
SetEnvIfNoCase User-Agent "^Mo College" bot
SetEnvIfNoCase User-Agent "^MVAClient" bot
SetEnvIfNoCase User-Agent "^MJ12bot" bot
SetEnvIfNoCase User-Agent "^mfibot" bot
SetEnvIfNoCase User-Agent "^NAMEPROTECT" bot
SetEnvIfNoCase User-Agent "^Navroad" bot
SetEnvIfNoCase User-Agent "^NearSite" bot
SetEnvIfNoCase User-Agent "^NetAnts" bot
SetEnvIfNoCase User-Agent "^Netcraft" bot
SetEnvIfNoCase User-Agent "^NetMechanic" bot
SetEnvIfNoCase User-Agent "^NetSpider" bot
SetEnvIfNoCase User-Agent "^Net\ Vampire" bot
SetEnvIfNoCase User-Agent "^NetZIP" bot
SetEnvIfNoCase User-Agent "^NextGenSearchBot" bot
SetEnvIfNoCase User-Agent "^NG" bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bot
SetEnvIfNoCase User-Agent "^niki-bot" bot
SetEnvIfNoCase User-Agent "^NimbleCrawler" bot
SetEnvIfNoCase User-Agent "^Ninja" bot
SetEnvIfNoCase User-Agent "^NPbot" bot
SetEnvIfNoCase User-Agent "^nutch-1.4" bot
SetEnvIfNoCase User-Agent "^NameOfAgent \(CMS Spider\)" bot
SetEnvIfNoCase User-Agent "^NASA Search" bot
SetEnvIfNoCase User-Agent "^Net\ Reaper" bot
SetEnvIfNoCase User-Agent "^Ninja" bot
SetEnvIfNoCase User-Agent "^Nsauditor" bot
SetEnvIfNoCase User-Agent "^NetLyzer" bot
SetEnvIfNoCase User-Agent "^Octopus" bot
SetEnvIfNoCase User-Agent "^Offline\ Explorer" bot
SetEnvIfNoCase User-Agent "^Offline\ Navigator" bot
SetEnvIfNoCase User-Agent "^Offline" bot
SetEnvIfNoCase User-Agent "^Openfind" bot
SetEnvIfNoCase User-Agent "^OutfoxBot" bot
SetEnvIfNoCase User-Agent "^PageGrabber" bot
SetEnvIfNoCase User-Agent "^Papa\ Foto" bot
SetEnvIfNoCase User-Agent "^pavuk" bot
SetEnvIfNoCase User-Agent "^pcBrowser" bot
SetEnvIfNoCase User-Agent "^PHP\ version\ tracker" bot
SetEnvIfNoCase User-Agent "^Pockey" bot
SetEnvIfNoCase User-Agent "^ProPowerBot/2.14" bot
SetEnvIfNoCase User-Agent "^ProWebWalker" bot
SetEnvIfNoCase User-Agent "^psbot" bot
SetEnvIfNoCase User-Agent "^Pump" bot
SetEnvIfNoCase User-Agent "^ParseMX" bot
SetEnvIfNoCase User-Agent "^Page.*Saver" bot
SetEnvIfNoCase User-Agent "^PBrowse" bot
SetEnvIfNoCase User-Agent "^PEval" bot
SetEnvIfNoCase User-Agent "^Pita" bot
SetEnvIfNoCase User-Agent "^Poirot" bot
SetEnvIfNoCase User-Agent "^Port Huron Labs" bot
SetEnvIfNoCase User-Agent "^Production Bot" bot
SetEnvIfNoCase User-Agent "^Program Shareware" bot
SetEnvIfNoCase User-Agent "^PSurf15" bot
SetEnvIfNoCase User-Agent "^psycheclone" bot
SetEnvIfNoCase User-Agent "^QueryN.Metasearch" bot
SetEnvIfNoCase User-Agent "^RealDownload" bot
SetEnvIfNoCase User-Agent "Reaper" bot
SetEnvIfNoCase User-Agent "Recorder" bot
SetEnvIfNoCase User-Agent "^ReGet" bot
SetEnvIfNoCase User-Agent "^RepoMonkey" bot
SetEnvIfNoCase User-Agent "^RMA" bot
SetEnvIfNoCase User-Agent "^RookeeBot" bot
SetEnvIfNoCase User-Agent "^Readability" bot
SetEnvIfNoCase User-Agent "^Reaper" bot
SetEnvIfNoCase User-Agent "^RSurf15" bot
SetEnvIfNoCase User-Agent "Siphon" bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bot
SetEnvIfNoCase User-Agent "^SlySearch" bot
SetEnvIfNoCase User-Agent "^SmartDownload" bot
SetEnvIfNoCase User-Agent "^Snake" bot
SetEnvIfNoCase User-Agent "^Snapbot" bot
SetEnvIfNoCase User-Agent "^Snoopy" bot
SetEnvIfNoCase User-Agent "^sogou" bot
SetEnvIfNoCase User-Agent "^SpaceBison" bot
SetEnvIfNoCase User-Agent "^SpankBot" bot
SetEnvIfNoCase User-Agent "^spanner" bot
SetEnvIfNoCase User-Agent "^Sqworm" bot
SetEnvIfNoCase User-Agent "Stripper" bot
SetEnvIfNoCase User-Agent "Sucker" bot
SetEnvIfNoCase User-Agent "^SuperBot" bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bot
SetEnvIfNoCase User-Agent "^Surfbot" bot
SetEnvIfNoCase User-Agent "^suzuran" bot
SetEnvIfNoCase User-Agent "^Szukacz/1.4" bot
SetEnvIfNoCase User-Agent "^SeznamBot" bot
SetEnvIfNoCase User-Agent "^Site-Shot" bot
SetEnvIfNoCase User-Agent "^Slackbot-LinkExpanding" bot
SetEnvIfNoCase User-Agent "^Scrapy" bot
SetEnvIfNoCase User-Agent "^Spider/Bot" bot
SetEnvIfNoCase User-Agent "^Scooter" bot
SetEnvIfNoCase User-Agent "^searchbot" bot
SetEnvIfNoCase User-Agent "^SEO search Crawler" bot
SetEnvIfNoCase User-Agent "^SEOsearch" bot
SetEnvIfNoCase User-Agent "^ShablastBot" bot
SetEnvIfNoCase User-Agent "^Snagger" bot
SetEnvIfNoCase User-Agent "^snap.com beta crawler" bot
SetEnvIfNoCase User-Agent "^sogou develop spider" bot
SetEnvIfNoCase User-Agent "^Sogou Orion spider" bot
SetEnvIfNoCase User-Agent "^sogou spider" bot
SetEnvIfNoCase User-Agent "^Sogou web spider" bot
SetEnvIfNoCase User-Agent "^sohu agent" bot
SetEnvIfNoCase User-Agent "^SSurf15" bot
SetEnvIfNoCase User-Agent "^SafeSearch_microdata_crawler_" bot
SetEnvIfNoCase User-Agent "^SafeDNSBot" bot
SetEnvIfNoCase User-Agent "^SafeDNSBot_" bot
SetEnvIfNoCase User-Agent "^tAkeOut" bot
SetEnvIfNoCase User-Agent "^Teleport" bot
SetEnvIfNoCase User-Agent "^Telesoft" bot
SetEnvIfNoCase User-Agent "^TurnitinBot/1.5" bot
SetEnvIfNoCase User-Agent "^The.Intraformant" bot
SetEnvIfNoCase User-Agent "^TheNomad" bot
SetEnvIfNoCase User-Agent "^TightTwatBot" bot
SetEnvIfNoCase User-Agent "^Titan" bot
SetEnvIfNoCase User-Agent "^True_bot" bot
SetEnvIfNoCase User-Agent "^turingos" bot
SetEnvIfNoCase User-Agent "^TurnitinBot" bot
SetEnvIfNoCase User-Agent "^Teleport\ Pro" bot
SetEnvIfNoCase User-Agent "^Triton" bot
SetEnvIfNoCase User-Agent "^TSurf15" bot
SetEnvIfNoCase User-Agent "^Twiceler" bot
SetEnvIfNoCase User-Agent "^URLy.Warning" bot
SetEnvIfNoCase User-Agent "^Under the Rainbow" bot
SetEnvIfNoCase User-Agent "^Yo-yo" bot
SetEnvIfNoCase User-Agent "^Yanga" bot
SetEnvIfNoCase User-Agent "^Vacuum" bot
SetEnvIfNoCase User-Agent "^VCI" bot
SetEnvIfNoCase User-Agent "^VoidEYE" bot
SetEnvIfNoCase User-Agent "^Virusdie_crawler" bot
SetEnvIfNoCase User-Agent "^VadixBot" bot
SetEnvIfNoCase User-Agent "^voyager" bot
SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bot
SetEnvIfNoCase User-Agent "^WebAuto" bot
SetEnvIfNoCase User-Agent "^WebBandit" bot
SetEnvIfNoCase User-Agent "^Webclipping.com" bot
SetEnvIfNoCase User-Agent "^WebCopier" bot
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bot
SetEnvIfNoCase User-Agent "^WebFetch" bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bot
SetEnvIfNoCase User-Agent "^Web.Image.Collector" bot
SetEnvIfNoCase User-Agent "^WebLeacher" bot
SetEnvIfNoCase User-Agent "^WebmasterWorldForumBot" bot
SetEnvIfNoCase User-Agent "^WebReaper" bot
SetEnvIfNoCase User-Agent "^WebSauger" bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bot
SetEnvIfNoCase User-Agent "^Webster" bot
SetEnvIfNoCase User-Agent "^WebStripper" bot
SetEnvIfNoCase User-Agent "^WebWhacker" bot
SetEnvIfNoCase User-Agent "^WebZIP" bot
SetEnvIfNoCase User-Agent "Whacker" bot
SetEnvIfNoCase User-Agent "^Widow" bot
SetEnvIfNoCase User-Agent "^WISENutbot" bot
SetEnvIfNoCase User-Agent "^WWWOFFLE" bot
SetEnvIfNoCase User-Agent "^WWW-Collector-E" bot
SetEnvIfNoCase User-Agent "^W3C-checklink" bot
SetEnvIfNoCase User-Agent "^Weazel" bot
SetEnvIfNoCase User-Agent "^Web.*Spy" bot
SetEnvIfNoCase User-Agent "^WebAlta" bot
SetEnvIfNoCase User-Agent "^WebCapture" bot
SetEnvIfNoCase User-Agent "^WebMirror" bot
SetEnvIfNoCase User-Agent "^WebRecorder" bot
SetEnvIfNoCase User-Agent "^WebSpy" bot
SetEnvIfNoCase User-Agent "^WebVulnCrawl.unknown" bot
SetEnvIfNoCase User-Agent "^Wells Search" bot
SetEnvIfNoCase User-Agent "^WEP Search" bot
SetEnvIfNoCase User-Agent "^www\.asona\.org" bot
SetEnvIfNoCase User-Agent "^Wget" bot
SetEnvIfNoCase User-Agent "^Xaldon" bot
SetEnvIfNoCase User-Agent "^Xenu" bot
SetEnvIfNoCase User-Agent "^Xaldon\ WebSpider" bot
SetEnvIfNoCase User-Agent "^Zeus" bot
SetEnvIfNoCase User-Agent "^ZmEu" bot
SetEnvIfNoCase User-Agent "^Zyborg" bot
SetEnvIfNoCase User-Agent "^_CommonCrawler_Node_" bot
SetEnvIfNoCase User-Agent "^_Cliqzbot" bot
SetEnvIfNoCase User-Agent "^_Baiduspider" bot
SetEnvIfNoCase User-Agent "^_Exabot" bot
SetEnvIfNoCase User-Agent "^_GrapeshotCrawler" bot
SetEnvIfNoCase User-Agent "^_Gluten_Free_Crawler" bot
SetEnvIfNoCase User-Agent "^_DeuSu" bot
SetEnvIfNoCase User-Agent "^_Dataprovider" bot
SetEnvIfNoCase User-Agent "^_DuckDuckGo-Favicons-Bot" bot
SetEnvIfNoCase User-Agent "^_SeznamBot" bot
SetEnvIfNoCase User-Agent "^_007ac9_Crawler" bot
SetEnvIfNoCase User-Agent "^_wmtips" bot
SetEnvIfNoCase User-Agent "^rv" bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bot
</Limit>
</IfModule>
</IfModule>
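On Apache 2.4 and later, the Order, Allow and Deny directives used above only work if the compatibility module mod_access_compat is loaded. A minimal sketch of the same final check in the newer Require syntax (mod_authz_core), meant to replace the <Limit> block while keeping the SetEnvIfNoCase lines exactly as they are, might look like this:

```apache
# Apache 2.4+ (mod_authz_core): use in place of the <Limit> block above.
# Everyone is allowed except requests tagged "bot" by SetEnvIfNoCase.
<RequireAll>
    Require all granted
    Require not env bot
</RequireAll>
```

Unlike the <Limit GET POST HEAD> version, this applies to every HTTP method, not just those three.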
Specifying a Default File for a Directory
When a server receives a request for a URL with no file name, it assumes the URL refers to a directory. If you request http://forinstance.com, Apache (like most servers) looks in the domain's root directory, typically /public_html or something similar such as /forinstance-com, to find the default file. Out of the box, that default file is index.html. This dates back to when the Internet was young and websites were often just a bunch of documents bundled together, so the "home" page was often no more than an index that showed you where everything was.
Of course, nowadays you might not want index.html to be your default page. You may want a different file type, in which case index.shtml, index.xml, or index.php might be more appropriate. Or maybe you don't think of your home page as an "index" at all and want to call it something else, like home.html or primary.html.
Set the Default Directory Page
One of the .htaccess basics is setting the default page for a directory, which is easy:
DirectoryIndex [filename goes here]
If you want your default to be home.html, it's as simple as using:
DirectoryIndex home.html
.htaccess Guide to Setting More Than One Default Page
You can also set more than one DirectoryIndex:
DirectoryIndex index.php index.shtml index.html
The web server looks for the first file in the list. If it can't find that, it looks for the second one, and so on. But why would you need to do this? Wouldn't you know which file you wanted to use as your default page? Keep in mind that an .htaccess file affects its own directory and every subdirectory below it, until a more local .htaccess file overrides it.
So an .htaccess file in your root directory can give instructions for lots of subdirectories, and each of those could use a different default page name. Listing all the possible names in one .htaccess file in the root spares you the tedious work of duplicating the same directives in every directory.
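As an illustration, here's a sketch of a single root .htaccess file serving two subdirectories that use different default page names. The /blog and /docs directories are hypothetical:

```apache
# /public_html/.htaccess
# One list covers every subdirectory that has no .htaccess of its own:
#   /blog/ contains index.php -> index.php is served
#   /docs/ contains home.html -> home.html is served (earlier names not found)
DirectoryIndex index.php index.shtml index.html home.html
```

If /docs later gets its own .htaccess with a DirectoryIndex line, that list replaces the root one for /docs and everything below it.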
.htaccess Guide to URL Rewriting and URL Redirects
One of the most common uses of .htaccess files is URL redirects. For example, the URL for a document or resource may have changed, perhaps because you've moved things around on your website or changed domain names. In that case, URL redirects can help you.
301 or 302
There are two types of redirect that the server can send, identified by their HTTP status codes: 301 and 302.
301 means "Moved Permanently," while 302 means "Moved Temporarily." In most cases, 301 does a perfectly good job. Perhaps more importantly, it earns you SEO brownie points, since search engines may pass the original URL's ranking on to the new page.
It also lets most browsers cache the old-to-new mapping, so they can request the new URL straight away the next time the original is asked for. For a permanently changed URL, these are exactly the responses you want.
There's not much to gain from 302 redirects, mainly because there's rarely a reason to change a URL on a temporary basis. Changing a URL at all isn't something anybody should really want to do, but sometimes you just have to. And when you do, there are usually better options than changing it only to change it back later. At least, that's the case at the time of writing this .htaccess guide.
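For reference, here's a sketch of both kinds using the Redirect directive covered below. The page names are hypothetical. Apache's mod_alias also accepts the keywords permanent and temp in place of the numeric codes:

```apache
# Permanent move (301): browsers and search engines may cache it
Redirect 301 /old-page.html http://forinstance.com/new-page.html

# Temporary move (302): clients should keep using the original URL
Redirect 302 /sale.html http://forinstance.com/holiday-sale.html

# Keyword forms of the same two status codes
Redirect permanent /about-us.html http://forinstance.com/about.html
Redirect temp /promo.html http://forinstance.com/spring-promo.html
```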
Redirect or Rewrite
You can change a URL with .htaccess directives in a couple of ways: the Redirect command and the mod_rewrite engine. The Redirect command tells the browser which other URL it should request instead. The mod_rewrite tool, by contrast, normally "translates" the URL in the request into something the file system or CMS can understand, and then handles the request as though the translated URL was the one that was requested.
From the perspective of the web browser, it's business as usual. It gets the content it requested and carries on as if nothing happened.
The mod_rewrite tool can also produce 301 redirects that work like the Redirect command, but with a far greater range of possible rules, including elaborate pattern matching and rewriting instructions that go beyond what Redirect can do.
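To make the difference concrete, here's a minimal mod_rewrite sketch for a root .htaccess file. The product.php script and the URL patterns are hypothetical. The first rule rewrites internally, so the browser never sees the real file name, while the second uses the R=301 flag to send a genuine redirect:

```apache
RewriteEngine On

# Internal rewrite: the browser asks for /products/42 and gets the output
# of product.php?id=42 (hypothetical script), with the URL left unchanged
RewriteRule ^products/([0-9]+)$ product.php?id=$1 [L,QSA]

# External redirect: same engine, but R=301 makes Apache answer with a
# 301 response pointing the browser at the new URL
RewriteRule ^old-products/([0-9]+)$ /products/$1 [R=301,L]
```

In an .htaccess file, RewriteRule patterns are matched against the path without its leading slash, which is why neither pattern starts with /.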
Basic Page Redirect – .htaccess guide
For redirecting one page to another URL, the code looks like this:
Redirect 301 /relative-url.html http://forinstance.com/full-url.html
A single space separates each of the four parts of this one-line command, so you have:
- the Redirect command itself
- its type ( 301 – Moved Permanently )
- the original page’s relative URL
- the full URL of the new page.
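The relative URL here is the path part of the old URL. It starts with a slash and is measured from the root of the domain, even if the .htaccess file sits in a subdirectory. In practice, your .htaccess file will normally be in the web root anyway.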
So, if http://forinstance.com/blog.php had been moved to http://blog.forinstance.com, the code would be:
Redirect 301 /blog.php http://blog.forinstance.com
Redirecting a large section – .htaccess guide
Have you changed your directory structure but kept your page names the same? Then you can redirect all requests for a particular directory to the new one. Because Redirect matches by prefix, anything below the old directory is carried over to the new one.
Redirect 301 /old-directory http://forinstance.com/new-directory
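Here's a sketch of what that prefix matching means in practice, along with RedirectMatch (also part of mod_alias), which takes a regular expression when you need more control. Treat the two as alternatives: mod_alias uses the first rule that matches, so the plain Redirect would shadow the RedirectMatch if both were active:

```apache
# Option 1: Redirect matches by prefix, so this one rule sends, for example,
#   /old-directory/page.html  to  http://forinstance.com/new-directory/page.html
Redirect 301 /old-directory http://forinstance.com/new-directory

# Option 2 (use instead of option 1): move only the .html files
# and leave everything else in the old directory alone
RedirectMatch 301 ^/old-directory/(.*)\.html$ http://forinstance.com/new-directory/$1.html
```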
Redirecting an entire site – .htaccess guide
But how about if your entire site has moved to a new URL? No problem.
Redirect 301 / http://thenewurl.com/
Redirecting www to non-www – .htaccess guide
More and more websites are turning their backs on the www subdomain. There's never really been a need for it. It's a throwback to when website owners used a server to look after many of their own documents, and the www directory was where they put anything they wanted to offer others.
Some still use it to this day, but many have moved on. Even so, lots of users habitually type "www." in front of every URL, which can make things tricky if your site doesn't use those letters. The mod_rewrite module can help you here, and your web host has probably enabled it already.
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.forinstance\.com [NC]
RewriteRule ^(.*)$ http://forinstance.com/$1 [R=301,L]
But be careful! Many other .htaccess and mod_rewrite guides will give you some version of this code to achieve this:
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^forinstance\.com [NC]
RewriteRule ^(.*)$ http://forinstance.com/$1 [R=301,NC]
Can you see what’s wrong with it?
All subdomains are redirected to the primary domain! Which means not just www.forinstance.com, but others like blog.forinstance.com and admin.forinstance.com too. Not ideal behaviour!
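A minimal sketch of a safer, domain-agnostic version follows. It assumes your site is served over HTTPS (swap https for http if it isn't). Only host names that actually start with www. are redirected, so blog. and admin. subdomains are left alone:

```apache
Options +FollowSymlinks
RewriteEngine on
# Match only host names that begin with "www." and capture the rest
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# %1 is the captured host name without "www.", $1 is the requested path
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
```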
Redirecting to www – .htaccess guide
So, what happens if you’re using the www subdomain? You should probably set up a redirect to make sure people get to where they’re trying to go. Especially now that fewer people are likely to automatically add that www to the beginning of URLs. All you need to do is reverse the code to achieve this.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^forinstance\.com [NC]
RewriteRule ^(.*)$ http://www.forinstance.com/$1 [R=301,L]
One thing not to do:
A number of .htaccess guides recommend redirecting 404 errors to your home page. While this is possible, it's actually an awful idea, because it leaves visitors confused. They expect a particular page and get your home page instead. A 404 error page would have told them exactly what they needed to know, whereas this does not. And anyway, what's the problem with admitting that a page can't be found? There's no shame in it.
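Rather than sending 404s to the home page, you can point Apache at a helpful custom error page. A minimal sketch, assuming you've created a page at /404.html (a hypothetical file name):

```apache
# Serve a custom "not found" page for missing URLs.
# A local path keeps the 404 status code; a full http:// URL here would
# make Apache send a redirect instead, hiding the 404 from clients.
ErrorDocument 404 /404.html
```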
So why set up redirects in .htaccess rather than using another approach? Redirects can be set up with server-side scripting, such as in PHP files. They can also be set up from within your CMS, which is pretty much the same thing. But an .htaccess redirect is usually the fastest kind. With PHP-based redirects, or those written in other server-side scripting languages, the request has to be handed to the interpreter and the script actually run before a redirect message is sent to the browser.
As any .htaccess guide will tell you, .htaccess redirects are much faster, mainly because the server responds to each request directly. Better still, some CMSs can manage the .htaccess file programmatically; WordPress, for example, writes its permalink rules there. This gives you the speed benefits of using .htaccess directly combined with the convenience of managing it from inside your application.
.htaccess Basics – Hiding Your .htaccess File
One of the .htaccess basics is that the file shouldn't be visible from the web. There's no reason for anyone to read it, and it could even reveal the location of your .htpasswd file. Random strangers shouldn't be able to look at the details of your implementation, including rewrite rules, directory settings, and security settings. Hiding all of that makes it more difficult for hackers to work out ways into your system. Luckily, you can hide your .htaccess file fairly easily using this code:
<Files .htaccess>
order allow,deny
deny from all
</Files>
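On Apache 2.4, the same protection is usually written with Require. The sketch below uses a wildcard so that it also covers .htpasswd and any other file whose name starts with .ht. Many stock Apache configurations already ship a rule like this, so it's worth checking yours first:

```apache
# Apache 2.4+ (mod_authz_core): refuse web access to .htaccess,
# .htpasswd and any other file beginning with ".ht"
<Files ".ht*">
    Require all denied
</Files>
```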
.htaccess guide to MIME types
MIME types are file types, originally defined for email ("Multipurpose Internet Mail Extensions"). But don't just think of them as "file types," because MIME also prescribes a specific format for writing them, such as text/css. If you've ever written an HTML document, you've likely specified a MIME type, probably without even realising it:
<link rel="stylesheet" type="text/css" href="/style.css">
The type attribute refers to a particular MIME type.
MIME types on your server
Occasionally, you might find that your web server isn't set up to deliver a specific file type, so any requests for that type of file just don't work. Usually, you can get around this by adding the MIME type to your .htaccess file.
AddType text/richtext rtx
This directive has three space-separated parts:
- The AddType command
- The MIME type
- The file extension.
You can associate a number of different file extensions with the same MIME type on one line.
AddType video/mpeg mpg MPEG MPG
Force Download by MIME Type
Want every link to a type of file to automatically download, rather than just open in your browser? Then, use the MIME type application/octet-stream, like this:
AddType application/octet-stream pdf
As before, you can include numerous file extensions:
AddType application/octet-stream rtf txt pdf docx doc
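Note that application/octet-stream relabels the file rather than telling the browser what to do with it. An alternative technique, assuming mod_headers is available on your server, is to keep the real MIME type and send a Content-Disposition header that asks the browser to download the file. A minimal sketch for the same extensions:

```apache
# Keep each file's real MIME type, but ask the browser to save it
# instead of displaying it (requires mod_headers)
<IfModule mod_headers.c>
    <FilesMatch "\.(rtf|txt|pdf|docx?)$">
        Header set Content-Disposition attachment
    </FilesMatch>
</IfModule>
```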
List of File Extensions and MIME Types – .htaccess basics
Here's an incomplete list of file formats and their associated MIME types. If you manage your own website, you probably already know which file types you use, so you don't need to paste the whole list into your .htaccess file. But if you run a site where others could be uploading all sorts of files, then yes, it's worth including. It can help you avoid publishing mishaps, particularly on file-sharing or project-management sites where people are bound to share lots of files.
AddType application/mac-binhex40 hqx
AddType application/netalive net
AddType application/netalivelink nel
AddType application/octet-stream bin exe
AddType application/oda oda
AddType application/pdf pdf
AddType application/postscript ai eps ps
AddType application/rtf rtf
AddType application/x-bcpio bcpio
AddType application/x-cpio cpio
AddType application/x-csh csh
AddType application/x-director dcr
AddType application/x-director dir
AddType application/x-director dxr
AddType application/x-dvi dvi
AddType application/x-gtar gtar
AddType application/x-hdf hdf
AddType application/x-httpd-cgi cgi
AddType application/x-latex latex
AddType application/x-mif mif
AddType application/x-netcdf nc cdf
AddType application/x-onlive sds
AddType application/x-sh sh
AddType application/x-shar shar
AddType application/x-sv4cpio sv4cpio
AddType application/x-sv4crc sv4crc
AddType application/x-tar tar
AddType application/x-tcl tcl
AddType application/x-tex tex
AddType application/x-texinfo texinfo texi
AddType application/x-troff t tr roff
AddType application/x-troff-man man
AddType application/x-troff-me me
AddType application/x-troff-ms ms
AddType application/x-ustar ustar
AddType application/x-wais-source src
AddType application/zip zip
AddType audio/basic au snd
AddType audio/x-aiff aif aiff aifc
AddType audio/x-midi mid
AddType audio/x-pn-realaudio ram
AddType audio/x-wav wav
AddType image/gif gif GIF
AddType image/ief ief
AddType image/jpeg jpeg jpg jpe JPG
AddType image/tiff tiff tif
AddType image/x-cmu-raster ras
AddType image/x-portable-anymap pnm
AddType image/x-portable-bitmap pbm
AddType image/x-portable-graymap pgm
AddType image/x-portable-pixmap ppm
AddType image/x-rgb rgb
AddType image/x-xbitmap xbm
AddType image/x-xpixmap xpm
AddType image/x-xwindowdump xwd
AddType text/html html htm
AddType text/plain txt
AddType text/richtext rtx
AddType text/tab-separated-values tsv
AddType text/x-server-parsed-html shtml sht
AddType text/x-setext etx
AddType video/mpeg mpeg mpg mpe
AddType video/quicktime qt mov
AddType video/x-msvideo avi
AddType video/x-sgi-movie movie
AddType x-world/x-vrml wrl
Block Hotlinking – .htaccess Guide
Hotlinking is when you link to resources on other domains rather than hosting the files yourself. A good example would be a video that you really like on someone else's site. You could download it, upload it to your own site (copyright permitting, of course) and embed it in your page:
<video src="http://yourdomain.com/video.mpg" controls></video>
Or you could hotlink it by pointing straight at the file on the original site. That route saves you the bother and the bandwidth. And no, that doesn't mean we condone it; quite the opposite, in fact.
<video src="http://originaldomain.com/video.mpg" controls></video>
This kind of thing also goes on with CSS and JS files, but it mostly happens with pictures and video. Sites like Wikipedia don't really mind you doing this, and others actively want you to because it helps their SEO. Then there are the likes of jQuery, which uses a CDN to share its JS libraries so you don't have to host them yourself. But a lot of site owners see hotlinking as a way of stealing their material and hogging their bandwidth.
If your site isn't very big, you don't want to be fielding thousands of these requests every day, especially as they don't bring visitors to your site or benefit you in any way. So if hotlinking is raising your blood pressure, you can block it simply by adding some mod_rewrite rules to your .htaccess file.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?forinstance\.com/ [NC]
RewriteRule \.(gif|jpg|jpeg|png|js|css)$ - [F,NC]
Don't forget to change forinstance.com in line 3 to your real domain name. Line 2 lets through requests that carry no referrer at all, such as someone typing the file's address directly. Line 3 catches every other request that doesn't originate from your domain, and line 4 checks whether it's for one of the file extensions listed there. If it is, the request is denied with a 403 Forbidden response. You can easily add other file extensions to the list by editing the final line.
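If there are other sites you're happy to let embed your files, you can add one exclusion per domain. RewriteCond lines are combined with AND by default, so a request is only blocked when its referrer matches none of the allowed domains. A sketch, where partner-site.example is a hypothetical domain and both http and https referrers are accepted:

```apache
RewriteEngine on
# Allow requests with no referrer (direct visits, some privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Allow your own pages
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?forinstance\.com/ [NC]
# Allow a hypothetical partner site
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?partner-site\.example/ [NC]
# Everything else asking for these file types gets 403 Forbidden
RewriteRule \.(gif|jpe?g|png|js|css)$ - [F,NC]
```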
Enabling CGI Everywhere
CGI (Common Gateway Interface) is a standard way for a web server to run server-side programs, such as Perl scripts, and include their output in web pages. CGI scripts normally live in a folder named /cgi-bin, and the server is configured to treat any resource in that directory as a script instead of a page.
The problem is that URLs referencing CGI resources must then have /cgi-bin/ in them, which puts implementation details into your URLs, an anti-pattern you should steer clear of for a few reasons. Meanwhile, an elaborate site may need a better structure than just having loads of scripts crammed into one /cgi-bin folder.
For your server to parse CGI scripts regardless of their directory location, just put this in your .htaccess file:
AddHandler cgi-script .cgi
Options +ExecCGI
If you have other file extensions you’d like to process as CGI scripts, just add them to the first line.
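Both directives only work in .htaccess if the main server configuration allows them: AddHandler needs AllowOverride FileInfo and Options needs AllowOverride Options (AllowOverride All covers both). Otherwise, Apache responds with a 500 Internal Server Error. As a sketch, assuming you also want Perl files run as CGI scripts:

```apache
# Treat both .cgi and .pl files as CGI scripts in this directory tree
AddHandler cgi-script .cgi .pl
# Allow CGI execution here (needs AllowOverride Options in httpd.conf)
Options +ExecCGI
```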
Scripts as Source Code
Most of the time, scripts go in your web directory because they need to run as scripts. But maybe you want site visitors to be able to view the source code instead of executing it. Your .htaccess file can help you do this by removing the script handler for particular types of file and serving them as plain text instead.
RemoveHandler .pl .cgi .php .py
AddType text/plain .pl .cgi .php .py
Alternatively, you can make files with these extensions download by default instead of being displayed.
RemoveHandler .pl .cgi .php .py
AddType application/octet-stream .pl .cgi .php .py
Be on your guard with both of these, though. If you're still running these scripts elsewhere on your website, putting the directive in your web root's .htaccess file is going to cause you headaches. You're better off putting the scripts you only want to display into their own designated directory, and then putting the directive into an .htaccess file in that same folder.
Configuring PHP Settings
Sometimes you need to tweak PHP settings, which is best done in a file called php.ini. The thing is, some hosting companies, particularly shared hosts, won't let their customers do that. But you can get around this by embedding php.ini rules in your .htaccess file. Here's the syntax:
php_value [setting name] [value]
So, let’s say you want to increase the maximum file upload size. You’d just say:
php_value upload_max_filesize 12M
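You can't specify every PHP setting in an .htaccess file; for instance, you can't set disable_classes this way. Also note that php_value only works when PHP runs as an Apache module (mod_php). If your host runs PHP through PHP-FPM or CGI, Apache doesn't recognize the directive and returns a 500 Internal Server Error. To see a full list of all php.ini settings, check out the official php.ini directives guide.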
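Because the same .htaccess file may end up on servers with different PHP setups, a cautious pattern is to wrap PHP directives in an <IfModule> check. The sketch below assumes PHP 8 running as an Apache module, which registers itself as php_module (PHP 7 used php7_module); the values are only examples. If the module isn't loaded, Apache simply skips the block:

```apache
# Applied only when PHP runs as an Apache module
<IfModule php_module>
    php_value upload_max_filesize 12M
    # A POST request carries the whole upload, so keep this at least
    # as large as upload_max_filesize
    php_value post_max_size 13M
    # On/off settings use php_flag rather than php_value
    php_flag display_errors Off
</IfModule>
```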
When Not to Use .htaccess
When you first edit your .htaccess file, you can suddenly feel as powerful as a sysadmin. But try not to let absolute power corrupt you, or you may find yourself misusing the file. When all you have is a hammer, every task can start to look like a nail. Sometimes, when something seems like an .htaccess task, your directive is better off somewhere else.
Further Up the Tree
When you feel like putting a directive in an .htaccess file, consider whether it belongs in the httpd.conf file instead. That's the configuration settings file for the whole server. Likewise, the proper home for PHP settings is the php.ini file, and most languages have their own equivalent.
If you put directives higher up the tree, in httpd.conf, php.ini, or the appropriate file for that language, the server reads them once, when it starts. With .htaccess, the server has to look for and interpret the directives on every request, in every directory along the requested path.
This isn't so bad if you're running a low-traffic site with just a few .htaccess directives. But if your site is traffic-heavy and has to churn through lots of directives, you're effectively putting the brakes on the whole thing.
It's a shame that a lot of shared hosting providers won't let their customers into the httpd.conf or php.ini files, forcing them to settle for the slower .htaccess file.
This penalizes them twice over when compared side by side with custom VPS configurations, because shared hosting is usually under-resourced too. That's why a site with a decent level of traffic is probably better off on a VPS than on shared hosting.
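For comparison, here's a sketch of how a few directives from earlier in this guide might look in the main server configuration instead. The domain, paths and file names are placeholders. Setting AllowOverride None tells Apache to stop looking for .htaccess files in that directory tree altogether, which is where the speed gain comes from:

```apache
# httpd.conf or a virtual host file (names and paths are placeholders)
<VirtualHost *:80>
    ServerName forinstance.com
    DocumentRoot /var/www/forinstance

    <Directory /var/www/forinstance>
        # Don't search for or parse .htaccess files under this directory
        AllowOverride None
        Require all granted

        # Directives that would otherwise live in .htaccess
        DirectoryIndex index.php index.html
        Options +FollowSymLinks
    </Directory>

    Redirect 301 /blog.php http://blog.forinstance.com
</VirtualHost>
```

Unlike .htaccess edits, changes here only take effect after Apache is reloaded or restarted.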
Conclusion
Congratulations on completing this extensive guide to mastering .htaccess! In this article, we’ve explored the intricacies of .htaccess files, from understanding their origins to implementing advanced configurations.
As you’ve learned, .htaccess files are powerful tools that grant you fine-grained control over server settings.
We hope this guide has given you the knowledge and confidence to work with .htaccess files effectively.
If there’s anything we missed or topics you’d like to see covered in more detail, please let us know in the comments below. Your feedback is appreciated as it helps us create content that serves your needs.
3 Comments
Thanks for the tutorial. I'm trying to fix a website managed by others that somehow lost its .htaccess file. I know what the file does, but I've never had to re-create it once the site's been set up on a third-party host with Plesk. Should I just create a file that denies public access and points to the error pages? I'm not sure what other parameters need to be included. Is there a tool that will help me do that?
For IP blocks, is this format correct?
order deny,allow
deny from
deny from
deny from
allow from all
Hi Kevin,
This seems to be correct, indeed.
As for deny, it can be done in one line, like this:
Deny from 111.111.111.111 222.222.222.222 333.333.333.333