Your Complete .htaccess Guide: Including .htaccess Basics and More

Often underestimated, .htaccess files have significant influence over server configurations, directly impacting website performance, security, and functionality. Whether you’re an experienced webmaster or a newcomer to web development, grasping the intricacies of .htaccess is essential for optimizing your online presence.

This is why we created this ultimate resource for unlocking the potential of .htaccess files. In this in-depth guide, we explore all the key aspects of .htaccess and empower you to harness its capabilities effectively.

Throughout this article, we delve into the origins and core functionalities of .htaccess. So bookmark this guide, and prepare to dig deep into all things .htaccess.

An .htaccess file configures the way a server handles a variety of requests. Many servers support it, including Apache, which most commercial hosting providers favor. Because .htaccess files work at directory level, they can override the more general settings of .htaccess files further up the directory tree.

Why is it called .htaccess? This type of file was initially used to limit user access to specific directories, and the name has simply stuck. It uses a subset of Apache’s httpd.conf directives to give a sysadmin control over who has access to each directory.

It looks to an associated .htpasswd file for the usernames and passwords of those people who have permission to access each directory. .htaccess still performs this valuable function, but it has grown in versatility to do much more besides. We’ll explain the .htaccess basics and more in this article.


Where will I find the .htaccess file?

An .htaccess tutorial will tell you that you could find one in any folder (directory) on your server. Typically, though, the web root folder (the one that contains everything on your website) will have one; it will usually be named public_html or www. If a directory holds numerous website subdirectories, you’ll typically find an .htaccess file in the main root ( public_html ) directory, and another in each of the subdirectories (/sitename) too.

Why can’t I find the .htaccess file?

On most file systems, file names that start with a dot ( . ) are hidden, so by default you won’t be able to see them. However, you can still get to them. In your FTP client or File Manager, you’ll likely find a setting to “show hidden files.” Its location depends on which program you use, but you’ll usually find it under “Preferences”, “Settings”, “Folder Options” or “View.”

What if I don’t have an .htaccess file?

The first thing to establish is that you definitely don’t have one. Check that you have set the system to “show hidden files” (or whatever it’s called on your system), so that you can be sure it really isn’t there. You should usually have an .htaccess file, as they’re frequently created by default, but it’s always worth checking.

If you’ve looked everywhere and you still can’t find one, never fear: .htaccess basics are not hard to understand, and we’ve got an .htaccess guide for you here. You can make one by opening a text editor and creating a new document. It should not have a .txt or any other file extension, just the name .htaccess, and make sure it’s saved in ASCII format (not UTF-8 or anything else). Transfer it to the right directory using FTP or your host’s file manager.
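If you prefer a script to a text editor, the file can also be created programmatically. A minimal Python sketch, assuming you will upload the result to your web root afterwards:

```python
# Create an empty .htaccess file with a plain-ASCII encoding.
# The filename starts with a dot and has no .txt extension.
from pathlib import Path

htaccess = Path(".htaccess")  # in practice, created in (or uploaded to) public_html/
htaccess.write_text("# .htaccess created by hand\n", encoding="ascii")

print(htaccess.name)  # the full filename is just ".htaccess"
```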


Handling an error code

One of the simplest .htaccess basics is setting up error documents. Any .htaccess guide like this one will tell you that when a server receives a request, it responds by serving a document, as with HTML pages, or by pulling the response from a particular application (as with Content Management Systems and other web apps).

If this process trips up, the server reports an error and its corresponding code. Different types of errors have different error codes. You’ve probably seen a 404 “Not Found” error quite a few times, but it’s not the only one.

Client Request Errors

  • 400 — Bad Request
  • 401 — Authorization Required
  • 402 — Payment Required (not used yet)
  • 403 — Forbidden
  • 404 — Not Found
  • 405 — Method Not Allowed
  • 406 — Not Acceptable (encoding)
  • 407 — Proxy Authentication Required
  • 408 — Request Timeout
  • 409 — Conflict
  • 410 — Gone
  • 411 — Length Required
  • 412 — Precondition Failed
  • 413 — Request Entity Too Large
  • 414 — Request-URI Too Long
  • 415 — Unsupported Media Type

Server Errors

  • 500 — Internal Server Error
  • 501 — Not Implemented
  • 502 — Bad Gateway
  • 503 — Service Unavailable
  • 504 — Gateway Timeout
  • 505 — HTTP Version Not Supported

What Happens by Default?

When there’s no specification for how to handle errors, the server just sends the message to the browser, which in turn gives the user a generic error message. This isn’t especially helpful.

Creating Error Documents

At this point in your .htaccess guide, you’ll need an HTML document for each error code. You can call them anything, but it’s worth choosing an appropriate name, such as not-found.html or just 404.html.

Then, in the .htaccess file, determine which document goes with which error type.

ErrorDocument 400 /errors/bad-request.html

ErrorDocument 401 /errors/auth-reqd.html

ErrorDocument 403 /errors/forbid.html

ErrorDocument 404 /errors/not-found.html

ErrorDocument 500 /errors/server-err.html

Just note that each one gets its own line – and you’re done.

Alternatives to .htaccess – .htaccess guide for error-handling

Most CMSs, like WordPress and Drupal, and most web apps will deal with these error codes in their own way.


Password Protection With .htaccess

As we’ve said, .htaccess files were originally used to limit which users could get into certain directories. So let’s take a look at that in our .htaccess tutorial first.

.htpasswd –  this file holds usernames and passwords for the .htaccess system

Each one sits on its own line like this:

username:encryptedpassword

for example:

jamesbrown:523xT67mD1

Note that this isn’t the actual password, just a cryptographic hash of it. This means the password has been put through a hashing algorithm, and this is what came out. Each time a user logs in, the password text they type goes through that same algorithm; if the output matches the stored hash, they get access.

This is a highly secure way of storing passwords, because even if someone gets into your .htpasswd file, all they see is hashed passwords, not the real ones. And there’s no practical way to reconstruct a password from its hash, because the algorithm is a one-way street.
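The one-way check described above can be illustrated with a short sketch. This is a conceptual demonstration only: it uses SHA-256 for brevity, whereas real .htpasswd entries use salted formats such as bcrypt or Apache’s MD5 variant.

```python
# Illustration of one-way password checking (NOT Apache's actual scheme):
# the server stores only a hash, and each login re-hashes the typed
# password and compares hashes, never plaintext.
import hashlib

def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

stored = hash_password("correct horse")    # what the password file would hold

def check(typed: str) -> bool:
    return hash_password(typed) == stored  # compare hash to hash

print(check("correct horse"))  # True
print(check("wrong guess"))    # False
```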

You can choose from a few different hashing algorithms:

  • bcrypt — The most secure option, but working through the hashing process makes it slower as a result. Both Apache and NGINX support it.
  • md5 — Recent versions of Apache use this as their default hashing algorithm, but NGINX doesn’t support it.

Insecure Algorithms — These are best avoided.

  • crypt() — previously the default hashing function, but not a secure option.
  • SHA and Salted SHA.

.htaccess Guide to Adding Usernames and Passwords with CLI

You can use the command line or an SSH terminal to create an .htpasswd file and add username-password pairs to it directly.

htpasswd is the command for managing the .htpasswd file.

Simply use the command with the -c option to create a new .htpasswd file, then enter the file path (the actual path on the server, not the URL). You can also add a user at the same time.

> htpasswd -c /usr/local/blah/.htpasswd jamesbrown

This makes a new .htpasswd file in the /blah/ directory, along with a record for a user called jamesbrown. It will then ask you for a password, which is hashed and stored using md5 by default.

If an .htpasswd file already exists at that location, the new user is simply added to the existing file rather than a new one being created. If you’d rather use the bcrypt hashing algorithm, go with the -B option.

Password hashing without the command line

If you’re only familiar with .htaccess basics, you may prefer not to use the command line or an SSH terminal. In that case, you can simply create an .htpasswd file in a text editor, fill everything in, and upload it using FTP or a file manager.

Of course, that leaves you with the task of hashing the passwords. But that shouldn’t be a problem, because there are plenty of password-hashing tools online. Many other .htaccess tutorials recommend the htpasswd generator at Aspirine.org, because it offers a few choices of algorithm and lets you determine how strong the password is. Once you run it, copy the hashed password into your .htpasswd file.

You’ll only need one .htpasswd file for all your .htaccess files, so there’s no need to have one for each. One will do the job for the whole main server directory or web hosting account. But don’t put your .htpasswd file in a directory that anyone can access: not in public_html or www or any subdirectory. It’s safer from a security standpoint to put it somewhere that is only accessible from within the server itself.

Quick .htaccess Tutorial: How to use .htpasswd with .htaccess

You can protect any directory with its own .htaccess file and assign a set of users who may access it. To grant universal access, do nothing, because this is enabled by default. If you want to limit who has access, your .htaccess file should look like this:

AuthUserFile /usr/local/etc/.htpasswd

AuthName "Name of Secure Area"

AuthType Basic

<Limit GET POST>

require valid-user

</Limit>

Line one shows the location of the file holding your usernames and passwords. Line two defines the name of the area you want to keep secure, and you can call it anything. Line three specifies “Basic” authentication, which is fine in most instances.

The <Limit> tag defines what is being limited: in this instance, the ability to GET or POST to any file in the directory. Within the pair of <Limit> tags is a list of who is allowed to access files.

In this example, files can be accessed by any valid user. If you only want certain users to have access, you can name them:

AuthUserFile /usr/local/etc/.htpasswd

AuthName "Name of Secure Area"

AuthType Basic

<Limit GET POST>

require user janebrown

require user jamesbrown

</Limit>

You can also grant or deny access based on the group a user belongs to, which is a real time-saver. You can do this by creating a group file and adding names. Give your group file a name, such as .htpeople, and have it look something like this:

admin: janebrown jamesbrown

staff: zappafrank agrenmorgen

Now it’s become something that you can refer to in your .htaccess file:

AuthUserFile /usr/local/etc/.htpasswd

AuthGroupFile /usr/local/etc/.htpeople

AuthName "Admin Area"

AuthType Basic

<Limit GET POST>

require group admin

</Limit>
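The group-file format is simple enough that a hypothetical parser makes its structure clear: each line is a group name, a colon, and a space-separated list of members.

```python
# Hypothetical parser for the .htpeople group-file format shown above.
# Each line looks like "groupname: user1 user2 ...".
def parse_groups(text: str) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, members = line.split(":", 1)
        groups[name.strip()] = members.split()
    return groups

groups = parse_groups("admin: janebrown jamesbrown\nstaff: zappafrank agrenmorgen\n")
print("janebrown" in groups["admin"])  # True
```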

The .htaccess guide for .htpasswd alternatives

It only makes sense to use .htaccess/.htpasswd to limit server file access if you have lots of static files. This approach dates from the early days of the web, when sites consisted of many HTML documents and other resources. If you’re using WordPress, you’ll have a feature that lets you do this as part of the system.

Enabling Server Side Includes (SSI) – .htaccess Tutorial

SSI is a simple scripting language which you would mainly use to embed HTML documents into other HTML documents. So you can easily re-use frequently-used elements like menus and headers.

<!--#include virtual="header.shtml" -->

It’s also got conditional directives (if, else, etc.) and variables, which makes it a complete scripting language, although one that’s hard to use for anything more complicated than one or two includes. If it gets to that point, a developer will usually reach for PHP or Perl instead.

Server Side Includes are enabled by default on some web hosting servers. If yours isn’t, you can use your .htaccess file to enable it, like this:

AddType text/html .shtml

AddHandler server-parsed .shtml

Options Indexes FollowSymLinks Includes

This should enable SSI for all files that have the .shtml extension. You can also tell the server to parse .html files with SSI, using a directive like this:

AddHandler server-parsed .html

Why bother?

Well, this way lets you use SSI without alerting anyone to the fact that you are doing so. On top of that, if you change implementations later, you can hold on to your .html file extensions. The only fly in the ointment is that every .html file will then be parsed with SSI. If you have many .html files that don’t need SSI parsing, it makes the server work needlessly harder, bogging it down for no extra benefit.

SSI on your Index page

If you don’t want to parse every single .html file but still want SSI on your index page, you have to stipulate that in your .htaccess file. When the web server looks for the directory index page, it hunts for index.html by default. If you aren’t parsing .html files, you’ll need to name your index page index.shtml for SSI to work, and the server won’t automatically look for that. To make it do so, just add:

DirectoryIndex index.shtml index.html

This lets the web server know that the index.shtml file is the main one for the directory. The second parameter, index.html, is a failsafe that gets used when index.shtml can’t be found.
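Conceptually, DirectoryIndex is a first-match lookup: the server tries each candidate in order and serves the first one that exists. A minimal sketch of that behavior (the function name is illustrative, not part of Apache):

```python
# Sketch of DirectoryIndex fallback: try each candidate in order and
# return the first one present in the directory.
def pick_index(files_in_dir: set[str],
               candidates=("index.shtml", "index.html")):
    for name in candidates:
        if name in files_in_dir:
            return name
    return None  # no index page: directory listing or 403 instead

print(pick_index({"index.html", "style.css"}))    # index.html
print(pick_index({"index.shtml", "index.html"}))  # index.shtml
```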


IP Blacklisting and IP Whitelisting with .htaccess

If you’ve had problems with certain users or IP addresses, there are .htaccess basics you can use to blacklist and block them. Or you can do the opposite and whitelist everyone from particular addresses, excluding everybody else.

Blacklisting by IP

This will let you blacklist addresses (the numbers are just examples):

order allow,deny

deny from 444.33.55.6

deny from 735.34.6.

allow from all

The first line says to evaluate the allow directives before the deny directives. This makes allow from all the effective default, so only requests matching a deny directive will be denied. If you switched it round to say deny,allow, the last thing evaluated would be the allow from all directive, which allows everybody and overrides the deny statements.

Take note of the third line, which says deny from 735.34.6. This isn’t a complete IP address, but that’s okay, because it denies every IP address in that block: anything that begins with 735.34.6. You can include as many IP addresses as you like, one per line, each with its own deny from directive.
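The partial-address idea can be modelled in a few lines of Python. Note that the article’s sample octets (444, 735) are above 255 and can’t occur in real addresses, so this sketch uses the 203.0.113.0/24 documentation range instead.

```python
# Two ways to model "deny from 203.0.113." (a partial address covering
# a whole block). Apache matches the textual prefix; the equivalent
# CIDR network for a three-octet prefix is a /24.
import ipaddress

denied_prefix = "203.0.113."
print("203.0.113.42".startswith(denied_prefix))  # True -> denied

network = ipaddress.ip_network("203.0.113.0/24")
print(ipaddress.ip_address("203.0.113.42") in network)  # True -> denied
print(ipaddress.ip_address("203.0.114.1") in network)   # False -> allowed
```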

Whitelisting by IP

The opposite of blacklisting is whitelisting: restricting everyone except those you specify. As you might suspect, the order directive has to be turned back to front, so that you deny access to everyone first, then allow certain addresses after.

order deny,allow

deny from all

allow from 111.22.3.4

allow from 789.56.4.

Domain names instead of IP addresses

You can also block or allow users by domain name. This is helpful when people move between IP addresses, though it won’t work against anyone who controls their reverse-DNS IP address mapping.

order allow,deny

deny from forinstance.com

allow from all

This works for subdomains too. In the example above, you will also block visitors from abc.forinstance.com.

Block Users by Referrer – .htaccess Guide

If a website contains a link to your site and someone follows it, that site is called the ‘referrer’. But referrers aren’t only about clickable hyperlinks to your pages. Any page on the internet can link to your images directly. This is called hotlinking. It steals your bandwidth, it can infringe your copyright, and you don’t even get extra traffic from it. And it’s not just images: a stranger can link to your other resources, like CSS files and JS scripts, too.

This is bound to happen a little bit and most site owners tolerate it. But it’s the kind of thing that can easily escalate into something more abusive. And there are times when in-text clickable hyperlinks can cause you problems too. Like when they’re from troublesome or nefarious websites. These are just a few of the reasons why you may decide to deny requests that originate with particular referrers.

If you need to do this, you’ll have to activate the mod_rewrite module. Most web hosts enable it automatically. But if yours doesn’t, or you can’t tell if they have, you should get in touch and ask. If they’re reluctant to enable it – maybe think about getting a new host.

.htaccess basics – Directives that block a referrer depend on the mod_rewrite engine.

The code to block by referrer looks like this:

RewriteEngine on

RewriteCond %{HTTP_REFERER} ^http://.*forinstance\.com [NC,OR]

RewriteCond %{HTTP_REFERER} ^http://.*forinstance2\.com [NC,OR]

RewriteCond %{HTTP_REFERER} ^http://.*forinstance3\.com [NC]

RewriteRule .* - [F]

It’s slightly fiddly, so let’s go through it.

RewriteEngine on, on the first line, tells the parser that some rewrite directives are on the way. Lines 2, 3 and 4 each block a single referring domain. To adapt this for your own purposes, alter the domain name part (forinstance) and the extension (.com).

The backslash in front of the .com is an escape character. The pattern matching used in the domain name is a regular expression, and the dot has a special meaning in regex, so it must be “escaped” with a backslash (“\”).

The NC in the brackets specifies that the match shouldn’t be case-sensitive. The OR literally means “or”, and indicates that more conditions are on the way: as long as the referrer matches any one of these patterns, the rewrite rule applies.

The final line is the rewrite rule itself. The [F] stands for “Forbidden”: any request from a referrer on the list will be blocked and receive a 403 Forbidden error.
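The RewriteCond patterns are ordinary regular expressions, so the same matching logic can be reproduced in Python, with re.IGNORECASE standing in for the [NC] flag. The domain names are the article’s placeholders.

```python
# Reproduce the referrer-blocking conditions as Python regexes.
# re.IGNORECASE plays the role of Apache's [NC] (no-case) flag.
import re

blocked_patterns = [
    r"^http://.*forinstance\.com",
    r"^http://.*forinstance2\.com",
    r"^http://.*forinstance3\.com",
]

def is_blocked(referrer: str) -> bool:
    # Like chained [OR] conditions: any single match blocks the request.
    return any(re.match(p, referrer, re.IGNORECASE) for p in blocked_patterns)

print(is_blocked("http://www.FORINSTANCE.com/page"))  # True  -> 403
print(is_blocked("http://friendly.example/page"))     # False -> allowed
```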


An .htaccess Guide to Blocking Bots and Web Scrapers

Sometimes it isn’t even people trying to eat up your bandwidth; it’s robots. These programs lift your site’s information, typically to republish it under some low-quality SEO outfit. There are genuine bots out there, such as the ones from the big search engines. But others are more like cockroaches, scavenging and doing you no good whatsoever.

To date, the industry has identified hundreds of bots. You won’t ever be able to block them all, but you can block as many as possible. Here are some directives that will trip up 400+ known bots.

<IfModule mod_setenvif.c>
<IfModule mod_headers.c>
SetEnvIfNoCase User-Agent "^ALittle Client" bot
SetEnvIfNoCase User-Agent "^Go-http-client/1.1" bot
SetEnvIfNoCase User-Agent "^TprAdsTxtCrawler" bot
SetEnvIfNoCase User-Agent "^Photon/1.0" bot
SetEnvIfNoCase User-Agent .*Twitterbot/1.0.* bot
SetEnvIfNoCase User-Agent .*Screaming Frog SEO Spider.* bot
SetEnvIfNoCase User-Agent .*SurdotlyBot.* bot
SetEnvIfNoCase User-Agent .*curl.* bot
SetEnvIfNoCase User-Agent .*PixelTools.* bot
SetEnvIfNoCase User-Agent .*DataForSeoBot.* bot
SetEnvIfNoCase User-Agent .*PetalBot.* bot
SetEnvIfNoCase User-Agent .*weborama.* bot
SetEnvIfNoCase User-Agent .*CFNetwork.* bot
SetEnvIfNoCase User-Agent .*Python.* bot
SetEnvIfNoCase User-Agent .*python-requests.* bot
SetEnvIfNoCase User-Agent .*UptimeRobot.* bot
SetEnvIfNoCase User-Agent .*TprAdsTxtCrawler.* bot
SetEnvIfNoCase User-Agent .*Hybrid Advertising.* bot
SetEnvIfNoCase User-Agent .*Crawler.* bot
SetEnvIfNoCase User-Agent "^LinksMasterRoBot" bot
SetEnvIfNoCase User-Agent "^wp_is_mobile" bot
SetEnvIfNoCase User-Agent "^LinkStats" bot
SetEnvIfNoCase User-Agent "^CNCat" bot
SetEnvIfNoCase User-Agent "^linkdexbot" bot
SetEnvIfNoCase User-Agent "^meanpathbot" bot
SetEnvIfNoCase User-Agent "^NetSeer" bot
SetEnvIfNoCase User-Agent "^statdom.ru" bot
SetEnvIfNoCase User-Agent "^StatOnlineRuBot" bot
SetEnvIfNoCase User-Agent "^WebArtexBot" bot
SetEnvIfNoCase User-Agent "^Miralinks Robot" bot
SetEnvIfNoCase User-Agent "^Web-Monitoring" bot
SetEnvIfNoCase User-Agent "^Runet-Research-Crawler" bot
SetEnvIfNoCase User-Agent "^pr-cy.ru" bot
SetEnvIfNoCase User-Agent "^SeopultContentAnalyzer" bot
SetEnvIfNoCase User-Agent "^Seopult" bot
SetEnvIfNoCase User-Agent "^uptimerobot" bot
SetEnvIfNoCase User-Agent "^spbot" bot
SetEnvIfNoCase User-Agent "^rogerbot" bot
SetEnvIfNoCase User-Agent "^sitebot" bot
SetEnvIfNoCase User-Agent "^dotbot" bot
SetEnvIfNoCase User-Agent "^Linux" bot
SetEnvIfNoCase User-Agent "^SemrushBot" bot
SetEnvIfNoCase User-Agent "^SemrushBot-SA" bot
SetEnvIfNoCase User-Agent "^SemrushBot-BA" bot
SetEnvIfNoCase User-Agent "^SemrushBot-SI" bot
SetEnvIfNoCase User-Agent "^SemrushBot-SWA" bot
SetEnvIfNoCase User-Agent "^SemrushBot-CT" bot
SetEnvIfNoCase User-Agent "^SemrushBot-BM" bot
SetEnvIfNoCase User-Agent "^SemrushBot-SEOAB" bot
SetEnvIfNoCase User-Agent "^MJ12bot" bot
SetEnvIfNoCase User-Agent "^Vivaldi" bot
SetEnvIfNoCase User-Agent "^ArchiveBot" bot
SetEnvIfNoCase User-Agent "^archive.org_bot" bot
SetEnvIfNoCase User-Agent "^ia_archiver" bot
SetEnvIfNoCase User-Agent "^ia_archiver-web.archive.org" bot
SetEnvIfNoCase User-Agent "^PaleMoon" bot
SetEnvIfNoCase User-Agent "^Pale Moon" bot
SetEnvIfNoCase User-Agent "Sovetnik" bot
SetEnvIfNoCase User-Agent "sovetnik" bot
SetEnvIfNoCase User-Agent "80legs" bot
SetEnvIfNoCase User-Agent "360Spider" bot
SetEnvIfNoCase User-Agent "^8484 Boston Project" bot
SetEnvIfNoCase User-Agent "Aboundex" bot
SetEnvIfNoCase User-Agent "^Alexibot" bot
SetEnvIfNoCase User-Agent "^asterias" bot
SetEnvIfNoCase User-Agent "^attach" bot
SetEnvIfNoCase User-Agent "^AIBOT" bot
SetEnvIfNoCase User-Agent "^Accelerator" bot
SetEnvIfNoCase User-Agent "^Ants" bot
SetEnvIfNoCase User-Agent "^AhrefsBot" bot
SetEnvIfNoCase User-Agent "^AhrefsSiteAudit" bot
SetEnvIfNoCase User-Agent "^Ask Jeeves" bot
SetEnvIfNoCase User-Agent "^Atomic_Email_Hunter" bot
SetEnvIfNoCase User-Agent "^atSpider" bot
SetEnvIfNoCase User-Agent "^autoemailspider" bot
SetEnvIfNoCase User-Agent "archive.org_bot" bot
SetEnvIfNoCase User-Agent "^a.pr-cy.ru" bot
SetEnvIfNoCase User-Agent "^BackDoorBot" bot
SetEnvIfNoCase User-Agent "^BackWeb" bot
SetEnvIfNoCase User-Agent "Bandit" bot
SetEnvIfNoCase User-Agent "^BatchFTP" bot
SetEnvIfNoCase User-Agent "^Bigfoot" bot
SetEnvIfNoCase User-Agent "^Black.Hole" bot
SetEnvIfNoCase User-Agent "^BlackWidow" bot
SetEnvIfNoCase User-Agent "^BlowFish" bot
SetEnvIfNoCase User-Agent "^BotALot" bot
SetEnvIfNoCase User-Agent "Buddy" bot
SetEnvIfNoCase User-Agent "^BuiltBotTough" bot
SetEnvIfNoCase User-Agent "^Bullseye" bot
SetEnvIfNoCase User-Agent "^BunnySlippers" bot
SetEnvIfNoCase User-Agent "^Baiduspider" bot
SetEnvIfNoCase User-Agent "^Bot\ mailto:[email protected]" bot
SetEnvIfNoCase User-Agent "^Buddy" bot
SetEnvIfNoCase User-Agent "^bwh3_user_agent" bot
SetEnvIfNoCase User-Agent "BLEXBot" bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bot
SetEnvIfNoCase User-Agent "^CheeseBot" bot
SetEnvIfNoCase User-Agent "^CherryPicker" bot
SetEnvIfNoCase User-Agent "^ChinaClaw" bot
SetEnvIfNoCase User-Agent "Collector" bot
SetEnvIfNoCase User-Agent "Copier" bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bot
SetEnvIfNoCase User-Agent "^cosmos" bot
SetEnvIfNoCase User-Agent "^Crescent" bot
SetEnvIfNoCase User-Agent "^Custo" bot
SetEnvIfNoCase User-Agent "^Cogentbot" bot
SetEnvIfNoCase User-Agent "^China" bot
SetEnvIfNoCase User-Agent "^ContactBot" bot
SetEnvIfNoCase User-Agent "^ContentSmartz" bot
SetEnvIfNoCase User-Agent "^CCBot" bot
SetEnvIfNoCase User-Agent "^Cluuz" bot
SetEnvIfNoCase User-Agent "^DISCo" bot
SetEnvIfNoCase User-Agent "^DIIbot" bot
SetEnvIfNoCase User-Agent "^DittoSpyder" bot
SetEnvIfNoCase User-Agent "^Download\ Demon" bot
SetEnvIfNoCase User-Agent "^Download\ Devil" bot
SetEnvIfNoCase User-Agent "^Download\ Wonder" bot
SetEnvIfNoCase User-Agent "^dragonfly" bot
SetEnvIfNoCase User-Agent "^Drip" bot
SetEnvIfNoCase User-Agent "^DataCha0s" bot
SetEnvIfNoCase User-Agent "^DBrowse" bot
SetEnvIfNoCase User-Agent "^Demo Bot" bot
SetEnvIfNoCase User-Agent "^Dolphin" bot
SetEnvIfNoCase User-Agent "Download" bot
SetEnvIfNoCase User-Agent "^DSurf15" bot
SetEnvIfNoCase User-Agent "^eCatch" bot
SetEnvIfNoCase User-Agent "^EasyDL" bot
SetEnvIfNoCase User-Agent "^ebingbong" bot
SetEnvIfNoCase User-Agent "^EirGrabber" bot
SetEnvIfNoCase User-Agent "^EmailCollector" bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bot
SetEnvIfNoCase User-Agent "^EmailWolf" bot
SetEnvIfNoCase User-Agent "^EroCrawler" bot
SetEnvIfNoCase User-Agent "^Exabot" bot
SetEnvIfNoCase User-Agent "^Express\ WebPictures" bot
SetEnvIfNoCase User-Agent "Extractor" bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bot
SetEnvIfNoCase User-Agent "^EBrowse" bot
SetEnvIfNoCase User-Agent "^Educate Search VxB" bot
SetEnvIfNoCase User-Agent "EmailSpider" bot
SetEnvIfNoCase User-Agent "^ESurf15" bot
SetEnvIfNoCase User-Agent "ExtractorPro" bot
SetEnvIfNoCase User-Agent "^Foobot" bot
SetEnvIfNoCase User-Agent "^focusbot" bot
SetEnvIfNoCase User-Agent "^flunky" bot
SetEnvIfNoCase User-Agent "^FrontPage" bot
SetEnvIfNoCase User-Agent "^FileHound" bot
SetEnvIfNoCase User-Agent "^FlashGet" bot
SetEnvIfNoCase User-Agent "^Flexum" bot
SetEnvIfNoCase User-Agent "^Franklin Locator" bot
SetEnvIfNoCase User-Agent "^FSurf15" bot
SetEnvIfNoCase User-Agent "^Full Web Bot" bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bot
SetEnvIfNoCase User-Agent "^gotit" bot
SetEnvIfNoCase User-Agent "^GrabNet" bot
SetEnvIfNoCase User-Agent "^Grafula" bot
SetEnvIfNoCase User-Agent "^GetRight" bot
SetEnvIfNoCase User-Agent "^Gets" bot
SetEnvIfNoCase User-Agent "^GetWeb!" bot
SetEnvIfNoCase User-Agent "^Gigabot" bot
SetEnvIfNoCase User-Agent "^Go!Zilla" bot
SetEnvIfNoCase User-Agent "^GoZilla" bot
SetEnvIfNoCase User-Agent "^Grab.*Site" bot
SetEnvIfNoCase User-Agent "^Grabber" bot
SetEnvIfNoCase User-Agent "^grub-client" bot
SetEnvIfNoCase User-Agent "^gsa-crawler" bot
SetEnvIfNoCase User-Agent "^Guestbook Auto Submitter" bot
SetEnvIfNoCase User-Agent "^Gulliver" bot
SetEnvIfNoCase User-Agent "^Guzzle" bot
SetEnvIfNoCase User-Agent "^GuzzleHttp" bot
SetEnvIfNoCase User-Agent "^Harvest" bot
SetEnvIfNoCase User-Agent "^hloader" bot
SetEnvIfNoCase User-Agent "^HMView" bot
SetEnvIfNoCase User-Agent "^HTTrack" bot
SetEnvIfNoCase User-Agent "^humanlinks" bot
SetEnvIfNoCase User-Agent "HubSpot" bot
SetEnvIfNoCase User-Agent "^IlseBot" bot
SetEnvIfNoCase User-Agent "^Image\ Stripper" bot
SetEnvIfNoCase User-Agent "^Image\ Sucker" bot
SetEnvIfNoCase User-Agent "Indy\ Library" bot
SetEnvIfNoCase User-Agent "^InfoNavibot" bot
SetEnvIfNoCase User-Agent "^InfoTekies" bot
SetEnvIfNoCase User-Agent "^Intelliseek" bot
SetEnvIfNoCase User-Agent "^InterGET" bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bot
SetEnvIfNoCase User-Agent "^Iria" bot
SetEnvIfNoCase User-Agent "^IBrowse" bot
SetEnvIfNoCase User-Agent "^Industry Program" bot
SetEnvIfNoCase User-Agent "^inktomi\.com" bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bot
SetEnvIfNoCase User-Agent "^ISC Systems iRc Search" bot
SetEnvIfNoCase User-Agent "^IUPUI Research" bot
SetEnvIfNoCase User-Agent "^ia_archiver" bot
SetEnvIfNoCase User-Agent "^Jakarta" bot
SetEnvIfNoCase User-Agent "^JennyBot" bot
SetEnvIfNoCase User-Agent "^JetCar" bot
SetEnvIfNoCase User-Agent "^JOC" bot
SetEnvIfNoCase User-Agent "^JustView" bot
SetEnvIfNoCase User-Agent "^Jyxobot" bot
SetEnvIfNoCase User-Agent "^Java" bot
SetEnvIfNoCase User-Agent "^jetcar" bot
SetEnvIfNoCase User-Agent "^Kenjin.Spider" bot
SetEnvIfNoCase User-Agent "^Keyword.Density" bot
SetEnvIfNoCase User-Agent "^larbin" bot
SetEnvIfNoCase User-Agent "^LexiBot" bot
SetEnvIfNoCase User-Agent "^lftp" bot
SetEnvIfNoCase User-Agent "^libWeb/clsHTTP" bot
SetEnvIfNoCase User-Agent "^likse" bot
SetEnvIfNoCase User-Agent "^LinkextractorPro" bot
SetEnvIfNoCase User-Agent "^LinkScan/8.1a.Unix" bot
SetEnvIfNoCase User-Agent "^LNSpiderguy" bot
SetEnvIfNoCase User-Agent "^LinkWalker" bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bot
SetEnvIfNoCase User-Agent "^LWP::Simple" bot
SetEnvIfNoCase User-Agent "^LARBIN-EXPERIMENTAL" bot
SetEnvIfNoCase User-Agent "^leech" bot
SetEnvIfNoCase User-Agent "^LeechFTP" bot
SetEnvIfNoCase User-Agent "^LetsCrawl.com" bot
SetEnvIfNoCase User-Agent "^libwww-perl" bot
SetEnvIfNoCase User-Agent "^Lincoln State Web Browser" bot
SetEnvIfNoCase User-Agent "^LMQueueBot" bot
SetEnvIfNoCase User-Agent "^LinkpadBot" bot
SetEnvIfNoCase User-Agent "^Magnet" bot
SetEnvIfNoCase User-Agent "^MegaIndex.ru" bot
SetEnvIfNoCase User-Agent "^Mag-Net" bot
SetEnvIfNoCase User-Agent "^MarkWatch" bot
SetEnvIfNoCase User-Agent "^Mass\ Downloader" bot
SetEnvIfNoCase User-Agent "^Mata.Hari" bot
SetEnvIfNoCase User-Agent "^Memo" bot
SetEnvIfNoCase User-Agent "^Microsoft.URL" bot
SetEnvIfNoCase User-Agent "^Microsoft URL Control" bot
SetEnvIfNoCase User-Agent "^MIDown\ tool" bot
SetEnvIfNoCase User-Agent "^MIIxpc" bot
SetEnvIfNoCase User-Agent "^Mirror" bot
SetEnvIfNoCase User-Agent "^Missigua\ Locator" bot
SetEnvIfNoCase User-Agent "^Mister\ PiX" bot
SetEnvIfNoCase User-Agent "^moget" bot
SetEnvIfNoCase User-Agent "^Mac Finder" bot
SetEnvIfNoCase User-Agent "^MFC Foundation Class Library" bot
SetEnvIfNoCase User-Agent "^Missauga Loca" bot
SetEnvIfNoCase User-Agent "^Missouri College Browse" bot
SetEnvIfNoCase User-Agent "^Mizzu Labs" bot
SetEnvIfNoCase User-Agent "^Mo College" bot
SetEnvIfNoCase User-Agent "^MVAClient" bot
SetEnvIfNoCase User-Agent "^MJ12bot" bot
SetEnvIfNoCase User-Agent "^mfibot" bot
SetEnvIfNoCase User-Agent "^NAMEPROTECT" bot
SetEnvIfNoCase User-Agent "^Navroad" bot
SetEnvIfNoCase User-Agent "^NearSite" bot
SetEnvIfNoCase User-Agent "^NetAnts" bot
SetEnvIfNoCase User-Agent "^Netcraft" bot
SetEnvIfNoCase User-Agent "^NetMechanic" bot
SetEnvIfNoCase User-Agent "^NetSpider" bot
SetEnvIfNoCase User-Agent "^Net\ Vampire" bot
SetEnvIfNoCase User-Agent "^NetZIP" bot
SetEnvIfNoCase User-Agent "^NextGenSearchBot" bot
SetEnvIfNoCase User-Agent "^NG" bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bot
SetEnvIfNoCase User-Agent "^niki-bot" bot
SetEnvIfNoCase User-Agent "^NimbleCrawler" bot
SetEnvIfNoCase User-Agent "^Ninja" bot
SetEnvIfNoCase User-Agent "^NPbot" bot
SetEnvIfNoCase User-Agent "^nutch-1.4" bot
SetEnvIfNoCase User-Agent "^NameOfAgent (CMS Spider)" bot
SetEnvIfNoCase User-Agent "^NASA Search" bot
SetEnvIfNoCase User-Agent "^Net\ Reaper" bot
SetEnvIfNoCase User-Agent "^Ninja" bot
SetEnvIfNoCase User-Agent "^Nsauditor" bot
SetEnvIfNoCase User-Agent "^NetLyzer" bot
SetEnvIfNoCase User-Agent "^Octopus" bot
SetEnvIfNoCase User-Agent "^Offline\ Explorer" bot
SetEnvIfNoCase User-Agent "^Offline\ Navigator" bot
SetEnvIfNoCase User-Agent "^Offline" bot
SetEnvIfNoCase User-Agent "^Openfind" bot
SetEnvIfNoCase User-Agent "^OutfoxBot" bot
SetEnvIfNoCase User-Agent "^PageGrabber" bot
SetEnvIfNoCase User-Agent "^Papa\ Foto" bot
SetEnvIfNoCase User-Agent "^pavuk" bot
SetEnvIfNoCase User-Agent "^pcBrowser" bot
SetEnvIfNoCase User-Agent "^PHP\ version\ tracker" bot
SetEnvIfNoCase User-Agent "^Pockey" bot
SetEnvIfNoCase User-Agent "^ProPowerBot/2.14" bot
SetEnvIfNoCase User-Agent "^ProWebWalker" bot
SetEnvIfNoCase User-Agent "^psbot" bot
SetEnvIfNoCase User-Agent "^Pump" bot
SetEnvIfNoCase User-Agent "^ParseMX" bot
SetEnvIfNoCase User-Agent "^Page.*Saver" bot
SetEnvIfNoCase User-Agent "^PBrowse" bot
SetEnvIfNoCase User-Agent "^PEval" bot
SetEnvIfNoCase User-Agent "^Pita" bot
SetEnvIfNoCase User-Agent "^Poirot" bot
SetEnvIfNoCase User-Agent "^Port Huron Labs" bot
SetEnvIfNoCase User-Agent "^Production Bot" bot
SetEnvIfNoCase User-Agent "^Program Shareware" bot
SetEnvIfNoCase User-Agent "^PSurf15" bot
SetEnvIfNoCase User-Agent "^psycheclone" bot
SetEnvIfNoCase User-Agent "^QueryN.Metasearch" bot
SetEnvIfNoCase User-Agent "^RealDownload" bot
SetEnvIfNoCase User-Agent "Reaper" bot
SetEnvIfNoCase User-Agent "Recorder" bot
SetEnvIfNoCase User-Agent "^ReGet" bot
SetEnvIfNoCase User-Agent "^RepoMonkey" bot
SetEnvIfNoCase User-Agent "^RMA" bot
SetEnvIfNoCase User-Agent "^RookeeBot" bot
SetEnvIfNoCase User-Agent "^Readability" bot
SetEnvIfNoCase User-Agent "^Reaper" bot
SetEnvIfNoCase User-Agent "^RSurf15" bot
SetEnvIfNoCase User-Agent "Siphon" bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bot
SetEnvIfNoCase User-Agent "^SlySearch" bot
SetEnvIfNoCase User-Agent "^SmartDownload" bot
SetEnvIfNoCase User-Agent "^Snake" bot
SetEnvIfNoCase User-Agent "^Snapbot" bot
SetEnvIfNoCase User-Agent "^Snoopy" bot
SetEnvIfNoCase User-Agent "^sogou" bot
SetEnvIfNoCase User-Agent "^SpaceBison" bot
SetEnvIfNoCase User-Agent "^SpankBot" bot
SetEnvIfNoCase User-Agent "^spanner" bot
SetEnvIfNoCase User-Agent "^Sqworm" bot
SetEnvIfNoCase User-Agent "Stripper" bot
SetEnvIfNoCase User-Agent "Sucker" bot
SetEnvIfNoCase User-Agent "^SuperBot" bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bot
SetEnvIfNoCase User-Agent "^Surfbot" bot
SetEnvIfNoCase User-Agent "^suzuran" bot
SetEnvIfNoCase User-Agent "^Szukacz/1.4" bot
SetEnvIfNoCase User-Agent "^SeznamBot" bot
SetEnvIfNoCase User-Agent "^Site-Shot" bot
SetEnvIfNoCase User-Agent "^Slackbot-LinkExpanding" bot
SetEnvIfNoCase User-Agent "^Scrapy" bot
SetEnvIfNoCase User-Agent "^Spider/Bot" bot
SetEnvIfNoCase User-Agent "^Scooter" bot
SetEnvIfNoCase User-Agent "^searchbot [email protected]" bot
SetEnvIfNoCase User-Agent "^SEO search Crawler" bot
SetEnvIfNoCase User-Agent "^SEOsearch" bot
SetEnvIfNoCase User-Agent "^ShablastBot" bot
SetEnvIfNoCase User-Agent "^Snagger" bot
SetEnvIfNoCase User-Agent "^snap.com beta crawler" bot
SetEnvIfNoCase User-Agent "^sogou develop spider" bot
SetEnvIfNoCase User-Agent "^Sogou Orion spider" bot
SetEnvIfNoCase User-Agent "^sogou spider" bot
SetEnvIfNoCase User-Agent "^Sogou web spider" bot
SetEnvIfNoCase User-Agent "^sohu agent" bot
SetEnvIfNoCase User-Agent "^SSurf15" bot
SetEnvIfNoCase User-Agent "^SafeSearch_microdata_crawler_" bot
SetEnvIfNoCase User-Agent "^SafeDNSBot" bot
SetEnvIfNoCase User-Agent "^SafeDNSBot_" bot
SetEnvIfNoCase User-Agent "^tAkeOut" bot
SetEnvIfNoCase User-Agent "^Teleport" bot
SetEnvIfNoCase User-Agent "^Telesoft" bot
SetEnvIfNoCase User-Agent "^TurnitinBot/1.5" bot
SetEnvIfNoCase User-Agent "^The.Intraformant" bot
SetEnvIfNoCase User-Agent "^TheNomad" bot
SetEnvIfNoCase User-Agent "^TightTwatBot" bot
SetEnvIfNoCase User-Agent "^Titan" bot
SetEnvIfNoCase User-Agent "^True_bot" bot
SetEnvIfNoCase User-Agent "^turingos" bot
SetEnvIfNoCase User-Agent "^TurnitinBot" bot
SetEnvIfNoCase User-Agent "^Teleport\ Pro" bot
SetEnvIfNoCase User-Agent "^Triton" bot
SetEnvIfNoCase User-Agent "^TSurf15" bot
SetEnvIfNoCase User-Agent "^Twiceler" bot
SetEnvIfNoCase User-Agent "^URLy.Warning" bot
SetEnvIfNoCase User-Agent "^Under the Rainbow" bot
SetEnvIfNoCase User-Agent "^Yo-yo" bot
SetEnvIfNoCase User-Agent "^Yanga" bot
SetEnvIfNoCase User-Agent "^Vacuum" bot
SetEnvIfNoCase User-Agent "^VCI" bot
SetEnvIfNoCase User-Agent "^VoidEYE" bot
SetEnvIfNoCase User-Agent "^Virusdie_crawler" bot
SetEnvIfNoCase User-Agent "^VadixBot" bot
SetEnvIfNoCase User-Agent "^voyager" bot
SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bot
SetEnvIfNoCase User-Agent "^WebAuto" bot
SetEnvIfNoCase User-Agent "^WebBandit" bot
SetEnvIfNoCase User-Agent "^Webclipping.com" bot
SetEnvIfNoCase User-Agent "^WebCopier" bot
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bot
SetEnvIfNoCase User-Agent "^WebFetch" bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bot
SetEnvIfNoCase User-Agent "^Web.Image.Collector" bot
SetEnvIfNoCase User-Agent "^WebLeacher" bot
SetEnvIfNoCase User-Agent "^WebmasterWorldForumBot" bot
SetEnvIfNoCase User-Agent "^WebReaper" bot
SetEnvIfNoCase User-Agent "^WebSauger" bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bot
SetEnvIfNoCase User-Agent "^Webster" bot
SetEnvIfNoCase User-Agent "^WebStripper" bot
SetEnvIfNoCase User-Agent "^WebWhacker" bot
SetEnvIfNoCase User-Agent "^WebZIP" bot
SetEnvIfNoCase User-Agent "Whacker" bot
SetEnvIfNoCase User-Agent "^Widow" bot
SetEnvIfNoCase User-Agent "^WISENutbot" bot
SetEnvIfNoCase User-Agent "^WWWOFFLE" bot
SetEnvIfNoCase User-Agent "^WWW-Collector-E" bot
SetEnvIfNoCase User-Agent "^W3C-checklink" bot
SetEnvIfNoCase User-Agent "^Weazel" bot
SetEnvIfNoCase User-Agent "^Web.*Spy" bot
SetEnvIfNoCase User-Agent "^WebAlta" bot
SetEnvIfNoCase User-Agent "^WebCapture" bot
SetEnvIfNoCase User-Agent "^WebMirror" bot
SetEnvIfNoCase User-Agent "^WebRecorder" bot
SetEnvIfNoCase User-Agent "^WebSpy" bot
SetEnvIfNoCase User-Agent "^WebVulnCrawl.unknown" bot
SetEnvIfNoCase User-Agent "^Wells Search" bot
SetEnvIfNoCase User-Agent "^WEP Search" bot
SetEnvIfNoCase User-Agent "^www\.asona\.org" bot
SetEnvIfNoCase User-Agent "^Wget" bot
SetEnvIfNoCase User-Agent "^Xaldon" bot
SetEnvIfNoCase User-Agent "^Xenu" bot
SetEnvIfNoCase User-Agent "^Xaldon\ WebSpider" bot
SetEnvIfNoCase User-Agent "^Zeus" bot
SetEnvIfNoCase User-Agent "^ZmEu" bot
SetEnvIfNoCase User-Agent "^Zyborg" bot
SetEnvIfNoCase User-Agent "^_CommonCrawler_Node_" bot
SetEnvIfNoCase User-Agent "^_Cliqzbot" bot
SetEnvIfNoCase User-Agent "^_Baiduspider" bot
SetEnvIfNoCase User-Agent "^_Exabot" bot
SetEnvIfNoCase User-Agent "^_GrapeshotCrawler" bot
SetEnvIfNoCase User-Agent "^_Gluten_Free_Crawler" bot
SetEnvIfNoCase User-Agent "^_DeuSu" bot
SetEnvIfNoCase User-Agent "^_Dataprovider" bot
SetEnvIfNoCase User-Agent "^_DuckDuckGo-Favicons-Bot" bot
SetEnvIfNoCase User-Agent "^_SeznamBot" bot
SetEnvIfNoCase User-Agent "^_007ac9_Crawler" bot
SetEnvIfNoCase User-Agent "^_wmtips" bot
SetEnvIfNoCase User-Agent "^rv" bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bot
</Limit>
</IfModule>
</IfModule>
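The `Order`/`Allow`/`Deny` directives above use Apache 2.2 syntax (kept alive in 2.4 by mod_access_compat). If your server runs Apache 2.4 without that compatibility module, the equivalent access control looks like this – a sketch that assumes the same `bot` environment variable set by the `SetEnvIfNoCase` rules above:

```apache
# Apache 2.4 syntax (mod_authz_core): allow everyone except
# requests where the rules above have set the "bot" variable
<RequireAll>
    Require all granted
    Require not env bot
</RequireAll>
```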


Specifying a Default File for a Directory

When a server receives a URL request with no file name specified, it assumes the URL refers to a directory. So, here is a .htaccess guide on what to do. If you request http://forinstance.com, Apache (and most servers) will look in the domain’s root directory – typically /public_html or something like it, such as /forinstance-com – to find the default file. That default file is called index.html by default, because when the Internet was young, websites were often just a bunch of documents bundled together, and “home” pages were often no more than an index so that you knew where everything was.

Of course, nowadays you might not want index.html to be the default page – perhaps because you need a different file type, so index.shtml, index.xml, or index.php might be more appropriate. Or maybe you don’t think of your home page as an “index” and want to call it something else, like home.html or primary.html.


Set the Default Directory Page

One of the .htaccess basics is letting you set the default page for a directory with ease:

DirectoryIndex [filename goes here]

If you want your default to be home.html it’s as simple as using:

DirectoryIndex home.html

.htaccess Guide to Setting More Than One Default Page

You can also set more than one DirectoryIndex:

DirectoryIndex index.php index.shtml index.html

The way this works is that the web server looks for the first one; if it can’t find that, it looks for the second one, and so on. But why would you need this? Wouldn’t you know which file you wanted as your default page? Keep in mind one of the .htaccess basics: a file influences its own directory and each subdirectory below it, until it’s overruled by a more local file.

So, an .htaccess file in your root directory can give instructions for lots of subdirectories. And in turn, they could all have their own default page names. So, imagine you put all those rules into just one .htaccess file in the root. Then, it spares you the tedious work of duplicating all the directives it contains at the level of every directory.
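To illustrate, a single root-level .htaccess can set a site-wide default while a subdirectory’s own file overrides it for that branch only (the directory names here are hypothetical):

```apache
# /public_html/.htaccess – applies to the whole tree
DirectoryIndex index.php index.html

# /public_html/blog/.htaccess – overrides the root setting for /blog only
DirectoryIndex home.html
```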

.htaccess Guide to URL Rewriting and URL Redirects

An important part of this .htaccess guide: one of the most common uses of .htaccess files is URL redirects. For example, the URL for a document or resource may have changed – perhaps because you’ve moved things around on your website, or you’ve changed domain names. In that case, URL redirects can help you.

301 or 302

There are two redirect status codes that the server can send, namely 301 and 302.

301 means “Moved Permanently,” while 302 means “Moved Temporarily.” In most cases, 301 does a perfectly good job. And perhaps more importantly, it earns SEO brownie points, since the new page can inherit the search ranking the original URL had built up.

It will also make most browsers update their bookmarks and cache the old-to-new mapping, which lets them request the new URL directly whenever the original is looked up. For a permanently changed URL, these are exactly the responses you want.

There’s not much to gain from 302 redirects, mainly because there’s rarely a reason to change a URL on a temporary basis. Changing a URL at all isn’t something anybody should really want to do – sometimes you just have to – but there are usually better options than changing it only to change it back later. At least, at the time of writing this .htaccess guide.
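Syntactically the two differ only in the status code. A quick sketch, with placeholder URLs:

```apache
# Permanent move – browsers and search engines update to the new URL
Redirect 301 /old-page.html http://forinstance.com/new-page.html

# Temporary move – clients keep requesting the original URL
Redirect 302 /sale.html http://forinstance.com/holiday-sale.html
```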

Redirect or Rewrite

You can change a URL with .htaccess directives in a couple of ways — the Redirect command and the mod_rewrite engine. The Redirect command tells the browser which other URL it should be looking out for. The mod_rewrite tool, by contrast, normally “translates” the URL in the request into something the file system or CMS can understand, then treats the request as though the translated URL was the one requested.

From the perspective of the web browser it’s business as usual. It gets the content it requested and carries on as if nothing happened.

The mod_rewrite tool is also able to produce 301 redirects that work like the Redirect command, but with far more flexible rules – including elaborate pattern matching and rewriting instructions that go beyond what Redirect can do.

Basic Page Redirect – .htaccess guide

For redirecting one page to another URL, the code looks like this:

Redirect 301 /relative-url.html http://forinstance.com/full-url.html

A single space separates each of the four parts of this one-line command, so you have:

  1. the Redirect command itself
  2. its type (301 – Moved Permanently)
  3. the original page’s relative URL
  4. the full URL of the new page.

The relative URL is relative to the directory containing the .htaccess file, which will normally be the web root – the root of the domain.

So, if http://forinstance.com/blog.php had been moved to http://blog.forinstance.com, the code would be:

Redirect 301 /blog.php http://blog.forinstance.com

Redirecting a large section – .htaccess guide

Have you changed your directory structure? If the page names themselves haven’t changed, you can redirect all requests for a particular directory to the new one.

Redirect 301 /old-directory http://forinstance.com/new-directory
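If the move involves a pattern rather than a fixed path, Apache’s RedirectMatch accepts a regular expression instead of a plain prefix. A sketch with hypothetical paths:

```apache
# Redirect every .html file in /old-directory to the same name in /new-directory
RedirectMatch 301 ^/old-directory/(.*)\.html$ http://forinstance.com/new-directory/$1.html
```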

Redirecting an entire site – .htaccess guide

But how about if your entire site has moved to a new URL? No problem.

Redirect 301 / http://thenewurl.com

Redirecting www to non-www – .htaccess guide

More and more websites are turning their back on the www subdomain – there’s never really been a need for it. It’s a throwback to when website owners used a server to look after many of their own documents, and the www directory was where they put anything they wanted to offer others.

Some still use it to this day, but many have moved on. It’s become such a habit for users to type “www.” in front of every URL that it can get tricky if your site does without those letters. However, the mod_rewrite module can help you with this – and it’s probably already enabled on your web host.

Options +FollowSymlinks

RewriteEngine on

RewriteCond %{HTTP_HOST} ^www\.forinstance\.com [NC]

RewriteRule ^(.*)$ http://forinstance.com/$1 [R=301,NC]

But be careful! Many other .htaccess and mod_rewrite guides will give you some version of this code to achieve this:

Options +FollowSymlinks

RewriteEngine on

RewriteCond %{HTTP_HOST} !^forinstance\.com [NC]

RewriteRule ^(.*)$ http://forinstance.com/$1 [R=301,NC]

Can you see what’s wrong with it?

All subdomains are redirected to the primary domain! Which means not just www.forinstance.com, but others like blog.forinstance.com and admin.forinstance.com too. Not ideal behaviour!

Redirecting to www – .htaccess guide

So, what happens if you’re using the www subdomain? You should probably set up a redirect to make sure people get to where they’re trying to go. Especially now that fewer people are likely to automatically add that www to the beginning of URLs. All you need to do is reverse the code to achieve this.

RewriteEngine On

RewriteCond %{HTTP_HOST} ^forinstance\.com [NC]

RewriteRule ^(.*)$ http://www.forinstance.com/$1 [R=301,NC]

One thing not to do:

A number of .htaccess guides recommend redirecting 404 errors to your home page. While this is possible, we’re writing here in our .htaccess guide that it’s actually an awful idea, because it leaves visitors confused: they were expecting another page and instead got your homepage. A 404 error page would have told them exactly what they needed to know – this does not. And anyway, what’s the problem with admitting that a page can’t be found? There’s no shame in it.
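A better approach is a custom 404 page that tells visitors what happened, set with ErrorDocument (the page path here is hypothetical):

```apache
# Serve a custom, same-domain error page for missing URLs
ErrorDocument 404 /404.html
```

Note that the target should be a local path: pointing ErrorDocument at a full external URL makes Apache issue a redirect instead of a true 404 response.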

Why use the .htaccess basics in this .htaccess guide rather than other approaches? Redirects can be set up with server-side scripting, like in PHP files. They can also be set up from within your CMS – which is pretty much the same thing. But using .htaccess is usually the fastest type of redirect. With PHP-based redirects, or other server-side scripting languages, the entire request must be completed and the script interpreted before a redirect message is sent to the browser.

As any .htaccess guide will tell you, using .htaccess redirects is much faster, mainly because the server responds to each request directly. However, be aware that some CMSs – WordPress, for example – handle redirects by updating the .htaccess file programmatically. This gives you the speed benefits of directly using .htaccess combined with the convenience of managing it from inside your application.

.htaccess Basics – Hiding Your .htaccess File

One of the .htaccess basics in this .htaccess guide is that the file shouldn’t be visible from the web. There’s just no reason for it – not least because it can reveal the location of your .htpasswd file. And as another rule of this .htaccess guide, random strangers shouldn’t be able to look at details of your implementation, including rewrite rules, directory settings, and security measures. Hiding all of that makes it more difficult for hackers to work out ways into your system. Luckily, you can hide your .htaccess file fairly easily using this code:

<Files .htaccess>

order allow,deny

deny from all

</Files>
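On Apache 2.4, the same effect uses the Require directive, and it’s worth protecting .htpasswd at the same time. A sketch using FilesMatch to cover both in one rule:

```apache
# Block web access to .htaccess and .htpasswd (Apache 2.4 syntax)
<FilesMatch "^\.ht">
    Require all denied
</FilesMatch>
```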


.htaccess guide to MIME types

MIME types are file types – the name comes from email (“Multipurpose Internet Mail Extensions”), where they originated. But don’t just think of them as “file types,” because MIME implies a specific format for specifying them. If you’ve ever written an HTML doc, you’re likely to have specified a MIME type, probably without even realising it:

<link rel="stylesheet" type="text/css" href="/style.css" />

The type attribute refers to a particular MIME type.

MIME types on your server

Occasionally you might find that your web server isn’t set up to deliver a specific file type, and any requests for that type of file just don’t work. Usually, you can get around this by adding the MIME type to your .htaccess file.

AddType text/richtext rtx

This directive has three space-separated parts:

  1. The AddType command
  2. The MIME type
  3. The file extension.

You can associate a number of different file extensions with the same MIME type on one line.

AddType video/mpeg mpg MPEG MPG  

Force Download by MIME Type

Want every link to a type of file to automatically download, rather than just open in your browser? Then, use the MIME type application/octet-stream, like this:

AddType application/octet-stream pdf

As before, you can include numerous file extensions:

AddType application/octet-stream rtf txt pdf docx doc

List of File Extensions and MIME Types – .htaccess basics

Here’s an incomplete list of file formats and their associated MIME types. If you manage your own website, you may already know which file types you use, so there’s no need to paste the whole list into your .htaccess file. But if you run a site where others could be uploading all sorts of things, this might help avoid publishing mishaps – particularly on file-sharing or project-management sites where folks are bound to share lots of files.

AddType application/macbinhex-40 hqx

AddType application/netalive net

AddType application/netalivelink nel

AddType application/octet-stream bin exe

AddType application/oda oda

AddType application/pdf pdf

AddType application/postscript ai eps ps

AddType application/rtf rtf

AddType application/x-bcpio bcpio

AddType application/x-cpio cpio

AddType application/x-csh csh

AddType application/x-director dcr

AddType application/x-director dir

AddType application/x-director dxr

AddType application/x-dvi dvi

AddType application/x-gtar gtar

AddType application/x-hdf hdf

AddType application/x-httpd-cgi cgi

AddType application/x-latex latex

AddType application/x-mif mif

AddType application/x-netcdf nc cdf

AddType application/x-onlive sds

AddType application/x-sh sh

AddType application/x-shar shar

AddType application/x-sv4cpio sv4cpio

AddType application/x-sv4crc sv4crc

AddType application/x-tar tar

AddType application/x-tcl tcl

AddType application/x-tex tex

AddType application/x-texinfo texinfo texi

AddType application/x-troff t tr roff

AddType application/x-troff-man man

AddType application/x-troff-me me

AddType application/x-troff-ms ms

AddType application/x-ustar ustar

AddType application/x-wais-source src

AddType application/zip zip

AddType audio/basic au snd

AddType audio/x-aiff aif aiff aifc

AddType audio/x-midi mid

AddType audio/x-pn-realaudio ram

AddType audio/x-wav wav

AddType image/gif gif GIF

AddType image/ief ief

AddType image/jpeg jpeg jpg jpe JPG

AddType image/tiff tiff tif

AddType image/x-cmu-raster ras

AddType image/x-portable-anymap pnm

AddType image/x-portable-bitmap pbm

AddType image/x-portable-graymap pgm

AddType image/x-portable-pixmap ppm

AddType image/x-rgb rgb

AddType image/x-xbitmap xbm

AddType image/x-xpixmap xpm

AddType image/x-xwindowdump xwd

AddType text/html html htm

AddType text/plain txt

AddType text/richtext rtx

AddType text/tab-separated-values tsv

AddType text/x-server-parsed-html shtml sht

AddType text/x-setext etx

AddType video/mpeg mpeg mpg mpe

AddType video/quicktime qt mov

AddType video/x-msvideo avi

AddType video/x-sgi-movie movie

AddType x-world/x-vrml wrl


Block Hotlinking – .htaccess Guide

Hotlinking is where you link to resources on other domains rather than hosting the files yourself. A good example would be a video that you really like on someone else’s site. You could download it, upload it to your own site (copyright permitting, of course), and embed it in your page:

<img src="http://yourdomain.com/video.mpg">

Or you could take the hotlinking route, linking straight to the file on the original domain, which saves you the bother and the bandwidth. And no, that doesn’t mean we condone it – quite the opposite, in fact.

<img src="http://originaldomain.com/video.mpg">

This kind of thing also goes on with CSS and JS files, but it mostly happens with pictures and video. Sites like Wikipedia don’t really mind you doing this, and others actively want you to do it because it helps their SEO. Then there are the likes of jQuery, which uses a CDN to share its JS libraries so you don’t have to host them yourself. But a lot of web hosts see hotlinking as a way of stealing their material and hogging their bandwidth.

If your site’s not really big, you don’t want to be fielding thousands of requests every day that don’t bring visitors to your site or benefit you in any way. So, if hotlinking is only raising your blood pressure, you can block it simply by adding some mod_rewrite rules to your .htaccess file.

RewriteEngine on

RewriteCond %{HTTP_REFERER} !^$

RewriteCond %{HTTP_REFERER} !^http://(www\.)?forinstance\.com/.*$ [NC]

RewriteRule \.(gif|jpg|jpeg|png|js|css)$ - [F]

Don’t forget to change forinstance.com in line 3 to your genuine domain name. This way, we catch any requests that don’t originate from your domain and check whether they match one of the file extensions identified in line 4. If there’s a match, the request is denied. You can also easily add other file extensions to the list by editing the final line.
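A gentler variant, instead of denying with a 403, redirects hotlinked images to a placeholder you host yourself. This is a sketch – the /hotlink-notice.png path is hypothetical, and note that the placeholder’s own extension must not match the rule, or you’d create a redirect loop:

```apache
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?forinstance\.com/ [NC]
# Send hotlinked images to a placeholder instead of a 403
RewriteRule \.(gif|jpg|jpeg)$ /hotlink-notice.png [R,L]
```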

Enabling CGI Everywhere

CGI – Common Gateway Interface – is a server-side method for including non-HTML scripts (like SSI or Perl) in web pages. CGI scripts normally live in a folder named /cgi-bin, and the server is configured to treat any resource in that directory as a script instead of a page.

The problem is that URLs referencing CGI resources must have /cgi-bin/ in them, which puts implementation details into your URL – an anti-pattern you should steer clear of for a few reasons. Meanwhile, an elaborate site may need a better structure than loads of scripts crammed into one /cgi-bin folder.

For your server to parse CGI scripts, regardless of directory location, just put this in your .htaccess file:

AddHandler cgi-script .cgi

Options +ExecCGI

If you have other file extensions you’d like to process as CGI scripts, just add them to the first line.

Scripts as Source Code

Most of the time, scripts go in your web directory because they need to run as scripts. But maybe you want site visitors to be able to view the source code instead of just executing it. Your .htaccess file can do this by removing the script handler for particular file types and assigning them a plain-text type instead.

RemoveHandler cgi-script .pl .cgi .php .py

AddType text/plain .pl .cgi .php .py

Alternatively, you can make these file extensions download by default, instead of being displayed.

RemoveHandler cgi-script .pl .cgi .php .py

AddType application/octet-stream .pl .cgi .php .py

Be on your guard with both of these, though: if you’re still using these scripts elsewhere on your website, that directive in your web root’s .htaccess file is going to cause you headaches. You’re better off putting scripts you only want to display into their own designated directory, and then putting the directive into an .htaccess file in that same folder.


Configuring PHP Settings

Sometimes you need to tweak PHP settings, and the best place to do that is a file called php.ini. The thing is, some hosting companies – particularly shared hosts – won’t let their customers do that. But you can get around this by embedding php.ini rules in your .htaccess file. Here’s the syntax:

php_value [setting name] [value]

So, let’s say you want to increase the maximum file upload size. You’d just say:

php_value upload_max_filesize  12M

You can’t specify every PHP setting in an .htaccess file. For instance, you can’t set disable_classes this way. To see which php.ini settings can be changed where, check out the official php.ini directives guide.
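Boolean settings use php_flag rather than php_value. A couple of common tweaks as a sketch – the values are examples only, and these directives work only when PHP runs as an Apache module:

```apache
# Allow larger POST bodies to match the upload limit
php_value post_max_size 13M
# Hide PHP errors from visitors on a production site
php_flag display_errors Off
```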

When Not to Use .htaccess

When you first edit your .htaccess file you can suddenly feel as powerful as a sysadmin. But try not to let absolute power corrupt you, or you may find yourself misusing the .htaccess file. When all you have is a hammer, every task can start to look like a nail – but sometimes, when something seems like an .htaccess task, the directive is better off somewhere else.

Further Up the Tree

When you feel like putting a directive in an .htaccess file, you should probably consider the httpd.conf file instead – the configuration file for the whole server. Likewise, the proper home of PHP settings is the php.ini file, and most languages have their own equivalents.

So say you put directives higher up the tree, in httpd.conf, php.ini, or the appropriate file for that language. Then the server loads those settings once and keeps them in memory. With .htaccess, the server must find, read, and interpret the directives every time there’s a request.
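For example, moving directives into httpd.conf inside a Directory block – and switching off .htaccess lookups entirely – lets Apache apply the settings without a per-request file scan. The paths here are illustrative, and this requires access to the main server config:

```apache
<Directory "/var/www/forinstance.com/public_html">
    DirectoryIndex index.php index.html
    # No .htaccess files are consulted for this tree
    AllowOverride None
</Directory>
```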

This isn’t so bad if you’re running a low-traffic site with just a few .htaccess directives. But it isn’t difficult to see that if your site is traffic-heavy and has to churn through lots of directives, you’re effectively putting the brakes on the whole thing.

It’s a shame that a lot of shared hosting providers won’t let their customers touch the httpd.conf or php.ini files, forcing them to settle for the slower .htaccess approach.

This doubly penalizes them compared with custom VPS configurations, because shared hosting is also usually under-resourced. That’s why a site with a decent level of traffic is probably better off on a VPS than on shared hosting.

Conclusion

Congratulations on completing this extensive guide to mastering .htaccess! In this article, we’ve explored the intricacies of .htaccess files, from understanding their origins to implementing advanced configurations.
As you’ve learned, .htaccess files are powerful tools that grant you fine-grained control over server settings.
We hope this guide has provided you with the knowledge and confidence to leverage any aspect of dealing with .htaccess files effectively.

If there’s anything we missed or topics you’d like to see covered in more detail, please let us know in the comments below. Your feedback is appreciated as it helps us create content that serves your needs.


3 Comments

  1. Thanks for the tutorial. I’m trying to fix web site managed by others that somehow lost its .htaccess file. I know what the file does, but I’ve never had to re-create it once the site’s been setup on a third-party host with plesk. Should I just create a file that denies public access and points to the error pages? I’m not sure what other parameters need to be included. Is there a tool that will help me do that?

  2. For IP blocks, is this format correct?

    order deny,allow
    deny from
    deny from
    deny from
    allow from all

    • Hi Kevin,

      This seems to be correct, indeed.
      As for deny, it may be done in 1 row like:
      Deny from 111.111.111.111 222.222.222.222 333.333.333.333
