Comprehensive Guide to .htaccess

Bookmark this .htaccess guide for any .htaccess task you may need; it covers the basics and more. An .htaccess file configures the way a web server deals with a variety of requests. Several servers support it, including Apache, which most commercial hosting providers tend to favor. .htaccess files work at the directory level, which lets them override (where the server permits) the server’s global configuration and any .htaccess directives further up the directory tree.

Why is it called .htaccess? This type of file was initially used to limit user access to specific directories, and the name has stuck. It uses a subset of the directives from Apache’s httpd.conf configuration file, which give a sysadmin control over who has access to each directory.

It relies on an associated .htpasswd file for the usernames and passwords of the people who have permission to access those directories. .htaccess still performs this valuable function, but the file has grown far more versatile, so this article explains the basics and more.

Where will I find the .htaccess file?

An .htaccess file can exist in any folder (directory) on your server, but typically the web root folder, the one that contains all of your website’s files, will have one. It will usually have a name like public_html or www. If you’ve got a directory with several website subdirectories, you’ll typically find an .htaccess file in the main root (public_html) directory, and one in each of the subdirectories (/sitename) too.

Why can’t I find the .htaccess file?

On Unix-like systems, which most web servers run, file names that start with a dot (.) are hidden, so by default you won’t be able to see them. You can still get to them, though. Your FTP client or file manager will likely have a setting to “show hidden files.” Its location depends on which program you use, but you’ll usually find it under “Preferences”, “Settings”, “Folder Options” or “View.”

What if I don’t have an .htaccess file?

First, establish that you definitely don’t have one. Check that you have set your FTP client or file manager to “show hidden files” (or whatever it’s called on your system), so that you can be sure the file really isn’t there. An .htaccess file is frequently created by default, so it’s always worth checking.

If you’ve looked everywhere and still can’t find one, never fear: the .htaccess basics are not hard to understand. You can make the file yourself by opening a plain-text editor and creating a new document. Save it as plain ASCII text (not a word-processor format) with the name .htaccess and no .txt or any other file extension. Then transfer it to the right directory using FTP or your host’s web-based file manager.

Handling an error code

One of the simplest .htaccess basics is setting up error documents. When a server receives a request, it responds by serving a document, such as a static HTML page, or by pulling the response from an application (as with content management systems and other web apps).

If this process trips up, the server reports an error along with its corresponding status code. Different types of errors have different codes. You’ve probably seen a 404 “Not Found” error quite a few times, but it’s not the only one.

Client Request Errors

  • 400 — Bad Request
  • 401 — Authorization Required
  • 402 — Payment Required (not used yet)
  • 403 — Forbidden
  • 404 — Not Found
  • 405 — Method Not Allowed
  • 406 — Not Acceptable (encoding)
  • 407 — Proxy Authentication Required
  • 408 — Request Timeout
  • 409 — Conflict
  • 410 — Gone
  • 411 — Content Length Required
  • 412 — Precondition Failed
  • 413 — Request Entity Too Large
  • 414 — Request URI Too Long
  • 415 — Unsupported Media Type

Server Errors

  • 500 — Internal Server Error
  • 501 — Not Implemented
  • 502 — Bad Gateway
  • 503 — Service Unavailable
  • 504 — Gateway Timeout
  • 505 — HTTP Version Not Supported
 

What Happens by Default?

When you don’t specify how errors should be handled, the server simply sends its default error message to the browser, which in turn shows the user a generic error page that isn’t especially helpful.

Creating Error Documents

First, you’ll need an HTML document for each error code you want to handle. You can call them anything, but a descriptive name such as not-found.html, or just 404.html, makes them easy to keep track of.

Then, in the .htaccess file, determine which document goes with which error type.

ErrorDocument 400 /errors/bad-request.html

ErrorDocument 401 /errors/auth-reqd.html

ErrorDocument 403 /errors/forbid.html

ErrorDocument 404 /errors/not-found.html

ErrorDocument 500 /errors/server-err.html

Each directive goes on its own line, and you’re done. Paths that begin with a slash are relative to your site’s document root.
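ErrorDocument isn’t limited to local HTML pages. As a sketch based on Apache’s ErrorDocument documentation, it can also return a short text message, or redirect to a page on another site; www.example.com below is a placeholder, not a real error page:

```apache
# Return a plain text message instead of a separate error page
ErrorDocument 403 "Sorry, you can't access this page."

# Redirect to an error page on another site (placeholder URL)
ErrorDocument 500 https://www.example.com/server-error.html
```

With a full URL, the browser receives a redirect rather than the original error code. An ErrorDocument for 401 must point to a local document; otherwise the browser never receives the password prompt.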

 

Alternatives to .htaccess for error handling

Most content management systems, such as WordPress and Drupal, and most web apps deal with these error codes in their own way.

Password Protection With .htaccess

As we’ve said, .htaccess files were originally used to limit which users could get into certain directories, so let’s look at that first.

.htpasswd: this file holds the usernames and passwords for the .htaccess system.

Each one sits on its own line like this:

username:encryptedpassword

for example:

jamesbrown:523xT67mD1

Note that this isn’t the actual password; it’s a cryptographic hash of the password, meaning the password has been run through a one-way hashing algorithm and this is what came out. Each time a user logs in, the password they type goes through that same algorithm, and if the result matches the stored hash, they get access.

This is a far safer way of storing passwords, because even if someone gets into your .htpasswd file, all they see is hashed passwords, not the real ones. There’s no way to simply reverse a hash to reconstruct the password, because the algorithm is a one-way street, although weak passwords hashed with weak algorithms can still be cracked by guessing.

You can choose from a few different hashing algorithms:

  • bcrypt — The most secure option. It is deliberately slow to compute, which makes guessing passwords much harder. Apache supports it; NGINX supports it only if the server’s system crypt() function does.
  • md5 — The default hashing algorithm in recent versions of Apache’s htpasswd. NGINX supports it too (as the Apache “apr1” variant).

Insecure Algorithms — These are best avoided.

  • crypt() — was previously the default hashing function, but isn’t a secure option.
  • SHA and Salted SHA.
 

.htaccess Guide to Adding Usernames and Passwords with CLI

You can use the command line or an SSH terminal to create an .htpasswd file and add username-password pairs to it directly.

htpasswd (without the leading dot) is the command-line tool for managing the .htpasswd file.

Use the command with the -c option to create a new .htpasswd file, followed by the file’s path (the actual path on the server, not the URL) and the username you want to add.

> htpasswd -c /usr/local/blah/.htpasswd jamesbrown

This creates a new .htpasswd file in the /usr/local/blah/ directory, containing a record for a user called jamesbrown. The command then asks you for a password, which is hashed using MD5 (the default) before it is stored.

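Be careful with -c: if an .htpasswd file already exists at that location, -c overwrites it. To add a user to an existing file, run the same command without the -c option. If you’d rather use the bcrypt hashing algorithm, add the -B option (capital B; lowercase -b means the password is given on the command line instead of at a prompt).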

Password hashing without the command line

If you’d rather not use the command line or an SSH terminal, you can simply create an .htpasswd file in a text editor, fill everything in, and upload it using FTP or your file manager.

Of course, that leaves you with the task of hashing the passwords, but there are plenty of htpasswd generators online. Many .htaccess tutorials recommend the htpasswd generator at Aspirine.org, because it offers a choice of algorithms that determine how strong the hash is. Once you’ve run it, copy the hashed password into the .htpasswd file.

You’ll only need one .htpasswd file for all your .htaccess files; one will do the job for the whole main server directory or web hosting account. But don’t put your .htpasswd file in a directory that anyone can reach from the web, so not in public_html, www or any of their subdirectories. It’s safer to put it somewhere that is only accessible from within the server itself.
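If your host doesn’t let you store files outside the web root, you can at least stop Apache from serving them. The rule below is a sketch assuming Apache 2.4 syntax; it mirrors the one in Apache’s default httpd.conf, which many hosts already include, and refuses any web request for a file whose name begins with .ht:

```apache
# Refuse web requests for .htaccess, .htpasswd, .htpeople and any other .ht* file
<Files ".ht*">
    Require all denied
</Files>
```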

 

Quick .htaccess Tutorial: How to use .htpasswd with .htaccess

Each directory’s .htaccess file can specify which users are allowed to access that directory. To grant universal access, do nothing, because that is the default. If you want to limit who can get access, your .htaccess file should look like this:

AuthUserFile /usr/local/etc/.htpasswd

AuthName "Name of Secure Area"

AuthType Basic

<Limit GET POST>

require valid-user

</Limit>

Line one gives the location of the file holding your usernames and passwords. Line two names the area you want to keep secure, and you can call it anything. Line three specifies “Basic” authentication, which is fine in most instances as long as your site uses HTTPS, because Basic authentication sends passwords essentially unencrypted.

The <Limit> tag defines which HTTP methods are being restricted: in this instance, GET and POST requests for any file in the directory. Between the pair of <Limit> tags is the list of who is allowed to access files.

In this example, the files can be accessed by any valid user, meaning anyone listed in the .htpasswd file. If you only want certain users to have access, you can name them:

AuthUserFile /usr/local/etc/.htpasswd

AuthName "Name of Secure Area"

AuthType Basic

<Limit GET POST>

require user janebrown

require user jamesbrown

</Limit>

You can also grant or deny access based on groups of users, which is a real time-saver. To do this, create a group file, give it a name such as .htpeople, and list each group and its members like this:

admin: janebrown jamesbrown

staff: zappafrank agrenmorgen

You can then refer to these groups in your .htaccess file:

AuthUserFile /usr/local/etc/.htpasswd

AuthGroupFile /usr/local/etc/.htpeople

AuthName "Admin Area"

AuthType Basic

<Limit GET POST>

require group admin

</Limit>
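The <Limit GET POST> wrapper in these examples has a catch: Apache’s documentation points out that access controls inside <Limit> apply only to the methods listed, so other methods such as PUT or DELETE are not restricted. A sketch of the group example without the wrapper, assuming the same file paths and that your server has the mod_authz_groupfile module that provides AuthGroupFile:

```apache
# No <Limit> block, so the restriction applies to every HTTP method
AuthType Basic
AuthName "Admin Area"
AuthUserFile /usr/local/etc/.htpasswd
AuthGroupFile /usr/local/etc/.htpeople
Require group admin
```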

 

The .htaccess guide for .htpasswd alternatives

Using .htaccess and .htpasswd to limit access only really makes sense if your site consists mostly of static files. The approach dates from the early days of the web, when websites consisted of lots of HTML documents and other resources. If you’re using WordPress, the system has its own features for restricting access.

Enabling Server Side Includes (SSI) – .htaccess Tutorial

SSI is a simple scripting language that you would mainly use to embed HTML documents into other HTML documents, so you can easily reuse frequently used elements such as menus and headers.

<!--#include virtual="header.shtml" -->

It also has conditional directives (if, else and so on) and variables, which make it a scripting language in its own right, although one that’s hard to use once your project needs anything more complicated than one or two includes. At that point, a developer will usually reach for PHP or Perl instead.

Some web hosts enable Server Side Includes by default. If yours doesn’t, you can enable SSI in your .htaccess file like this:

AddType text/html .shtml

AddHandler server-parsed .shtml

Options Indexes FollowSymLinks Includes

This enables SSI for all files that have the .shtml extension. To have .html files parsed for SSI as well, use a directive like this:

AddHandler server-parsed .html
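Apache’s mod_include documentation describes a slightly different setup: the INCLUDES output filter does the parsing (the server-parsed handler still works for backward compatibility), and Options +Includes adds SSI to the options already in effect instead of replacing them. Unlike the Options line above, it also doesn’t switch on Indexes, which lets visitors browse directory listings. A sketch, assuming your host’s AllowOverride setting lets .htaccess change Options:

```apache
AddType text/html .shtml
# The INCLUDES filter parses .shtml files for SSI directives
AddOutputFilter INCLUDES .shtml
# The + adds Includes to the current options rather than replacing them all
Options +Includes
```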

 

Why bother?

This lets you use SSI without advertising the fact, and if you change implementations later, you can keep your .html file extensions. The only fly in the ointment is that every .html file will then be parsed for SSI. If you have many .html files that don’t need SSI parsing, this makes the server work needlessly harder, bogging it down for no benefit.
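One middle ground is mod_include’s XBitHack directive. With it switched on, Apache parses only those .html files whose user-execute permission bit is set, so you can mark just the files that need SSI (for example with chmod u+x) and leave the rest alone. A sketch, used instead of AddHandler server-parsed .html:

```apache
Options +Includes
# Parse text/html files for SSI only if their user-execute bit is set
XBitHack on
```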

 

SSI on your Index page

If you don’t want to parse every .html file but still want to use SSI on your home page, you have to say so in your .htaccess file. When the web server looks for a directory’s index page, it looks for index.html by default. If you aren’t parsing .html files, you must name your index page index.shtml for SSI to work, and your server won’t automatically look for a file with that name. To make it do so, add:

DirectoryIndex index.shtml index.html

This tells the web server that index.shtml is the main file for the directory. The second parameter, index.html, is a fallback that is used when index.shtml can’t be found.

IP Blacklisting and IP Whitelisting with .htaccess

If you’ve had problems with certain users or IP addresses, you can use .htaccess to blacklist (block) them. Alternatively, you can do the opposite: whitelist (approve) particular addresses and exclude everybody else.

Blacklisting by IP

This will let you blacklist addresses (numbers are examples):

order allow,deny

deny from 203.0.113.6

deny from 198.51.100

allow from all

The first line tells Apache to evaluate the allow directives before the deny directives, so a matching deny directive always has the final say. Combined with allow from all, this means everyone is allowed in except those who match a deny directive. If you switched it round to say deny,allow, the allow directives would be evaluated last, so allow from all would let everybody in and override the deny statements.

Take note of the third line, which says deny from 198.51.100. This isn’t a complete IP address, but that’s okay, because it denies every IP address in that block: anything that begins with 198.51.100. You can include as many IP addresses as you like, one on each line, each with its own deny from directive.
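The order, allow and deny directives come from Apache 2.2. In Apache 2.4 they are deprecated and only work if the mod_access_compat module is loaded; the newer Require directives do the same job, and Apache advises against mixing the two styles. A sketch of a blacklist in 2.4 syntax, using addresses from the ranges reserved for documentation (placeholders, not real offenders):

```apache
<RequireAll>
    # Everyone is allowed...
    Require all granted
    # ...except this single address
    Require not ip 203.0.113.6
    # ...and every address that begins with 198.51.100
    Require not ip 198.51.100
</RequireAll>
```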

Whitelisting by IP

The opposite of blacklisting is whitelisting — restricting everyone except those you specify. As you might suspect, the order directive has to be turned back to front. So that you deny access to everyone at first, but then allow certain addresses after.

order deny,allow

deny from all

allow from 111.22.3.4

allow from 198.51.100
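In Apache 2.4 syntax, a whitelist is simpler still. Require lines that aren’t inside a container are treated as alternatives, so a request is allowed if it matches any of them and refused with 403 Forbidden otherwise. A sketch using placeholder documentation addresses:

```apache
# Only these addresses get in; everyone else is refused
Require ip 192.0.2.4
Require ip 198.51.100
```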

Domain names instead of IP addresses

You can also block or allow visitors by domain name. This is helpful if people keep moving between IP addresses, but it won’t work against anyone who controls the reverse-DNS mapping for their IP address.

order allow,deny

deny from forinstance.com

allow from all

This works for subdomains too: the example above also blocks visitors from abc.forinstance.com.
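The Apache 2.4 equivalent uses Require host, which also matches subdomains. Apache performs a double reverse DNS lookup to check the visitor’s host name, which adds some overhead to each request. A sketch, reusing the example domain:

```apache
<RequireAll>
    Require all granted
    # Blocks forinstance.com and any host name ending in .forinstance.com
    Require not host forinstance.com
</RequireAll>
```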

Block Users by Referrer – .htaccess Guide

When someone follows a link from another website to yours, that website is called the ‘referrer’. But referrers aren’t limited to clickable hyperlinks: any page on the internet can also embed your images directly. This is called hotlinking. It uses up your bandwidth, it can infringe your copyright, and you don’t even get extra traffic from it. And it’s not just images, either; a stranger can link to your other resources, such as CSS files and JS scripts, too.

This is bound to happen a little, and most site owners tolerate it. But it can easily escalate into something more abusive. Ordinary clickable links can cause you problems too, for example when they come from troublesome or nefarious websites. These are just a few of the reasons why you may decide to deny requests that originate from particular referrers.

To do this, you need the mod_rewrite module. Most web hosts enable it automatically, but if yours doesn’t, or you can’t tell whether it does, get in touch and ask. If they’re reluctant to enable it, maybe think about getting a new host.

Remember: directives that block a referrer depend on the mod_rewrite engine.

The code to block by referrer looks like this:

RewriteEngine on

RewriteCond %{HTTP_REFERER} ^https?://.*forinstance\.com [NC,OR]

RewriteCond %{HTTP_REFERER} ^https?://.*forinstance2\.com [NC,OR]

RewriteCond %{HTTP_REFERER} ^https?://.*forinstance3\.com [NC]

RewriteRule .* - [F]

It’s slightly fiddly, so let’s go through it.

RewriteEngine on, on the first line, switches on the rewrite engine; without it, the rewrite directives that follow are ignored. Lines 2, 3 and 4 each block a single referring domain. To adapt this for your own purposes, change the domain name part (forinstance) and the extension (.com).

The backslash in front of .com is an escape character. The domain name is matched with a regular expression, in which a dot has a special meaning (any single character), so a literal dot must be “escaped” with a backslash (\).

The NC in the brackets makes the match case-insensitive. The OR literally means “or”: it links the condition to the next one, so the rule applies if the referrer matches this domain, this one or this one.

The final line is the rewrite rule itself. The - means the requested URL is left unchanged, and [F] stands for “Forbidden”: if a request comes from one of the listed referrers, it is blocked with a 403 Forbidden error.
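For hotlinking specifically, a common variation works the other way round: instead of listing bad referrers, it allows your own site and blocks everyone else, but only for image files. A sketch, where example.com stands in for your own domain and the list of extensions is an assumption to adjust:

```apache
RewriteEngine on
# Let through requests with no referrer at all (direct visits, some privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred by your own site (placeholder domain)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Everything else asking for an image gets 403 Forbidden
RewriteRule \.(gif|jpe?g|png|webp)$ - [NC,F]
```

Conditions without OR must all match, so the rule only fires when a referrer is present and it isn’t your own site.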

An .htaccess Guide to Blocking Bots and Web Scrapers

Sometimes it isn’t even people eating up your bandwidth; it’s robots. These programs lift your site’s content, typically so that some low-quality SEO outfit can republish it. There are genuine bots out there, such as the crawlers run by the big search engines, but others are like cockroaches, scavenging and doing you no good whatsoever.

To date, the industry has identified hundreds of bots. You won’t ever be able to block them all, but you can block as many as possible: a comprehensive set of rewrite rules can trip up 350+ known bots.
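Rule sets like these typically work by matching the User-Agent header each bot sends. A minimal sketch of the technique follows; BadBotOne and ExampleScraper are hypothetical names, so substitute the user-agent strings you actually see in your access logs. Keep in mind that the header is set by the client, so this only stops bots that identify themselves honestly.

```apache
RewriteEngine on
# Hypothetical bot names: replace with real User-Agent substrings from your logs
RewriteCond %{HTTP_USER_AGENT} (BadBotOne|ExampleScraper) [NC]
RewriteRule .* - [F]
```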

Specifying a Default File for a Directory

When a server receives a request for a URL with no file name specified, it assumes the URL refers to a directory. If you request http://forinstance.com, Apache (like most servers) looks in the domain’s root directory, typically /public_html or something like it, such as /forinstance-com, to find the default file. By default, that file is called index.html. That’s because when the Internet was young, websites were often just a bunch of documents bundled together, and “home” pages were often no more than an index, so that you knew where everything was.

Of course, nowadays you might not want index.html to be the default page. You may be using a different file type, in which case index.shtml, index.xml or index.php might be more appropriate. Or maybe you don’t think of your home page as an “index” and want to call it something else, like home.html or primary.html.
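The DirectoryIndex directive, covered in the SSI section, handles this too. A sketch, assuming you want home.html to be the default with index.php and then index.html as fallbacks; Apache serves the first file in the list that it finds:

```apache
# Try home.html first, then index.php, then index.html
DirectoryIndex home.html index.php index.html
```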
