
The pieces, and how they fit together

At this point in human development, the Web and its websites are an integral part of most people’s lives. In a few short years, the Web has touched and transformed the majority of our society’s communication, information, and commerce interactions. If you’re interested in hosting a website, or already hosting one, it’s important to know the components that make the Web and its websites possible.

What is the Web?

The Web is really just a huge network. That means it’s a way to send information between computers over wires and/or via radio waves. Of course, “computers” now means anything from a laptop to a game machine to a mobile phone – even some cars and refrigerators contain at least one computer.

The system is kind of like the mail system, where computers send “packets” of information instead of envelopes and packages. In order for a packet to reach its proper destination, it needs to have an address.

DNS and the IP Address

The addresses on the Internet are called IP addresses (Internet Protocol addresses). Currently an IP address is a set of four numbers, each from 0 to 255, separated by periods – for example, the Itabix main site’s IP address is 69.194.230.223 at the time of this writing.

Okay, 69.194.230.223 makes sense to a computer or a geek, but most of us humans need an easier way to remember the address of a website or an email address. So the Internet includes what’s called DNS – the Domain Name System – which translates between the human name (the domain name) and the computer’s number. Each of those translations is called a lookup.
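If you’re curious to see a lookup happen, here’s a tiny Python sketch (standard library only) that asks your computer’s resolver to translate a domain name into an IP address, just like a browser does behind the scenes:

```python
import socket

# Ask the system's DNS resolver to translate a domain name
# into an IP address -- the "lookup" described above.
ip = socket.gethostbyname("itabix.com")
print(ip)  # e.g. 69.194.230.223 at the time of this writing
```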

The DNS servers, or domain name servers, that serve as the directories for the Web are scattered all over the world so that translating between domain names and IP addresses is as fast as possible. At the top are the root nameservers – kind of the bosses of the system, which keep track of which computers act as nameservers and point other computers toward them – and below them are lots of lesser nameservers. Itabix actually has 12 nameservers for all of our domains, scattered around the world – six times as many as most hosts – to increase lookup speed and as protection in case a nameserver goes down.

All of these nameservers talk to each other. The nameservers where your domain is registered (in our case, the 12 nameservers) are called the “authoritative” nameservers, meaning they are the ones trusted with the definitive word on which IP addresses your domain name is associated with. When a domain name is changed, added or deleted, the other nameservers around the Web gradually pick up the new information from the authoritative nameservers as their stored (cached) copies expire. This is called “propagation”, and it can take up to 36 hours. During this time, some nameservers on the Web will have the old information and some will have the new information, which is why webmasters sometimes have to make choices about when and how a nameserver change is made. For example, during the propagation of your nameserver change, some of the email people send you will end up at your old mailserver and some will end up at your new mailserver.
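For the technically inclined, here’s a small Python sketch (it assumes the third-party dnspython package is installed, and uses example.com as a placeholder domain) that asks for a domain’s authoritative nameservers and shows the TTL – the number of seconds other nameservers are allowed to cache the answer, which is what makes propagation take time:

```python
import dns.resolver  # third-party package: pip install dnspython

# Look up the nameserver (NS) records for a domain -- the
# "authoritative" servers described above.
answers = dns.resolver.resolve("example.com", "NS")
for record in answers:
    print(record.target)

# The TTL is how long (in seconds) other nameservers may cache
# this answer before asking again -- the source of propagation delay.
print(answers.rrset.ttl)
```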

Domain Names – www.yoursite.com

When you type www.google.com into your browser’s address bar, your browser first sends a message to a DNS server asking for the location (the computer) associated with that domain name. Your request is then directed to that computer, along with the address to reply to, and the computer sends the requested information back to that “reply address”, where it pops up as a page, a file, or an email.

That information can be a file that is downloaded to your computer (including images), or it can be a web page – code that instructs your browser how to “paint” the page you want to look at (where to put text, images, graphics, etc.).
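As an illustration, here’s a minimal Python sketch that requests a page and prints the beginning of the HTML code the server sends back – the same code your browser would use to “paint” the page (example.com is just a placeholder site):

```python
from urllib.request import urlopen

# Request a page the way a browser does and look at the reply:
# HTML code describing how the page should be "painted".
with urlopen("http://example.com/") as response:
    html = response.read().decode("utf-8", errors="replace")

print(html[:300])  # the first few hundred characters of the page
```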

There are a TON of what are called TLDs – top-level domains. The most popular ones are .com, .net, .org (for non-profits), .info, and .us. Recently .xxx was added for sexually explicit sites. Although there’s no technical difference between them, .com domains are most in demand because they’re the easiest to remember. If you know a company’s name (Itabix, for example), you’re going to go first to that name as a .com (Itabix.com).

By the way, there’s nothing sacred about the www. at the beginning of a domain name. When Tim Berners-Lee was creating the framework for the World Wide Web, www. was used to mark a website on the WWW – but of course, all websites are on the WWW. Back then, there were a bunch of other common uses for the Internet (ftp. is still a useful prefix for sites using File Transfer Protocol to transfer files, so that’s still around). At this point, most sites don’t differentiate: you can use the www. or not (ours actually redirects from www.itabix.com to itabix.com). There are still a few that require the www., such as www.nasa.gov.

Servers

The computer that passes the data in and out is called a “server” because its main purpose is to serve data to a lot of other computers (to transfer data in and out). Depending on the type of data it serves, it may be called a webserver, a mailserver, a fileserver, a nameserver, a media server, etc. In fact, it’s possible to have a webserver, mailserver, media server, and ftp server all on the same machine (server), or to have one webserver running across a bunch of machines (a distributed server) – so really, when we say “webserver”, we’re talking about the program running on the computer(s) as well as the machine itself.
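To underline the point that a webserver is really just a program, here’s a minimal sketch using Python’s built-in web server module – run it in a folder of files and they become reachable over HTTP on port 8000 (a demonstration only, not how a production webserver is set up):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

# A bare-bones webserver: listen on port 8000 and serve whatever
# files are in the current folder. Real webservers do the same
# basic job with far more speed and safety.
server = HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler)
server.serve_forever()
```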

A webserver is not like a “regular” desktop computer. Most are designed to be fault-tolerant and to access and transfer information extremely fast, and they are usually connected to a very fast, redundant Internet connection close to the Internet backbone (the servers and data lines that compose the Internet’s core), so there’s minimal lag between when you request data and when it arrives back at your computer.

Most webservers in the world run the Linux operating system, a derivative of UNIX that is arguably much more stable than Windows, and which is free, completely documented, and changeable (Open Source). Because it is contributed to and reviewed by thousands of very smart people around the world, it is lean, fast and secure.

Running out of IPs

So you would think that each domain name and each computer on the Web has its own IP address, and in fact that’s how it was for the Web’s first ten years. Then it became apparent that the Web was going to be bigger than anyone imagined – the original IP address allotment of about 4.29 billion addresses was quickly being scooped up. There’s a new protocol the Web is slowly transitioning to (IPv6, Internet Protocol version 6; the current version is IPv4), but we needed something to tide us over, so there are a couple of workarounds for the problem.
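The arithmetic behind the shortage is simple enough to check yourself – here’s a quick Python sketch showing how many addresses each version allows (the IPv6 address below is just a documentation example):

```python
import ipaddress

# IPv4 addresses are 32 bits: four numbers, about 4.29 billion combinations.
print(2 ** 32)  # 4294967296
print(ipaddress.ip_address("69.194.230.223").version)  # 4

# IPv6 addresses are 128 bits: vastly more room.
print(2 ** 128)
print(ipaddress.ip_address("2001:db8::1").version)  # 6
```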

For your computer, which is commonly called the “client”, one part of the solution is called NAT – network address translation – and it means a lot of computers (your home or office network, for example) can share one public IP address.
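You can see NAT at work from your own machine with a sketch like the one below – it compares the private address your computer uses on the local network with the single public address the outside world sees (api.ipify.org is just one of many “what is my IP” services, used here purely as an example):

```python
import socket
from urllib.request import urlopen

# The private address this machine uses on the local network.
# (Connecting a UDP socket doesn't actually send any traffic.)
probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
probe.connect(("8.8.8.8", 80))
local_ip = probe.getsockname()[0]
probe.close()

# The shared public address, as reported by an outside service.
public_ip = urlopen("https://api.ipify.org").read().decode()

print(local_ip, public_ip)  # usually different when NAT is in use
```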

For domain names, each webserver keeps a list of its websites and their domain names. When a request comes in, it includes the domain name, so the webserver knows which site is being asked for. It’s therefore possible for one IP address to be associated with hundreds or thousands of websites.
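Here’s a hypothetical sketch of how that works: two requests go to the same IP address, and only the Host header tells the webserver which site is wanted (the IP address and site names below are made-up examples, so this won’t return real pages):

```python
import http.client

# One shared IP address, two different sites: the Host header
# tells the webserver which of its sites is being requested.
for site in ("site-one.example", "site-two.example"):
    conn = http.client.HTTPConnection("203.0.113.10", 80)  # example IP range
    conn.request("GET", "/", headers={"Host": site})
    response = conn.getresponse()
    print(site, response.status)
    conn.close()
```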

The Site

A website was originally just a collection of files containing HTML code (HyperText Markup Language) that, when loaded into your browser, instructed it on how to format text and where to put images. It was originally developed to make it easy for professors and researchers to distribute information. (Fact: when Bill Clinton was elected President, there were a total of 50 websites.)

Once websites started to become popular with the general public, people wanted a way to make them more like printed pages – more formatting ability, plus dynamic content (content that changes depending on any number of factors like user information or preferences, time of day, the person’s location, etc.). So the original HTML has been extended and coupled with CSS (Cascading Style Sheets, a way to set the position, formatting, color, font, etc. of objects on a page) and JavaScript (a way to run a program from a web page that can interact with the viewer, the server, and other computers on the Web). This combination is called DHTML (dynamic HTML), and it is the core of what’s called Web 2.0, the new interactive Internet.

Database-driven Websites

Also, people wanted a way to have more options for displaying information and formatting – specifically to display information from a database instead of a static unchanging file. Originally this type of website was very expensive and was reserved for large corporations and government agencies. But now, with programs such as WordPress, it is becoming the norm for websites to be database-driven.

With a database-driven website, the “page” is actually assembled on demand when the browser requests it, so it’s possible to include pretty much any information you want within the page. Because of this, database-driven websites bring several benefits. The content can change however you “program” it to – in response to the viewer, any information available about them, the time, other computers, and so on. It also means that changing your website really means changing data in a database, which is much more flexible and can be done through forms in a web browser instead of a program such as Dreamweaver. In this way, a person with average computer skills is able to create and manage a website.
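As a toy illustration of the idea, here’s a short Python sketch that builds a “page” on demand from rows in a small database instead of reading a fixed file (real systems such as WordPress do this with far more machinery, but the principle is the same):

```python
import sqlite3

# A tiny in-memory database standing in for a real site's content.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (title TEXT, body TEXT)")
db.execute("INSERT INTO posts VALUES ('Hello', 'Our first post')")
db.execute("INSERT INTO posts VALUES ('News', 'A second post')")

# Assemble the page on demand from whatever is in the database.
html = "<html><body>"
for title, body in db.execute("SELECT title, body FROM posts ORDER BY title"):
    html += f"<h2>{title}</h2><p>{body}</p>"
html += "</body></html>"

print(html)  # this is what the webserver would send to the browser
```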

WordPress and other content management systems usually use add-ons (called plugins in WordPress) that extend their functionality for specific tasks, one of the most common being a shopping cart. There are literally thousands of plugins available for WordPress, so most of the time you don’t need to pay a programmer to build a custom application for you.

URLs and Hyperlinks

All of these resources on the Web would be useless if we couldn’t easily get from one to another. The job of linking all this information together falls to the “hyperlink” – these days usually just called a “link”. A link is a destination address that can point to a website, a “page” or file, or even a place within a page (called an anchor). Even images are displayed using specialized links.

The destination address is called a URL, or Uniform Resource Locator. It includes the domain name and the path to the resource (such as http://www.mysite.com/Folder1/page1.html), and it can also include things called subdomains, such as http://clients.mysite.com, which is a website of its own (although it may include parts of the main site). Subdomains are used to separate content from the rest of the website, such as a secure shopping cart or an employee site.
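Here’s a small Python sketch that pulls a URL apart into the pieces described above – the scheme, the host (including any subdomain), and the path to the resource (the addresses are the same made-up examples used above):

```python
from urllib.parse import urlparse

# Split a URL into its parts.
url = urlparse("http://clients.mysite.com/Folder1/page1.html")
print(url.scheme)    # http
print(url.hostname)  # clients.mysite.com -- "clients" is the subdomain
print(url.path)      # /Folder1/page1.html -- the path to the resource
```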

The idea of “hyperlinks” was very innovative when it was developed; up until HTML, most information was presented linearly, like reading a book – you couldn’t jump from one thing to another in the middle (unless you opened another book). Hyperlinks make information more of a “web” than a line, tying information together in a truly relational way.

Secure Sockets

One thing people quickly learned once sensitive information started being passed over the Web was that ill-intentioned people could “peek” at the data stream as it went by and grab information such as passwords and credit card numbers. The answer to that was a form of encryption called SSL, or Secure Sockets Layer. With SSL, a secure server passes a public encryption key and a “certificate” to your browser. The public key lets the browser encode information so that only the webserver can decode it, and the certificate authenticates the key (says it’s really associated with that domain).

The encryption keys ensure that only the intended recipient is able to read the information passed between the server and the client. Browsers give you two ways to be sure a site or page is secure – the address in the location bar will start with https:// instead of http://, and there will be a little lock somewhere, either by the location bar or at the bottom of the browser.

There are two encryption keys – a private key that only the webserver (and the certifying authority, if there is one) knows, and a public key that is passed to the client. As I said, there is also a certificate saying that the key is tied to that domain. You can actually generate these keys and a certificate for free on a webserver, but then no one would know whether “mysite.com” is really owned by you. So there are certificate authorities such as Thawte and Verisign that people pay yearly to check into whether you are who you say you are, and then they certify the keys. If you go to a site whose certificate hasn’t been certified by a recognized authority, you’ll get a warning and the option to “trust” the site, or to leave it.
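If you’d like to peek at a certificate yourself, here’s a Python sketch that opens an encrypted connection and prints who the certificate says the site is and which authority vouched for it (example.com is used as a stand-in for any secure site):

```python
import socket
import ssl

# Open an SSL/TLS connection and inspect the server's certificate.
context = ssl.create_default_context()
with socket.create_connection(("example.com", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="example.com") as tls_sock:
        cert = tls_sock.getpeercert()

print(cert["subject"])  # who the certificate says the site is
print(cert["issuer"])   # the certification authority that vouches for it
```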

Most websites don’t need a secure certificate and SSL; it’s mostly used for shopping carts and for sites that accept, send or display sensitive information. One thing about secure sites is that they have to have their own IP address, called a dedicated IP, which usually costs a few dollars a month.

You don’t need a secure certificate and SSL to sell things over the Web – you can use services such as PayPal or Google Checkout to manage the secure payment. In this case, the customer is redirected to the service’s secure site for the transaction, and then returned to your site afterward.

Components of Web Hosting

So to have a website we need a few things:

1. A webserver (that’s us) – a computer with high-speed access to the Web, designed to be fault-tolerant and secure.
2. A domain name (we sell those).
3a. Files for pages, or
3b. A program such as WordPress and a database, to display “pages”.
4. Images and files
5. Other miscellaneous possibilities such as a media server, specialized programming for specific applications, etc.
6. Possibly a secure certificate and a dedicated IP (for shopping carts or other sensitive information)