In the post “The IP Networking Fundamentals for Dummies” I described how computers connected to the Internet communicate with each other. The basic requirement for successful communication between computers is unique identification of the recipient (or sender), so that the communication infrastructure is able to deliver a message to the exact recipient. For this purpose the Internet uses IP addresses: unique numeric identifiers of the message source or destination.
But IP addresses are not comfortable for users. The main reason is in our brains. The brain is able to remember words easily, but it is not so good with numbers, especially long ones (and an IPv4 address can have up to 12 digits).
You can try it easily. Read the next two lines, then stop reading the article and do something else for 5-10 minutes. After that, try to write the content of those two lines on paper exactly as you remember it, and then compare your result with the original.
Done? What are your results? I believe that you remembered the content of the second line easily, but remembering the exact IP values from the first line was more complicated.
In light of the previous facts, we need something more human-friendly to specify addresses of computers on the Internet. We need something like the Yellow Pages for the Internet: something which assigns human-readable names to IP addresses.
In the dark ages of the ARPANET, the predecessor of the Internet, each computer used its own file called “hosts” to define human-readable labels for IP addresses. In this file the administrator of the system can specify a human-readable “hostname” and the associated IP address. The term hostname means a human-readable label of a computer or other device connected to a computer network. Such a device or computer is called a “host”.
Here you can find an example of the hosts file:
You can see that the format of the file is simple. It is a line-oriented file and each line can define one or more labels (hostnames) for one IP address. Values on one line are separated by whitespace characters, typically spaces or tabs. The IP address is the first value on the line; all following values are labels associated with this IP address. In our example the hostname router is translated to IP 192.168.1.1, and ubuntu and www.ubuntu.com are hostnames for IP 126.96.36.199.
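The format is simple enough to parse in a few lines. The following Python sketch is only an illustration: the function name parse_hosts and the sample text are my assumptions (the sample reuses the addresses mentioned above), and treating everything after a # as a comment follows the convention of common hosts files.

```python
def parse_hosts(text):
    """Map each hostname to its IP address from hosts-file text."""
    table = {}
    for line in text.splitlines():
        # Drop comments (everything after '#') and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # The first value is the IP address, the rest are hostnames.
        ip, *names = line.split()
        for name in names:
            # Store names in lowercase; keep the first definition of a name.
            table.setdefault(name.lower(), ip)
    return table


# Illustrative sample, not a copy of any real hosts file.
SAMPLE = """
# sample hosts file
192.168.1.1     router
126.96.36.199   ubuntu www.ubuntu.com
"""

hosts = parse_hosts(SAMPLE)
print(hosts["router"])          # 192.168.1.1
print(hosts["www.ubuntu.com"])  # 126.96.36.199
```

Keeping the first definition of a duplicated name is a design choice of this sketch, not a rule taken from the text above.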
This simple solution for hostname management works great for a small number of managed hosts. The content of the hosts file was defined by the system administrator according to the needs of the host and its users. But how did the administrator know which IP address to use and which services are provided at this IP?
The ARPANET had a registration authority, the NIC (Network Information Center), where all hosts were registered, and the NIC provided information about all registered hosts in one large text file, HOSTS.TXT. Administrators of connected computers could pick up information about interesting hosts and define appropriate records in their own hosts file(s).
When the ARPANET started to grow rapidly, managing network names by the described mechanism became problematic for many reasons. One of them was the resource consumption and manageability of the centralized HOSTS.TXT file at the NIC. Another one was possible ambiguity of host names on different hosts, because the content of local hosts files was not managed by a central authority. It depended on each computer’s administrator, so each ARPANET host could have a different hosts file and a different label for the same IP.
For those reasons, a new solution for hostname management had to be found, and the DNS (Domain Name System) service was specified in 1983.
DNS is a hierarchical system for maintaining human-readable hostnames.
A typical hostname specified by DNS looks like this:
A DNS name is constructed from parts (called labels) separated by the dot character. Each label can be up to 63 ASCII characters long, a DNS name can have up to 127 labels, and the whole DNS name cannot exceed 253 characters. Some domain registries tighten those limits with additional rules.
Technically, each character in a label can be any value which can be represented by an octet (byte). But in practice, only the characters a-z, A-Z, the digits 0-9 and the hyphen (-) are allowed in most cases. Nowadays some domains also allow special characters from their national alphabets. Details are beyond the scope of this article; you can find more under the keyword IDNA (Internationalizing Domain Names in Applications).
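The limits above can be checked mechanically. Here is a minimal Python sketch; the function name is_valid_dns_name is my own, and the extra rule that a label must not start or end with a hyphen is the usual hostname convention rather than something stated above. National (IDNA) characters are deliberately not handled.

```python
import re

# Letters, digits and hyphen, 1 to 63 characters,
# with no hyphen at the start or at the end of the label.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_dns_name(name):
    """Check a DNS name against the length and character limits."""
    # A trailing dot only marks the root; ignore it for the checks.
    if name.endswith("."):
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    if len(labels) > 127:
        return False
    return all(LABEL_RE.fullmatch(label) for label in labels)


print(is_valid_dns_name("www.example.com"))    # True
print(is_valid_dns_name("a" * 64 + ".com"))    # False, label too long
print(is_valid_dns_name("-bad.example.com"))   # False, leading hyphen
```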
Domain names are case insensitive, so names that differ only in letter case, for example www.example.com and WWW.EXAMPLE.COM, will provide the same results.
DNS names are organized hierarchically from right to left. The highest part of the DNS hierarchy is the rightmost part of the DNS name. The first, invisible part of each DNS name is the DNS root (root domain). The DNS root has no text representation; its purpose will be described later.
So the first visible part of a DNS name is the top-level domain (TLD). In the previous examples we used the com and net TLDs. Each label to the left of this point specifies a subdomain of the domain to its right. So again, from the previous examples, example.com is a subdomain of the com TLD, and www.example.com is a subdomain of example.com. As I mentioned before, this “descending tree” can have up to 127 levels.
In the table below you can see DNS names decomposed into their parts:
|Domain level||COM domain name||Net domain name|
|Top Level Domain (TLD)||com||net|
|Second Level Domain||example||example|
|Third Level Domain||www||www|
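The same decomposition can be done in code. This is a small Python sketch with function names of my own choosing (domain_levels and parent_domains); it only splits the text of the name and does not ask any DNS server.

```python
def domain_levels(name):
    """Return the labels of a DNS name ordered from the TLD downwards."""
    labels = name.rstrip(".").lower().split(".")
    # The hierarchy is read from right to left, so reverse the labels.
    return list(reversed(labels))


def parent_domains(name):
    """List the name and all of its parent domains, most specific first."""
    labels = name.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


print(domain_levels("www.example.com"))   # ['com', 'example', 'www']
print(parent_domains("www.example.com"))  # ['www.example.com', 'example.com', 'com']
```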
Because an essential requirement for DNS names is to be unique in the whole DNS space, we cannot have two identical DNS names assigned to different computers. It is the same requirement as for IP addresses: no two computers with the same IP address.
But we can have two or more DNS names associated with one IP address. A typical use of this is one server which provides several services. For example, www.example.com and ftp.example.com can point to one IP, and the computer with this IP address will work as both a Web server and an FTP server.
All DNS names from the previous examples are Fully Qualified Domain Names (FQDN). It means that the name represents the full DNS hierarchy, including the TLD in relation to the root domain, and that this name is unique in the whole DNS namespace.
Very often an FQDN is constructed as a concatenation of the host’s local name and the parent domain name. The host’s local name has to be unique inside the parent domain, but we can have hosts with the same name in different domains, so a host named www in both example.com and example.net is valid and possible.
In light of the previous facts, a DNS name associated with an IP address is a hostname. The DNS names www.example.com and example.com are hostnames because they are associated with IP 188.8.131.52. But test.example.com is not a hostname, because it has no associated IP address.
The opposite of FQDNs are relative DNS names. They are very often used inside parent domains to shorten hostnames and to increase user comfort.
DNS resolvers handle those relative names in a special way. The resolver first tries to resolve the relative DNS name alone. If that fails, the resolver joins the host name with the domain name(s) from its own DNS suffix list and tries to resolve the results of the join. The process stops at the first success or when all possible combinations have been tried. If the example.com domain is specified in the resolver’s DNS suffix list, then for the relative DNS name www we will get the result www.example.com.
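The suffix handling can be sketched in Python as follows. Everything here is illustrative: the function names, the suffix list and the RECORDS dictionary (which stands in for real DNS answers) are my assumptions. The order of attempts follows the description above; real resolvers may order them differently depending on their configuration. Treating a trailing dot as "already fully qualified" is a common convention that I added.

```python
def candidate_names(name, suffixes):
    """Yield the names a resolver would try, in order."""
    # A trailing dot marks a fully qualified name: try it as it is.
    if name.endswith("."):
        yield name.rstrip(".")
        return
    # First the name alone, then joined with every configured suffix.
    yield name
    for suffix in suffixes:
        yield f"{name}.{suffix}"


def resolve_relative(name, suffixes, lookup):
    """Return (full_name, ip) for the first candidate that resolves."""
    for candidate in candidate_names(name, suffixes):
        ip = lookup(candidate)
        if ip is not None:
            return candidate, ip
    return None


# Hypothetical records standing in for real DNS answers.
RECORDS = {"www.example.com": "188.8.131.52"}

print(resolve_relative("www", ["example.net", "example.com"], RECORDS.get))
# ('www.example.com', '188.8.131.52')
```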
DNS Hierarchy and Distributed Database System
DNS is a hierarchical system for maintaining human-readable hostnames, backed by a distributed database. As I mentioned before in the part about ARPANET history, the centralized database concept reached its limits quickly. The DNS system was designed as a more powerful replacement for it.
The main idea of the DNS system is to divide the whole DNS namespace into independent zones. Each zone is maintained by an authoritative database called a “nameserver”. All authoritative nameservers are organized in a tree hierarchy, where each node of the tree is represented by an authoritative nameserver. All nameservers together create the whole DNS distributed database system.
Any part of a zone maintained by a nameserver can be delegated to a different nameserver, and then a new zone is created. The content of this zone is then managed by the delegated nameserver, which becomes authoritative for this zone.
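To make the idea of zones and delegation more concrete, here is a minimal Python sketch. The ZONES table and all nameserver names in it are hypothetical; the empty string stands for the root zone. For a given name, the authoritative zone is the most specific zone that contains it.

```python
# Hypothetical zones and the nameserver responsible for each of them.
ZONES = {
    "": "root nameserver",
    "com": "com TLD nameserver",
    "example.com": "ns1.example.com",
    "dev.example.com": "ns1.dev.example.com",  # delegated subdomain
}


def authoritative_zone(name, zones=ZONES):
    """Return the most specific zone that contains the given name."""
    labels = name.rstrip(".").lower().split(".")
    # Walk from the full name towards the root and stop at the first zone.
    for i in range(len(labels) + 1):
        zone = ".".join(labels[i:])
        if zone in zones:
            return zone
    return None


print(authoritative_zone("www.example.com"))        # example.com
print(authoritative_zone("build.dev.example.com"))  # dev.example.com
```

Once dev.example.com is delegated, names below it no longer belong to the example.com zone, which is exactly what the lookup shows.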
A strict requirement of the DNS system is to have one primary nameserver and one or more secondary nameserver(s) for each domain. This requirement ensures that DNS will work even if one of the nameservers managing a domain gets into trouble.
The following schema presents the real structure of several domains:
As you can see, the hierarchy always starts with a root nameserver. The whole Internet is served by 13 root nameservers. The schema shows only d.root-servers.net, but in reality the servers a.root-servers.net through m.root-servers.net exist. Those root servers have synchronized content and hold records for the nameservers of all known top-level domains.
Next in the hierarchy are the nameservers for TLDs. The list of all registered TLDs can be found on the IANA pages, here: http://www.iana.org/domains/root/db/
There are three types of TLDs:
- country code top-level domains (ccTLD)
- generic top-level domains (gTLD)
- infrastructure top-level domain ARPA
From real life, you probably know the first two categories, either generic TLDs:
|aero||the air transport industry|
|asia||entities in the Asia-Pacific region|
|biz||business related sites|
|cat||entities using Catalan language|
|com||primarily commercial organizations, but unrestricted|
|gov||United States government organizations and other entities|
|int||International organizations (un.int for example)|
|jobs||Sites related to jobs and employment|
|mil||United States military|
|mobi||mobile devices related sites|
|name||families and individuals|
|net||originally for network infrastructures, now unrestricted|
|org||originally for organizations not clearly falling within the other gTLDs, now unrestricted|
|tel||services involving connections between the telephone network and the Internet|
|travel||travel agents, airlines, hoteliers, tourism bureaus, etc.|
or country code TLDs (.us, .cz, .uk etc.). Country code TLDs consist of 2 characters in most cases, with several exceptions.
In 2011 ICANN removed most restrictions on the creation of generic top-level domain names (gTLDs), and the TLD boom started. Now we can find more than a thousand TLDs, from city names to company names, from technologies to hobbies… And the list is still growing. You can find it here: IANA Root Zone Database
On the third level of the hierarchy you can find the nameservers handling the real domain content. Very often these nameservers are owned by domain registrars: when you register a domain, the registrar offers you an interface to the nameserver to manage your domain content. But large companies, Dell for example, have their own nameservers to handle domain content.
On the schema you can see one interesting fact. The dell.com and dell.cz domains are handled by the same nameserver, ns1.us.dell.com. This is completely possible, so in the tree hierarchy any node on a lower level can have more than one parent on the upper level(s).
The last interesting thing you can see on the schema is the delegation of some dell.com subdomains to the next level of nameservers. For example, the ins.dell.com subdomain is handled by its own nameserver, auspc3dns4.us.dell.com.
DNS Record Types
Now that we have defined the DNS organization, we have to mention which kinds of records are managed by the DNS system. In the next table you can find the most common DNS record types. If you want to see all other DNS record types, you can find the full list here: http://en.wikipedia.org/wiki/List_of_DNS_record_types
|A||IPv4 IP address|
|AAAA||IPv6 IP address|
|CNAME||DNS name alias. Contains a different DNS name, which has to be resolved again|
|MX||Mail exchange system for the domain|
|NS||Nameserver record to delegate the domain to another nameserver|
The first two record types, A and AAAA, specify the mapping of a DNS name to an IP address. The A record is used for classical IPv4 address mapping. To support the newer IPv6 protocol, the record type AAAA was added to the DNS system.
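If you are not sure which of the two record types a given address belongs to, the Python standard library can tell IPv4 and IPv6 apart. The helper name record_type_for is my own; the second address in the example comes from the range reserved for documentation.

```python
import ipaddress


def record_type_for(address):
    """Return the record type that can hold the given IP address."""
    # ip_address() raises ValueError for text that is not an IP address.
    version = ipaddress.ip_address(address).version
    return "A" if version == 4 else "AAAA"


print(record_type_for("192.168.1.1"))   # A
print(record_type_for("2001:db8::1"))   # AAAA
```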
The next record type is CNAME. This record is used to map one DNS name to another. In practice it means that you can define one or more DNS names pointing to another DNS name which has an A or AAAA record. On the previously presented DNS hierarchy schema, you can find that support.dell.cz is a CNAME alias for wwwredirect.ins.euro.dell.com. If you trace another Dell DNS name, support.dell.de, you will find that it has the same CNAME record: wwwredirect.ins.euro.dell.com.
The MX record is another important item. It specifies the “Mail Exchange system” for the domain. In practice it specifies the host name of the mail server for the domain, which is then resolved to an IP address like any other DNS name. If you write an e-mail to the address email@example.com, your mail system will check the MX record for the example.com domain, then contact the mail server named there and send it your message for that mailbox.
The NS record is used for domain delegation to another DNS nameserver. This mechanism was described earlier, so here it is enough to note that the delegation is provided by this record type.
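The way these record types work together can be sketched with a small in-memory table. All data in RECORDS is made up for illustration (192.0.2.25 is from the range reserved for documentation), the function names are mine, and the MX value is modeled as the mail server’s host name, which is then resolved to an address. Real MX records also carry a preference number, which this sketch leaves out.

```python
# Hypothetical zone data: name -> list of (record type, value).
RECORDS = {
    "example.com": [("A", "188.8.131.52"), ("MX", "mail.example.com")],
    "mail.example.com": [("A", "192.0.2.25")],
    "www.example.com": [("CNAME", "example.com")],
    "ftp.example.com": [("CNAME", "www.example.com")],
}


def lookup(name, rtype, records=RECORDS):
    """Return all values of the given record type for a name."""
    return [v for t, v in records.get(name.lower(), []) if t == rtype]


def resolve_address(name, records=RECORDS, max_hops=8):
    """Follow CNAME aliases until an A record is found."""
    for _ in range(max_hops):
        addresses = lookup(name, "A", records)
        if addresses:
            return addresses[0]
        aliases = lookup(name, "CNAME", records)
        if not aliases:
            return None
        # The alias target has to be resolved again.
        name = aliases[0]
    raise RuntimeError("too many CNAME hops, possible loop")


def mail_server_address(domain, records=RECORDS):
    """Find the mail server of a domain and resolve it to an address."""
    exchanges = lookup(domain, "MX", records)
    return resolve_address(exchanges[0], records) if exchanges else None


print(resolve_address("ftp.example.com"))   # 188.8.131.52
print(mail_server_address("example.com"))   # 192.0.2.25
```

The max_hops limit is a safety net of this sketch against aliases that point at each other.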
DNS Resolve Process
Now that we have specified the domain name format and the hierarchy of the DNS distributed database, we can describe the mechanism which resolves a DNS name to the associated IP address. The DNS system component responsible for this process is called the “resolver”. The resolver initiates and sequences queries to DNS nameservers to resolve the DNS name to the final result (IP address).
The resolver can send two kinds of DNS query to a nameserver:
- Non-recursive query
- Recursive query
If a DNS nameserver receives a non-recursive query, it checks whether it is the authoritative nameserver for the queried domain name. If it is, it provides an authoritative answer with the appropriate DNS record. Otherwise the nameserver generates a reply in which it specifies another nameserver which the client can contact to resolve the query.
If a DNS nameserver receives a recursive query, it starts with the same step as for a non-recursive query: it checks whether it is the authoritative nameserver for the queried domain name, and if it is, it provides an authoritative answer with the appropriate DNS record. Otherwise the nameserver itself queries other nameserver(s) which can resolve the query, so that it can provide the full query resolution to the client.
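On the wire, the difference between the two kinds of query is a single flag in the query header, called RD (recursion desired) in the DNS message format of RFC 1035. The following Python sketch builds the bytes of a simple query for an A record; the function names and the fixed query ID are my own choices, and the sketch does not send anything over the network.

```python
import struct


def encode_name(name):
    """Encode a DNS name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("label must be 1 to 63 bytes long")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, recursive, query_id=0x1234):
    """Build a DNS query message for the A record of a name."""
    flags = 0x0100 if recursive else 0x0000  # RD (recursion desired) bit
    # Header: ID, flags, 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # Question: encoded name, QTYPE 1 (A), QCLASS 1 (IN).
    return header + encode_name(name) + struct.pack("!HH", 1, 1)


print(encode_name("www.zive.cz"))               # b'\x03www\x04zive\x02cz\x00'
print(build_query("www.zive.cz", True)[2:4])    # b'\x01\x00'
print(build_query("www.zive.cz", False)[2:4])   # b'\x00\x00'
```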
To be more specific, here are the steps which resolve a name completely, one non-recursive query at a time:
- The resolver contacts a nameserver with the query.
- If the server is authoritative for the request, it resolves the query to the final result and sends the reply to the resolver.
- If the server is not authoritative for the request, it replies with the address of another nameserver which can be contacted to resolve the query.
- The resolver returns to step 1, but now queries the nameserver whose address it obtained in step 3.
These steps are repeated until the authoritative nameserver is reached and the query is finally resolved.
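The steps listed above can be simulated in Python without any network. The SERVERS table, the server names and the address 192.0.2.10 are all invented for this sketch; each server either answers from its own data or refers the resolver to another server.

```python
# Hypothetical, hand-written nameserver data for illustration only.
SERVERS = {
    "root": {"answers": {}, "referrals": {"cz": "cz-tld"}},
    "cz-tld": {"answers": {}, "referrals": {"zive.cz": "ns.zive"}},
    "ns.zive": {"answers": {"www.zive.cz": "192.0.2.10"}, "referrals": {}},
}


def query(server, name):
    """Answer one non-recursive query: a final answer or a referral."""
    data = SERVERS[server]
    if name in data["answers"]:
        return "answer", data["answers"][name]
    for zone, next_server in data["referrals"].items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    return "error", None


def resolve(name, start="root", max_steps=10):
    """Repeat the steps until an authoritative answer arrives."""
    server, path = start, []
    for _ in range(max_steps):
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        if kind == "error":
            return None, path
        server = value  # follow the referral and ask again
    return None, path


print(resolve("www.zive.cz"))
# ('192.0.2.10', ['root', 'cz-tld', 'ns.zive'])
```

The returned path shows which servers were contacted, one per repetition of the steps.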
A recursive query always provides the client with the query resolution result, or with an error if the resolution cannot be completed, for example because the requested DNS name doesn’t exist, or because all authoritative servers (the primary and all secondary servers) are out of order or disconnected from the network.
A DNS nameserver doesn’t need to support recursive queries. In that case it refuses the recursive request and the resolver has to complete the resolution in another way, typically by non-recursive queries.
Maybe you have a question: how is it guaranteed that the resolver will always get a result? And how is it guaranteed that a nameserver knows which other nameserver is better placed to resolve the query?
The answer is simple. The resolver must always know how to contact at least one nameserver. And each nameserver must know how to contact at least one root DNS nameserver. If those conditions are met, it is guaranteed that we always reach the authoritative nameserver for the query by walking down the DNS tree hierarchy: if the nameserver queried in the first iteration is not authoritative for the queried name, it replies with the address of a root nameserver, and then the resolution can be completed successfully, because the root nameserver can reply with the appropriate TLD nameserver, this TLD nameserver can reply with the domain nameserver, and so on.
I believe that one example is better than two pages of description, so here is an example of how the DNS system resolves the DNS name www.zive.cz, the most popular computer magazine here in the Czech Republic.
To obtain the example I used the DiG tool, which is part of BIND, the most popular DNS server (http://www.isc.org/software/bind), and can be installed from Linux package repositories.
If you are using only Windows, you can download DiG from this page: http://members.shaw.ca/nicholas.fong/dig/
Finally, you can use DiG online on this web page: http://www.kloth.net/services/dig.php
Here is the DiG output for www.zive.cz:
From the previous dump, you can see that DiG started with the localhost nameserver (127.0.0.1), from which it obtained the list of available root nameservers:
It selected F.ROOT-SERVERS.NET and queried it. From this nameserver DiG obtained a list of TLD nameservers, this time those authoritative for the .cz domain:
From this reply DiG selected the f.ns.nic.cz nameserver and queried it. f.ns.nic.cz replied that zive.cz is managed by two nameservers:
Finally, DiG queried ns.cpress.cz and obtained the final response with the requested A record, which says that www.zive.cz corresponds to IP 184.108.40.206.
If you take a look at the DNS hierarchy schema provided earlier, you can see that the DiG result corresponds with this schema.
From the previous paragraphs you can see that the recursive query always provides a deterministic translation result, but it is expensive in DNS resources and time, because one query can generate several requests to different nameservers before the full result is available. If all computers connected to the Internet resolved every name only this way, we would quickly reach another bottleneck on the root DNS nameservers, because those servers would have to answer trillions of queries every day.
To avoid this kind of problem, the DNS system contains a caching mechanism. Any DNS resolver or DNS nameserver can implement its own DNS record cache.
When the resolver obtains the final record for a query, the record contains one special field called TTL (Time to Live). This field specifies how long the resolved record stays valid (in seconds) and is filled in by the authoritative nameserver for the record. A resolver or nameserver can store the resolved record in its own cache and return this cached record, instead of starting a new resolve process, if the same query arrives within the period specified by the TTL.
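A TTL-based cache is easy to sketch in Python. The class name DnsCache, the function cached_resolve and the injectable clock are my own design for this illustration; the resolve callback is assumed to return the resolved value together with its TTL in seconds.

```python
import time


class DnsCache:
    """Keep resolved records until their TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expiry time)

    def put(self, name, value, ttl):
        self._entries[name.lower()] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name.lower())
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            # The record outlived its TTL: forget it.
            del self._entries[name.lower()]
            return None
        return value


def cached_resolve(name, cache, resolve):
    """Use the cache first; fall back to a full resolution."""
    value = cache.get(name)
    if value is None:
        # resolve() is expected to return (value, ttl_in_seconds).
        value, ttl = resolve(name)
        cache.put(name, value, ttl)
    return value
```

Passing the clock in as a parameter makes the expiry logic easy to try out without waiting for real time to pass.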
The next advantage is that a cache can be implemented on any level where DNS requests are resolved. Very often your ISP runs a caching DNS nameserver inside its infrastructure and assigns this nameserver to your computer by DHCP, or asks you to set it manually during the network connection configuration.
If you are using a wireless router to connect your computers to the Internet at home, such routers usually implement a DNS cache too and assign their own IP as the DNS server to your computers by DHCP.
The Windows operating system, for example, has its own DNS resolver with its own DNS cache. If you want to see the content of your Windows DNS cache, run the command ipconfig /displaydns in the command line console.
To completely flush the cache content, you can use ipconfig /flushdns.
Linux distributions have the DNS cache as an optional component. Very often it is handled by the nscd daemon. To get the DNS cache working on Ubuntu, install the nscd daemon with the command sudo apt-get install nscd.
I have not found any command to dump the cache content of the nscd daemon; it is only possible to flush the nscd cache by restarting the daemon:
As you can see, caching of DNS records is widely implemented; it improves the efficiency of the DNS system considerably and ensures that DNS nameservers are not overloaded by DNS requests.
Hosts file DNS overriding
To cover the DNS resolving process completely, I must mention that the hosts file described at the beginning of the article is still in the game in modern operating systems.
On the Windows platform the file is located at %SystemRoot%\System32\drivers\etc\hosts.
On Linux you can find the file at /etc/hosts.
The file format has not changed and is the same as described before. The content of the file is used during the DNS name resolution process and takes precedence over any other resolve method, so if you specify this record here:
Then example.com will point to the same address as ubuntu.com, and if you type example.com into the web browser, you will get the Ubuntu homepage instead of the usual example.com notice.
This behavior can be very useful, for example, if you are debugging your own web site with a web server running on localhost. With the line:
You ensure that web page requests for this domain will be served by your localhost web server instead of the web hosting server for the domain.
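The precedence of the hosts file can be sketched in Python like this. The HOSTS and DNS dictionaries are hypothetical stand-ins for the local hosts file and for real DNS answers, and the function name is mine. The fixed "hosts first" order matches the description above; on some systems the order is configurable.

```python
def resolve_with_hosts(name, hosts, dns_lookup):
    """Check the hosts table first and ask DNS only when it has no entry."""
    key = name.lower()
    if key in hosts:
        return hosts[key], "hosts"
    return dns_lookup(key), "dns"


# Hypothetical data: one local override and a stand-in for real DNS.
HOSTS = {"example.com": "127.0.0.1"}
DNS = {"example.com": "188.8.131.52", "www.example.com": "188.8.131.52"}

print(resolve_with_hosts("example.com", HOSTS, DNS.get))
# ('127.0.0.1', 'hosts')
print(resolve_with_hosts("www.example.com", HOSTS, DNS.get))
# ('188.8.131.52', 'dns')
```

Note that the override applies only to the exact name listed in the hosts table; www.example.com still goes to DNS.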
I hope that you find this article useful and that the information presented here gives you a brief description of the interesting DNS technology. I compiled this article from my own knowledge and many public sources. While writing it I tried to be as accurate as possible, but if you find any piece of information here wrong or incomplete, please feel free to contact me in the comments or directly by e-mail.
Note: This article was originally published on my previous site polach.cc. Because I had to change the domain, I moved the article here from the previous site.