Thursday, July 19, 2012

Hidden Internet: The Unexplored, Hidden And Deep Web And Internet

The tussle between Anonymity and Traceability has been going on for many years. Law Enforcement Agencies are pushing for less Anonymity and greater Traceability, whereas Civil Liberty Groups and Netizens are demanding greater Anonymity and Privacy. The battle is epic and it is not going to end soon.

Anonymity has both uses and misuses. Like any legitimate Invention or Technology, the Internet can be abused for criminal activities or used for the greater benefit of the Human race. It has many benefits and is used in numerous ways, some known and others still unknown.

While the known part can be viewed and analysed through numerous methods, including search engine results, a majority of the World Wide Web (WWW) is still out of plain sight and beyond the reach of most of us. This hidden Web is known by many as the Deep Web, though I personally prefer to call it the “Hidden Internet”.

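The Hidden Internet may be residing in plain sight or it may be hidden by using special techniques and methodologies. For instance, the owner of a Website or Blog may ask search engine Crawlers not to index some or all of its pages through use of a robots.txt file. A robots.txt file is only a request to Crawlers and not an access control, so anyone who knows the address can still open such pages. Even a Blog that is genuinely restricted to its owners, for example behind a login, can be accessed through use of cracking methods or by the owner company of the concerned Blog service.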

Further, there are many Crawlers that do not comply with the settings and restrictions placed by robots.txt files. This may expose files and documents that are otherwise not intended to be disclosed. This is where Google Hacking, that is, the use of advanced search operators to locate such exposed files, comes into the picture.
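To make the point about compliance concrete, here is a minimal sketch of how a well-behaved Crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the `ExampleBot` name and the `blog.example.com` URLs are assumptions made up for illustration. Note that the check happens entirely on the Crawler's side, which is why a non-compliant Crawler can simply skip it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example blog; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler would fetch the first URL and skip the second.
public_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://blog.example.com/posts/1")
private_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://blog.example.com/private/draft")
print(public_ok, private_ok)
```

Nothing in this check prevents the second URL from being requested directly; the file only tells cooperative Crawlers what the site owner would prefer.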

By its very nature, Hidden Internet is designed to defeat indexing of its contents by search engines. Its contents are visible and accessible only to a select few who not only know that such contents exist but also have the means and methods to access them.

Hidden Internet is different from Dark Internet. In the case of the former, the Computers storing and processing the contents are still accessible, though to a select few alone. Dark Internet, on the other hand, is a group of Computers that are simply out of the Internet and cannot be accessed at all.

According to an estimate based upon a study by the University of California, Berkeley in the year 2001, Hidden Internet consists of about 7,500 terabytes of information. Another study in 2004 indicated that there are around 300,000 deep web sites in the entire Hidden Internet, and around 14,000 deep web sites existed in the Russian part of the Web in 2006. These estimates suggest that Hidden Internet is much bigger and carries more information than our present accessible Internet.

The contents and information stored in the Hidden Internet can be found in the form of Dynamic Contents, Unlinked Contents, Private Web, Contextual Web, Limited Access Contents, Scripted Contents, Non-HTML/Text Contents, etc. These contents and information are not available to normal search engines for indexing. Search engines are now planning to tackle this issue and are devising methods to access contents and information residing in the Hidden Internet.

In fact, some search engines have been specifically designed to access contents of Hidden Internet. However, search engines and Law Enforcement Agencies around the World still have a long road to cover in tackling the vices of Hidden Internet. Efforts to make the entire search process “Automatic” are going on at the global level.

The more difficult challenge is to categorise and map the information extracted from multiple Hidden Internet sources according to end-user needs. Hidden Internet search reports cannot display URLs like traditional search reports. End users expect their search tools not only to find what they are looking for quickly, but also to be intuitive and user-friendly. In order to be meaningful, the search reports have to offer some depth about the nature of the content that underlies the sources, or else the end-user will be lost in a sea of URLs that do not indicate what content lies beneath them.

The format in which search results are to be presented varies widely by the particular topic of the search and the type of content being exposed. The challenge is to find and map similar data elements from multiple disparate sources so that search results may be exposed in a unified format on the search report irrespective of their source.
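The mapping step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the source names (`library_catalog`, `patent_db`), their field names and the sample records are all hypothetical, and a real system would need far richer schemas. The idea is simply that each source declares how its own fields correspond to a shared result format, so that the search report can show every hit the same way.

```python
from dataclasses import dataclass


@dataclass
class UnifiedResult:
    """The single format shown on the search report, whatever the source."""
    title: str
    summary: str
    source: str


# Hypothetical per-source mappings: unified field -> field name used by that source.
FIELD_MAPS = {
    "library_catalog": {"title": "book_title", "summary": "abstract"},
    "patent_db": {"title": "invention_name", "summary": "description"},
}


def normalize(source: str, record: dict) -> UnifiedResult:
    """Map one raw record from a known source into the unified format."""
    mapping = FIELD_MAPS[source]
    return UnifiedResult(
        title=record.get(mapping["title"], ""),
        summary=record.get(mapping["summary"], ""),
        source=source,
    )


# Made-up sample records standing in for results from two disparate sources.
results = [
    normalize("library_catalog", {"book_title": "Deep Web Search", "abstract": "A survey."}),
    normalize("patent_db", {"invention_name": "Query Router", "description": "Routes queries."}),
]
for result in results:
    print(f"[{result.source}] {result.title}: {result.summary}")
```

Keeping the source name on every unified record lets the report say where a result came from even when no meaningful URL can be displayed.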

I will try to cover the Security, Forensics and Law Enforcement Issues of Hidden Internet in my subsequent posts. This post is intended only to provide basic background on Hidden Internet for those subsequent posts and nothing more.
