The tussle between anonymity and traceability has been going on for many years. Law enforcement agencies are pushing for less anonymity and greater traceability, whereas civil liberty groups and netizens are demanding greater anonymity and privacy. The battle is epic and it is not going to end soon.
Anonymity has both uses and misuses. Like any legitimate invention or technology, the Internet can be abused for criminal activities or used for the greater benefit of the human race. The Internet is put to numerous uses, some well known and others still unknown.
While the known part can be viewed and analysed through numerous methods, including search engine results, a majority of the World Wide Web (WWW) is still out of plain sight and beyond the reach of most of us. This hidden Web is known to many as the Deep Web, though I personally prefer to call it the “Hidden Internet”.
The Hidden Internet may reside in plain sight, or it may be hidden by special techniques and methodologies. For instance, the owner of a website or blog may try to keep it out of search results by disallowing crawlers in a robots.txt file, or may restrict access to invited readers alone. However, even such a restricted blog can still be reached through cracking methods or by the company hosting it.
Further, there are many crawlers that do not comply with the restrictions placed by robots.txt files. This may expose files and documents that were never intended to be disclosed. This is where Google Hacking comes into the picture.
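As a purely illustrative example, Google Hacking relies on advanced search operators such as site:, filetype: and intitle: to surface material that was never meant to be found; the domain and keywords below are placeholders, not real targets.

    site:example.com filetype:pdf "confidential"
    intitle:"index of" "parent directory" site:example.com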
By its very nature, the Hidden Internet is designed to defeat indexing of its contents by search engines. Its contents are visible and accessible only to a select few who not only know that such contents exist but also have the means and methods to access them.
The Hidden Internet is different from the Dark Internet: in the case of the former, the computers storing and processing the contents are still reachable, though only by a select few. The Dark Internet, on the other hand, consists of computers that are simply cut off from the Internet and cannot be accessed at all.
According to an estimate based upon a 2001 study by the University of California, Berkeley, the Hidden Internet consists of about 7,500 terabytes of information. Another study, in 2004, indicated that there were around 300,000 deep web sites in the entire Hidden Internet, and around 14,000 deep web sites existed in the Russian part of the Web in 2006. Thus, the Hidden Internet is much bigger and carries far more information than our presently accessible Internet.
The contents and information stored in the Hidden Internet take the form of dynamic content, unlinked content, the private Web, the contextual Web, limited-access content, scripted content, non-HTML/text content, and so on. Such content is not available to normal search engines for indexing. Search engines are now planning to tackle this issue and are devising methods to access content and information residing in the Hidden Internet.
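As an illustration of why dynamic content stays hidden, consider a database that is reachable only by submitting a search form. The Python sketch below is hypothetical (the endpoint and form fields are invented); a crawler that merely follows links never issues such a POST request, so the resulting pages are never indexed.

    # Minimal sketch: "dynamic content" that exists only as a response to a
    # form submission. The endpoint and field names are hypothetical.
    import requests

    response = requests.post(
        "https://example.com/search",               # hypothetical form endpoint
        data={"query": "case law 2001", "page": 1},
    )
    # The HTML returned here is generated on the fly from a database;
    # there is no static URL for a link-following crawler to discover.
    print(response.status_code, len(response.text))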
In fact, some search engines have been specifically designed to access the contents of the Hidden Internet. However, search engines and law enforcement agencies around the world still have a long road to travel in tackling the vices of the Hidden Internet. Efforts to make the entire search process “automatic” are going on at a global level.
The more difficult challenge is to categorise and map the information extracted from multiple Hidden Internet sources according to end-user needs. Hidden Internet search reports cannot display URLs in the way traditional search reports do. End users expect their search tools not only to find what they are looking for quickly, but to be intuitive and user-friendly. To be meaningful, the search reports have to convey something about the nature of the content that underlies each source, or else the end user will be lost in a sea of URLs that do not indicate what content lies beneath them.
The format in which search results are presented varies widely with the particular topic of the search and the type of content being exposed. The challenge is to find and map similar data elements from multiple disparate sources so that search results can be presented in a unified format on the search report, irrespective of their source.
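As a toy sketch of what such mapping might look like, the Python snippet below normalises records from two invented sources onto one unified result schema (title, summary, source) before they are shown on a search report; every field name here is an assumption for illustration only.

    # Toy sketch: mapping records from two hypothetical Hidden Internet
    # sources onto one unified result schema.
    source_a = [{"doc_title": "Annual Report", "abstract": "Financial details..."}]
    source_b = [{"name": "Court Order", "body_text": "The petitioner submits..."}]

    def unify(record, field_map, source_name):
        """Rename source-specific fields to the unified field names."""
        return {
            "title": record[field_map["title"]],
            "summary": record[field_map["summary"]][:80],
            "source": source_name,
        }

    results = (
        [unify(r, {"title": "doc_title", "summary": "abstract"}, "Source A") for r in source_a]
        + [unify(r, {"title": "name", "summary": "body_text"}, "Source B") for r in source_b]
    )

    for item in results:
        print(item["source"], "-", item["title"], ":", item["summary"])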
I will try to cover the security, forensics and law enforcement issues of the Hidden Internet in my subsequent posts. This post is intended only to provide basic information about the Hidden Internet as a foundation for those subsequent posts, and nothing more.