Scope
This paper addresses the type of information that can be
easily collected by a website, methods of
translating an IP address to a personal or corporate identity, and
possible ramifications
of the collection of this information.
What is an IP address?
Every computer connected to the Internet is identified by a unique number
called an Internet Protocol (IP) address. An individual computer is assigned an
IP address by an internet service provider (ISP).
When a computer sends data to another computer it addresses
the data with the IP address of the destination and includes
its own IP address as a return address. So whenever you
visit a webpage, send an email, or do anything on the Internet,
you are telling another computer your IP address.
Your IP address, at the time you accessed this web page,
was 38.107.191.102.
The web server hosting
this page was able to determine your IP address when you visited the
page.
For one device
to send data to and then receive data from another device, such as sending a
search query to Google and then receiving the results in your web browser,
the sender must know the IP address of the receiver, and the receiver must
know the IP address of the sender to send data back to it.
This is just
like sending a letter in the mail to a destination and writing the return
address on the envelope so the receiver knows where to send his response.
IP addresses are assigned in a hierarchical structure that facilitates
routing of information throughout the Internet. As a result,
IP addresses that are numerically close to each other tend to be assigned
to computers that are geographically close to each other.
Therefore, it is typically possible to determine the general
geographic
location of a user from an IP address.
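The idea that numerically close addresses tend to share a network can be sketched with Python's standard ipaddress module. The addresses below come from ranges reserved for documentation, and the 24-bit prefix length is an assumption chosen for illustration; real allocations vary in size.

```python
import ipaddress

def same_network(addr_a: str, addr_b: str, prefix_len: int = 24) -> bool:
    """Return True if both addresses fall in the same network of the given prefix length."""
    # strict=False lets us build the network from a host address
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network

# 192.0.2.0/24 and 198.51.100.0/24 are reserved documentation ranges
print(same_network("192.0.2.10", "192.0.2.200"))   # numerically close addresses
print(same_network("192.0.2.10", "198.51.100.7"))  # numerically distant addresses
```

Sharing a network prefix only suggests a common administrator; turning that into a geographic location requires a separate mapping from networks to places, which this sketch does not attempt.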
What is personally identifiable information?
Personally identifiable information is information that can
be used to determine the identity of a particular individual.
The
Children's Online Privacy Protection Act (15 U.S.C. 6501) includes in
its definition of "personal information"
individually identifiable information about an individual
collected online, including -
(A) a first and last name;
(B) a home or other physical address including street name
and name of a city or town;
(C) an e-mail address;
(D) a telephone number;
(E) a Social Security number;
(F) any other identifier that the [Federal Trade]
Commission determines permits the physical or online
contacting of a specific individual;
The last part of this definition (F) addresses privacy from the
standpoint of the right to be left alone.
The proposed Online Privacy Protection Act of 2003
(HR 69 of the 108th Congress) seeks to add:
(G) information that is maintained with, or can be searched or
retrieved by means of, data described in subparagraphs (A) through (F).
to the definitions listed above, in a bill to protect those not protected
by the Children's Online Privacy Protection Act.
This addition would treat numbers (such as IP addresses)
as personally identifiable information whenever they are associated with
any of the other items in the definition.
Information Collected
When a user visits a website, the operator can collect not only the
user's activities, including the time of access and all pages viewed, but
also information about software installed on the user's computer.
When an IP address is collected along with this information,
the web surfing habits of companies and individuals can be tracked.
Simply by viewing a webpage with a web browser,
the following information about the visiting user and his computer
can be collected:
- IP Address
- Web pages accessed
- Time of access
- Any form information submitted including search queries
or personal information
- Web browser software
- Operating system
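Most of the items above typically end up in a web server's access log. As a minimal sketch, the following parses one log line, assuming the widely used "combined" log layout. The sample line, its address (from a documentation range), and the field names are made up for illustration.

```python
import re
from typing import Optional

# Pattern for the "combined" access-log layout (an assumption; servers can be configured differently)
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_log_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Hypothetical log line, invented for this example
SAMPLE_LINE = (
    '192.0.2.10 - - [26/Oct/2004:13:55:36 -0400] '
    '"GET /search?q=privacy HTTP/1.1" 200 2326 '
    '"http://www.example.com/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)

record = parse_log_line(SAMPLE_LINE)
print(record["ip"], record["time"], record["path"], record["user_agent"])
```

The user agent string is where the browser and operating system are reported, and the requested path carries the search query.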
Turning an IP address into a name
Corporate identity
The identity of the administrator of an IP address can be easily found
by typing the IP address into a
WHOIS query tool. The
administrator of an IP address is typically an organization that
assigns IP addresses to individual computers such as an internet
service provider (ISP), university, or company.
Try a WHOIS
lookup with your IP address.
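A WHOIS lookup can also be scripted. The sketch below follows the WHOIS protocol (a query line sent over TCP port 43, then reading until the server closes the connection). The default server, the helper names, and the sample reply are assumptions for illustration; registries differ in which server is authoritative and in how replies are formatted.

```python
import socket

def whois_query(query: str, server: str = "whois.arin.net", port: int = 43) -> str:
    """Send one WHOIS query and return the reply text (needs network access)."""
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_fields(response: str) -> dict:
    """Collect 'Key: value' lines, keeping the first value seen for each key."""
    fields = {}
    for line in response.splitlines():
        if line.startswith(("#", "%")) or ":" not in line:
            continue  # skip comments and free text
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), value.strip())
    return fields

# Hypothetical reply text, invented for this example
SAMPLE_REPLY = """\
# comment line
NetRange:       192.0.2.0 - 192.0.2.255
OrgName:        Example Organization
"""
print(parse_fields(SAMPLE_REPLY)["OrgName"])
```

With network access, parse_fields(whois_query("192.0.2.10")) would perform a live lookup; the field names in a real reply depend on the registry that answers.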
Individual identity
ISPs
IP addresses are allocated to computers by ISPs. Therefore, an
ISP has the ability to track customers by associating
IP addresses with login names or other personally
identifying information. Doing so allows ISPs to charge users
based on connectivity and enforce terms of service. It is
reasonable to assume that an ISP will always know all IP addresses
in use and which customer is using which IP address.
Website operators
A website operator can store and associate IP addresses with all data
and pages a user views. Therefore, if a user ever submits personal
information through a web form, the operator can correlate the personal
information collected from an IP address with all other data associated
with that IP address including pages viewed, what information the user
viewed from the website, and probable geographic location of the user.
Some ISPs always assign the same IP address to the same computer. In
this situation the computer has a "static IP address." Alternatively,
a computer can be assigned a "dynamic IP address" meaning the computer
does not receive the same IP address every time it connects to the
Internet.
Users with static IP addresses can be tracked over long periods of
time, so if website operators traded information with each other,
correlations based on static IP address would be even more accurate
than correlating by name because IP addresses used on the Internet
are globally unique identifiers.
No two computers on the Internet can
have the same IP address at the same time, whereas many different people
may share the same name.
If a user has a dynamic IP address, only information collected
within a short time period (minutes to hours depending on user's and
ISP's behavior) can be correlated to track a single user. If information
over days or weeks is correlated using a dynamic IP address, the
information may represent several users rather than one distinct
user.
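The difference between static and dynamic addresses can be sketched as a grouping rule: records from the same IP address are attributed to one user only while they stay close together in time. The record layout, the function name, and the one-hour gap below are assumptions for illustration, not values taken from any real system.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def group_sessions(records, max_gap=timedelta(hours=1)):
    """Group (ip, timestamp, page) records into per-IP sessions.

    A new session starts when the gap since the previous record
    from the same IP exceeds max_gap.
    """
    by_ip = defaultdict(list)
    for ip, ts, page in sorted(records, key=lambda r: (r[0], r[1])):
        sessions = by_ip[ip]
        # sessions[-1][-1][0] is the timestamp of the latest record in the latest session
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((ts, page))
        else:
            sessions.append([(ts, page)])
    return dict(by_ip)

# Hypothetical records using documentation-range addresses
records = [
    ("192.0.2.10", datetime(2004, 10, 26, 9, 0), "/home"),
    ("192.0.2.10", datetime(2004, 10, 26, 9, 20), "/search"),
    ("192.0.2.10", datetime(2004, 10, 29, 14, 0), "/home"),
    ("198.51.100.7", datetime(2004, 10, 26, 9, 5), "/about"),
]
sessions = group_sessions(records)
print(len(sessions["192.0.2.10"]), len(sessions["198.51.100.7"]))
```

For a static address, the gap could be made arbitrarily large, so that all records from the address collapse into a single long-term profile.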
Email
When a user sends an email using Outlook, Eudora, or some other
email client, the IP address of the user is typically
embedded in the header of the email. So any person who receives
an email from another can learn the sender's IP address. An email
address is, in laws and many privacy policies, considered personally
identifiable or identifying information because it provides a
mechanism to directly contact a particular person. Since an
IP address becomes associated with an email address when a user
sends an email from his computer, the IP address also becomes
personally identifiable or identifying information.
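As a rough sketch of how a recipient could read these addresses, the following uses Python's standard email package to pull bracketed IPv4 addresses out of the Received headers. The sample message, host names, and addresses are invented for illustration, and real headers vary in format from one mail server to another.

```python
import re
from email import message_from_string

# Mail servers commonly write the connecting address in square brackets
IPV4_PATTERN = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_ips(raw_message: str) -> list:
    """Return bracketed IPv4 addresses from Received headers, top to bottom."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(IPV4_PATTERN.findall(header))
    return ips

# Hypothetical message using documentation-range addresses
SAMPLE = (
    "Received: from mail.example.org (mail.example.org [198.51.100.7])\n"
    "\tby mx.example.com; Tue, 26 Oct 2004 10:00:05 -0400\n"
    "Received: from desktop (unknown [192.0.2.10])\n"
    "\tby mail.example.org; Tue, 26 Oct 2004 10:00:01 -0400\n"
    "From: alice@example.org\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "Body text.\n"
)
print(received_ips(SAMPLE))
```

Each server prepends its own Received header, so the last entry in the list is usually the hop closest to the sender. Headers added before the message reached a trusted server can be forged, so the result should be read with that in mind.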
Public postings
Some web applications record and publicly post the IP addresses of users.
A collaborative application called a Swiki used by the College of Computing
at the Georgia Institute of Technology posts
host names of
participants who make modifications to content. These
host names can be resolved to IP addresses with a
DNS lookup
utility. If a student posts his name on the Swiki, the IP address
can be correlated with the change that resulted in adding his name.
Therefore, the pages serve as something similar to a phonebook and give
anyone with access to the Internet the ability to associate a student's name
with his IP address. Once the name is determined, the
online directory of
the Institute can provide both email and mailing addresses of the student.
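The DNS lookup step mentioned above can be sketched with Python's standard socket module. The function name is hypothetical, and the call depends on whatever resolver the local machine is configured to use.

```python
import socket

def resolve_host(host_name: str) -> list:
    """Return the sorted, de-duplicated IPv4 addresses a host name resolves to."""
    results = socket.getaddrinfo(
        host_name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr is (address, port)
    return sorted({sockaddr[0] for *_, sockaddr in results})

# A literal address is returned as-is, so this line runs without a network lookup
print(resolve_host("127.0.0.1"))
```

Passing a host name taken from a public posting instead of the literal address would return the IP address or addresses currently registered for that name.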
Ramifications
A user's online history and behavior, including pages visited, searches
performed, and posts made to message boards, can be tracked by IP address.
Recently World Market Watch, Inc. published
a database
of web surfing trends of 200,000 organizations including
companies, universities, government and other organizations.
The report contains statistics on web browsers,
operating systems, and search engines used as well as activity and
visiting habits of organizations.
Remember, using only a
WHOIS query tool,
IP addresses collected could be correlated to company names to generate
such a database.
Some information collected in this manner can be mildly embarrassing.
For example, according to the World Market Watch report, employees of
Microsoft used the Google search engine
three times more often than their own MSN search engine.
Browser type and operating system information collected can arm
malicious hackers with information that allows them to more
efficiently exploit security vulnerabilities and direct attacks
at a particular company.
Even if one website operator does not know the name or have personal
information associated with an IP address, trading information with
other website operators could yield such information and allow correlation
with email address, name, phone number and/or mailing address.
ISP logs of IP addresses assigned to customers made it possible for
the RIAA to
sue
file-swappers for copyright infringement.
Since an IP address
is used to contact a computer on the Internet, the IP address alone
is all that is necessary to launch an attack on a user's computer.
Current state
To assess some current opinions of the IP address as personally identifiable
information, I explored the privacy policies of various websites that
specifically commented on the IP address as personally identifiable information.
AltaVista
Description: Search engine
URL: http://www.altavista.com
Privacy Policy
The policy states that "web server logs automatically receive and record
anonymous information from your web browser including your IP Address."
However, AltaVista later defines "anonymous information" as
"information that ordinarily cannot be traced back to a particular person."
Google
Description: Search engine
URL: http://www.google.com
Privacy Policy
Google's privacy policy states that Google "collects limited non-personally
identifying information your browser makes available whenever you visit a
website. This log information includes your Internet Protocol address."
The key word in this description is "identifying." In an
amicus brief in a case that centered on whether an IP address
is personally identifiable information,
the Electronic Frontier Foundation
notes:
"Identifying" is the inflected form of "identify" which means "to recognize
as being or show to be the very person or thing known, described or claimed;
fix the identity of." Webster's New World Dictionary 696 (2nd college ed.
1986). "Identifiable," on the other hand, means "subject to identification;
capable of being identified". Webster's Third New Int'l Dict. 1123 (1986).
Therefore, "identifiable" is a lesser standard that may lead to
"identifying" but may not be an identifier by itself.
Slashdot
Description: News and forum
URL: http://www.slashdot.org
According to a statement by Rob Malda, creator of Slashdot, on 26 October 2004 at the
Georgia Institute of Technology, Slashdot does not store IP addresses
in plain text but encrypts them to increase confidentiality of users.
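The statement does not say how the addresses are protected. As a loosely related sketch, and not a description of Slashdot's actual method, one common way to avoid storing plain-text addresses is a keyed one-way hash (HMAC) rather than reversible encryption. The function name and the key below are hypothetical.

```python
import hashlib
import hmac

def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the address under secret_key."""
    return hmac.new(secret_key, ip.encode("ascii"), hashlib.sha256).hexdigest()

# Placeholder key for illustration; a real key would be random and kept private
key = b"hypothetical-secret-key"

token_a = pseudonymize_ip("192.0.2.10", key)
token_b = pseudonymize_ip("192.0.2.10", key)
token_c = pseudonymize_ip("192.0.2.11", key)
print(token_a == token_b, token_a == token_c)
```

The same address always maps to the same token, so repeat visitors can still be recognized. The key matters because the IPv4 address space is small enough that an unkeyed hash could be reversed by hashing every possible address.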
The Weather Channel
Description: Weather information
URL: http://www.weather.com
Privacy Policy
The privacy policy of The Weather Channel defines IP addresses as
"random numbers assigned to individual computers which The Weather Channel uses to administer our Web site and help diagnose problems with our servers."
This definition of IP address is inaccurate and is misleading about
the nature of the IP address as personally identifiable information because
IP addresses are not random but assigned in an ordered way
that can identify a person using the address.
It is also interesting to note that weather.com includes a certificate
from TRUSTe, an organization that
seeks to enable
"individuals and organizations to establish trusting relationships based on
respect for personal identity and information in the evolving networked
world" (http://www.truste.com/about/mission_statement.php).
In an amicus brief filed by TRUSTe in Klimas v. Comcast Corporation,
TRUSTe states that
"most Internet users do not even know what their IP address is" and that
if IP addresses were considered "personally identifiable information" then
"[t]his change would not be limited to IP addresses but would also include
any other anonymous identifiers that were 'capable' of being traced back to
the user no matter how accurate or difficult that process may be."
TRUSTe's argument for not recognizing IP addresses as
personally identifiable information rests on the typical Internet user's
ability to identify someone based on IP address. In the world of technology,
this kind of argument can quickly disintegrate with the spread
of information, the collection of large databases, and easy-to-use software.
The argument also raises the question of at what point, and at what
resolution, the information or ability to identify becomes ubiquitous
enough to create an uncomfortable invasion of privacy.
Appalachian State University Office of Cultural Affairs
Description: University office
URL: http://www.oca.appstate.edu/
Privacy Policy
This privacy policy recognizes the ability for an IP address to be
personally identifiable by stating:
IP addresses are automatically logged by our server software. IP addresses may or may not be considered Personally Identifiable Information, depending on the practices of the user's ISP.
Conclusion
An IP address is a unique identifier that permits anyone who knows
it to contact a person's computer, identify the general geographic
location of a user, and track their online habits.
The only question that remains is the ease of correlating an IP address
with a person's unique identity.
For ISPs it is a simple task to determine the identity of a customer
based on his IP address. For entities other than ISPs, it depends on
the behavior of the person and his ISP. For users with static IPs,
correlation can be very easy because the user can be tracked over long
periods of time. Every email the user sends, every form filled out,
and every website visited announces and spreads an element
of his identity: his IP address.
A fundamental notion of privacy is the right to be left alone.
Since only an IP address is needed to contact and attack a person's
computer, knowledge of a person's IP address can infringe on this
right.
Many privacy policies do not recognize IP addresses as personally
identifiable information. This is misleading, especially for
users not familiar with the basics of Internet architecture.
Because IP addresses can be telling of physical location and
identity for many Internet users, both users and website
privacy policies should recognize IP addresses as personally
identifiable information.
Notwithstanding any language to the contrary, nothing contained
herein constitutes nor is intended to constitute an offer, inducement,
promise, or contract of any kind. The data contained herein is for
informational purposes only and is not represented to be error free.
Any links to non-Georgia Tech information are provided as a courtesy.
They are not intended to nor do they constitute an endorsement by the
author.