IP Address as a personal identifier
By: Jason Hurley
1 January 2005
 

Scope

This paper addresses the type of information that can be easily collected by a website, methods of translating an IP address to a personal or corporate identity, and possible ramifications of the collection of this information.

What is an IP address?

Every computer connected to the Internet is identified by a unique number called an Internet Protocol (IP) address. An individual computer is assigned an IP address by an internet service provider (ISP). When a computer sends data to another computer it addresses the data with the IP address of the destination and includes its own IP address as a return address. So whenever you visit a webpage, send an email, or do anything on the Internet, you are telling another computer your IP address.

Your IP address, at the time you accessed this web page, was 38.107.191.81. The web server hosting this page was able to determine your IP address when you visited the page.

For one device to send data to another and then receive data back, such as sending a search query to Google and then receiving the results in your web browser, the sender must know the IP address of the receiver, and the receiver must know the IP address of the sender in order to reply.

This is just like sending a letter in the mail to a destination and writing the return address on the envelope so the receiver knows where to send his response.

IP addresses are assigned in a hierarchical structure that facilitates routing of information throughout the Internet. As a result, IP addresses that are numerically close to each other tend to be assigned to computers that are geographically close to each other. Therefore, it is typically possible to determine the general geographic location of a user from an IP address.
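
As a rough illustration of "numerically close," the following Python sketch counts how many leading bits two IPv4 addresses share. The function name and the second and third addresses are made up for this example; only 38.107.191.81 appears in this paper. A long shared prefix suggests, but does not prove, that two addresses come from the same allocation block, and it says nothing certain about geography.

```python
import ipaddress

def shared_prefix_bits(a: str, b: str) -> int:
    """Return the number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a))
    y = int(ipaddress.IPv4Address(b))
    # XOR leaves a 1 wherever the two addresses differ; the position of the
    # highest 1 bit marks where the common prefix ends.
    return 32 - (x ^ y).bit_length()

if __name__ == "__main__":
    # Illustrative addresses only (hypothetical neighbors of the address above).
    print(shared_prefix_bits("38.107.191.81", "38.107.191.200"))
    print(shared_prefix_bits("38.107.191.81", "203.0.113.7"))
```

With these inputs the first pair shares 24 leading bits and the second pair shares none.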

What is personally identifiable information?

Personally identifiable information is information that can be used to determine the identity of a particular individual.

The Children's Online Privacy Protection Act (15 U.S.C. 6501) includes in its definition of "personal information":
individually identifiable information about an individual collected online, including -

(A) a first and last name;
(B) a home or other physical address including street name and name of a city or town;
(C) an e-mail address;
(D) a telephone number;
(E) a Social Security number;
(F) any other identifier that the [Federal Trade] Commission determines permits the physical or online contacting of a specific individual;
The last part of this definition (F) addresses privacy from the standpoint of the right to be left alone.

The proposed Online Privacy Protection Act of 2003 (HR 69 of the 108th Congress) seeks to add:
(G) information that is maintained with, or can be searched or retrieved by means of, data described in subparagraphs (A) through (F).
to the definitions listed above, in a bill to protect those not protected by the Children's Online Privacy Protection Act. This addition would treat numbers such as IP addresses as personally identifiable information any time they are associated with any of the other listed items.

Information Collected

When a user visits a website, the operator can collect not only the user's activity, including the time of access and every page viewed, but also information about software installed on the user's computer. When an IP address is collected along with this information, the web surfing habits of companies and individuals can be tracked.

Simply by viewing a webpage with a web browser, the following information about the visiting user and his computer can be collected:
  1. IP Address
  2. Web pages accessed
  3. Time of access
  4. Any form information submitted including search queries or personal information
  5. Web browser software
  6. Operating system
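
Many web servers write these items to an access log. The sketch below assumes the common Apache-style "combined" log layout and shows how one line yields the items listed above. The sample line, the `parse_line` name, and the addresses in it are invented for illustration; real servers may be configured to log different fields.

```python
import re
from datetime import datetime

# Pattern for one line in the assumed "combined" access log layout.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> dict:
    """Split one log line into IP, time, requested path, and user agent."""
    m = COMBINED.match(line)
    if m is None:
        raise ValueError("line is not in the assumed combined log format")
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

# Hypothetical log line: the query string carries a search term, and the
# user-agent string names the browser and operating system.
SAMPLE = ('192.0.2.10 - - [01/Jan/2005:12:00:00 -0500] '
          '"GET /search?q=privacy HTTP/1.1" 200 5120 '
          '"http://example.com/" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')

if __name__ == "__main__":
    record = parse_line(SAMPLE)
    print(record["ip"], record["time"], record["path"], record["agent"])
```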

Turning an IP address into a name

Corporate identity

The identity of the administrator of an IP address can be easily found by typing the IP address into a WHOIS query tool. The administrator of an IP address is typically an organization that assigns IP addresses to individual computers such as an internet service provider (ISP), university, or company.

Try a WHOIS lookup with your IP address.
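
A WHOIS lookup is a simple exchange: the client opens a TCP connection to port 43, sends the query followed by a line break, and reads text until the server closes the connection. The sketch below assumes ARIN's server (`whois.arin.net`), which covers North American allocations; other regions use other registries. The field names passed to `extract_fields` are assumptions based on typical ARIN output and will differ elsewhere.

```python
import socket

def whois_query(query: str, server: str = "whois.arin.net",
                port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the server's text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def extract_fields(reply: str, names=("OrgName", "NetRange", "CIDR")) -> dict:
    """Pick out selected 'Key: value' lines, skipping comment lines."""
    found = {}
    for line in reply.splitlines():
        if line.startswith(("#", "%")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key in names and key not in found:
            found[key] = value.strip()
    return found

if __name__ == "__main__":
    # Requires network access; the address is the one quoted in this paper.
    print(extract_fields(whois_query("38.107.191.81")))
```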

Individual identity

ISPs

ISPs allocate IP addresses to computers; an ISP can therefore track customers by associating IP addresses with login names or other personally identifying information. Doing so allows ISPs to charge users based on connectivity and to enforce terms of service. It is reasonable to assume that an ISP always knows which IP addresses are in use and which customer is using each one.

Website operators

A website operator can store IP addresses and associate them with all data a user submits and all pages a user views. Therefore, if a user ever submits personal information through a web form, the operator can correlate that personal information with all other data associated with the same IP address, including pages viewed and the probable geographic location of the user.

Some ISPs always assign the same IP address to the same computer. In this situation the computer has a "static IP address." Alternatively, a computer can be assigned a "dynamic IP address" meaning the computer does not receive the same IP address every time it connects to the Internet.

Users with static IP addresses can be tracked over long periods of time. If website operators traded information with each other, correlations based on a static IP address could be even more accurate than correlations by name, because IP addresses used on the Internet are globally unique identifiers: no two computers on the Internet can have the same IP address at the same time, whereas many different people may share the same name.

If a user has a dynamic IP address, only information collected within a short time period (minutes to hours depending on user's and ISP's behavior) can be correlated to track a single user. If information over days or weeks is correlated using a dynamic IP address, the information may represent several users rather than one distinct user.
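
One way to respect this limit when correlating records is to group events by IP address but start a new group whenever the same address reappears after a long gap. The sketch below is a minimal illustration; the `group_by_ip` name, the event tuples, and the 30-minute threshold are all assumptions chosen for the example, not values taken from any real system.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def group_by_ip(events, max_gap=timedelta(minutes=30)):
    """Group (ip, time, page) events into runs per IP with gaps <= max_gap.

    A new run starts whenever the same IP reappears after a longer gap,
    since a dynamic address may by then belong to a different user.
    """
    runs = defaultdict(list)  # ip -> list of runs, each run a list of events
    for ip, when, page in sorted(events, key=lambda e: (e[0], e[1])):
        ip_runs = runs[ip]
        if ip_runs and when - ip_runs[-1][-1][1] <= max_gap:
            ip_runs[-1].append((ip, when, page))
        else:
            ip_runs.append([(ip, when, page)])
    return dict(runs)

if __name__ == "__main__":
    start = datetime(2005, 1, 1, 12, 0)
    sample = [
        ("192.0.2.10", start, "/a"),
        ("192.0.2.10", start + timedelta(minutes=5), "/b"),
        ("192.0.2.10", start + timedelta(days=3), "/c"),  # possibly another user
    ]
    for ip, ip_runs in group_by_ip(sample).items():
        print(ip, [[page for _, _, page in run] for run in ip_runs])
```

For a static address the threshold could be raised or dropped entirely, which is exactly why static addresses allow tracking over long periods.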

Email

When a user sends an email using Outlook, Eudora, or some other email client, the IP address of the user is embedded in the header of the email, so anyone who receives the email can learn the sender's IP address. Laws and many privacy policies consider an email address to be personally identifiable or identifying information because it provides a mechanism to directly contact a particular person. Since an IP address becomes associated with an email address when a user sends an email from his computer, the IP address also becomes personally identifiable or identifying information.
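
The addresses in question typically appear in the "Received" headers that each mail relay adds. The following sketch uses Python's standard `email` module to pull bracketed IPv4 addresses out of those headers. The sample message, host names, and addresses are fabricated for illustration, and the bracket convention is an assumption about how relays commonly format the header; headers can also be forged, so the result is a hint rather than proof.

```python
import re
from email import message_from_string

# Relays commonly write the peer address in square brackets.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_ips(raw_message: str) -> list:
    """Return IPv4 addresses found in Received headers, earliest hop first."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("Received") or []
    ips = []
    # Each relay prepends its own header, so the last one is the earliest hop.
    for header in reversed(headers):
        ips.extend(IPV4_IN_BRACKETS.findall(header))
    return ips

# Hypothetical message using reserved documentation addresses.
SAMPLE = """\
Received: from mail.example.net (mail.example.net [198.51.100.25])
    by mx.example.org with ESMTP; Sat, 1 Jan 2005 12:00:05 -0500
Received: from [192.0.2.10] (helo=desktop)
    by mail.example.net with ESMTP; Sat, 1 Jan 2005 12:00:01 -0500
From: alice@example.com
To: bob@example.org
Subject: hello

Hi Bob
"""

if __name__ == "__main__":
    print(received_ips(SAMPLE))
```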

Public postings

Some web applications record and publicly post the IP addresses of users. A collaborative application called a Swiki, used by the College of Computing at the Georgia Institute of Technology, posts the host names of participants who make modifications to content. These host names can be resolved to IP addresses with a DNS lookup utility. If a student posts his name on the Swiki, the IP address can be correlated with the change that added his name. The pages therefore serve as something similar to a phonebook and give anyone with access to the Internet the ability to associate a student's name with his IP address. Once the name is determined, the online directory of the Institute can provide both the email and mailing addresses of the student.
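
Resolving a host name to its addresses takes only a few lines with the system resolver. This is a minimal sketch; the `resolve_all` name is made up here, and the host name in the usage line is a placeholder rather than a real Swiki participant.

```python
import socket

def resolve_all(hostname: str) -> list:
    """Return the sorted IPv4 addresses the system resolver gives for a name."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry ends with a (address, port) pair; keep the unique addresses.
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # Placeholder host name; requires DNS access to resolve.
    print(resolve_all("www.example.com"))
```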

Ramifications

A user's online history and behavior, including pages visited, searches performed, and posts made to message boards, can be tracked by IP address.

Recently World Market Watch, Inc. published a database of web surfing trends of 200,000 organizations including companies, universities, government and other organizations. The report contains statistics on web browsers, operating systems, and search engines used as well as activity and visiting habits of organizations. Remember, using only a WHOIS query tool, IP addresses collected could be correlated to company names to generate such a database.

Some information collected in this manner can be mildly embarrassing. For example, according to the Market Watch report, employees of Microsoft used the Google search engine three times more often than their own MSN search engine. Browser type and operating system information can also arm malicious hackers with details that allow them to exploit security vulnerabilities more efficiently and direct attacks at a particular company.

Even if one website operator does not know the name or have personal information associated with an IP address, trading information with other website operators could yield such information and allow correlation with email address, name, phone number and/or mailing address.

ISP logs of IP addresses assigned to customers made it possible for the RIAA to sue file-swappers for copyright infringement.

Since an IP address is used to contact a computer on the Internet, the IP address alone is all that is necessary to launch an attack on a user's computer.

Current state

To assess some current opinions of the IP address as personally identifiable information, I explored the privacy policies of various websites that specifically commented on the IP address as personally identifiable information.

AltaVista

Description: Search engine
URL: http://www.altavista.com
Privacy Policy

The policy states that "web server logs automatically receive and record anonymous information from your web browser including your IP Address." However, AltaVista later defines "anonymous information" as "information that ordinarily cannot be traced back to a particular person."

Google

Description: Search engine
URL: http://www.google.com
Privacy Policy

Google's privacy policy states that Google "collects limited non-personally identifying information your browser makes available whenever you visit a website. This log information includes your Internet Protocol address."

The keyword in this description is "identifying." In an amicus brief for a case centered on whether an IP address is personally identifiable information, the Electronic Frontier Foundation mentions:
"Identifying" is the inflected form of "identify" which means "to recognize as being or show to be the very person or thing known, described or claimed; fix the identity of." Webster's New World Dictionary 696 (2nd college ed. 1986). "Identifiable," on the other hand, means "subject to identification; capable of being identified". Webster's Third New Int'l Dict. 1123 (1986).
Therefore, "identifiable" is a lesser standard that may lead to "identifying" but may not be an identifier by itself.

Slashdot

Description: News and forum
URL: http://www.slashdot.org

According to a statement by Rob Malda, creator of Slashdot, on 26 October 2004 at the Georgia Institute of Technology, Slashdot does not store IP addresses in plain text but encrypts them to increase confidentiality of users.
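
The statement above does not say how Slashdot protects stored addresses, so the following is not a description of its method. It is only a sketch of one common way to avoid keeping plain-text addresses: storing a keyed hash (HMAC) instead. Unlike encryption, this is one-way, and the key must stay secret because the IPv4 address space is small enough to search exhaustively. The function name and key are hypothetical.

```python
import hashlib
import hmac

def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Return a stable keyed digest that stands in for the plain IP address."""
    return hmac.new(secret_key, ip.encode("ascii"), hashlib.sha256).hexdigest()

if __name__ == "__main__":
    key = b"replace-with-a-long-random-secret"  # hypothetical key
    # The same address always maps to the same token, so repeat visits can
    # still be recognized without the log revealing the address itself.
    print(pseudonymize_ip("192.0.2.10", key))
```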

The Weather Channel

Description: Weather information
URL: http://www.weather.com
Privacy Policy

The privacy policy of The Weather Channel defines IP addresses as "random numbers assigned to individual computers which The Weather Channel uses to administer our Web site and help diagnose problems with our servers."

This definition of IP address is inaccurate and is misleading about the nature of the IP address as personally identifiable information because IP addresses are not random but assigned in an ordered way that can identify a person using the address. It is also interesting to note that weather.com includes a certificate from TRUSTe, an organization that seeks to enable "individuals and organizations to establish trusting relationships based on respect for personal identity and information in the evolving networked world" (http://www.truste.com/about/mission_statement.php).

In an amicus brief filed by TRUSTe in Klimas v. Comcast Corporation, TRUSTe states that "most Internet users do not even know what their IP address is" and that if IP address were considered "personally identifiable information" then "[t]his change would not be limited to IP addresses but would also include any other anonymous identifiers that were 'capable' of being traced back to the user no matter how accurate or difficult that process may be."

TRUSTe's argument for not recognizing IP addresses as personally identifiable information rests on the typical Internet user's limited ability to identify someone from an IP address. In the world of technology, this kind of argument can quickly disintegrate with the spread of information, the collection of large databases, and the availability of easy-to-use software. The argument also raises a question: at what point, and at what resolution, does the information or the ability to identify become ubiquitous enough to create an uncomfortable invasion of privacy?

Appalachian State University Office of Cultural Affairs

Description: University office
URL: http://www.oca.appstate.edu/
Privacy Policy
This privacy policy recognizes the ability for an IP address to be personally identifiable by stating:
IP addresses are automatically logged by our server software. IP addresses may or may not be considered Personally Identifiable Information, depending on the practices of the user's ISP.

Conclusion

An IP address is a unique identifier that permits anyone who knows it to contact a person's computer, identify the general geographic location of a user, and track their online habits.

The only question that remains is the ease of correlating an IP address with a person's unique identity. For ISPs it is a simple task to determine the identity of a customer based on his IP address. For entities other than ISPs, it depends on the behavior of the person and his ISP. For users with static IPs, correlation can be very easy because the user can be tracked over long periods of time. Every email the user sends, every form filled out, and every website visited announces and spreads an element of his identity: his IP address.

A fundamental notion of privacy is the right to be left alone. Since only an IP address is needed to contact and attack a person's computer, knowledge of a person's IP address can infringe on this right.

Many privacy policies do not recognize IP addresses as personally identifiable information. This is misleading, especially for users not familiar with the basics of Internet architecture.

Because IP addresses can be telling of physical location and identity for many Internet users, both users and website privacy policies should recognize IP addresses as personally identifiable information.




