Abstract
Personalized web search (PWS) has demonstrated its effectiveness in improving the quality of various search services on the Internet. However, evidence shows that users’ reluctance to disclose their private information during search has become a major barrier to the wide proliferation of PWS. We study privacy protection in PWS applications that model user preferences as hierarchical user profiles. We propose a PWS framework called UPS that can adaptively generalize profiles per query while respecting user-specified privacy requirements. Our runtime generalization aims at striking a balance between two predictive metrics that evaluate the utility of personalization and the privacy risk of exposing the generalized profile. We present two greedy algorithms, namely GreedyDP and GreedyIL, for runtime generalization. We also provide an online prediction mechanism for deciding whether personalizing a query is beneficial. Extensive experiments demonstrate the effectiveness of our framework. The experimental results also reveal that GreedyIL significantly outperforms GreedyDP in terms of efficiency.
Introduction
The web search engine has long been the most
important portal for ordinary people looking for useful
information on the web. However, users might experience
failure when search engines return irrelevant results that do
not meet their real intentions. Such irrelevance is largely
due to the enormous variety of users’ contexts and
backgrounds, as well as the ambiguity of texts. Personalized
web search (PWS) is a general category of search techniques
aimed at providing better search results tailored
to individual user needs. The cost is that user information
has to be collected and analyzed to infer the user
intention behind the issued query.
The solutions to PWS can generally be categorized into
two types, namely click-log-based methods and profile-based
ones. The click-log-based methods are straightforward:
they simply bias the ranking toward pages the user clicked in
past queries. Although this strategy has been demonstrated to
perform consistently and considerably well [1], it can only
work on repeated queries from the same user, which is a
strong limitation confining its applicability. In contrast,
profile-based methods improve the search experience with
complicated user-interest models generated from user
profiling techniques. Profile-based methods can potentially be
effective for almost all sorts of queries, but are
reported to be unstable under some circumstances [1].
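
To make the click-log-based strategy and its limitation concrete, the following is a minimal Python sketch, not the method of [1] or of any particular search engine. The class name ClickLogReranker, its methods, and the example URLs are hypothetical and introduced only for illustration. The sketch assumes the click log is keyed by the exact (user, query) pair and that previously clicked pages are simply promoted ahead of the rest.

```python
from collections import Counter, defaultdict


class ClickLogReranker:
    """Toy click-log-based personalization (hypothetical names, illustration only)."""

    def __init__(self):
        # Maps (user_id, query) to a Counter of clicked URLs.
        self._clicks = defaultdict(Counter)

    def record_click(self, user_id, query, url):
        """Log one click by this user on a result of this query."""
        self._clicks[(user_id, query)][url] += 1

    def rerank(self, user_id, query, results):
        """Promote URLs this user clicked before for the same query."""
        history = self._clicks.get((user_id, query))
        if not history:
            # Query never issued by this user: no bias can be applied.
            return list(results)
        # Stable sort: more clicks first, ties keep the engine's order.
        return sorted(results, key=lambda url: -history[url])


reranker = ClickLogReranker()
reranker.record_click("alice", "jaguar", "cars.example/jaguar")

results = [
    "wiki.example/jaguar_animal",
    "cars.example/jaguar",
    "zoo.example/big-cats",
]
repeated = reranker.rerank("alice", "jaguar", results)      # clicked page moves up
unseen = reranker.rerank("alice", "jaguar speed", results)  # order unchanged
```

In this sketch the repeated query benefits from the logged click, while the new query "jaguar speed" is returned in the engine's original order, which mirrors the limitation to repeated queries from the same user described above.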