Outsourced Similarity Search on Metric Data Assets

This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low-initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential.

Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

Existing System:

In the literature, a number of concepts for securing databases have been studied. Private information retrieval techniques hide the user’s query, e.g., the data item searched for, but not the data being queried. To outsource valuable data to an insecure server, such techniques are clearly not appropriate. Digital watermarking establishes the data owner’s identity on the data. Additional information stored in the data helps prove ownership, but it cannot prevent an attacker from illegally copying the dataset.

Proposed System:

We introduce approaches that shift search functionality to the server. The proposed Metric Preserving Transformation (MPT) stores relative distance information at the server with respect to a private set of anchor objects. This method guarantees correctness of the final search result, but at the cost of two rounds of communication. The proposed Flexible Distance-based Hashing (FDH) methods finishes in just a single round of communication, but does not guarantee retrieval of the exact result.

Modules:

  • Outsourcing Data
  • Nearest Neighbor Query
  • Brute-force Secure Solution (BRUTE)
  • Anonymization - based Solution (ANONY)

Tools Used:

Front End : HTML, Java, JSP
Back End : MySQL