Item Similarity: content-based, schema-less recommendation service
A simple recommendation service that computes the similarity of items.
Since this is part of my ongoing MSc project, the README will be improved by October.
Concept
Similarity Computation
The similarity between two items is computed as follows:
Given the following two JSON documents:
a = {
"brand": "Addi",
"model": "Speedy",
"colors": ["black", "white"],
"category": "Shoes",
"size": 42
}
b = {
"brand": "Prima",
"model": "Kazak",
"colors": ["red", "white"],
"category": "Sweater",
"sleeves": "long"
}
First, any item features that are not present in both documents are discarded:
a = {
"brand": "Addi",
"model": "Speedy",
"colors": "black,white",
"category": "Shoes"
}
b = {
"brand": "Prima",
"model": "Kazak",
"colors": "red,white",
"category": "Sweater"
}
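The discarding step can be sketched in Python as follows. This is an illustration only, not the service's own code: the function name keep_shared_features is hypothetical, and the sketch leaves list values untouched rather than joining them into a comma-separated string as in the listing above.

```python
def keep_shared_features(a, b):
    """Return copies of a and b restricted to the keys present in both."""
    shared = a.keys() & b.keys()
    return (
        {key: a[key] for key in a if key in shared},
        {key: b[key] for key in b if key in shared},
    )


a = {"brand": "Addi", "model": "Speedy", "colors": ["black", "white"],
     "category": "Shoes", "size": 42}
b = {"brand": "Prima", "model": "Kazak", "colors": ["red", "white"],
     "category": "Sweater", "sleeves": "long"}

a_shared, b_shared = keep_shared_features(a, b)
print(sorted(a_shared))  # ['brand', 'category', 'colors', 'model']
print(sorted(b_shared))  # ['brand', 'category', 'colors', 'model']
```

"size" exists only in a and "sleeves" only in b, so both are dropped.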
Second, the documents are converted into lists, with each key used as a prefix for its values:
a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]
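A minimal Python sketch of this flattening step, assuming that comma-separated strings are split into individual values and that lists are handled the same way. The helper name to_features is hypothetical and not taken from the project.

```python
def to_features(doc):
    """Flatten a document into a list of 'key_value' feature strings."""
    features = []
    for key, value in doc.items():
        if isinstance(value, str):
            values = value.split(",")      # assumption: commas separate values
        elif isinstance(value, (list, tuple)):
            values = value
        else:
            values = [value]               # scalars such as numbers
        features.extend(f"{key}_{v}" for v in values)
    return features


a = {"brand": "Addi", "model": "Speedy", "colors": "black,white", "category": "Shoes"}
b = {"brand": "Prima", "model": "Kazak", "colors": "red,white", "category": "Sweater"}

print(to_features(a))
# ['brand_Addi', 'model_Speedy', 'colors_black', 'colors_white', 'category_Shoes']
print(to_features(b))
# ['brand_Prima', 'model_Kazak', 'colors_red', 'colors_white', 'category_Sweater']
```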
Finally, a variant of the Tanimoto coefficient is calculated:
nA = number of features in A
nB = number of features in B
nAB = number of intersecting features
score = nAB / (nA + nB - nAB)
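As a worked example, the formula can be applied to the feature lists derived from the two example documents (brand Addi/model Speedy versus brand Prima/model Kazak). The sketch below is an assumption-laden illustration: it de-duplicates features by using sets, and it returns 0.0 when both lists are empty, neither of which is stated in this README.

```python
def tanimoto(features_a, features_b):
    """score = nAB / (nA + nB - nAB), computed over de-duplicated features."""
    set_a, set_b = set(features_a), set(features_b)
    n_ab = len(set_a & set_b)
    denominator = len(set_a) + len(set_b) - n_ab
    # Assumption: two empty feature lists score 0.0 instead of dividing by zero.
    return n_ab / denominator if denominator else 0.0


a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]

# nA = 5, nB = 5, nAB = 1 (only "colors_white" is shared), so score = 1 / 9.
print(round(tanimoto(a, b), 3))  # 0.111
```

Identical lists score 1.0 and lists with no common feature score 0.0.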
Similarity index
The index is kept in a MongoDB collection with a document for each feature. This
document also keeps track of its similarity score against other documents. Every time
a new record is processed, the similarity to other documents is computed and stored.
This score is then added to the other document as well. Thus when a similarity
score is requested for a document, the end result is already pre-computed.
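The pre-computation idea can be illustrated with an in-memory stand-in for the MongoDB collection. Everything here is an assumption made for the sketch: the class InMemorySimilarityIndex, its method names, and keying the stored scores by item ID are hypothetical and do not describe the actual collection layout.

```python
from collections import defaultdict


def features(doc, shared_keys):
    """Flatten the shared keys of doc into a set of 'key_value' strings."""
    result = set()
    for key in shared_keys:
        value = doc[key]
        values = value if isinstance(value, (list, tuple)) else [value]
        result.update(f"{key}_{v}" for v in values)
    return result


def similarity(a, b):
    """Tanimoto-style score over the keys that both documents contain."""
    shared_keys = a.keys() & b.keys()
    fa, fb = features(a, shared_keys), features(b, shared_keys)
    n_ab = len(fa & fb)
    denominator = len(fa) + len(fb) - n_ab
    return n_ab / denominator if denominator else 0.0


class InMemorySimilarityIndex:
    """Stores pairwise scores at write time so that reads are plain lookups."""

    def __init__(self):
        self.docs = {}                    # item id -> document
        self.scores = defaultdict(dict)   # item id -> {other id: score}

    def add(self, item_id, doc):
        # Score the new item against every stored item, in both directions.
        for other_id, other_doc in self.docs.items():
            score = similarity(doc, other_doc)
            self.scores[item_id][other_id] = score
            self.scores[other_id][item_id] = score
        self.docs[item_id] = doc

    def delete(self, item_id):
        self.docs.pop(item_id, None)
        self.scores.pop(item_id, None)
        for others in self.scores.values():
            others.pop(item_id, None)

    def similar(self, item_id):
        # No computation on read: just sort the stored scores, best first.
        stored = self.scores.get(item_id, {})
        return sorted(stored.items(), key=lambda pair: pair[1], reverse=True)


index = InMemorySimilarityIndex()
index.add(1, {"brand": "Addi", "model": "Speedy", "colors": ["black", "white"],
              "category": "Shoes", "size": 42})
index.add(2, {"brand": "Prima", "model": "Kazak", "colors": ["red", "white"],
              "category": "Sweater", "sleeves": "long"})
index.add(3, {"brand": "Addi", "model": "Runner", "colors": ["black"],
              "category": "Shoes", "size": 43})

print([item_id for item_id, _ in index.similar(1)])  # [3, 2]
```

The trade-off is the usual one for write-time computation: adding an item costs one comparison per stored item, while a lookup only reads the stored scores.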
API
The index is managed by POST and DELETE requests. The score is fetched via GET.
The route prefix {index} allows maintaining more than one index within an instance.
POST /{index} Posts a document to the index and calculates the similarity score
DELETE /{index} Deletes a document
GET /{index}?itemIds=1,2,3 Returns similar items for the items in the GET parameter.
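A small Python sketch of how a client might build the GET request. Only the route shape /{index}?itemIds=1,2,3 comes from the list above; the host, port, index name "products" and the helper similar_items_request are assumptions for illustration. The POST and DELETE payloads are not specified in this README, so they are left out.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def similar_items_request(base_url, index, item_ids):
    """Build (but do not send) a GET request for the similar items of item_ids."""
    # Keep commas literal so the query reads itemIds=1,2,3 as in the route list.
    query = urlencode({"itemIds": ",".join(str(i) for i in item_ids)}, safe=",")
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}?{query}"
    return Request(url, method="GET")


# Hypothetical host and index name.
req = similar_items_request("http://localhost:8080", "products", [1, 2, 3])
print(req.get_method(), req.full_url)
# GET http://localhost:8080/products?itemIds=1,2,3
```

Passing the request to urllib.request.urlopen would send it to a running instance; the response format is not documented here.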
Installation
$ git clone https://github.com/halk/item-similarity
$ cd item-similarity
$ cp config/config.php.dist config/config.php
Please see recowise-vagrant for provisioning details.
Tests
$ cp phpunit.xml.dist phpunit.xml
$ phpunit