halk/item-similarity

Content-based, schema-less recommendation service

Saturday, August 22, 2015
by halk

PHP
Item Similarity: content-based, schema-less recommendation service

A simple recommendation service which computes the similarity of items.

Since this is part of my ongoing MSc project, the README will be improved by October.

Concept

Similarity Computation

The similarity between two items is computed as follows.

Given the following two JSON documents:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": ["black", "white"],
    "category": "Shoes",
    "size": 42
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": ["red", "white"],
    "category": "Sweater",
    "sleeves": "long"
}

First, any item features which are not in both documents are discarded:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": "black,white",
    "category": "Shoes"
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": "red,white",
    "category": "Sweater"
}
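The discarding step can be sketched in PHP as a key intersection. This is only an illustration under the assumption that each document is decoded into an associative array; `commonFeatures` is a hypothetical helper name, not necessarily what the service uses.

```php
<?php
// Hypothetical helper: keep only the features (keys) present in both documents.
function commonFeatures(array $a, array $b): array
{
    // array_intersect_key keeps the entries of the first array
    // whose keys also exist in the second array.
    return [array_intersect_key($a, $b), array_intersect_key($b, $a)];
}

$a = ['brand' => 'Addi', 'model' => 'Speedy', 'colors' => ['black', 'white'],
      'category' => 'Shoes', 'size' => 42];
$b = ['brand' => 'Prima', 'model' => 'Kazak', 'colors' => ['red', 'white'],
      'category' => 'Sweater', 'sleeves' => 'long'];

[$commonA, $commonB] = commonFeatures($a, $b);
// "size" (only in a) and "sleeves" (only in b) are dropped;
// both results keep brand, model, colors and category.
```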

Second, the documents are converted into lists with the keys as a prefix to the values:

a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]
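A minimal PHP sketch of this conversion follows. It assumes that multi-valued features arrive either as arrays or as comma-joined strings, and `toFeatureList` is a hypothetical name chosen for illustration.

```php
<?php
// Hypothetical helper: turn a document into a flat list of "key_value" features.
function toFeatureList(array $doc): array
{
    $features = [];
    foreach ($doc as $key => $value) {
        // Assumption: multi-valued features are arrays or comma-joined strings.
        $values = is_array($value) ? $value : explode(',', (string) $value);
        foreach ($values as $v) {
            $features[] = $key . '_' . trim((string) $v);
        }
    }
    return $features;
}

$a = ['brand' => 'Addi', 'model' => 'Speedy', 'colors' => 'black,white', 'category' => 'Shoes'];
$featuresA = toFeatureList($a);
// ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
```

Splitting on commas would also split a single value that happens to contain a comma, so a real implementation may need a stricter representation for multi-valued features.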

Finally, a variant of the Tanimoto coefficient is calculated:

nA = number of features in A
nB = number of features in B
nAB = number of intersecting features
score = nAB / (nA + nB - nAB)
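The formula can be sketched in PHP as below. The function name `tanimoto` is hypothetical, and treating each feature list as a set (duplicates removed) is an assumption. The feature lists are the ones derived from the two example documents, which share only `colors_white`.

```php
<?php
// Sketch of score = nAB / (nA + nB - nAB) over two feature lists.
function tanimoto(array $a, array $b): float
{
    // Assumption: features are treated as sets, so duplicates are removed.
    $a = array_unique($a);
    $b = array_unique($b);
    $nAB = count(array_intersect($a, $b));
    $union = count($a) + count($b) - $nAB;
    // Two empty lists have no features to compare; return 0 by convention.
    return $union === 0 ? 0.0 : $nAB / $union;
}

$a = ['brand_Addi', 'model_Speedy', 'colors_black', 'colors_white', 'category_Shoes'];
$b = ['brand_Prima', 'model_Kazak', 'colors_red', 'colors_white', 'category_Sweater'];

$score = tanimoto($a, $b);
// nA = 5, nB = 5, nAB = 1, so score = 1 / (5 + 5 - 1) = 1/9 (about 0.11)
```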

Similarity index

The index is kept in a MongoDB collection with a document for each feature. Each document also keeps track of its similarity scores against other documents. Every time a new record is processed, its similarity to the other documents is computed and stored, and the score is added to the other document as well. Thus, when a similarity score is requested for a document, the result is already pre-computed.
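The pre-computation idea can be illustrated with an in-memory stand-in for the MongoDB collection. Everything here is an assumption made for illustration: the class name `SimilarityIndex`, one entry per indexed item, and scoring directly on prefixed feature lists (the step that discards unshared keys is left out for brevity).

```php
<?php
// In-memory stand-in for the MongoDB collection (assumption, not the real storage code).
final class SimilarityIndex
{
    /** @var array<string, string[]> feature lists keyed by item id */
    private array $features = [];
    /** @var array<string, array<string, float>> pre-computed scores per item */
    private array $scores = [];

    public function add(string $id, array $features): void
    {
        $this->scores[$id] ??= [];
        foreach ($this->features as $otherId => $otherFeatures) {
            $otherId = (string) $otherId;
            if ($otherId === $id) {
                continue; // an item is not compared with itself
            }
            $score = self::score($features, $otherFeatures);
            // Store the score on both sides so reads need no computation.
            $this->scores[$id][$otherId] = $score;
            $this->scores[$otherId][$id] = $score;
        }
        $this->features[$id] = $features;
    }

    /** Returns the pre-computed scores for an item, highest first. */
    public function similar(string $id): array
    {
        $scores = $this->scores[$id] ?? [];
        arsort($scores);
        return $scores;
    }

    private static function score(array $a, array $b): float
    {
        $nAB = count(array_intersect($a, $b));
        $union = count($a) + count($b) - $nAB;
        return $union === 0 ? 0.0 : $nAB / $union;
    }
}

$index = new SimilarityIndex();
$index->add('a', ['colors_black', 'colors_white']);
$index->add('b', ['colors_red', 'colors_white']);
$index->add('c', ['colors_black', 'colors_white']);
$similarToA = $index->similar('a'); // 'c' scores 1.0, 'b' scores 1/3
```

The trade-off of this design is that writes touch every existing entry, while reads are a simple lookup.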

API

The index is managed by POST and DELETE requests. The score is fetched via GET.

The route prefix {index} allows maintaining more than one index within an instance.

POST /{index} Posts a document to the index and calculates the similarity score.

DELETE /{index} Deletes a document.

GET /{index}?itemIds=1,2,3 Returns similar items for the items in the GET parameter.
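As a client-side illustration, the GET route can be assembled as follows. The base URL `http://localhost:8080`, the index name `products` and the helper name `similarItemsUrl` are all assumptions; request bodies and response formats are not described here, so the sketch only builds the URL.

```php
<?php
// Hypothetical helper: builds the GET URL for fetching similar items.
function similarItemsUrl(string $baseUrl, string $index, array $itemIds): string
{
    // Encode each id separately so the commas between ids stay literal.
    $ids = array_map('rawurlencode', array_map('strval', $itemIds));
    return rtrim($baseUrl, '/') . '/' . rawurlencode($index)
        . '?itemIds=' . implode(',', $ids);
}

// Assumed host and index name, for illustration only.
$url = similarItemsUrl('http://localhost:8080', 'products', [1, 2, 3]);
// $url is "http://localhost:8080/products?itemIds=1,2,3"
```

With `allow_url_fopen` enabled, such a URL could be fetched with `file_get_contents($url)`.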

Installation

$ git clone https://github.com/halk/item-similarity
$ cd item-similarity
$ cp config/config.php.dist config/config.php

Please see recowise-vagrant for provisioning details.

Tests

$ cp phpunit.xml.dist phpunit.xml
$ phpunit

License: MIT

by Halil Köklü
