 

library item-similarity

Content-based, schema-less recommendation service


halk/item-similarity


  • Saturday, August 22, 2015
  • by halk
  • PHP

The README.md


Item Similarity: content-based, schema-less recommendation service

A simple recommendation service that computes the similarity of items.

Since this is part of my ongoing MSc project, the README will be improved by October.

Concept

Similarity Computation

The similarity between two items is computed as follows.

Given the following two JSON documents:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": ["black", "white"],
    "category": "Shoes",
    "size": 42
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": ["red", "white"],
    "category": "Sweater",
    "sleeves": "long"
}

First, any item features that are not present in both documents are discarded:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": "black,white",
    "category": "Shoes"
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": "red,white",
    "category": "Sweater"
}
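The discarding step can be sketched in a few lines of Python. This is only an illustration of the idea, not the service's actual code; the function name `shared_features` is hypothetical.

```python
def shared_features(a, b):
    """Return copies of both documents restricted to keys present in both."""
    common = a.keys() & b.keys()
    return (
        {key: a[key] for key in a if key in common},
        {key: b[key] for key in b if key in common},
    )


a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": ["black", "white"],
    "category": "Shoes",
    "size": 42,
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": ["red", "white"],
    "category": "Sweater",
    "sleeves": "long",
}

a_shared, b_shared = shared_features(a, b)
print(sorted(a_shared))  # ['brand', 'category', 'colors', 'model']
```

Here "size" exists only in a and "sleeves" only in b, so both are dropped before comparison.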

Second, the documents are converted into lists, with each key used as a prefix for its values:

a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]
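A minimal Python sketch of this conversion follows. It assumes that multi-valued features arrive either as lists or as comma-separated strings; the helper name `to_features` is hypothetical and not taken from the project.

```python
def to_features(doc):
    """Flatten a document into a list of "key_value" feature strings."""
    features = []
    for key, value in doc.items():
        if isinstance(value, str):
            # Assumption: a comma-separated string holds several values.
            values = value.split(",")
        elif isinstance(value, (list, tuple)):
            values = value
        else:
            values = [value]
        features.extend(f"{key}_{v}" for v in values)
    return features


a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": "black,white",
    "category": "Shoes",
}
print(to_features(a))
# ['brand_Addi', 'model_Speedy', 'colors_black', 'colors_white', 'category_Shoes']
```

Prefixing each value with its key keeps equal values under different keys apart, so a brand called "white" would not match the color "white".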

Finally, a variant of the Tanimoto coefficient is calculated:

nA = number of features in A
nB = number of features in B
nAB = number of intersecting features
score = nAB / (nA + nB - nAB)
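As a worked example, the formula can be applied to feature lists derived from the two example documents. The sketch below is an illustration only: it assumes features are counted as sets (duplicates ignored) and that two empty lists score 0, neither of which is stated in the README.

```python
def tanimoto(features_a, features_b):
    """score = nAB / (nA + nB - nAB), computed over sets of features."""
    set_a, set_b = set(features_a), set(features_b)
    n_ab = len(set_a & set_b)
    denominator = len(set_a) + len(set_b) - n_ab
    # Assumption: two empty feature lists are treated as not similar.
    return n_ab / denominator if denominator else 0.0


a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]

score = tanimoto(a, b)
print(round(score, 3))  # 0.111
```

Only "colors_white" is shared, so nA = 5, nB = 5, nAB = 1 and the score is 1 / (5 + 5 - 1) = 1/9.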

Similarity index

The index is kept in a MongoDB collection with a document for each feature. This document also keeps track of its similarity score against other documents. Every time a new record is processed, its similarity to the other documents is computed and stored. The score is then added to the other document as well. Thus, when a similarity score is requested for a document, the result has already been precomputed.
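The precomputation idea can be illustrated with an in-memory stand-in for the MongoDB collection. This is a sketch under assumptions, not the project's storage code: it assumes each stored entry corresponds to one item, and the class and method names are hypothetical.

```python
class SimilarityIndex:
    """In-memory sketch: scores are computed on write and only read on lookup."""

    def __init__(self):
        self.features = {}  # item id -> set of feature strings
        self.scores = {}    # item id -> {other item id: score}

    def add(self, item_id, features):
        self.remove(item_id)  # re-adding an item replaces its old entry
        new = set(features)
        self.scores[item_id] = {}
        for other_id, other in self.features.items():
            n_ab = len(new & other)
            denominator = len(new) + len(other) - n_ab
            score = n_ab / denominator if denominator else 0.0
            if score > 0:
                # Store the score on both sides so lookups need no computation.
                self.scores[item_id][other_id] = score
                self.scores[other_id][item_id] = score
        self.features[item_id] = new

    def remove(self, item_id):
        self.features.pop(item_id, None)
        self.scores.pop(item_id, None)
        for other_scores in self.scores.values():
            other_scores.pop(item_id, None)

    def similar(self, item_id):
        """Return (other id, score) pairs, most similar first."""
        pairs = self.scores.get(item_id, {}).items()
        return sorted(pairs, key=lambda pair: pair[1], reverse=True)


index = SimilarityIndex()
index.add("1", ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"])
index.add("2", ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"])
index.add("3", ["brand_Addi", "model_Runner", "colors_white", "category_Shoes"])
print(index.similar("1")[0])  # ('3', 0.5)
```

The trade-off is that every write touches all existing entries, while reads only return what is already stored.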

API

The index is managed by POST and DELETE requests. Similarity scores are fetched via GET.

The {index} route prefix allows more than one index to be maintained within a single instance.

POST /{index}: posts a document to the index and calculates its similarity scores.

DELETE /{index}: deletes a document from the index.

GET /{index}?itemIds=1,2,3: returns similar items for the item IDs given in the itemIds parameter.
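For illustration, a client could build the GET URL as shown below. The host, port, and index name are hypothetical placeholders, and the helper `similar_items_url` is not part of the project; only the route shape and the itemIds parameter come from the description above.

```python
from urllib.parse import quote, urlencode


def similar_items_url(base_url, index, item_ids):
    """Build the GET /{index}?itemIds=... URL for a list of item IDs."""
    # Keep commas unescaped so the query reads itemIds=1,2,3.
    query = urlencode({"itemIds": ",".join(str(i) for i in item_ids)}, safe=",")
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}?{query}"


# Hypothetical host and index name, for illustration only.
url = similar_items_url("http://localhost:8080", "products", [1, 2, 3])
print(url)  # http://localhost:8080/products?itemIds=1,2,3
```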

Installation

$ git clone https://github.com/halk/item-similarity
$ cd item-similarity
$ cp config/config.php.dist config/config.php

Please see recowise-vagrant for provisioning details.

Tests

$ cp phpunit.xml.dist phpunit.xml
$ phpunit

The Versions

22/08 2015

dev-master

9999999-dev


MIT

recommender content-based filtering recommendation engine