Library for accessing NLP apis
, (*1)
This is a simple PHP library for performing multilingual Natural Language tasks using Web64's NLP-Server https://github.com/web64/nlpserver and other providers., (*2)
NLP tasks available through Web64's NLP Server: * Language detection * Article Extraction from HTML or URL * Entity Extraction (NER) - Multilingual * Sentiment Analysis - Multilingual * Embeddings / Neighbouring words - Multilingual * Summarization, (*3)
NLP Tasks Available through Stanford's CoreNLP Server: * Entity Extraction (NER), (*4)
NLP Tasks Available through Microsoft Labs API: * Concept Graph, (*5)
There is also a Laravel wrapper for this library available here: https://github.com/web64/laravel-nlp, (*6)
composer require web64/php-nlp-client
Most NLP features in this package requires a running instance of the NLP Server, which is a simple python flask app providing web service api access to common python NLP libraries., (*7)
Installation instrcuctions: https://github.com/web64/nlpserver, (*8)
This library provides access to three different methods for entity extraction., (*9)
Provider | Language Support | Programming Lang. | API Access |
---|---|---|---|
Polyglot | 40 languages | Python | NLP Server |
Spacy | 7 languages | Python | NLP Server |
CoreNLP | 6 languages | Java | CoreNLP Standalone server |
If you are dealing with text in English or one of the major European language you will get the best results with CoreNLP or Spacy., (*10)
The quality of extracted entities with Polyglot is not great, but for many languages it is the only available option at the moment., (*11)
Polyglot and Spacy NER is accessible thorough the NLP Server, CoreNLP requires its own standalone java server., (*12)
$nlp = new \Web64\Nlp\NlpClient('http://localhost:6400/'); $detected_lang = $nlp->language( "The quick brown fox jumps over the lazy dog" ); // 'en'
// From URL $nlp = new \Web64\Nlp\NlpClient('http://localhost:6400/'); $newspaper = $nlp->newspaper('https://github.com/web64/nlpserver'); // or from HTML $html = file_get_contents( 'https://github.com/web64/nlpserver' ); $newspaper = $nlp->newspaper_html( $html ); Array ( [article_html] => [authors] => Array() [canonical_url] => https://github.com/web64/nlpserver [meta_data] => Array() [meta_description] => GitHub is where people build software. More than 27 million people use GitHub to discover, fork, and contribute to over 80 million projects. [meta_lang] => en [source_url] => [text] => NLP Server. Python Flask web service for easy access to multilingual NLP tasks such as language detection, article extraction... [title] => web64/nlpserver: NLP Web Service [top_image] => https://avatars2.githubusercontent.com/u/76733?s=400&v=4 )
This uses the Polyglot multilingual NLP library to return entities and a sentiment score for given text.Ensure the models for the required languages are downloaded for Polyglot., (*13)
$polyglot = $nlp->polyglot_entities( $text, 'en' ); $polyglot->getSentiment(); // -1 $polyglot->getEntityTypes(); /* Array ( [Locations] => Array ( [0] => United Kingdom ) [Organizations] => [Persons] => Array ( [0] => Ben [1] => Sir Benjamin Hall [2] => Benjamin Caunt ) ) */ $polyglot->getLocations(); // Array of Locations $polyglot->getOrganizations(); // Array of organisations $polyglot->getPersons(); // Array of people $polyglot->getEntities(); /* Returns flat array of all entities Array ( [0] => Ben [1] => United Kingdom [2] => Sir Benjamin Hall [3] => Benjamin Caunt ) */
$text = "Harvesters is a 1905 oil painting on canvas by the Danish artist Anna Ancher, a member of the artists' community known as the Skagen Painters."; $nlp = new \Web64\Nlp\NlpClient('http://localhost:6400/'); $entities = $nlp->spacy_entities( $text ); /* Array ( [DATE] => Array ( [0] => 1905 ) [NORP] => Array ( [0] => Danish ) [ORG] => Array ( [0] => the Skagen Painters ) [PERSON] => Array ( [0] => Anna Ancher ) ) */
English is used by default. To use another language, ensure the Spacy language model is downloaded and add the language as the second parameter, (*14)
$entities = $nlp->spacy_entities( $spanish_text, 'es' );
$sentiment = $nlp->sentiment( "This is the worst product ever" ); // -1 $sentiment = $nlp->sentiment( "This is great! " ); // 1 // specify language in second parameter for non-english $sentiment = $nlp->sentiment( $french_text, 'fr' );
$nlp = new \Web64\Nlp\NlpClient('http://localhost:6400/'); $neighbours = $nlp->neighbours('obama', 'en'); /* Array ( [0] => Bush [1] => Reagan [2] => Clinton [3] => Ahmadinejad [4] => Nixon [5] => Karzai [6] => McCain [7] => Biden [8] => Huckabee [9] => Lula ) */
Extract short summary from a long text, (*15)
$summary = $nlp->summarize( $long_text );
Article Extraction using python port of Readability.js, (*16)
$nlp = new \Web64\Nlp\NlpClient( 'http://localhost:6400/' ); // From URL: $article = $nlp->readability('https://github.com/web64/nlpserver'); // From HTML: $html = file_get_contents( 'https://github.com/web64/nlpserver' ); $article = $nlp->readability_html( $html ); /* Array ( [article_html] => < div>NLP Server
< p>Python 3 Flask web service for easy access to multilingual NLP tasks ... [short_title] => web64/nlpserver: NLP Web Service [text] => NLP Server Python 3 Flask web service for easy access to multilingual NLP tasks such as language detection ... [title] => GitHub - web64/nlpserver: NLP Web Service ) */
CoreNLP has much better quality for NER that Polyglot, but only supports a few languages including English, French, German and Spanish., (*17)
Download CoreNLP server (Java) here: https://stanfordnlp.github.io/CoreNLP/index.html#download, (*18)
# Update download links with latest versions from the download page wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip unzip stanford-corenlp-full-2018-10-05.zip cd stanford-corenlp-full-2018-02-27 # Download English language model: wget http://nlp.stanford.edu/software/stanford-english-kbp-corenlp-2018-10-05-models.jar
# Run the server using all jars in the current directory (e.g., the CoreNLP home directory) java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 # To run server in as a background process nohup java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 &
When the CoreNLP server is running you can access it on port 9000: http://localhost:9000/, (*19)
More info about running the CoreNLP Server: https://stanfordnlp.github.io/CoreNLP/corenlp-server.html, (*20)
$corenlp = new \Web64\Nlp\CoreNlp('http://localhost:9000/'); $entities = $corenlp->entities( $text ); /* Array ( [NATIONALITY] => Array ( [0] => German [1] => Turkish ) [ORGANIZATION] => Array ( [0] => Foreign Ministry ) [TITLE] => Array ( [0] => reporter [1] => journalist [2] => correspondent ) [COUNTRY] => Array ( [0] => Turkey [1] => Germany ) */
Microsoft Concept Graph For Short Text Understanding: https://concept.research.microsoft.com/, (*21)
Find related concepts to provided keyword, (*22)
$concept = new \Web64\Nlp\MsConceptGraph; $res = $concept->get('php'); /* Array ( [language] => 0.40301612064483 [technology] => 0.19656786271451 [programming language] => 0.14456578263131 [open source technology] => 0.057202288091524 [scripting language] => 0.049921996879875 [server side language] => 0.044201768070723 [web technology] => 0.031201248049922 [server-side language] => 0.027561102444098 [server side scripting language] => 0.023920956838274 [feature] => 0.021840873634945 ) */
These are the python libraries used by the NLP Server for the NLP and data extraction tasks., (*23)
Library | URL | NLP Task used |
---|---|---|
langid.py | https://github.com/saffsd/langid.py | Language detection |
Newspaper | https://github.com/codelucas/newspaper | Article & metadata extraction |
Spacy | https://spacy.io/ | Entity extraction |
Polyglot | https://github.com/aboSamoor/polyglot | Multilingual NLPprocessing toolkit |
Gensim | https://radimrehurek.com/gensim/ | Summarization |
Readability | https://github.com/buriy/python-readability | Article extraction |
Get in touch if you have any feedback or ideas on how to improve this package or the documentation., (*24)