
library php-unstructured-text-parser

A PHP Class to help extract text out of documents that are not structured in a processing friendly manner

aymanrb/php-unstructured-text-parser

  • Tuesday, April 3, 2018
  • by aymanrb
  • Repository
  • 4 Watchers
  • 19 Stars
  • 888 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 11 Forks
  • 0 Open issues
  • 4 Versions
  • 6 % Grown

The README.md

Unstructured Text Parser [PHP]

About Unstructured Text Parser

This is a small PHP library that helps extract text from documents that are not structured in a processing-friendly format. For example, to parse text out of form-generated emails, you create a template that matches the expected mail format and marks the variable text elements; the class then extracts those variables from the body text of incoming mails.

Useful when you want to parse data out of:

  • Emails generated from web forms
  • Documents with definable templates / expressions

Installation

PHP Unstructured Text Parser is available on Packagist (using semantic versioning), and installation via Composer is recommended. Add the following line to your composer.json file:

"aymanrb/php-unstructured-text-parser": "~2.0"

or run:

composer require aymanrb/php-unstructured-text-parser

Usage example

<?php
include_once __DIR__ . '/../vendor/autoload.php';

$parser = new aymanrb\UnstructuredTextParser\TextParser('/path/to/templatesDirectory');

$textToParse = 'Text to be parsed fetched from a file, mail, web service, or even added directly to a string variable like this';

// performs brute-force parsing against all available templates and returns the first successful match
$parseResults = $parser->parseText($textToParse);
print_r($parseResults->getParsedRawData());

// slower: performs a similarity check on the available templates to select the closest-matching one before parsing
print_r(
    $parser
        ->parseText($textToParse, true)
        ->getParsedRawData()
);
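
The similarity check mentioned above can be pictured with a small standalone sketch. This is an illustration only, assuming a plain `similar_text()` comparison; the helper name `pickMostSimilarTemplate()` and the two template strings are hypothetical and are not part of the library's API, and the library's actual selection logic may differ.

```php
<?php
// Hypothetical helper (not part of the library): returns the name of the
// template whose text is most similar to the input text.
function pickMostSimilarTemplate(string $text, array $templates): ?string
{
    $bestName = null;
    $bestScore = -1.0;

    foreach ($templates as $name => $templateText) {
        // similar_text() writes a similarity percentage (0 to 100) into $percent.
        similar_text($text, $templateText, $percent);
        if ($percent > $bestScore) {
            $bestScore = $percent;
            $bestName = $name;
        }
    }

    return $bestName; // null when no templates were given
}

// Made-up templates, keyed by an assumed file name.
$templates = [
    'contact_form.txt' => "Name: {%senderName%}\nE-Mail: {%senderEmail%}",
    'order_form.txt'   => "Order number: {%orderId%}\nTotal: {%total%}",
];

echo pickMostSimilarTemplate("Name: Pet Cat\nE-Mail: email@example.com", $templates), PHP_EOL;
```

Comparing the text against every template is what makes this mode slower than trying templates in order and stopping at the first match.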

Parsing Procedure

1- Grab a single copy of the text you want to parse.

2- Replace each varying piece of text in it with a named variable: use {%VariableName%} to match everything in that position, or {%VariableName:Pattern%} to match a specific set of characters or a more precise pattern.

3- Add the template file to the templates directory (the one defined in the parsing code) with a .txt extension, e.g. fileName.txt.

4- Pass the text you wish to parse to the parseText() method of the class and let it do the rest.
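
To make the placeholder syntax from step 2 more concrete, here is a minimal sketch of how such a template could be turned into a regular expression with named groups. It is a simplified stand-in, not the library's code: the function `templateToRegex()` is hypothetical, and the choice of a lazy `.*?` as the default pattern is an assumption.

```php
<?php
// Illustrative only: a simplified, hypothetical template compiler.
function templateToRegex(string $template): string
{
    // Split into literal text and {%name%} / {%name:pattern%} placeholders.
    $parts = preg_split('/(\{%.+?%\})/', $template, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    foreach ($parts as $part) {
        if (preg_match('/^\{%([A-Za-z_]\w*)(?::(.+))?%\}$/', $part, $m)) {
            // Assumption: a placeholder without a pattern matches lazily.
            $pattern = $m[2] ?? '.*?';
            $regex .= '(?<' . $m[1] . '>' . $pattern . ')';
        } else {
            // Fixed text is matched literally.
            $regex .= preg_quote($part, '/');
        }
    }

    return '/^' . $regex . '$/s';
}

$template = "ID & Source: {%id:[0-9]+%} {%source%}\nName: {%senderName%}";
$text     = "ID & Source: 12234432 Website Form\nName: Pet Cat";

$data = [];
if (preg_match(templateToRegex($template), $text, $matches)) {
    // Keep only the named groups; preg_match() also returns numeric keys.
    $data = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($data);
```

The fixed text acts as the anchor between variables, which is why the template should be a faithful copy of the real document with only the varying parts replaced.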

Template Example

If the text document you want to parse looks like this:

Hello,
If you wish to parse message coming from a website that states info like:
ID & Source: 12234432 Website Form  
Name: Pet Cat
E-Mail: email@example.com
Comment: Some text goes here

Thank You,
Best Regards
Admin

Your template file (example_template.txt) could be something like:

Hello,
If you wish to parse message coming from a website that states info like:
ID & Source: {%id:[0-9]+%} {%source%}
Name: {%senderName%}
E-Mail: {%senderEmail%}
Comment: {%comment%}

Thank You,
Best Regards
Admin

The output of a successful parsing job would be:

Array(
    'id' => '12234432',
    'source' => 'Website Form',
    'senderName' => 'Pet Cat',
    'senderEmail' => 'email@example.com',
    'comment' => 'Some text goes here'
)

Upgrading from v1.x to v2.x

Version 2.0 is largely a refactored copy of version 1.x and provides the same functionality. The one difference is the return value: it is now a parsed data object instead of an array. To get the results as an array, as in v1.x, call getParsedRawData() on the returned object.

<?php
// parseText() returned an array in 1.x
$extractedArray = $parser->parseText($textToParse);

//In 2.x you need to do the following if you want an array
$extractedArray = $parser->parseText($textToParse)->getParsedRawData();
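
For code that has to run against either major version during a migration, a small wrapper can hide the difference. This is a hedged sketch: `toParsedArray()` is a hypothetical helper, not something the library provides, and the empty-array fallback for unexpected values is an assumption.

```php
<?php
// Hypothetical helper (not part of the library): normalizes the return
// value of parseText() to an array for both 1.x and 2.x.
function toParsedArray($result): array
{
    if (is_array($result)) {
        return $result; // 1.x style: already an array
    }

    if (is_object($result) && method_exists($result, 'getParsedRawData')) {
        return $result->getParsedRawData(); // 2.x style: parsed data object
    }

    return []; // assumption: treat anything else as "nothing parsed"
}

// Stand-in for a 2.x result object, used here only to demonstrate the helper.
$fakeV2Result = new class {
    public function getParsedRawData(): array
    {
        return ['id' => '12234432'];
    }
};

print_r(toParsedArray(['id' => '12234432'])); // 1.x style input
print_r(toParsedArray($fakeV2Result));        // 2.x style input
```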

The Versions

03/04 2018

dev-master

9999999-dev https://github.com/aymanrb/php-unstructured-text-parser

MIT

The Requires

 

The Development Requires

text parser extract data php parser templates parsing regex parsing form parsing text parse

14/10 2017

1.2.0

1.2.0.0 https://github.com/aymanrb/php-unstructured-text-parser

MIT

The Requires

 

The Development Requires

text parser extract data php parser templates parsing regex parsing form parsing text parse

14/10 2017

1.1.0-beta

1.1.0.0-beta https://github.com/aymanrb/php-unstructured-text-parser

MIT

The Requires

 

The Development Requires

text parser extract data php parser

02/11 2014

v1.0.1-beta

1.0.1.0-beta https://github.com/aymanrb/php-unstructured-text-parser

MIT

The Requires

  • php >=5.3.0

 

The Development Requires

text parser extract data php parser