library php-parsing-tool

library for parsing

farafiri/php-parsing-tool

  • Monday, January 1, 2018
  • by farafiri
  • Repository
  • 5 Watchers
  • 9 Stars
  • 116 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 5 Forks
  • 0 Open issues
  • 8 Versions
  • 33 % Grown

The README.md

This library:

  • tries to be as easy to use as regular expressions (no compilation step and no separate grammar file required)
  • is as powerful as other grammar compilers
  • returns easy-to-manipulate syntax tree objects
  • uses a convenient syntax that combines the best of BNF notation (the idea) and regular expressions (* and + for repetition)
  • allows you to deal with any context-free grammar

Example

Let's say you have a string with dates in the format d.m.y or y-m-d, separated by commas.

  $dates = '2012-03-04,2013-02-08,23.06.2012';

  $parser = new \ParserGenerator\Parser('start     :=> datesList.
                                         datesList :=> date "," datesList
                                                   :=> date.
                                         date      :=> year "-" month "-" day
                                                   :=> day "." month "." year.
                                         year      :=> /\d{4}/.
                                         month     :=> /\d{2}/.
                                         day       :=> /\d{2}/.');

  $parsed = $parser->parse($dates);

  // and now you want to get the list of years
  foreach($parsed->findAll('year') as $year) {
    echo $year;
  }

  // this time you want to print all months from the year 2012
  foreach($parsed->findAll('date') as $date) {
    if ((string) $date->findFirst('year') === '2012') {
      echo $date->findFirst('month');
    }
  }

Even more examples

There are a few example parsers implemented in the \ParserGenerator\Examples namespace, and you'll also find tests for them, e.g.:

  • \ParserGenerator\Examples\CSVParser (and \ParserGenerator\Tests\Examples\CSVParserTest)
  • \ParserGenerator\Examples\JSONParser (and \ParserGenerator\Tests\Examples\JSONParserTest)

Branch types

You could declare the previous grammar as PEG, which would make parsing roughly 10 times faster. To do so, pass the option ['defaultBranchType' => 'PEG'] as the second argument to Parser. Note that not every grammar can be parsed with the PEG packrat algorithm.

  // by adding 'defaultBranchType' with 'PEG' value into options we declare grammar as PEG
  $parser = new \ParserGenerator\Parser('start :=> start "x"
                                               :=> "x".', array('defaultBranchType' => 'PEG'));
  // but a PEG grammar cannot be left-recursive; calling parse() would loop forever in this case
  // You have 2 solutions now
  // 1st: change the grammar a bit so that it is right-recursive:
  $parser = new \ParserGenerator\Parser('start :=> "x" start
                                               :=> "x".', array('defaultBranchType' => 'PEG'));

  // 2nd: use the default branch type
  $parser = new \ParserGenerator\Parser('start :=> start "x"
                                               :=> "x".');

Error handling

If your input cannot be parsed, \ParserGenerator\Parser::parse will return false.

Use \ParserGenerator\Parser::getErrorString() and provide your input data to get a human-readable error description.

Alternatively, you can use \ParserGenerator\Parser::getError() and work directly with the error nodes.
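
The flow described above can be wrapped in a small helper. This is only a minimal sketch: parseOrFail is a hypothetical name that is not part of the library, and it assumes that getErrorString() accepts the original input string as its argument, as the description above suggests.

```php
<?php
/**
 * Hypothetical helper (not part of the library): returns the syntax tree
 * or throws with the parser's human-readable error description.
 *
 * $parser is expected to be a \ParserGenerator\Parser instance.
 */
function parseOrFail($parser, string $input)
{
    $tree = $parser->parse($input);

    // parse() returns false when the input cannot be parsed
    if ($tree === false) {
        // Assumption: getErrorString() takes the original input
        throw new \RuntimeException($parser->getErrorString($input));
    }

    return $tree;
}
```

A strict comparison with false is used so that the failure signal is never confused with another falsy value.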

Symbols

"text"

Matches the literal text. You can also use single quotes. Escape sequences are supported, so "\n" will match a newline.

/regular|expression/

Matches the given regular expression. You can use pattern modifiers: a grammar like "start :=> /[a-z]/i." will also match upper-case letters. The parser does not backtrack into a regular expression; the first match is treated as the only match. For example, with the grammar "start :=> /a+/ 'a'." and the input "aa", the regular expression captures both characters, nothing is left for 'a', and the string is not matched.
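
The no-backtracking behaviour resembles a possessive quantifier in plain PCRE. The sketch below uses only PHP's built-in preg_match() as an analogy; it does not call the library and makes no claim about how the library is implemented.

```php
<?php
// Ordinary quantifier: a+ gives one "a" back so that the final "a" can match.
$withBacktracking = preg_match('/^a+a$/', 'aa');

// Possessive quantifier: a++ keeps everything it matched, so the final "a" fails.
$withoutBacktracking = preg_match('/^a++a$/', 'aa');

var_dump($withBacktracking);    // int(1)
var_dump($withoutBacktracking); // int(0)
```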

symbolName

Will match the defined symbol. The following example matches a single word character (/\w/) followed by a single digit.

start  :=> letter digit.
letter :=> /\w/.
digit  :=> /\d/.

whiteSpace, space, newLine, tab

whiteSpace matches a space, tab or newline character. If ignoreWhitespaces mode is off, these symbols work the same as /\s/, " ", /\n/ and /\t/ respectively. When ignoreWhitespaces mode is on, /\s/, " ", "\n" and "\t" won't work and you must use the whiteSpace, space, etc. symbols. In ignoreWhitespaces mode these symbols check the context and do not consume characters from the input. For example, the sequence 'a' newLine space space 'b' will match the characters 'a' and 'b' separated by at least one space and at least one newline.

text

Matches any text.

symbol+

Will try to match symbol one or more times. For example, start :=> "a"+. will match "a", "aa" and "aaa" but not "".

symbol?

Symbol is optional. For example, start :=> "a"?. will match "a" and "" but not "aa".

symbol*

Will try to match symbol zero or more times. For example, start :=> "a"*. will match "a", "aa", "aaa" and "".

symbol++, symbol**, symbol??

Same as the corresponding symbol+, symbol* and symbol?, but they consume input greedily. Example:

$nonGreedy = new \ParserGenerator\Parser('start :=> "a"* "a"*.');
$nonGreedy->parse("aaa")->getSubnode(0)->toString(); // "" first "a"* takes nothing
$nonGreedy->parse("aaa")->getSubnode(1)->toString(); // "aaa" so second must consume all left

$greedy = new \ParserGenerator\Parser('start :=> "a"** "a"**.');
$greedy->parse("aaa")->getSubnode(0)->toString(); // "aaa" first "a"** takes all
$greedy->parse("aaa")->getSubnode(1)->toString(); // "" so nothing left for second

$greedy = new \ParserGenerator\Parser('start :=> "a"** "a"+.');
// "aa": "a"** tries to take everything, but then parsing would fail, so it must leave the last char for "a"+
$greedy->parse("aaa")->getSubnode(0)->toString();
$greedy->parse("aaa")->getSubnode(1)->toString(); // "a"

If 'defaultBranchType' is set to 'PEG', then symbol* is equal to symbol** (always greedy). The same applies to "+" and "?". In this mode the last case will fail (PEG cannot parse it).

?symbol

Lookahead. Checks whether symbol can be parsed but does not capture it. For example, "start :=> 'a' ?/.{3}/ integer. integer :=> /\d+/." will match "a" followed by a number with at least 3 digits.

!symbol

Negative lookahead. Similar to ?symbol, but parsing continues only if symbol cannot be matched.
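
Both lookahead forms have close counterparts in PCRE: (?=...) and (?!...). The sketch below is an analogy written with PHP's built-in preg_match() only; the patterns are illustrative and are not taken from the library.

```php
<?php
// Positive lookahead: after "a" at least 3 characters must follow,
// and everything after "a" must be digits.
$positive = '/^a(?=.{3})\d+$/';
var_dump(preg_match($positive, 'a123')); // int(1)
var_dump(preg_match($positive, 'a12'));  // int(0)

// Negative lookahead: any word except exactly "do".
$negative = '/^(?!do$)\w+$/';
var_dump(preg_match($negative, 'doSomething')); // int(1)
var_dump(preg_match($negative, 'do'));          // int(0)
```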

symbol1+symbol2

Several symbol1 occurrences separated by symbol2 (similar for *, ++, **):

$parser = new \ParserGenerator\Parser('start :=> word+",".
                                       word  :=> /\w+/.');
foreach($parser->parse("a,bc,d")->getSubnode(0)->getSubnodes() as $subnode) {
  echo $subnode . ' ';
} //prints "a , bc , d "

foreach($parser->parse("a,bc,d")->getSubnode(0)->getMainNodes() as $subnode) {
  echo $subnode . ' ';
} //prints "a bc d "

Note that symbol1+ symbol2 is different from symbol1+symbol2. The space between + and symbol2 is crucial: "a"+ "b" matches "aaaab" but not "ababa", while "a"+"b" matches "ababa" but not "aaaab".

(symbol1 | symbol2)

Choice: matches symbol1 or symbol2. For example, "start :=> ('a' | 'b') 'c'." will parse the strings "ac" and "bc".

string

Syntax sugar for a regex like /"([^\\]|\\.)*"/. Matches quoted strings. Example:

$parser = new \ParserGenerator\Parser('start :=> string.');
$stringNode = $parser->parse('"a\tb\"c"')->getSubnode(0);
echo (string) $stringNode; //prints:"a\tb\"c"
echo $stringNode->getValue(); //prints:a    b"c

By default a string may be quoted with quotation marks or apostrophes. The following variants are available:

  • string/apostrophe: can be quoted only with apostrophes
  • string/quotation: can be quoted only with quotation marks
  • string/simple: can be quoted only with quotation marks; there is no character escaping with \, and a quotation mark inside the string is written by repeating it (the style used in Pascal or CSV)
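
To make the "quotation by repetition" style concrete, here is a minimal sketch in plain PHP of how such a value is unquoted. unquoteSimple is a hypothetical helper written for illustration; it is not a library function and does not show how string/simple is implemented.

```php
<?php
/**
 * Hypothetical helper: unquotes a Pascal/CSV-style string in which
 * a quotation mark inside the value is written twice.
 */
function unquoteSimple(string $quoted): string
{
    // Drop the surrounding quotation marks, then collapse each doubled mark into one.
    return str_replace('""', '"', substr($quoted, 1, -1));
}

echo unquoteSimple('"say ""hi"""'), "\n"; // say "hi"
```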

numbers

Of course you can use /\d+/, but the built-in toolkit for numbers is much easier to use and more readable.

//parser matching only integers from 3 to 17 (inclusive)
$parser = new \ParserGenerator\Parser('start :=> 3..17 .');
$parser->parse('2'); //false
$parser->parse('18'); //false
$parser->parse('12'); //syntax tree object

//parser matching only integers > 0
$parser = new \ParserGenerator\Parser('start :=> 1..infinity .');

//parser matching integers in hexadecimal, decimal and octal notation
$parser = new \ParserGenerator\Parser('start :=> -inf..inf/hdo .');
$parser->parse('0x21')->getSubnode(0)->getValue(); // 33
$parser->parse('21')->getSubnode(0)->getValue(); //21
$parser->parse('021')->getSubnode(0)->getValue(); //17

//matching month number with leading 0 for < 10
$parser = new \ParserGenerator\Parser('start :=> 01..12 .');
$parser->parse('4'); //false
$parser->parse('04'); //syntax tree object
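
For comparison, PHP's own intval() with base 0 follows the same prefix conventions (0x for hexadecimal, a leading 0 for octal) and yields the same values as the /hdo example above. This sketch uses only built-in PHP as an analogy; it does not show how the library implements /hdo.

```php
<?php
// With base 0, intval() picks the base from the prefix of the string.
$hex = intval('0x21', 0); // hexadecimal -> 33
$dec = intval('21', 0);   // decimal     -> 21
$oct = intval('021', 0);  // octal       -> 17

var_dump($hex, $dec, $oct);
```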

time()

Matches a time in the given format:

$parser = new \ParserGenerator\Parser('start :=> (time(Y-m-d) | time(d.m.Y)) .');
$parser->parse('2017-01-02')->getSubnode(0)->getValue(); // equal to new \DateTime('2017-01-02')
$parser->parse('03.05.2014')->getSubnode(0)->getValue(); // equal to new \DateTime('2014-05-03')
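
The format letters above look like those of PHP's date functions. Assuming that is the case (the README does not state it explicitly), the same two inputs can be read with the built-in \DateTime::createFromFormat(), which is shown here only as a point of comparison:

```php
<?php
// "!" resets every field that is not in the format, so the time part is 00:00:00.
$first  = \DateTime::createFromFormat('!Y-m-d', '2017-01-02');
$second = \DateTime::createFromFormat('!d.m.Y', '03.05.2014');

echo $first->format('Y-m-d'), "\n";  // 2017-01-02
echo $second->format('Y-m-d'), "\n"; // 2014-05-03
```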

contain, is

Sometimes you may want to do extra checks on a parsed node. With these constructs you can check whether a node contains some text or matches a pattern:

$parser = new \ParserGenerator\Parser('start   :=> word not is keyword.
                                       word    :=> /\w+/.
                                       keyword :=> ("do" | "while" | "if").');
$parser->parse('do'); //false
$parser->parse('doSomething'); // syntax tree object

It is possible to combine checks with basic logic operations and group them with parentheses:

$parser = new \ParserGenerator\Parser('start   :=> word not(is keyword or
                                                            is ("p" text /* we don`t want words starting with "p" */) or
                                                            is /./ /* we don`t want one letter words */ ).
                                       word    :=> /\w+/.
                                       keyword :=> ("do" | "while" | "if").');
$parser->parse('do'); //false
$parser->parse('d'); //false
$parser->parse('post'); //false
$parser->parse('doSomething'); // syntax tree object

unorder(separator, choice1, choice2...)

unorder should be used when you expect several elements in any order:

x :=> unorder(s, A, B, C).
/* is equivalent to */
x :=> A s B s C
  :=> A s C s B
  :=> B s A s C
  :=> B s C s A
  :=> C s A s B
  :=> C s B s A.
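
The expansion above is simply every permutation of the elements joined by the separator. The following sketch generates it in plain PHP; permutations and expandUnorder are hypothetical helper names used for illustration and are not part of the library.

```php
<?php
/** Returns all orderings of $items (hypothetical helper). */
function permutations(array $items): array
{
    if (count($items) <= 1) {
        return [$items];
    }
    $result = [];
    foreach ($items as $i => $item) {
        // Fix one element in front, then permute the remaining ones.
        $rest = $items;
        unset($rest[$i]);
        foreach (permutations(array_values($rest)) as $tail) {
            $result[] = array_merge([$item], $tail);
        }
    }
    return $result;
}

/** Builds the alternatives that unorder($separator, ...$elements) stands for. */
function expandUnorder(string $separator, array $elements): array
{
    return array_map(function (array $order) use ($separator) {
        return implode(" $separator ", $order);
    }, permutations($elements));
}

$alternatives = expandUnorder('s', ['A', 'B', 'C']);
echo implode("\n", $alternatives), "\n";
```

For three elements this prints the same six alternatives listed above, which also shows why unorder is preferable to writing them by hand: the count grows factorially with the number of elements.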

By default each element is expected exactly once, but you can change that:

x :=> unorder(s, ?A, *B, +C).
A is optional
B may be used multiple (or zero) times
C may be used multiple (at least once) times

Even when every element is optional, at least one element is required:

$parser = new \ParserGenerator\Parser('start   :=> unorder("", ?"a", ?"b").');

$parser->parse('a'); //syntax tree object
$parser->parse('b'); //syntax tree object
$parser->parse(''); //false

The Versions

All versions are by farafiri, are described as "library for parsing", and are released under the MIT license.

Date        Version              Normalized version   Requires
01/01 2018  dev-master           9999999-dev          php >=7.0
30/12 2017  dev-testBranch       dev-testBranch       php >=7.0
28/12 2017  v2.0.0               2.0.0.0              php >=7.0
27/12 2017  dev-master_php5      dev-master_php5      php >=5.3.0
27/12 2017  v1.0.2               1.0.2.0              php >=5.3.0
15/08 2017  dev-failed-test-fix  dev-failed-test-fix  php >=5.3.0
11/02 2017  v1.0.1               1.0.1.0              php >=5.3.0
10/08 2015  v1.0.0               1.0.0.0              php >=5.3.0