
library shellless

A PHP package to extract readable text from HTML.


sukohi/shellless


  • Monday, March 13, 2017
  • by Sukohi
  • Repository
  • 1 Watcher
  • 0 Stars
  • 17 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 3 Versions
  • 13 % Grown

The README.md

Shellless

A PHP package to extract readable text from HTML.

Installation

Run the following command:

composer require sukohi/shellless:1.*

Usage

use Sukohi\Shellless\Shellless;

require 'vendor/autoload.php';  // Composer autoloader

$html = file_get_contents('http://example.com/');
$shellless = new Shellless();
$result = $shellless->extract($html);

echo $result->title;          // Page title
echo $result->best_text;      // The longest extracted text
echo $result->full_text;      // Texts longer than 100 characters, joined together
print_r($result->all_texts);  // All extracted texts

Options

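$shellless->setOptions([
    'join_step' => 5,          // Join texts separated by fewer than this many HTML tags
    'min_text_length' => 100   // Keep only texts longer than this many characters
]);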

Algorithm

  1. Join adjacent texts if fewer than 5 HTML tags separate them.
  2. Keep only texts that are longer than 100 characters.
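The two steps above can be sketched in plain PHP as follows. This is a hypothetical illustration, not Shellless's actual code: the function name `sketchExtractTexts` is made up, tags are detected with a simple regular expression, `<script>` and `<style>` contents are not skipped, and length is measured in bytes with `strlen`. The parameters `$joinStep` and `$minTextLength` are assumed to play the roles of the `join_step` and `min_text_length` options.

```php
<?php
// Hypothetical sketch of the two-step algorithm; NOT Shellless's actual
// implementation. $joinStep and $minTextLength are assumed to mirror the
// join_step and min_text_length options.
function sketchExtractTexts(string $html, int $joinStep = 5, int $minTextLength = 100): array
{
    // Split the HTML into tag and text tokens, keeping the tags as tokens.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) ?: [];

    $groups = [];        // finished runs of joined text
    $current = null;     // run of text currently being built
    $tagsSinceText = 0;  // tags seen since the last non-empty text token

    foreach ($tokens as $token) {
        if (preg_match('/^<[^>]*>$/', $token)) {
            $tagsSinceText++;
            continue;
        }
        $text = trim($token);
        if ($text === '') {
            continue; // whitespace between tags does not count as text
        }
        if ($current !== null && $tagsSinceText < $joinStep) {
            // Step 1: fewer than $joinStep tags in between, so join the texts.
            $current .= ' ' . $text;
        } else {
            if ($current !== null) {
                $groups[] = $current;
            }
            $current = $text;
        }
        $tagsSinceText = 0;
    }
    if ($current !== null) {
        $groups[] = $current;
    }

    // Step 2: keep only runs longer than $minTextLength (strlen counts bytes).
    $kept = array_filter($groups, function (string $group) use ($minTextLength): bool {
        return strlen($group) > $minTextLength;
    });
    return array_values($kept);
}
```

Counting tags between texts lets a paragraph broken up by a few inline elements (such as `<b>` or `<a>`) stay in one piece, while text separated by heavier layout markup is kept apart before the length filter runs.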

License

This package is licensed under the MIT License.
Copyright 2017 Sukohi Kuhoh

The Versions

  • 1.0.x-dev (1.0.9999999.9999999-dev), March 13, 2017, MIT, by Sukohi
  • dev-master (9999999-dev), March 13, 2017, MIT, by Sukohi
  • 1.0.0 (1.0.0.0), March 13, 2017, MIT, by Sukohi