library diggin-http-charset

Detecting based on header's charset and html meta charset. Automatically convert to UTF-8.

diggin/diggin-http-charset

  • Saturday, January 13, 2018
  • by sasezaki
  • Repository
  • 2 Watchers
  • 7 Stars
  • 2,784 Installations
  • PHP
  • 5 Dependents
  • 0 Suggesters
  • 4 Forks
  • 0 Open issues
  • 5 Versions
  • 1 % Grown

The README.md

Diggin_Http_Charset

Automatically converts response bodies to UTF-8.

The charset is detected from the HTTP header's charset and the HTML meta charset.

(Several charsets are handled with extra care: SJIS-win, TIS-620 and others.)

This library is intended for use in web scraping.
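The detection idea described above can be sketched in plain PHP. This is a minimal illustration, not the library's implementation: the function names are hypothetical, the sketch assumes PHP 7.1 or later, and it assumes the header charset takes precedence over the meta charset, which the README does not state.

```php
<?php
// Hypothetical helper names; none of these are part of Diggin_Http_Charset.

/** Return the charset named in a Content-Type header value, or null. */
function sketch_charset_from_header(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

/** Return the charset declared in an HTML meta tag, or null. */
function sketch_charset_from_meta(string $html): ?string
{
    // Covers <meta charset="..."> and the http-equiv form.
    // Only the first 1024 bytes are inspected.
    $head = substr($html, 0, 1024);
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $head, $m)) {
        return $m[1];
    }
    return null;
}

/** Convert $html to UTF-8 (assumption: header first, then meta, else UTF-8). */
function sketch_to_utf8(string $html, ?string $contentType = null): string
{
    $charset = sketch_charset_from_header($contentType)
        ?? sketch_charset_from_meta($html)
        ?? 'UTF-8';
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $html;
    }
    // iconv returns false for unknown charsets or invalid input;
    // in that case the original bytes are returned untouched.
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}

// "caf\xE9" is "café" encoded as ISO-8859-1.
echo sketch_to_utf8("caf\xE9", 'text/html; charset=ISO-8859-1'), "\n";
```

A real scraper needs more care than this, for example mapping a declared Shift_JIS to a vendor variant such as SJIS-win, which is the kind of handling the library advertises.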

Requirements

  • PHP 5.3 or later
  • mbstring and iconv
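One way to confirm that both extensions are available before using the library is a small check like the following. The function name is hypothetical and the check is only a convenience sketch; it is not shipped with the library.

```php
<?php
/**
 * Return the names from $required that are not loaded in this PHP build.
 * Hypothetical helper for illustration only.
 */
function sketch_missing_extensions(array $required)
{
    $missing = array();
    foreach ($required as $ext) {
        if (!extension_loaded($ext)) {
            $missing[] = $ext;
        }
    }
    return $missing;
}

$missing = sketch_missing_extensions(array('mbstring', 'iconv'));
if ($missing) {
    echo 'Missing extensions: ', implode(', ', $missing), "\n";
} else {
    echo "mbstring and iconv are both loaded.\n";
}
```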

Usage

  1. Wrap the response object:
<?php
use Diggin\Http\Charset\WrapperFactory;

$url = 'http://example.com/'; // placeholder URL
$client = new Zend\Http\Client($url);
$response = $client->send();
// After wrapping, $response->getBody() returns the body converted to UTF-8.
$response = WrapperFactory::factory($response);

See demos/Diggin/Http/Charset for more examples.
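The wrapping step follows the decorator pattern: the wrapper exposes the same methods as the original response but converts the body on the way out. The sketch below illustrates that idea with hypothetical classes (SketchUtf8Response and SketchStubResponse). It is not the code behind WrapperFactory, and it takes the source charset as a constructor argument instead of detecting it.

```php
<?php
// Hypothetical decorator for illustration; not a class from Diggin\Http\Charset.
class SketchUtf8Response
{
    private $inner;
    private $charset;

    public function __construct($inner, $charset)
    {
        $this->inner = $inner;
        $this->charset = $charset;
    }

    // Return the wrapped body converted to UTF-8.
    public function getBody()
    {
        $body = $this->inner->getBody();
        $converted = @iconv($this->charset, 'UTF-8', $body);
        // Fall back to the original bytes if the conversion fails.
        return $converted === false ? $body : $converted;
    }

    // Forward every other method call to the wrapped response unchanged.
    public function __call($name, $args)
    {
        return call_user_func_array(array($this->inner, $name), $args);
    }
}

// Stand-in for an HTTP response object, used only in this example.
class SketchStubResponse
{
    public function getBody()
    {
        return "caf\xE9"; // "café" encoded as ISO-8859-1
    }

    public function getStatusCode()
    {
        return 200;
    }
}

$wrapped = new SketchUtf8Response(new SketchStubResponse(), 'ISO-8859-1');
echo $wrapped->getBody(), "\n";       // body is now UTF-8
echo $wrapped->getStatusCode(), "\n"; // forwarded to the stub
```

Because calling code keeps using getBody() as before, existing scraping logic does not need to change once the response has been wrapped.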

Guzzle & Goutte

guzzle-plugin-AutoCharsetEncodingPlugin adds support for use with Guzzle 3.

Usage with Behat, by @MugeSo

Technical Information

Diggin_Http_Charset is based on HTMLScraping.

  • http://www.rcdtokyo.com/etc/htmlscraping/

License

Diggin_Http_Charset is licensed under the LGPL (GNU Lesser General Public License).

Similar libraries

  • Perl: HTTP::Response::Encoding
    • http://search.cpan.org/dist/HTTP-Response-Encoding/
    • http://blog.livedoor.jp/dankogai/archives/50811793.html
  • Python: Universal Encoding Detector
    • http://chardet.feedparser.org/

TODOs

  • Handle content types other than text/html.
  • Improve the APIs and follow the ZF2 coding standard.
  • Support more charsets :-\

The Versions

13/01 2018

dev-master

9999999-dev

LGPL-2.0+

The Requires

  • php >=5.3.3

 

The Development Requires

scraper charset mbstring

19/07 2015

v0.8.2

0.8.2.0

LGPL-2.0+

The Requires

  • php >=5.3.3

 

The Development Requires

scraper charset mbstring

19/07 2015

v0.8.1

0.8.1.0

LGPL-2.0+

The Requires

  • php >=5.3.3

 

The Development Requires

scraper charset mbstring

09/03 2013

v0.8.0

0.8.0.0

LGPL-2.0+

The Requires

  • php >=5.3.3

 

The Development Requires

scraper charset mbstring

27/10 2012

v0.1.0

0.1.0.0

LGPL-2.0+

The Requires

  • php >=5.3.0

 

The Development Requires

scraper charset mbstring