DEPRECATED ⛔
This repository has been deprecated as of 2019-01-27. That code was written a long time ago and has been unmaintained for several years.
Thus, repository will now be archived.
If you are interested in taking over ownership, feel free to contact me., (*1)
guzzle-auto-charset-encoding-subscriber
, (*2)
Plugin for Guzzle 4/5 to automatically convert the body of a reponse according to a predefined charset., (*3)
Description
Getting charsets right is hard. In a perfect world, everybody would use unicode (UTF-8) as character encoding for textual web content but that's just not
gonna happen in the near future, so we have to deal with a lot of different encodings in the wild. Unfortunately that's another layer of complexity on top
of my application and I really just want it to "work right"., (*4)
I'm using Guzzle as an underlying library for dealing with HTTP requests and my whole application relies on content beeing encoded in UTF-8. In my locale (Germany)
ISO-8859-1 is still widely used and it really messes up the content of an HTTP response, because Guzzle won't automatically convert ISO-8859-1 to my internally used
UTF-8. So I decided to write this little plugin to convert any input encoding automatically to another output encoding. Headers and meta tags can be optionally adjusted as well., (*5)
Basic Usage
$client = new Client();
$converter = new EncodingConverter("utf-8"); // define desired output encoding
$sub = new GuzzleAutoCharsetEncodingSubscriber($converter);
$url = "http://www.myseosolution.de/scripts/encoding-test.php?enc=iso"; // request website with iso-8859-1 encoding
$req = $client->createRequest("GET", $url);
$req->getEmitter()->attach($sub);
$resp = $client->send($req);
Requirements
Installation
The recommended way to install guzzle-auto-charset-encoding-subscriber is through Composer., (*6)
curl -sS https://getcomposer.org/installer | php
Next, update your project's composer.json file to include GuzzleAutoCharsetEncodingSubscriber:, (*7)
{
"repositories": [ { "type": "composer", "url": "http://packages.myseosolution.de/"} ],
"minimum-stability": "dev",
"require": {
"paslandau/guzzle-auto-charset-encoding-subscriber": "dev-master"
}
"config": {
"secure-http": false
}
}
Caution: You need to explicitly set "secure-http": false
in order to access http://packages.myseosolution.de/ as repository.
This change is required because composer changed the default setting for secure-http
to true at the end of february 2016., (*8)
After installing, you need to require Composer's autoloader:, (*9)
require 'vendor/autoload.php';
Examples
Let's have a look a the differences between a 'normal' guzzle request and a request with a guzzle-auto-charset-encoding-subscriber at first:, (*10)
$client = new Client();
$converter = new EncodingConverter("utf-8",true,true); // define desired output encoding
$sub = new GuzzleAutoCharsetEncodingSubscriber($converter);
$url = "http://www.myseosolution.de/scripts/encoding-test.php?enc=iso";
$tests = [
"Using unmodified Guzzle request" => null,
"Using guzzle-auto-charset-encoding-subscriber" => $sub,
];
foreach($tests as $name => $subscriber) {
$req = $client->createRequest("GET", $url);
if($subscriber !== null) {
$req->getEmitter()->attach($sub);
}
$resp = $client->send($req);
echo " $name\n";
echo " Request to $url:\n";
echo " Content-Type: " . $resp->getHeader("content-type") . "\n\n";
echo $resp->getBody()."\n\n";
}
Output (assuming your editor uses UTF-8 as default), (*11)
Using unmodified Guzzle request
Request to http://www.myseosolution.de/scripts/encoding-test.php?enc=iso:
Content-Type: text/html; charset=iso-8859-1; someOtherRandom="header in here"
<!DOCTYPE html>
<html>
<head>
<meta charset="iso-8859-1">
<title>Umlauts everywhere �������</title>
</head>
<body>
<h1>�������</h1>
</body>
</html>
Using guzzle-auto-charset-encoding-subscriber
Request to http://www.myseosolution.de/scripts/encoding-test.php?enc=iso:
Content-Type: text/html; charset=utf-8; someOtherRandom="header in here"
<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8' >
<title>Umlauts everywhere öäüßÖÄÜ</title>
</head>
<body>
<h1>öäüßÖÄÜ</h1>
</body>
</html>
The requested website delivers content in the ISO-8859-1 encoding. The unmodified guzzle request passes exactly what it gets from the website back to us.
If we're expecting UTF-8 encoded content, we will get the "garbage" result shown above, since the german umlauts won't be recognized. Using the guzzle-auto-charset-encoding-subscriber
will convert the result from the encoding it finds in either the content-type
header or the websites meta
tags. To minimize compatibility issues on
subsequent components, the plugin also adjusted the content-type
header and the <meta charset='..'>
tag to UTF-8., (*12)
The behaviour of the plugin can be modified as follows:, (*13)
By default, the content-type
header is adjusted when the guzzle-auto-charset-encoding-subscriber converts the body of a request into another encoding.
You can prevent this behaviour by setting the $replaceHeaders
parameter to false
:, (*14)
$client = new Client();
$replaceHeaders = false; // prevent the replacement of the content-type header
$converter = new EncodingConverter("utf-8",$replaceHeaders);
$sub = new GuzzleAutoCharsetEncodingSubscriber($converter);
$url = "http://www.myseosolution.de/scripts/encoding-test.php?enc=iso";
$req = $client->createRequest("GET", $url);
$req->getEmitter()->attach($sub);
$resp = $client->send($req);
By default, the content of a document is not modified (apart from being converted into another encoding). You can explicitly force the guzzle-auto-charset-encoding-subscriber
to adjust the meta
tags within a document to reflect the new encoding by setting the $replaceContent
parameter to true
:, (*15)
$client = new Client();
$replaceHeaders = null; // default
$replaceContent = true; // force the replacement of the meta tags within the content
$converter = new EncodingConverter("utf-8",$replaceHeaders, $replaceContent);
$sub = new GuzzleAutoCharsetEncodingSubscriber($converter);
$url = "http://www.myseosolution.de/scripts/encoding-test.php?enc=iso";
$req = $client->createRequest("GET", $url);
$req->getEmitter()->attach($sub);
$resp = $client->send($req);
Currently, 3 different cases are handled/recognized:, (*16)
- HTML 4 (uses
<meta http-equiv='content-type' content='text/html; charset=utf-8'>
)
- HTML 5 (uses
<meta charset='utf-8'>
)
- XML (uses
<?xml version='1.0' encoding='utf-8' ?>
)
Some websites use no (or wrong) values for the content-type
header or the meta
tags. In those cases, the guzzle-auto-charset-encoding-subscriber can be configured to
assume a default encoding:, (*17)
$client = new Client();
$replaceHeaders = null; // default
$replaceContent = null; // default
$fixedInputEncoding = "iso-8859-1"; // assume "iso-8859-1" as default encoding
$converter = new EncodingConverter("utf-8",$replaceHeaders, $replaceContent,$fixedInputEncoding);
$sub = new GuzzleAutoCharsetEncodingSubscriber($converter);
$url = "http://www.myseosolution.de/scripts/encoding-test.php?enc=iso&header=false&meta=false"; // hide charset from header and meta tags
$req = $client->createRequest("GET", $url);
$req->getEmitter()->attach($sub);
$resp = $client->send($req);
Frequently searched questions
- How to change the reponse charset/encoding in Guzzle?
- How to convert the charset/encoding of an response in Guzzle?
- How to force an input/output encoding/charset in Guzzle?
- Guzzle responses appear malformed due to encoding/charset - what to do?