HTMLPurifier_Encoder::testIconvTruncateBug PHP Method

testIconvTruncateBug() public static method

glibc's iconv has a known bug in its handling of the //IGNORE suffix. Rather than skipping unconvertible characters, it returns EILSEQ after consuming some number of characters and expects the caller to restart the conversion, as if the error were E2BIG. Old versions of PHP did not check errno and returned the partial result, so iconv appeared to truncate output mysteriously. As long as PHP ignores the error code, this can be worked around by manually splitting the input into segments of about 8000 characters. If PHP starts honoring the error code, iconv becomes unusable.
public static testIconvTruncateBug ( ) : integer
return integer Error code indicating severity of bug: one of ICONV_OK, ICONV_TRUNCATES, or ICONV_UNUSABLE.
    public static function testIconvTruncateBug()
    {
        static $code = null;
        if ($code === null) {
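            // Call unsafeIconv() directly rather than the iconv() wrapper:
            // the wrapper consults this method, so using it here would loop forever.
            // The probe is one non-ASCII character followed by 9000 ASCII bytes,
            // so a correct //IGNORE conversion yields exactly 9000 bytes.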
            $r = self::unsafeIconv('utf-8', 'ascii//IGNORE', "α" . str_repeat('a', 9000));
            if ($r === false) {
                $code = self::ICONV_UNUSABLE;
            } elseif (($c = strlen($r)) < 9000) {
                $code = self::ICONV_TRUNCATES;
            } elseif ($c > 9000) {
                trigger_error('Your copy of iconv is extremely buggy. Please notify HTML Purifier maintainers: ' . 'include your iconv version as per phpversion()', E_USER_ERROR);
            } else {
                $code = self::ICONV_OK;
            }
        }
        return $code;
    }
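
The segmenting workaround described above can be sketched roughly as follows. This is a minimal illustration, not HTML Purifier's actual implementation: the helper names splitUtf8() and chunkedConvert() are hypothetical, the converter is passed in as a callable so the sketch does not depend on any particular iconv build, and $maxBytes is assumed to be at least 4 so that a segment can always hold one whole UTF-8 character.

```php
<?php
/**
 * Hypothetical helper: split a UTF-8 string into segments of at most
 * $maxBytes bytes without cutting a multi-byte character in half.
 * Assumes $maxBytes >= 4 (the longest UTF-8 sequence).
 */
function splitUtf8($text, $maxBytes)
{
    $chunks = array();
    $len = strlen($text);
    $pos = 0;
    while ($pos < $len) {
        $end = min($pos + $maxBytes, $len);
        // If the next segment would start on a continuation byte (10xxxxxx),
        // back up until it starts on a lead byte or an ASCII byte.
        while ($end < $len && $end > $pos && (ord($text[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        if ($end === $pos) {
            // Malformed input (only continuation bytes): force progress.
            $end = min($pos + $maxBytes, $len);
        }
        $chunks[] = substr($text, $pos, $end - $pos);
        $pos = $end;
    }
    return $chunks;
}

/**
 * Hypothetical helper: run $convert on each segment and join the results.
 * Returns false as soon as the converter reports failure for a segment.
 */
function chunkedConvert($convert, $text, $maxBytes = 8000)
{
    $out = '';
    foreach (splitUtf8($text, $maxBytes) as $chunk) {
        $piece = call_user_func($convert, $chunk);
        if ($piece === false) {
            return false;
        }
        $out .= $piece;
    }
    return $out;
}
```

A caller might pass a closure wrapping iconv, for example function ($s) { return @iconv('utf-8', 'ascii//IGNORE', $s); }. Whether that call truncates, fails, or succeeds depends on the installed iconv and PHP versions, which is what testIconvTruncateBug() probes.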

Usage Example

 /**
  * Convert a string to UTF-8 based on configuration.
  * @param string $str The string to convert
  * @param HTMLPurifier_Config $config
  * @param HTMLPurifier_Context $context
  * @return string
  */
 public static function convertToUTF8($str, $config, $context)
 {
     $encoding = $config->get('Core.Encoding');
     if ($encoding === 'utf-8') {
         return $str;
     }
     static $iconv = null;
     if ($iconv === null) {
         $iconv = self::iconvAvailable();
     }
     if ($iconv && !$config->get('Test.ForceNoIconv')) {
         // unaffected by bugs, since UTF-8 supports all characters
         $str = self::unsafeIconv($encoding, 'utf-8//IGNORE', $str);
         if ($str === false) {
             // $encoding is not a valid encoding
             trigger_error('Invalid encoding ' . $encoding, E_USER_ERROR);
             return '';
         }
         // If the string is bjorked by Shift_JIS or a similar encoding
         // that doesn't support all of ASCII, convert the naughty
         // characters to their true byte-wise ASCII/UTF-8 equivalents.
         $str = strtr($str, self::testEncodingSupportsASCII($encoding));
         return $str;
     } elseif ($encoding === 'iso-8859-1') {
         $str = utf8_encode($str);
         return $str;
     }
     $bug = HTMLPurifier_Encoder::testIconvTruncateBug();
     if ($bug == self::ICONV_OK) {
         trigger_error('Encoding not supported, please install iconv', E_USER_ERROR);
     } else {
         trigger_error('You have a buggy version of iconv, see https://bugs.php.net/bug.php?id=48147 ' . 'and http://sourceware.org/bugzilla/show_bug.cgi?id=13541', E_USER_ERROR);
     }
 }