Jetpack_Media_Meta_Extractor::get_images_from_html PHP Method

get_images_from_html() public static method

public static get_images_from_html ( string $html, array $images_already_extracted ) : array
$html string Some markup, possibly containing image tags
$images_already_extracted array Image URLs already collected, without query strings (a flat array, no special structure), used for de-duplication
return array Image URLs extracted from the HTML, stripped of query params and de-duped
    public static function get_images_from_html($html, $images_already_extracted)
    {
        $image_list = $images_already_extracted;
        $from_html = Jetpack_PostImages::from_html($html);
        if (!empty($from_html)) {
            $srcs = wp_list_pluck($from_html, 'src');
            foreach ($srcs as $image_url) {
                if (($src = parse_url($image_url)) && isset($src['scheme'], $src['host'], $src['path'])) {
                    // Rebuild the URL without the query string
                    $queryless = $src['scheme'] . '://' . $src['host'] . $src['path'];
                } elseif ($length = strpos($image_url, '?')) {
                    // If parse_url() didn't work, strip off the query string the old fashioned way
                    $queryless = substr($image_url, 0, $length);
                } else {
                    // Failing that, there was no spoon! Err ... query string!
                    $queryless = $image_url;
                }
                // Discard URLs that are longer than 4 KB; these are likely data URIs or malformed HTML.
                if (4096 < strlen($queryless)) {
                    continue;
                }
                if (!in_array($queryless, $image_list)) {
                    $image_list[] = $queryless;
                }
            }
        }
        return $image_list;
    }
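
The query-stripping step in the loop above can be tried in isolation. The following is a minimal sketch, not part of Jetpack: the helper name strip_image_query() is hypothetical, and it only mirrors the three branches of the method (rebuild from parse_url(), fall back to cutting at the first '?', otherwise keep the URL unchanged) for a single URL, without the WordPress dependencies.

```php
<?php
// Hypothetical helper (not a Jetpack function): mirrors the three-branch
// query-stripping logic of get_images_from_html() for one URL.
function strip_image_query($image_url)
{
    $src = parse_url($image_url);
    if ($src && isset($src['scheme'], $src['host'], $src['path'])) {
        // Absolute URL: rebuild it from scheme, host and path only.
        return $src['scheme'] . '://' . $src['host'] . $src['path'];
    }

    $length = strpos($image_url, '?');
    if ($length) {
        // Relative or unparseable URL: cut at the first '?'.
        return substr($image_url, 0, $length);
    }

    // No query string found: return the URL unchanged.
    return $image_url;
}

var_dump(strip_image_query('http://example.com/a.png?w=640'));
var_dump(strip_image_query('/uploads/a.png?w=640'));
var_dump(strip_image_query('a.png'));
```

Because the first branch rebuilds the URL from scheme, host and path only, any port or fragment in the original URL is dropped along with the query string.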

Usage Example

Example #1
    /**
     * @author scotchfield
     * @covers Jetpack_Media_Meta_Extractor::get_images_from_html
     * @since 3.2
     */
    function test_extract_image_from_html()
    {
        $html = <<<EOT
<p><a href="http://paulbernal.files.wordpress.com/2013/05/mr-gove-cover.jpeg"><img class="aligncenter size-full wp-image-1027" alt="Mr Gove Cover" src="http://paulbernal.files.wordpress.com/2013/05/mr-gove-cover.jpeg?w=640" /></a></p>
<p>Mr Gove was extraordinarily arrogant.</p>
<p>Painfully arrogant.</p>
<p>He believed that he knew how everything should be done. He believed that everyone else in the world was stupid and ignorant.</p>
<p>The problem was, Mr Gove himself was the one who was ignorant.</p>
<p><a href="http://paulbernal.files.wordpress.com/2013/05/mr-gove-close-up.jpeg"><img class="aligncenter size-full wp-image-1030" alt="Mr Gove Close up" src="http://paulbernal.files.wordpress.com/2013/05/mr-gove-close-up.jpeg?w=640" /></a></p>
<p>He got most of his information from his own, misty, memory.</p>
<p>He thought he remembered what it had been like when he had been at school &#8211; and assumed that everyone else&#8217;s school should be the same.</p>
<p>He remembered the good things about his own school days, and thought that everyone should have the same.</p>
<p>He remembered the bad things about his own school days, and thought that it hadn&#8217;t done him any harm &#8211; and that other children should suffer the way that he had.</p>
EOT;
        $expected = array(0 => 'http://images-r-us.com/some-image.png', 1 => 'http://paulbernal.files.wordpress.com/2013/05/mr-gove-cover.jpeg', 2 => 'http://paulbernal.files.wordpress.com/2013/05/mr-gove-close-up.jpeg');
        $already_extracted_images = array('http://images-r-us.com/some-image.png');
        $result = Jetpack_Media_Meta_Extractor::get_images_from_html($html, $already_extracted_images);
        $this->assertEquals($expected, $result);
    }
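
The test above relies on two properties of the result: previously extracted URLs keep their position at the front of the list, and each new URL is appended at most once. The sketch below illustrates just that merge step under stated assumptions. The function name merge_image_urls() and its $max_length parameter are hypothetical, the candidate URLs are assumed to be stripped of query strings already, and in_array() is called in strict mode here, whereas the method itself uses the default loose comparison.

```php
<?php
// Hypothetical helper (not a Jetpack function): appends query-less URLs to an
// existing list, skipping over-long entries and duplicates.
function merge_image_urls(array $already_extracted, array $candidates, $max_length = 4096)
{
    $image_list = $already_extracted;
    foreach ($candidates as $url) {
        // Skip anything longer than the cap (likely a data URI or broken markup).
        if (strlen($url) > $max_length) {
            continue;
        }
        // Append only URLs that are not in the list yet.
        if (!in_array($url, $image_list, true)) {
            $image_list[] = $url;
        }
    }
    return $image_list;
}

$merged = merge_image_urls(
    array('http://images-r-us.com/some-image.png'),
    array(
        'http://example.com/a.jpeg',
        'http://images-r-us.com/some-image.png',
        'http://example.com/a.jpeg',
    )
);
print_r($merged);
```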