Html2Text\Html2Text::fixMSEncoding PHP Method

fixMSEncoding() static public method

Any p element with a class of MsoNormal (the standard class name in Microsoft exports and Outlook for a paragraph that behaves like a line return) is replaced by its text content followed by a line break.
The cleaned-up document can then be processed as normal through Html2Text.
static public fixMSEncoding ( DOMDocument $doc ) : DOMDocument
$doc DOMDocument the document to clean up
return DOMDocument the modified document with fewer unnecessary paragraphs
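    public static function fixMSEncoding($doc)
    {
        $paras = $doc->getElementsByTagName('p');
        // The node list is live, so iterate backwards while replacing nodes.
        for ($i = $paras->length - 1; $i >= 0; $i--) {
            $para = $paras->item($i);
            // Matches only an exact class attribute of "MsoNormal".
            if ($para->getAttribute('class') == 'MsoNormal') {
                // Replace the paragraph with its plain text followed by a <br>.
                $fragment = $doc->createDocumentFragment();
                $fragment->appendChild($doc->createTextNode($para->nodeValue));
                $fragment->appendChild($doc->createElement('br'));
                $para->parentNode->replaceChild($fragment, $para);
            }
        }
        // Re-parse the serialized markup so the returned document is rebuilt from the cleaned-up HTML.
        $doc->loadHTML($doc->saveHTML());
        return $doc;
    }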
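
The replacement step can be tried in isolation with a minimal sketch like the one below. It assumes only PHP's bundled DOM extension. The function name flattenMsoNormal and the sample markup are hypothetical, introduced for illustration; the function is a standalone copy of the loop above so that it runs without the Html2Text package, and it skips the final re-parse.

```php
<?php
// Hypothetical standalone copy of the loop in fixMSEncoding(), so the sketch
// runs without the Html2Text package. The final re-parse step is omitted.
function flattenMsoNormal(DOMDocument $doc): DOMDocument
{
    $paras = $doc->getElementsByTagName('p');
    // The node list is live, so walk it backwards while replacing nodes.
    for ($i = $paras->length - 1; $i >= 0; $i--) {
        $para = $paras->item($i);
        if ($para->getAttribute('class') == 'MsoNormal') {
            $fragment = $doc->createDocumentFragment();
            $fragment->appendChild($doc->createTextNode($para->nodeValue));
            $fragment->appendChild($doc->createElement('br'));
            $para->parentNode->replaceChild($fragment, $para);
        }
    }
    return $doc;
}

// Assumed sample input: two Outlook-style lines and one ordinary paragraph.
$html = '<html><body>'
      . '<p class="MsoNormal">Hello,</p>'
      . '<p class="MsoNormal">See you on Monday.</p>'
      . '<p>Regards</p>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$doc = flattenMsoNormal($doc);

// Print the body to inspect the result: the two MsoNormal paragraphs are
// now text followed by <br>, and the ordinary paragraph is left alone.
$body = $doc->getElementsByTagName('body')->item(0);
$bodyHtml = $doc->saveHTML($body);
echo $bodyHtml, "\n";
```

Because the replacement uses nodeValue, any inline markup inside a MsoNormal paragraph (bold text, links) is flattened to plain text, which is worth keeping in mind if that formatting matters downstream.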