Remove Microsoft Word HTML tags

The following function takes some nightmarish Word HTML and returns clean HTML that you can safely use on the web.
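function cleanHTML($html) {
 // ereg_replace() was removed in PHP 7, so preg_replace() is used instead.
 // Strip <font>, <span>, <del> and <ins> tags (opening and closing), keeping their contents.
 $html = preg_replace('/<\/?(font|span|del|ins)[^>]*>/i', '', $html);
 // Strip class, lang, style, size and face attributes while keeping the rest of the tag.
 // Each pass removes one attribute per tag, so the replacement runs twice.
 $attr = '/<([^>]*)(class|lang|style|size|face)=("[^"]*"|\'[^\']*\'|[^\s>]+)([^>]*)>/i';
 $html = preg_replace($attr, '<$1$4>', $html);
 $html = preg_replace($attr, '<$1$4>', $html);
 return $html;
}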
Source: http://tim.mackey.ie/CommentView,guid,2ece42de-a334-4fd0-8f94-53c6602d5718.aspx
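
To see what this kind of cleanup does on a concrete fragment, here is a minimal sketch. It is not the original function: the helper name cleanWordHtmlSketch and the sample markup are made up for illustration, it uses preg_replace (the ereg_* functions were removed in PHP 7), and it repeats the attribute pass until nothing is left to strip rather than running it a fixed number of times.

```php
<?php
// Hypothetical helper, for illustration only.
function cleanWordHtmlSketch(string $html): string {
    // Drop <font>, <span>, <del> and <ins> tags, keeping the text inside them.
    $html = preg_replace('/<\/?(font|span|del|ins)[^>]*>/i', '', $html);

    // Match one class/lang/style/size/face attribute inside a tag.
    // $1 is everything before the attribute, $2 everything after it.
    $attr = '/<([^>]*)\s(?:class|lang|style|size|face)=(?:"[^"]*"|\'[^\']*\'|[^\s>]+)([^>]*)>/i';

    // Each pass removes at most one attribute per tag, so loop until no match remains.
    do {
        $html = preg_replace($attr, '<$1$2>', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

// Made-up sample of Word-style markup.
$word = '<p class="MsoNormal" style="margin:0"><span lang="EN-US">Hello <font face="Arial">world</font></span></p>';
echo cleanWordHtmlSketch($word), "\n"; // prints: <p>Hello world</p>
```

Looping on the match count means a tag carrying three or more of these attributes is still fully cleaned, and keeping both capture groups in the replacement preserves unrelated attributes such as href.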