Removing HTML from a Java String?

Is there a good way to remove HTML from a Java string?

This is actually very simple with Jsoup.

// Requires the jsoup library (import org.jsoup.Jsoup).
public static String html2text(String html) {
    return Jsoup.parse(html).text();
}

I think the simplest way to filter out the HTML tags is with a regular expression:

// Matches anything that looks like a tag: "<", the shortest possible run of characters, ">".
private static final Pattern REMOVE_TAGS = Pattern.compile("<.+?>");


public static String removeTags(String string) {
    // Nothing to strip from a null or empty string.
    if (string == null || string.length() == 0) {
        return string;
    }

    Matcher m = REMOVE_TAGS.matcher(string);
    return m.replaceAll("");
}
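
Putting the regex approach together, here is a small self-contained sketch that compiles on its own with just the JDK. The class name `TagStripper` is made up for illustration, and the pattern `<[^>]*>` is an assumption about what "a tag" looks like, not a real HTML grammar, so treat it as a quick filter rather than a parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is only for illustration.
final class TagStripper {

    // Assumed pattern: "<", then any run of characters other than ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    private TagStripper() {}

    static String removeTags(String input) {
        // Null and empty inputs are returned unchanged.
        if (input == null || input.isEmpty()) {
            return input;
        }
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeTags("<p>Hello <b>world</b>!</p>")); // Hello world!
        System.out.println(removeTags("Tom &amp; Jerry"));            // Tom &amp; Jerry
    }
}
```

Note the limits of this approach: entities such as `&amp;` are left undecoded, and a literal `<` in the text can be mistaken for the start of a tag (for example, `a < b and c > d` comes out as `a  d`). If the input is real-world HTML, a parser such as Jsoup is the safer choice.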