Class Utils

java.lang.Object
org.htmlcleaner.Utils

public class Utils extends Object

Common utilities.

Created by: Vladimir Nikic
Date: November, 2006.
  • Field Details

    • HEX_STRICT

      public static Pattern HEX_STRICT
    • HEX_RELAXED

      public static Pattern HEX_RELAXED
    • DECIMAL

      public static Pattern DECIMAL
  • Constructor Details

    • Utils

      public Utils()
  • Method Details

    • isFullUrl

      public static boolean isFullUrl(String link)
      Checks whether the specified link is a full URL.
      Parameters:
      link - the link to check
      Returns:
      true if the link is a full URL, false otherwise.
    • fullUrl

      public static String fullUrl(String pageUrl, String link)
      Calculates the full URL for the specified page URL and link. The link may be full, absolute, or relative, as found in A or IMG tags. (Reinstated as per user request in bug 159.)
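      The difference between full, absolute, and relative links can be illustrated with the JDK's java.net.URI. This is only a sketch of the general technique, not the code behind isFullUrl or fullUrl: the class UrlResolutionSketch and its methods hasScheme and resolve are hypothetical names, and the library may accept a narrower set of schemes than URI does.

```java
import java.net.URI;

// Hypothetical illustration class; not part of org.htmlcleaner.
final class UrlResolutionSketch {

    // True when the link carries its own scheme (for example "http:"),
    // which is one reasonable reading of "full URL".
    static boolean hasScheme(String link) {
        if (link == null) {
            return false;
        }
        try {
            return URI.create(link.trim()).isAbsolute();
        } catch (IllegalArgumentException e) {
            return false; // not parseable as a URI at all
        }
    }

    // Resolves a full, absolute ("/img/a.png"), or relative ("img/a.png")
    // link against the URL of the page it was found on.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/docs/index.html";
        System.out.println(resolve(page, "http://other.org/a.png"));
        System.out.println(resolve(page, "/img/a.png"));
        System.out.println(resolve(page, "img/a.png"));
    }
}
```

      A link that already has a scheme is returned unchanged, an absolute path replaces the page's path, and a relative path is resolved against the page's directory.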
    • escapeHtml

      public static String escapeHtml(String s, CleanerProperties props)
      Escapes an HTML string.
      Parameters:
      s - String to be escaped
      props - Cleaner properties that affect escaping behaviour
      Returns:
      the escaped string
    • escapeXml

      public static String escapeXml(String s, CleanerProperties props, boolean isDomCreation)
      Escapes an XML string.
      Parameters:
      s - String to be escaped
      props - Cleaner properties that affect escaping behaviour
      isDomCreation - Tells if escaped content will be part of the DOM
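      As background for what XML escaping means at its simplest, the sketch below replaces the five characters that XML reserves with their predefined entities. It is an assumption-laden baseline, not the behaviour of escapeXml: the class XmlEscapeSketch and method escapeReserved are hypothetical, and none of the CleanerProperties options or the isDomCreation flag are modelled.

```java
// Hypothetical illustration class; not part of org.htmlcleaner.
final class XmlEscapeSketch {

    // Replaces &, <, >, " and ' with the five predefined XML entities.
    // Existing entities in the input are NOT recognised, so "&amp;"
    // would be escaped again to "&amp;amp;".
    static String escapeReserved(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeReserved("a < b & c"));
    }
}
```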
    • escapeXml

      public static String escapeXml(String s, boolean advanced, boolean recognizeUnicodeChars, boolean translateSpecialEntities, boolean isDomCreation, boolean transResCharsToNCR, boolean translateSpecialEntitiesToNCR)
      Change notes:
      1) Converts ASCII characters encoded in &#xx; format back to the ASCII characters, since the encoding may be an attempt to slip in malicious HTML.
      2) Converts &#xxx; format characters to the &quot; style representation, if one is available for the character.
      3) Converts HTML special entities to XML &#xxx; form when outputting XML.
      Parameters:
      s - String to be escaped
      advanced -
      recognizeUnicodeChars -
      translateSpecialEntities -
      isDomCreation - Tells if escaped content will be part of the DOM
      Returns:
      TODO Consider moving to CleanerProperties since a long list of params is misleading.
    • escapeXml

      public static String escapeXml(String s, boolean advanced, boolean recognizeUnicodeChars, boolean translateSpecialEntities, boolean isDomCreation, boolean transResCharsToNCR, boolean translateSpecialEntitiesToNCR, boolean isHtmlOutput)
      Change notes:
      1) Converts ASCII characters encoded in &#xx; format back to the ASCII characters, since the encoding may be an attempt to slip in malicious HTML.
      2) Converts &#xxx; format characters to the &quot; style representation, if one is available for the character.
      3) Converts HTML special entities to XML &#xxx; form when outputting XML.
      Parameters:
      s - String to be escaped
      advanced -
      recognizeUnicodeChars -
      translateSpecialEntities -
      isDomCreation - Tells if escaped content will be part of the DOM
      isHtmlOutput -
      Returns:
      TODO Consider moving to CleanerProperties since a long list of params is misleading.
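      The first change note, about characters hidden behind numeric references, can be made concrete with a small decoder. This is a sketch under stated assumptions, not the library's implementation: the class NumericReferenceSketch, its NCR pattern, and the decode method are hypothetical and do not correspond to the HEX_STRICT, HEX_RELAXED, or DECIMAL fields.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of org.htmlcleaner.
final class NumericReferenceSketch {

    // Matches decimal (&#60;) and hexadecimal (&#x3C;) character references.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Replaces each numeric reference with the character it denotes.
    // References outside the Unicode range (or to U+0000) are left as-is.
    static String decode(String s) {
        Matcher m = NCR.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            if (cp != 0 && Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The encoded form hides markup that only appears after decoding.
        System.out.println(decode("&#60;script&#62;"));
    }
}
```

      Because "&#60;script&#62;" decodes to "<script>", decoded text has to go through reserved-character escaping again before it is written out, which is the concern the change note raises.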
    • sanitizeXmlIdentifier

      public static String sanitizeXmlIdentifier(String attName)
    • sanitizeXmlIdentifier

      public static String sanitizeXmlIdentifier(String attName, String prefix)
    • sanitizeHtmlAttributeName

      public static String sanitizeHtmlAttributeName(String name)
    • isValidHtmlAttributeName

      public static boolean isValidHtmlAttributeName(String name)
    • sanitizeXmlIdentifier

      public static String sanitizeXmlIdentifier(String attName, String prefix, String replacementCharacter)
      Attempts to replace invalid attribute names with valid ones.
      Parameters:
      attName - the attribute name to fix
      prefix - the prefix to use to indicate an attribute name has been altered
      Returns:
    • isValidXmlIdentifier

      public static boolean isValidXmlIdentifier(String s)
      Checks whether the specified string can be a valid tag name or attribute name in XML.
      Parameters:
      s - String to be checked
      Returns:
      true if the string is a valid XML identifier, false otherwise
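      A rough idea of what such a check involves is sketched below. It approximates the XML 1.0 Name rules with java.lang.Character tests and is not the library's implementation: the class XmlNameSketch and its methods are hypothetical, and the real method may treat colons or non-ASCII characters differently.

```java
// Hypothetical illustration class; not part of org.htmlcleaner.
final class XmlNameSketch {

    // Approximation: a name may start with a letter, '_' or ':'.
    static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    // Approximation: later characters may also be digits, '-' or '.'.
    static boolean isNameChar(char c) {
        return isNameStart(c) || Character.isDigit(c) || c == '-' || c == '.';
    }

    static boolean looksLikeXmlName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        if (!isNameStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeXmlName("data-id")); // valid under this approximation
        System.out.println(looksLikeXmlName("1abc"));    // starts with a digit
    }
}
```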
    • isEmptyString

      public static boolean isEmptyString(Object o)
      Parameters:
      o -
      Returns:
      true if the specified string is null or contains only whitespace characters
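      The documented contract (null or whitespace-only counts as empty) can be sketched in a few lines. The class BlankCheckSketch and method isBlank are hypothetical, and the use of Character.isWhitespace as the definition of whitespace is an assumption rather than something this page states.

```java
// Hypothetical illustration class; not part of org.htmlcleaner.
final class BlankCheckSketch {

    // True for null, and for objects whose string form has no
    // non-whitespace character (including the empty string).
    static boolean isBlank(Object o) {
        if (o == null) {
            return true;
        }
        String s = String.valueOf(o);
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBlank(" \t\n"));
        System.out.println(isBlank(" x "));
    }
}
```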
    • tokenize

      public static String[] tokenize(String s, String delimiters)
    • isXmlReservedCharacter

      public static boolean isXmlReservedCharacter(String c)
    • getXmlNSPrefix

      public static String getXmlNSPrefix(String name)
      Parameters:
      name - XML element name or attribute name
      Returns:
      the prefix (the part before ':') of the name, or null if there is no prefix
    • getXmlName

      public static String getXmlName(String name)
      Parameters:
      name - XML element name or attribute name
      Returns:
      the name after the prefix (the part after ':')
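      Splitting a qualified name at the colon, as getXmlNSPrefix and getXmlName describe, might look like the sketch below. The class QNameSketch and its methods are hypothetical; in particular, returning the whole name when there is no prefix and ignoring a leading colon are assumptions, since this page does not specify those cases.

```java
// Hypothetical illustration class; not part of org.htmlcleaner.
final class QNameSketch {

    // Part before the first ':' or null when there is no prefix.
    static String prefixOf(String name) {
        int i = name.indexOf(':');
        return i > 0 ? name.substring(0, i) : null;
    }

    // Part after the first ':'; the whole name when there is no prefix
    // (assumption: a leading ':' is not treated as a prefix separator).
    static String localNameOf(String name) {
        int i = name.indexOf(':');
        return i > 0 ? name.substring(i + 1) : name;
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("xlink:href"));
        System.out.println(localNameOf("xlink:href"));
    }
}
```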
    • ltrim

      public static String ltrim(String s)
      Trims the specified string from the left.
      Parameters:
      s - the string to trim
    • rtrim

      public static String rtrim(String s)
      Trims the specified string from the right.
      Parameters:
      s - the string to trim
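      One-sided trimming can be sketched as follows. The class TrimSketch and its methods trimLeft and trimRight are hypothetical stand-ins; the null handling and the use of Character.isWhitespace are assumptions, not documented behaviour of ltrim or rtrim.

```java
// Hypothetical illustration class; not part of org.htmlcleaner.
final class TrimSketch {

    // Drops leading whitespace only.
    static String trimLeft(String s) {
        if (s == null) {
            return null;
        }
        int start = 0;
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        return s.substring(start);
    }

    // Drops trailing whitespace only.
    static String trimRight(String s) {
        if (s == null) {
            return null;
        }
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimLeft("  a b ") + "]");
        System.out.println("[" + trimRight("  a b ") + "]");
    }
}
```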
    • isWhitespaceString

      public static boolean isWhitespaceString(Object object)
      Checks whether the specified object's string representation is an empty string (consisting only of whitespace).
      Parameters:
      object - Object whose string representation is checked
      Returns:
      true, if empty string, false otherwise
    • deserializeEntities

      public static String deserializeEntities(String str, boolean recognizeUnicodeChars)
    • isValidXmlIdentifierStartChar

      public static boolean isValidXmlIdentifierStartChar(String identifier)
      Determines whether the initial character of an identifier is valid for XML.
      Parameters:
      identifier -
      Returns:
    • replaceInvalidXmlIdentifierCharacters

      public static String replaceInvalidXmlIdentifierCharacters(String name, String replacement)
      Strips out invalid characters from names used for XML Elements and replaces them with the specified character. For example, "<p%>" becomes ""
      Parameters:
      name -
      Returns:
      valid XML name
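      The general idea of replacing disallowed characters can be sketched as below. This is not the library's implementation and is not claimed to reproduce the example given above: the class NameCleanupSketch, its allowed-character rule, and the method replaceInvalid are all assumptions made for illustration.

```java
// Hypothetical illustration class; not part of org.htmlcleaner.
final class NameCleanupSketch {

    // Assumed rule: letters, digits, '_', '-', '.' and ':' are kept.
    static boolean isAllowed(char c) {
        return Character.isLetterOrDigit(c)
                || c == '_' || c == '-' || c == '.' || c == ':';
    }

    // Replaces every character outside the assumed rule with the given
    // replacement; an empty replacement simply removes the character.
    static String replaceInvalid(String name, String replacement) {
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (isAllowed(c)) {
                out.append(c);
            } else {
                out.append(replacement);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceInvalid("p%", "_"));
        System.out.println(replaceInvalid("a b", ""));
    }
}
```

      Note that a result built this way can still begin with a digit or other character that is not allowed at the start of a name, so a separate start-character check remains necessary.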