This Gist contains Aspose.HTML for Java examples for working with HTML documents.

Aspose.HTML for Java – Working with HTML Documents

This GitHub gist repository features Java code examples that are referenced in the Working with Documents chapter of the Aspose.HTML for Java documentation. It is intended for Java developers looking to handle and manage HTML documents programmatically. The examples showcase the core functionalities of the Aspose.HTML for Java library, allowing you to load, manipulate, and save web content with precision and ease.

What's Inside

The gists in this collection cover the following key topics for working with HTML documents in Java:

  • Loading HTML, EPUB, and MHTML documents from various sources.
  • Creating, modifying, and deleting HTML elements dynamically.
  • Navigating the Document Object Model (DOM) tree using W3C-compliant traversal interfaces to inspect and retrieve content from HTML documents.
  • Customizing the rendering environment by setting user-defined style sheets, font folders, message handlers, sandboxing options, and other parameters.
  • Saving HTML documents in multiple formats, including HTML, MHTML, and Markdown.
  • Saving HTML documents along with all linked resources, including CSS, fonts, and images, using customizable saving options.
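
The W3C traversal interfaces mentioned in the list above (TreeWalker, NodeIterator, NodeFilter) are also available in the JDK's built-in XML DOM. The following is a minimal sketch of that traversal style using only the JDK, not Aspose.HTML; the class name `DomTraversalSketch`, the helper `elementNames`, and the sample markup are assumptions made for illustration, and the input must be well-formed XHTML because the JDK parser is an XML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

public class DomTraversalSketch {

    // Returns the element names of a well-formed XHTML string in document order.
    // Hypothetical helper for illustration; it is not part of Aspose.HTML.
    static List<String> elementNames(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            // The JDK's built-in DOM implementation also implements DocumentTraversal
            DocumentTraversal traversal = (DocumentTraversal) doc;
            // Visit element nodes only, starting at the root element
            TreeWalker walker = traversal.createTreeWalker(
                    doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, false);
            List<String> names = new ArrayList<>();
            for (Node node = walker.getCurrentNode(); node != null; node = walker.nextNode()) {
                names.add(node.getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample markup", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Demo</title></head>"
                + "<body><p>First</p><p>Second</p></body></html>";
        // prints [html, head, title, body, p, p]
        System.out.println(elementNames(xhtml));
    }
}
```

A TreeWalker visits nodes in document order and applies the `whatToShow` mask, so text nodes are skipped here without any extra filtering code.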

How to Get Started

Each gist is a self-contained executable example. To use them:

  • Make sure your Java project includes the Aspose.HTML for Java library.
  • You can easily add it to your project using a dependency management tool, such as Maven or Gradle. For more detailed tutorials, see the installation guide.
  • Select the code snippet that corresponds to the task you want to perform.
  • Copy the example into your project and run it to see the functionality in action.

You can download a free trial of Aspose.HTML for Java and use a temporary license for unlimited access.

About Aspose.HTML for Java

Aspose.HTML for Java is a robust, on-premise Java library designed for parsing, navigating, processing, and converting HTML and related formats. It offers comprehensive APIs for the Document Object Model (DOM), CSS selector queries, and XPath-style traversal helpers, which simplify the inspection, retrieval, and manipulation of content from web pages and HTML documents.

Prerequisites

To run the examples, you need:

  • Java Runtime Environment: J2SE 1.8 or higher.
  • Supported OS: Windows, macOS, Linux.
  • Development Environment: Any Java IDE (e.g., IntelliJ IDEA, Eclipse, NetBeans).
Aspose.HTML for Java - Working with HTML Documents
// Apply a custom user stylesheet to HTML content and convert it to PDF using Java
// Learn more: https://docs.aspose.com/html/java/environment-configuration/
// Prepare HTML code and save it to a file
String code = "<span>Hello, World!!!</span>";
try (java.io.FileWriter fileWriter = new java.io.FileWriter("user-agent-stylesheet.html")) {
    fileWriter.write(code);
}
// Create an instance of the Configuration class
Configuration configuration = new Configuration();
// Get the IUserAgentService
IUserAgentService userAgent = configuration.getService(IUserAgentService.class);
// Set a custom color for the <span> element
userAgent.setUserStyleSheet("span { color: green; }");
// Initialize an HTML document with specified configuration
HTMLDocument document = new HTMLDocument("user-agent-stylesheet.html", configuration);
// Convert HTML to PDF
Converter.convertHTML(document, new PdfSaveOptions(), "user-agent-stylesheet_out.pdf");
// How to disable scripts for HTML to PDF conversion using Java
// Learn more: https://docs.aspose.com/html/java/environment-configuration/
// Prepare HTML code and save it to a file
String code = "<span>Hello, World!!</span>\n" +
        "<script>document.write('Have a nice day!');</script>\n";
try (java.io.FileWriter fileWriter = new java.io.FileWriter("sandboxing.html")) {
    fileWriter.write(code);
}
// Create an instance of the Configuration class
Configuration configuration = new Configuration();
// Mark 'scripts' as an untrusted resource
configuration.setSecurity(com.aspose.html.Sandbox.Scripts);
// Initialize an HTML document with specified configuration
HTMLDocument document = new HTMLDocument("sandboxing.html", configuration);
// Convert HTML to PDF
Converter.convertHTML(document, new PdfSaveOptions(), "sandboxing_out.pdf");
// Create an empty HTML document using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Initialize an empty HTML Document
HTMLDocument document = new HTMLDocument();
// Save the document to disk
document.save("create-empty-document.html");
// Create HTML from a string using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Prepare HTML code
String htmlCode = "<p>Hello, World!</p>";
// Initialize a document from a string variable
HTMLDocument document = new HTMLDocument(htmlCode, ".");
// Save the document to disk
document.save("create-from-string.html");
// Create HTML document using DOM methods and convert to PDF in Aspose.HTML for Java
// Learn more: https://docs.aspose.com/html/java/edit-a-document/
// Create an instance of the HTMLDocument class
HTMLDocument document = new HTMLDocument();
// Create a style element that colors every element with the class name "gr" green
Element style = document.createElement("style");
style.setTextContent(".gr { color: green }");
// Find the document's <head> element and append the style element to it
Element head = document.getElementsByTagName("head").get_Item(0);
head.appendChild(style);
// Create a paragraph element with class-name "gr"
HTMLParagraphElement p = (HTMLParagraphElement) document.createElement("p");
p.setClassName("gr");
// Create a text node
Text text = document.createTextNode("Hello, World!!");
// Append the text node to the paragraph
p.appendChild(text);
// Append the paragraph to the document body element
document.getBody().appendChild(p);
// Save the HTML document to a file
document.save("using-dom.html");
// Create an instance of the PDF output device that writes to "using-dom.pdf"
PdfDevice device = new PdfDevice("using-dom.pdf");
// Render HTML to PDF
document.renderTo(device);
// Create an HTML document using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Initialize an empty HTML document
HTMLDocument document = new HTMLDocument();
// Create a text node and add it to the document
Text text = document.createTextNode("Hello, World!");
document.getBody().appendChild(text);
// Save the document to disk
document.save("create-new-document.html");
// Create an instance of the HTMLDocument class
// Learn more: https://docs.aspose.com/html/java/create-a-document/
HTMLDocument document = new HTMLDocument();
// Monitor object and flag shared by the event handler and the waiting thread
final Object lock = new Object();
final boolean[] loaded = {false};
// Subscribe to the 'ReadyStateChange' event, which is fired during the document loading process
document.OnReadyStateChange.add(new DOMEventHandler() {
    @Override
    public void invoke(Object sender, Event e) {
        // Check the 'ReadyState' property, which reports the loading status of the document
        // For details, see https://www.w3schools.com/jsref/prop_doc_readystate.asp
        if (document.getReadyState().equals("complete")) {
            System.out.println(document.getDocumentElement().getOuterHTML());
            // Wake up the waiting thread; notifyAll() must be called while holding the monitor
            synchronized (lock) {
                loaded[0] = true;
                lock.notifyAll();
            }
        }
    }
});
// Navigate asynchronously to the specified URI
document.navigate("https://html.spec.whatwg.org/multipage/introduction.html");
// Wait up to 10 seconds, unless the document has already finished loading
synchronized (lock) {
    if (!loaded[0]) {
        lock.wait(10000);
    }
}
// Edit inline CSS of an HTML element and render HTML to PDF using Aspose.HTML for Java
// Learn more: https://docs.aspose.com/html/java/edit-a-document/
// Create an instance of an HTML document with specified content
String content = "<p> Inline CSS </p>";
HTMLDocument document = new HTMLDocument(content, ".");
// Find the paragraph element to set a style attribute
HTMLElement paragraph = (HTMLElement) document.getElementsByTagName("p").get_Item(0);
// Set the style attribute
paragraph.setAttribute("style", "font-size: 250%; font-family: verdana; color: #cd66aa");
// Save the HTML document to a file
document.save("edit-inline-css.html");
// Create an instance of the PDF output device that writes to "edit-inline-css.pdf", then render the document to it
PdfDevice device = new PdfDevice("edit-inline-css.pdf");
document.renderTo(device);
// Edit HTML with internal CSS using Java
// Learn more: https://docs.aspose.com/html/java/edit-a-document/
// Create an instance of an HTML document with specified content
String content = "<div><p>Internal CSS</p><p>An internal CSS is used to define a style for a single HTML page</p></div>";
HTMLDocument document = new HTMLDocument(content, ".");
// Create a style element with text content
Element style = document.createElement("style");
style.setTextContent(".frame1 { margin-top:50px; margin-left:50px; padding:20px; width:360px; height:90px; background-color:#a52a2a; font-family:verdana; color:#FFF5EE;} \r\n" +
        ".frame2 { margin-top:-90px; margin-left:160px; text-align:center; padding:20px; width:360px; height:100px; background-color:#ADD8E6;}");
// Find the document's <head> element and append the style element to it
Element head = document.getElementsByTagName("head").get_Item(0);
head.appendChild(style);
// Find the first paragraph element and assign it the "frame1" class
HTMLElement paragraph = (HTMLElement) document.getElementsByTagName("p").get_Item(0);
paragraph.setClassName("frame1");
// Find the last paragraph element and assign it the "frame2" class
HTMLElement lastParagraph = (HTMLElement) document.getElementsByTagName("p").get_Item(document.getElementsByTagName("p").getLength() - 1);
lastParagraph.setClassName("frame2");
// Set the font size and text alignment of the first paragraph
paragraph.getStyle().setFontSize("250%");
paragraph.getStyle().setTextAlign("center");
// Set the color, font size, and font family of the last paragraph
lastParagraph.getStyle().setColor("#434343");
lastParagraph.getStyle().setFontSize("150%");
lastParagraph.getStyle().setFontFamily("verdana");
// Save the HTML document to a file
document.save("edit-internal-css.html");
// Create an instance of the PDF output device that writes to "edit-internal-css.pdf"
PdfDevice device = new PdfDevice("edit-internal-css.pdf");
// Render HTML to PDF
document.renderTo(device);
// Handle missing image requests with a custom MessageHandler in Aspose.HTML for Java
// Learn more: https://docs.aspose.com/html/java/environment-configuration/
// Prepare HTML code with missing image file
String code = "<img src='missing.jpg'>";
try (java.io.FileWriter fileWriter = new java.io.FileWriter("document.html")) {
    fileWriter.write(code);
}
// Create an instance of the Configuration class
Configuration configuration = new Configuration();
// Add a LogMessageHandler to the chain of existing message handlers
INetworkService network = configuration.getService(INetworkService.class);
LogMessageHandler logHandler = new LogMessageHandler();
network.getMessageHandlers().addItem(logHandler);
// Initialize an HTML document with specified configuration
// During the document loading, the application will try to load the image and we will see the result of this operation in the console
HTMLDocument document = new HTMLDocument("document.html", configuration);
// Convert HTML to PNG
Converter.convertHTML(document, new ImageSaveOptions(), "output.png");
// Create async waiter thread for HTML document loading using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
public class HTMLDocumentWaiter implements Runnable {
    private final HTMLDocumentAsynchronouslyOnLoad html;

    public HTMLDocumentWaiter(HTMLDocumentAsynchronouslyOnLoad html) throws Exception {
        this.html = html;
        this.html.execute();
    }

    @Override
    public void run() {
        System.out.println("Current Thread: " + Thread.currentThread().getName() + "; " + Thread.currentThread().getId());
        while (!Thread.currentThread().isInterrupted() && html.getMsg() == null) {
            try {
                Thread.sleep(60000);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        Thread.currentThread().interrupt();
    }
}
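The waiter above polls for a result once a minute. As an alternative, a `java.util.concurrent.CountDownLatch` lets the waiting thread wake up as soon as the load callback fires. The sketch below uses only the JDK: the loader thread stands in for an asynchronous document load, and `LatchWaiterSketch`, `loadAndWait`, and the sample markup are hypothetical names chosen for illustration, not part of Aspose.HTML.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class LatchWaiterSketch {

    // Starts a simulated asynchronous load and waits for its result, up to timeoutMillis.
    // Returns the loaded content, or null if the timeout elapsed first.
    static String loadAndWait(long loadMillis, long timeoutMillis) {
        CountDownLatch loaded = new CountDownLatch(1);
        AtomicReference<String> content = new AtomicReference<>();

        // Stand-in for an asynchronous load that finishes by invoking an 'onLoad' callback
        Thread loader = new Thread(() -> {
            try {
                Thread.sleep(loadMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            content.set("<html><head></head><body></body></html>");
            loaded.countDown(); // the step a real event handler would perform
        });
        loader.setDaemon(true);
        loader.start();

        try {
            // Blocks until countDown() is called or the timeout elapses
            if (loaded.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return content.get();
            }
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndWait(100, 2000));
    }
}
```

Unlike `wait`/`notifyAll`, a latch does not lose the signal when the callback fires before the waiting thread starts to wait.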
// Limit JavaScript execution time when converting HTML to image using Java
// Learn more: https://docs.aspose.com/html/java/environment-configuration/
// Prepare HTML code and save it to a file
String code = "<h1>Runtime Service</h1>\r\n" +
        "<script> while(true) {} </script>\r\n" +
        "<p>The Runtime Service optimizes your system by helping it start apps and programs faster.</p>\r\n";
try (java.io.FileWriter fileWriter = new java.io.FileWriter("runtime-service.html")) {
    fileWriter.write(code);
}
// Create an instance of the Configuration class
Configuration configuration = new Configuration();
// Limit JS execution time to 5 seconds
IRuntimeService runtimeService = configuration.getService(IRuntimeService.class);
runtimeService.setJavaScriptTimeout(TimeSpan.fromSeconds(5));
// Initialize an HTML document with specified configuration
HTMLDocument document = new HTMLDocument("runtime-service.html", configuration);
// Convert HTML to PNG
Converter.convertHTML(document, new ImageSaveOptions(), "runtime-service_out.png");
// Load HTML asynchronously using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Create an instance of the HTMLDocument class
HTMLDocument document = new HTMLDocument();
// Create a StringBuilder to collect the outerHTML of the loaded document
StringBuilder outerHTML = new StringBuilder();
// Subscribe to the 'ReadyStateChange' event
// This event is fired during the document loading process
document.OnReadyStateChange.add(new DOMEventHandler() {
    @Override
    public void invoke(Object sender, Event e) {
        // Check the 'ReadyState' property, which reports the loading status of the document
        // For details, see https://www.w3schools.com/jsref/prop_doc_readystate.asp
        if (document.getReadyState().equals("complete")) {
            // Store the outerHTML of the loaded document
            outerHTML.append(document.getDocumentElement().getOuterHTML());
        }
    }
});
// Navigate asynchronously to the specified URI; without this call the event above never fires
document.navigate("https://docs.aspose.com/html/files/document.html");
// Give the asynchronous load up to 5 seconds to finish
Thread.sleep(5000);
System.out.println("outerHTML = " + outerHTML);
// Load HTML from a file using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Prepare the "load-from-file.html" file
try (java.io.FileWriter fileWriter = new java.io.FileWriter("load-from-file.html")) {
    fileWriter.write("Hello, World!");
}
// Load HTML from the file
HTMLDocument document = new HTMLDocument("load-from-file.html");
// Write the document content to the output stream
System.out.println(document.getDocumentElement().getOuterHTML());
// Load HTML from a stream using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Create an in-memory input stream from the HTML code, encoded explicitly as UTF-8
String code = "<p>Hello, World! I love HTML!</p>";
java.io.InputStream inputStream = new java.io.ByteArrayInputStream(code.getBytes(java.nio.charset.StandardCharsets.UTF_8));
// Initialize a document from the stream variable
HTMLDocument document = new HTMLDocument(inputStream, ".");
// Save the document to disk
document.save("load-from-stream.html");
// Load HTML from a URL using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Load a document from https://docs.aspose.com/html/files/document.html web page
HTMLDocument document = new HTMLDocument("https://docs.aspose.com/html/files/document.html");
System.out.println(document.getDocumentElement().getOuterHTML());
// Load SVG from a string using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Initialize an SVG document from a string object
SVGDocument document = new SVGDocument("<svg xmlns='http://www.w3.org/2000/svg'><circle cx='50' cy='50' r='40'/></svg>", ".");
// Write the document content to the output stream
System.out.println(document.getDocumentElement().getOuterHTML());
// Log failed HTTP requests with a custom MessageHandler
// Learn more: https://docs.aspose.com/html/java/environment-configuration/
// Message handler logs all failed requests to the console
MessageHandler handler = new MessageHandler() {
    @Override
    public void invoke(INetworkOperationContext context) {
        if (context.getResponse().getStatusCode() != HttpURLConnection.HTTP_OK) {
            System.out.println(String.format("File '%s' Not Found", context.getRequest().getRequestUri().toString()));
        }
        // Invoke the next message handler in the chain
        next(context);
    }
};
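The handler above does its own work and then calls `next(context)` so that the remaining handlers still run. The chaining idea can be sketched in plain Java as follows. All types here (`HandlerChainSketch`, `Context`, `Handler`, `run`) are hypothetical stand-ins written for illustration; they mirror the shape of the snippet but are not the Aspose.HTML classes.

```java
import java.util.ArrayList;
import java.util.List;

public class HandlerChainSketch {

    // Hypothetical stand-in for a request/response pair
    static final class Context {
        final String uri;
        final int statusCode;

        Context(String uri, int statusCode) {
            this.uri = uri;
            this.statusCode = statusCode;
        }
    }

    // Hypothetical handler base class: each handler decides when to pass control on
    abstract static class Handler {
        private Handler next;

        void setNext(Handler next) {
            this.next = next;
        }

        // Forwards the context to the following handler, if there is one
        protected void next(Context context) {
            if (next != null) {
                next.invoke(context);
            }
        }

        abstract void invoke(Context context);
    }

    // Builds a two-handler chain, sends one context through it, and returns the log
    static List<String> run(String uri, int statusCode) {
        List<String> log = new ArrayList<>();
        Handler failureLogger = new Handler() {
            @Override
            void invoke(Context c) {
                if (c.statusCode != 200) {
                    log.add("failed: " + c.uri);
                }
                next(c); // omitting this call would stop the chain here
            }
        };
        Handler auditor = new Handler() {
            @Override
            void invoke(Context c) {
                log.add("seen: " + c.uri);
                next(c);
            }
        };
        failureLogger.setNext(auditor);
        failureLogger.invoke(new Context(uri, statusCode));
        return log;
    }

    public static void main(String[] args) {
        // prints [failed: missing.jpg, seen: missing.jpg]
        System.out.println(run("missing.jpg", 404));
    }
}
```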
// Save HTML to a file using Java
// Learn more: https://docs.aspose.com/html/java/save-a-document/
// Initialize an empty HTML document
HTMLDocument document = new HTMLDocument();
// Create a text node and add it to the document
Text text = document.createTextNode("Hello, World!");
document.getBody().appendChild(text);
// Save the HTML document to a file
document.save("save-to-file.html");
// Save HTML as a Markdown file using Java
// Learn more: https://docs.aspose.com/html/java/save-a-document/
// Prepare HTML code
String htmlCode = "<H2>Hello, World!</H2>";
// Initialize a document from a string variable
HTMLDocument document = new HTMLDocument(htmlCode, ".");
// Save the document as a Markdown file
document.save("save-to-MD.md", HTMLSaveFormat.Markdown);
// Save HTML as MHTML using Java
// Learn more: https://docs.aspose.com/html/java/save-a-document/
// Prepare a simple HTML file with a linked document
java.nio.file.Files.write(
        java.nio.file.Paths.get("document.html"),
        "<p>Hello, World!</p><a href='linked-file.html'>linked file</a>".getBytes());
// Prepare a simple linked HTML file
java.nio.file.Files.write(
        java.nio.file.Paths.get("linked-file.html"),
        "<p>Hello, linked file!</p>".getBytes());
// Load the "document.html" into memory
HTMLDocument document = new HTMLDocument("document.html");
// Save the document to MHTML format
document.save("save-to-MHTML.mht", HTMLSaveFormat.MHTML);
// Save HTML as SVG using Java
// Learn more: https://docs.aspose.com/html/java/save-a-document/
// Prepare SVG code
String code = "<svg xmlns='http://www.w3.org/2000/svg' height='200' width='300'>" +
        "<g fill='none' stroke-width= '10' stroke-dasharray='30 10'>" +
        "<path stroke='red' d='M 25 40 l 215 0' />" +
        "<path stroke='black' d='M 35 80 l 215 0' />" +
        "<path stroke='blue' d='M 45 120 l 215 0' />" +
        "</g>" +
        "</svg>";
// Initialize an SVG instance from the content string
SVGDocument document = new SVGDocument(code, ".");
// Save the SVG file to disk
document.save("save-to-SVG.svg");
// Save HTML with linked resources using Java
// Learn more: https://docs.aspose.com/html/java/save-a-document/
// Prepare a simple HTML file with a linked document
java.nio.file.Files.write(
        java.nio.file.Paths.get("save-with-linked-file.html"),
        "<p>Hello, World!</p><a href='linked.html'>linked file</a>".getBytes());
// Prepare a simple linked HTML file
java.nio.file.Files.write(
        java.nio.file.Paths.get("linked.html"),
        "<p>Hello, linked file!</p>".getBytes());
// Load the "save-with-linked-file.html" into memory
HTMLDocument document = new HTMLDocument("save-with-linked-file.html");
// Create an instance of the HTMLSaveOptions class
HTMLSaveOptions options = new HTMLSaveOptions();
// MaxHandlingDepth controls how many levels of linked HTML files are saved along with the document
// With the value "1" used here, "linked.html" is saved to the output folder as well; the value "0" cuts off all linked HTML files
options.getResourceHandlingOptions().setMaxHandlingDepth(1);
// Save the document with the save options
document.save("save-with-linked-file_out.html", options);
// Set User Agent charset to ISO-8859-1 and convert HTML to PDF using Java
// Learn more: https://docs.aspose.com/html/java/environment-configuration/
// Prepare HTML code and save it to a file
String code = "<h1>Character Set</h1>\r\n" +
        "<p>The <b>CharSet</b> property sets the primary character-set for a document.</p>\r\n";
try (java.io.FileWriter fileWriter = new java.io.FileWriter("user-agent-charset.html")) {
    fileWriter.write(code);
}
// Create an instance of the Configuration class
Configuration configuration = new Configuration();
// Get the IUserAgentService
IUserAgentService userAgent = configuration.getService(IUserAgentService.class);
// Set ISO-8859-1 encoding to parse the document
userAgent.setCharSet("ISO-8859-1");
// Initialize an HTML document with specified configuration
HTMLDocument document = new HTMLDocument("user-agent-charset.html", configuration);
// Convert HTML to PDF
Converter.convertHTML(document, new PdfSaveOptions(), "user-agent-charset_out.pdf");
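The charset setting matters because the same bytes decode to different text under different encodings. The sample document above is pure ASCII, so it reads the same either way. The JDK-only sketch below shows what happens with a non-ASCII character; `CharsetMismatchSketch` and `decode` are hypothetical names used for illustration and do not involve Aspose.HTML.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchSketch {

    // Decodes raw bytes with the given charset, as a parser does when it reads a file
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e, written with an escape
        // so that the source file itself stays ASCII
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset restores the original four characters
        String matching = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Decoding the same bytes as ISO-8859-1 turns the accented letter into two characters
        String mismatched = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(matching.length());   // prints 4
        System.out.println(mismatched.length()); // prints 5
    }
}
```

The accented character occupies two bytes in UTF-8, and ISO-8859-1 maps every byte to exactly one character, which is why the mismatched result is one character longer.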
// Set font folder for HTML to PDF conversion using Java
// Learn more: https://docs.aspose.com/html/java/environment-configuration/
// Prepare HTML code and save it to a file
String code = "<h1>FontsSettings property</h1>\r\n" +
        "<p>The FontsSettings property is used for configuration of fonts handling.</p>\r\n";
try (java.io.FileWriter fileWriter = new java.io.FileWriter("user-agent-fontsetting.html")) {
    fileWriter.write(code);
}
// Initialize an instance of the Configuration class
Configuration configuration = new Configuration();
// Get the IUserAgentService
IUserAgentService userAgent = configuration.getService(IUserAgentService.class);
// Set a custom font folder path
userAgent.getFontsSettings().setFontsLookupFolder("fonts");
// Initialize an HTML document with specified configuration
HTMLDocument document = new HTMLDocument("user-agent-fontsetting.html", configuration);
// Convert HTML to PDF
Converter.convertHTML(document, new PdfSaveOptions(), "user-agent-fontsetting_out.pdf");
// Handle HTML document onLoad event when navigating to URL using Java
// Learn more: https://docs.aspose.com/html/java/create-a-document/
// Create an instance of the HTMLDocument class
HTMLDocument document = new HTMLDocument();
// Subscribe to the 'OnLoad' event, which is fired once the document is fully loaded
document.OnLoad.add(new DOMEventHandler() {
    @Override
    public void invoke(Object sender, Event e) {
        // Store the markup in 'msg', a variable declared outside this snippet
        msg = document.getDocumentElement().getOuterHTML();
        System.out.println(msg);
    }
});
// Navigate asynchronously to the specified URI
document.navigate("https://html.spec.whatwg.org/multipage/introduction.html");
// Edit HTML body content and print the updated outerHTML using Aspose.HTML for Java
// Learn more: https://docs.aspose.com/html/java/edit-a-document/
// Create an instance of the HTMLDocument class
HTMLDocument document = new HTMLDocument();
// Write the content of the HTML document into the console output
System.out.println(document.getDocumentElement().getOuterHTML());
// @output: <html><head></head><body></body></html>
// Set the content of the <body> element
document.getBody().setInnerHTML("<p>HTML is the standard markup language for Web pages.</p>");
// Write the content of the HTML document into the console output
System.out.println(document.getDocumentElement().getOuterHTML());
// @output: <html><head></head><body><p>HTML is the standard markup language for Web pages.</p></body></html>