Read HTML from a docx file

If you are using the OpenNTF POI extension, this is quite simple to perform, because it provides a bean called poiBean with a function called buildHTMLFromDocX. This function takes an InputStream, and you can get an input stream from a NotesEmbeddedObject.

Here is a simple example where you have stored an attachment in a rich text field called attachment, and the document is the first one in the view that the code opens (called test below). The code is placed on an XPage button, and the XPage has a computed text that shows the viewScope variable Extracted.

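// Open the view and take its first document
var v:NotesView=database.getView("test");
var doc:NotesDocument=v.getFirstDocument();
// Get the first attachment stored in the rich text field
var rt:NotesRichTextItem=doc.getFirstItem("Attachment");
var att:NotesEmbeddedObject=rt.getEmbeddedObjects()[0];
// Hand the attachment's input stream to poiBean to convert the docx to HTML
var is:java.io.InputStream=att.getInputStream();
var oi:java.io.ByteArrayOutputStream=poiBean.buildHTMLFromDocX(is);
// Store the HTML in viewScope so the computed text can show it
viewScope.Extracted=oi.toString();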
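
One detail about the last line: toString() without arguments decodes the bytes with the platform default charset, so accented characters can come out wrong on some servers. Below is a minimal plain Java sketch of decoding with an explicit charset. It assumes the HTML bytes are UTF-8 encoded, which is not confirmed here, and the class and method names (StreamToString, decodeUtf8) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Decode the buffered bytes as UTF-8 (an assumption about the producer)
    // instead of relying on the platform default charset.
    public static String decodeUtf8(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the stream returned by the conversion step:
        // a small HTML fragment containing non-ASCII characters.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] html = "<p>Sm\u00f6rg\u00e5s</p>".getBytes(StandardCharsets.UTF_8);
        out.write(html, 0, html.length);

        System.out.println(decodeUtf8(out));
    }
}
```

In the button code, the equivalent would probably be oi.toString("UTF-8"), since ByteArrayOutputStream has a toString variant that takes a charset name.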

If you want to parse the HTML, jsoup is a great library.

Check out my other POI example, published several years ago.
