If you are using the OpenNTF POI extension, this is quite simple to perform: there is a bean called poiBean that has a function called buildHTMLFromDocX. This function takes an InputStream, and you can get an input stream from a NotesEmbeddedObject.
Here is a simple example where you have stored an attachment in a rich text field called Attachment, and this doc is the first one in the view that the code reads (test). The code below is placed on an XPage button, and the XPage has a computed text that will show the Extracted viewScope data.
// Get the first document in the view
var v:NotesView=database.getView("test")
var doc:NotesDocument=v.getFirstDocument()
// Get the first attachment stored in the rich text field
var rt:NotesRichTextItem=doc.getFirstItem("Attachment")
var att:NotesEmbeddedObject=rt.getEmbeddedObjects()[0]
// Convert the DOCX input stream to HTML with poiBean
var is:java.io.InputStream=att.getInputStream()
var oi:java.io.ByteArrayOutputStream=poiBean.buildHTMLFromDocX(is)
viewScope.Extracted=oi.toString()
If you want to parse the HTML, jsoup is a great library.
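As a rough sketch of what parsing the extracted HTML with jsoup could look like, the helper below pulls the plain text out of the HTML string. It assumes the jsoup JAR is available to your XPages application. The function name htmlToText and the viewScope variable ExtractedText are made up for this illustration, and the jsoup class is passed in as a parameter only so the helper is easy to try out with a stub.

```javascript
// Hypothetical helper (not part of the POI extension): returns the visible
// text of an HTML string. `jsoup` is expected to be the org.jsoup.Jsoup class.
function htmlToText(html, jsoup) {
    // Nothing extracted yet, so there is nothing to parse
    if (!html) {
        return "";
    }
    // Jsoup.parse(String) builds a Document; body().text() returns its text
    return String(jsoup.parse(html).body().text());
}

// Possible use in the XPage button, after viewScope.Extracted has been set
// (assumes jsoup is on the classpath):
// viewScope.ExtractedText = htmlToText(viewScope.Extracted, org.jsoup.Jsoup)
```

From there you can use the rest of the jsoup API, for example select() with a CSS selector, to pick out only the parts of the document you need.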
Check out my other POI example, published several years ago.










