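A new requirement called for programmatically inserting LaTeX formulas into Word documents, so I looked into the Apache POI Java library. This post records the pitfalls I ran into.
First, formulas in current docx files are stored as OMML (Office Math Markup Language). A LaTeX formula cannot be inserted directly; it has to be converted to MathML first and then to OMML.
For the MathML-to-OMML step, Microsoft already ships a ready-made XSL stylesheet, MML2OMML.XSL.
As long as a recent version of Office is installed, you can find this file by searching the Office installation directory.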
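If you would rather not search by hand, the lookup can be scripted. The following is only a minimal sketch using the JDK's file API: the class name StylesheetLocator and its find method are hypothetical names made up for this illustration, and the root directory is whatever your own Office installation path is (it is not assumed here).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of POI or Office.
public class StylesheetLocator {

    /**
     * Walks the directory tree under root and returns the first regular file
     * whose name equals fileName (ignoring case), or an empty Optional.
     * Example call: StylesheetLocator.find(officeDir, "MML2OMML.XSL")
     */
    public static Optional<Path> find(Path root, String fileName) {
        // Files.walk returns a lazy stream that must be closed after use.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equalsIgnoreCase(fileName))
                    .findFirst();
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

Copying the file that is found next to your program lets the code load it as "MML2OMML.XSL" from the working directory. Note that walking a tree containing unreadable subdirectories can fail with an exception, so point the search at the Office folder rather than at a whole drive.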
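SnuggleTeX is a Java library that converts a LaTeX string to MathML, so the approach is straightforward: use SnuggleTeX to produce MathML, use MML2OMML.XSL to turn that into OMML, and insert the result into a paragraph.
Here is the code: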
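import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.xmlbeans.XmlCursor;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import uk.ac.ed.ph.snuggletex.SnuggleEngine;
import uk.ac.ed.ph.snuggletex.SnuggleInput;
import uk.ac.ed.ph.snuggletex.SnuggleSession;

public class Main {
    // MML2OMML.XSL is expected in the working directory (copied from the Office installation).
    static final File stylesheet = new File("MML2OMML.XSL");
    static final TransformerFactory tFactory = TransformerFactory.newInstance();
    static final StreamSource styleSource = new StreamSource(stylesheet);

    // Converts a MathML string to an OMML object by applying MML2OMML.XSL.
    public static CTOMath _getOMML(String mathML) throws Exception {
        Transformer transformer = tFactory.newTransformer(styleSource);
        StringReader stringreader = new StringReader(mathML);
        StreamSource source = new StreamSource(stringreader);
        StringWriter stringwriter = new StringWriter();
        StreamResult result = new StreamResult(stringwriter);
        transformer.transform(source, result);
        String ooML = stringwriter.toString();
        stringwriter.close();
        CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
        CTOMath ctOMath = ctOMathPara.getOMathArray(0);
        // Walk every element and set the Cambria Math font on each math run (CTR).
        XmlCursor xmlcursor = ctOMath.newCursor();
        while (xmlcursor.hasNextToken()) {
            XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
            if (tokentype.isStart()) {
                // Pattern matching for instanceof needs Java 16 or later.
                if (xmlcursor.getObject() instanceof CTR cTR) {
                    cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
                    // getRFontsArray(0) is the POI 5.x form; older POI versions may expose getRFonts() instead.
                    cTR.getRPr2().getRFontsArray(0).setHAnsi("Cambria Math");
                }
            }
        }
        return ctOMath;
    }

    // Converts a LaTeX string to OMML and sets it as the formula of the given paragraph.
    public void add_latex(String latex, XWPFParagraph paragraph) throws Exception {
        SnuggleEngine engine = new SnuggleEngine();
        SnuggleSession session = engine.createSession();
        // SnuggleTeX starts in text mode, so the input needs math delimiters, e.g. "$$ x^2 $$".
        SnuggleInput input = new SnuggleInput(latex);
        session.parseInput(input);
        String mathML = session.buildXMLString();
        CTOMath ctOMath = _getOMML(mathML);
        CTP ctp = paragraph.getCTP();
        // setOMathArray replaces any formulas already present in the paragraph.
        ctp.setOMathArray(new CTOMath[]{ctOMath});
    }
}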
Q&A:
Problem: JAXP0801001: the compiler encountered an XPath expression containing '11' groups that exceeds the '10' limit set by 'FEATURE_SECURE_PROCESSING'.
Solution: change the VM options by adding -Djdk.xml.xpathExprGrpLimit=0 -Djdk.xml.xpathExprOpLimit=0 -Djdk.xml.xpathTotalOpLimit=0 (a value of 0 removes the corresponding limit).
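If changing the VM options is inconvenient, the same three properties can probably be set from code instead. This is a minimal sketch under the assumption that the JDK picks the values up when the TransformerFactory is created, so they must be set before that point; I have not checked this on every JDK version. The class name XPathLimits and its methods are hypothetical names used only for this illustration.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical helper, not part of the JDK or POI.
public class XPathLimits {

    /** Sets the three JDK XPath limits to 0, which means "no limit". */
    public static void disable() {
        System.setProperty("jdk.xml.xpathExprGrpLimit", "0");
        System.setProperty("jdk.xml.xpathExprOpLimit", "0");
        System.setProperty("jdk.xml.xpathTotalOpLimit", "0");
    }

    /** Lifts the limits first, then creates the factory (assumed ordering requirement). */
    public static TransformerFactory newFactory() {
        disable();
        return TransformerFactory.newInstance();
    }
}
```

In the code above the factory is a static field, so it is created when the class is loaded. To use this sketch there, the field would be initialized with XPathLimits.newFactory() rather than TransformerFactory.newInstance(), so that the properties are in place first.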