Tech Help Needed: MS Word HTML to DOCX
Sep. 20th, 2013 10:13 am
Oh, Great Internet Guru. Your technical help is required.
I’m writing some Perl scripts at work, and one of their end goals is to generate an MS Word document that can serve as a subdocument in a master document/subdocument arrangement. I’ve done master/subdocuments before in DOCX format, and I’ve discovered that if the master is a DOC, it will look for subdocuments that are DOCs.
So far, I’ve got the script generating the variant of HTML that Word understands so that it can get the proper formats (e.g., I copy the prologue that defines the Word formats I need and generate HTML with appropriate CLASS= statements). This gives me a standalone .htm file, which I can rename as .doc and Word handles just fine. However, if you save it, Word knows it is really HTML and creates this funky subdirectory with files that it really doesn’t use.
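To make the approach concrete, here is a minimal sketch of the idea, written in Python rather than the Perl I actually use at work. Everything in it is illustrative: the prologue is a cut-down stand-in for the one copied from a Word-saved HTML file, `StatusNote` is a made-up style name, and `build_document` and `paragraph` are hypothetical helpers, not anything Word provides.

```python
from html import escape

# Stand-in prologue. In practice this would be copied from a file that Word
# itself saved as HTML, so it carries the real style definitions.
# "StatusNote" is a hypothetical custom style used only for illustration.
PROLOGUE = """<html xmlns:w="urn:schemas-microsoft-com:office:word">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style>
p.MsoNormal { font-family: "Times New Roman"; font-size: 12pt; }
p.StatusNote { font-family: Arial; font-size: 10pt; }
</style>
</head>
<body>
"""

EPILOGUE = "</body>\n</html>\n"


def paragraph(text, style="MsoNormal"):
    """Return one paragraph tagged with the class that names its Word style."""
    return '<p class="{}">{}</p>\n'.format(style, escape(text))


def build_document(paragraphs):
    """Assemble prologue, styled paragraphs, and epilogue into one HTML string.

    `paragraphs` is a list of (style_name, text) pairs.
    """
    body = "".join(paragraph(text, style) for style, text in paragraphs)
    return PROLOGUE + body + EPILOGUE


def write_document(path, paragraphs):
    """Write the generated HTML to `path` (for example, a .htm file)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_document(paragraphs))
```

Calling `write_document("section1.htm", [("MsoNormal", "Body text"), ("StatusNote", "Draft")])` would produce the kind of standalone .htm file described above, which can then be renamed to .doc.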
I’ve been looking for a way that I can convert this .HTM or .DOC file into a real Word .DOC or .DOCX file without interaction (i.e., from the command line). I tried going the Macro approach, and even found that it saves the macro in that subdirectory for the .HTM files… however, the security restrictions on our systems here mean that I can’t execute the macro from the MS Word command line via /m . There’s a file WORDCONV.EXE in the Office12 directory, but it doesn’t seem to do anything, and I can’t seem to find any documentation on it.
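One route I have not tried, and which may well hit the same security restrictions, is driving Word through COM automation from a script instead of through a macro. The sketch below is Python and assumes a Windows machine with Word and the third-party pywin32 package installed. `target_for` and `convert` are hypothetical helper names. The two numeric constants are the WdSaveFormat values for binary .doc and .docx. Treat it as a starting point under those assumptions, not a tested solution for our locked-down systems.

```python
import os

# WdSaveFormat values from the Word object model.
WD_FORMAT_DOCUMENT = 0       # binary .doc
WD_FORMAT_XML_DOCUMENT = 12  # .docx


def target_for(src_path, want_docx=True):
    """Return (absolute output path, save-format value) for a source file."""
    src = os.path.abspath(src_path)
    base, _ = os.path.splitext(src)
    if want_docx:
        ext, fmt = ".docx", WD_FORMAT_XML_DOCUMENT
    else:
        ext, fmt = ".doc", WD_FORMAT_DOCUMENT
    out = base + ext
    # Avoid overwriting the input, e.g. an HTML file already renamed to .doc.
    if out.lower() == src.lower():
        out = base + "_converted" + ext
    return out, fmt


def convert(src_path, want_docx=True):
    """Open the HTML-flavored file in Word and save it in a native format.

    Assumes pywin32 is installed and that COM automation of Word is allowed.
    """
    import win32com.client  # imported lazily: Windows only

    out_path, fmt = target_for(src_path, want_docx)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(os.path.abspath(src_path))
        try:
            # Positional arguments: FileName, then FileFormat.
            doc.SaveAs(out_path, fmt)
        finally:
            doc.Close(False)  # close without prompting to save again
    finally:
        word.Quit()
    return out_path
```

With this in place, something like `convert("section1.htm")` from a small command-line wrapper would be the non-interactive step I am after, if automation is permitted at all.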
So, for those MS Word gurus out there — any ideas? I can live with what I’ve got now (the .htm file I rename); I’d just like something cleaner.
This entry was originally posted on Observations Along The Road (on cahighways.org) by cahwyguy.