
Topic: CambridgeDocs - a Word to XML tool
admin (W3China webmaster):

CambridgeDocs Technology Overview:

Driver for Microsoft Word
• Paragraph content (text)
• Style information (the style of each paragraph and of the text runs within paragraphs)
• Formatting information (font, font-color, font-size) of paragraphs and text runs that deviates from the "Style" setting
• Paragraph format information (leftindent, rightindent, spacebefore, spaceafter)
• Frames: text frames can be extracted as a block-level <FRAME> tag that contains the frame's contents. The frame's position on the page (x,y coordinates) can also be extracted (if Pagination = true).
• Images: bitmap images within the Word document can be extracted. The image's position on the page (x,y coordinates) can also be extracted (if Pagination = true).
• Superscripts: inline superscripted text is extracted and marked, including references to footnotes.
• WordArt: extracted as WMF files.
• Lists: numbered lists and bulleted lists are identified.
• Page breaks: hard page breaks are inserted as block-level items.
• Word fields: either the field's text alone is extracted, or the field is emitted as an inline <FIELD> tag carrying the field code as well as the field's content.
• Tables: table information is extracted, including background color, column widths, row height, colspan, rowspan, and per-cell table borders including border color.
• Pagination: when pagination is set to true, the entire document is divided into <PAGE> tags.
• Footnotes and endnotes can be extracted; they all become endnotes in the XML version of the document and are automatically renumbered.
• Page headers and footers can be extracted as <HEADER> and <FOOTER> elements.
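The footnote/endnote item above implies a small renumbering step. Here is a minimal Python sketch of one way to do it, assuming the notes arrive in the order their references appear in the document. The `Note` class, its fields, and `merge_notes` are hypothetical names for illustration, not part of the CambridgeDocs driver.

```python
from dataclasses import dataclass


@dataclass
class Note:
    kind: str       # "footnote" or "endnote" (hypothetical field names)
    old_label: str  # the label the note had in the Word document
    text: str


def merge_notes(notes):
    """Turn footnotes and endnotes, listed in the order their references
    appear in the document, into a single run of endnotes numbered 1..n.

    Returns the renumbered endnotes plus a mapping from (kind, old_label)
    to the new number, so in-text references can be rewritten. The kind is
    part of the key because a footnote "1" and an endnote "1" can coexist.
    """
    endnotes = []
    renumber = {}
    for new_number, note in enumerate(notes, start=1):
        renumber[(note.kind, note.old_label)] = new_number
        endnotes.append((new_number, note.text))
    return endnotes, renumber


notes = [
    Note("footnote", "1", "First footnote."),
    Note("endnote", "i", "An endnote referenced mid-document."),
    Note("footnote", "2", "Second footnote."),
]
endnotes, renumber = merge_notes(notes)
print(endnotes)
print(renumber[("footnote", "2")])  # the second footnote is now endnote 3
```

Keying the mapping on both kind and label keeps a footnote and an endnote that happened to share a label from overwriting each other.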

    The Microsoft Word driver built by CambridgeDocs was designed to extract as much information as possible from a Microsoft Word file (.doc, .rtf, or other) into XML: the content, the formatting and style information, the layout information, and the graphics. We call this "non-lossy" because many of our customers use XML for multi-channel publishing, which means that after converting to XML they may want to convert again, to HTML, to PDF, and so on.

    Depending on your needs, you can turn options on or off for specific kinds of information. Our XML conversion also includes a pagination option, which preserves the pagination of the original document (especially useful for pages that have text frames and images positioned exactly on the page).
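With pagination on, each frame's x,y position can be tied to a specific page. Below is a minimal Python sketch that reads a paginated XML document and reports where each frame sits. PAGE, FRAME, HEADER and FOOTER are tag names from the feature list in the post; the DOC and P elements and the X/Y attributes are assumptions, since the post does not show the actual ppXML schema (or the units of the coordinates).

```python
import xml.etree.ElementTree as ET

# Hypothetical paginated output. PAGE, FRAME, HEADER and FOOTER are tags
# named in the post; DOC, P and the X/Y attributes are invented here and
# are NOT the documented ppXML schema.
SAMPLE = """<DOC>
  <PAGE>
    <HEADER><P>Quarterly report</P></HEADER>
    <P>Body text on page one.</P>
    <FRAME X="72" Y="144"><P>Sidebar</P></FRAME>
    <FOOTER><P>Page 1</P></FOOTER>
  </PAGE>
  <PAGE>
    <P>Body text on page two.</P>
    <FRAME X="300" Y="500"><P>Pull quote</P></FRAME>
  </PAGE>
</DOC>"""


def frame_positions(xml_text):
    """Return (page_number, x, y, text) for every FRAME, using the
    enclosing PAGE element to tell which page each frame sits on."""
    root = ET.fromstring(xml_text)
    result = []
    for page_number, page in enumerate(root.iter("PAGE"), start=1):
        for frame in page.iter("FRAME"):
            # Join the frame's text content, ignoring indentation whitespace.
            text = " ".join(t.strip() for t in frame.itertext() if t.strip())
            x, y = float(frame.get("X")), float(frame.get("Y"))
            result.append((page_number, x, y, text))
    return result


positions = frame_positions(SAMPLE)
for entry in positions:
    print(entry)
```

Without the PAGE wrappers, the coordinates alone would not say which page a frame belongs to, which is why position data is tied to the pagination option.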

      
    Word Driver FAQs

    What XML format can I convert my Word documents into?

    The driver initially converts into ppXML, our "intermediate format". You can then convert it into any other XML schema you like, including DocBook, LegalXML, or your own custom DTD/schema, either with an XSLT stylesheet or with the extraction and transformation rules of the xDoc Converter, our flagship product.
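As a rough illustration of mapping an intermediate format onto another schema, here is a Python sketch that uses the standard library's ElementTree in place of an XSLT stylesheet. The input element names (DOC, P, and the style attribute) are hypothetical stand-ins for ppXML, and the output is only a tiny subset of DocBook (article, title, para).

```python
import xml.etree.ElementTree as ET

# Hypothetical ppXML-like input: DOC, P and the "style" attribute are
# stand-ins; the real ppXML element names are not given in the post.
PPXML = """<DOC>
  <P style="Title">CambridgeDocs overview</P>
  <P style="Normal">Word content converted to XML.</P>
  <P style="Normal">Non-lossy conversion.</P>
</DOC>"""

# Style-to-element rules, playing the role of XSLT templates.
# Any style not listed becomes an ordinary DocBook <para>.
STYLE_RULES = {"Title": "title"}


def to_docbook(ppxml_text):
    """Map styled paragraphs onto a minimal DocBook article.

    Assumes the Title paragraph comes first, since DocBook expects the
    article title before the body paragraphs.
    """
    source = ET.fromstring(ppxml_text)
    article = ET.Element("article")
    for p in source.iter("P"):
        tag = STYLE_RULES.get(p.get("style"), "para")
        ET.SubElement(article, tag).text = p.text
    return ET.tostring(article, encoding="unicode")


docbook = to_docbook(PPXML)
print(docbook)
```

Keeping the style-to-element mapping in a table mirrors how an XSLT stylesheet keeps one template per source pattern: supporting a new Word style means adding a rule, not changing the traversal.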

    What format can I render it into?

    We provide an XSLT that converts it further into XHTML so that it can be viewed in a browser. You can see this in action on the "View as HTML" tab of the RUN/DEBUG window in the xDoc Converter, or by applying the XSLT in the XMLSpy plug-in.

    We also provide an XSLT that transforms ppXML into XSL-FO, which can be used to create PDF files, RTF files, etc.
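To show roughly what an XSL-FO target looks like, this Python sketch wraps paragraph text in a minimal XSL-FO document (fo:root, a single fo:simple-page-master, and one fo:flow of fo:block elements) that an XSL-FO formatter such as Apache FOP could render to PDF. The ppXML-like input and the A4 page size with 20mm margins are assumptions; a real stylesheet would also carry fonts, tables, and positioning.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(local_name):
    """Qualified ElementTree name for an element in the XSL-FO namespace."""
    return f"{{{FO_NS}}}{local_name}"


def to_xsl_fo(paragraphs, page_width="210mm", page_height="297mm"):
    """Wrap plain paragraph strings in a minimal XSL-FO document:
    one page master, one page sequence, one fo:block per paragraph."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    master = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page",
        "page-width": page_width,
        "page-height": page_height,
        "margin": "20mm",
    })
    ET.SubElement(master, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        ET.SubElement(flow, fo("block")).text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical ppXML-like input (DOC and P are stand-in names).
source = ET.fromstring("<DOC><P>First paragraph.</P><P>Second paragraph.</P></DOC>")
fo_document = to_xsl_fo(p.text for p in source.iter("P"))
print(fo_document)
```

The page-sequence refers to the page master by name ("page"), so a stylesheet can define several page layouts up front and choose between them per section.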

    Can I do a two-way conversion back into Word?

    Yes. You can convert from Word into XML, and then from XML back into Word using our XSL-FO and RTF rendering capabilities. The xDoc Submit plug-in for Word will have this functionality built in. However, because of limitations in some XSL-FO rendering engines, some of the more advanced features extracted by the Word driver may not convert in both directions.


    Posted: 2005/2/24 0:13:00
     
    zhangshying replied:

    Is there a tool that automatically converts TXT files into Word files?
    Posted: 2005/4/25 16:50:00
     
    cxh0926 replied:

    Bump. I hope an expert can help with this; I was about to ask the same thing!
    Posted: 2005/5/5 22:18:00
     