Converting Files¶
Files to PDF¶
Document types supported by PyMuPDF can easily be converted to PDF by using the Document.convert_to_pdf() method. This method returns a buffer of data which can then be utilized by PyMuPDF to create a new PDF.
Example
import pymupdf
xps = pymupdf.open("input.xps")
pdfbytes = xps.convert_to_pdf()
pdf = pymupdf.open("pdf", pdfbytes)
pdf.save("output.pdf")
PDF to SVG¶
Technically, as SVG files cannot be multipage, we must export each page as an SVG.
To get an SVG representation of a page use the Page.get_svg_image() method.
Example
import pymupdf
doc = pymupdf.open("input.pdf")
page = doc[0]
# Convert page to SVG
svg_content = page.get_svg_image()
# Save to file
with open("output.svg", "w", encoding="utf-8") as f:
f.write(svg_content)
doc.close()
PDF to Markdown¶
By utlilizing the PyMuPDF4LLM API we are able to convert PDF to a Markdown representation.
Example
import pymupdf4llm
import pathlib
md_text = pymupdf4llm.to_markdown("test.pdf")
print(md_text)
pathlib.Path("4llm-output.md").write_bytes(md_text.encode())
PDF to DOCX¶
Use the pdf2docx library which uses PyMuPDF to provide document conversion from PDF to DOCX format.
Example
from pdf2docx import Converter
pdf_file = 'input.pdf'
docx_file = 'output.docx'
# convert pdf to docx
cv = Converter(pdf_file)
cv.convert(docx_file) # all pages by default
cv.close()
