Building web applications: PDF merge/split
Building a Simple PDF Merger/Splitter Web Application using Python
Abstract
Web applications put everyday tools within reach of anyone with a browser. In this tutorial, I’ll walk you through creating a simple yet powerful PDF merger and splitter web application in Python. Along the way, I’ll introduce the technical side of web application development and touch on the data security concerns you might have when using online tools.
Content
The PDF Merger and Splitter application
At its core, my web application addresses a common need – merging and splitting PDF files. Whether you’re combining multiple reports or extracting specific sections from a document, my application simplifies these tasks and empowers you to manage PDFs effortlessly.
Why Python?
Python is a language known for its simplicity and versatility. With an abundance of libraries available, it’s an ideal choice for tasks like PDF manipulation. In this project, we’ll leverage Python’s capabilities to create an intuitive and user-friendly web application.
Setting up the environment
Before we dive into the code, let’s ensure that you have the necessary environment set up. Create a Conda environment to isolate the project’s dependencies. Open your terminal and run the following commands:
conda create -n pdf-app python=3.8
conda activate pdf-app
pip install gradio pymupdf
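Before moving on, it can help to confirm that both packages can actually be found in the new environment. The snippet below is a minimal sketch rather than part of the application; the helper name missing_modules is hypothetical. Note that the pymupdf package is imported under the name fitz.

```python
import importlib.util

# Import names differ from pip names: the pymupdf package is imported as fitz.
REQUIRED_MODULES = ["gradio", "fitz"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("Environment looks ready.")
```

If a module is reported as missing, re-run the pip command with the pdf-app environment activated.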
The infrastructure
To create our web application, I’ll rely on two libraries: Gradio and PyMuPDF (also known as fitz, the name it is imported under). Gradio provides an elegant way to design interactive interfaces for machine learning and other applications. PyMuPDF, on the other hand, is a powerful library for PDF manipulation.
Code breakdown
Let’s dive into the code that powers our PDF merger and splitter application. I’ll guide you through each step to help you understand how the different components come together.
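# Import necessary libraries
import os
import gradio as gr
import fitz  # PyMuPDF

# Define the merge function
def merge(pdfs):
    # Build the output path instead of calling os.chdir(), which would
    # nest a new "out" folder inside the previous one on every request
    os.makedirs("out", exist_ok=True)
    file = os.path.join("out", "merge.pdf")
    with fitz.open() as result:  # new, empty PDF
        for pdf in pdfs:
            # Append every page of each uploaded file, in upload order
            with fitz.open(pdf) as mfile:
                result.insert_pdf(mfile)
        result.save(file)
    return file

# Create the interface
demo = gr.Interface(
    fn=merge,
    inputs=gr.Files(file_types=["text", ".pdf"]),
    outputs="file",
    theme='nuttea/Softblue',
    allow_flagging="never"
)

# Launch the interface
# Note: concurrency_count is a Gradio 3.x argument to queue()
demo.queue(concurrency_count=10)
demo.launch()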
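The listing above covers merging only. Splitting can follow the same pattern, using the from_page and to_page arguments of insert_pdf to copy a page range into a new document. The following is a minimal sketch under stated assumptions, not the code running in my deployed app: the function names parse_page_range and split, the "2-4" range syntax, and the file name split.pdf are all hypothetical. The result is written to a fresh temporary folder so that concurrent requests do not overwrite each other's output.

```python
import os
import tempfile


def parse_page_range(spec, page_count):
    """Turn a 1-based range such as "2-4" (or a single page, "3")
    into zero-based (first, last) page indices."""
    start_text, _, end_text = spec.partition("-")
    first = int(start_text)
    last = int(end_text) if end_text else first
    if not 1 <= first <= last <= page_count:
        raise ValueError(f"page range {spec!r} is outside 1-{page_count}")
    return first - 1, last - 1


def split(pdf_path, spec):
    """Copy the pages named by spec from pdf_path into a new PDF
    and return the path of that new file."""
    import fitz  # PyMuPDF; imported here so the parser above works without it

    with fitz.open(pdf_path) as source:
        first, last = parse_page_range(spec, source.page_count)
        # A fresh folder per call keeps one user's output away from another's
        out_dir = tempfile.mkdtemp(prefix="pdf-split-")
        out_path = os.path.join(out_dir, "split.pdf")
        with fitz.open() as result:  # new, empty PDF
            result.insert_pdf(source, from_page=first, to_page=last)
            result.save(out_path)
    return out_path
```

For example, split("report.pdf", "2-4") would return the path of a three-page file holding pages 2 to 4 of a hypothetical report.pdf. To expose it in the browser, the function could be wrapped in a second gr.Interface that takes a file and a text box as inputs.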
Deploying on Hugging Face Spaces
Take your Python application to the cloud with an online hosting service such as Hugging Face Spaces, which offers a free tier with 2 CPUs and 16 GB of memory. Create a free account and follow the steps to deploy your Python app. Once deployed, your app will be accessible to users around the world. You can also self-host, but this blog post does not cover that setup.
Sharing on different platforms
Want to embed your PDF merger and splitter app on your company website or SharePoint? You can use web components to seamlessly integrate the app. Simply add the following code snippets to your website’s HTML code:
<script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/3.39.0/gradio.js"
></script>
<gradio-app src="https://jpmadsen-pdf-merge.hf.space" eager="true" info="false"></gradio-app>
Conclusion
Web applications are instrumental in making complex tasks accessible to users worldwide. With Python and the libraries we’ve explored, you have the power to create practical tools that simplify everyday challenges. I invite you to explore my PDF merger and splitter, share your feedback, and embark on your journey of web application development.