Technology Sharing

gptpdf for LLMs: gptpdf introduction, installation and usage methods, detailed guide to case applications

2024-07-08

한어Русский языкEnglishFrançaisIndonesianSanskrit日本語DeutschPortuguêsΕλληνικάespañolItalianoSuomalainenLatina

gptpdf for LLMs: gptpdf introduction, installation and usage methods, detailed guide to case applications

Table of contents

Introduction to gptpdf

1. Processing flow

The first step is to use the PyMuPDF library to parse the PDF to find all non-text areas and mark them, for example:

The second step is to use a large visual model (such as GPT-4o) to parse and obtain a markdown file.

How to install and use gptpdf

1. Installation

2. Use

Interpreting the test.py code

3、API

Case application of gptpdf


Introduction to gptpdf

gptpdf is a tool that mainly uses visual language models (such as GPT-4o) to parse PDF into markdown. Our method is very simple (only 293 lines of code), but it can almostPerfectly parse typesetting, mathematical formulas, tables, pictures, charts, etc.The average price per page is only $0.013. We use GeneralAgent lib to interact with OpenAI API. pdfgpt-ui is a visualization tool based on gptpdf.

Github addressGitHub - CosmosShadow/gptpdf: Using GPT to parse PDF

1、Processing Flow

The first step is to use the PyMuPDF library to parse the PDF to find all non-text areas and mark them, for example:

The second step is to use a large visual model (such as GPT-4o) to parse and obtain a markdown file.

How to install and use gptpdf

1、Install

pip install gptpdf

2、use

from gptpdf import parse_pdf

api_key = 'Your OpenAI API Key'
content, image_paths = parse_pdf(pdf_path, api_key=api_key)
print(content)

For more information, see test/test.py

address:https://github.com/CosmosShadow/gptpdf/blob/main/test/test.py

Interpreting the test.py code

import os

# 从 .env 文件中加载环境变量
import dotenv
dotenv.load_dotenv()

def test_use_api_key():
    from gptpdf import parse_pdf
    pdf_path = '../examples/attention_is_all_you_need.pdf'
    output_dir = '../examples/attention_is_all_you_need/'
    # 从环境变量中获取 OPENAI_API_KEY 和 OPENAI_API_BASE
    api_key = os.getenv('OPENAI_API_KEY')
    base_url = os.getenv('OPENAI_API_BASE')
    # 手动提供 OPENAI_API_KEY 和 OPENAI_API_BASE
    content, image_paths = parse_pdf(pdf_path, output_dir=output_dir, api_key=api_key, base_url=base_url, model='gpt-4o', gpt_worker=6)
    # 输出解析后的内容和图像路径
    print(content)
    print(image_paths)
    # 同时会生成 output_dir/output.md 文件

def test_use_env():
    from gptpdf import parse_pdf
    pdf_path = '../examples/attention_is_all_you_need.pdf'
    output_dir = '../examples/attention_is_all_you_need/'
    # 使用环境变量中的 OPENAI_API_KEY 和 OPENAI_API_BASE
    content, image_paths = parse_pdf(pdf_path, output_dir=output_dir, model='gpt-4o', verbose=True)
    # 输出解析后的内容和图像路径
    print(content)
    print(image_paths)
    # 同时会生成 output_dir/output.md 文件

def test_azure():
    from gptpdf import parse_pdf
    # Azure API Key
    api_key = '8ef0b4df45e444079cd5a4xxxxx' 
    # Azure API 基础 URL
    base_url = 'https://xxx.openai.azure.com/' 
    # Azure 部署的模型 ID 名称(不是 OpenAI 模型名称)
    model = 'azure_xxxx'

    pdf_path = '../examples/attention_is_all_you_need.pdf'
    output_dir = '../examples/attention_is_all_you_need/'
    # 使用提供的 Azure API Key 和基础 URL
    content, image_paths = parse_pdf(pdf_path, output_dir=output_dir, api_key=api_key, base_url=base_url, model=model, verbose=True)
    # 输出解析后的内容和图像路径
    print(content)
    print(image_paths)

if __name__ == '__main__':
    # 取消注释以运行特定的测试函数
    # test_use_api_key()
    # test_use_env()
    test_azure()

3、API

parse_pdf(pdf_path, output_dir='./', api_key=None, base_url=None, model='gpt-4o', verbose=False)
Parse the pdf file into a markdown file and return the markdown content and a list of all image paths.

  • pdf_path: pdf file path

  • output_dir: Output directory. Stores all images and markdown files

  • api_key: OpenAI API key (optional). If not provided, the OPENAI_API_KEY environment variable is used.

  • base_url: OpenAI base URL. (Optional). If not provided, the OPENAI_BASE_URL environment variable is used.

  • model: Multimodal large model in OpenAI API format, default is "gpt-4o". If you need to use other models, such as