If you, like me, resent every dollar spent on commercial PDF tools,
you might want to know how to change the text content of a PDF without
having to pay for Adobe Acrobat or another PDF tool. I didn't see an
obvious open-source tool that lets you dig into PDF internals, but I
did discover a few facts about how PDFs are structured that may
prove useful to others (or to me) in the future, so I've recorded
them here. They are surely not universally applicable --
the PDF standard is truly Byzantine -- but they worked for my case.
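One structural fact tends to come up first when editing text by hand. A page's visible text usually lives in a content stream, and those streams are often compressed with the FlateDecode filter, which is ordinary zlib. The sketch below is a minimal illustration that uses only Python's standard library. It is not necessarily the approach taken here, and the helper names `inflate_streams` and `literal_strings` are hypothetical. It finds each `stream` keyword, inflates any zlib data that follows, and lists the literal strings drawn with the `Tj` operator. It deliberately ignores non-Flate filters, object streams, `TJ` arrays, hex strings, escaped parentheses, and font encodings, and any of these may matter for a given file.

```python
import re
import zlib

# Start of a stream body: the "stream" keyword followed by CRLF or LF.
# The lookbehind skips the "stream" inside "endstream".
STREAM_START = re.compile(rb"(?<!end)stream\r?\n")

# Literal string operand of the Tj ("show text") operator, e.g. (Hello) Tj.
# Assumption: no escaped or nested parentheses; TJ arrays and <hex> strings are ignored.
TJ_STRING = re.compile(rb"\(([^()]*)\)\s*Tj")


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the inflated body of every stream that holds complete zlib data."""
    inflated = []
    for match in STREAM_START.finditer(pdf_bytes):
        # decompressobj stops at the end of the zlib data, so this sketch does
        # not need the stream's /Length entry; trailing bytes go to unused_data.
        inflater = zlib.decompressobj()
        try:
            data = inflater.decompress(pdf_bytes[match.end():])
        except zlib.error:
            # Not zlib data: uncompressed, or encoded with another filter.
            continue
        if inflater.eof:
            inflated.append(data)
    return inflated


def literal_strings(content: bytes) -> list[bytes]:
    """Return the literal strings drawn by Tj in an inflated content stream."""
    return TJ_STRING.findall(content)


if __name__ == "__main__":
    page = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"
    packed = zlib.compress(page)
    pdf = (
        b"%PDF-1.4\n1 0 obj\n<< /Length " + str(len(packed)).encode()
        + b" /Filter /FlateDecode >>\nstream\n" + packed
        + b"\nendstream\nendobj\n"
    )
    for stream in inflate_streams(pdf):
        print(literal_strings(stream))  # prints [b'Hello']
```

Suppose you edit an inflated stream and recompress it with `zlib.compress`. You will generally also need to update that stream's `/Length` entry and the byte offsets in the cross-reference table, because both describe the file's exact layout.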

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import os
import argparse


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model_name_or_path", type=str)
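The preview stops inside `get_args()`. An argparse helper like this usually ends by returning `parser.parse_args()`. The sketch below is a hypothetical completion that keeps only the one flag visible above. The optional `argv` parameter and the model name `my-org/base-model` are assumptions added for illustration, so the function can be exercised without a real command line.

```python
import argparse


def get_args(argv=None):
    """Hypothetical completion: build the parser and parse argv (sys.argv[1:] if None)."""
    parser = argparse.ArgumentParser()
    # The only flag visible in the preview; any others are unknown.
    parser.add_argument("--base_model_name_or_path", type=str)
    # parse_args(None) falls back to sys.argv[1:]; passing a list eases testing.
    return parser.parse_args(argv)


if __name__ == "__main__":
    # "my-org/base-model" is a placeholder, not a real checkpoint name.
    args = get_args(["--base_model_name_or_path", "my-org/base-model"])
    print(args.base_model_name_or_path)  # prints my-org/base-model
```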

# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software

# Based on younesbelkada/finetune_llama_v2.py
# Install the following libraries:
# pip install accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 scipy
from dataclasses import dataclass, field
from typing import Optional
import torch
from datasets import load_dataset
from transformers import ( |