
LoRA (Low-Rank Adaptation) is a technique for fine-tuning large language models, particularly transformer-based ones, while reducing computational and memory overhead.
LoRA is typically applied to models that have already been pre-trained on large datasets.
Pre-training is the initial stage in which a model is trained on a massive corpus of text. The model learns by adjusting its internal parameters (weights) through backpropagation. This process is repeated over billions or trillions of tokens until the model can accurately predict or generate text.
After pre-training, the model is fine-tuned: it is trained further on a smaller, task-specific dataset to adapt it to a particular task such as sentiment analysis, translation, or question answering.

Pre-training vs Fine-tuning
As models grow larger, efficient fine-tuning becomes essential for adapting them while keeping training time and GPU memory in check. It allows targeted adjustments without retraining the entire model, which makes large language models far more practical to work with.

Model Sizes Over Time. Source
There are several types of fine-tuning techniques, each suited to different goals and settings. Our focus here is LoRA, but let's briefly review the most common approaches.

Fine-Tune Types.
The main idea behind PEFT (Parameter-Efficient Fine-Tuning) is to adapt a pre-trained model to a new task while updating only a small fraction of its parameters.
Before diving into LoRA, let's first break down the main types of PEFT.
In LLMs, the weight matrices are high-dimensional and dense, meaning they contain a very large number of parameters.
The idea behind LoRA is that many of these parameters are not critical for every task. Instead of adjusting the full matrices, LoRA introduces low-rank matrices that are smaller and simpler, yet still able to capture the essential changes required for the new task.
The rank of a matrix is the number of linearly independent rows or columns it contains. In essence, it indicates how much complexity or unique information the matrix holds.
The rank of a matrix is always at least one (the only exception is the zero matrix, whose rank is zero).
We find the number of independent rows by reducing the matrix to row echelon form; the rank is the count of the non-zero rows (or, equivalently, independent columns).

The rank of A is 3.

The rank of A is 2.

Source
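As a quick sanity check of the rank definition, here is a minimal NumPy sketch; the matrices are illustrative, not the exact ones from the figures above:
import numpy as np

# A 3x3 matrix with three independent rows -> full rank
A_full = np.array([[1, 0, 2],
                   [0, 1, 1],
                   [3, 1, 0]])

# A 3x3 matrix whose third row is the sum of the first two -> rank 2
A_low = np.array([[1, 0, 2],
                  [0, 1, 1],
                  [1, 1, 3]])

print(np.linalg.matrix_rank(A_full))  # 3
print(np.linalg.matrix_rank(A_low))   # 2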
If a matrix's rank equals its smaller dimension (rows or columns), it is considered full rank, meaning it captures the maximum amount of information possible for its size.
A low-rank matrix, on the other hand, has a rank smaller than its dimensions, so it carries less information and can be viewed as a simplified or compressed version of a full-rank matrix.
Instead of directly updating the model's large, full-rank weight matrices, LoRA introduces low-rank matrices. These require far fewer parameters to represent and are much cheaper to train.
In the example below, W has a rank of 2, which means it can be expressed as the product of two smaller matrices.
import torch
import numpy as np
_ = torch.manual_seed(42)
d, k = 5, 5
W_rank = 2
W = torch.randn(d,W_rank) @ torch.randn(W_rank,k) # 5x2 @ 2x5
print(W)
"""
tensor([[ 0.9042, -1.4169, 1.4654, -1.2297, 0.6689],
[ 1.5066, 0.2465, -0.2778, -1.1182, 1.0168],
[-1.6714, 1.0142, -1.0348, 1.7002, -1.1763],
[ 0.6054, -1.1338, 1.1743, -0.8894, 0.4548],
[ 0.2302, 0.3984, -0.4187, -0.0421, 0.1419]])
"""
W_rank = np.linalg.matrix_rank(W)
print(f'Rank of W: {W_rank}')
"""
Rank of W: 2
"""
Singular Value Decomposition (SVD): the matrix W can be decomposed into three matrices U, S, and V.
U, S, V = torch.svd(W)
U
"""
tensor([[-0.5466, 0.3814, 0.5431, 0.4039, 0.3125],
[-0.3295, -0.7763, 0.0493, 0.4477, -0.2932],
[ 0.6526, 0.1640, 0.0137, 0.7387, -0.0372],
[-0.4073, 0.3556, -0.7711, 0.2863, -0.1764],
[ 0.0301, -0.3139, -0.3284, 0.0938, 0.8854]])
"""
S
"""
tensor([4.6126e+00, 1.9884e+00, 1.0705e-07, 8.8211e-08, 3.1594e-09])
"""
V
"""
tensor([[-0.5032, -0.4806, 0.1853, 0.4932, -0.4881],
[ 0.3965, -0.5501, 0.6400, -0.3613, 0.0108],
[-0.4066, 0.5804, 0.6994, -0.0904, 0.0220],
[ 0.5444, 0.1884, 0.2553, 0.7665, 0.1245],
[-0.3576, -0.3067, 0.0421, 0.1750, 0.8635]])
"""
W = U x S x V^T
U_r = U[:, :W_rank]
S_r = torch.diag(S[:W_rank])
V_r = V[:, :W_rank].t()
B = U_r @ S_r
A = V_r
print(f'Shape of B: {B.shape}')
print(f'Shape of A: {A.shape}')
"""
Shape of B: torch.Size([5, 2])
Shape of A: torch.Size([2, 5])
"""
print("Total parameters of W: ", W.nelement())
print("Total parameters of B and A: ", B.nelement() + A.nelement())
"""
Total parameters of W: 25
Total parameters of B and A: 20
"""在传统的微调中,直接更新神经网络的整个权重矩阵。这意味着权重矩阵中的每个元素都是根据训练数据进行调整的。
Instead of updating the large weight matrix directly during fine-tuning, LoRA introduces two smaller matrices, usually called A and B.

Two matrices. Source
These matrices are designed to be low rank. For example, if the original weight matrix W has dimensions n×m, the low-rank matrices A and B might have dimensions n×r and r×m respectively, where r is much smaller than n and m.

Rank 2. Source
The product of the matrices A and B gives a matrix C = A×B with the same dimensions as the original weight matrix W.
Rather than updating W directly, LoRA tracks the change through this product matrix C. Concretely, the model's output is influenced by the sum of the original weights and the low-rank approximation: W′ = W + C.

Because A and B are much smaller than W, only a small fraction of the parameters is adjusted during fine-tuning. This makes the process far more efficient in both memory usage and computation.
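To get a feel for the savings, here is a small back-of-the-envelope calculation in Python; the layer size and rank are illustrative assumptions, not taken from a specific model:
# Hypothetical square weight matrix of a transformer layer (sizes are illustrative)
n, m = 4096, 4096
r = 8  # LoRA rank

full_params = n * m          # parameters touched by full fine-tuning
lora_params = n * r + r * m  # parameters in the low-rank matrices A and B

print(full_params)                                # 16777216
print(lora_params)                                # 65536
print(f"{100 * lora_params / full_params:.2f}%")  # 0.39%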

Rank decomposition. Source

Low-rank matrix decomposition. Source

Source
During backpropagation, the frozen pre-trained weights remain unchanged; the loss is used only to update the B and A matrices introduced by LoRA.
The number of trainable parameters, the rank of the matrices, and model accuracy are interrelated. Lowering the rank reduces the number of trainable parameters and makes fine-tuning more efficient, but it may also limit accuracy. Balancing rank against trainable parameters is key to optimizing resource usage while preserving model quality.

Rank vs Trainable Parameters. Source
QLoRA (Quantized Low-Rank Adaptation) builds on the LoRA concept and adds another layer of efficiency through quantization. While LoRA reduces the number of trainable parameters by using low-rank matrices, QLoRA goes a step further by also quantizing the frozen base-model weights.

LoRA and QLoRA. Source
In QLoRA, the frozen base weights are stored with 4-bit quantization, and full precision is recovered when needed through a step called dequantization.
During fine-tuning, the frozen weights are kept in this quantized, low-precision format (e.g., 4-bit), while the low-rank adapter matrices are trained at higher precision.
Whenever the model computes an output, during fine-tuning and at inference time, the quantized weights are dequantized on the fly.
The dequantized weights are then combined with the higher-precision low-rank matrices to produce the output. This is why the model can retain high precision and accuracy even though its weights are stored in a quantized format.
Precision refers to how much detail a numeric representation carries. It concerns how numbers such as weights, activations, and gradients are stored and processed during training and inference. Precision is usually determined by the number of bits used to represent each number:

Format of Floating points. Source

Source
Quantization is the process of reducing the precision of the numbers used in a model, typically from 32-bit floating point down to lower bit-width representations such as 16-bit, 8-bit, or even 4-bit integers.

int8 Quantization. Source
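A minimal sketch of symmetric (absmax) int8 quantization, just to make the idea concrete; this is a simplified illustration, not the exact scheme used by bitsandbytes:
import torch

w = torch.randn(4, 4)                             # original fp32 weights

scale = w.abs().max() / 127                       # map the largest magnitude to 127
w_int8 = torch.round(w / scale).to(torch.int8)    # quantized 8-bit weights
w_dequant = w_int8.to(torch.float32) * scale      # dequantized approximation

print(w_int8.dtype)                               # torch.int8
print((w - w_dequant).abs().max())                # small quantization error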
Model size = size of the data type × number of weights
This gives a rough estimate of the memory needed to run the model, i.e., for inference.
During training, the memory requirement is higher, because in addition to the weights you also need to store gradients and optimizer state.
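For example, here is a rough estimate for a hypothetical 7-billion-parameter model (the parameter count is an assumption for illustration):
n_params = 7_000_000_000  # hypothetical 7B-parameter model

bytes_per_weight = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

for dtype, size in bytes_per_weight.items():
    print(f"{dtype}: ~{n_params * size / 1024**3:.1f} GB")
# fp32: ~26.1 GB, fp16: ~13.0 GB, int8: ~6.5 GB, int4: ~3.3 GB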
FLOPS stands for floating point operations per second. It is a measure of how quickly a computer or GPU can perform numeric computations.
When you use lower precision (such as 16-bit instead of 32-bit), the GPU can work faster because the numbers are smaller and easier to process. Switching to lower precision can nearly double training throughput, since more operations fit into the same amount of time.
When you fine-tune a model with LoRA or QLoRA, the weight update is represented by the low-rank matrices. These matrices are not added to the original weights as-is; they are scaled by a factor before being added.
The alpha hyperparameter determines the strength of this scaling. It acts as a multiplier that adjusts how much the low-rank update influences the original weights: a higher alpha increases the effect of the adaptation, a lower alpha reduces it.
The scaling factor applied to the weight update is computed as alpha / rank.
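Putting the pieces together, here is a minimal, self-contained sketch of a LoRA-style linear layer in PyTorch. It is meant to illustrate the mechanics (frozen W, trainable B and A, alpha/r scaling), not to reproduce the peft library's implementation:
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Frozen pre-trained weight W (randomly initialized here for the sketch)
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank matrices: B (out x r) starts at zero, A (r x in) is small random
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.scaling = alpha / r  # the alpha / rank scaling factor

    def forward(self, x):
        # output = x W^T + (x A^T B^T) * (alpha / r)
        return x @ self.weight.T + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(2048, 2048, r=8, alpha=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # 32768 of 4227072 parameters are trainable

Because B is initialized to zero, the adapter starts as a no-op and the layer initially behaves exactly like the frozen pre-trained layer.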
peft is a library developed by the Hugging Face team.
pip install peft

from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_model, LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
model = get_peft_model(model, peft_config)

get_peft_model wraps the pre-trained model with the LoRA configuration, effectively applying the LoRA technique. This means the model will use the low-rank matrices during fine-tuning, reducing the number of parameters that need to be trained.
model.print_trainable_parameters()
"""
trainable params: 2,359,296 || all params: 1,231,940,608 || trainable%: 0.1915
"""我们将按照此调整临时LLM.
I will use Google Colab with GPU support.
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git

import torch
torch.cuda.is_available()
# True

Let's load the model and tokenizer. We will use the "bigscience/bloom-1b7" model and the "bigscience/tokenizer" tokenizer.
AutoModelForCausalLM loads a pre-trained causal language model, the kind used for tasks such as text generation, where the model predicts the next token in a sequence.
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"bigscience/bloom-1b7",
torch_dtype=torch.float16,
device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained("bigscience/tokenizer")

In LoRA, we apply the low-rank decomposition to specific weight matrices inside the model. In the referenced paper, the authors chose to decompose the query (Wq) and value (Wv) projection matrices of the transformer architecture.
In the BLOOM model, the components used for the query, key, and value operations are combined into a single module named query_key_value. So instead of targeting Wq and Wv separately, the entire query_key_value module is used for the decomposition.
The structure and naming conventions of these components vary between models. In some models, such as Llama, the weight matrices may have different names or be organized under different modules.
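One way to discover candidate target modules for a given architecture, as an alternative to eyeballing the print(model) output below, is to list the names of its linear sub-modules. A small sketch using the model and the nn import from above:
# Collect the distinct names of all linear sub-modules in the model
linear_names = set()
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        linear_names.add(name.split(".")[-1])
print(linear_names)
# e.g. {'query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h', 'lm_head'}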
print(model)
"""
BloomForCausalLM(
(transformer): BloomModel(
(word_embeddings): Embedding(250880, 2048)
(word_embeddings_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
(h): ModuleList(
(0-23): 24 x BloomBlock(
(input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
(self_attention): BloomAttention(
(query_key_value): Linear(in_features=2048, out_features=6144, bias=True)
(dense): Linear(in_features=2048, out_features=2048, bias=True)
(attention_dropout): Dropout(p=0.0, inplace=False)
)
(post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
(mlp): BloomMLP(
(dense_h_to_4h): Linear(in_features=2048, out_features=8192, bias=True)
(gelu_impl): BloomGelu()
(dense_4h_to_h): Linear(in_features=8192, out_features=2048, bias=True)
)
)
)
(ln_f): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
)
(lm_head): Linear(in_features=2048, out_features=250880, bias=False)
)
"""
Preprocessing stage...
We iterate over all the parameters (weights and biases) in the model.
for param in model.parameters():
    param.requires_grad = False  # freeze the model - train adapters later
    if param.ndim == 1:
        # cast the small parameters (e.g. layernorm) to fp32 for stability
        param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)

param.requires_grad = False: this line freezes the parameters by setting requires_grad to False, which means their values will not be updated during backpropagation. This is the usual setup when you only want to fine-tune specific parts of the model (for example, the adapters in LoRA and similar techniques) while keeping the rest of the model unchanged.
if param.ndim == 1:: this condition checks whether the parameter is a one-dimensional tensor. Such parameters typically include biases and the parameters of normalization layers (like LayerNorm).
param.data = param.data.to(torch.float32): casts these small one-dimensional parameters to 32-bit floating point (FP32). While the rest of the model may run in lower precision (such as FP16) for efficiency, keeping these parameters in FP32 improves numerical stability, especially in operations like layer normalization where small differences have a large effect.
model.gradient_checkpointing_enable(): enables gradient checkpointing, a technique for reducing memory usage during training. Instead of storing all intermediate activations (needed for backpropagation), the model recomputes them as needed during the backward pass.
model.enable_input_require_grads(): enables gradients for the model's input embeddings. This is necessary when you want gradients to propagate back to the input tokens, or when fine-tuning involves modifying the embeddings. It ensures that the inputs to certain layers track their gradients, which is typically required when training adapters or making other small modifications to the model.
The CastOutputToFloat class is defined to ensure that the output of lm_head (the model's final layer, responsible for producing the logits for the language-modeling task) is cast to FP32, regardless of the precision used in earlier layers.
Setting up the LoRA configuration...
The print_trainable_parameters function computes and prints the number of trainable parameters in the model relative to the total number of parameters.
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

from peft import LoraConfig, get_peft_model
config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["query_key_value"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
model = get_peft_model(model, config)
print_trainable_parameters(model)
"""
trainable params: 1572864 || all params: 1723981824 || trainable%: 0.09123437254985815
"""
We will use the "squad_v2" dataset.
from datasets import load_dataset
qa_dataset = load_dataset("squad_v2")

The create_prompt function builds a prompt that combines the given context, question, and answer into a single string template. This prompt can then be used to train the model to generate an answer given a context and a question.
def create_prompt(context, question, answer):
    if len(answer["text"]) < 1:
        answer = "Cannot Find Answer"
    else:
        answer = answer["text"][0]
    prompt_template = f"### CONTEXT\n{context}\n### QUESTION\n{question}\n### ANSWER\n{answer}</s>"
    return prompt_template

mapped_qa_dataset = qa_dataset.map(lambda samples: tokenizer(create_prompt(samples['context'], samples['question'], samples['answers'])))

The create_prompt function is applied to each sample of qa_dataset (a question-answering dataset), and the tokenizer then tokenizes the resulting prompt, preparing the dataset for model training or fine-tuning.
Let's train...
import transformers
trainer = transformers.Trainer(
model=model,
train_dataset=mapped_qa_dataset["train"],
args=transformers.TrainingArguments(
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
warmup_steps=100,
max_steps=100,
learning_rate=1e-3,
fp16=True,
logging_steps=1,
output_dir='outputs',
),
data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False # silence the warnings. Please re-enable for inference!
trainer.train()

Saving and loading the model...
peft_model_id="results"
trainer.model.save_pretrained(peft_model_id)
tokenizer.save_pretrained(peft_model_id)

import torch
from peft import PeftModel, PeftConfig
# Load peft config for pre-trained checkpoint etc.
peft_model_id = "results"
config = PeftConfig.from_pretrained(peft_model_id)
# load base LLM model and tokenizer
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
model.eval()
print("Peft model loaded")让我们尝试一下模型:
from IPython.display import display, Markdown

def make_inference(context, question):
    batch = tokenizer(f"### CONTEXT\n{context}\n### QUESTION\n{question}\n### ANSWER\n", return_tensors='pt')
    with torch.cuda.amp.autocast():
        output_tokens = model.generate(**batch, max_new_tokens=200)
    display(Markdown((tokenizer.decode(output_tokens[0], skip_special_tokens=True))))

context = "The Moon orbits Earth at an average distance of 384,400 km (238,900 mi), or about 30 times Earth's diameter. Its gravitational influence is the main driver of Earth's tides and very slowly lengthens Earth's day. The Moon's orbit around Earth has a sidereal period of 27.3 days. During each synodic period of 29.5 days, the amount of visible surface illuminated by the Sun varies from none up to 100%, resulting in lunar phases that form the basis for the months of a lunar calendar. The Moon is tidally locked to Earth, which means that the length of a full rotation of the Moon on its own axis causes its same side (the near side) to always face Earth, and the somewhat longer lunar day is the same as the synodic period. However, 59% of the total lunar surface can be seen from Earth through cyclical shifts in perspective known as libration."
question = "At what distance does the Moon orbit the Earth?"
make_inference(context, question)
"""
CONTEXT
The Moon orbits Earth at an average distance of 384,400 km (238,900 mi), or about 30 times Earth's diameter. Its gravitational influence is the main driver of Earth's tides and very slowly lengthens Earth's day. The Moon's orbit around Earth has a sidereal period of 27.3 days. During each synodic period of 29.5 days, the amount of visible surface illuminated by the Sun varies from none up to 100%, resulting in lunar phases that form the basis for the months of a lunar calendar. The Moon is tidally locked to Earth, which means that the length of a full rotation of the Moon on its own axis causes its same side (the near side) to always face Earth, and the somewhat longer lunar day is the same as the synodic period. However, 59% of the total lunar surface can be seen from Earth through cyclical shifts in perspective known as libration.
QUESTION
At what distance does the Moon orbit the Earth?
ANSWER
The Moon orbits the Earth at an average distance of 384,400 km (238,900 mi), or about 30 times Earth's diameter. Its gravitational influence is the main driver of Earth's tides and very slowly lengthens Earth's day. The Moon's orbit around Earth has a sidereal period of 27.3 days. During each synodic period of 29.5 days, the amount of visible surface illuminated by the Sun varies from none up to 100%, resulting in lunar phases that form the basis for the months of a lunar calendar. The Moon is tidally locked to Earth, which means that the length of a full rotation of the Moon on its own axis causes its same side (the near side) to always face Earth, and the somewhat longer lunar day is the same as the synodic period. However, 59% of the total lunar surface can be seen from Earth through cyclical shifts in perspective known as libration.
"""让我们演示如何应用 LoRA 来微调 FLAN-T5 模型。
I will use Google Colab with GPU support.
!pip install datasets py7zr rouge-score bitsandbytes accelerate transformers
!pip install "peft==0.2.0"我们将使用 Samsun dataset。
The SAMSum dataset, created by linguists at Samsung R&D Institute Poland, contains 16,000 messenger-style conversations with third-person summaries and is available for research under a non-commercial license.
from datasets import load_dataset
dataset = load_dataset("samsum", download_mode="force_redownload")
print(f"Train dataset size: {len(dataset['train'])}")
print(f"Test dataset size: {len(dataset['test'])}")
"""
Train dataset size: 14732
Test dataset size: 819
"""
print(dataset["train"][1])
"""
{'id': '13728867',
'dialogue': 'Olivia: Who are you voting for in this election?
Oliver: Liberals as always.
Olivia: Me too!!
Oliver: Great',
'summary': 'Olivia and Olivier are voting for liberals in this election. '}
"""让我们得到分词器。我们将使用 Flan-T5-small.。
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_id="google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)

Now, some preprocessing...
import numpy as np
from datasets import concatenate_datasets
tokenized_inputs = concatenate_datasets([dataset["train"], dataset["test"]]).map(
    lambda x: tokenizer(x["dialogue"], truncation=True),
    batched=True, remove_columns=["dialogue", "summary"])
input_lenghts = [len(x) for x in tokenized_inputs["input_ids"]]
max_source_length = int(np.percentile(input_lenghts, 85))
print(f"Max source length: {max_source_length}")
# Max source length: 255
print(tokenized_inputs[0])
"""
{'id': '13818513',
'input_ids': [21542,
10,
27,
...
61,
1],
'attention_mask': [1,
1,
...
1]}
"""
concatenate_datasets is used to merge multiple datasets into one.
First, the train and test splits are concatenated into a single dataset so they can be processed together. Then the "dialogue" field of the concatenated dataset is tokenized.
max_source_length is the length that covers 85% of the tokenized input sequences (the 85th percentile). Using this instead of the absolute maximum makes better use of the model's maximum input length.
Similarly, the "summary" field is tokenized and target_lenghts is computed.
tokenized_targets = concatenate_datasets([dataset["train"], dataset["test"]]).map(
    lambda x: tokenizer(x["summary"], truncation=True),
    batched=True, remove_columns=["dialogue", "summary"])
target_lenghts = [len(x) for x in tokenized_targets["input_ids"]]
max_target_length = int(np.percentile(target_lenghts, 90))
print(f"Max target length: {max_target_length}")
# Max target length: 50
print(tokenized_targets[0])
"""
{'id': '13818513', 'input_ids': [21542, 13635, 5081, 11, 56, 830, 16637, 128, 5721, 5, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
"""这 preprocess_function 定义为将数据集转换为适合模型训练的格式。
这 preprocess_function 使用以下方法应用于整个数据集 map 方法,批量处理数据集。这remove_columns参数指定标记化后应删除哪些原始列(对话、摘要和 id)。
def preprocess_function(sample, padding="max_length"):
    # add prefix to the input for t5
    inputs = ["summarize: " + item for item in sample["dialogue"]]
    # tokenize inputs
    model_inputs = tokenizer(inputs, max_length=max_source_length, padding=padding, truncation=True)
    # Tokenize targets with the `text_target` keyword argument
    labels = tokenizer(text_target=sample["summary"], max_length=max_target_length, padding=padding, truncation=True)
    # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100
    # so that padding is ignored in the loss.
    if padding == "max_length":
        labels["input_ids"] = [
            [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"]
        ]
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=["dialogue", "summary", "id"])
print(f"Keys of tokenized dataset: {list(tokenized_dataset['train'].features)}")
# save datasets to disk for later easy loading
tokenized_dataset["train"].save_to_disk("data/train")
tokenized_dataset["test"].save_to_disk("data/eval")
"""
Keys of tokenized dataset: ['input_ids', 'attention_mask', 'labels']
"""import torch
torch.cuda.is_available()
# True

Let's load the model.
from transformers import AutoModelForSeq2SeqLM
model_id = "google/flan-t5-small"
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto")

Let's apply LoRA using peft.
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType
lora_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["q", "v"],
lora_dropout=0.05,
bias="none",
task_type=TaskType.SEQ_2_SEQ_LM
)
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 688128 || all params: 77649280 || trainable%: 0.8862001038515747
prepare_model_for_int8_training prepares the previously loaded 8-bit model for training. It makes sure the model is properly configured for fine-tuning, for example by freezing certain layers while keeping others trainable.
The data collator is responsible for dynamically batching the data together during training or evaluation.
from transformers import DataCollatorForSeq2Seq
label_pad_token_id = -100
data_collator = DataCollatorForSeq2Seq(
tokenizer,
model=model,
label_pad_token_id=label_pad_token_id,
pad_to_multiple_of=8
)

pad_to_multiple_of=8: this option ensures that sequences are padded to a length that is a multiple of 8. Padding to a multiple of 8 can benefit performance on some hardware (such as GPUs), because it aligns with memory-access patterns that are optimized for computational efficiency.
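A small sketch of what the collator does to a toy batch, using the tokenizer and data_collator defined above; the example texts are made up for illustration:
features = [
    {"input_ids": tokenizer("summarize: a short dialogue").input_ids,
     "labels": tokenizer("short").input_ids},
    {"input_ids": tokenizer("summarize: a noticeably longer dialogue between two people").input_ids,
     "labels": tokenizer("a longer summary").input_ids},
]
batch = data_collator(features)
print(batch["input_ids"].shape)  # padded to a length that is a multiple of 8
print(batch["labels"][0])        # label padding positions are filled with -100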
Now, let's set up the training loop.
Seq2SeqTrainer is a specialized trainer class for training sequence-to-sequence models. Seq2SeqTrainingArguments defines the hyperparameters and configuration of the training run.
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
output_dir="lora-flan-t5-base"
training_args = Seq2SeqTrainingArguments(
output_dir=output_dir,
auto_find_batch_size=True,
learning_rate=1e-3,
num_train_epochs=5,
logging_dir=f"{output_dir}/logs",
logging_strategy="steps",
logging_steps=500,
save_strategy="no",
report_to="tensorboard",
)
# Create Trainer instance
trainer = Seq2SeqTrainer(
model=model,
args=training_args,
data_collator=data_collator,
train_dataset=tokenized_dataset["train"],
)
model.config.use_cache = False

model.config.use_cache = False: this disables caching during training. In some models, caching speeds up inference by storing intermediate results, but during training it can cause problems or produce unnecessary warnings. Setting it to False suppresses those warnings and lets training proceed cleanly; caching should generally be re-enabled for inference to improve performance.
Training...
trainer.train()
"""
/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
[9210/9210 1:07:01, Epoch 5/5]
Step Training Loss
500 1.928700
1000 1.930100
1500 1.930600
2000 1.896400
2500 1.875400
3000 1.887600
3500 1.862400
4000 1.841600
4500 1.835700
5000 1.846200
5500 1.832900
6000 1.818700
6500 1.812300
7000 1.796200
7500 1.771600
8000 1.781500
8500 1.769900
9000 1.779300
TrainOutput(global_step=9210, training_loss=1.8422986688106509, metrics={'train_runtime': 4023.311, 'train_samples_per_second': 18.308, 'train_steps_per_second': 2.289, 'total_flos': 6924203301273600.0, 'train_loss': 1.8422986688106509, 'epoch': 5.0})
"""# Save our LoRA model & tokenizer results
peft_model_id="results"
trainer.model.save_pretrained(peft_model_id)
tokenizer.save_pretrained(peft_model_id)
# if you want to save the base model to call
# trainer.model.base_model.save_pretrained(peft_model_id)

TrainOutput: this is the summary of the training run above.
Here is how to load the model we just saved.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# Load peft config for pre-trained checkpoint etc.
peft_model_id = "results"
config = PeftConfig.from_pretrained(peft_model_id)
# load base LLM model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
model.eval()

model.eval() puts the model in evaluation mode, which is required for inference. In evaluation mode, certain layers (such as dropout) are disabled to ensure consistent results.
We can now generate a text summary for a dialogue sample from the "samsum" dataset.
from datasets import load_dataset
from random import randrange
dataset = load_dataset("samsum")
sample = dataset['test'][randrange(len(dataset["test"]))]
input_ids = tokenizer(sample["dialogue"], return_tensors="pt", truncation=True).input_ids.cuda()
outputs = model.generate(input_ids=input_ids, max_new_tokens=40, do_sample=True, top_p=0.9)
print(f"input sentence: {sample['dialogue']}
{'---'* 20}")
print(f"summary:
{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]}")
"""
input sentence: Lincoln: Heeyyy ;* whats up
Fatima: I talked to Jenson, he’s not too happy ;p
Lincoln: the place sucks??
Fatima: No, the place is ok, I think, we can go there, it’s about Alene
Lincoln: typical, dont worry about it
Fatima: He thinks she may have a depression :[
Lincoln: nothin new, everyone has it, she needs a doctor then
Fatima: But she won’t go ;/
Lincoln: so she’s destroying her life fuck it its not your problem
Fatima: It is, they’re both my friends!
Lincoln: you better think what to do if they break up
Fatima: Ehh yes Ill have a problem ;//
Lincoln: both blaming each other and talking with you about it, perfect
Fatima: Alene is just troubled… She’d been through a lot…
Lincoln: everyone has their problems, the question is are ya doin sth about them
Fatima: She has problems facing it, don’t be surprised :[
Lincoln: then it is her problem
Fatima: You are so cruel at times… o.O
Lincoln: maybe, for me its just a common sense
Fatima: Why can’t everyone be just happy???
Lincoln: youll not understand, you had good childhood, nice parents, you have no idea
Fatima: Probably, true… Well I can be just grateful o.o
Lincoln: do that and stop worrying about others, youre way to bautful for that <3
Fatima: :*:*:*
------------------------------------------------------------
summary:
Fatima spoke to Jenson who might have a depression and does not go because Alene is having problems with her. Lincoln advises them to stop worrying about others.
"""最后,我们来评价一下。
ROUGE is a common evaluation metric for tasks like summarization, where the goal is to compare generated text against a reference (ground-truth) text.
The evaluate library provides easy access to evaluation metrics commonly used in NLP, such as ROUGE and BLEU.
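As a minimal illustration of the API before the full evaluation loop below (the strings are toy examples, and the exact shape of the returned scores can vary slightly between evaluate versions):
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the moon orbits the earth"],
    references=["the moon orbits earth at an average distance of 384,400 km"],
)
print(scores)  # e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}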
evaluate_peft_model takes a single sample from the dataset and generates a summary with the model. It then decodes the generated summary (the prediction) and the reference summary (the label) and returns both for evaluation.
import evaluate
import numpy as np
from datasets import load_from_disk
from tqdm import tqdm
metric = evaluate.load("rouge")
def evaluate_peft_model(sample, max_target_length=50):
    # generate summary
    outputs = model.generate(input_ids=sample["input_ids"].unsqueeze(0).cuda(), do_sample=True, top_p=0.9, max_new_tokens=max_target_length)
    prediction = tokenizer.decode(outputs[0].detach().cpu().numpy(), skip_special_tokens=True)
    # decode eval sample
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(sample['labels'] != -100, sample['labels'], tokenizer.pad_token_id)
    labels = tokenizer.decode(labels, skip_special_tokens=True)
    # Some simple post-processing
    return prediction, labels

test_dataset = load_from_disk("data/eval/").with_format("torch")

predictions, references = [], []
for sample in tqdm(test_dataset):
    p, l = evaluate_peft_model(sample)
    predictions.append(p)
    references.append(l)
rogue = metric.compute(predictions=predictions, references=references, use_stemmer=True)
print(f"Rogue1: {rogue['rouge1']* 100:2f}%")
print(f"rouge2: {rogue['rouge2']* 100:2f}%")
print(f"rougeL: {rogue['rougeL']* 100:2f}%")
print(f"rougeLsum: {rogue['rougeLsum']* 100:2f}%")
"""
Rogue1: 41.264447%
rouge2: 16.052127%
rougeL: 32.351498%
rougeLsum: 32.337353%
"""参考:
https://arxiv.org/abs/2106.09685
https://www.superannotate.com/blog/llm-fine-tuning
https://www.turing.com/resources/finetuning-large-language-models#primary-fine-tuning-approaches
https://twosigmaventures.com/blog/article/the-promise-and-perils-of-large-language-models/
https://ai.plainenglish.io/lora-explained-enhancing-ai-models-with-low-rank-adaptation-56d0bfc42deb