LangChain Supports Integrating Multimodal Large Models via gradio_tools

  • Time: 2025-12-06 22:34

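You probably remember AutoGPT and HuggingGPT, which reached multimodality indirectly by calling a variety of models and so opened up a wider range of scenarios. LangChain now integrates, through gradio_tools, tools that cover text, speech, video, and conversions between them. Under the hood each tool calls the API of a Space hosted on Hugging Face, for example: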

Generating music from an image (ImageToMusicTool),



Segmenting an image (SAMImageSegmentationTool).
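To make "each tool calls the API of a Space" concrete, here is a minimal sketch of the pattern. It is not the gradio_tools implementation: `SpaceTool`, `FakeClient`, and the `audio/sample.mp3` path are hypothetical names invented for illustration. The only real library call assumed is `gradio_client.Client(space_id)` with its `predict(...)` method, and a real Space may also need an `api_name` argument.

```python
from typing import Any, Callable, Optional


def _default_client_factory(space_id: str) -> Any:
    # Imported lazily so the sketch can be read and run without gradio_client installed.
    from gradio_client import Client
    return Client(space_id)


class SpaceTool:
    """Hypothetical wrapper: one tool forwards queries to one hosted Space."""

    def __init__(self, name: str, space_id: str,
                 client_factory: Callable[[str], Any] = _default_client_factory):
        self.name = name
        self.space_id = space_id
        self._client_factory = client_factory
        self._client: Optional[Any] = None

    def run(self, query: str) -> Any:
        # Connect on first use, then forward the query to the Space.
        if self._client is None:
            self._client = self._client_factory(self.space_id)
        return self._client.predict(query)


class FakeClient:
    """Offline stand-in for a Space client, used only to exercise the sketch."""

    def __init__(self, space_id: str):
        self.space_id = space_id

    def predict(self, query: str) -> str:
        return f"[{self.space_id}] {query}"


tool = SpaceTool("whisper", "abidlabs/whisper", client_factory=FakeClient)
print(tool.run("audio/sample.mp3"))  # prints: [abidlabs/whisper] audio/sample.mp3
```

Dropping the `client_factory=FakeClient` argument would make the same object talk to the real Space, which needs network access and the gradio_client package.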


LangChain currently supports integrating the following multimodal models:

[Figure: list of supported multimodal model integrations]

I. A few common scenarios

1. Text to image

Model used: Stable Diffusion

#https://huggingface.co/spaces/gradio-client-demos/text-to-image


2. Speech to text

Model used: OpenAI Whisper

#https://huggingface.co/spaces/abidlabs/whisper


3. Text to speech

Model used: suno/bark

#https://huggingface.co/spaces/suno/bark


4. Text to video

Model used: DAMO Academy's damo-vilab/modelscope-damo-text-to-video-synthesis

#https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis
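Each Space page URL above corresponds to the `*.hf.space` host that shows up in the "Loaded as API" log lines further down. A small sketch of that mapping follows. The helper name `space_api_host` is made up here, and the rule (join owner and Space name with "-" and lowercase the result) is an assumption inferred from the four Spaces in this article; Spaces with other characters in their names may be mapped differently.

```python
from urllib.parse import urlparse


def space_api_host(space_page_url: str) -> str:
    """Map a Space page URL to the *.hf.space host seen in the logs.

    Assumption: owner and Space name are joined with "-" and lowercased,
    which matches the four Spaces used in this article.
    """
    parts = urlparse(space_page_url).path.strip("/").split("/")
    # Expected path shape: spaces/<owner>/<name>
    if len(parts) != 3 or parts[0] != "spaces":
        raise ValueError(f"not a Space page URL: {space_page_url}")
    owner, name = parts[1], parts[2]
    return f"https://{owner}-{name}.hf.space".lower()


for url in [
    "https://huggingface.co/spaces/gradio-client-demos/text-to-image",
    "https://huggingface.co/spaces/abidlabs/whisper",
    "https://huggingface.co/spaces/suno/bark",
    "https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis",
]:
    print(space_api_host(url))
```

The four printed hosts should line up with the hosts in the log output of the script in the next section.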


II. Main code implementation

import os
import shutil

from PIL import Image
from gradio_tools.tools import (
    BarkTextToSpeechTool,
    StableDiffusionTool,
    TextToVideoTool,
    WhisperAudioTranscriptionTool,
)

# Make sure the output directory exists before anything is written to it
os.makedirs("./data", exist_ok=True)

# Text to image: write the prompt in English for the Stable Diffusion model
sd_local_file_path = StableDiffusionTool().langchain.run("Please create a photo of a dog riding a skateboard")
# Under the hood this loads the Stable Diffusion Space on Hugging Face; sample log:
"""
Loaded as API: https://gradio-client-demos-text-to-image.hf.space ✔
Job Status: Status.STARTING eta: None
"""
print("sd_local_file_path:", sd_local_file_path)
im = Image.open(sd_local_file_path)
sd_saved_path = "./data/" + os.path.basename(sd_local_file_path)
im.save(sd_saved_path)
print("Generated image saved to:", sd_saved_path)

# Audio to text (speech to text)
stt_text = WhisperAudioTranscriptionTool().langchain.run("audio/68570059060983616_0_15.mp3")
print("Transcribed text:", stt_text)
"""
Loaded as API: https://abidlabs-whisper.hf.space ✔
"""

# Text to audio (text to speech); the input is Chinese for "I am Chinese"
tts_audio = BarkTextToSpeechTool().langchain.run("我是中国人")
tts_saved_path = "./data/" + os.path.basename(tts_audio)
# Copy the result into ./data so the printed path exists
# (assumes the tool returns a local file path)
shutil.copy(tts_audio, tts_saved_path)
print("Text-to-speech file:", tts_saved_path)
"""
Loaded as API: https://suno-bark.hf.space ✔
Job Status: Status.STARTING eta: None
Due to heavy traffic on this app, the prediction will take approximately 72 seconds.For faster predictions without waiting in queue, you may duplicate the space using: Client.duplicate(suno/bark)
Job Status: Status.IN_QUEUE eta: 72.77052729940723
Due to heavy traffic on this app, the prediction will take approximately 48 seconds.For faster predictions without waiting in queue, you may duplicate the space using: Client.duplicate(suno/bark)
Job Status: Status.IN_QUEUE eta: 48.52035075205344
Job Status: Status.PROCESSING eta: None
Text-to-speech file: ./data/tmpgvejiznvv3ohaiwu.wav
"""

# Text to video: only English prompts are supported
ttv_local_file_path = TextToVideoTool().langchain.run("A panda eating bamboo on a rock.")
"""
Loaded as API: https://damo-vilab-modelscope-text-to-video-synthesis.hf.space ✔
"""
ttv_saved_path = "./data/" + os.path.basename(ttv_local_file_path)
# Same copy step as above (assumes the tool returns a local file path)
shutil.copy(ttv_local_file_path, ttv_saved_path)
print("Text-to-video file:", ttv_saved_path)
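The script reports its outputs under `./data`, while the tools themselves appear to write to temporary files. One way to keep those two in step is a small copy helper. This is only a sketch: `save_output` is a hypothetical name, and it assumes each tool returns a path to a local file. The demonstration uses a throwaway file in place of a real tool result so that it runs offline.

```python
import os
import shutil
import tempfile


def save_output(tmp_path: str, out_dir: str = "./data") -> str:
    """Copy a tool's output file into out_dir and return the new path."""
    os.makedirs(out_dir, exist_ok=True)
    dest = os.path.join(out_dir, os.path.basename(tmp_path))
    shutil.copy(tmp_path, dest)
    return dest


# Offline demonstration: a throwaway file stands in for a tool result.
with tempfile.TemporaryDirectory() as work:
    fake_result = os.path.join(work, "tmp_result.wav")
    with open(fake_result, "wb") as f:
        f.write(b"RIFF")
    saved = save_output(fake_result, out_dir=os.path.join(work, "data"))
    print(os.path.basename(saved))  # prints: tmp_result.wav
```

With a real tool, the call would look like `save_output(BarkTextToSpeechTool().langchain.run("..."))`, which needs network access and the gradio_tools package.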
