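You probably remember AutoGPT and HuggingGPT, which reached multimodal and other broader scenarios indirectly, by calling a variety of models. LangChain can now do something similar through its gradio_tools integration, which covers text, speech, video, and conversions between them. Under the hood, each tool calls a Hugging Face Space through its API. For example:
Creating music from an image (ImageToMusicTool)

Segmenting an image (SAMImageSegmentationTool)

LangChain currently supports the following multimodal model integrations: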

I. A few common scenarios
1. Text to image
Model: StableDiffusion
#https://huggingface.co/spaces/gradio-client-demos/text-to-image

2. Speech to text
Model: openai-whisper
#https://huggingface.co/spaces/abidlabs/whisper

3. Text to speech
Model: suno/bark
#https://huggingface.co/spaces/suno/bark

4. Text to video
Model: DAMO Academy's damo-vilab/modelscope-damo-text-to-video-synthesis
#https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis
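
The Space pages listed above are what the tools end up calling. Judging only from the "Loaded as API" log lines in the code section of this post, the API host looks like the Space id with its slash replaced by a hyphen, followed by .hf.space. Below is a minimal sketch of that mapping. The helper name space_api_url and the SCENARIOS table are hypothetical, and the rule is inferred from the four examples in this post rather than taken from Hugging Face documentation:

```python
from urllib.parse import urlparse

SPACES_PREFIX = "/spaces/"

# The four Space pages from the scenarios listed in this post.
SCENARIOS = {
    "text-to-image": "https://huggingface.co/spaces/gradio-client-demos/text-to-image",
    "speech-to-text": "https://huggingface.co/spaces/abidlabs/whisper",
    "text-to-speech": "https://huggingface.co/spaces/suno/bark",
    "text-to-video": "https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis",
}


def space_api_url(space_page_url: str) -> str:
    """Guess the hf.space host for a Space page URL (hypothetical helper, inferred rule)."""
    path = urlparse(space_page_url).path
    if not path.startswith(SPACES_PREFIX):
        raise ValueError(f"not a Space page URL: {space_page_url}")
    # "/spaces/suno/bark" -> "suno/bark"
    space_id = path[len(SPACES_PREFIX):].strip("/")
    owner, _, name = space_id.partition("/")
    if not owner or not name:
        raise ValueError(f"expected owner/name in: {space_page_url}")
    # Assumption: owner and name are simply joined with a hyphen.
    return f"https://{owner}-{name}.hf.space"


for task, page in SCENARIOS.items():
    print(task, "->", space_api_url(page))
```

If a Space id contains characters other than those seen here, the real host may be derived differently, so treat the result as a guess to compare against the log output.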

II. Main code
import os
from gradio_tools.tools import (
    StableDiffusionTool,
    WhisperAudioTranscriptionTool,
    BarkTextToSpeechTool,
    TextToVideoTool,
)
# text-to-image
# Write the prompt in English for the StableDiffusion model
sd_local_file_path = StableDiffusionTool().langchain.run("Please create a photo of a dog riding a skateboard")
# Under the hood this loads the SD Space on HF; sample log output:
"""
Loaded as API: https://gradio-client-demos-text-to-image.hf.space ✔
Job Status: Status.STARTING eta: None
"""
print("sd_local_file_path:", sd_local_file_path)
from PIL import Image
im = Image.open(sd_local_file_path)
print("Generated image:", im)
# Make sure the output directory exists before saving
os.makedirs("./data", exist_ok=True)
im.save("./data/" + os.path.basename(sd_local_file_path))
# audio-to-text
stt_text = WhisperAudioTranscriptionTool().langchain.run("audio/68570059060983616_0_15.mp3")
print("Transcribed text:", stt_text)
"""
Loaded as API: https://abidlabs-whisper.hf.space ✔
"""
# text-to-audio
tts_audio = BarkTextToSpeechTool().langchain.run("我是中国人")  # Chinese for "I am Chinese"
# Note: this prints the intended ./data location; the file itself is still at tts_audio and is not copied here
print("Text-to-speech file:", "./data/" + os.path.basename(tts_audio))
"""
Loaded as API: https://suno-bark.hf.space ✔
Job Status: Status.STARTING eta: None
Due to heavy traffic on this app, the prediction will take approximately 72 seconds.For faster predictions without waiting in queue, you may duplicate the space using: Client.duplicate(suno/bark)
Job Status: Status.IN_QUEUE eta: 72.77052729940723
Due to heavy traffic on this app, the prediction will take approximately 48 seconds.For faster predictions without waiting in queue, you may duplicate the space using: Client.duplicate(suno/bark)
Job Status: Status.IN_QUEUE eta: 48.52035075205344
Job Status: Status.PROCESSING eta: None
Text-to-speech file: ./data/tmpgvejiznvv3ohaiwu.wav
"""
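# text-to-video: only English prompts are supported
ttv_local_file_path = TextToVideoTool().langchain.run("A panda eating bamboo on a rock.")
"""
Loaded as API: https://damo-vilab-modelscope-text-to-video-synthesis.hf.space ✔
"""
# Note: as with the audio, only the intended ./data location is printed; the file is not copied here
print("Text-to-video file:", "./data/" + os.path.basename(ttv_local_file_path))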
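
The script above prints paths under ./data for the speech and video results, but only the image is actually written there. The tools appear to return a local file path (the variable names and the tmp file name in the log suggest a temporary file), so the printed ./data path does not exist until the file is copied. Below is a minimal sketch of a helper that does the copy and creates the directory if needed. The name save_to_dir and the default directory are hypothetical and not part of gradio_tools:

```python
import os
import shutil
import tempfile


def save_to_dir(src_path: str, dest_dir: str = "data") -> str:
    """Copy src_path into dest_dir (created if missing) and return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    shutil.copy(src_path, dest_path)
    return dest_path


# Demo with a stand-in file, since no Space is called here.
with tempfile.TemporaryDirectory() as tmp:
    fake_result = os.path.join(tmp, "result.wav")
    with open(fake_result, "wb") as f:
        f.write(b"placeholder bytes")
    saved = save_to_dir(fake_result, os.path.join(tmp, "data"))
    print("copied to:", saved, "exists:", os.path.exists(saved))
```

In the script, the last print could then become print("Text-to-video file:", save_to_dir(ttv_local_file_path)), so that the printed path points at a real file.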