[AI Service Development] Audio: Trying Out Whisper

brave_sol 2024. 12. 29. 16:16

1. Hugging Face

- https://huggingface.co/openai/whisper-large-v3-turbo

2. Korean speech

# step1: import modules
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# step2: create inference object
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
# model_id = "openai/whisper-large-v3-turbo"
model_id = "openai/whisper-tiny"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    return_timestamps=True
)
# pipe = pipeline("automatic-speech-recognition", "openai/whisper-large-v3-turbo" )

# step3: load the data
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

# step4: run inference
result = pipe(sample)

# step5: post-processing
print(result["text"])
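
The pipeline above is built with return_timestamps=True, but only result["text"] is printed, and the sample is English LibriSpeech audio even though the goal is Korean speech. Below is a minimal sketch of how both could be handled, assuming the pipeline accepts Whisper's generate_kwargs (language, task) and returns a "chunks" list with (start, end) timestamps. The helper names transcribe and format_chunks are hypothetical and not part of the original post.

```python
def transcribe(pipe, audio, language="korean"):
    """Call an ASR pipeline while asking Whisper to decode in a given language.

    `pipe` is assumed to be a transformers automatic-speech-recognition
    pipeline built on a Whisper checkpoint, like the one in step2.
    """
    return pipe(audio, generate_kwargs={"language": language, "task": "transcribe"})


def _fmt(seconds):
    # The pipeline can return None for a timestamp it could not determine.
    return "?" if seconds is None else f"{seconds:.2f}"


def format_chunks(result):
    """Turn the timestamped chunks of a pipeline result into printable lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{_fmt(start)} - {_fmt(end)}] {chunk['text'].strip()}")
    return lines
```

With the pipe object from step2 and your own Korean recording, usage would look roughly like `for line in format_chunks(transcribe(pipe, "my_korean_clip.wav")): print(line)`, where the file name is only a placeholder.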

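- whisper-tiny is less accurate, but for an English-learning service you could deliberately pick the weaker model so that learners are nudged to pronounce words clearly. So rather than fixating on accuracy, it matters more to think about the problem you actually need to solve!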
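
One way to turn that idea into a pronunciation check is to compare the model's transcript with the sentence the learner was supposed to read, for example with a word error rate. The following is only a rough sketch under that assumption: word_error_rate is a hypothetical helper, it lowercases and splits on whitespace, and it does no punctuation normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two sentences, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference sentence must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A score of 0.0 means the transcript matched the target sentence word for word. A service could ask the learner to try again above some threshold, and that threshold would have to be tuned against whichever model is used.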
