Hugging Face Examples

티에스윤 2025. 10. 26. 20:55

Let's write a few examples using Hugging Face.

Everything here also runs in Colab, so open a Colab notebook and try it yourself.

pip install transformers sentencepiece torch

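from transformers import pipeline

# Build an English-to-Korean translation pipeline.
# If this model ID cannot be loaded in your environment, see the NLLB fallback later in this post.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ko")

text = "How are you today?"
# The pipeline returns a list of dicts; the text is under 'translation_text'
result = translator(text)[0]['translation_text']
print("Translation:", result)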

Let's try a simple hands-on exercise.

from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ko")

# Keep translating until the user types "exit"
while True:
    text = input("Enter a sentence to translate (type exit to quit): ")
    if text.strip().lower() == "exit":
        break
    result = translator(text)[0]['translation_text']
    print("Translation:", result)

With the Helsinki-NLP/opus-mt-en-ko model you can translate simple sentences like this.
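For a longer passage, one common approach is to split the text into sentences and translate them one at a time. This is my own addition, not something from the original examples. The sketch below assumes a deliberately naive splitting rule, and the names split_sentences, translate_long, and fake_translate are hypothetical. The translator is passed in as a plain function, so the sketch runs without downloading a model.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_long(text: str, translate: Callable[[str], str]) -> str:
    """Translate sentence by sentence and join the results with spaces."""
    return " ".join(translate(s) for s in split_sentences(text))


# Stand-in translator (hypothetical) so the sketch runs without a model.
def fake_translate(sentence: str) -> str:
    return f"<{sentence}>"


print(translate_long("How are you today? I am fine. See you soon!", fake_translate))
```

With the real pipeline from the loop above, you could pass something like lambda s: translator(s)[0]['translation_text'] in place of fake_translate.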

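Model	Features

Helsinki-NLP/opus-mt-en-ko	Light and fast, general-purpose translation
facebook/nllb-200-distilled-600M	Supports 200 languages, strong at multilingual translation
google/mt5-small	General-purpose multilingual model; needs fine-tuning before it can be used for translation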

Model size and usage differ somewhat from model to model.
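To make the usage difference concrete, here is a minimal sketch. The OPUS-MT checkpoints encode the language pair in the model ID, while NLLB is a single checkpoint that takes language codes as arguments. The describe_model helper and its return shape are hypothetical, made up for illustration, and not every OPUS-MT pair necessarily exists on the Hub.

```python
# Hypothetical helper: the function name and this tiny code table are
# illustrative only and are not part of the Transformers API.
NLLB_CODES = {"en": "eng_Latn", "ko": "kor_Hang"}


def describe_model(family: str, src: str, tgt: str) -> dict:
    """Return the model ID and the extra language arguments each family expects."""
    if family == "opus-mt":
        # One checkpoint per direction; the pair is part of the model ID.
        return {"model_id": f"Helsinki-NLP/opus-mt-{src}-{tgt}", "extra": {}}
    if family == "nllb":
        # One checkpoint for all pairs; languages are passed as codes.
        return {
            "model_id": "facebook/nllb-200-distilled-600M",
            "extra": {"src_lang": NLLB_CODES[src], "tgt_lang": NLLB_CODES[tgt]},
        }
    raise ValueError(f"unknown model family: {family}")


print(describe_model("opus-mt", "ko", "en"))
print(describe_model("nllb", "ko", "en"))
```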

Let's apply this to a model of your choice.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Helsinki-NLP/opus-mt-ko-en"

# Load the tokenizer and the seq2seq model from the Hub
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Korean input sentences to translate into English
sent_list = [
    "회의는 내일 오전 9시에 시작합니다.",
    "최종 보고서는 금요일까지 올려 주세요.",
    "해당 모델은 추가 학습 없이도 꽤 잘 동작합니다.",
]

# Tokenize the whole batch, padding to the longest sentence
inputs = tok(sent_list, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=5,         # beam search (higher quality, slower)
    length_penalty=1.0,  # length balancing
    early_stopping=True,
)

translations = tok.batch_decode(outputs, skip_special_tokens=True)

for ko, en in zip(sent_list, translations):
    print(f"[KO] {ko}\n[EN] {en}\n")
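To get a feel for what length_penalty does in the generate call above, here is a small worked example. As I understand the Transformers documentation, beam search divides a hypothesis's summed log-probability by its length raised to length_penalty, so larger values favor longer outputs. The exact length counted may differ between library versions. The beam_score helper and the two hypotheses are made up for illustration.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
    """Length-normalized score: summed token log-probs divided by length**penalty."""
    return sum_logprobs / (length ** length_penalty)


# Two made-up hypotheses: (sum of log-probabilities, number of tokens)
short_hyp = (-4.0, 4)
long_hyp = (-6.0, 8)

for penalty in (0.0, 1.0, 2.0):
    short_score = beam_score(*short_hyp, length_penalty=penalty)
    long_score = beam_score(*long_hyp, length_penalty=penalty)
    winner = "short" if short_score > long_score else "long"
    print(f"length_penalty={penalty}: short={short_score:.3f}, long={long_score:.3f} -> {winner}")
```

With no normalization (0.0) the shorter hypothesis wins simply because it accumulates fewer negative terms. At 1.0 and above, the longer one wins in this toy case.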

The Helsinki-NLP models are light and fast. You can also do the same thing with the facebook model.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "facebook/nllb-200-distilled-600M"  # lightweight, decent quality

src, tgt = "kor_Hang", "eng_Latn"  # Korean -> English

# Tell the tokenizer the source language; otherwise it assumes its default
tok = AutoTokenizer.from_pretrained(model_id, src_lang=src)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "연구 일정이 변경되었습니다. 오늘 오후에 회의를 다시 잡겠습니다."

inputs = tok(text, return_tensors="pt")

generated = model.generate(
    **inputs,
    forced_bos_token_id=tok.convert_tokens_to_ids(tgt),  # set the target language
    max_new_tokens=128,
    num_beams=4,
)

print(tok.batch_decode(generated, skip_special_tokens=True)[0])

Here is another example. It tries the Helsinki-NLP pipeline first and falls back to NLLB if that fails.

from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# 0) Detect the device automatically (GPU 0 if available, otherwise CPU)
device = 0 if torch.cuda.is_available() else -1

# 1) Safe pipeline loader (ignores any stored access token)
def load_pipeline(model_id: str):
    # token=False keeps a stale or invalid saved token from being sent
    return pipeline("translation", model=model_id, device=device, revision="main", token=False)

# 2) NLLB fallback (multilingual)
NLLB_ID = "facebook/nllb-200-distilled-600M"
NLLB_CODES = {
    "en": "eng_Latn",
    "ko": "kor_Hang",
}

def nllb_translate(text: str, src_lang_code: str, tgt_lang_code: str) -> str:
    # The tokenizer needs the source language; the target is forced at generation time
    tok = AutoTokenizer.from_pretrained(NLLB_ID, src_lang=src_lang_code, token=False)
    mdl = AutoModelForSeq2SeqLM.from_pretrained(NLLB_ID, token=False)
    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    mdl.to(dev)
    inputs = tok([text], return_tensors="pt").to(dev)
    forced_bos = tok.convert_tokens_to_ids(tgt_lang_code)
    out_ids = mdl.generate(**inputs, forced_bos_token_id=forced_bos, max_new_tokens=128, num_beams=4)
    return tok.batch_decode(out_ids, skip_special_tokens=True)[0]

# 3) Korean -> English
ko_input = "안녕하세요, 오늘 일정 공유 부탁드립니다."
try:
    ko2en = load_pipeline("Helsinki-NLP/opus-mt-ko-en")
    ko2en_out = ko2en(ko_input)[0]["translation_text"]
except Exception as e:
    print(f"[WARN] Default ko->en model failed, falling back to NLLB. Reason: {e}")
    ko2en_out = nllb_translate(ko_input, NLLB_CODES["ko"], NLLB_CODES["en"])

print("Translation (ko->en):", ko2en_out)

# 4) English -> Korean
en_input = "Could you share today's schedule, please?"
try:
    en2ko = load_pipeline("Helsinki-NLP/opus-mt-en-ko")
    en2ko_out = en2ko(en_input)[0]["translation_text"]
except Exception as e:
    print(f"[WARN] Default en->ko model failed, falling back to NLLB. Reason: {e}")
    en2ko_out = nllb_translate(en_input, NLLB_CODES["en"], NLLB_CODES["ko"])

print("Translation (en->ko):", en2ko_out)
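The try/except pattern above is repeated once per direction. A minimal sketch of pulling it into one reusable function might look like the following. The names translate_with_fallback, broken_primary, and toy_fallback are hypothetical, and the two toy translators stand in for real models so the sketch runs without any downloads.

```python
from typing import Callable, Sequence


def translate_with_fallback(text: str, translators: Sequence[Callable[[str], str]]) -> str:
    """Try each translator in order and return the first successful result."""
    errors = []
    for translate in translators:
        try:
            return translate(text)
        except Exception as exc:  # broad on purpose: loading and inference can both fail
            errors.append(exc)
            name = getattr(translate, "__name__", repr(translate))
            print(f"[WARN] {name} failed: {exc}")
    raise RuntimeError(f"all {len(errors)} translators failed")


# Stand-ins for real models (hypothetical) so the sketch runs offline.
def broken_primary(text: str) -> str:
    raise OSError("model not found")


def toy_fallback(text: str) -> str:
    return text.upper()


print(translate_with_fallback("hello", [broken_primary, toy_fallback]))
```

In practice you would wrap the pipeline call and the NLLB function from the script above in small functions and pass them in the order you want them tried.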

New models are continually being uploaded to the Hugging Face site.

https://huggingface.co/

----------

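Finally, a small web UI with Streamlit. Save the script below to a file (for example app.py, a name chosen here for illustration) and launch it with streamlit run app.py.

pip install streamlit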

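import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once instead of on every rerun
def load_translator():
    return pipeline("translation", model="Helsinki-NLP/opus-mt-en-ko")

translator = load_translator()

st.title("AI Translator")
text = st.text_area("Enter a sentence to translate:")
# Translate only when the button is clicked and the input is not empty
if st.button("Translate") and text.strip():
    result = translator(text)[0]['translation_text']
    st.write("Translation:", result)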

License: CC BY-NC-ND (Attribution, NonCommercial, NoDerivatives)
