cl-tohoku/bert-base-japanese-whole-word-masking model
🤗 Huggingface cl-tohoku/bert-base-japanese-whole-word-masking
The cl-tohoku/bert-base-japanese-whole-word-masking model is a Natural Language Processing (NLP) model available through the Hugging Face Transformers library, which is generally used from Python.
What is the cl-tohoku/bert-base-japanese-whole-word-masking model?
This is a BERT model pretrained on Japanese text. It processes input text with word-level tokenization based on the IPA dictionary, followed by WordPiece subword tokenization. The model is trained with whole word masking enabled for the masked language modeling (MLM) objective, meaning that all subword tokens belonging to the same word are masked together. The code for pretraining is available at cl-tohoku/bert-japanese. The training corpus is 2.6GB in size and consists of approximately 17M sentences. The vocabulary size is 32,000. The models are distributed under the terms of the Creative Commons Attribution-ShareAlike 3.0 license.
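The two-stage tokenization and whole word masking described above can be sketched in plain Python. This is only an illustration under stated assumptions: the vocabulary `toy_vocab`, the pre-segmented sentence `words`, and the helper names `wordpiece` and `whole_word_mask` are hypothetical, and the word segmentation that the real model obtains from the IPA dictionary is hard-coded here. Real BERT pretraining also picks the masked words at random and sometimes substitutes a random or unchanged token instead of [MASK]; the sketch skips that.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Continuation pieces carry a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched: the whole word becomes unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


def whole_word_mask(words, vocab, masked_word_indices, mask="[MASK]"):
    """Tokenize each word and mask every subword piece of the selected words."""
    tokens = []
    for i, word in enumerate(words):
        pieces = wordpiece(word, vocab)
        if i in masked_word_indices:
            pieces = [mask] * len(pieces)
        tokens.extend(pieces)
    return tokens


# Hypothetical toy vocabulary and an already word-segmented sentence.
toy_vocab = {"東京", "大学", "で", "自然", "##言語", "##処理", "を", "研究", "する"}
words = ["東京", "大学", "で", "自然言語処理", "を", "研究", "する"]

plain = whole_word_mask(words, toy_vocab, set())
masked = whole_word_mask(words, toy_vocab, {3})
print(plain)
print(masked)
```

In this toy run the word at index 3 splits into three pieces, and selecting that word masks all three at once rather than a single piece.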
Model usage
The cl-tohoku/bert-base-japanese-whole-word-masking model is available through the transformers Python library. To download and use a pretrained model for your task, you need only a few lines of code (PyTorch version). First install the library with pip, a package installer for Python. The example also needs PyTorch, and in recent versions of transformers this model's Japanese tokenizer relies on MeCab through the fugashi and ipadic packages.
Download and install using pip
$ pip install transformers torch fugashi ipadic
Usage in python
# Import generic wrappers
from transformers import AutoModel, AutoTokenizer
# Define the model repo
model_name = "cl-tohoku/bert-base-japanese-whole-word-masking"
# Download the pretrained PyTorch model and its tokenizer
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Tokenize the input text and return PyTorch tensors
inputs = tokenizer("Hello world!", return_tensors="pt")
# Run the model on the tokenized input
outputs = model(**inputs)
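The model returns one vector per input token. One common way to reduce these to a single sentence vector is masked mean pooling, sketched below. This is an assumption about how you might use the output, not something the model prescribes. Plain Python lists stand in for the tensors so the arithmetic is visible, and the helper name `masked_mean_pool` and the toy values are hypothetical.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors, skipping positions whose mask is 0 (padding)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vector)]
            count += 1
    if count == 0:
        raise ValueError("attention_mask has no non-zero positions")
    return [t / count for t in total]


# Toy stand-in for one sequence: three token vectors of size 2,
# where the last position is padding and must be ignored.
toy_hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]

sentence_vector = masked_mean_pool(toy_hidden, toy_mask)
print(sentence_vector)  # [2.0, 3.0]
```

With the real model, the same helper could be fed outputs.last_hidden_state[0].tolist() and inputs["attention_mask"][0].tolist(), although in practice you would likely do the pooling directly on the PyTorch tensors.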