Classify and extract text 10x better and faster 🦾


➡️  Learn more

cl-tohoku/bert-base-japanese-whole-word-masking model

🤗 Hugging Face: cl-tohoku/bert-base-japanese-whole-word-masking

The model cl-tohoku/bert-base-japanese-whole-word-masking is a Natural Language Processing (NLP) model implemented in the Transformers library, generally used with the Python programming language.

What is the cl-tohoku/bert-base-japanese-whole-word-masking model?

This is a BERT model pretrained on Japanese-language text. Input texts are first tokenized at the word level using the IPA dictionary, then split into subwords with WordPiece tokenization. The model was trained with whole word masking enabled for the masked language modeling (MLM) objective. The code for the pretraining is available at cl-tohoku/bert-japanese. The training corpus is 2.6GB in size, consisting of approximately 17M sentences. The vocabulary size is 32,000. The models are distributed under the terms of the Creative Commons Attribution-ShareAlike 3.0 license.
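To make "whole word masking" concrete: under WordPiece, continuation subwords are marked with a leading "##", and whole word masking means that when any subword of a word is selected for masking, all subwords of that word are masked together. The sketch below illustrates the idea with a hypothetical subword list; it is a simplified illustration, not the actual pretraining code (which lives in cl-tohoku/bert-japanese).

```python
import random


def group_whole_words(subwords):
    """Group WordPiece subwords back into whole words.
    Continuation pieces carry a leading "##" marker."""
    words = []
    for tok in subwords:
        if tok.startswith("##") and words:
            words[-1].append(tok)
        else:
            words.append([tok])
    return words


def whole_word_mask(subwords, mask_ratio=0.15, rng=None):
    """Whole word masking: if a word is chosen, every one of its
    subwords is replaced with [MASK], never just a fragment."""
    rng = rng or random.Random(0)
    masked = []
    for word in group_whole_words(subwords):
        if rng.random() < mask_ratio:
            masked.extend(["[MASK]"] * len(word))
        else:
            masked.extend(word)
    return masked


# Hypothetical WordPiece output for a short Japanese phrase
subwords = ["東京", "##タワー", "に", "行き", "##ました"]
print(group_whole_words(subwords))
# "東京タワー" is one word of two subwords, so it is masked as a unit
```

This is what distinguishes the whole-word-masking variant from the standard bert-base-japanese model, where individual subwords can be masked independently.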

Fine-tune cl-tohoku/bert-base-japanese-whole-word-masking models

Metatext is a powerful no-code tool to train, tune, and integrate custom NLP models

➡️  Learn more

Model usage

You can find the cl-tohoku/bert-base-japanese-whole-word-masking model easily in the transformers Python library. To download and use any of the pretrained models for your given task, you only need a few lines of code (PyTorch version). Here is an example of installing the library using pip (a package installer for Python).

Download and install using pip

$ pip install transformers

Note: the word-level Japanese tokenizer used by this model typically also requires the fugashi and ipadic packages:

$ pip install fugashi ipadic

Usage in Python

# Import generic wrappers
from transformers import AutoModel, AutoTokenizer 


# Define the model repo
model_name = "cl-tohoku/bert-base-japanese-whole-word-masking" 


# Download pytorch model
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


# Transform input tokens 
inputs = tokenizer("Hello world!", return_tensors="pt")

# Model apply
outputs = model(**inputs)
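The `outputs` object above contains one hidden-state vector per input token. A common next step is to pool those token vectors into a single sentence embedding, ignoring padding positions via the attention mask. The sketch below shows attention-mask-aware mean pooling with plain Python lists standing in for the model's tensors (the values are hypothetical, chosen only for illustration):

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors, skipping padding positions.
    attention_mask is 1 for real tokens and 0 for padding."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(hidden_states, attention_mask):
        if mask:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total]


# Hypothetical 2-dimensional hidden states for 3 real tokens plus 1 pad
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(hidden, mask))  # [3.0, 4.0] — the pad vector is excluded
```

In practice you would apply the same idea to `outputs.last_hidden_state` and `inputs["attention_mask"]` with tensor operations rather than Python lists.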
    

More info about cl-tohoku/bert-base-japanese-whole-word-masking

See the paper, downloads, and more info



Metatext helps you classify and extract information from text and documents with language models customized with your data and expertise.