NLP

Named Entity Recognition with Huggingface Trainer

In a previous post we looked at how we can use Hugging Face together with PyTorch to create a NER tagging classifier. We did this by loading a pretrained encoder model & defining our own tail-end model for the NER classification task. That approach required writing lower-level Torch code, which isn't the most beginner friendly, especially if you don't know Torch. In this post, we'll use only Hugging Face, which simplifies the training & inference steps considerably. We'll be using the Trainer & pipeline utilities of the Hugging Face library, and a dataset from mllibs, which includes tags for words that can be identified as data source tokens, plot parameter tokens and function input parameter tokens.
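One fiddly step that fine-tuning for token classification involves is aligning word-level NER labels with subword tokens: a BERT-style tokeniser can split one word into several pieces, so labels must be repeated or masked. The sketch below is a hypothetical, self-contained illustration of that alignment logic; `align_labels`, the example word ids and label ids are all made up for illustration, mirroring the shape of the `word_ids()` output a Hugging Face fast tokeniser returns.

```python
# Hypothetical sketch: aligning word-level NER labels with subword tokens.
# When a tokeniser splits a word into several subwords, each label must be
# repeated or masked with -100 so the loss lines up with the token sequence.

def align_labels(word_ids, word_labels, ignore_index=-100):
    """Map word-level labels onto subword positions.

    word_ids: per-subword index of the originating word (None for special
    tokens such as [CLS]/[SEP]), mirroring BatchEncoding.word_ids().
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(ignore_index)      # special token: ignored by the loss
        elif wid != previous:
            aligned.append(word_labels[wid])  # first subword keeps the word label
        else:
            aligned.append(ignore_index)      # continuation subwords are masked
        previous = wid
    return aligned

# "plotting" might be split into ["plot", "##ting"]:
word_ids = [None, 0, 1, 1, 2, None]   # [CLS] create plot ##ting data [SEP]
word_labels = [0, 2, 1]               # e.g. O, B-PARAM, B-SOURCE
print(align_labels(word_ids, word_labels))
# [-100, 0, 2, -100, 1, -100]
```

Masking continuation subwords with `-100` means the cross-entropy loss only scores one prediction per original word, which is the convention the Trainer-based token classification examples follow.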

Named Entity Recognition for Sentence Splitting

In the last post, we talked about how to tag named entities using transformers. In this section, we'll try something a little simpler, using traditional encoding & ML methods. One advantage of such models is the lower cost of training. We'll also look at a less common use case for NER tagging, which I've implemented in my project mllibs.
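As a rough sketch of what "traditional encoding & ML" means here: each token is turned into hand-crafted features (the word itself, its suffix, its neighbours) and a linear classifier predicts the tag. Everything below is a hypothetical, minimal illustration; the `token_features` function and the tiny training sentences are invented for the example, not taken from mllibs.

```python
# Hypothetical sketch of a "classical" NER tagger: hand-crafted per-token
# features fed to a linear model, instead of a transformer encoder.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    """Simple lexical + context features for the token at position i."""
    return {
        "word": tokens[i].lower(),
        "suffix3": tokens[i][-3:].lower(),
        "is_title": tokens[i].istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Tiny illustrative training set: tag tokens as data-source vs other.
sentences = [
    (["plot", "the", "iris", "dataset"], ["O", "O", "B-SOURCE", "O"]),
    (["load", "the", "titanic", "dataset"], ["O", "O", "B-SOURCE", "O"]),
    (["show", "a", "scatter", "plot"], ["O", "O", "O", "O"]),
]
X = [token_features(toks, i) for toks, _ in sentences for i in range(len(toks))]
y = [tag for _, tags in sentences for tag in tags]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test_tokens = ["load", "the", "iris", "dataset"]
pred = list(model.predict([token_features(test_tokens, i)
                           for i in range(len(test_tokens))]))
```

The appeal is exactly the training cost mentioned above: this fits in milliseconds on a CPU, at the price of needing feature engineering and generalising less well to unseen vocabulary.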

Named Entity Recognition with Torch Loop

In this notebook, we'll take a look at how we can use Hugging Face to easily load BERT for token classification. Whilst we load both the base model & tokeniser from Hugging Face, we'll use a custom Torch training loop and a custom tail model. This approach isn't the most straightforward, but it is one way to do it. We'll be using Amazon's MASSIVE dataset and fine-tuning the BERT transformer encoder.
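The shape of such a custom loop can be sketched as follows. This is a hypothetical, self-contained stand-in: a tiny `nn.Embedding` plays the role of the BERT encoder (so the example runs instantly with no downloads), the `TinyTagger` class and all tensor sizes are invented, but the tail head, the `-100` loss masking, and the optimiser loop mirror the structure used when fine-tuning the real model.

```python
# Hypothetical sketch of the custom training-loop idea: a tiny stand-in
# encoder replaces BERT, but the tail classification head, the padded-label
# masking, and the loop itself are shaped the same way.
import torch
import torch.nn as nn

VOCAB, TAGS, PAD_LABEL = 20, 3, -100

class TinyTagger(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.encoder = nn.Embedding(VOCAB, dim)   # stand-in for the BERT encoder
        self.head = nn.Linear(dim, TAGS)          # custom "tail" classification head

    def forward(self, ids):
        return self.head(self.encoder(ids))       # (batch, seq, TAGS) logits

torch.manual_seed(0)
ids = torch.randint(0, VOCAB, (8, 5))             # fake token id batch
labels = torch.randint(0, TAGS, (8, 5))           # fake per-token tag ids
labels[:, -1] = PAD_LABEL                         # pretend the last position is padding

model = TinyTagger()
optim = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_LABEL)  # masked positions are skipped

losses = []
for _ in range(30):                               # the "custom Torch loop"
    optim.zero_grad()
    logits = model(ids)
    loss = loss_fn(logits.view(-1, TAGS), labels.view(-1))
    loss.backward()
    optim.step()
    losses.append(loss.item())
```

Swapping the embedding for `AutoModel.from_pretrained(...)` and the fake tensors for tokenised MASSIVE batches gives the real setup; the loop itself barely changes, which is both the flexibility and the boilerplate cost of going below the Trainer abstraction.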