Studying publicly available pre-trained language models for gender bias issues

Word embedding is a well-known technique for text analysis in Natural Language Processing (NLP). It maps each word to a vector representation, corresponding to a point in a multi-dimensional space. Many embedding methods exist, each using a different approach to define these vector representations, and they are trained on various corpora, such as Wikipedia, Twitter or Google News text. It has been shown that certain word embeddings do contain gender bias, where, for example, the word nurse is found to lie closer to woman in the vector space than to man. IMI MIRA Laura Hattam investigates this further by analysing three embedding methods, BERT, GloVe and Word2Vec, and studying each of these for hidden gender bias using three distinct measures.
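The "nurse is closer to woman than to man" observation can be made concrete with cosine similarity between word vectors. The sketch below uses made-up 4-dimensional toy vectors purely for illustration (real GloVe or Word2Vec embeddings have hundreds of dimensions, and the numbers here are not from any actual model); it is not the measure used in the article, just a minimal example of the kind of distance comparison involved.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy vectors standing in for real embeddings (illustrative only).
embeddings = {
    "nurse": np.array([0.8, 0.1, 0.3, 0.5]),
    "woman": np.array([0.7, 0.2, 0.4, 0.5]),
    "man":   np.array([0.1, 0.9, 0.2, 0.4]),
}

# A positive gap means "nurse" sits closer to "woman" than to "man"
# in this vector space -- one simple signal of gender association.
bias = cosine(embeddings["nurse"], embeddings["woman"]) \
     - cosine(embeddings["nurse"], embeddings["man"])
print(f"bias score: {bias:.3f}")
```

With real pre-trained vectors, the same comparison would be run over many occupation words to see which ones lean systematically towards one gendered term.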

Read the full article here.

This work is part of an Innovate UK project with Dr Julian Padget and the company Etiq.