What is sparse categorical crossentropy?


Usage of loss functions

A loss function (also called an objective function or optimization score function) is one of the two parameters required to compile a model:

model.compile(loss='mean_squared_error', optimizer='sgd')

from keras import losses
model.compile(loss=losses.mean_squared_error, optimizer='sgd')

You can either pass the name of an existing loss function, or pass a symbolic TensorFlow/Theano function that returns a scalar for each data point and takes the following two arguments:

y_true: true labels. A TensorFlow/Theano tensor.

y_pred: predictions. A TensorFlow/Theano tensor of the same shape as y_true.

The actual optimized objective is the mean of the output array across all data points.
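
For instance, a minimal sketch of such a custom symbolic function (assuming a model object has already been built) might look like this:

from keras import backend as K

def my_mse(y_true, y_pred):
    # returns one scalar loss value per data point; Keras averages these over the batch
    return K.mean(K.square(y_pred - y_true), axis=-1)

model.compile(loss=my_mse, optimizer='sgd')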

Available loss functions

mean_squared_error

keras.losses.mean_squared_error(y_true, y_pred)

mean_absolute_error

keras.losses.mean_absolute_error(y_true, y_pred)

mean_absolute_percentage_error

keras.losses.mean_absolute_percentage_error(y_true, y_pred)

mean_squared_logarithmic_error

keras.losses.mean_squared_logarithmic_error(y_true, y_pred)

squared_hinge

keras.losses.squared_hinge(y_true, y_pred)

hinge

keras.losses.hinge(y_true, y_pred)

categorical_hinge

keras.losses.categorical_hinge(y_true, y_pred)

logcosh

keras.losses.logcosh(y_true, y_pred)

Logarithm of the hyperbolic cosine of the prediction error.

log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x. This means that 'logcosh' works mostly like the mean squared error, but is not affected as strongly by an occasional wildly incorrect prediction.

Returns

A tensor with one scalar loss entry per sample.

huber_loss

keras.losses.huber_loss(y_true, y_pred, delta=1.0)

categorical_crossentropy

keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)

sparse_categorical_crossentropy

keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1)

binary_crossentropy

keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)

kullback_leibler_divergence

keras.losses.kullback_leibler_divergence(y_true, y_pred)

poisson

keras.losses.poisson(y_true, y_pred)

cosine_proximity

keras.losses.cosine_proximity(y_true, y_pred, axis=-1)

is_categorical_crossentropy

keras.losses.is_categorical_crossentropy(loss)

Note: when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the class of the sample). To convert integer targets into categorical targets, you can use the Keras utility to_categorical:

from keras.utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=None)

When using the sparse_categorical_crossentropy loss, your targets should be integers. If you have categorical targets, you should use categorical_crossentropy.

categorical_crossentropy is another term for multi-class log loss.
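
For example (a sketch; the model, x_train and the integer label array int_labels are assumed to be defined already):

model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(x_train, int_labels, epochs=5)  # integer labels such as [5, 0, 4, ...] are used directly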

Source

What is sparse categorical crossentropy?

When training a neural network (NN), a loss function is minimized; in the Keras library it is specified as a parameter of the compile method of the Model class [1], for example:

import keras
from keras.models import Model
from keras.layers import Input, Dense

a = Input(shape=(32,))
b = Dense(32)(a)
model = Model(inputs=a, outputs=b)
model.compile(loss=keras.losses.mean_squared_error, optimizer='adam', metrics=['accuracy'])
# Equivalent variants:
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.compile('adam', 'mse', ['accuracy'])
model.summary()

For a NN with several outputs, a separate loss function can be specified for each output by passing the compile method either a dictionary or a list of the loss functions involved, for example:

from keras.models import Model
from keras.layers import Input, Dense, LSTM

visible = Input(shape=(32, 1))
extract = LSTM(10, return_sequences=True)(visible)
output1 = Dense(1, activation='sigmoid')(extract)
output2 = Dense(1, activation='sigmoid')(extract)
model = Model(inputs=visible, outputs=[output1, output2])
model.summary()
model.compile('adam', loss=['mse', 'hinge'], metrics=['accuracy'])

The mean squared error loss is used for the first output and the hinge loss for the second. In this case, the sum of the specified losses is minimized during training.
If necessary, a custom loss function can be defined, for example:
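
A minimal sketch of such a function, here a hand-written mean absolute error:

from keras import backend as K

def my_loss(y_true, y_pred):
    # hand-written mean absolute error
    return K.mean(K.abs(y_pred - y_true), axis=-1)

model.compile(optimizer='adam', loss=my_loss, metrics=['accuracy'])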

A custom loss function takes the tensor parameters y_true and y_pred – the true and the predicted values at the NN output, respectively.
The loss functions defined in the Keras library are discussed next.

Loss functions in the Keras library

The loss functions available in the Keras library are listed in Table 1.

Table 1. Loss functions in the Keras library.
In the table:
yi is the predicted value;
xi is the true value;
n is the length of the vector xi (yi);
the sums run over i = 1, ..., n. The formulas are written out following the standard Keras definitions.

Mean squared error (MSE): MSE = (1/n) * sum((yi - xi)**2)

Mean absolute error (MAE): MAE = (1/n) * sum(abs(yi - xi))

Mean absolute percentage error (MAPE): MAPE = (100/n) * sum(abs((xi - yi) / xi))

Mean squared logarithmic error (MSLE): MSLE = (1/n) * sum((log(yi + 1) - log(xi + 1))**2)

Squared hinge (SH): SH = (1/n) * sum(max(0, 1 - xi * yi)**2)

Hinge (H): H = (1/n) * sum(max(0, 1 - xi * yi))

Categorical hinge (CH): CH = max(0, neg - pos + 1), where pos = sum(xi * yi) and neg = max((1 - xi) * yi)

Logarithm of the hyperbolic cosine (logcosh, LC): LC = (1/n) * sum(log(cosh(yi - xi)))

Categorical crossentropy (CCE): CCE = -sum(xi * log(yi))

Sparse categorical crossentropy (SCCE): the same crossentropy, but computed from integer class labels; the predictions are passed through the softmax function σ(yi) (the normalized exponential)

Binary crossentropy (BCE): BCE = -(1/n) * sum(xi * log(yi) + (1 - xi) * log(1 - yi))

Kullback–Leibler divergence (KLD): KLD = sum(xi * log(xi / yi))

Poisson (PSS): PSS = (1/n) * sum(yi - xi * log(yi))

Cosine proximity (CP): CP = -sum(xi' * yi'), where xi' and yi' are xi and yi after L2 normalization

When specifying the loss function, the following aliases can be used:

mse = MSE = mean_squared_error
mae = MAE = mean_absolute_error
mape = MAPE = mean_absolute_percentage_error
msle = MSLE = mean_squared_logarithmic_error
kld = KLD = kullback_leibler_divergence
cosine = cosine_proximity

Programs computing the Keras loss functions

The programs (functions) given below are taken from [2, 3].
All loss functions take 2D tensors with the true (y_true) and the network-predicted (y_pred) values and return a 1D tensor containing the current value of the loss function.
All loss functions except sparse_categorical_crossentropy work with a categorical representation of the targets. Therefore, in NN training programs the class labels are represented as vectors whose length equals the number of classes. For example, when training a NN to classify handwritten digits, the label 5 (corresponding to the digit 5 in the training and test sets) is represented by the following vector:

[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.].

This is produced by the following code:

import keras
num_classes = 10 # Number of classes
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

In the code above, y_train and y_test are the arrays of class labels for the training and test sets, respectively.
The programs that compute the loss functions use the following backend functions: abs, clip, epsilon, l2_normalize, log, max, maximum, mean, softplus, square and sum. These functions are illustrated by the examples below.
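
A minimal sketch illustrating these backend functions (values evaluated with K.eval):

import numpy as np
from keras import backend as K

x = K.constant([[-1., 0., 2.], [3., -4., 1.]])

print(K.eval(K.abs(x)))                       # element-wise absolute value
print(K.eval(K.clip(x, 0., 1.)))              # clip values into the range [0, 1]
print(K.epsilon())                            # small fuzz constant, e.g. 1e-07
print(K.eval(K.log(K.constant([1., np.e]))))  # natural logarithm
print(K.eval(K.max(x, axis=-1)))              # maximum over the last axis
print(K.eval(K.maximum(x, 0.)))               # element-wise maximum with 0
print(K.eval(K.mean(x, axis=-1)))             # mean over the last axis
print(K.eval(K.softplus(x)))                  # log(exp(x) + 1)
print(K.eval(K.square(x)))                    # element-wise square
print(K.eval(K.sum(x, axis=-1)))              # sum over the last axis
print(K.eval(K.l2_normalize(x, axis=-1)))     # L2-normalize each row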

The loss functions themselves are computed by the following procedures:
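
Sketches of these procedures, following the loss definitions in older versions of keras/losses.py (K is the Keras backend):

from keras import backend as K

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)

def mean_absolute_percentage_error(y_true, y_pred):
    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), None))
    return 100. * K.mean(diff, axis=-1)

def mean_squared_logarithmic_error(y_true, y_pred):
    first_log = K.log(K.clip(y_pred, K.epsilon(), None) + 1.)
    second_log = K.log(K.clip(y_true, K.epsilon(), None) + 1.)
    return K.mean(K.square(first_log - second_log), axis=-1)

def squared_hinge(y_true, y_pred):
    return K.mean(K.square(K.maximum(1. - y_true * y_pred, 0.)), axis=-1)

def hinge(y_true, y_pred):
    return K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)

def categorical_hinge(y_true, y_pred):
    pos = K.sum(y_true * y_pred, axis=-1)
    neg = K.max((1. - y_true) * y_pred, axis=-1)
    return K.maximum(0., neg - pos + 1.)

def logcosh(y_true, y_pred):
    def _logcosh(x):
        # log(cosh(x)) written in a numerically stable form
        return x + K.softplus(-2. * x) - K.log(2.)
    return K.mean(_logcosh(y_pred - y_true), axis=-1)

def categorical_crossentropy(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred)

def sparse_categorical_crossentropy(y_true, y_pred):
    return K.sparse_categorical_crossentropy(y_true, y_pred)

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

def kullback_leibler_divergence(y_true, y_pred):
    y_true = K.clip(y_true, K.epsilon(), 1)
    y_pred = K.clip(y_pred, K.epsilon(), 1)
    return K.sum(y_true * K.log(y_true / y_pred), axis=-1)

def poisson(y_true, y_pred):
    return K.mean(y_pred - y_true * K.log(y_pred + K.epsilon()), axis=-1)

def cosine_proximity(y_true, y_pred):
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return -K.sum(y_true * y_pred, axis=-1)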

A detailed view of the categorical crossentropy computation:
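
A NumPy sketch of the same steps the backend performs (re-normalize, clip, take -sum(target * log(prediction)) along the class axis):

import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    # y_true is one-hot, y_pred contains probabilities (rows sum to 1)
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)  # re-normalize, as the backend does
    y_pred = np.clip(y_pred, eps, 1. - eps)               # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)      # one loss value per sample

y_true = np.array([[0., 0., 1.], [0., 1., 0.]])
y_pred = np.array([[0.1, 0.2, 0.7], [0.3, 0.6, 0.1]])
print(categorical_crossentropy_np(y_true, y_pred))  # approximately [0.3567, 0.5108]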

A detailed view of the sparse categorical crossentropy computation:
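
A NumPy sketch: the only difference from the categorical case is that the true class is given as an integer index rather than a one-hot vector:

import numpy as np

def sparse_categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    # y_true holds integer class labels, y_pred contains probabilities
    y_pred = np.clip(y_pred, eps, 1. - eps)
    # pick the predicted probability of the true class for each sample
    true_class_probs = y_pred[np.arange(len(y_true)), y_true]
    return -np.log(true_class_probs)

y_true = np.array([2, 1])
y_pred = np.array([[0.1, 0.2, 0.7], [0.3, 0.6, 0.1]])
print(sparse_categorical_crossentropy_np(y_true, y_pred))  # same values as the categorical example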

A detailed view of the binary crossentropy computation:
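
A NumPy sketch of the binary case:

import numpy as np

def binary_crossentropy_np(y_true, y_pred, eps=1e-7):
    # y_true contains 0/1 targets, y_pred contains probabilities for class 1
    y_pred = np.clip(y_pred, eps, 1. - eps)
    return -np.mean(y_true * np.log(y_pred) + (1. - y_true) * np.log(1. - y_pred), axis=-1)

y_true = np.array([[1., 0., 1.]])
y_pred = np.array([[0.9, 0.2, 0.8]])
print(binary_crossentropy_np(y_true, y_pred))  # approximately [0.1839]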

Conclusion

score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss: ', score[0])
print('Test accuracy:', score[1])

y_pred = model.predict(X_test)
classes = []
for m in y_pred: classes.append(np.argmax(m))
# np.sum(classes == y_test_0) returns the number of cases in which classes[i] == y_test_0[i]
acc = np.sum(classes == y_test_0) / n_test * 100
print('Prediction accuracy: ' + str(acc) + '%')

In the code above, y_test_0 holds the original integer class labels, saved before y_test is converted to its categorical representation, and n_test is the number of test samples.

Obviously, the evaluation accuracy score[1] and the prediction accuracy acc should match.

Source

Losses

The purpose of loss functions is to compute the quantity that a model should seek to minimize during training.

Available losses

Note that all losses are available both via a class handle and via a function handle. The class handles enable you to pass configuration arguments to the constructor (e.g. loss_fn = CategoricalCrossentropy(from_logits=True) ), and they perform reduction by default when used in a standalone way (see details below).

Probabilistic losses

Regression losses

Hinge losses for "maximum-margin" classification

Usage of losses with compile() & fit()

A loss function is one of the two arguments required for compiling a Keras model:
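
A minimal sketch (the model itself is illustrative):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10),
])
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(),
)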

All built-in loss functions may also be passed via their string identifier:
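
For example:

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')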

Loss functions are typically created by instantiating a loss class (e.g. keras.losses.SparseCategoricalCrossentropy ). All losses are also provided as function handles (e.g. keras.losses.sparse_categorical_crossentropy ).

Using classes enables you to pass configuration arguments at instantiation time, e.g.:
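
For example, enabling from_logits at construction time:

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(loss=loss_fn, optimizer='adam')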

Standalone usage of losses

A loss is a callable with arguments loss_fn(y_true, y_pred, sample_weight=None) :
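
A sketch of standalone use (the values are illustrative):

import tensorflow as tf

y_true = tf.constant([[0., 1.], [0., 0.]])
y_pred = tf.constant([[0.6, 0.4], [0.4, 0.6]])

loss_fn = tf.keras.losses.BinaryCrossentropy()
print(loss_fn(y_true, y_pred).numpy())                          # one scalar for the whole batch
print(loss_fn(y_true, y_pred, sample_weight=[1., 0.]).numpy())  # per-sample weighting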

By default, loss functions return one scalar loss value per input sample, e.g.
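
For instance, the function version of MSE returns a vector with one entry per sample:

import tensorflow as tf

loss_values = tf.keras.losses.mean_squared_error(
    tf.random.uniform((2, 3)), tf.random.uniform((2, 3)))
print(loss_values.shape)  # (2,): one loss value per input sample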

However, loss class instances feature a reduction constructor argument, which defaults to "sum_over_batch_size" (i.e. average). Allowable values are "sum_over_batch_size", "sum", and "none":
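
For instance:

import tensorflow as tf

mse_mean = tf.keras.losses.MeanSquaredError()                  # default: 'sum_over_batch_size'
mse_sum  = tf.keras.losses.MeanSquaredError(reduction='sum')   # sum over the batch
mse_none = tf.keras.losses.MeanSquaredError(reduction='none')  # one value per sample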

Note that this is an important difference between loss functions like tf.keras.losses.mean_squared_error and default loss class instances like tf.keras.losses.MeanSquaredError : the function version does not perform reduction, but by default the class instance does.

Here’s how you would use a loss class instance as part of a simple training loop:
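
A sketch of such a loop, assuming model and a tf.data train_dataset of (x, y) batches are defined elsewhere:

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

for x_batch, y_batch in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_batch, training=True)
        loss_value = loss_fn(y_batch, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))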

Creating custom losses

Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.

Here’s a simple example:
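
For instance, a hand-written MSE-style loss (model is assumed to be defined above):

import tensorflow as tf

def my_loss_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)  # one loss value per sample

model.compile(optimizer='adam', loss=my_loss_fn)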

The add_loss() API

Loss functions applied to the output of a model aren’t the only way to create losses.

When writing the call method of a custom layer or a subclassed model, you may want to compute scalar quantities that you want to minimize during training (e.g. regularization losses). You can use the add_loss() layer method to keep track of such loss terms.

Here’s an example of a layer that adds a sparsity regularization loss based on the L2 norm of the inputs:
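
A sketch of such a layer (the 0.01 scaling factor is illustrative):

import tensorflow as tf

class ActivityRegularizationLayer(tf.keras.layers.Layer):
    def call(self, inputs):
        # penalize large activations: scaled sum of squares (squared L2 norm) of the inputs
        self.add_loss(0.01 * tf.reduce_sum(tf.square(inputs)))
        return inputs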

Source

How to use sparse categorical crossentropy with TensorFlow 2 and Keras?

Last Updated on 30 March 2021

For multiclass classification problems, many online tutorials – and even François Chollet’s book Deep Learning with Python, which I think is one of the most intuitive books on deep learning with Keras – use categorical crossentropy for computing the loss value of your neural network.

However, traditional categorical crossentropy requires that your data is one-hot encoded and hence converted into categorical format. Often, this is not what your dataset looks like when you’ll start creating your models. Rather, you likely have feature vectors with integer targets – such as 0 to 9 for the numbers 0 to 9.

But did you know that there exists another type of loss – sparse categorical crossentropy – with which you can leave the integers as they are, yet benefit from crossentropy loss? I didn’t when I just started with Keras, simply because pretty much every article I read performs one-hot encoding before applying regular categorical crossentropy loss.

In this blog, we’ll figure out how to build a convolutional neural network with sparse categorical crossentropy loss.

We’ll create an actual CNN with Keras. It’ll be a simple one – an extension of a CNN that we created before, with the MNIST dataset. However, doing that allows us to compare the model in terms of its performance – to actually see whether sparse categorical crossentropy does as good a job as the regular one.

After reading this tutorial, you will…

Note that model code is also available on GitHub.

Update 28/Jan/2021: Added summary and code example to get started straight away. Performed textual improvements, changed header information and slight addition to title of the tutorial.

Update 17/Nov/2020: Made the code examples compatible with TensorFlow 2

Update 01/Feb/2020: Fixed an error in full model code.


Summary and code example: tf.keras.losses.sparse_categorical_crossentropy

Training a neural network involves passing data forward, through the model, and comparing predictions with ground truth labels. This comparison is done by a loss function. In multiclass classification problems, categorical crossentropy loss is the loss function of choice. However, it requires that your labels are one-hot encoded, which is not always the case.

In that case, sparse categorical crossentropy loss can be a good choice. This loss function performs the same type of loss – categorical crossentropy loss – but works on integer targets instead of one-hot encoded ones. Saves you that to_categorical step which is common with TensorFlow/Keras models!


Sparse categorical crossentropy vs normal categorical crossentropy

Have you also seen lines of code like these in your Keras projects?
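
Probably something along these lines (a sketch; the variable names are assumed):

from tensorflow.keras.utils import to_categorical

y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)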

Most likely, you have – because many blogs explaining how to create multiclass classifiers with Keras apply categorical crossentropy, which requires you to one-hot encode your target vectors.

Now you may wonder: what is one-hot encoding?

One-hot encoding

Suppose that you have a classification problem where you have four target classes: {0, 1, 2, 3}.

However, as we saw in another blog on categorical crossentropy, its mathematical structure doesn’t allow us to feed it integers directly.

We’ll have to convert it into categorical format first – with one-hot encoding, or to_categorical in Keras.

You’ll effectively transform your targets into this:
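
For the four classes above, that gives:

0 → [1, 0, 0, 0]
1 → [0, 1, 0, 0]
2 → [0, 0, 1, 0]
3 → [0, 0, 0, 1]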

Note that when you have more classes, the trick goes on and on – you simply create n-dimensional vectors, where n equals the unique number of classes in your dataset.

Categorical crossentropy

When converted into categorical data, you can apply categorical crossentropy:

[Figure: the categorical crossentropy formula, CE = -sum over the C classes of o_c * log(p_c), where o_c is the target observation for class c (1 for the true class, 0 otherwise) and p_c is the predicted probability for class c.]

Don’t worry – it’s a human pitfall to always think defensively when we see maths.

It’s not so difficult at all, to be frank, so make sure to read on!

What you see is the categorical crossentropy formula. What it does is actually really simple: it iterates over all the possible classes C predicted by the model during the forward pass of your training process.

For each class, it takes the target observation for that class – i.e., whether the actual class for this sample is 0 or 1 – and multiplies it by the (natural) logarithm of the predicted probability that the sample belongs to that class. From this, it follows that only one term is relevant – the one for the actual target class. For that class, the loss is simply the negative natural log of the prediction, which increases significantly the further the prediction is from 1:

[Figure: plot showing how the negative natural log of the predicted probability grows rapidly as the prediction for the true class moves away from 1.]

Sparse categorical crossentropy


However, when you have integer targets instead of categorical vectors as targets, you can use sparse categorical crossentropy. It’s an integer-based version of the categorical crossentropy loss function, which means that we don’t have to convert the targets into categorical format anymore.

Creating a CNN with TensorFlow 2 and Keras

Let’s now create a CNN with Keras that uses sparse categorical crossentropy. In some folder, create a file called model.py and open it in some code editor.

Today’s dataset: MNIST

As usual, like in our previous blog on creating a (regular) CNN with Keras, we use the MNIST dataset. This dataset, which contains thousands of 28×28 pixel handwritten digits (individual numbers from 0-9), is one of the standard datasets in machine learning training programs because it’s a very easy and normalized one. The images are also relatively small and high in quantity, which benefits the predictive and generalization power of your model when trained properly. This way, one can really focus on the machine learning aspects of an exercise, rather than the data related issues.

Software dependencies

If we wish to run the sparse categorical crossentropy Keras CNN, a few software tools need to be installed – at minimum a recent Python 3 installation and TensorFlow 2, which includes Keras.

Preferably, you run your model in an Anaconda environment. This way, you will be able to install your packages in a unique environment with which other packages do not interfere. Mingling Python packages is often a tedious job, which often leads to trouble. Anaconda resolves this by allowing you to use environments or isolated sandboxes in which your code can run. Really recommended!

Our model

This will be our model for today:
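
The sketch below reconstructs a comparable TensorFlow 2 / Keras CNN under the settings described in the sections that follow (the exact layer sizes are illustrative):

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

# Model configuration
img_width, img_height = 28, 28
batch_size = 250
no_epochs = 25
no_classes = 10
validation_split = 0.2
verbosity = 1

# Load MNIST data
(input_train, target_train), (input_test, target_test) = mnist.load_data()

# Reshape to (samples, width, height, channels)
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)
input_shape = (img_width, img_height, 1)

# Cast to float32 and normalize pixel values into the [0, 1] range
input_train = input_train.astype('float32') / 255
input_test = input_test.astype('float32') / 255

# Model architecture
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))

# Compile with sparse categorical crossentropy: integer targets, no to_categorical needed
model.compile(loss=tf.keras.losses.sparse_categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

# Train and evaluate
model.fit(input_train, target_train,
          batch_size=batch_size,
          epochs=no_epochs,
          verbose=verbosity,
          validation_split=validation_split)

score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')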

Let’s break creating the model apart.

Adding imports

First, we add our imports – packages and functions that we’ll need for our model to work as intended.

More specifically, we…
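
The imports used by the sketch above:

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D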

Model configuration

Next up, model configuration:

We specify image width and image height, which are 28 for both given the images in the MNIST dataset. We specify a batch size of 250, which means that during training 250 images at once will be processed. When all images are processed, one completes an epoch, of which we will have 25 in total during the training of our model. Additionally, we specify the number of classes in advance – 10, the numbers 0 to 9. 20% of our training set will be set apart for validating the model after every epoch, and for educational purposes we set model verbosity to True (1) – which means that all possible output is actually displayed on screen.
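
The corresponding configuration variables from the sketch above:

img_width, img_height = 28, 28
batch_size = 250
no_epochs = 25
no_classes = 10
validation_split = 0.2
verbosity = 1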

Preparing MNIST data

Next, we load and prepare the MNIST data:
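
The corresponding lines from the sketch above:

(input_train, target_train), (input_test, target_test) = mnist.load_data()

# reshape to (samples, width, height, channels) so the Conv2D layers accept the data
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)
input_shape = (img_width, img_height, 1)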

What we do is simple – we use mnist.load_data() to load the MNIST data into four Python variables, representing inputs and targets for both the training and testing datasets.

Additionally, we reshape the data so that TensorFlow will accept it.


Additional preparations

Additionally, we perform some other preparations which concern the data instead of how it is handled by your system:
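
The corresponding lines from the sketch above:

# cast to float32, which benefits optimization
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# normalize pixel values into the [0, 1] range
input_train = input_train / 255
input_test = input_test / 255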

We first parse the numbers as floats. This benefits the optimization step of the training process.

Additionally, we normalize the data, which benefits the training process as well.

Model architecture

We then create the architecture of our model:
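
The architecture used in the sketch above (any reasonable CNN would do):

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))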

To be frank: the architecture of our model doesn't really matter for showing that sparse categorical crossentropy really works. In fact, you can use the architecture you think is best for your machine learning problem. However, we put up the architecture above because it is very generic and hence works well in many simple classification scenarios.

Model compilation: hyperparameter tuning

We next compile the model, which involves configuring it by means of hyperparameter tuning:
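
The compilation step from the sketch above:

model.compile(loss=tf.keras.losses.sparse_categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])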

We specify the loss function used – sparse categorical crossentropy! We use it together with the Adam optimizer, which is one of the standard ones used today in very generic scenarios, and use accuracy as an additional metric, since it is more intuitive to humans.

Training and evaluation

Next, we fit the data following the specification created in the model configuration step and specify evaluation metrics that test the trained model with the testing data:
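
The corresponding lines from the sketch above:

model.fit(input_train, target_train,
          batch_size=batch_size,
          epochs=no_epochs,
          verbose=verbosity,
          validation_split=validation_split)

score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')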

Model performance

You should then see the Keras training log for the 25 epochs, followed by the test loss and accuracy.

With the configuration above, the model reaches impressive scores in both the validation and testing phases. It works pretty much as well as the classifier created with categorical crossentropy – and I actually think the difference can be attributed to the relative randomness of the model optimization process.

Recap

Well, today, we’ve seen how to create a Convolutional Neural Network (and by consequence, any model) with sparse categorical crossentropy in Keras. If you have integer targets in your dataset, which happens in many cases, you usually perform to_categorical in order to use multiclass crossentropy loss. With sparse categorical crossentropy, this is no longer necessary. This blog demonstrated this by means of an example Keras implementation of a CNN that classifies the MNIST dataset.

Model code is also available on GitHub, if it benefits you.

I hope this blog helped you – if it did, or if you have any questions, let me know in the comments section! 👇 I’m happy to answer any questions you may have 😊 Thanks and enjoy coding!


References

Chollet, F. (2017). Deep Learning with Python. New York, NY: Manning Publications.

Source
