DeepDiff Tutorial: Comparing Numbers

Published on Apr 12, 2019 by Sep Dehpour

This tutorial is written based on DeepDiff 4.0.6.

One of the features of DeepDiff that comes in very handy is comparing nested data structures that include numbers. There are times when we care about the exact numbers and want any change, however slight, to be reported.

from pprint import pprint
from deepdiff import DeepDiff


t1 = {"key": [1.2, 1.5]}
t2 = {"key": [1.20, 1.50]}

>>> pprint(DeepDiff(t1, t2))
{}

Let’s say the numbers get more precise:

t1 = {"key": [1.21, 1.5]}
t2 = {"key": [1.2100000000001, 1.50]}

>>> pprint(DeepDiff(t1, t2))
{'values_changed': {"root['key'][0]": {'new_value': 1.2100000000001,
                                       'old_value': 1.21}}}

significant_digits 

Do we really care about this change? Perhaps we don’t. In that case we have a few options. The first option is to pass the significant_digits that we care about. By default, significant_digits sets how many digits after the decimal point are considered when comparing numbers.

t1 = {"key": [1.21, 1.5]}
t2 = {"key": [1.2100000000001, 1.50]}

>>> pprint(DeepDiff(t1, t2, significant_digits=5))
{}

So if we only care about 5 digits of accuracy after the decimal point, we set significant_digits=5 as in the example above.

What if we care about the difference between numbers relative to the size of the numbers?

For example, between 2.0001 and 2.0002 we may care about the difference of 0.0001, but the same difference between 20000.0001 and 20000.0002 is too small compared to the numbers being compared.
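To see why the diff comes back empty, here is a minimal sketch of the fixed-point idea using only Python's built-in string formatting. The helper name same_at_digits is made up for this illustration and is not part of DeepDiff. It assumes that numbers are compared through their formatted strings, which is what the number_to_string helper shown later in this tutorial does.

```python
def same_at_digits(a, b, digits):
    """Hypothetical helper: compare two numbers after fixed-point formatting."""
    fmt = '{:.%sf}' % digits
    return fmt.format(a) == fmt.format(b)


print('{:.5f}'.format(1.21))             # 1.21000
print('{:.5f}'.format(1.2100000000001))  # 1.21000

# Equal at 5 digits after the decimal point, different at 13.
print(same_at_digits(1.21, 1.2100000000001, 5))   # True
print(same_at_digits(1.21, 1.2100000000001, 13))  # False
```

With 5 digits both numbers collapse to the same string, so there is nothing to report. Only when enough digits are kept does the extra precision show up.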

If we don’t set significant_digits, everything is reported in the results:

t1 = {"key": [2.0001, 20000.0001]}
t2 = {"key": [2.0002, 20000.0002]}

>>> pprint(DeepDiff(t1, t2))
{'values_changed': {"root['key'][0]": {'new_value': 2.0002,
                                       'old_value': 2.0001},
                    "root['key'][1]": {'new_value': 20000.0002,
                                       'old_value': 20000.0001}}}

And if we set significant_digits=3, both changes disappear from the results:

>>> pprint(DeepDiff(t1, t2, significant_digits=3))
{}

That’s where number_format_notation comes into play:

number_format_notation 

To make DeepDiff consider diffs based on the ratio of the diff to the original numbers, we can set the number_format_notation parameter. By default, number_format_notation is set to “f”, meaning fixed point. However, setting it to “e”, which stands for exponent notation or scientific notation, gives us what we want:

>>> pprint(DeepDiff(t1, t2, significant_digits=4, number_format_notation="e"))
{'values_changed': {"root['key'][0]": {'new_value': 2.0002,
                                       'old_value': 2.0001}}}

Basically, in the above diff we are saying that we care about 4 digits after the decimal point in scientific notation, which automatically makes the diff relative to the size of the number.
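The difference between the two notations is easy to see with plain string formatting. The sketch below is not DeepDiff code. The function formatted_equal is a hypothetical helper that only mimics the idea of comparing numbers through their formatted strings.

```python
def formatted_equal(a, b, digits, notation):
    """Hypothetical helper: format both numbers and compare the strings."""
    fmt = '{:.%s%s}' % (digits, notation)
    return fmt.format(a) == fmt.format(b)


pairs = [(2.0001, 2.0002), (20000.0001, 20000.0002)]

for old, new in pairs:
    # Fixed point with 3 digits: both pairs look unchanged.
    print('{:.3f}'.format(old), '{:.3f}'.format(new),
          formatted_equal(old, new, 3, 'f'))
    # Scientific notation with 4 digits: only the small pair differs.
    print('{:.4e}'.format(old), '{:.4e}'.format(new),
          formatted_equal(old, new, 4, 'e'))
```

In scientific notation, 2.0001 and 2.0002 become 2.0001e+00 and 2.0002e+00, while both of the larger numbers become 2.0000e+04. The same absolute difference of 0.0001 is kept for the small numbers and rounded away for the large ones.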

ignore_numeric_type_changes 

So far so good. What if we have type changes in our numbers? For example, you loaded a JSON file that has floats, but the Python object you have includes Decimal values.

from decimal import Decimal

t1 = {"key": [Decimal('2.0001')]}
t2 = {"key": [2.0001]}

>>> pprint(DeepDiff(t1, t2))
{'type_changes': {"root['key'][0]": {'new_type': <class 'float'>,
                                     'new_value': 2.0001,
                                     'old_type': <class 'decimal.Decimal'>,
                                     'old_value': Decimal('2.0001')}}}

To solve this problem, DeepDiff provides the ignore_numeric_type_changes parameter:

t1 = {"key": [Decimal('2.0001')]}
t2 = {"key": [2.0001]}

>>> pprint(DeepDiff(t1, t2, ignore_numeric_type_changes=True))
{}

Behind the scenes, DeepDiff converts both numbers into their string representations, with an accuracy of 12 significant digits by default. You can again override significant_digits by passing the parameter. Let’s set it to a higher number:

t1 = {"key": [Decimal('2.0001')]}
t2 = {"key": [2.0001]}

>>> pprint(DeepDiff(t1, t2, ignore_numeric_type_changes=True, significant_digits=18))
{'values_changed': {"root['key'][0]": {'new_value': 2.0001,
                                       'old_value': Decimal('2.0001')}}}

In other words, 2.0001 == Decimal('2.0001') when significant_digits=12 (default) but not when we increase the significant_digits to 18.

This is due to floating point arithmetic issues. A good resource to take a look at is located at https://docs.python.org/3/tutorial/floatingpoint.html

To understand what happens: behind the scenes, DeepDiff converts the numbers into strings whenever ignore_numeric_type_changes=True. In such cases it uses number_format_notation="f" by default, which stands for fixed point notation, but again we can use number_format_notation to change that behaviour.

When you don’t pass significant_digits, the default value of 12 is used behind the scenes:

>>> '{:.12f}'.format(2.0001)
'2.000100000000'
>>> '{:.12f}'.format(Decimal('2.0001'))
'2.000100000000'

But when you use significant_digits=18:

>>> '{:.18f}'.format(2.0001)
'2.000100000000000211'
>>> '{:.18f}'.format(Decimal('2.0001'))
'2.000100000000000000'

As you can see, the float and the Decimal no longer match! You can use significant_digits and number_format_notation to have granular control over how numbers are compared when ignore_numeric_type_changes=True.

Just like we did with number_format_notation earlier, we can limit the reported diff to numbers whose difference is big enough compared to their size:
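If you are curious where exactly a float and a Decimal start to disagree, a small loop over the number of digits can find it. This is only a sketch: first_mismatch_digits is a hypothetical helper, not a DeepDiff function, and it uses the same fixed-point formatting shown above.

```python
from decimal import Decimal


def first_mismatch_digits(float_value, decimal_value, max_digits=30):
    """Hypothetical helper: smallest number of digits after the decimal
    point at which the two fixed-point strings differ, or None."""
    for digits in range(max_digits + 1):
        fmt = '{:.%sf}' % digits
        if fmt.format(float_value) != fmt.format(decimal_value):
            return digits
    return None


print(first_mismatch_digits(2.0001, Decimal('2.0001')))  # 16
```

For this pair, any significant_digits value below 16 treats the two numbers as equal under fixed-point formatting, which is why the default of 12 reports no change while 18 does.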

t1 = {"key": [Decimal('2.0001'), Decimal('20000.0001')]}
t2 = {"key": [2.0002, 20000.0002]}

>>> pprint(DeepDiff(t1, t2, ignore_numeric_type_changes=True, significant_digits=4, number_format_notation="e"))
{'values_changed': {"root['key'][0]": {'new_value': 2.0002,
                                       'old_value': Decimal('2.0001')}}}

number_to_string_func 

Power users who want more granular control over how numbers are compared can pass a custom function that converts numbers to strings.

The original function that converts numbers to strings resides in the helper.py module.

Here is its current implementation at the time of writing of this article:

from decimal import Decimal, localcontext

ZERO_DECIMAL_CHARACTERS = set("-0.")

number_formatting = {
    "f": r'{:.%sf}',
    "e": r'{:.%se}',
}


def number_to_string(number, significant_digits, number_format_notation="f"):
    """
    Convert numbers to string considering significant digits.
    """
    try:
        using = number_formatting[number_format_notation]
    except KeyError:
        raise ValueError("number_format_notation got invalid value of {}. The valid values are 'f' and 'e'".format(number_format_notation)) from None
    if isinstance(number, Decimal):
        tup = number.as_tuple()
        with localcontext() as ctx:
            ctx.prec = len(tup.digits) + tup.exponent + significant_digits
            number = number.quantize(Decimal('0.' + '0' * significant_digits))
    result = (using % significant_digits).format(number)
    # Special case for 0: "-0.00" should compare equal to "0.00"
    if set(result) <= ZERO_DECIMAL_CHARACTERS:
        result = "0.00"
    # https://bugs.python.org/issue36622
    if number_format_notation == 'e' and isinstance(number, float):
        result = result.replace('+0', '+')
    return result

All this function does is convert numbers into strings based on significant_digits and the formatting notation (“f” for fixed point or “e” for scientific).
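The two special cases in the function can be tried out in isolation. The snippet below is a sketch that reuses the same expressions on plain floats. The variable names are made up for the illustration.

```python
ZERO_DECIMAL_CHARACTERS = set("-0.")

# A tiny negative number keeps its sign when formatted, so the string
# would not equal the one produced for a tiny positive number.
negative_tiny = '{:.2f}'.format(-0.001)
positive_tiny = '{:.2f}'.format(0.001)
print(negative_tiny, positive_tiny)  # -0.00 0.00

# The zero check maps every string made only of '-', '0' and '.' to "0.00".
normalized = [
    "0.00" if set(text) <= ZERO_DECIMAL_CHARACTERS else text
    for text in (negative_tiny, positive_tiny)
]
print(normalized)  # ['0.00', '0.00']

# Floats are formatted with a zero-padded exponent, which the last step strips.
raw = '{:.3e}'.format(100000.0)
print(raw, raw.replace('+0', '+'))  # 1.000e+05 1.000e+5
```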

You can modify this function or its results and pass it to DeepDiff as the number_to_string_func.

For a silly example, let’s say you don’t care if numbers below 100 have changed; you only care about changes to numbers of 100 and above. Then you do:

from deepdiff.helper import number_to_string


def custom_number_to_string(number, *args, **kwargs):
    number = 100 if number < 100 else number
    return number_to_string(number, *args, **kwargs)

t1 = [10, 12, 100000]
t2 = [20, 22, 100000]

ddiff = DeepDiff(t1, t2, significant_digits=3, number_format_notation="e",
                 number_to_string_func=custom_number_to_string)
>>> ddiff
{}

Note: number_to_string_func is only used when either significant_digits or ignore_numeric_type_changes is set.
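To see why the diff above is empty without running DeepDiff, you can trace the strings that the custom function produces. In this sketch, simple_number_to_string is a simplified stand-in for deepdiff.helper.number_to_string that performs only the formatting step. It is an assumption made for illustration, not the library's code.

```python
def simple_number_to_string(number, significant_digits, number_format_notation="f"):
    """Simplified stand-in (formatting step only), not the DeepDiff helper."""
    template = '{:.%s' + number_format_notation + '}'
    return (template % significant_digits).format(number)


def custom_number_to_string(number, *args, **kwargs):
    # Every number below 100 is treated as if it were exactly 100.
    number = 100 if number < 100 else number
    return simple_number_to_string(number, *args, **kwargs)


t1 = [10, 12, 100000]
t2 = [20, 22, 100000]

for old, new in zip(t1, t2):
    old_text = custom_number_to_string(old, 3, number_format_notation="e")
    new_text = custom_number_to_string(new, 3, number_format_notation="e")
    print(old, new, old_text, new_text, old_text == new_text)
```

Because 10, 12, 20 and 22 are all replaced by 100 before formatting, each of them turns into the string 1.000e+02, so every pair compares equal.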

Learn More! 

This was the first in a series of tutorials I will be writing for DeepDiff. Hope you enjoyed it!

To learn more about DeepDiff, please take a look at the documentation or the source code, and give it a star on GitHub if you find it useful!
