# DeepDiff Tutorial: Comparing Numbers

Published on Apr 12, 2019 by Sep Dehpour

This tutorial is written based on DeepDiff 4.0.6.

One DeepDiff feature that comes in very handy is comparing nested data structures that include numbers. There are times when we care about the exact numbers and want even the slightest change reported. When the values are equal, the diff is empty:

``````
from pprint import pprint
from deepdiff import DeepDiff

t1 = {"key": [1.2, 1.5]}
t2 = {"key": [1.20, 1.50]}

>>> pprint(DeepDiff(t1, t2))
{}
``````

Let’s say the numbers get more precise:

``````
t1 = {"key": [1.21, 1.5]}
t2 = {"key": [1.2100000000001, 1.50]}

>>> pprint(DeepDiff(t1, t2))
{'values_changed': {"root['key'][0]": {'new_value': 1.2100000000001,
                                       'old_value': 1.21}}}
``````

## significant_digits

Do we really care about this change? Perhaps we don’t. In that case we have a few options. The first is to pass the number of `significant_digits` that we care about. By default, `significant_digits` sets how many digits after the decimal point are considered when comparing numbers.

``````
t1 = {"key": [1.21, 1.5]}
t2 = {"key": [1.2100000000001, 1.50]}

>>> pprint(DeepDiff(t1, t2, significant_digits=5))
{}
``````

So if we care only about 5 digits of accuracy after the decimal point, we set `significant_digits=5` as in the example above.
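To see why the diff comes back empty, it helps to look at what fixed-point formatting does to the two values. The sketch below is not DeepDiff's code. It assumes, as this tutorial describes further down, that the comparison boils down to formatting each number with `'{:.Nf}'` and comparing the resulting strings. The helper name `same_at` is made up for illustration.

```python
def same_at(a, b, significant_digits):
    """Return True if a and b format to the same fixed-point string.

    Hypothetical helper, not part of DeepDiff: it only mimics the
    fixed-point ("f") formatting that the tutorial describes.
    """
    fmt = "{:.%sf}" % significant_digits
    return fmt.format(a) == fmt.format(b)


print(same_at(1.21, 1.2100000000001, 5))   # True: both become '1.21000'
print(same_at(1.21, 1.2100000000001, 13))  # False: the 13th decimal differs
```

With 5 digits the extra `0000000001` at the end is simply cut off, which is why the change is no longer reported.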

What if we care about the difference relative to the size of the numbers being compared?

For example, between `2.0001` and `2.0002` we may care about the difference of `0.0001`, but the same difference between `20000.0001` and `20000.0002` is too small compared to the numbers being compared.

If we don’t set `significant_digits`, everything will be reported in the results:

``````
t1 = {"key": [2.0001, 20000.0001]}
t2 = {"key": [2.0002, 20000.0002]}

>>> pprint(DeepDiff(t1, t2))
{'values_changed': {"root['key'][0]": {'new_value': 2.0002,
                                       'old_value': 2.0001},
                    "root['key'][1]": {'new_value': 20000.0002,
                                       'old_value': 20000.0001}}}
``````

And if we set `significant_digits=3`, both changes disappear:

``````
>>> pprint(DeepDiff(t1, t2, significant_digits=3))
{}
``````

That’s where `number_format_notation` comes into play:

## number_format_notation

To make DeepDiff consider diffs based on the ratio of the diff to the original numbers, we can set the `number_format_notation` parameter. By default `number_format_notation` is set to “f”, meaning fixed point. Setting it to “e”, which stands for exponent (scientific) notation, gives us what we want:

``````
>>> pprint(DeepDiff(t1, t2, significant_digits=4, number_format_notation="e"))
{'values_changed': {"root['key'][0]": {'new_value': 2.0002,
                                       'old_value': 2.0001}}}
``````

Basically, in the above diff we are saying that we care about 4 significant digits in scientific notation, which automatically makes the diff relative to the size of the number.
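The effect is easy to reproduce with plain string formatting. The snippet below is only a sketch using Python's built-in `'{:.4e}'` format, on the assumption that this is roughly what the “e” notation compares. It does not call DeepDiff.

```python
values = [2.0001, 2.0002, 20000.0001, 20000.0002]

# Four digits after the decimal point of the mantissa, exponent notation.
formatted = ["{:.4e}".format(v) for v in values]
for value, text in zip(values, formatted):
    print(value, "->", text)

# The small pair keeps its difference: '2.0001e+00' vs '2.0002e+00'.
print(formatted[0] == formatted[1])  # False
# The large pair collapses to the same string: '2.0000e+04'.
print(formatted[2] == formatted[3])  # True
```

Because the exponent absorbs the magnitude, the same number of digits means a much coarser absolute tolerance for `20000.0001` than for `2.0001`.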

## ignore_numeric_type_changes

So far so good. What if we have type changes in our numbers? For example, you loaded a JSON file that has floats, but the Python object you have includes `Decimal` values.

``````
from decimal import Decimal

t1 = {"key": [Decimal('2.0001')]}
t2 = {"key": [2.0001]}

>>> pprint(DeepDiff(t1, t2))
{'type_changes': {"root['key'][0]": {'new_type': <class 'float'>,
                                     'new_value': 2.0001,
                                     'old_type': <class 'decimal.Decimal'>,
                                     'old_value': Decimal('2.0001')}}}
``````

To solve this problem, DeepDiff provides the `ignore_numeric_type_changes` parameter:

``````
t1 = {"key": [Decimal('2.0001')]}
t2 = {"key": [2.0001]}

>>> pprint(DeepDiff(t1, t2, ignore_numeric_type_changes=True))
{}
``````

Behind the scenes, DeepDiff converts both numbers into their string representations, with an accuracy of 12 significant digits by default. You can again override `significant_digits` by passing the parameter. Let’s set it to a higher number:

``````
t1 = {"key": [Decimal('2.0001')]}
t2 = {"key": [2.0001]}

>>> pprint(DeepDiff(t1, t2, ignore_numeric_type_changes=True, significant_digits=18))
{'values_changed': {"root['key'][0]": {'new_value': 2.0001,
                                       'old_value': Decimal('2.0001')}}}
``````

In other words, DeepDiff treats `2.0001` and `Decimal('2.0001')` as equal when `significant_digits=12` (the default when `ignore_numeric_type_changes=True`), but not when we increase `significant_digits` to 18.

This is due to the way floating point numbers are represented. A good resource on the topic is https://docs.python.org/3/tutorial/floatingpoint.html

To understand what happens: behind the scenes, DeepDiff converts the numbers into strings whenever `ignore_numeric_type_changes=True`. In such cases it uses `number_format_notation="f"` (fixed point notation) by default, but again we can use `number_format_notation` to change that behaviour.

When you don’t pass `significant_digits`, the default value of 12 is used behind the scenes:

``````
>>> '{:.12f}'.format(2.0001)
'2.000100000000'
>>> '{:.12f}'.format(Decimal('2.0001'))
'2.000100000000'
``````

But when you use `significant_digits=18`:

``````
>>> '{:.18f}'.format(2.0001)
'2.000100000000000211'
>>> '{:.18f}'.format(Decimal('2.0001'))
'2.000100000000000000'
``````

As you can see, the float and the decimal no longer match! You can use `significant_digits` and `number_format_notation` for granular control over how numbers are compared when `ignore_numeric_type_changes=True`.
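If you are unsure how many digits a float and a decimal can safely share, one way to find out is to try each digit count in turn. The following is a rough sketch rather than anything DeepDiff provides. `first_mismatch` is a hypothetical helper that only repeats the `'{:.Nf}'` formatting shown above.

```python
from decimal import Decimal


def first_mismatch(a, b, max_digits=20):
    """Return the smallest digit count at which a and b format differently.

    Hypothetical helper for illustration. Returns None if the fixed-point
    strings agree for every digit count from 0 up to max_digits.
    """
    for digits in range(max_digits + 1):
        fmt = "{:.%sf}" % digits
        if fmt.format(a) != fmt.format(b):
            return digits
    return None


# Prints the first digit count where the float's binary rounding error shows.
print(first_mismatch(2.0001, Decimal("2.0001")))
```

Any `significant_digits` value below the number printed keeps the two representations equal for this particular pair. Other values may diverge earlier or later.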

Just like what we did with `number_format_notation` earlier, we can limit the reported diff to numbers whose difference is big enough compared to their size:

``````
t1 = {"key": [Decimal('2.0001'), Decimal('20000.0001')]}
t2 = {"key": [2.0002, 20000.0002]}

>>> pprint(DeepDiff(t1, t2, ignore_numeric_type_changes=True, significant_digits=4, number_format_notation="e"))
{'values_changed': {"root['key'][0]": {'new_value': 2.0002,
                                       'old_value': Decimal('2.0001')}}}
``````

## number_to_string_func

For the power users who want more granular control over how numbers are compared, you can pass a custom function that converts numbers to strings.

The original function that converts numbers to strings resides in the `deepdiff.helper` module.

Here is its current implementation at the time of writing of this article:

``````
from decimal import Decimal, localcontext

ZERO_DECIMAL_CHARACTERS = set("-0.")

number_formatting = {
    "f": r'{:.%sf}',
    "e": r'{:.%se}',
}

def number_to_string(number, significant_digits, number_format_notation="f"):
    """
    Convert numbers to string considering significant digits.
    """
    try:
        using = number_formatting[number_format_notation]
    except KeyError:
        raise ValueError("number_format_notation got invalid value of {}. The valid values are 'f' and 'e'".format(number_format_notation)) from None
    if isinstance(number, Decimal):
        tup = number.as_tuple()
        with localcontext() as ctx:
            ctx.prec = len(tup.digits) + tup.exponent + significant_digits
            number = number.quantize(Decimal('0.' + '0' * significant_digits))
    result = (using % significant_digits).format(number)
    # Special case for 0: "-0.00" should compare equal to "0.00"
    if set(result) <= ZERO_DECIMAL_CHARACTERS:
        result = "0.00"
    # https://bugs.python.org/issue36622
    if number_format_notation == 'e' and isinstance(number, float):
        result = result.replace('+0', '+')
    return result
``````

All this function does is convert numbers into strings based on `significant_digits` and the formatting notation (“f” for fixed point or “e” for scientific).
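The zero special case in the function above is perhaps its least obvious part. A small stdlib-only sketch shows the problem it guards against: Python keeps the sign when a tiny negative number rounds to zero, so without the check `-0.001` and `0.001` would produce different strings at two digits.

```python
ZERO_DECIMAL_CHARACTERS = set("-0.")

small_negative = "{:.2f}".format(-0.001)
small_positive = "{:.2f}".format(0.001)

print(small_negative)  # -0.00
print(small_positive)  # 0.00

# Without the special case, the two strings differ only by the sign.
print(small_negative == small_positive)  # False

# The check used above: a result made up only of '-', '0' and '.' is zero.
print(set(small_negative) <= ZERO_DECIMAL_CHARACTERS)  # True
print(set(small_positive) <= ZERO_DECIMAL_CHARACTERS)  # True
```

Since both strings pass the subset check, the function replaces each of them with the same `"0.00"` and they compare equal.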

You can modify this function or its results and pass it to DeepDiff as the `number_to_string_func`.

For a silly example, let’s say you don’t care if numbers below 100 have changed. You only care if numbers above 100 have changed. Then you do:

``````
from deepdiff import DeepDiff
from deepdiff.helper import number_to_string

def custom_number_to_string(number, *args, **kwargs):
    # Treat every number below 100 as 100 so that changes among them are ignored.
    number = 100 if number < 100 else number
    return number_to_string(number, *args, **kwargs)

t1 = [10, 12, 100000]
t2 = [20, 22, 100000]

ddiff = DeepDiff(t1, t2, significant_digits=3, number_format_notation="e",
                 number_to_string_func=custom_number_to_string)
>>> ddiff
{}
``````
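To make it clearer why that diff is empty, here is a stand-in for the custom function that runs without DeepDiff. It is a simplified sketch: `clamp_then_format` is a hypothetical name, and it uses plain `str.format` instead of the library's `number_to_string`, so it skips the Decimal and zero handling shown earlier.

```python
def clamp_then_format(number, significant_digits, number_format_notation="e"):
    """Stand-in for the custom function above, written without DeepDiff.

    Hypothetical: clamps values below 100 to 100 and then applies plain
    str.format with the requested digits and notation.
    """
    number = 100 if number < 100 else number
    template = "{:.%s%s}" % (significant_digits, number_format_notation)
    return template.format(number)


t1 = [10, 12, 100000]
t2 = [20, 22, 100000]

strings_t1 = [clamp_then_format(n, 3) for n in t1]
strings_t2 = [clamp_then_format(n, 3) for n in t2]

print(strings_t1)  # ['1.000e+02', '1.000e+02', '1.000e+05']
print(strings_t2)  # ['1.000e+02', '1.000e+02', '1.000e+05']
print(strings_t1 == strings_t2)  # True
```

Every item below 100 turns into the string for 100, so the items are pairwise identical and nothing is left to report.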

Note: `number_to_string_func` is only used when either `significant_digits` or `ignore_numeric_type_changes` is set.