Property-based Testing¶

With examples in Python and JS¶

aka Generative Testing¶

aka "I'm too lazy to come up with test cases"¶

What is property-based/generative testing?¶

property-based: check that statements or invariants are true for an example
generative: automatically generate examples

Two separate ideas, but usually practiced together.

Instead of checking for equality...¶

In [3]:

def test_sort_on_example():
    assert sorted([3, 5, 18, 2, -4]) == [-4, 2, 3, 5, 18]

Describe a property that should be true.¶

In [4]:

from hypothesis import given
from hypothesis.strategies import lists, integers

@given(lists(integers()))
def test_sort_produces_correct_order(a_list):
    sorted_lst = sorted(a_list)
    for ix in range(len(sorted_lst) - 1):
        assert sorted_lst[ix] <= sorted_lst[ix + 1]

How to do property-based/generative testing¶

Generate an "arbitrary example" of the input for the function under test.
- lists(integers()) is a generic example of a list of integers.
- Testing libraries have built-ins for primitive types, but we'll see later how to make more complicated structures.

Come up with "properties" of the function under test.
- Not like 'obj.attribute'
- More like "The output of sorted is a list of the same length as its input"

Use your "arbitary example" generator to run through tons of examples. Try and find a case where the property fails.

Why do I need a framework?¶

What's wrong with just import random or Math.random()?

A (good) generative-testing framework provides:¶

'shrink'-ability of failure cases.
- [-389294971153641476861717385245149368395L, -389294971153641476861717385244970907453L, 988, -88, 1665, 1649, -389294971153641476861717385245047265893L, ..., -938, -389294971153641476861717385244722191400L, -389294971153641476861717385245315066655L, 751, -389294971153641476861717385244787482770L]
- becomes [0, -1]

specifically testing edge-cases
- eg, 0, +/-Inf, [], ''

database of failed cases (reproducibility)
- not yet in testcheck.js

When is generative testing appropriate?¶

When we can come up with properties¶

roundtrip computations
"fixed-point" computations
compare to a reference implementation
a clear relationship between input and output
start with an example and generalize it

How to come up with properties?¶

Roundtrip examples -- salesforce ids¶

Salesforce ids come in two forms:

case-sensitive 15-char
case-insensitive 18-char

Convert from 15 to 18:¶

Split the fifteen characters into three groups of five.
For each group of five:
Make a length-5 bitstring (eg 01001) encoding whether the letter at each position is capitalized
Use a lookup table to map that bitstring to a letter. Tack the letter onto the end.
After doing this for each group of five, the last three positions of the id encode the capitalization of the rest of the string.

Convert from 18 to 15: do the reverse¶

In [5]:

def sfid_18_to_15(sf_id):
    sf_id = list(sf_id)
    for i in range(3):
        bitmask = 'abcdefghijklmnopqrstuvwxyz012345'.index(sf_id[15 + i])
        for j in range(5):
            if bitmask & (1 << j):
                sf_id[i * 5 + j] = sf_id[i * 5 + j].upper()
            else:
                sf_id[i * 5 + j] = sf_id[i * 5 + j].lower()
    return ''.join(sf_id[:15])


def sfid_15_to_18(sf_id):
    suffix = []
    for i in range(3):
        flags = 0
        for j in range(5):
            c = sf_id[i * 5 + j]
            if 'A' <= c <= 'Z':
                flags += 1 << j
        suffix.append('abcdefghijklmnopqrstuvwxyz012345'[flags])
    return sf_id + ''.join(suffix)

Can you spot the error?

What can we test?¶

Passing through both conversions, an 18-char id comes out the same

In [6]:

@given(sampled_from(SF_18_CHAR_IDS))
def test_18_to_15_to_18_roundtrip(sf_18_char):
    assert sf_18_char == sfid_15_to_18(sfid_18_to_15(sf_18_char))

In [7]:

test_18_to_15_to_18_roundtrip()

Passing through both conversions, a 15-char id comes out the same

In [8]:

@given(sampled_from(SF_15_CHAR_IDS))
def test_15_to_18_to_15_roundtrip(sf_15_char):
    assert sf_15_char == sfid_18_to_15(sfid_15_to_18(sf_15_char))

In [9]:

test_15_to_18_to_15_roundtrip()

no problems so far, maybe we're good?

An 18-char id should be case-insensitive

In [10]:

@given(sampled_from(SF_18_CHAR_IDS))
def test_18_char_is_case_insensitive(sf_18_char):
    assert all_same(
        sfid_18_to_15(sf_18_char),
        sfid_18_to_15(sf_18_char.lower()),
        sfid_18_to_15(sf_18_char.upper()),
        sfid_18_to_15(sf_18_char.swapcase()))

In [11]:

test_18_char_is_case_insensitive()

Falsifying example: test_18_char_is_case_insensitive(sf_18_char='00600000002nxMDaay')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-b926b06fa861> in <module>()
----> 1 test_18_char_is_case_insensitive()

<ipython-input-10-7fb62f71f9bd> in test_18_char_is_case_insensitive()
      1 @given(sampled_from(SF_18_CHAR_IDS))
----> 2 def test_18_char_is_case_insensitive(sf_18_char):
      3     assert all_same(
      4         sfid_18_to_15(sf_18_char),
      5         sfid_18_to_15(sf_18_char.lower()),

~/.virtualenvs/property-based-testing-talk/lib/python3.5/site-packages/hypothesis/core.py in wrapped_test(*arguments, **kwargs)
    719             state = StateForActualGivenExecution(
    720                 test_runner, search_strategy, test, settings, random)
--> 721             state.run()
    722 
    723         for attr in dir(test):

~/.virtualenvs/property-based-testing-talk/lib/python3.5/site-packages/hypothesis/core.py in run(self)
    612                     reify_and_execute(
    613                         self.search_strategy, self.test,
--> 614                         print_example=True, is_final=True
    615                     ))
    616         except (UnsatisfiedAssumption, StopTest):

~/.virtualenvs/property-based-testing-talk/lib/python3.5/site-packages/hypothesis/executors.py in default_new_style_executor(data, function)
     56 
     57 def default_new_style_executor(data, function):
---> 58     return function(data)
     59 
     60 

~/.virtualenvs/property-based-testing-talk/lib/python3.5/site-packages/hypothesis/core.py in run(data)
    113                     lambda: 'Trying example: %s(%s)' % (
    114                         test.__name__, arg_string(test, args, kwargs)))
--> 115             return test(*args, **kwargs)
    116     return run
    117 

<ipython-input-10-7fb62f71f9bd> in test_18_char_is_case_insensitive(sf_18_char)
      4         sfid_18_to_15(sf_18_char),
      5         sfid_18_to_15(sf_18_char.lower()),
----> 6         sfid_18_to_15(sf_18_char.upper()),
      7         sfid_18_to_15(sf_18_char.swapcase()))

<ipython-input-5-7ba554829509> in sfid_18_to_15(sf_id)
      2     sf_id = list(sf_id)
      3     for i in range(3):
----> 4         bitmask = 'abcdefghijklmnopqrstuvwxyz012345'.index(sf_id[15 + i])
      5         for j in range(5):
      6             if bitmask & (1 << j):

ValueError: substring not found

In [12]:

# Avoiding tracebacks in the slides using this helper function, 
print_failure_case(test_18_char_is_case_insensitive)

Falsifying example: test_18_char_is_case_insensitive(sf_18_char='00600000002nxMDaay')

In [13]:

def sfid_18_to_15(sf_id):
    sf_id = list(sf_id)
    for i in range(3):
        bitmask = 'abcdefghijklmnopqrstuvwxyz012345'.index(sf_id[15 + i].lower())
         #                                                   added .lower() ^
        for j in range(5):
            if bitmask & (1 << j):
                sf_id[i * 5 + j] = sf_id[i * 5 + j].upper()
            else:
                sf_id[i * 5 + j] = sf_id[i * 5 + j].lower()
    return ''.join(sf_id[:15])

In [14]:

test_18_char_is_case_insensitive() # now it passes!

How to come up with properties?¶

Roundtrip examples -- normalize/denormalize¶

In NREL/openstudio-geometry-editor, we store a geometry object like:

{
  id: 1,
  vertices: [
    {id: 'a', x: 0, y: 16}, {id: 'b', x: 4, y: 16},
    {id: 'c', x: 0, y: 10}, {id: 'd', x: 4, y: 10},
  ],
  edges: [
    { id: 'ab', v1: 'a', v2: 'b' }, { id: 'bd', v1: 'b', v2: 'd' },
    { id: 'cd', v1: 'c', v2: 'd' }, { id: 'ac', v1: 'a', v2: 'c' },
  ],
  faces: [
    {id: 'top', edgeRefs: [
      {edge_id: 'ab', reverse: false}, {edge_id: 'bd', reverse: false},
      {edge_id: 'cd', reverse: true}, {edge_id: 'ac', reverse: true},
    ]},
  ],
}

How to come up with properties example: normalize/denormalize¶

Often, a denormalized structure is easier to work with:

{
  id: 1,
  vertices: [
    {id: 'a', x: 0, y: 16}, {id: 'b', x: 4, y: 16},
    {id: 'c', x: 0, y: 10}, {id: 'd', x: 4, y: 10},
  ],
  edges: [
    { id: 'ab', v1: {id: 'a', x: 0, y: 16}, v2: {id: 'b', x: 4, y: 16} },
    { id: 'bd', v1: {id: 'b', x: 4, y: 16}, v2: {id: 'd', x: 4, y: 10} },
    { id: 'cd', v1: {id: 'c', x: 0, y: 10}, v2: {id: 'd', x: 4, y: 10} },
    { id: 'ac', v1: {id: 'a', x: 0, y: 16}, v2: {id: 'c', x: 0, y: 10} },
  ],
  faces: [
    {id: 'top', edgeRefs: [
      { edge: { id: 'ab', v1: {id: 'a', x: 0, y: 16}, v2: {id: 'b', x: 4, y: 16} },
        reverse: false },
      { edge: { id: 'bd', v1: {id: 'b', x: 4, y: 16}, v2: {id: 'd', x: 4, y: 10} },
        reverse: false },
      { edge: { id: 'cd', v1: {id: 'c', x: 0, y: 10}, v2: {id: 'd', x: 4, y: 10} },
        reverse: true },
      { edge: { id: 'ac', v1: {id: 'a', x: 0, y: 16}, v2: {id: 'c', x: 0, y: 10} },
        reverse: true },
    ]},
  ],
}

How to come up with properties example: normalize/denormalize¶

import { gen, check, property } from 'testcheck';

describe('denormalize', () => {
  it('is undone by normalize', () => {
    const result = check(property(
      gen.oneOf(_.values(geometryExamples)),
      (geom) => {
        assertEqual(
          geom,
          normalize(denormalize(geom)),
        );
      },
    ));
    assert(resp.result, 'Property failed to verify!');
  });
});

Aside: using testcheck.js¶

check(property(...)) won't throw an error on it's own. I wrote this helper function:

function assertProperty(...args) {
  // if last parameter is not a function, assume it's config for call to check()
  const checkConfig = _.isFunction(args[args.length - 1]) ? {} : args.pop();

  let resp = check(property(...args), checkConfig);
  if (resp.result instanceof Error) {
    resp = resp.shrunk || resp;
    console.error(`${resp.result}`);
    console.error(
      `Property failed to verify on input: ${JSON.stringify(resp.smallest || resp.fail)}`);
    throw resp.result;
  }
  assert(resp.result, `Property failed to verify! ${JSON.stringify(resp)}`);
}

How to come up with Properties?¶

"Fixed-point" computations (aka 'idempotent')¶

When applying a function multiple times should give the same result¶

Examples:

In [15]:

# sorting
@given(lists(integers()))
def test_sort_is_fixed_point(a_list):
    assert sorted(a_list) == sorted(sorted(a_list))

In [16]:

# set operations
@given(sets(integers()), sets(integers()))
def test_set_ops_are_idempotent(set1, set2):
    assert set1 & set2 == (set1 & set2) & set2
    assert set1 | set2 == (set1 | set2) | set2
    assert set1 - set2 == (set1 - set2) - set2

In [17]:

# first_day_of_quarter (a problem I had at my last job)
@given(datetimes())
def test_first_day_of_quarter_is_fixed_point(dt):
    assert (first_day_of_quarter(dt) ==
            first_day_of_quarter(first_day_of_quarter(dt)))

How to come up with Properties?¶

Compare to reference function¶

why not just use the reference function?¶

maybe yours is faster

maybe the reference is in another language?

In [18]:

@given(datetimes(), integers(1, 12))
def test_pg_period_string(date, base_month):
    py_string = period_string(date, 'quarter', base_month)
    db_string = fetchone('SELECT quarter_str(%s, %s)', (base_month, date))[0]
    assert py_string == db_string

binary search is famously hard to get right, so maybe compare with a linear search?

How to come up with Properties?¶

Describe a relationship between the inputs and the outputs¶

In [19]:

@given(lists(integers()), lists(integers()))
def test_list_length_is_preserved_on_addition(a, b):
    assert len(a + b) == len(a) + len(b)

How to come up with Properties?¶

Start with an example and generalize it¶

assertEqual(prime_factors(4), [2, 2])

Why not test that prime_factors(some_prime * some_prime) is [some_prime, some_prime]

assertProperty(
  gen.oneOf(first10kPrimes),
  prime => assertEqual(primeFactors(prime * prime), [prime, prime]),
);

An Example: merge sort¶

In [20]:

def merge_sort(lst):
    if len(lst) <= 1:
        return lst
    
    middle = len(lst) // 2
    left = merge_sort(lst[:middle])
    right = merge_sort(lst[middle:])
    
    return merge(left, right)

def merge(left, right):
    if not left or not right:
        # base case: one of the arrays is empty
        return left or right

    result = []
    ix, jx = 0, 0
    while ix < len(left) and jx < len(right):
        if left[ix] <= right[jx]:
            result.append(left[ix])
            ix += 1
        else:
            result.append(right[jx])
            jx += 1
    return result

An example: merge sort¶

Let's define some tests¶

In [21]:

@given(lists(integers()))
def test_sort_produces_correct_order(a_list):
    sorted_lst = merge_sort(a_list)
    for ix in range(len(sorted_lst) - 1):
        assert sorted_lst[ix] <= sorted_lst[ix + 1]

@given(lists(integers()))
def test_sort_is_fixed_point(a_list):
    sorted_lst = merge_sort(a_list)
    assert sorted_lst == merge_sort(sorted_lst)

In [22]:

test_sort_produces_correct_order()
test_sort_is_fixed_point()

In [23]:

@given(lists(integers()))
def test_sort_maintains_length(a_list):
    sorted_lst = merge_sort(a_list)
    assert len(sorted_lst) == len(a_list)

In [24]:

@given(lists(integers()))
def test_sorted_list_has_same_elements(a_list):
    sorted_lst = merge_sort(a_list)
    # A Counter is a map from element of the list to the number of times
    # it appears.
    assert Counter(sorted_lst) == Counter(a_list)

In [25]:

@given(lists(integers()))
def test_sort_against_reference(a_list):
    assert merge_sort(a_list) == sorted(a_list)

an example: merge sort¶

Unfortunately, these last three all fail.

In [26]:

print_failure_case(test_sort_maintains_length)
print_failure_case(test_sorted_list_has_same_elements)
print_failure_case(test_sort_against_reference)

Falsifying example: test_sort_maintains_length(a_list=[0, 0])
Falsifying example: test_sorted_list_has_same_elements(a_list=[0, 0])
Falsifying example: test_sort_against_reference(a_list=[0, 0])

In [27]:

merge_sort([0, 0])

Out[27]:

[0]

an example: merge sort¶

Can you spot the error?¶

In [28]:

def merge(left, right):
    if not left or not right:
        # base case: one of the arrays is empty
        return left or right

    result = []
    ix, jx = 0, 0
    while ix < len(left) and jx < len(right):
        if left[ix] <= right[jx]:
            result.append(left[ix])
            ix += 1
        else:
            result.append(right[jx])
            jx += 1
    return result

We're never copying remaining elements from left[ix:] or right[jx:].

an example: merge sort¶

In [29]:

def merge(left, right):
    if not left or not right:
        # base case: one of the arrays is empty
        return left or right

    result = []
    ix, jx = 0, 0
    while ix < len(left) and jx < len(right):
        if left[ix] <= right[jx]:
            result.append(left[ix])
            ix += 1
        else:
            result.append(right[jx])
            jx += 1
    
    # Adding this:
    result.extend(left[ix:])
    result.extend(right[jx:])
    
    return result

an example: merge sort¶

Now our tests pass...

In [30]:

test_sort_produces_correct_order()
test_sort_is_fixed_point()
test_sort_maintains_length()
test_sorted_list_has_same_elements()
test_sort_against_reference()

How to write a generator/strategy¶

Besides the built in generators provided by hypothesis and testcheck.js, it's useful to create your own custom generators to make examples of your domain objects.

hypothesis provides builds, that takes a function and some strategies in *args, **kwargs:

# example from bgschiller/hangman-api
def new_puzzle(word=None):
    word = word or random.choice(WORDS)
    return {
        'word_so_far': list('_' * len(word)),
        'actual_word': word.upper(),
        'guesses': [],
    }
puzzle_strategy = builds(new_puzzle, word=sampled_from(WORDS))

puzzle_strategy.example()
#{'actual_word': 'TRACE',
# 'guesses': [],
# 'word_so_far': ['_', '_', '_', '_', '_']}

const genVertices = gen.object({
    startX: gen.numberWithin(-10000, 10000),
    startY: gen.numberWithin(-10000, 10000),
    endX: gen.numberWithin(-10000, 10000),
    endY: gen.numberWithin(-10000, 10000),
    pointsAlong: gen.array(gen.numberWithin(0, 1), { minSize: 0, maxSize: 10 }),
  })

.suchThat(({ startX, startY, endX, endY }) => startX !== endX || startY !== endY)

.then(({ startX, startY, endX, endY, pointsAlong }) => {
    const toPointOnLine = t => ({
      x: startX + (t * (endX - startX)),
      y: startY + (t * (endY - startY)),
    });

    const arr = [
      /* start */{ x: startX, y: startY, identity: 'start' },
      /* end */{ x: endX, y: endY, identity: 'end' },
      /* verts */
      ...pointsAlong.map(toPointOnLine),
    ];
    if (pointsAlong.length) {
      /* and a duplicate out of order for good measure */
      arr.push(toPointOnLine(pointsAlong[0]));
    }
    return arr;
  });

Generator example: polygons¶

(I use this one in NREL/openstudio-geometry-editor). A caveats:

not completely 'expressive'
pretty good for testing
(better than the 3-5 examples I would have patience to write on my own)

generator example: polygons¶

A recipe for a polygon¶

Choose a point to be the center.
Choose a number of vertices, $n$.
For each of $n$ equal directions (eg, when $n=3$: $0^{\circ}$, $120^{\circ}$, and $240^{\circ}$)
- Choose a distance, $d$ and plant a vertex that far away from the center in the direction chosen.

generator example: polygons

const createIrregularPolygon = ({ center, radii }) => {
  const angleStep = (2 * Math.PI) / radii.length;

  return radii
    .map((radius, n) => ({
      x: center.x + (radius * Math.sin(n * angleStep)),
      y: center.y + (radius * Math.cos(n * angleStep)),
    }));
};

const genIrregularPolygon = gen.object({
  center: gen.object({ x: gen.int, y: gen.int }),
  radii: gen.array(gen.intWithin(5, 100), { minSize: 3, maxSize: 20 }),
})
.then(createIrregularPolygon);

Throwing away examples¶

Hypothesis has assume
testcheck has suchThat

In [33]:

# an example from the hypothesis docs
@given(floats())
def test_negation_is_self_inverse(x):
    assert x == -(-x)

print_failure_case(test_negation_is_self_inverse)

Falsifying example: test_negation_is_self_inverse(x=nan)

In [34]:

from hypothesis import assume
from math import isnan

# an example from the hypothesis docs
@given(floats())
def test_negation_is_self_inverse(x):
    assume(not isnan(x))
    assert x == -(-x)

print_failure_case(test_negation_is_self_inverse)

Throwing away examples¶

testcheck has .suchThat()

const genPoint = gen.object({ x: gen.int, y: gen.int });

const genTriangle = gen.array(genPoint, { size: 3 })
  .suchThat(pts => !ptsAreCollinear(...pts));

Throwing away examples¶

Be careful using `assume` and `suchThat`¶

There is a performance cost
If the requirement is too hard (improbable) to satisfy, an exception is thrown
Often, it's possible to make your generator more specific instead:

const genEvenNumber = gen.int.suchThat(n => n % 2 === 0);

const genEvenNumber = gen.int.then(n => n * 2);

Property-based Testing¶

With examples in Python and JS¶

aka Generative Testing¶

aka "I'm too lazy to come up with test cases"¶

What is property-based/generative testing?¶

Instead of checking for equality...¶

Describe a property that should be true.¶

How to do property-based/generative testing¶

Why do I need a framework?¶

A (good) generative-testing framework provides:¶

When is generative testing appropriate?¶

When we can come up with properties¶

How to come up with properties?¶

Roundtrip examples -- salesforce ids¶

Convert from 15 to 18:¶

Convert from 18 to 15: do the reverse¶

What can we test?¶

How to come up with properties?¶

Roundtrip examples -- normalize/denormalize¶

How to come up with properties example: normalize/denormalize¶

How to come up with properties example: normalize/denormalize¶

Aside: using testcheck.js¶

How to come up with Properties?¶

"Fixed-point" computations (aka 'idempotent')¶

When applying a function multiple times should give the same result¶

How to come up with Properties?¶

Compare to reference function¶

why not just use the reference function?¶

How to come up with Properties?¶

Describe a relationship between the inputs and the outputs¶

How to come up with Properties?¶

Start with an example and generalize it¶

An Example: merge sort¶

An example: merge sort¶

Let's define some tests¶

an example: merge sort¶

an example: merge sort¶

Can you spot the error?¶

an example: merge sort¶

an example: merge sort¶

How to write a generator/strategy¶

Generator example: polygons¶

generator example: polygons¶

A recipe for a polygon¶

Throwing away examples¶

Throwing away examples¶

Throwing away examples¶

Be careful using assume and suchThat¶

That's it!¶

Sources¶

Be careful using `assume` and `suchThat`¶