There's an interesting geometric interpretation that this implies:
The specific scenario they're modeling here, where a bunch of samples with correlated properties x and y are clustered by x + y, essentially takes a strip of datapoints that are distributed over a diagonally-oriented rectangle, and then cuts it into a series of shorter strips, by cutting at thresholds (lines where x+y=n) that run along the opposite diagonal. If the resulting short strips are narrow enough (if the upper and lower thresholds are closer together than that degree to which x and y typically differ), they will be rectangles oriented the other way - in other words they will represent groups with strong negative correlation.
The specific scenario they're modeling here, where a bunch of samples with correlated properties x and y are clustered by x + y, essentially takes a strip of datapoints that are distributed over a diagonally-oriented rectangle, and then cuts it into a series of shorter strips, by cutting at thresholds (lines where x+y=n) that run along the opposite diagonal. If the resulting short strips are narrow enough (if the upper and lower thresholds are closer together than that degree to which x and y typically differ), they will be rectangles oriented the other way - in other words they will represent groups with strong negative correlation.