The notion that visual attention can operate over visual objects in addition to spatial locations has recently received much empirical support, but there has been relatively little empirical consideration of what can count as an 'object' in the first place. We have investigated this question in the context of the multiple object tracking paradigm, in which subjects must track a number of independently and unpredictably moving identical items in a field of identical distractors. What types of feature clusters can be tracked in this manner? In other words, what counts as an 'object' in this task? We investigated this question with a technique we call target merging: we alter tracking displays so that distinct target and distractor locations appear perceptually to be parts of the same object by merging pairs of items (one target with one distractor) in various ways - for example, by connecting item locations with a simple line segment, by drawing the convex hull of the two items, and so forth. The data show that target merging makes the tracking task far more difficult to varying degrees depending on exactly how the items are merged. The effect is perceptually salient, involving in some conditions a total destruction of subjects' capacity to track multiple items. These studies provide strong evidence for the object-based nature of tracking, confirming that in some contexts attention must be allocated to objects rather than arbitrary collections of features. In addition, the results begin to reveal the types of spatially organized scene components that can be independently attended as a function of properties such as connectedness, part structure, and other types of perceptual grouping.