In four experiments we address the question of whether several visual objects can be selected voluntarily (endogenously) and then tracked in a Multiple Object Tracking paradigm and, if so, whether this selection involves a different process from automatic (exogenous) selection. Experiment 1 showed that items can indeed be selected based on their labels. Experiment 2 showed that observers require additional time to select the complement of a set that is automatically (exogenously) selected (e.g., to select all objects not flashed), and that given 1080 ms they were able to select and track these objects as well as those selected automatically. Experiment 3 showed that the additional time needed in Experiment 2 cannot be attributed solely to the time required to disengage attention from the initially automatic selections. Experiment 4 showed that the added time provides a monotonically greater benefit when there are more targets, suggesting a serial process. These results are discussed in relation to the Visual Index (FINST) theory, which assumes that visual indexes are captured by a data-driven process. It is suggested that voluntarily allocated attention can be used to facilitate the automatic capture of attention by objects of interest.