The optimal brain damage (OBD) scheme of Le Cun, Denker and Solla for pruning feedforward networks has been implemented and applied to the contiguity classification problem. It is shown that OBD improves the learning curve (the test error as a function of the number of examples). By inspecting the architectures obtained through pruning, it is found that the networks with fewer parameters have the smallest test error, in agreement with "Ockham's Razor". Based on this, we propose a heuristic which selects the smallest successful architecture among a group of pruned networks, and we show that it leads to very efficient optimization of the architecture. The validity of the approximations involved in OBD is discussed, and they are found to be surprisingly accurate for the problem studied.
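As a rough illustration of the two ingredients named above, the following minimal sketch computes the standard OBD saliency, s_k = (1/2) h_kk w_k^2, from a diagonal Hessian approximation, removes the least salient weights, and then picks the smallest network from a group of pruned candidates. The function names, the flat weight vector, and in particular the success criterion (training error at or below a tolerance `tol`) are assumptions made for this example and are not taken from the original text.

```python
import numpy as np

def obd_saliencies(weights, hessian_diag):
    """Standard OBD saliency s_k = 0.5 * h_kk * w_k**2 (diagonal Hessian approximation)."""
    return 0.5 * hessian_diag * weights ** 2

def prune_lowest(weights, hessian_diag, n_prune):
    """Zero the n_prune weights with the smallest saliency.

    Returns the pruned weight vector and a boolean mask of surviving weights.
    """
    saliency = obd_saliencies(weights, hessian_diag)
    drop = np.argsort(saliency)[:n_prune]
    mask = np.ones(weights.shape, dtype=bool)
    mask[drop] = False
    return np.where(mask, weights, 0.0), mask

def select_smallest_successful(candidates, tol=0.0):
    """Pick the candidate with the fewest parameters among the successful ones.

    candidates: list of (n_params, train_error) pairs, one per pruned network.
    ASSUMPTION: "successful" is taken to mean train_error <= tol; this
    criterion is hypothetical and chosen only for illustration.
    Returns None if no candidate qualifies.
    """
    successful = [c for c in candidates if c[1] <= tol]
    return min(successful, key=lambda c: c[0]) if successful else None
```

In practice the diagonal Hessian entries would be estimated on the training set and the network retrained after each pruning step; both steps are omitted here to keep the sketch short.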