The design of an ideal scoring function for protein-protein docking that would also predict the binding affinity of a complex is one of the challenges in structural proteomics. Such a scoring function would open the route to in silico, large-scale annotation and prediction of complete interactomes. Here we present a protein-protein binding affinity benchmark consisting of binding constants (Kd values) for 81 complexes. This benchmark was used to assess the ability of nine commonly used scoring algorithms, along with a free-energy prediction algorithm, to predict binding affinities. Our results reveal a poor correlation between binding affinity and scores for all algorithms tested. However, the diversity and validity of the benchmark are highlighted when binding affinity data are categorized according to the methodology by which they were determined. When the complexes are further classified into low-, medium- and high-affinity groups, significant correlations emerge, some of which are retained after the data are divided into more classes, which shows the robustness of these correlations. Despite this, accurate prediction of binding affinity remains out of reach because of the large standard deviations associated with the average score within each group. Together, these observations indicate that improvements to existing scoring functions or the design of new consensus tools will be required for accurate prediction of the binding affinity of a given protein-protein complex. The benchmark developed in this work will serve as an indispensable resource for reaching this goal.
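
As an illustration of the kind of analysis summarized above, the following is a minimal Python sketch and not the procedure used in this work. It converts a Kd to a binding free energy through the standard relation ΔG = RT ln(Kd), computes a Pearson correlation between scores and affinities, and reports the mean and standard deviation of the score within each affinity class. The function names and the class thresholds (1e-6 M and 1e-9 M) are assumptions made for the example; the benchmark's own class boundaries are not stated here.

```python
import math
from statistics import mean, stdev

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def affinity_class(kd_molar, low_cutoff=1e-6, high_cutoff=1e-9):
    """Assign an affinity class. The cutoffs are hypothetical, for illustration only."""
    if kd_molar > low_cutoff:
        return "low"
    if kd_molar < high_cutoff:
        return "high"
    return "medium"


def summarize_by_class(kds, scores):
    """Return {class: (mean score, standard deviation of score)} per affinity class."""
    groups = {}
    for kd, score in zip(kds, scores):
        groups.setdefault(affinity_class(kd), []).append(score)
    return {
        name: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for name, vals in groups.items()
    }
```

With lists of Kd values and scores for the same complexes, `pearson([kd_to_dg(k) for k in kds], scores)` gives the overall correlation, and `summarize_by_class(kds, scores)` exposes the within-class spread. A large standard deviation relative to the differences between class means is the situation described above, in which class-level trends exist but individual affinities cannot be predicted accurately.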