We have proposed an extension to the Q-learning algorithm that incorporates existing clinical expertise into the trial-and-error process of learning an appropriate rHuEPO administration strategy for patients with anemia due to ESRD. The key modification is that the Q-values of several dose/response combinations are updated during a single learning event, rather than only the value of the dose actually administered. This reduces the risk of administering doses that are inadequate in certain situations and thereby speeds up the learning process. We evaluated the proposed method on a simulation test-bed involving an "artificial patient" and compared the outcomes with those obtained by classical Q-learning and by a numerical implementation of a clinically used anemia management protocol (AMP). The outcomes of the simulated treatments demonstrate that the proposed method is more effective than traditional Q-learning. Furthermore, we observed that it has the potential to provide even more stable anemia management than the AMP.
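The Python sketch below illustrates the general idea of updating several dose/response combinations from a single learning event, compared with a classical one-step Q-learning update. It is a minimal illustration under stated assumptions, not the authors' update scheme. The discretization (`N_STATES`, `N_ACTIONS`), the learning parameters, the reward value, and the monotonicity rule in the hypothetical helper `inferred_actions` are stand-ins for the clinical expertise. That rule assumes that if a dose overshoots the hemoglobin target, every larger dose would also overshoot, and that if a dose undershoots, every smaller dose would also undershoot.

```python
import numpy as np

# Hypothetical discretization (assumption, not from the source):
# states index binned hemoglobin (Hb) levels; actions index rHuEPO dose
# levels sorted in increasing order.
N_STATES, N_ACTIONS = 5, 4
ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor


def q_update(Q, s, a, r, s_next):
    """Classical one-step Q-learning update of a single (state, action) pair."""
    target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])


def inferred_actions(a, response):
    """Hypothetical monotonicity rule standing in for clinical expertise.

    If dose `a` pushed Hb above the target, every larger dose is assumed to
    overshoot as well; if it left Hb below the target, every smaller dose is
    assumed to undershoot as well. Otherwise nothing extra is inferred.
    """
    if response == "above":
        return list(range(a + 1, N_ACTIONS))
    if response == "below":
        return list(range(0, a))
    return []


def extended_update(Q, s, a, r, s_next, response):
    """One learning event that updates several dose/response combinations.

    The administered dose gets the usual update; the doses judged at least as
    inadequate by `inferred_actions` reuse the observed reward and next state.
    Reusing `s_next` is a simplifying assumption: a larger overshoot would in
    reality lead to a different next state.
    """
    q_update(Q, s, a, r, s_next)
    for a_other in inferred_actions(a, response):
        q_update(Q, s, a_other, r, s_next)


# Usage: one event in which dose level 1 overshot the target in state 2.
Q_classic = np.zeros((N_STATES, N_ACTIONS))
Q_ext = np.zeros((N_STATES, N_ACTIONS))
q_update(Q_classic, 2, 1, -1.0, 4)
extended_update(Q_ext, 2, 1, -1.0, 4, "above")
print(np.count_nonzero(Q_classic), np.count_nonzero(Q_ext))  # 1 3
```

In this sketch a single observed overshoot also penalizes the two larger doses in the same state, so the learner stops trying them without having to administer them first. This mirrors, in simplified form, the reduced risk of inadequate doses described above.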