Artificial neural networks are often trained by using the back propagation algorithm to compute the gradient of an objective function with respect to the synaptic strengths. For a biological neural network, such a gradient computation would be difficult to implement, because of the complex dynamics of intrinsic and synaptic conductances in neurons. Here we show that irregular spiking similar to that observed in biological neurons could be used as the basis for a learning rule that calculates a stochastic approximation to the gradient. The learning rule is derived based on a special class of model networks in which neurons fire spike trains with Poisson statistics. The learning is compatible with forms of synaptic dynamics such as short-term facilitation and depression. By correlating the fluctuations in irregular spiking with a reward signal, the learning rule performs stochastic gradient ascent on the expected reward. It is applied to two examples, learning the XOR computation and learning direction selectivity using depressing synapses. We also show in simulation that the learning rule is applicable to a network of noisy integrate-and-fire neurons.