A better measure is the log loss. The weatherman is penalized -log(p) if it rains and -log(1-p) if it doesn't. Now the weatherman has an incentive to announce his true belief. Other scoring functions share this property, but the log loss has some nice extras; for example, the lowest expected penalty a weatherman can hope to achieve is exactly the entropy of the true distribution, and he achieves it by announcing the true probability. The log loss and similar measures are often used to analyze prediction mechanisms such as information markets.
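Here is a small numerical check of that incentive claim, as a minimal sketch. It assumes natural logarithms, so entropy comes out in nats. The true rain probability q = 0.7 and the grid of candidate announcements are hypothetical choices for illustration. The code scans possible announcements p and confirms that the expected penalty is smallest at p = q, where it equals the entropy.

```python
import math

def log_loss(p: float, rained: bool) -> float:
    """Penalty for announcing rain with probability p: -log(p) if it rains, -log(1-p) if not."""
    return -math.log(p) if rained else -math.log(1.0 - p)

def expected_log_loss(q: float, p: float) -> float:
    """Expected penalty when rain truly occurs with probability q but p is announced."""
    return q * log_loss(p, True) + (1.0 - q) * log_loss(p, False)

def entropy(q: float) -> float:
    """Entropy (in nats, since math.log is the natural log) of a rain/no-rain outcome with P(rain) = q."""
    return -(q * math.log(q) + (1.0 - q) * math.log(1.0 - q))

if __name__ == "__main__":
    q = 0.7  # hypothetical true chance of rain
    grid = [i / 100 for i in range(1, 100)]  # candidate announcements 0.01 .. 0.99
    best_p = min(grid, key=lambda p: expected_log_loss(q, p))
    print(best_p)  # 0.7: announcing the true belief minimizes expected penalty
    print(expected_log_loss(q, best_p), entropy(q))  # the two values agree
```

Any other announcement, say p = 0.5, gives a strictly larger expected penalty. The gap is the KL divergence between the true and announced distributions.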
Dean Foster and Rakesh Vohra take a different approach, based on a notion called calibration. Take all the days on which the weatherman predicted a 70% chance of rain and check that it actually rained on about 70% of those days. A prediction algorithm calibrates a binary sequence if, for a finite set of allowed probabilities, each subsequence of days with prediction p has about a p fraction of ones. Foster and Vohra showed that some probabilistic calibration scheme will calibrate every sequence in the limit. In other words, you can be a great weatherman in the calibration sense just by looking at the history of rain, forgoing that pesky meteorological training.
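The calibration test itself is easy to state in code. The sketch below checks a given record of forecasts against outcomes. It is not Foster and Vohra's forecasting scheme. It groups days by announced probability p and compares each p with the fraction of rainy days in that group. The frequency-weighted squared gap used as a single summary is one common choice, assumed here for illustration. The 15-day record is hypothetical.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each announced probability p to (days announced, fraction of those days with rain)."""
    tally = defaultdict(lambda: [0, 0])  # p -> [days with forecast p, rainy days among them]
    for p, rained in zip(forecasts, outcomes):
        tally[p][0] += 1
        tally[p][1] += int(rained)
    return {p: (n, ones / n) for p, (n, ones) in sorted(tally.items())}

def calibration_score(forecasts, outcomes):
    """Frequency-weighted squared gap between each p and its empirical rain rate (0 = perfectly calibrated)."""
    table = calibration_table(forecasts, outcomes)
    total = sum(n for n, _ in table.values())
    return sum((n / total) * (freq - p) ** 2 for p, (n, freq) in table.items())

if __name__ == "__main__":
    # Hypothetical record: 70% announced on 10 days, 20% announced on 5 days.
    forecasts = [0.7] * 10 + [0.2] * 5
    outcomes = [1] * 7 + [0] * 3 + [1] + [0] * 4  # 1 = rain, 0 = dry
    print(calibration_table(forecasts, outcomes))  # {0.2: (5, 0.2), 0.7: (10, 0.7)}
    print(calibration_score(forecasts, outcomes))  # 0.0
```

Weighting each gap by how often p was announced means a probability used only once or twice barely affects the score. This matches the "in the limit" flavor of the definition, where only the frequently used subsequences matter.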
Dean Foster and Sham Kakade gave a couple of interesting talks at the Bounded Rationality workshop presenting a deterministic scheme that achieves a weak form of calibration, and using it to learn Nash equilibria in infinitely repeated games.