Effect sizes promote scientific inquiry because, when a particular experimental study has been replicated, the effect size estimates from those replications can readily be combined to produce an overall best estimate of the size of the intervention effect.
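The usual way of combining replicated estimates is an inverse-variance weighted average, in which more precise studies count for more. A minimal sketch in Python, using hypothetical effect sizes and standard errors (none of the numbers come from any actual synthesis):

```python
# Fixed-effect (inverse-variance) pooling of effect sizes from
# replicated studies -- a minimal sketch with illustrative numbers.

def pooled_effect(effects, std_errors):
    """Combine per-study effect sizes into one weighted average.

    Each study is weighted by the inverse of its sampling variance,
    so more precise studies carry more weight.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Three hypothetical replications of the same intervention
effects = [0.45, 0.30, 0.52]      # standardized mean differences (d)
std_errors = [0.10, 0.15, 0.12]

d_bar, se_bar = pooled_effect(effects, std_errors)
print(f"pooled d = {d_bar:.2f} (SE = {se_bar:.2f})")
```

Note that the pooled standard error shrinks as studies accumulate, which is why a combined estimate is a better guide than any single study.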
It has been claimed, however, that effect sizes cannot merely be averaged, as doing so ignores possible moderating influences. One of the most fascinating aspects of meta-analysis is the opportunity to evaluate the influence of moderators, or factors that may influence this average effect size. Indeed, the search for moderators has long dominated the educational research literature. For example, does ability grouping work differently in mathematics than in music, or for 5-year-olds compared with 15-year-olds? This search, for what are commonly called Aptitude-Treatment Interactions, is as old as the discipline. These interactions have been actively sought by researchers, as finding them would indeed be powerful. However, very few have been reported, and hardly any replicated; but the search must continue. As was noted in VL, there were very few moderators, and where they did exist they were pointed out (e.g., the differential effects of homework in elementary and high school). If there is no evidence for moderators, then the average across all moderators can be used to make statements about the influence.
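A moderator check of this kind is typically done by pooling the effect sizes within each subgroup and testing whether the subgroup averages differ by more than chance would allow. A minimal sketch, loosely inspired by the homework example above; all study values and the split by school level are hypothetical:

```python
# Subgroup ("moderator") comparison: does the average effect differ
# between, say, elementary and high-school studies?  All numbers here
# are illustrative, not taken from any actual meta-analysis.
import math

def pool(effects, ses):
    """Inverse-variance pooled effect and its standard error."""
    w = [1.0 / se ** 2 for se in ses]
    d = sum(wi * di for wi, di in zip(w, effects)) / sum(w)
    return d, math.sqrt(1.0 / sum(w))

# Hypothetical homework studies split by school level
elem = pool([0.10, 0.18, 0.05], [0.10, 0.12, 0.11])
high = pool([0.55, 0.48, 0.62], [0.10, 0.13, 0.12])

# z-test on the difference between the two subgroup averages;
# a large |z| is evidence that school level moderates the effect
diff = high[0] - elem[0]
se_diff = math.sqrt(elem[1] ** 2 + high[1] ** 2)
z = diff / se_diff
print(f"elementary d = {elem[0]:.2f}, high school d = {high[0]:.2f}, z = {z:.2f}")
```

When no such split produces a reliable difference, the overall average remains the defensible summary, which is the point made above.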
There is an established methodology for testing whether the variance of the effects is so heterogeneous that the average may not be a good estimator. Conducting such tests is standard practice in meta-analysis, and readers were encouraged to go to the original studies to see these analyses. An estimator of the variance was included with each influence (see the dial for each influence), and large values were commented on appropriately. Much time has been spent studying many of the influences with large variance (e.g., feedback), and the story is indeed more nuanced than the average effect reflects.
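The standard tools for this check are Cochran's Q and the derived I² statistic, which estimates the percentage of variation among effects beyond what sampling error alone would produce. A minimal sketch with illustrative numbers (the widely scattered effects are hypothetical, chosen only to show what a heterogeneous influence looks like):

```python
# Cochran's Q and the I^2 statistic -- standard checks for whether
# effect sizes are too heterogeneous for their average to be a good
# summary.  Values are illustrative, not from any particular synthesis.

def heterogeneity(effects, std_errors):
    """Return Cochran's Q and I^2 (% of variation beyond chance)."""
    w = [1.0 / se ** 2 for se in std_errors]
    d_bar = sum(wi * di for wi, di in zip(w, effects)) / sum(w)
    # Q: weighted squared deviations of each study from the pooled mean
    q = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# A hypothetical set of studies with widely varying effects
effects = [0.90, 0.20, 0.70, -0.10, 0.60]
std_errors = [0.15, 0.12, 0.14, 0.16, 0.13]

q, i2 = heterogeneity(effects, std_errors)
print(f"Q = {q:.1f}, I^2 = {i2:.0f}%")
```

A Q value much larger than its degrees of freedom (or a high I²) is exactly the situation described above for influences such as feedback: the average alone hides a more nuanced story.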