In the present study, the reliability and validity of judging at the 2011 European Championships in Berlin were analysed and compared with those of a gymnastics competition of a different level, the 2009 Universiade in Belgrade. To assess reliability and consistency, the mean absolute deviation of judges from the final execution (E) score, Cronbach’s alpha coefficient, intra-class correlation coefficients (ICC) and Armor’s theta coefficient were calculated. Validity was assessed using the mean deviations of judges’ scores, Kendall’s coefficient of concordance W and ANOVA eta-squared values. For Berlin 2011, Cronbach’s alpha was generally above 0.95, the minimum item-total correlations exceeded 0.8, and both the ICC for average scores and Armor’s theta exceeded 0.94. Comparison with Universiade 2009 showed that vault and floor scores had lower reliability indices at both competitions. At both competitions, judges’ average deviations from the final E score were close to zero (p = 0.84). However, Berlin 2011 had more apparatuses with a significant Kendall’s W (5 vs. 2 at Universiade 2009) and higher eta-squared values, indicating greater judge panel bias in the all-round and apparatus finals. In conclusion, the quality of judging was comparable at the two examined competitions despite their different levels. Further work is needed to analyse the inferior results on vault and floor.
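As an illustration of how three of the indices named above can be computed, the following Python sketch takes a matrix of E-panel scores (rows = judges, columns = gymnasts). It is not the study's own code. The scores are hypothetical, the final E score is assumed to be the mean of the remaining scores after dropping the highest and lowest (the exact rule used in the study is not stated here), judges are treated as "items" for Cronbach's alpha, and Kendall's W is computed without a correction for ties. ICC, Armor's theta and eta-squared are not covered.

```python
from statistics import mean, variance

# Hypothetical E-panel scores: rows = judges, columns = gymnasts.
scores = [
    [8.0, 8.5, 9.0, 7.5],
    [8.1, 8.6, 9.1, 7.6],
    [7.9, 8.4, 8.9, 7.4],
    [8.0, 8.5, 9.0, 7.5],
    [8.2, 8.7, 9.2, 7.7],
]


def cronbach_alpha(scores):
    """Cronbach's alpha with judges as 'items' and gymnasts as 'cases'."""
    k = len(scores)
    n = len(scores[0])
    item_var = sum(variance(judge) for judge in scores)
    totals = [sum(judge[g] for judge in scores) for g in range(n)]
    return k / (k - 1) * (1 - item_var / variance(totals))


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(scores):
    """Kendall's W for m judges ranking n gymnasts (no tie correction)."""
    m = len(scores)
    n = len(scores[0])
    ranked = [average_ranks(judge) for judge in scores]
    rank_sums = [sum(r[g] for r in ranked) for g in range(n)]
    mean_rank_sum = m * (n + 1) / 2  # mean of the n rank sums
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def final_e_scores(scores):
    """Assumed rule: drop the highest and lowest score, average the rest."""
    n = len(scores[0])
    finals = []
    for g in range(n):
        column = sorted(judge[g] for judge in scores)
        finals.append(mean(column[1:-1]))
    return finals


def mean_abs_deviation(scores):
    """Mean absolute deviation of each judge from the final E score."""
    finals = final_e_scores(scores)
    return [mean(abs(s - f) for s, f in zip(judge, finals)) for judge in scores]


print(round(cronbach_alpha(scores), 3))
print(round(kendalls_w(scores), 3))
print([round(d, 3) for d in mean_abs_deviation(scores)])
```

In this toy matrix every judge orders the gymnasts identically and differs from the others only by a constant offset, so both alpha and W come out at (essentially) 1, while the per-judge deviations still expose the most lenient judge (the last row). With real panel data, a lower W or larger deviations would flag the kind of disagreement the study reports for vault and floor.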