This work considers strategies to develop accurate and reliable graph neural networks (GNNs) for molecular property predictions. Prediction performance of GNNs is highly sensitive to the change in various parameters due to the inherent challenges in molecular machine learning, such as a deficient amount of data samples and bias in data distribution. Comparative studies with well-designed experiments are thus important to clearly understand which GNNs are powerful for molecular supervised learning. Our work presents a number of ablation studies along with a guideline to train and utilize GNNs for both molecular regression and classification tasks. First, we validate that using both atomic and bond meta-information improves the prediction performance in the regression task. Second, we find that the graph isomorphism hypothesis proposed by [Xu, K.; et al How powerful are graph neural networks? 2018, arXiv:1810.00826. arXiv.org e-Print archive. https://arxiv.org/abs/1810.00826] is valid for the regression task. Surprisingly, however, the findings above do not hold for the classification tasks. Beyond the study on model architectures, we test various regularization methods and Bayesian learning algorithms to find the best strategy to achieve a reliable classification system. We demonstrate that regularization methods penalizing predictive entropy might not give well-calibrated probability estimation, even though they work well in other domains, and Bayesian learning methods are capable of developing reliable prediction systems. Furthermore, we argue the importance of Bayesian learning in virtual screening by showing that well-calibrated probability estimation may lead to a higher success rate.