Genomic prediction models are used in plant breeding pipelines and are often calibrated using data from multiple generations. It remains an open question whether all available data, or only a subset of it, should be used to calibrate these models. A study was therefore undertaken to determine whether combining sparse selection indexes (SSIs), which draw on a selected subset of the training individuals for each prediction candidate, with kernel methods could further improve prediction accuracy when genomic models are trained on multi-generation data. This dataset contains the genotypic and phenotypic data from the CIMMYT maize doubled haploid lines used in those analyses. The results are presented in the accompanying article.