When genomic selection (GS) is used in breeding schemes, data from multiple generations can increase the training sample size and thus the likelihood of extracting useful information from the training data. The Sparse Selection Index (SSI) is a method for optimizing training data selection. The data files provided with this study include a large multigeneration wheat dataset of grain yield for 68,836 lines generated across eight cycles (years), as well as genotypic data that were analyzed to test this method. The results of the analysis are published in the corresponding journal article.