At Bina, we employ several variant standardization methods to improve variant annotation performance. Previously, we shared how the Bina AAiM software applies variant normalization and the impact it has. In this post, we’ll show another method that we use.
The Bina AAiM software is designed to provide deeper insight into variants identified by next-generation sequencing. After a VCF file is uploaded to the software, its variants are annotated against a number of databases that provide information such as predicted pathogenicity, disease association, and population frequency. At small scale, this operation is usually a simple lookup in a text file or a database, but performing the annotation for large datasets and at scale is a challenge. Further, because variants are represented inconsistently across data sources and VCF files, finding all matching variants requires careful standardization of how they are represented. Variant normalization and lift-over to a consistent reference genome are two standardization steps that the Bina AAiM software applies to find the maximum number of matching variants. We will explore them in this blog post and the next.
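To make the representation problem concrete, here is a minimal Python sketch of one common way to normalize a biallelic variant: trim shared trailing bases, shift the variant left while an allele is empty, then trim shared leading bases. This is the general idea behind tools such as `bcftools norm` and `vt normalize`; it is not the Bina AAiM implementation. The function name `normalize_variant`, the toy sequence `TOY_REFERENCE`, and the choice to pass the reference as a plain string are all assumptions made for illustration.

```python
def normalize_variant(pos, ref, alt, reference):
    """Return a left-aligned, minimal (pos, ref, alt) for a biallelic variant.

    pos is 1-based, as in VCF. reference is the chromosome sequence as a
    plain string (an assumption for this sketch; real tools read an indexed
    FASTA file instead).
    """
    if ref == alt:
        raise ValueError("ref and alt are identical, so this is not a variant")

    # Step 1: trim shared trailing bases. Whenever an allele becomes empty,
    # extend both alleles one base to the left using the reference.
    changed = True
    while changed:
        changed = False
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot left-extend past the first base")
            pos -= 1
            base = reference[pos - 1]  # convert 1-based pos to 0-based index
            ref, alt = base + ref, base + alt
            changed = True

    # Step 2: trim shared leading bases, keeping at least one base per allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    return pos, ref, alt


# Hypothetical toy reference containing a CA repeat.
TOY_REFERENCE = "GGGCACACACAGGG"
```

With this toy reference, `normalize_variant(8, "CAC", "C", TOY_REFERENCE)` and `normalize_variant(5, "ACAC", "AC", TOY_REFERENCE)` describe the same two-base deletion inside the repeat, and both reduce to the same record, so a database lookup keyed on position and alleles would match either input.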