Overview

The DNA structure search engine searches a gene sequence or a sequence uploaded by the user for repeat tracts with the propensity to form H-DNA or Z-DNA. Search results are returned in a web page listing the repeat sequences found along with descriptive information about them, including length, percent G, position on the chromosome, and gene region in which they were found. The search results are enriched with links to NCBI Entrez Gene and GenBank.

Sequences with the Propensity to Form H-DNA

Sequences with the propensity to form H-DNA are mirrored purine runs separated by a relatively short spacer. The search engine allows the user to specify the minimum length of the mirrored arms, the minimum and maximum allowable spacer length, the minimum percent G of the arms, and whether a mismatch in the arms is allowed. The user may also choose to omit from the search results mirror repeat elements in which the arms consist of a simple A repeat or in which the entire mirror repeat element is a GA repeat. By default, the search engine omits mirror repeat elements consisting solely of poly A or poly G tracts, or of poly A tracts with a G at the extremities (e.g., GAAAAAAAAAAAAAAAG).
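The search described above can be illustrated with a minimal Python sketch. It is not the engine's actual implementation. The function name find_mirror_repeats, the parameter names, and the default values are hypothetical. The sketch assumes an input of only A, C, G and T, scans the purine strand only, does not support mismatches in the arms, and applies the default poly A / poly G filter to the arms, which is one possible reading of the rule.

```python
import re

PURINES = set("AG")

# Assumed reading of the default filter: skip arms that are pure A, pure G,
# or poly A with a single G at the outer extremity (e.g. GAAAA...AAAAG).
TRIVIAL_ARM = re.compile(r"A+|G+|GA+")


def find_mirror_repeats(seq, min_arm=10, min_spacer=0, max_spacer=8,
                        min_percent_g=0.0):
    """Return (start, arm, spacer, percent_g) for each mirrored purine tract.

    All parameter names and defaults are hypothetical illustrations.
    """
    seq = seq.upper()
    hits = []
    for spacer_start in range(1, len(seq)):
        for spacer_len in range(min_spacer, max_spacer + 1):
            left = spacer_start - 1
            right = spacer_start + spacer_len
            arm_len = 0
            # Extend both arms outward while the bases mirror each other
            # and are purines.
            while (left >= 0 and right < len(seq)
                   and seq[left] == seq[right] and seq[left] in PURINES):
                arm_len += 1
                left -= 1
                right += 1
            if arm_len < max(min_arm, 1):
                continue
            arm = seq[left + 1:spacer_start]
            percent_g = 100.0 * arm.count("G") / arm_len
            if percent_g < min_percent_g:
                continue
            if TRIVIAL_ARM.fullmatch(arm):
                continue
            spacer = seq[spacer_start:spacer_start + spacer_len]
            hits.append((left + 1, arm, spacer, percent_g))
    return hits


example = "CCC" + "AGGAGGAGGA" + "TTT" + "AGGAGGAGGA" + "CCC"
print(find_mirror_repeats(example))
```

Because every spacer position and spacer length is tried independently, one purine run can yield several overlapping hits.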

The search results include all possible variations of a tract with the propensity to form H-DNA. The variations are grouped into families, with each family originating from a single purine run that includes an interruption of one or more contiguous pyrimidines. Tracts with the same family number differ in the position of the spacer along the originating purine run. Tracts that arise from adjacent pyrimidine-interrupted purine runs are assigned different family numbers, but may share an arm.

Sequences with the Propensity to Form Z-DNA

Sequences with the propensity to form Z-DNA are alternating purine/pyrimidine tracts (APPTs), with a preference for GC repeats. The engine scores the APPTs, giving each GC dinucleotide a score of 25 and each GT dinucleotide a score of 3. A user-defined minimum score determines which APPTs are included in the search results.
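The scoring rule above can be illustrated with a short Python sketch. It is not the engine's actual code. The function names are hypothetical, and several details are assumptions: the input contains only A, C, G and T; dinucleotides are counted without overlap from the start of each tract; GC and CG both count as a GC dinucleotide; GT and TG both count as a GT dinucleotide; all other dinucleotides score 0.

```python
PURINES = set("AG")
GC_SCORE = 25  # score per GC dinucleotide, as stated in the text
GT_SCORE = 3   # score per GT dinucleotide, as stated in the text


def find_appts(seq):
    """Yield (start, tract) for each maximal alternating purine/pyrimidine tract."""
    seq = seq.upper()
    start = 0
    for i in range(1, len(seq) + 1):
        # A tract ends at the end of the sequence or where two neighbouring
        # bases belong to the same class (both purines or both pyrimidines).
        if i == len(seq) or (seq[i] in PURINES) == (seq[i - 1] in PURINES):
            if i - start >= 2:
                yield start, seq[start:i]
            start = i


def score_appt(tract):
    """Sum dinucleotide scores over non-overlapping pairs (assumed scheme)."""
    score = 0
    for i in range(0, len(tract) - 1, 2):
        pair = tract[i:i + 2]
        if pair in ("GC", "CG"):
            score += GC_SCORE
        elif pair in ("GT", "TG"):
            score += GT_SCORE
    return score


def search_z_dna(seq, min_score):
    """Return (start, tract, score) for tracts scoring at least min_score."""
    results = []
    for start, tract in find_appts(seq):
        score = score_appt(tract)
        if score >= min_score:
            results.append((start, tract, score))
    return results


print(search_z_dna("AAGCGCGCTT", min_score=50))
```

Under these assumptions, a tract of three GC dinucleotides scores 75, while a tract of three GT dinucleotides scores only 9, which reflects the stated preference for GC repeats.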

Reference Data

The repeat sequence search is executed on the NCBI assembled chromosome sequence data. Gene and gene region locations on chromosome sequences are derived from the NCBI Map Viewer sequence-to-gene mapping data, using NCBI Reference mapping data for human and mouse genes. Gene symbols and descriptions are from the NCBI Entrez Gene data.

Versions and release dates for source data are: