Top-Down Crawl: a method for the ultra-rapid and motif-free alignment of sequences with associated binding metrics

Brendon H Cooper; Tsu-Pei Chiu; Remo Rohs

doi:10.1093/bioinformatics/btac653

Bioinformatics. 2022 Nov 15;38(22):5121-5123. doi: 10.1093/bioinformatics/btac653.

Authors

Brendon H Cooper¹, Tsu-Pei Chiu¹, Remo Rohs¹,²

Affiliations

¹ Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA.
² Departments of Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA.

Abstract

Summary: Several high-throughput protein-DNA binding methods currently available produce highly reproducible measurements of binding affinity at the level of the k-mer. However, understanding where a k-mer is positioned along a binding site sequence depends on alignment. Here, we present Top-Down Crawl (TDC), an ultra-rapid tool designed for the alignment of k-mer level data in a rank-dependent and position weight matrix (PWM)-independent manner. As the framework only depends on the rank of the input, the method can accept input from many types of experiments (protein binding microarray, SELEX-seq, SMiLE-seq, etc.) without the need for specialized parameterization. Measuring the performance of the alignment using multiple linear regression with 5-fold cross-validation, we find TDC to perform as well as or better than computationally expensive PWM-based methods.

Availability and implementation: TDC can be run online at https://topdowncrawl.usc.edu or locally as a Python package available through pip at https://pypi.org/project/TopDownCrawl.

Supplementary information: Supplementary data are available at Bioinformatics online.
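
The evaluation strategy named in the Summary (multiple linear regression with 5-fold cross-validation) can be illustrated with a minimal, standard-library sketch. This is not the authors' evaluation code and does not call the TopDownCrawl package. The one-hot encoding, the gap character `-`, the tiny ridge term added for numerical stability, and all function names are assumptions made for illustration only.

```python
import itertools
import random

BASES = "ACGT"

def encode(aligned_seq):
    """One-hot encode an aligned sequence; a gap ('-') becomes all zeros (assumed convention)."""
    features = [1.0]  # intercept term
    for ch in aligned_seq:
        features.extend(1.0 if ch == base else 0.0 for base in BASES)
    return features

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit(X, y, ridge=1e-6):
    """Least-squares fit via the normal equations.

    The small ridge term is an assumption: it keeps the system solvable
    because one-hot columns are collinear with the intercept.
    """
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for i in range(p):
        XtX[i][i] += ridge
    Xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def cross_validated_r2(aligned_seqs, scores, folds=5, seed=0):
    """Mean held-out R^2 across `folds` cross-validation folds."""
    X = [encode(s) for s in aligned_seqs]
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    r2_values = []
    for k in range(folds):
        test = idx[k::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        weights = fit([X[i] for i in train], [scores[i] for i in train])
        predicted = [sum(w * f for w, f in zip(weights, X[i])) for i in test]
        observed = [scores[i] for i in test]
        mean_obs = sum(observed) / len(observed)
        ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
        ss_tot = sum((o - mean_obs) ** 2 for o in observed)
        r2_values.append(1.0 - ss_res / ss_tot)
    return sum(r2_values) / folds

# Toy usage: noise-free additive scores over all 3-mers (synthetic, not real binding data),
# so the held-out R^2 should be close to 1.
toy_weights = {"A": 0.1, "C": 0.5, "G": -0.3, "T": 0.9}
seqs = ["".join(p) for p in itertools.product(BASES, repeat=3)]
scores = [sum((i + 1) * toy_weights[ch] for i, ch in enumerate(s)) for s in seqs]
r2 = cross_validated_r2(seqs, scores)
print(round(r2, 4))
```

Under these assumptions, a better alignment places equivalent binding-site positions in the same columns, so a position-wise linear model explains more of the variance in the binding metric on held-out sequences. That is the sense in which cross-validated regression performance can serve as a measure of alignment quality.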

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Binding Sites
Position-Specific Scoring Matrices
Protein Binding
Sequence Analysis, DNA / methods
Software*
