当前位置: 面试刷题>> DNA重复问题 (经典算法题500道)


### 题目描述补充 **DNA重复问题**: 给定一个DNA序列(由'A', 'T', 'C', 'G'四种字符组成的字符串),找出所有长度至少为10的重复子序列(不必连续),并返回这些重复子序列的集合(不考虑重复元素)。注意,子序列与子串不同,子序列是指从原序列中删除某些(或不删除)字符而不改变剩余字符顺序得到的序列。 ### 示例 **输入**: ``` "ATGCTATCGCGATGCATCGAT" ``` **输出**(假设存在符合条件的重复子序列): ``` ["ATGC", "CGAT", "TGCAT"] ``` **注意**:上述输出仅为示例,实际输出取决于给定DNA序列中的重复子序列。 ### PHP 示例代码 ```php function findRepeatedDnaSubsequences($s, $minLength = 10) { $set = []; $subsequences = []; // 生成所有可能的子序列 for ($i = 0; $i < strlen($s); $i++) { $temp = ''; for ($j = $i; $j < strlen($s); $j++) { $temp .= $s[$j]; if (strlen($temp) >= $minLength) { $subsequences[$temp] = isset($subsequences[$temp]) ? $subsequences[$temp] + 1 : 1; } } } // 筛选出重复的子序列 foreach ($subsequences as $sub => $count) { if ($count > 1) { $set[] = $sub; } } return $set; } // 测试 $dna = "ATGCTATCGCGATGCATCGAT"; $result = findRepeatedDnaSubsequences($dna); print_r($result); ``` ### Python 示例代码 ```python def find_repeated_dna_subsequences(s, min_length=10): from collections import defaultdict subsequences = defaultdict(int) for i in range(len(s)): for j in range(i, len(s)): subseq = s[i:j+1] if len(subseq) >= min_length: subsequences[subseq] += 1 return [subseq for subseq, count in subsequences.items() if count > 1] # 测试 dna = "ATGCTATCGCGATGCATCGAT" result = find_repeated_dna_subsequences(dna) print(result) ``` ### JavaScript 示例代码 ```javascript function findRepeatedDnaSubsequences(s, minLength = 10) { const subsequences = {}; for (let i = 0; i < s.length; i++) { let temp = ''; for (let j = i; j < s.length; j++) { temp += s[j]; if (temp.length >= minLength) { if (subsequences[temp]) { subsequences[temp]++; } else { subsequences[temp] = 1; } } } } return Object.keys(subsequences).filter(sub => subsequences[sub] > 1); } // 测试 const dna = "ATGCTATCGCGATGCATCGAT"; const result = findRepeatedDnaSubsequences(dna); console.log(result); ``` **码小课**:在解决这类算法问题时,关键在于理解子序列与子串的区别,并有效地生成和统计所有可能的子序列。此外,优化算法以减少不必要的计算也非常重要,比如可以考虑使用动态规划或更高效的哈希方法来减少时间复杂度。在码小课网站上,你可以找到更多关于算法和数据结构的深入讲解和实战案例,帮助你更好地掌握编程技能。
推荐面试题