当前位置: 面试刷题>> DNA重复问题 (经典算法题500道)
### 题目描述补充
**DNA重复问题**:
给定一个DNA序列(由'A', 'T', 'C', 'G'四种字符组成的字符串),找出所有长度至少为10的重复子序列(不必连续),并返回这些重复子序列的集合(不考虑重复元素)。注意,子序列与子串不同,子序列是指从原序列中删除某些(或不删除)字符而不改变剩余字符顺序得到的序列。
### 示例
**输入**:
```
"ATGCTATCGCGATGCATCGAT"
```
**输出**(假设存在符合条件的重复子序列):
```
["ATGC", "CGAT", "TGCAT"]
```
**注意**:上述输出仅为示例,实际输出取决于给定DNA序列中的重复子序列。
### PHP 示例代码
```php
function findRepeatedDnaSubsequences($s, $minLength = 10) {
$set = [];
$subsequences = [];
// 生成所有可能的子序列
for ($i = 0; $i < strlen($s); $i++) {
$temp = '';
for ($j = $i; $j < strlen($s); $j++) {
$temp .= $s[$j];
if (strlen($temp) >= $minLength) {
$subsequences[$temp] = isset($subsequences[$temp]) ? $subsequences[$temp] + 1 : 1;
}
}
}
// 筛选出重复的子序列
foreach ($subsequences as $sub => $count) {
if ($count > 1) {
$set[] = $sub;
}
}
return $set;
}
// 测试
$dna = "ATGCTATCGCGATGCATCGAT";
$result = findRepeatedDnaSubsequences($dna);
print_r($result);
```
### Python 示例代码
```python
def find_repeated_dna_subsequences(s, min_length=10):
from collections import defaultdict
subsequences = defaultdict(int)
for i in range(len(s)):
for j in range(i, len(s)):
subseq = s[i:j+1]
if len(subseq) >= min_length:
subsequences[subseq] += 1
return [subseq for subseq, count in subsequences.items() if count > 1]
# 测试
dna = "ATGCTATCGCGATGCATCGAT"
result = find_repeated_dna_subsequences(dna)
print(result)
```
### JavaScript 示例代码
```javascript
function findRepeatedDnaSubsequences(s, minLength = 10) {
const subsequences = {};
for (let i = 0; i < s.length; i++) {
let temp = '';
for (let j = i; j < s.length; j++) {
temp += s[j];
if (temp.length >= minLength) {
if (subsequences[temp]) {
subsequences[temp]++;
} else {
subsequences[temp] = 1;
}
}
}
}
return Object.keys(subsequences).filter(sub => subsequences[sub] > 1);
}
// 测试
const dna = "ATGCTATCGCGATGCATCGAT";
const result = findRepeatedDnaSubsequences(dna);
console.log(result);
```
**码小课**:在解决这类算法问题时,关键在于理解子序列与子串的区别,并有效地生成和统计所有可能的子序列。此外,优化算法以减少不必要的计算也非常重要,比如可以考虑使用动态规划或更高效的哈希方法来减少时间复杂度。在码小课网站上,你可以找到更多关于算法和数据结构的深入讲解和实战案例,帮助你更好地掌握编程技能。