UVa 1368 DNA Consensus String（贪心+计数）

xiaoxiao2021-04-13 48

原题地址

https://vjudge.net/problem/UVA-1368

题意：定义两个等长字符串的Hamming距离等于字符不同的位置个数。例如ACGT和GCGA的Hamming距离为2。（A-G和T-A不同）。输入m个长度均为n的DNA序列（只含A、C、T、G），求解一个DNA序列，使得该序列到这m个序列的总Hamming距离之和最小。如果存在多个这样的DNA序列，求字典序最小的解。

解题思路

本题是《算法竞赛入门经典》的习题3-7，属于字符串和贪心的结合题。

既然要求最小的总Hamming距离，那就尽可能使每一列的Hamming距离最小（最优子结构）。

让答案序列的每一列都取这m个序列中对应列出现次数最多的字符，剩余其他字符的出现次数之和即该列的损耗，最后输出答案序列并累加这些损耗。结构体每个NODE单元的cnt表示该列elem字符出现的总次数，重载<符号使得之后的min函数取到最大的cnt，在cnt相同时取ASCII码最小的字符，保存在chose结构中。ans[j].cnt = m - chose.cnt;每列字符出现的总次数一定是m，因此除去选定字符的频次，就是该列的距离之和。

ps：刚开始用优先队列priority_queue做，也能AC，后来觉得有点小题大做，毕竟只需要选出cnt最大元素即可，用一个三层min就可以了。

AC代码

#include <iostream> #include <algorithm> #define MAXM 55 #define MAXN 1005 using namespace std; char seq[MAXM][MAXN]; typedef struct node { char elem; int cnt; bool operator < (const node &A) const { if (this->cnt != A.cnt) //cnt降序 return this->cnt > A.cnt; else return this->elem < A.elem; //相同cnt字符升序 } }NODE; int main() { NODE ans[MAXN], a, c, g, t; //ans存放每一列的输出字符和该列的损耗 int T, m, n; cin >> T; while (T--) { cin >> m >> n; for (int i = 0; i < m; ++i) cin >> seq[i]; for (int j = 0; j < n; ++j) ans[j].cnt = 0; for (int j = 0; j < n; ++j) //对每一列检查 { //初始化 a.elem = 'A'; a.cnt = 0; c.elem = 'C'; c.cnt = 0; g.elem = 'G'; g.cnt = 0; t.elem = 'T'; t.cnt = 0; for (int i = 0; i < m; ++i) //对该列的每一行检查 { switch(seq[i][j]) //处理当前字符 { case 'A': a.cnt++; break; case 'C': c.cnt++; break; case 'G': g.cnt++; break; case 'T': t.cnt++; break; default: break; } } NODE chose = min(a, min(c, min(t, g))); //重载后的min ans[j].elem = chose.elem; //取cnt最小的元素 ans[j].cnt = m - chose.cnt; //其他元素的cnt和即这一列的损耗值 } int errors = 0; for (int j = 0; j < n; ++j) { cout << ans[j].elem; errors += ans[j].cnt; //损耗值累加 } cout << endl; cout << errors << endl; } return 0; }

转载请注明原文地址: https://ju.6miu.com/read-668375.html

技术

最新回复(0)