
@netwjx
Created October 18, 2012 03:41
Removing duplicate mobile phone numbers with BitSet

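Notes

The input file of numbers is c:\abc.txt. Numbers that remain after duplicates and invalid entries are removed are written to c:\def.txt.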

parsePhone could be extended to accept number formats that are not strictly well-formed, though that would probably make it slower.
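
A minimal sketch of what more lenient parsing might look like, assuming the input can contain spaces, hyphens, or a leading +86 country code. The class `LenientPhoneParser` and the method `normalize` are hypothetical names that do not appear in the original code, and the rules here (drop every non-digit, strip a leading 86 from a 13-digit result) are assumptions rather than anything the gist specifies. It uses a single pass over the characters instead of a regex.

```java
final class LenientPhoneParser {
    /**
     * Hypothetical helper: keeps only ASCII digits, drops a leading "86"
     * country code when exactly 13 digits remain, and returns the 11-digit
     * number, or null when the line cannot be reduced to 11 digits that
     * start with '1'.
     */
    static String normalize(String line) {
        StringBuilder digits = new StringBuilder(13);
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        if (digits.length() == 13 && digits.charAt(0) == '8' && digits.charAt(1) == '6') {
            digits.delete(0, 2); // assumed country-code prefix
        }
        if (digits.length() != 11 || digits.charAt(0) != '1') {
            return null;
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("138 1234 5678"));
        System.out.println(normalize("+86-138-1234-5678"));
        System.out.println(normalize("not a number"));
    }
}
```

The normalized string could then be passed to parsePhone unchanged. Note that this sketch silently discards any non-digit character, so a line such as 138x12345678 would be accepted; a stricter rule may be preferable for real data.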

offset only handles the 130-139, 150-159, and 180-189 prefix blocks here; it can be extended to a wider range as needed.
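
One possible way to widen the range, sketched under the assumption that every prefix from 130 to 199 should be accepted (the original covers only three blocks). It packs 20 consecutive prefixes into each BitSet, which keeps the largest bit index at 1,999,999,999 and therefore inside an int. `WidePrefixOffset` and `map` are hypothetical names, and this layout differs from the original one, which places 150-159 in the first BitSet.

```java
final class WidePrefixOffset {
    static final int PREFIXES_PER_SET = 20;      // 20 * 10^8 - 1 still fits in an int
    static final int SUFFIX_RANGE = 100_000_000; // 8-digit suffix: 0..99,999,999

    /**
     * Maps (prefix, suffix) to {bitSetIndex, bitIndex} in index.
     * Returns false and stores -1 in both slots for unsupported input.
     */
    static boolean map(int prefix, int suffix, int[] index) {
        if (prefix < 130 || prefix > 199 || suffix < 0 || suffix >= SUFFIX_RANGE) {
            index[0] = -1;
            index[1] = -1;
            return false;
        }
        int slot = prefix - 130;                 // 0..69
        index[0] = slot / PREFIXES_PER_SET;      // 0..3
        index[1] = (slot % PREFIXES_PER_SET) * SUFFIX_RANGE + suffix;
        return true;
    }

    public static void main(String[] args) {
        int[] index = new int[2];
        map(138, 12345678, index);
        System.out.println(index[0] + ", " + index[1]);
    }
}
```

With this layout a fully populated BitSet needs about 250 MB, so covering all four groups densely would take roughly 1 GB of heap.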

If running this in Eclipse fails with OutOfMemoryError: Java heap space, add the following VM arguments to the launch configuration:

-Xmx800m -Xms100m

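The settings above suit machines with more than 800 MB of available memory.

This approach trades space for time, so the more memory, the better.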
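
The heap figure can be sanity-checked with a little arithmetic. A BitSet stores one bit per index in a long[] array, so its size is driven by the highest index ever set. With the mapping used in offset, the first BitSet can reach index 19 * 10^8 + 99,999,999 and the second 9 * 10^8 + 99,999,999. The sketch below (the class `BitSetMemoryEstimate` and method `bytesFor` are hypothetical names) computes the backing storage for each, ignoring object headers.

```java
final class BitSetMemoryEstimate {
    /** Bytes of long[] storage needed when the highest set bit is maxIndex. */
    static long bytesFor(long maxIndex) {
        long words = maxIndex / 64 + 1; // one 64-bit word covers 64 indices
        return words * 8;
    }

    public static void main(String[] args) {
        long set0 = bytesFor(19L * 100_000_000 + 99_999_999); // prefixes 13x and 15x
        long set1 = bytesFor(9L * 100_000_000 + 99_999_999);  // prefixes 18x
        System.out.println("bss[0]: " + set0 + " bytes");
        System.out.println("bss[1]: " + set1 + " bytes");
        System.out.println("total:  " + (set0 + set1) + " bytes");
    }
}
```

That comes to about 250 MB plus 125 MB in the worst case. Growing a BitSet copies its backing array, so the old and new arrays coexist briefly and peak usage can be noticeably higher than the steady-state total, which is consistent with asking for a heap well above 375 MB.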

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.BitSet;

public class PhoneNumber {
    public static void main(String[] args) throws Exception {
        File f = new File("c:\\abc.txt");
        BufferedReader r = new BufferedReader(new FileReader(f),
                1024 * 1024 * 10);
        PrintWriter w = new PrintWriter(new BufferedWriter(new FileWriter(
                "c:\\def.txt"), 1024 * 1024 * 10));
        // One BitSet per group of prefixes; offset() currently uses slots 0 and 1.
        BitSet[] bss = new BitSet[10];
        try {
            String line = null;
            int[] phone = { -1, -1 }, index = { -1, -1 };
            while ((line = r.readLine()) != null) {
                parsePhone(line, phone);
                if (phone[0] >= 0) {
                    offset(phone, index);
                    if (index[0] >= 0 && index[1] >= 0) {
                        BitSet bs = bss[index[0]];
                        if (bs == null) {
                            bs = new BitSet(10 * 10000); // the initial size here could be tuned
                            bss[index[0]] = bs;
                        }
                        // First time this number is seen: mark it and write it out.
                        if (!bs.get(index[1])) {
                            bs.set(index[1]);
                            // Zero-pad the 8-digit suffix so leading zeros are not lost.
                            w.printf("%d%08d%n", phone[0], phone[1]);
                        }
                    }
                }
            }
        } finally {
            r.close();
            w.close();
        }
    }
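    /**
     * Maps a number to an index that fits in the range of an int.
     *
     * @param phone phone[0] is the 3-digit prefix, phone[1] the 8-digit suffix
     * @param index receives the BitSet slot in index[0] and the bit position in
     *              index[1]; both are -1 when the prefix is not supported
     */
    static void offset(int[] phone, int[] index) {
        int offset = -1, i = -1;
        if (130 <= phone[0] && phone[0] <= 139) {
            i = 0;
            offset = phone[0] - 130; // 0, 0-9
        } else if (150 <= phone[0] && phone[0] <= 159) {
            i = 0;
            offset = phone[0] - 140; // 0, 10-19
        } else if (180 <= phone[0] && phone[0] <= 189) {
            i = 1;
            offset = phone[0] - 180; // 1, 0-9
        }
        // offset must stay within 0-20 (inclusive) so that the index fits in an int;
        // the branches above produce 0-19
        // i must stay within the length of the BitSet[] above
        // the code above could be rewritten as a switch for speed
        index[0] = i;
        index[1] = offset < 0 ? -1 : offset * 100000000 + phone[1];
    }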
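    /**
     * 13812345678 => 138, 12345678
     *
     * @param l     one input line
     * @param phone receives the prefix in phone[0] and the suffix in phone[1];
     *              phone[0] is set to -1 when the line is not a valid number
     */
    static void parsePhone(String l, int[] phone) {
        phone[0] = -1;
        if (l.length() != 11) {
            return;
        }
        // Accept ASCII digits only; Integer.parseInt alone would also accept a
        // leading sign, and a suffix of 00000000 is a legitimate value.
        for (int k = 0; k < 11; k++) {
            char c = l.charAt(k);
            if (c < '0' || c > '9') {
                return;
            }
        }
        // more lenient parsing would need a regex, which is slower
        phone[0] = Integer.parseInt(l.substring(0, 3), 10);
        phone[1] = Integer.parseInt(l.substring(3, 11), 10);
    }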
}
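
To make the expected behavior concrete, here is a small in-memory sketch of the same test-then-set idea, assuming the three prefix blocks and the slot layout used by offset. `BitSetDedupDemo`, `dedup`, and `allDigits` are hypothetical names, and the sample numbers are made up. The sample deliberately uses low prefixes so the demo stays small in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class BitSetDedupDemo {
    static boolean allDigits(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    /** Keeps the first occurrence of each supported 11-digit number, in input order. */
    static List<String> dedup(List<String> lines) {
        BitSet[] sets = { new BitSet(), new BitSet() };
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.length() != 11 || !allDigits(line)) {
                continue; // malformed line
            }
            int prefix = Integer.parseInt(line.substring(0, 3));
            int suffix = Integer.parseInt(line.substring(3));
            int set, slot;
            if (prefix >= 130 && prefix <= 139) {
                set = 0; slot = prefix - 130;
            } else if (prefix >= 150 && prefix <= 159) {
                set = 0; slot = prefix - 140;
            } else if (prefix >= 180 && prefix <= 189) {
                set = 1; slot = prefix - 180;
            } else {
                continue; // unsupported prefix
            }
            int bit = slot * 100_000_000 + suffix;
            if (!sets[set].get(bit)) { // not seen before
                sets[set].set(bit);
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "13000012345", "18000012345", "13000012345", "hello",
                "13100000001", "18000012345", "14012345678");
        System.out.println(dedup(input)); // prints [13000012345, 18000012345, 13100000001]
    }
}
```

Duplicates, the malformed line, and the unsupported 140 prefix are dropped, and the surviving numbers keep their first-seen order.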

netwjx commented Oct 18, 2012

offset could be optimized further by using bit operations instead of *. offset can hold 5 binary digits of data, i.e. 0-31.
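
A sketch of what the shift-based variant might look like. `ShiftOffset`, `pack`, `slotOf`, and `suffixOf` are hypothetical names. One caveat on the bit budget: an 8-digit suffix needs 27 bits (2^27 = 134,217,728 is the first power of two above 99,999,999), and a BitSet index must be a non-negative int, which leaves 31 usable bits. Under those assumptions only 4 bits (slots 0-15) remain per BitSet, so this sketch caps the slot at 15 rather than 31; using a full 5 bits would require spreading the slots over more BitSet instances.

```java
final class ShiftOffset {
    static final int SUFFIX_BITS = 27; // 2^27 = 134,217,728 > 99,999,999
    static final int MAX_SLOT = 15;    // 4 bits remain in a non-negative int

    /** Packs a slot (0..15) and an 8-digit suffix into one bit index, or returns -1. */
    static int pack(int slot, int suffix) {
        if (slot < 0 || slot > MAX_SLOT || suffix < 0 || suffix > 99_999_999) {
            return -1;
        }
        return (slot << SUFFIX_BITS) | suffix;
    }

    /** Recovers the slot from a packed index. */
    static int slotOf(int bitIndex) {
        return bitIndex >>> SUFFIX_BITS;
    }

    /** Recovers the suffix from a packed index. */
    static int suffixOf(int bitIndex) {
        return bitIndex & ((1 << SUFFIX_BITS) - 1);
    }

    public static void main(String[] args) {
        int packed = pack(8, 12345678);
        System.out.println(packed + " -> " + slotOf(packed) + ", " + suffixOf(packed));
    }
}
```

The trade-off is that each slot now spans 2^27 indices instead of 10^8, so a densely filled BitSet is about 34% larger than with the multiplication-based layout. The existing mapping also places slots 0-19 in the first BitSet, so it would need to be rearranged to stay within 0-15.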
