@renaud
Last active December 25, 2015 09:39
Transforms a topic-model input file from DCA format (space-separated) to LDA-C format (colon-separated)
'''
Transforms topic-model input file,
from DCA format (space separated)
to LDA-C format (colon-separated)
@author renaud@apache.org
'''
import sys

dca_file = sys.argv[1]
out_file = sys.argv[2]  # e.g. "{}.lda-c".format(dca_file)
print("writing to: " + out_file)

# "w" so that re-running overwrites the output instead of appending to it
with open(dca_file) as f, open(out_file, "w") as out:
    # skip the first 2 lines, these are the counts
    for line in f.readlines()[2:]:
        l = line.rstrip().split(' ')
        # the leading number of each document line is copied unchanged
        out.write(l[0])
        # the remaining ints are (term, count) pairs, written as term:count
        for x in range(1, len(l) - 1, 2):
            out.write(" {}:{}".format(l[x], l[x + 1]))
        out.write("\n")
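The per-line conversion can also be sketched as a standalone function, which makes it easier to test without touching the filesystem. The name `dca_to_ldac_line` is hypothetical and not part of the original script. The sketch assumes each DCA document line is a leading number followed by whitespace-separated term/count pairs, and it splits on any whitespace rather than on a single space.

```python
def dca_to_ldac_line(line):
    """Convert one DCA line ("N t1 c1 t2 c2 ...") to LDA-C ("N t1:c1 t2:c2 ...").

    Hypothetical helper: assumes the first field is copied unchanged and the
    remaining fields come in (term, count) pairs.
    """
    fields = line.split()
    if not fields:
        return ""
    # Pair up fields 1,2 then 3,4 and so on; a trailing unpaired field is dropped.
    pairs = zip(fields[1::2], fields[2::2])
    return " ".join([fields[0]] + ["{}:{}".format(t, c) for t, c in pairs])


print(dca_to_ldac_line("3 0 1 1 2 2 2"))  # prints: 3 0:1 1:2 2:2
```

Keeping the conversion in a pure function leaves the file handling as a thin loop around it.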
123
1234
27 0 1 1 2 2 2 3 1 4 1 5 1 6 1 7 1 8 1 9 1 10 1 11 1 12 1 13 1 14 1 15 2 16 1 17 1 18 2 19 1 20 2 21 1 22 1 23 1 24 1 25 1 26 1
137 2 1 4 2 5 1 13 7 21 1 25 1 27 1 28 1 29 1 30 1 31 1 32 1 33 1 34 6 35 6 36 1 37 1 38 1 39 1 40 1 41 1 42 1 43 1 44 1 45 2 46 2 47 1 48 1 49 3 50 2 51 1 52 2 53 2 54 2 55 2 56 1 57 2 58 1 59 1 60 3 61 3 62 1 63 4 64 2 65 2 66 1 67 1 68 1 69 1 70 1 71 1 72 2 73 2 74 2 75 3 76 3 77 1 78 1 79 1 80 2 81 2 82 1 83 1 84 1 85 1 86 1 87 1 88 1 89 1 90 1 91 1 92 1 93 1 94 1 95 1 96 1 97 1 98 1 99 1 100 1 101 2 102 1 103 2 104 1 105 1 106 1 107 2 108 1 109 1 110 1 111 1 112 1 113 1 114 1 115 1 116 1 117 2 118 1 119 1 120 1 121 1 122 1 123 1 124 1 125 1 126 1 127 1 128 1 129 1 130 1 131 1 132 1 133 1 134 1 135 1 136 1 137 1 138 1 139 1 140 1 141 1 142 1 143 1 144 1 145 1 146 1 147 1 148 1 149 1 150 1 151 1 152 1 153 1 154 1 155 1 156 1 157 1
37 9 1 23 1 25 1 35 1 69 1 72 1 191 1 220 1 221 3 222 2 223 1 224 4 225 1 226 1 227 1 228 1 229 2 230 1 231 1 232 1 233 1 234 1 235 1 236 1 237 1 238 1 239 1 240 1 241 1 242 1 243 1 244 1 245 1 246 1 247 1 248 1 249 1
98 2 1 11 1 24 2 25 3 27 1 35 1 51 1 54 4 59 2 65 3 106 2 112 1 128 1 134 2 145 1 147 2 149 1 182 1 192 1 237 1 259 1 262 1 264 2 289 1 312 1 339 1 342 1 343 1 344 2 345 2 346 1 347 3 348 9 349 9 350 4 351 2 352 7 353 2 354 1 355 1 356 1 357 1 358 2 359 2 360 3 361 4 362 2 363 2 364 1 365 1 366 5 367 1 368 1 369 1 370 6 371 5 372 1 373 2 374 1 375 1 376 1 377 1 378 1 379 1 380 1 381 1 382 1 383 1 384 1 385 1 386 1 387 1 388 1 389 1 390 1 391 2 392 1 393 1 394 1 395 1 396 1 397 1 398 1 399 1 400 1 401 1 402 3 403 1 404 1 405 1 406 1 407 1 408 1 409 1 410 1 411 1 412 1 413 1
28 33 1 90 1 156 1 190 1 229 1 410 1 414 1 415 1 416 2 417 1 418 1 419 1 420 2 421 1 422 1 423 1 424 1 425 1 426 1 427 1 428 1 429 1 430 1 431 1 432 1 433 1 434 1 435 1
23 27 1 33 1 90 1 173 1 229 1 361 1 415 1 417 1 419 1 420 1 421 1 423 1 436 1 437 1 438 1 439 1 440 1 441 1 442 1 443 1 444 1 445 1 446 1
37 27 1 166 1 238 1 286 1 300 1 403 1 427 1 447 1 448 2 449 1 450 2 451 2 452 2 453 3 454 1 455 1 456 1 457 1 458 1 459 1 460 1 461 1 462 1 463 1 464 2 465 1 466 1 467 1 468 1 469 1 470 1 471 1 472 1 473 1 474 1 475 1 476 1
62 11 1 24 1 65 3 134 1 145 1 200 1 231 1 232 1 259 1 260 2 312 1 392 1 403 1 413 1 422 1 430 1 447 2 450 1 451 2 459 1 468 1 477 2 478 1 479 4 480 2 481 1 482 1 483 1 484 1 485 2 486 1 487 1 488 1 489 1 490 1 491 1 492 1 493 1 494 1 495 1 496 1 497 2 498 1 499 3 500 4 501 1 502 1 503 1 504 1 505 1 506 1 507 1 508 1 509 1 510 1 511 1 512 1 513 1 514 1 515 1 516 1 517 1
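In the sample lines above, the leading number appears to equal the number of term/count pairs that follow (the line starting with 23, for instance, lists 23 pairs). A small check along these lines could catch truncated or malformed rows before conversion. This is only a sketch: `check_dca_line` is a hypothetical helper, and treating the first field as a pair count is an assumption drawn from the sample data, not a documented rule of the format.

```python
def check_dca_line(line):
    """Return True if the leading number matches the number of term/count pairs.

    Assumption (from the sample data only): the first field declares how many
    (term, count) pairs follow on the same line.
    """
    fields = line.split()
    if not fields:
        return False
    declared = int(fields[0])
    rest = fields[1:]
    # The remaining fields must pair up exactly and match the declared count.
    return len(rest) % 2 == 0 and declared == len(rest) // 2


sample = ("23 27 1 33 1 90 1 173 1 229 1 361 1 415 1 417 1 419 1 420 1 421 1 "
          "423 1 436 1 437 1 438 1 439 1 440 1 441 1 442 1 443 1 444 1 445 1 446 1")
print(check_dca_line(sample))       # prints: True
print(check_dca_line("3 0 1 1 2"))  # prints: False (declares 3 pairs, has 2)
```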