@kadin2048
Created November 2, 2021 05:12
Very old Java-based conversion utility for migrating CenterIM flat file logs to XML "Unified Log Format" logs used by Adium. Requires argv and xmlwriter libraries.

CenterIM to XML Log Converter

Usage Notes

This program converts history files produced by CenterIM into "Unified Log Format" XML files used by Adium and some other instant messaging programs. It can be run manually on a single history file, or can use environment variables to easily batch-process an entire .centerim folder at once.

The Unified Log Format is a standard, based on XML, allowing for the interchange of instant messaging logs between various client programs. The current canonical implementation is in the Mac OS X client Adium (http://www.adiumx.com/). More information on the format can be found at http://trac.adiumx.com/wiki/XMLLogFormat, and there is a low-traffic mailing list at http://groups.google.com/group/ulf-discuss.

CenterIM uses a flat-file log format, with messages delimited with ASCII form-feed characters and timestamps expressed as Unix-style seconds since the epoch.
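As a rough illustration of that layout, the sketch below parses records the way the converter's read loop appears to expect them: a line containing a form feed, then the direction (IN or OUT), the literal MSG, a timestamp, a second timestamp, and one line of message text. This layout is inferred from the converter source, not from CenterIM documentation, and the class name HistoryRecordSketch and its members are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: the class and member names here are hypothetical. */
final class HistoryRecordSketch {
    final String direction;  // "IN" or "OUT"
    final long unixSeconds;  // seconds since the Unix epoch
    final String text;       // single-line message body

    HistoryRecordSketch(String direction, long unixSeconds, String text) {
        this.direction = direction;
        this.unixSeconds = unixSeconds;
        this.text = text;
    }

    /** Parses history text laid out the way the converter's read loop expects. */
    static List<HistoryRecordSketch> parse(String history) {
        List<HistoryRecordSketch> records = new ArrayList<HistoryRecordSketch>();
        BufferedReader reader = new BufferedReader(new StringReader(history));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.indexOf('\f') != -1) {
                    continue; // form feed: start of the next record
                }
                String direction = line.trim();
                reader.readLine();                    // literal "MSG", skipped
                String timestamp = reader.readLine(); // Unix seconds
                reader.readLine();                    // second timestamp, skipped
                String text = reader.readLine();      // message body
                if (timestamp == null || text == null) {
                    break; // truncated final record: stop rather than guess
                }
                records.add(new HistoryRecordSketch(
                        direction, Long.parseLong(timestamp.trim()), text));
            }
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected from a StringReader
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "\f\nIN\nMSG\n1194000000\n1194000000\nhello\n"
                      + "\f\nOUT\nMSG\n1194000060\n1194000060\nhi there\n";
        for (HistoryRecordSketch r : parse(sample)) {
            System.out.println(r.direction + " " + r.unixSeconds + " " + r.text);
        }
    }
}
```

Like the converter itself, this sketch reads only one line of message text per record, so a message spanning several lines would not be handled correctly.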

Usage:

In the most basic case, the utility can be invoked:

java -cp ":argv.jar:xmlwriter-2.2.2.jar" CIMtoXML \
-i inputfile -d destdir -n localnickname

Online help for all commandline options and switches can be obtained via:

java -cp ":argv.jar:xmlwriter-2.2.2.jar" CIMtoXML --help

The input file, preceded by the -i flag, must be a CenterIM history file, located in the normal ~/.centerim/ hierarchy. History files located outside of the normal folder hierarchy cannot be processed, because key information is retrieved from the name of the enclosing directory.
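To make the role of the enclosing directory concrete, here is a minimal sketch of how a name such as ajoe123 could be split into a service and a remote screen name. The prefix letters (a, j, m) come from the converter source; the class EnclosingDirSketch and its method names are hypothetical. Note one deliberate difference: the sketch looks only at the last path component of the parent directory, whereas the converter inspects the parent path as it was given on the command line.

```java
import java.io.File;

/** Illustrative only: class and method names are hypothetical. */
final class EnclosingDirSketch {

    /** Returns the last path component of the history file's parent directory. */
    static String enclosingDirName(String historyPath) {
        File parent = new File(historyPath).getAbsoluteFile().getParentFile();
        return parent == null ? "" : parent.getName();
    }

    /** Maps the leading character of the directory name to a service label. */
    static String serviceFor(String dirName) {
        if (dirName.length() == 0) {
            return "Unknown";
        }
        switch (dirName.charAt(0)) {
            case 'a': return "AIM";
            case 'j': return "Jabber";
            case 'm': return "MSN";
            default:  return "Unknown";
        }
    }

    /** Everything after the service prefix is the remote user's screen name. */
    static String farEndName(String dirName) {
        return dirName.length() == 0 ? "" : dirName.substring(1);
    }

    public static void main(String[] args) {
        String dir = enclosingDirName("ajoe123/history");
        System.out.println(serviceFor(dir) + " / " + farEndName(dir));
    }
}
```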

The destination directory, preceded by -d, is where converted logs are written. It must already exist; the program exits with an error if the path is not a directory. The output file names will be generated automatically.
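The naming scheme used by the source can be sketched as follows: the remote user's name, then a timestamp in parentheses, then the extension. The date pattern is copied from the converter; the class OutputNameSketch, its method, and the explicit time zone parameter (added so the result is reproducible) are assumptions made for illustration. In the converter the timestamp is the time of conversion, not the time of the first logged message.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Illustrative only: class and method names are hypothetical. */
final class OutputNameSketch {

    /** Builds a name like "joe123 (2007-11-02T10.40.00+0000).xml". */
    static String outputName(String farEndName, Date when, TimeZone zone, boolean chatlog) {
        // Same pattern as the converter: dots instead of colons in the time part
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH.mm.ssZ");
        df.setTimeZone(zone);
        String extension = chatlog ? "chatlog" : "xml";
        return farEndName + " (" + df.format(when) + ")." + extension;
    }

    public static void main(String[] args) {
        // The converter uses the current time and the default time zone
        System.out.println(outputName("joe123", new Date(), TimeZone.getDefault(), false));
    }
}
```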

Known Bug: Neither the input file nor the destination directory should contain path separator characters such as / (Linux) or \ (Windows), even if those characters are escaped on the command line.

The local nickname, preceded by -n, is your nick, screen name, or ID for the messaging system in use. The other user's screen name can be retrieved from the log file's enclosing folder, but the local name cannot and must be provided. Examples would be "johndoe@gmail.com" for a GTalk account, or "Joe123" for an AIM account. This value will be used when writing the logs. This parameter is not required if the --getenv flag and corresponding environment variables are used.

Summary of Options

Required parameters:

-i    Source 'history' file (Always required)   
-d    Destination directory (Always required)   
-n    Local instant messaging nickname/ID (Required unless --getenv is used)     

Optional parameters:

--help        Print usage notes and exit   
--chatlog     Write Adium-style ".chatlog" extension instead of ".xml"   
--getenv      Use environment variables instead of -n option   
--nounixdate  Don't add "unixtime" attribute to messages   

The --getenv option

When used with the --getenv option, the local nickname is read from one of three environment variables, which should be set in advance of program execution:

AIMUSER   
MSNUSER   
JABBERUSER   

The program will choose the appropriate value based on the leading character in the name of the log file's enclosing folder. Currently, only AIM, MSN, and Jabber/GTalk networks are supported. This option is provided so the program can be more easily called by scripts, for batch processing of an entire .centerim directory.

An error will result if a log is processed with the --getenv option and the appropriate environment variable has not been set. To prevent errors, it is recommended that wrapper scripts set all three environment variables at runtime.
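The lookup described above might be sketched like this. The variable names AIMUSER, MSNUSER, and JABBERUSER and the prefix letters are taken from the converter; the class GetenvSketch, its methods, and the choice to pass the environment in as a Map (so the logic can be exercised without touching the real environment) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: class and method names are hypothetical. */
final class GetenvSketch {

    /** Returns the environment variable name for a service prefix, or null. */
    static String variableFor(char serviceId) {
        switch (serviceId) {
            case 'a': return "AIMUSER";
            case 'm': return "MSNUSER";
            case 'j': return "JABBERUSER";
            default:  return null;
        }
    }

    /** Resolves the local nickname, failing loudly if it cannot be determined. */
    static String localNickname(char serviceId, Map<String, String> env) {
        String variable = variableFor(serviceId);
        if (variable == null) {
            throw new IllegalArgumentException("Unsupported service prefix: " + serviceId);
        }
        String value = env.get(variable);
        if (value == null || value.length() == 0) {
            throw new IllegalStateException(variable + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<String, String>(System.getenv());
        try {
            System.out.println(localNickname('j', env));
        } catch (IllegalStateException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
```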

Note: The --getenv option requires JRE version 1.5 or later, and will probably fail messily with older versions. JRE 1.5 is also referred to by Sun as "Java 2 SE 5" or "J2SE 5.0" and can be downloaded at http://java.sun.com/j2se/1.5/index.jsp. On the Macintosh platform, the JRE can be most easily updated via the built-in Software Update utility.

The --chatlog option

The --chatlog option changes the extension of the output files from the default .xml to .chatlog, the extension used by Adium. Only the extension changes; the file contents are the same. (This is not used by the newest versions of Adium, which encapsulate .xml files in per-logfile directories. However, .chatlog files will still be associated with Adium.)

The --nounixdate option

The --nounixdate option suppresses the non-standard (but typically not harmful) "unixtime" attributes. These attributes are not a part of the ULF standard, but are included to reduce ambiguity, ensure all timestamp information from the original CIM logs is retained in the XML versions, and aid in further conversion. If these attributes are not desired, the --nounixdate option will prevent their inclusion, and include only the ISO-style date.
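The relationship between the two timestamps can be illustrated with a short sketch that turns Unix seconds into an ISO-style string. The date pattern matches the one in the converter source. The final step, which inserts a colon into the zone offset (for example +0000 becomes +00:00), is not something the converter does; the source carries a TODO noting that it is missing, and the version here is only one possible way to add it. The class TimestampSketch and the explicit time zone parameter are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Illustrative only: class and method names are hypothetical. */
final class TimestampSketch {

    /** Converts Unix seconds to a string like 1970-01-01T00:00:00+00:00. */
    static String toIso(long unixSeconds, TimeZone zone) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
        df.setTimeZone(zone);
        // Java dates count milliseconds; Unix timestamps count seconds
        String formatted = df.format(new Date(unixSeconds * 1000L));
        // The Z pattern yields an offset such as +0530; split before the minutes
        int split = formatted.length() - 2;
        return formatted.substring(0, split) + ":" + formatted.substring(split);
    }

    public static void main(String[] args) {
        // prints 1970-01-01T00:00:00+00:00
        System.out.println(toIso(0L, TimeZone.getTimeZone("UTC")));
    }
}
```

Parsing the timestamp as a long, as this sketch assumes, also avoids the overflow that an int would hit for dates after early 2038.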

Known Issues and Bugs

The program has been tested against typical scenarios and use cases, but it should be considered experimental. When used correctly there is very little chance of it harming your stored logs; nevertheless, it is strongly recommended that you make a backup copy of your ~/.centerim directory before proceeding with conversion.

On Linux and other Unix-like systems, a backup can be made by running:

tar -czvf centerim.bkup.tgz ~/.centerim

Limitations

Although the converter does not read complete logfiles into memory before writing them, it may fail when the log files are extremely large relative to the amount of memory available to the Java virtual machine. If this occurs, either increase the memory available to the JVM or break the log into multiple files (breaking at an ASCII Form Feed character).

Filenames containing spaces, special characters (even when escaped), or remote URLs may cause unexpected behavior. The program has not been tested against them and their use is not recommended.

The program was tested on Debian Linux and Mac OS X systems; it has not been tested on Windows with Cygwin.

Licensing

This program is licensed and made available to you under the GNU General Public License (GPL), version 3.0 or later. A text version of this license is provided in the file LICENSE, or it can be obtained from http://www.gnu.org/licenses/gpl.txt.

This program also makes use of the XmlWriter 2.2.2 library (http://sourceforge.net/projects/xml-writer/) and Oliver Goldman's argv library (http://software.charlie-dog.com/argv/argv.html), which are licensed separately and distributed in unmodified form with the program for convenience.

argv Licensing

Oliver Goldman's argv library is Copyright (C) 2001 - 2002 by Oliver Goldman.

Redistribution and use in source and binary forms, with or without modification, are permitted.

XmlWriter Licensing

XmlWriter 2.2.2 is Copyright (c) 2003 by Henri Yandell.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of XmlWriter nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

/** This program converts history files produced by CenterIM
* into "Unified Log Format" XML files used by Adium and some
* other instant messaging programs.
*
* Basic Usage:
* $ java CIMtoXML -i inputfile -d destdir -n localnickname
*
* Inputfile must be a CenterIM history file, located in
* the normal ~/.centerim/ hierarchy.
*
* Destdir is the destination directory where converted logs should
* be written. The output file names will be generated according
* to the Unified Log Format.
*
* Localnickname is your nickname, screen name, or ID. The other
* user's screen name can be retrieved from the log file's
* enclosing folder, but the local name cannot and must be
* provided. Examples would be "johndoe@gmail.com" for a GTalk
* account, or "Snax123" for an AIM account.
*
* When used with the --getenv option, the localnickname is read from
* one of the three environment variables, which should be set in
* advance of program execution:
* AIMUSER
* MSNUSER
* JABBERUSER
* If the program is being run from within a shellscript wrapper
* these environment variables and the --getenv option should
* probably be used, to aid in batch processing.
*
* Note:
* The --getenv option requires JRE version 1.5 or later, and will
* probably fail with older versions.
*
*/
import java.io.*;
import com.generationjava.io.xml.*; // for producing XML output
import java.util.Date;
import java.text.SimpleDateFormat;
import com.charliedog.argv.*; // for processing arguments
import java.util.List;
class CIMtoXML {
public static void main(String[] argv) {
// INTERPRET ARGUMENTS
StringArgument sSourceFile = new StringArgument("-i", "Source 'history' file (Always required)");
StringArgument sDestdir = new StringArgument("-d", "Destination directory (Always required)");
StringArgument sNickName = new StringArgument("-n", "Local instant messaging nickname/ID (Required unless --getenv is used)");
BooleanArgument bHelp = new BooleanArgument("--help", "Print usage notes and exit");
BooleanArgument bChatlog = new BooleanArgument("--chatlog", "Write Adium-style \".chatlog\" extension instead of \".xml\"");
BooleanArgument bGetenv = new BooleanArgument("--getenv", "Use environment variables instead of -n option for local nickname; see documentation");
BooleanArgument bNoUnixDate = new BooleanArgument("--nounixdate", "Don't add \"unixtime\" attribute to messages");
ArgumentParser parser = new ArgumentParser();
parser.addArgument(sSourceFile);
parser.addArgument(sDestdir);
parser.addArgument(sNickName);
parser.addArgument(bHelp);
parser.addArgument(bChatlog);
parser.addArgument(bGetenv);
parser.addArgument(bNoUnixDate);
List extraargs = parser.parse(argv);
// If user passes the --help flag, no arguments at all, or unrecognized extra arguments, print usage notes and exit with zero status
if ( argv.length == 0 || !extraargs.isEmpty() || bHelp.getValue() ) {
PrintWriter stdout = new PrintWriter(System.out);
parser.printUsage(stdout);
stdout.close();
System.exit(0);
}
// TODO: Remove debugging output from final version
System.out.println("Debug: sSourceFile: " + sSourceFile.getValue() );
System.out.println("Debug: sDestdir: " + sDestdir.getValue() );
// Check for presence, but not validity, of mandatory arguments
if ( (sSourceFile.getValue() == null) ||
(sDestdir.getValue() == null) ||
( sNickName.getValue() == null && !bGetenv.getValue() ) // sNickName is only required if bGetenv isn't true
) {
System.err.println("Error: One or more required arguments was not specified.");
System.exit(1);
}
// NEAR END NICKNAME
String sNearEndName = sNickName.getValue(); // set nearEndName to sNickName to provide a default
String sMSNName = ""; // Initialize to empty strings, this prevents annoying Java errors
String sAIMName = "";
String sJabberName = "";
if ( bGetenv.getValue() ) { // if the --getenv flag is used
try{
sAIMName = System.getenv("AIMUSER");
sMSNName = System.getenv("MSNUSER");
sJabberName = System.getenv("JABBERUSER");
}
catch (Exception e) {
System.err.println("Error while attempting to read environment variables.");
System.err.println("Error was: " + e.getMessage() );
System.exit(1);
}
}
try{ // begin main try block to catch errors, especially I/O
File fSourceFile = new File( sSourceFile.getValue().trim() ); // source file into File() object
BufferedReader brSourceFile = new BufferedReader(new FileReader( fSourceFile ));
String sEnclosingDir = fSourceFile.getParent().trim();
// For debugging:
System.out.println("Debug: Source's enclosing directory is: " + sEnclosingDir);
/*
* The IM service used for the selected history file is determined by looking
* at the first character in the parent directory's name.
* a = AOL Instant Messenger
* j = Jabber (used by Google Talk)
* m = Microsoft Network Messenger
* This list is not necessarily exhaustive and should be appended as needed.
*/
// IM SERVICE
char cServiceID = sEnclosingDir.charAt(0);
String sIMService;
if (cServiceID == 'a') {
sIMService = "AIM";
if ( bGetenv.getValue() ) {
sNearEndName = sAIMName;
}
} else if (cServiceID == 'j') {
sIMService = "Jabber";
if ( bGetenv.getValue() ) {
sNearEndName = sJabberName;
}
} else if (cServiceID == 'm') {
sIMService = "MSN";
if ( bGetenv.getValue() ) {
sNearEndName = sMSNName;
}
} else {
sIMService = "Unknown";
System.out.println("Warning: Messaging service not recognized, writing service=\"Unknown\".");
if ( bGetenv.getValue() ) { // If the messaging type is unknown and environment variables are used, we don't know which one to use...
System.err.println("Error: Cannot use environment variables with unknown messaging service.");
System.exit(1); // ...So we just stop and force the user to go back and provide it explicitly with the -n option.
}
}
System.out.println("Debug: IM Service is " + sIMService);
System.out.println("Debug: Near end name is " + sNearEndName);
// Test to make sure that nearEndName isn't empty or null
if ( sNearEndName == null || sNearEndName.length() == 0 ) {
System.err.println("Error: Local messaging name is null or empty, check the -n option or environment variables.");
System.exit(1);
}
// FAR END NICKNAME
String farEndName = sEnclosingDir.substring(1); // we get the far end name from the log's enclosing folder
System.out.println("Debug: Far end name is: " + farEndName);
// OUTPUT NAME CONSTRUCTION
File destdir = new File( sDestdir.getValue().trim() ); // make dest directory into File() object
if ( !destdir.isDirectory() ) { // If it's *not* a directory...
System.err.println("Error: Specified destination must be a directory.");
System.exit(1); // Exit with error status
}
System.out.println("Debug: Destination directory path is " + destdir.getPath() );
String extension = "xml"; // default to using ".xml" extension
if (bChatlog.getValue() ) { // if the user invokes with the --chatlog option...
extension = "chatlog"; // ... write ".chatlog" for Adium instead
}
Date curdate = new Date(); // defaults to current time, which is OK
SimpleDateFormat outputdf = new SimpleDateFormat("yyyy-MM-dd'T'HH.mm.ssZ");
String outfile = destdir.getPath() + File.separator + farEndName + " (" + outputdf.format(curdate) + ")." + extension;
System.out.println("Debug: Attempting to write to " + outfile);
BufferedWriter out = new BufferedWriter(new FileWriter( outfile )); // create a buffered writer for performance
XmlWriter xmlwriter = new SimpleXmlWriter(out); // wrap it in an XML writer
PrettyPrinterXmlWriter xmlout = new PrettyPrinterXmlWriter(xmlwriter); // then wrap the XML writer in a pretty printer
xmlout.writeXmlVersion();
xmlout.writeEntity("chat"); // open <chat>, the root element
xmlout.writeAttribute("account", sNearEndName);
xmlout.writeAttribute("service", sIMService);
int msgs = 0; // create a counter to hold number of messages
String line; // used in the loop below
SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
// TODO: This is not the exact date format used by ULF; timezone should have a : separator in it.
int unixdate; // used to hold the Unix-format date
long javadate; // used to hold the millisecond Java date
Date msgdate; // used to hold the converted message date
while ( (line = brSourceFile.readLine()) != null ) {
if ( line.indexOf("\f") != -1 ) {
// If we're looking at a formfeed, skip it by re-running the loop
continue; // This goes back to the top
}
xmlout.writeEntity("message");
msgs++; // Increment the message counter
String direction = line; // First real line should be "IN" or "OUT"
xmlout.writeAttribute("direction", direction);
if (direction.indexOf("IN") != -1) {
// If it's an incoming message...
xmlout.writeAttribute("sender", farEndName);
} else {
// If it's an outgoing message...
xmlout.writeAttribute("sender", sNearEndName);
}
brSourceFile.readLine(); // Then is the string "MSG", we skip it
String timestamp = brSourceFile.readLine(); // Next should be timestamp
if ( !bNoUnixDate.getValue() ) {
xmlout.writeAttribute("unixtime", timestamp); // Write it unconverted
}
unixdate = Integer.parseInt(timestamp);
javadate = (long) unixdate * 1000; // Java uses ms, Unix uses secs
msgdate = new Date(javadate); // Convert the long to a date object
xmlout.writeAttribute("time", df.format(msgdate));
brSourceFile.readLine(); // Then skip the second, redundant timestamp
String message = brSourceFile.readLine(); // Then read the message
xmlout.writeText(message); // write the message out
xmlout.endEntity(); // close the </message>
System.out.println("Direction: " + direction);
System.out.println("Timestamp: " + timestamp);
System.out.println("Message: " + message);
System.out.println("MESSAGES PROCESSED: " + msgs);
} // end while
xmlout.endEntity(); // close </chat>
brSourceFile.close(); // Close the buffered reader
xmlout.close(); // close the writer
} // end of the try block
catch (Exception e) {
System.err.println("Error: " + e.getMessage());
System.exit(1); // Exit with a nonzero status
}
}
}