A Converter from Eland2 to Expression Files

Mailing list - this is a general purpose mailing list for deep sequencing/sequencing by synthesis.
Download - download the package from our server
Bugtracker - Choose Analysis Tools | Eland2Exp

The aim of this program is to turn our Illumina Genome Analyzer 2 into an expensive microarray scanner. This is done by counting the occurrences of fragments in the sample and associating them with particular genes. The advantage of doing so is that we obtain more information than by probing for specific sequences. The disadvantage is, of course, that there are somewhat more metrics involved than first meets the eye. For instance, do we want the average gene expression? The occurrence count of the most frequent sequence? Do we count only the exons, or do we include the introns as well?
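To make the difference between these metrics concrete, here is a minimal Python sketch. It is not the eland2exp implementation; the function name `gene_metrics`, the representation of each read by its start position, and the inclusive coordinates are all assumptions made for illustration.

```python
from collections import Counter

def gene_metrics(read_starts, gene_start, gene_stop):
    """Summarise reads whose start falls in [gene_start, gene_stop].

    Hypothetical helper: coordinates are assumed to be inclusive and
    each read is represented only by its start position.
    """
    # Count how many reads start at each position inside the gene.
    counts = Counter(p for p in read_starts if gene_start <= p <= gene_stop)
    length = gene_stop - gene_start + 1
    total = sum(counts.values())
    return {
        "total": total,                                   # all reads in the gene
        "mean_per_base": total / length,                  # average expression
        "max_per_position": max(counts.values(), default=0),  # most frequent start
    }

# Made-up read start positions, for illustration only.
reads = [100, 100, 100, 105, 250, 300]
print(gene_metrics(reads, 100, 199))
```

The same set of reads gives a low average but a high maximum, which is why the choice of metric has to be made explicitly.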

The program below makes it possible to obtain all this information in a ready-to-use format. The input consists of a gene location file that describes the genome with three columns. The first column is the gene identifier; this can be anything. The second column contains 'transcript' or 'exon'. 'transcript' implies that the last two columns contain the start and stop positions of the transcript. If the second column lists 'exon', then the last two columns list the start and stop positions of an exon belonging to that gene. The third column contains the chromosome on which the gene occurs.

An interesting problem with this kind of program is splice variants. Many genes have several transcripts, and these may overlap. This means that it is not necessarily easy to say whether a specific position in a gene belongs to an exon or to an intron. We currently assume that it belongs to an exon if it falls within any of the possible exons. If one is specifically interested in transcript variants, the location file should use some form of transcript id in the first column instead of the gene id.
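The "exon if present in any of the possible exons" rule amounts to taking the union of all exon intervals of a gene. A minimal Python sketch of that idea follows; the helper names `merge_intervals` and `in_any_exon`, the inclusive coordinates, and the example intervals are assumptions for illustration, not code or data from eland2exp.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent inclusive (start, stop) intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(m) for m in merged]

def in_any_exon(position, merged):
    """True if the position lies inside at least one merged exon interval."""
    return any(start <= position <= stop for start, stop in merged)

# Exons of two hypothetical transcripts of the same gene.
transcript_a = [(100, 200), (300, 400)]
transcript_b = [(150, 250), (300, 350)]
exons = merge_intervals(transcript_a + transcript_b)

print(exons)
print(in_any_exon(225, exons), in_any_exon(275, exons))
```

Position 225 is intronic in the first transcript but exonic in the second, so under this rule it counts as exon; position 275 is intronic in both.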

The program also does not reclaim short reads that jump from one exon to another, because these are not aligned by Eland. This means that around 18% of the reads will be missing in any case.

Usage: eland2exp <positionfile> <fragmentsize> <strand>

An example of a location file:
gid     chrom   tid     dir     start   stop    rank
1       2L      1       1       7529    8116    1
1       2L      1       1       8229    8589    2
1       2L      1       1       8668    9491    3
2       2L      2       -1      9836    11344   9
2       2L      2       -1      11410   11518   8
2       2L      2       -1      11779   12221   7
2       2L      2       -1      12286   12928   6
2       2L      2       -1      13520   13625   5
2       2L      2       -1      13683   14874   4
To run the program on lane 7, for instance, we can use:
eland2exp drosmel-geneid2loc.tsv 150 0 <s_7_export.txt >b.tsv
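For readers who want to inspect a location file before running the program, the example file shown above can be read with a few lines of Python. This is only a sketch: the function name `read_locations` is hypothetical, and it assumes whitespace-separated columns with the header row gid, chrom, tid, dir, start, stop, rank as in the example. It says nothing about how eland2exp itself parses the file.

```python
from collections import defaultdict

# First rows of the example location file above.
SAMPLE = """\
gid     chrom   tid     dir     start   stop    rank
1       2L      1       1       7529    8116    1
1       2L      1       1       8229    8589    2
1       2L      1       1       8668    9491    3
2       2L      2       -1      9836    11344   9
2       2L      2       -1      11410   11518   8
"""

def read_locations(lines):
    """Group location rows by gene id (assumes a header row and
    whitespace-separated columns, as in the example file)."""
    rows = iter(lines)
    header = next(rows).split()
    genes = defaultdict(list)
    for line in rows:
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(header, line.split()))
        genes[row["gid"]].append(
            (row["chrom"], int(row["dir"]),
             int(row["start"]), int(row["stop"]), int(row["rank"]))
        )
    return genes

genes = read_locations(SAMPLE.splitlines())
for gid, entries in genes.items():
    print(gid, len(entries), "entries on", entries[0][0])
```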