@elvismdev
Last active July 4, 2023 05:30
A small Java class that finds all the genes in a DNA string stored in a plain text file. The class depends on the edu.duke library, which must be added to the project in your Java IDE for the code to compile without errors. Download link: http://www.dukelearntoprogram.com/downloads/archives/courserajava.jar
import edu.duke.*;

/**
 * Finds all the genes in a DNA string read from a file, using the StorageResource class.
 *
 * @author Elvis Morales
 * @version 1.0
 */
public class FindMultipleGenesStorage {
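    // Returns the index of the nearest stop codon (tga, taa or tag) at or after
    // index that is in frame with index, or dna.length() if there is none.
    // Note: only the first occurrence of each codon is checked; if that occurrence
    // is out of frame, later occurrences of the same codon are not considered.
    public int findStopIndex(String dna, int index) {
        int stop1 = dna.indexOf("tga", index);
        if (stop1 == -1 || (stop1 - index) % 3 != 0) {
            stop1 = dna.length();
        }
        int stop2 = dna.indexOf("taa", index);
        if (stop2 == -1 || (stop2 - index) % 3 != 0) {
            stop2 = dna.length();
        }
        int stop3 = dna.indexOf("tag", index);
        if (stop3 == -1 || (stop3 - index) % 3 != 0) {
            stop3 = dna.length();
        }
        return Math.min(stop1, Math.min(stop2, stop3));
    }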
    // Collects every gene in dna: a substring that starts with "atg" and ends
    // with the nearest in-frame stop codon reported by findStopIndex.
    public StorageResource storeAll(String dna) {
        String dnaLow = dna.toLowerCase();
        int start = 0;
        StorageResource genes = new StorageResource();
        while (true) {
            int loc = dnaLow.indexOf("atg", start);
            if (loc == -1) {
                break;
            }
            int stop = findStopIndex(dnaLow, loc + 3);
            if (stop != dnaLow.length()) {
                genes.add(dna.substring(loc, stop + 3));
                start = stop + 3;
            } else {
                // No stop codon for this "atg": resume the search just past it.
                start = loc + 3;
            }
        }
        return genes;
    }
    // Lets the user pick a DNA file, then finds and reports the genes in it.
    public void testStorageFinder() {
        FileResource dnaFile = new FileResource();
        StorageResource genesFound = storeAll(dnaFile.asString());
        System.out.println("Number of genes found: " + genesFound.size());
        printGenes(genesFound);
    }
    // Returns the fraction of characters in dna that are 'c' or 'g' (case-insensitive).
    public float cgRatio(String dna) {
        if (dna.isEmpty()) {
            return 0f; // avoid dividing by zero for an empty string
        }
        String dnaLow = dna.toLowerCase();
        int cgCount = 0;
        for (char base : dnaLow.toCharArray()) {
            if (base == 'c' || base == 'g') {
                cgCount++;
            }
        }
        return ((float) cgCount) / dna.length();
    }
    // Prints every gene longer than 60 characters and every gene whose
    // C-G ratio exceeds 0.35, followed by the count of each.
    public void printGenes(StorageResource sr) {
        int sixtyCharQty = 0;
        int highCgRatioQty = 0;
        float cgRatioConst = 0.35f;
        for (String s : sr.data()) {
            if (s.length() > 60) {
                System.out.println("String longer than 60 characters: " + s);
                sixtyCharQty++;
            }
            if (cgRatio(s) > cgRatioConst) {
                System.out.println("String with C-G-ratio higher than 0.35: " + s);
                highCgRatioQty++;
            }
        }
        System.out.println("Strings longer than 60 characters: " + sixtyCharQty);
        System.out.println("Strings with C-G-ratio higher than 0.35: " + highCgRatioQty);
    }
}
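
A note on findStopIndex in the class above: it checks only the first occurrence of each stop codon, so an out-of-frame hit hides any later in-frame hit of the same codon. A variant that keeps scanning might look like the sketch below. This is not the gist author's method, and it can report different genes than the class above; InFrameStop is a hypothetical name, and the sketch needs no edu.duke classes.

```java
// Sketch only: InFrameStop is a hypothetical class, not part of the gist.
final class InFrameStop {
    private static final String[] STOP_CODONS = { "tga", "taa", "tag" };

    // Returns the index of the nearest stop codon at or after 'index' whose
    // distance from 'index' is a multiple of 3, or dna.length() if none exists.
    static int findStopIndex(String dna, int index) {
        int best = dna.length();
        for (String codon : STOP_CODONS) {
            int stop = dna.indexOf(codon, index);
            // Keep scanning past out-of-frame hits instead of giving up.
            while (stop != -1 && (stop - index) % 3 != 0) {
                stop = dna.indexOf(codon, stop + 1);
            }
            if (stop != -1) {
                best = Math.min(best, stop);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "tga" first appears out of frame at index 4, then in frame at index 9.
        String dna = "atgctgacctga";
        System.out.println(findStopIndex(dna, 3)); // prints 9
    }
}
```

For the same input, findStopIndex in the class above returns 12 (the string length), because the only stop codon it inspects is the out-of-frame "tga" at index 4.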
@abhaypande023

Q.1. ans=1
Q.2 ans=1
Q.3 ans=1

Correct Answer

You saved me, bro.

@Haniket

Haniket commented Jun 22, 2020

/**
 * Write a description of part1 here.
 *
 * @author (your name)
 * @version (a version number or a date)
 */

import edu.duke.*;
public class part1 {
public int findStopCodon(String dna,int startIndex,String stopcodon ){

  int currIndex = dna.indexOf(stopcodon,startIndex+3);
  while(currIndex!=-1){
     if((currIndex-startIndex)%3==0){
        return currIndex;
        
        }
        else{
         currIndex = dna.indexOf(stopcodon,currIndex+1);
        }
   }
   return dna.length();
}

public String findGene(String dna ){
 int startIndex = dna.indexOf("ATG");
 if(startIndex==-1){
  return "" ;  
 }
 int indexTAA = findStopCodon(dna,startIndex,"TAA");
 int indexTAG = findStopCodon(dna,startIndex,"TAG");
 int indexTGA = findStopCodon(dna,startIndex,"TGA");
 int temp= Math.min(indexTAA,indexTAG);
 int firstStopCodon = Math.min(temp,indexTGA);
 if(firstStopCodon==dna.length()){
     return "";
    }
  String resultString = dna.substring(startIndex,firstStopCodon+3);//line 40
   return resultString;
}

public StorageResource getAllGene(String dna){
StorageResource geneList = new StorageResource();
int startIndex = 0;
while(true){
String currentGene = findGene(dna);//line 50
if(currentGene.length()==0){
    break;
}else{
  geneList.add(currentGene);
  startIndex = dna.indexOf(currentGene,startIndex)+currentGene.length();
  dna = dna.substring(startIndex,dna.length());
  }
}
return geneList;
}
public double cgRatio(String dna){
    StorageResource gene = getAllGene(dna);
    double cgratio = 0;
    for(String g: gene.data()){
       int cOcc=0,oOcc=0,startIndexC=0,startIndexO=0;
       double count=0;

       while(cOcc!=-1||oOcc!=-1){
          cOcc=dna.indexOf("C",startIndexC); 
          if(cOcc!=-1){
            count++;
            startIndexC=cOcc+1;
            }
          oOcc=dna.indexOf("G",startIndexO);
          if(oOcc!=-1){
            count++;
            startIndexO=oOcc+1;
            }
          
        }
       double dnaLength = dna.length();
         cgratio = count/dnaLength;
      return cgratio;
   }
    return cgratio;
}
public int countCTG(String dna){
int count =0, ctgOcc =0,startIndexCTG=0;
while (ctgOcc!=-1){
   ctgOcc=dna.indexOf("CTG",startIndexCTG);
   if(ctgOcc!=-1){
    count++;
    startIndexCTG=ctgOcc+3;
}
}
return count;
}

public void processGenes(StorageResource sr){
 int count = 0,countCg = 0,countgene=1;
 int temp=0;
  for( String g: sr.data()){
     StorageResource geneList=getAllGene(g); //line 116
     
     for(String gList:geneList.data()){
         System.out.println("gene "+countgene +" contained is "+gList);
         if(gList.length()>60){
        System.out.println("The gene with a length longer than 60 is "+gList);
        count++;
       }
       
       if(temp<gList.length()){
        temp=gList.length();
        }
        else{
        temp=temp;
        }
        countgene++;
        
        double cgRatio=cgRatio(gList);
      if( cgRatio>0.35)
      {
       
       countCg++;
       System.out.println("the "+countCg+" cg ratio is "+cgRatio);
      }
    }  
    int numberOfCTG=countCTG(g);
    System.out.println("the number of times CTG appears is "+numberOfCTG);
  }
 System.out.println("the number of strings with length longer than 60 is "+count);
 System.out.println("the number of strings with cgratio greater than 0.35 is "+countCg);
 System.out.println("the longest length of gene in the dna is "+temp);
}
public void testProcessGene(){
     FileResource fr=new FileResource("reviewTest.fa");
     String dna1 = fr.asString();      
     String dna = dna1.toUpperCase();
     System.out.println(dna);
     StorageResource sr= new StorageResource();
     sr.add(dna);
     processGenes(sr);//line 152
}

}

// reviewTest.fa = https://users.cs.duke.edu/~rodger/GRch38dnapart.fa
// Please help me find the bug in my code; some of my answers are not correct.
// Total number of genes: 69 (I am getting 34)
// Total number of strings with length greater than 60: 23 (I am getting 11)
// Total number of strings with CG ratio greater than 0.35: 40 (I am getting 25)
// Longest gene length: 489 (I got this right)
// Total number of occurrences of CTG: 224 (I got this right)
Please help me.
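
One thing in the code above that can misplace the search position: getAllGene shortens dna on every pass, but startIndex keeps its value from the previous pass, so the offset no longer matches the shortened string. A minimal sketch of the usual alternative is to leave dna untouched and pass a running index to findGene. Whether this accounts for the 34 versus 69 gap is not something the sketch establishes. AllGenesSketch and the two-argument findGene are hypothetical names, and List<String> stands in for StorageResource so the code runs without edu.duke.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: AllGenesSketch is a hypothetical class, not the poster's code.
final class AllGenesSketch {
    // Index of the first in-frame occurrence of stopCodon after startIndex,
    // or dna.length() if there is none (same logic as findStopCodon above).
    static int findStopCodon(String dna, int startIndex, String stopCodon) {
        int currIndex = dna.indexOf(stopCodon, startIndex + 3);
        while (currIndex != -1) {
            if ((currIndex - startIndex) % 3 == 0) {
                return currIndex;
            }
            currIndex = dna.indexOf(stopCodon, currIndex + 1);
        }
        return dna.length();
    }

    // First gene whose "ATG" lies at or after 'where', or "" if none is found.
    static String findGene(String dna, int where) {
        int startIndex = dna.indexOf("ATG", where);
        if (startIndex == -1) {
            return "";
        }
        int stop = Math.min(findStopCodon(dna, startIndex, "TAA"),
                Math.min(findStopCodon(dna, startIndex, "TAG"),
                        findStopCodon(dna, startIndex, "TGA")));
        if (stop == dna.length()) {
            return "";
        }
        return dna.substring(startIndex, stop + 3);
    }

    // Collects genes while advancing an index into the unmodified string.
    static List<String> getAllGenes(String dna) {
        List<String> genes = new ArrayList<>();
        int startIndex = 0;
        while (true) {
            String gene = findGene(dna, startIndex);
            if (gene.isEmpty()) {
                break;
            }
            genes.add(gene);
            // dna is never truncated, so this offset stays valid.
            startIndex = dna.indexOf(gene, startIndex) + gene.length();
        }
        return genes;
    }

    public static void main(String[] args) {
        System.out.println(getAllGenes("ATGAAATAACCATGCCCTGAATGTAG"));
        // prints [ATGAAATAA, ATGCCCTGA, ATGTAG]
    }
}
```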

@divyanshukla777

divyanshukla777 commented Jul 1, 2020

TOTAL GENES = 69
Total number of strings with length greater than 60 : 23
Total number of strings with CG Ratio greater than 0.35 : 40
Longest gene length is : 489
Total number of occurrences of CTG is : 224

Cheers!

Thanks! All correct. Actually I was stuck at the longest gene length.

@Harshitha1026

Can I get the week 3 practice quiz and final quiz answers?

@bbainwar

TOTAL GENES = 69
Total number of strings with length greater than 60 : 23
Total number of strings with CG Ratio greater than 0.35 : 40
Longest gene length is : 489
Total number of occurrences of CTG is : 224

Cheers!

Can I get your code bro?

@Ananya-Rai

Week 3 assignment answers please

@smit-1923

How many genes are there in the file brca1line.fa?
Ans 1
How many genes are there in the file brca1line.fa that are longer than 60?
Ans 1
How many genes are there in the file brca1line.fa that have a C-G-ratio greater than 0.35?
Ans 1


ghost commented Aug 15, 2020 via email

@anubhavgupta1012

How many genes are there in the file brca1line.fa?
Ans 1
How many genes are there in the file brca1line.fa that are longer than 60?
Ans 1
How many genes are there in the file brca1line.fa that have a C-G-ratio greater than 0.35?
Ans 1

Right one

@aaditi594

TOTAL GENES = 69
Total number of strings with length greater than 60 : 23
Total number of strings with CG Ratio greater than 0.35 : 40
Longest gene length is : 489
Total number of occurrences of CTG is : 224

Cheers!

Hey, can you please send the code...

@Abhey-crypto

TOTAL GENES = 69
Total number of strings with length greater than 60 : 23
Total number of strings with CG Ratio greater than 0.35 : 40
Longest gene length is : 489
Total number of occurrences of CTG is : 224

Cheers!

Wrong answers

@IlPreteRosso

I don't understand why the answers are all 1. My code finds some valid genes, and I have verified them by eye.

@sumitkar02

sumitkar02 commented Aug 10, 2022 via email
