Help with part of my Java program.

Here is my task.

You are to design and implement a program that can find matches for a given pattern in a portion of a DNA sequence. The file DNAtest will contain DNA sequences and patterns. Each will be on a separate line, with the DNA sequence on one line and the pattern on the following line. Your program should:

-Read in each pair, DNA sequence and pattern, one pair at a time.
-For each pair, output the positions within the sequence of where the pattern occurs; one output per line.
-The wild card (i.e., "_") must match some base -i.e., cannot match "nothing".
-If the pattern does not match any part of the sequence, a message should be generated indicating such.
-Make sure that each sequence and each pattern contain only the letters A, T, C and G; if not, generate an appropriate message.
The textfile to read from:
AATTGCCTTTTAAAAA
ATTG
AATTGCCTTTTAAAAA
TG_
AATTGCCTTTTAAAAA
AAG
AATTGCCTTTTAAAAA
TT_C
AATTGCCTTTTAAAAA
TTT
AATTGCCTTTTAAAAA
AA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GC
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
CGGTA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GCCGTTCAGT
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GCCGTTCAGA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
T__T
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
C_T_T
I have two problems. First, I can't figure out how to deal with the wildcards (the "_" characters). Second, I can't figure out how to make the program keep searching a string when it contains more than one instance of the pattern.

For example, in the sequence "AATTGCCTTTTAAAAA" I am searching for the pattern "AA", but I can only figure out how to return one position even though the pattern occurs at several. I realize the method I am using returns a single integer, so it can't report all of the positions, but then what method can I use?

Here is my code so far, sorry about the bad format:
import java.util.*;
import java.io.*;
import java.lang.*;

public class Main {

    public Main() {
    }

    public static void main(String[] args) throws IOException {

        System.out.println("DNA testing program");
        System.out.println("");

        String fileName = "C:/Programming/Projects/DNA/DNAtest.txt";
        BufferedReader fileIn = new BufferedReader(new FileReader(fileName));
        String currentLine = "";

        String thePattern = "";
        for (int x = 0; thePattern != null; x++) {
            currentLine = fileIn.readLine();
            thePattern = fileIn.readLine();

            if (thePattern != null) {
                int theNum = currentLine.lastIndexOf(thePattern);

                if (theNum == -1) {
                    System.out.println(thePattern + " Does not occur in this sequence.");
                }
                else {
                    System.out.println(thePattern + " Occurs at character position: " + theNum);
                }
            }
        }
    }
}
 
Hmm, being unfamiliar with Java I can't fully comment on things, but to report multiple instances of a pattern I'd return an array of ints, each holding the starting index of a match. From a C perspective I'd have a pair of pointers walking through the string: one marking the start of a candidate match and another verifying the pattern. The start pointer moves through the string linearly, and each time it lands on a matching first character you run the pattern pointer ahead until it hits a mismatch or the pattern ends. If the pattern ends, you increment the match counter and add the start index to the array of indices. Remember that the array of indices never needs to be longer than the string.
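For what it's worth, the two-pointer walk described above translates to Java fairly directly if you use indices and charAt. This is only a minimal sketch: the class name ExactScan and the method name findAll are made up for illustration, it returns a List<Integer> instead of a fixed-size int array, positions are 0-based, and it does not handle the "_" wildcard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class ExactScan {
    // Returns the 0-based start index of every occurrence of pattern in sequence.
    static List<Integer> findAll(String sequence, String pattern) {
        List<Integer> starts = new ArrayList<>();
        if (pattern.isEmpty()) {
            return starts; // nothing sensible to report for an empty pattern
        }
        // 'start' walks the sequence; 'offset' walks the pattern from each start.
        for (int start = 0; start + pattern.length() <= sequence.length(); start++) {
            int offset = 0;
            while (offset < pattern.length()
                    && sequence.charAt(start + offset) == pattern.charAt(offset)) {
                offset++;
            }
            if (offset == pattern.length()) {
                starts.add(start); // reached the end of the pattern: full match
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Overlapping matches are reported because 'start' advances one step at a time.
        System.out.println(findAll("AATTGCCTTTTAAAAA", "AA"));
    }
}
```

Because the start index only ever moves forward by one, overlapping matches such as the run of A's at the end of the first test sequence are all reported.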

The simple way to check whether a character is valid is to pass each character you read into a switch statement. There is a case for each legal letter, and the default case produces the error message. Since it compares against character literals rather than numeric codes, it should work across char formats.
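A rough Java version of that switch idea might look like the sketch below. The names BaseCheck and isValid are invented for illustration. The allowWildcard flag is my own assumption: the assignment text says only A, T, C and G are legal, yet the test patterns contain "_", so I'm guessing the wildcard should be accepted in patterns but not in sequences.

```java
// Hypothetical helper, not part of the original program.
class BaseCheck {
    // True if every character is A, T, C or G.
    // When allowWildcard is true, '_' is accepted as well (assumed to apply to patterns only).
    static boolean isValid(String text, boolean allowWildcard) {
        for (int i = 0; i < text.length(); i++) {
            switch (text.charAt(i)) {
                case 'A':
                case 'T':
                case 'C':
                case 'G':
                    break; // a legal base
                case '_':
                    if (!allowWildcard) {
                        return false;
                    }
                    break;
                default:
                    return false; // anything else is an error
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("AATTGCCTTTTAAAAA", false)); // a sequence
        System.out.println(isValid("TT_C", true));              // a pattern
    }
}
```

The caller would print the "appropriate message" when isValid returns false and skip the search for that pair.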

I'm guessing you are somewhere that teaches Java first and C later? I mention pointers, but it is possible to do the same thing with indices; to my thinking it is just more difficult and theoretically more costly in processing.
 
Oh, and I'd also set it up so you can run the search in reverse for extra points! Remember, though, that you need double the possible indices in that case.
 
If I understand the assignment correctly, the _ is probably like a *, i.e. it matches any base.

Code:
AATTGCCTTTTAAAAA
TG_
here _ is C

and here
AATTGCCTTTTAAAAA
TT_C
it's G
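If that reading is right, one way to handle it in Java is to compare character by character and skip the comparison whenever the pattern character is "_". This is just a sketch under that assumption (each "_" stands for one base); the names WildcardMatch, matchesAt and findAll are made up, and positions are 0-based.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class WildcardMatch {
    static final char WILDCARD = '_';

    // True if pattern matches sequence beginning at index start.
    static boolean matchesAt(String sequence, String pattern, int start) {
        // The wildcard has to consume a real base, so the pattern may not run off the end.
        if (start < 0 || start + pattern.length() > sequence.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != WILDCARD && p != sequence.charAt(start + i)) {
                return false; // a literal base that does not match
            }
        }
        return true;
    }

    // Returns the 0-based start index of every match, overlapping ones included.
    static List<Integer> findAll(String sequence, String pattern) {
        List<Integer> starts = new ArrayList<>();
        if (pattern.isEmpty()) {
            return starts;
        }
        for (int start = 0; start + pattern.length() <= sequence.length(); start++) {
            if (matchesAt(sequence, pattern, start)) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(findAll("AATTGCCTTTTAAAAA", "TG_"));
        System.out.println(findAll("AATTGCCTTTTAAAAA", "TT_C"));
    }
}
```

With the two examples above, "TG_" matches at index 3 (the _ is C) and "TT_C" matches at index 2 (the _ is G).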

As for the string search, I don't know enough Java, and I don't know if you can use pointers or access individual elements of a string, but my suggestion is to start a loop and take a substring of the current line, starting just after the last found position.
I made a little change to the else, but in C++; figure out how you would do that in Java.
Code:
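else
{
	// C++ code ..
	string sub = currentLine;
	int offset = 0; // index in currentLine where sub begins

	// theNum must start out as the FIRST match (indexOf, not lastIndexOf)
	while(theNum != -1)
	{
		//Java line
		System.out.println(thePattern + " Occurs at character position: " + (offset + theNum));

		// skip past the start of this match
		offset += theNum + 1;
		sub = sub.substr(theNum+1); //c++

		//Java line: search what is left, not the whole line again
		theNum = sub.indexOf(thePattern);
	}
}
Your code uses lastIndexOf, but this loop needs the first occurrence each time, so use indexOf instead, including for the initial search before the else. In C++ I would use find; note that it returns string::npos rather than -1 when nothing is found.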

Anyway, maybe writing your own search function would be a better idea.
Trace through the line and record all the places you find the pattern.
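If you would rather not write the search by hand, the substring idea above can probably be done in Java without cutting the string up, because String.indexOf has an overload that takes a starting index. A minimal sketch (the class name IndexOfLoop and the method name findAll are invented, positions are 0-based, and this plain indexOf approach does not understand the "_" wildcard):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class IndexOfLoop {
    // Collects every 0-based position where pattern occurs in sequence.
    static List<Integer> findAll(String sequence, String pattern) {
        List<Integer> starts = new ArrayList<>();
        if (pattern.isEmpty()) {
            return starts;
        }
        int found = sequence.indexOf(pattern); // first occurrence, or -1
        while (found != -1) {
            starts.add(found);
            // Resume one character past the previous hit so overlapping matches count too.
            found = sequence.indexOf(pattern, found + 1);
        }
        return starts;
    }

    public static void main(String[] args) {
        List<Integer> hits = findAll("AATTGCCTTTTAAAAA", "TTT");
        if (hits.isEmpty()) {
            System.out.println("TTT Does not occur in this sequence.");
        }
        for (int position : hits) {
            System.out.println("TTT Occurs at character position: " + position);
        }
    }
}
```

An empty list is the "does not occur" case, so the existing message can be printed when the list has no entries.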
 