The Mullinator
Here is my task.
You are to design and implement a program that can find matches for a given pattern in a portion of a DNA sequence. The file DNAtest will contain DNA sequences and patterns. Each will be on a separate line, with the DNA sequence on one line and the pattern on the following line. Your program should:
-Read in each pair, DNA sequence and pattern, one pair at a time.
-For each pair, output the positions within the sequence of where the pattern occurs; one output per line.
-The wildcard (i.e., "_") must match some base, i.e., it cannot match "nothing".
-If the pattern does not match any part of the sequence, a message should be generated indicating such.
-Make sure that each sequence and each pattern contain only the letters A, T, C and G; if not, generate an appropriate message.
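The validation requirement in the list above could be sketched roughly as follows. This is only one possible approach, not the assignment's required method. The class and method names (DnaValidation, isValidSequence, isValidPattern) are made up for illustration, and the sketch assumes that a pattern may also contain the "_" wildcard while a sequence may not.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not from the assignment.
class DnaValidation {
    // A sequence may contain only the four bases.
    private static final Pattern SEQUENCE = Pattern.compile("[ATCG]+");
    // Assumption: a search pattern may additionally contain the "_" wildcard.
    private static final Pattern SEARCH_PATTERN = Pattern.compile("[ATCG_]+");

    static boolean isValidSequence(String s) {
        return s != null && SEQUENCE.matcher(s).matches();
    }

    static boolean isValidPattern(String p) {
        return p != null && SEARCH_PATTERN.matcher(p).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidSequence("AATTGCCTTTTAAAAA")); // true
        System.out.println(isValidPattern("TG_"));               // true
        System.out.println(isValidSequence("AATXG"));            // false
    }
}
```

Empty lines are treated as invalid here because the regular expressions require at least one character; whether that is the right choice depends on how the assignment wants blank lines handled.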
I have two problems. First, I can't figure out how to deal with the wildcards (the "_" characters). Second, I can't figure out how to make the program keep searching a string when it contains more than one instance of the pattern it is searching for.
The text file to read from:
AATTGCCTTTTAAAAA
ATTG
AATTGCCTTTTAAAAA
TG_
AATTGCCTTTTAAAAA
AAG
AATTGCCTTTTAAAAA
TT_C
AATTGCCTTTTAAAAA
TTT
AATTGCCTTTTAAAAA
AA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GC
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
CGGTA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GCCGTTCAGT
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GCCGTTCAGA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
T__T
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
C_T_T
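For the patterns in the file above that contain "_", one possible approach is to compare the pattern against the sequence character by character at every starting position and let "_" match any single base. The sketch below is an illustration under assumptions, not a definitive solution: the names WildcardSearch, matchesAt and findAll are hypothetical, positions are reported zero-based (the assignment does not say which convention it wants), and overlapping matches are all reported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative, not from the assignment.
class WildcardSearch {
    // True if the pattern matches the sequence beginning at index start.
    // A '_' in the pattern matches any one character, never "nothing".
    static boolean matchesAt(String sequence, String pattern, int start) {
        if (start < 0 || start + pattern.length() > sequence.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '_' && p != sequence.charAt(start + i)) {
                return false;
            }
        }
        return true;
    }

    // Collects every zero-based start index where the pattern matches.
    static List<Integer> findAll(String sequence, String pattern) {
        List<Integer> positions = new ArrayList<>();
        if (pattern.isEmpty()) {
            return positions; // assumption: an empty pattern matches nowhere
        }
        for (int start = 0; start + pattern.length() <= sequence.length(); start++) {
            if (matchesAt(sequence, pattern, start)) {
                positions.add(start);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String sequence = "AATTGCCTTTTAAAAA";
        System.out.println(findAll(sequence, "TG_"));  // [3]
        System.out.println(findAll(sequence, "TT_C")); // [2]
        System.out.println(findAll(sequence, "AAG"));  // []
    }
}
```

An empty list from findAll corresponds to the "does not occur" message; otherwise each element can be printed on its own line.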
For example, in the sequence "AATTGCCTTTTAAAAA" I am searching for the pattern "AA". I can only figure out how to have it return one position, even though the pattern occurs at multiple positions. I realize the method I am using returns a single integer, so it can't be used to find all of the positions, but then what method can I use?
Here is my code so far, sorry about the bad format:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        System.out.println("DNA testing program");
        System.out.println("");
        String fileName = "C:/Programming/Projects/DNA/DNAtest.txt";
        // try-with-resources closes the reader when the loop finishes
        try (BufferedReader fileIn = new BufferedReader(new FileReader(fileName))) {
            String currentLine;
            String thePattern;
            // Each pair is a DNA sequence line followed by a pattern line
            while ((currentLine = fileIn.readLine()) != null
                    && (thePattern = fileIn.readLine()) != null) {
                // lastIndexOf returns only the index of the last occurrence (or -1)
                int theNum = currentLine.lastIndexOf(thePattern);
                if (theNum == -1) {
                    System.out.println(thePattern + " Does not occur in this sequence.");
                } else {
                    System.out.println(thePattern + " Occurs at character position: " + theNum);
                }
            }
        }
    }
}
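For the multiple-occurrence problem, one option is the two-argument String.indexOf(String, int), which starts searching from a given index, so it can be called in a loop until it returns -1. The following is a minimal sketch under assumptions: the class and method names (AllOccurrences, indexesOf) are hypothetical, positions are zero-based, and overlapping matches are counted. It handles literal patterns only; a "_" would be treated as an ordinary character, so wildcard patterns need a separate character-by-character comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative, not from the assignment.
class AllOccurrences {
    // Returns every zero-based index at which the literal pattern occurs.
    static List<Integer> indexesOf(String sequence, String pattern) {
        List<Integer> positions = new ArrayList<>();
        if (pattern.isEmpty()) {
            return positions; // assumption: an empty pattern matches nowhere
        }
        int from = sequence.indexOf(pattern);
        while (from != -1) {
            positions.add(from);
            // Resume one character after the last hit so overlapping matches are found.
            from = sequence.indexOf(pattern, from + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(indexesOf("AATTGCCTTTTAAAAA", "AA"));  // [0, 11, 12, 13, 14]
        System.out.println(indexesOf("AATTGCCTTTTAAAAA", "AAG")); // []
    }
}
```

If overlapping matches should not be counted, the search could resume at from + pattern.length() instead of from + 1.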