0:00 Hi. Today we're looking at file input and output. Input and output, by the way, is frequently abbreviated I/O. Usually there's a slash in between, but not always. Why do we need files at all? After all, we have variables, so we can remember things, and now we have arrays, so we can put lots of stuff in an array. 0:27 The reason we want to keep files is to remember things long term, 0:33 because after your program is done running, everything that's in a variable is forgotten. Variables only keep their values as long as the program is running. 0:45 Persistence simply means that the value remains after the program is done running. By the way, apps that run on smartphones and mobile devices also use files. They typically have a different kind of file system than the one found on a desktop, but they still use files for persistence. That's how your app can remember something even after you turn off your phone. 1:17 Now, when you do have files that contain data you want to remember for a while, it's a good idea to back them up. You're becoming a computer scientist, and computer scientists know that computer hardware can fail, so they make appropriate backups. 1:37 So that's important. The other thing is that you can copy files around, you can send them to your friends, you can do lots of things with files. Files always have a name. That's how we can tell one file from another: they have different names. And files may or may not have data. The overwhelming majority of files have data, but there are files that are empty. Today we're going to talk about two particular types of data: text data, which we're going to talk a lot about, and binary data, which we're only going to talk a little bit about.
2:25 Slide three. 2:28 Let's say we have a text file. In this example, and I believe in the book also, it's called input.txt, but it could be any file that contains text data. 2:41 So we create a variable of type File.
That variable really holds just a file name: creating it doesn't access the file, it only represents the name. 2:56 Then we can create a new Scanner that reads from the file with this name, in this case input.txt. If we want to read the file line by line, for example, we can have a while loop that says: while there is a next line, read the next line and do something with it. At the end, once we're done using the file, we close it. Now, for files that we're reading, closing is not essential: it's good form, but skipping it doesn't lose any data. If we were writing to a file, then close is essential, because some of the data that you write to the file is actually kept in memory. You don't know which part of what you wrote is still in memory and which part is already in the file. When you close the file, all the data that's still in memory gets written to the file. 3:59 So closing output files is very important. 4:04 Now, if we talk about closing a file, that must mean we are also opening it. That's what happens when you create the new Scanner: it opens the file. And in fact, that can fail if the file doesn't exist. 4:29 Interestingly, the file name doesn't have to end in .txt. If you're using a Scanner, Java treats the contents as text.
4:47 Now, on slide four, 4:54 we may want to use a dialog box for choosing a file. This is something you've had lots of experience using. It is relatively simple to create one. It's called JFileChooser, 5:06 and it's part of javax.swing, the Swing group of interface libraries. The file chooser is fairly simple. 5:20 You put up a dialog with showOpenDialog. 5:27 That might not be successful, since the user isn't required to choose a file when you pop up a file chooser. If it is successful, you can call getSelectedFile on that same file chooser. That gives you a value of type File, which again you can use to create the Scanner and read the file.
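Here is a minimal sketch of the read loop from slide three. It is not the code from the slide: the class name ReadLinesDemo and the method name printLines are made up for illustration, and the File could just as well come from getSelectedFile on a JFileChooser. The throws clause in the method headers is needed for the code to compile; it is explained later in this lecture.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ReadLinesDemo {
    // Prints every line of the file, prefixed with its line number,
    // and returns how many lines were read.
    static int printLines(File inputFile) throws FileNotFoundException {
        Scanner in = new Scanner(inputFile); // this is where the file is opened
        int count = 0;
        while (in.hasNextLine()) {
            String line = in.nextLine();
            count++;
            System.out.println(count + ": " + line);
        }
        in.close(); // good form for input files, essential for output files
        return count;
    }

    public static void main(String[] args) throws FileNotFoundException {
        File inputFile = new File("input.txt"); // only a name; nothing is opened yet
        printLines(inputFile);
    }
}
```

If input.txt does not exist in the directory where you run the program, creating the Scanner fails with a FileNotFoundException.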
If you're going to let the user choose a file name for output, you should use showSaveDialog, which is slightly different: it allows the user to name a new file, of course. 6:07 I encourage you to take a few minutes to write a small program that does this and see what it looks like on your computer, not just mine.
6:21 Slide five. 6:24 Sometimes when we're specifying a file name, especially on a Windows machine, we have to use backslashes as part of the file name. Backslashes are normal characters in a file name, but in Java string literals they are escape characters. The compiler has to convert the string you specified in your source code to 6:54 a string used internally in the Java program, and when the compiler sees a backslash, it says, oh, the next character is special. So \n, for example, means a newline, rather than a backslash followed by the character n. Now the tricky part: if you do want a backslash in your string, you have to write two backslashes, \\. Once the compiler has processed that, it turns into a single backslash: just like \n is a newline, \\ is a single backslash in your string. An escape character is a general concept in programming languages. It just says the next character has to be understood differently than it would be if the escape character didn't precede it. 7:56 Because backslash has a special meaning in a string, when we want to put one in a string, we have to escape the backslash itself. This is common in programming languages, not just Java. 8:13 In particular, at the very end of slide five, I have an example where you might want to include double quotes inside the string. If you put the double quote directly in the string, that's the end of the string, right? So you have to escape the quote character.
8:32 And you do that again with a backslash, because in Java strings, backslash is the escape character. Some languages have different escape characters, but backslash is fairly common.
8:47 Slide six. 8:51 Instead of just reading a text file, occasionally we might want to write one. There are many ways of doing this, but one way is to use a PrintWriter. We have to be a little bit careful, because when we open a file by creating a new PrintWriter, we create the file if it didn't exist, but if it did exist, opening it removes all the contents it already had and gives you an empty file. 9:30 That's fine if you really want to write to that file, but it's not fine if that file had something important in it. 9:39 So be a little bit careful when you're writing: make sure you really want to write to that file. 9:46 I mentioned that we do have to close output files; otherwise data loss is possible, even likely. 9:58 Between the open and the close, we can print text to the file just the way we print it to the screen. So these two lines of code print two lines to the file. We could of course have loops, and we could print hundreds or thousands of lines to a file. With an infinite loop, we can easily make a file that fills up all our disk space. 10:28 One thing to note is not just that we are required to close output files, but also that input files and output files should be different files. 10:40 Think about it: let's say you open the same file for input and output. As soon as you open it for output, the entire file is empty. What happens with the input? Well, if you're lucky, the input just fails. 10:55 If you're not lucky, you might get partial input or something. But most likely, you'll just get nothing. So avoid using the same file for input and output. That's true at this point in your development as programmers.
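To make slides five and six concrete, here is a small sketch that writes two lines with a PrintWriter and reads them back with a Scanner. The two lines also show the escapes: \\ for a backslash and \" for a double quote. The class name WriteTextDemo, the method name writeTwoLines, the file name output.txt and the text being written are all made up for illustration; they are not from the slides.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;

class WriteTextDemo {
    // Creates the named file (or empties it if it already exists!)
    // and writes two lines to it.
    static void writeTwoLines(String fileName) throws FileNotFoundException {
        PrintWriter out = new PrintWriter(fileName); // old contents are gone from here on
        out.println("C:\\homework\\input.txt"); // each \\ becomes one backslash
        out.println("She said \"hello\".");     // \" puts a double quote in the string
        out.close(); // writes out anything still held in memory
    }

    public static void main(String[] args) throws FileNotFoundException {
        String fileName = "output.txt"; // hypothetical name: pick one you can afford to overwrite
        writeTwoLines(fileName);

        // Read the file back, after the writer has been closed.
        Scanner in = new Scanner(new File(fileName));
        while (in.hasNextLine()) {
            System.out.println(in.nextLine());
        }
        in.close();
    }
}
```

Note that the file is only opened for reading after the PrintWriter has been closed, so it is never open for input and output at the same time.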
11:14 Eventually you might use a file as a database, for example, and then you might want to read from it and write to it at the same time. For that you would not use PrintWriter; you would use other primitives.
11:32 Slide seven. 11:34 I mentioned that this lecture is mostly about text data, but I did want to mention binary data, and it's also in the book. As far as the file is concerned, there is not much difference between text data and binary data: both are sequences of bytes. The main difference is that in binary data, each byte may have any value. If a file is a text file, the bytes should have values that represent printable characters (plus a few others such as newline and tab), and there are many byte values that are not printable characters. As you can see, if you try to print a binary file to the screen, you'll get something funny like in this example. So binary data can be any byte value, and since a byte is eight bits, it can take values between 0 and 255. 256 is too large to fit in a byte; you would need nine bits. 12:40 So if you have a text file and you're using binary data operations, that's fine, no problem, 12:56 because the bytes used in text files are a subset of the bytes used in binary files. But if you try to use text file operations to read and write a binary file, you will get into trouble, 13:11 in particular when you're trying to read. Treating data as binary simply means we don't know what the data means. We treat it just as stuff we need to copy exactly, for example, 13:30 or we can check whether two binary files have the same contents, or whether parts of a binary file have the same contents, or whether one binary blob is contained in a bigger binary blob, that kind of thing. We can do comparisons, we can copy. But binary data does have meaning, right? Images are binary data, and they have a meaning. But if we're just looking at the file as binary data, we cannot tell what the meaning is.
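Copying exactly and comparing contents can be sketched as follows. This is only an illustration under some assumptions: the class and method names (ByteCopyDemo, copy, sameContents) are invented, and it uses java.io.FileInputStream and java.io.FileOutputStream, the stream classes that come up on slide eight. Their read method returns a value from 0 to 255, or -1 when the end of the file is reached. IOException is the general exception these operations may throw.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteCopyDemo {
    // Copies the file byte for byte without interpreting the bytes.
    // Returns the number of bytes copied.
    static long copy(String from, String to) throws IOException {
        InputStream in = new FileInputStream(from);
        OutputStream out = new FileOutputStream(to); // must be a different file
        long count = 0;
        int b = in.read(); // 0..255, or -1 at the end of the file
        while (b != -1) {
            out.write(b);
            count++;
            b = in.read();
        }
        in.close();
        out.close(); // essential: flushes bytes still held in memory
        return count;
    }

    // Returns true if both files contain exactly the same bytes.
    static boolean sameContents(String fileA, String fileB) throws IOException {
        InputStream inA = new FileInputStream(fileA);
        InputStream inB = new FileInputStream(fileB);
        int x;
        int y;
        do {
            x = inA.read();
            y = inB.read();
        } while (x == y && x != -1); // stop at the first difference or at the end
        inA.close();
        inB.close();
        return x == y; // true only if both files ended at the same time
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ByteCopyDemo original.jpg copy.jpg
        if (args.length == 2) {
            long n = copy(args[0], args[1]);
            System.out.println(n + " bytes copied, identical: "
                    + sameContents(args[0], args[1]));
        }
    }
}
```

Reading one byte at a time like this is simple but slow for large files; it is meant to show the idea, not to be the fastest way to copy.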
Another program that 14:12 understands the structure of the file will be able to, for example, display the picture or play the audio.
14:21 Slide eight. 14:24 This is how we read and write binary data: we use an InputStream. There are different kinds of input streams, so we can build one that reads from a file and one that reads from a URL. The book has a nice example of reading an image from its website and saving it to a file. An InputStream has a read method. It returns an integer between 0 and 255 inclusive 15:02 if the read operation is successful; otherwise it returns the integer -1. Here -1 is used as a sentinel to say the read could not be completed, which normally means the end of the input has been reached. OutputStream has quite a few variants; the only one we'll 15:26 look at for now is java.io.FileOutputStream. The write operation is symmetrical to the read operation: it takes as its parameter a byte represented as an integer. And again, as a final reminder, do close output streams just like you close output files; otherwise you may lose data.
15:55 Slide nine. 15:59 Now we're back to text files; we're done with binary data. 16:05 So we can create a Scanner using a file: 16:12 I give it a File (not just the name as a string) and it creates a Scanner. Scanners are very general, and they can parse strings instead of files, which is why passing a plain string to the Scanner constructor scans that string rather than opening a file with that name. After all, most of the internal machinery for a Scanner is 16:25 going to be the same, because a text file is a sequence of characters 16:30 and a string is a sequence of characters. So why not apply that same machinery to give us more flexibility? Likewise, when you load a web page from a web server, that gets you a sequence of characters, and you can read that just as you would a file or a string.
17:01 Slide ten. 17:03 There are several methods, and I think we've seen all of these before, so this is just a summary and a reminder. The next() method returns a string that contains the next word.
The next word ends at the first blank following the word: the first blank or newline, or the end of the input. nextLine goes all the way to the end of the line or the end of the input, so it may return more than next would return. 17:36 And we've seen two methods to parse numbers: nextDouble parses them as floating-point numbers, and nextInt as integers. Now, for each of these four methods, we have a corresponding has method. 17:57 The has method returns a Boolean, and it's true if we are able to call, for example, in.next to get another string. If we aren't able to, it returns false. So here are the four corresponding has methods: in.hasNext(), in.hasNextLine(), in.hasNextDouble(), and in.hasNextInt(). All of these return a Boolean, even hasNextDouble and hasNextInt: they just say whether, if you try to read, you will get something. Let's think a bit about in.next. I said in.next stops reading when it reaches a blank or a newline or the end of the input. 18:50 The name for something like this is a delimiter. A delimiter delimits, which means it marks the start and the end of a word. 19:04 Now, scanners in Java are designed to be flexible: you already saw that we can read equally well from a file or a string or a URL. We can also change the delimiters. In particular, we can specify that the delimiter is a full string, and then that string is used to break up the input. Here I give you a funny string that uses the substring yes, with a space before and a space after, three times. 19:45 So when we call in.next four times, we'll get four strings, none of which contains yes, because yes is the delimiter, 19:54 so it's not returned by in.next. 20:00 There is a special way of doing this: we can use the empty string to say don't use any delimiters, just give me back one character at a time. When I call in.next, of course, in.next returns a string.
So the string that is returned will have length one and will contain the next character, whatever character that may be: it may be a letter, it may be a digit, it may be a space, it may be a newline.
20:32 Slide 12. 20:36 In the default case, we use whitespace, such as a space or a newline, as a delimiter. When we call useDelimiter, we can say: don't use just one character as a delimiter, use any one of several characters, and the characters that I want you to use are listed in square brackets. So I can say use any digit as a delimiter by calling useDelimiter with open square bracket, zero, hyphen, nine, close square bracket: [0-9]. 21:16 And [0-9] is a regular expression. 21:23 It means all the digits between zero and nine. Now you do have to be a little careful. You could just list all the digits, 0123456789, and that would be fine. But you cannot say nine hyphen zero. That's based on the way characters are ordered: zero is less than nine. So when you use the hyphen notation, you have to put the lower character first and the upper character 21:51 second. We see that below: in the example, A hyphen Z is correct; Z hyphen A would not work. So for the lowercase letters, the uppercase letters, and the digits, we always put the lower-valued character first and the higher-valued character second. 22:13 Now back to the bullet in the middle. 22:15 We can just list the characters we're interested in using as delimiters. Here I put period, comma, semicolon, and colon, which are punctuation marks. in.next will read the input until it reaches one of these marks, or of course until it reaches the end of the input. Now in the example where I gave a to z, A to Z, and zero to nine, 22:46 I put an up-arrow or caret character first. 22:53 That's used as negation in regular expressions. So it means I want the set of all characters that are not alphanumeric: I want all of those characters to be my delimiters.
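The three kinds of delimiter discussed so far can be tried out on strings with a sketch like this one. The class name DelimiterDemo, the helper tokens and the sample strings are invented for illustration. One assumption to flag: the last pattern has a + after the closing bracket, which is an addition here (the pattern described above has none) so that a run such as a comma followed by a space counts as a single delimiter instead of producing an empty string in between.

```java
import java.util.Scanner;

class DelimiterDemo {
    // Reads every token from text using the given delimiter pattern
    // and returns the tokens joined with '|' so they are easy to see.
    static String tokens(String text, String delimiter) {
        Scanner in = new Scanner(text); // a Scanner on a string, not on a file
        in.useDelimiter(delimiter);
        String result = "";
        boolean first = true;
        while (in.hasNext()) {
            if (!first) {
                result += "|";
            }
            result += in.next();
            first = false;
        }
        in.close();
        return result;
    }

    public static void main(String[] args) {
        // A whole string as the delimiter.
        System.out.println(tokens("a yes b yes c yes d", " yes ")); // a|b|c|d

        // The empty string: one character at a time, spaces included.
        System.out.println(tokens("ab c", "")); // a|b| |c

        // Anything that is not a letter or a digit is a delimiter.
        System.out.println(tokens("Hello, world! 42", "[^A-Za-z0-9]+")); // Hello|world|42
    }
}
```

Writing the range backwards, as in [9-0], makes useDelimiter fail at run time with a PatternSyntaxException.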
23:10 So for example, after that call to in.useDelimiter, space is a delimiter, all the punctuation marks we've looked at are delimiters, and question mark, exclamation mark, ampersand, the pound sign, all of those are delimiters because they're not alphanumeric. 23:33 Ah, regular expressions. This is just a very simple example. It's useful; that's why we're talking about it. Regular expressions are very general. They are used fairly widely, not just in Java: they are used in shell scripts and in languages such as Python, and they're found fairly commonly. They are theoretically sound. They've been analyzed since the early days of computing, because the people writing compilers found that using a regular expression 24:14 was a good way to formally define part of the syntax of a language.
24:25 This brings us to the idea of character classes. There's a class called Character in Java, and it has several Boolean methods that tell us whether a character belongs to a particular group of characters. For example, whitespace includes newline, tab, and space. Digits you know about. Letters are uppercase or lowercase letters, and you can also test specifically whether a character is an uppercase or a lowercase letter. 25:05 As usual with Java, there's full documentation online, so feel free to look at that. 25:15 There's another method that's good to know about. If you have a string that may have initial or final blanks and you want to get rid of those, just call trim() on that string. 25:31 That returns a new string without those initial or final blanks. In this example, the string with blanks begins with two blank characters and ends with one blank character. When we trim it, all of the blanks at the beginning and end are removed, but the blank in the middle remains.
26:00 Slide 14. 26:03 Here are a few more details about something we've looked at already. We can have parseDouble, and that's from the class Double.
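Before going on with parsing, the Character methods and trim described above can be tried with a sketch like the following. The class name CharacterDemo, the method name classify and the sample string are invented for illustration; Character.isLetter, Character.isDigit, Character.isWhitespace and String.trim are the real library methods.

```java
class CharacterDemo {
    // Counts the letters, digits and whitespace characters in text.
    // Returns the three counts in that order.
    static int[] classify(String text) {
        int letters = 0;
        int digits = 0;
        int whitespace = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                letters++;
            } else if (Character.isDigit(c)) {
                digits++;
            } else if (Character.isWhitespace(c)) {
                whitespace++;
            }
        }
        return new int[] { letters, digits, whitespace };
    }

    public static void main(String[] args) {
        String withBlanks = "  Room 101 "; // two blanks in front, one at the end
        int[] counts = classify(withBlanks);
        System.out.println(counts[0] + " letters, " + counts[1] + " digits, "
                + counts[2] + " whitespace"); // 4 letters, 3 digits, 4 whitespace

        String trimmed = withBlanks.trim(); // a new string; withBlanks is unchanged
        System.out.println("[" + trimmed + "]"); // [Room 101]
    }
}
```

The brackets in the last line are only there to make it visible that the blanks at both ends are gone while the one in the middle stays.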
We can have parseInt from the class Integer. 26:17 You should know that the 26:23 number we're parsing should fill the entire string. Double.parseDouble tolerates initial or final blanks, but Integer.parseInt does not, so trim the string first. Either way the string should not have extraneous things: it's not enough to just have the number at the beginning of the string; 26:37 the number should be the only thing in the string. 26:42 When we're using scanners, we can use hasNextInt and hasNextDouble to check before reading.
26:51 Slide 15. 26:53 A little more about text output now, and specifically about printf. Again, you've seen most of this; some of it is review, but some of it is new. I think I mentioned using %s in the format string to print a string. 27:15 I think I've mentioned it, but I don't think I've emphasized it enough. So now you know: %s prints a string. Of course, for each of these percent somethings we need a corresponding argument after the format string, and the arguments after the format string have to be in the same order as the percent commands inside the format string. %d prints a decimal integer; if you need help remembering, the d stands for decimal. %f prints a floating-point number, and the argument is usually a double, which is fine. %e, which I don't think we've seen before, is like %f but printed with the E notation, so with the exponent following the mantissa, as it's called. %g gives the system the freedom to pick between the %e notation and the %f notation, so it may print either way. And something I don't think we've looked at before: %x is the same as %d, but prints the integer in hexadecimal. By the way, if you want to print the percent sign, you just put %%, and that one does not need a matching argument following the format string. 28:45 So that's kind of another example of escaping.
28:52 Slide 16. 28:57 We don't have to use simply %d; we might say %3d. Between the percent and the format character, we can specify a size. That's a minimum size; things can always be longer than that.
If I print a number greater than 999 with %3d, all the digits are printed. 29:26 But if it's a number less than 100, then %3d will add one or two blanks on the left. 29:32 And if we want to 29:36 put the blanks to the right rather than the left of the number, we use %-3d. That, by the way, is called left justified: the number is pushed as far to the left as possible, as opposed to right justified, where the number is pushed as far as possible to the right. If we want to print an integer with leading zeros, we can use %03d. 30:03 For accounting purposes, negative numbers are sometimes written with parentheses around them, and we can get that with %(3d or %(5d. By the way, we can also use %(d, which is fine; it just doesn't specify a minimum field width. Floating-point numbers have a slightly more complicated notation: we can give a field width, and optionally we can give a point and then the number of digits that should be printed after the decimal point. 30:46 So again, 30:50 some of this is a reminder, like %5.2f, which you've seen before, but some of the others you haven't. They are good to remember. They're useful from time to time.
31:04 Slide 17. 31:07 We have seen command line arguments already. This is partly a reminder, because most of us need to hear things more than once to remember them. But it's also to give you the idea that very often command line arguments represent files. 31:31 Now, that's not true if the command line argument starts with a minus. Many commands take a -v option, or -v flag, or -v switch. All those words are used: switch, option, or flag. And -v means: hey, print more information, so I can tell what the program is doing. 31:55 Sometimes that's written with two minus signs and spelled out, as --verbose.
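Going back for a moment to the printf formats from slides 15 and 16, here is a small sketch for experimenting with them. The class name PrintfDemo, the method name row and the sample values are made up. One assumption to flag: row uses String.format, which follows the same format rules as printf but returns the text instead of printing it. The decimal separator printed by %f depends on your computer's language settings.

```java
class PrintfDemo {
    // Formats one table row: the name left justified in 10 columns,
    // the count right justified in 5, the price in 8 with 2 decimals.
    static String row(String name, int count, double price) {
        return String.format("%-10s%5d%8.2f", name, count, price);
    }

    public static void main(String[] args) {
        System.out.println(row("apples", 12, 3.5));
        System.out.println(row("kiwis", 7, 10.25));

        // Minimum width, left justification, leading zeros.
        System.out.printf("[%3d] [%-3d] [%03d]\n", 7, 7, 7); // [  7] [7  ] [007]

        // The width is only a minimum: no digits are cut off.
        System.out.printf("[%3d]\n", 12345); // [12345]

        // Parentheses for negatives, hexadecimal, and a literal percent sign.
        System.out.printf("%(d %x %d%%\n", -42, 255, 50); // (42) ff 50%

        // E notation, and %g choosing between the two notations.
        System.out.printf("%e %g\n", 12345.678, 12345.678);
    }
}
```

Try changing the widths and the values to see how the columns line up or stop lining up.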
32:04 Now, we have seen that if we don't care about the order of the arguments, we can just use the enhanced for loop to go through all the arguments and process them.
32:18 On slide 18, 32:22 I've shown a very simple program that just prints the contents of every file named on the command line. 32:32 We use an enhanced for because we process the arguments one at a time: we don't need the index, and we don't need to relate one argument to the others. 32:43 All we do with each argument is print the contents of the file whose name is that argument. 32:55 The body of printFileContents is just five lines. It basically does what we did in earlier slides today: it creates a File corresponding to the file name, and it creates a new Scanner to look at the contents of that file. 33:19 Then, while there is a next line, it prints that line. We get the line with in.nextLine after testing that in.hasNextLine returns true. Now, before you move to slide 19, I would like you to try and run this at home. It should only take you a minute. I'm guessing it will not work, but please try it. If it works, great, but I'm guessing it will not work. Please try it and see if you can understand what the error message says. It's a slightly complicated error message, but take a minute to try to understand it, because it will be explained in the next slide. Okay, stop here, pause, and come back once you've done that, and I'll continue. [blank text to give you a chance to try to compile the code on slide 18, text continues below]
34:12 Slide 19. 34:16 We find out the code won't compile, because the operation of creating a new Scanner from a File may cause an exception. In particular, if you create the Scanner on a File 34:37 whose name doesn't correspond to an actual file, because the file doesn't exist, you'll get a FileNotFoundException.
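For reference, here is a sketch of what a program like the one on slides 18 and 19 might look like. It is reconstructed from the description in this lecture, not copied from the slide, and the class name PrintFiles is an assumption. It already includes the throws FileNotFoundException declarations; if you delete them, you should see the compiler error discussed here.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class PrintFiles {
    public static void main(String[] args) throws FileNotFoundException {
        // Enhanced for: each argument is handled on its own, no index needed.
        for (String fileName : args) {
            printFileContents(fileName);
        }
    }

    // Prints every line of the named file to the screen.
    static void printFileContents(String fileName) throws FileNotFoundException {
        File inputFile = new File(fileName);
        Scanner in = new Scanner(inputFile); // may throw FileNotFoundException
        while (in.hasNextLine()) {
            System.out.println(in.nextLine());
        }
        in.close();
    }
}
```

After compiling, it can be run with any number of text file names, for example java PrintFiles input.txt notes.txt (both names are just examples).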
In Java and in many other languages, you say that the program throws an exception. I think in the next lecture we'll learn how to catch exceptions. So this is just like throwing a baseball or softball: you throw the ball for it to be caught. 35:10 In the same way, we throw an exception. 35:13 Most of the time when an exception is thrown, we will want to catch it. But we won't learn that today; we will learn that next time. In the meantime, we're still left with the question: how do we compile our program to print the files? The answer is easy: we notify the compiler that the method throws the exception. 35:38 We do that in the method header. 35:40 Right after the argument list, we say that the method throws, in this case, the file not found exception. We have to do it because we don't catch the exception anywhere, and we have to do it for both methods: if printFileContents throws the exception, then since we don't catch that exception in main, main will be throwing it also. So we need both methods to declare that they throw java.io.FileNotFoundException. With this, and again I encourage you to go back to your own development system and test it, the program should compile and run. If you give it any number of file arguments, it will actually 36:30 print their contents. I strongly encourage you to do this with text files, because if you do it with binary files, your screen will look rather messed up.
Slide 20. This is just something fun, and it's also in the book. It talks about the Caesar cipher. Caesar lived more than 2000 years ago, before mathematics was as advanced as it is now, and he came up with a clever way of writing his letters so the recipients could read them given the key, but everybody else would think they're gibberish. In particular, we simply methodically replace each letter with the letter three positions later, where the three could vary, right?
It could be 10 positions later, or whatever. But for now, let's do it with three positions later. So then hello world becomes something quite incomprehensible. When we want to decrypt that, we use the same substitution, but we go back three letters (or however many letters) instead of forward three letters. Now while this is fun, and I encourage you to experiment with it, don't use it for anything you seriously want to keep secret, because mathematics has made great strides in the last 2000 years, and these days it's relatively easy to decrypt a message encrypted with a Caesar cipher. In particular, people have studied things like letter frequencies, and here the same letter always gets encrypted as the same other letter. So if you reuse a letter, 38:29 it will always appear as the same letter in the encrypted version. You can see it in our example: hello world has three L's. All of those are encrypted as O's, and that gives someone who's trying to break your cipher a lot of hints. A special form of the Caesar cipher is to 38:59 use an offset of 13. 39:03 Because English has 26 letters, that makes decryption the same as encryption, so A becomes N and N becomes A, B becomes O and O becomes B. That's true whether we're decrypting or encrypting.
39:23 Okay, we're at the end of today's lecture material. You have seen some of these things before; only some of it is new. But I'm sure most of you needed that reminder. My apologies to those of you who didn't. But even those of you who didn't need the reminder have learned something new, so that's useful. The designers of these libraries in Java made a conscious decision to make reading from files and writing to files as much as possible the same as reading text input from the user or printing messages to the user. 40:09 This was made very similar for several reasons.
40:12 One is that they were able to write the same code just once: with only minor tweaks, it can do rather different things. The other reason is that people writing programs only have to learn one way of doing things. They don't have to learn that this is how we print a message to the user, and then a completely different way of doing things when we put text in a file. One thing you learned today is that we have lots of different ways of creating scanners. Although in the general sense they all work the same way, they can work with very different forms of input, and this can be very useful. 40:56 Similar things hold for printf. Now you have a lot of mechanisms at your disposal to read text input and produce text output, whether to or from files, or to or from the user. We can do that pretty much any way you may want to do it. I encourage you to try reading a web page from a web server. You're welcome to read any web page from this course. It's easy to do, and maybe you can do two things with the contents: one is output them to the screen, and the other is save them to a file at the same time. You can write a program to do both. That's easy enough. 41:46 That's it for today. I'll see you in class.
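If you want something to experiment with after the lecture, the Caesar cipher from slide 20 can be sketched in a few lines. This is one possible version, not the one from the book: the class name CaesarDemo and the method name shift are invented, and it only shifts the 26 unaccented English letters, leaving every other character alone.

```java
class CaesarDemo {
    // Replaces each letter with the letter `offset` positions later in the
    // alphabet, wrapping around after z. A negative offset shifts backwards.
    static String shift(String text, int offset) {
        int k = ((offset % 26) + 26) % 26; // bring the offset into 0..25
        String result = "";
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 'a' && c <= 'z') {
                c = (char) ('a' + (c - 'a' + k) % 26);
            } else if (c >= 'A' && c <= 'Z') {
                c = (char) ('A' + (c - 'A' + k) % 26);
            }
            result += c; // spaces, digits and punctuation are unchanged
        }
        return result;
    }

    public static void main(String[] args) {
        String secret = shift("hello world", 3);
        System.out.println(secret);            // khoor zruog
        System.out.println(shift(secret, -3)); // hello world

        // With an offset of 13, encrypting twice gives back the original.
        System.out.println(shift(shift("hello world", 13), 13)); // hello world
    }
}
```

Notice that all three l's in hello world come out as o's, which is exactly the kind of hint that makes this cipher easy to break.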