0:00
Hi. Today we're looking at file input and output. Input and output,
by the way, is frequently abbreviated I/O. Usually there's a slash in
between, but not always.

Why do we need files at all? After all, we have variables, we can remember
things. Now we have arrays, we can put lots of stuff in the array.

0:27
The reason we want to keep files is to remember things long term.

0:33
Because after your program is done running, everything that's in the
variables is forgotten. Variables only keep their values as long as
the program is running.

0:45
And so persistence means simply that the value remains after the program
is done running. By the way, apps that run on smartphones and mobile
devices also use files. They typically have a different kind of file system
than the one found on a desktop, but they still use files for
persistence. That's how your app can remember something, even after you
turn off your phone.

1:17
Now, when you do have files that contain data that you want to remember
for a while, it's a good idea to back them up. You're becoming a computer
scientist, computer scientists know that computer hardware can fail,
and they make appropriate backups.

1:37
So that's important.  But the other thing is you can copy files around,
you can send them to your friends, you can do lots of things with
files.

Files always have a name. That's how we can tell one file from another:
they have different names. And files may or may not have
data. Now, the overwhelming majority of files have data, but there are
files that are empty.

Today, we're going to talk about two particular types of data, text data,
which we're going to talk a lot about, and binary data, which we're only
going to talk a little bit about.

2:25
slide three.

2:28
Let's say we have a text file and in this example, and I believe the book
also calls it input.txt. But it could be any file that contains text data.

2:41
And so we create a variable of type File.  It's really just a file name:
it doesn't access the file, it only represents the file name.

2:56
Then we can create a new scanner that reads from the file with this
name. So in this case, it reads from the file input.txt. And if we
want to read the file, line by line, for example, we can have a while loop
that says while there is a next line, we read the next line and then we do
something with it. At the end, once we're done using the file, we close
the file.
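
Here is a minimal sketch of the steps just described, assuming a text
file called input.txt sits in the current directory. The class name
ReadLines and the helper readAllLines are made up for illustration, and
the throws clause is only there so the sketch compiles (exceptions come
up again on slide 19).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadLines {
    // Reads every line of the given file into a list.
    static List<String> readAllLines(File file) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        Scanner in = new Scanner(file);      // this is where the file is opened
        while (in.hasNextLine()) {           // while there is a next line...
            lines.add(in.nextLine());        // ...read it and do something with it
        }
        in.close();                          // done with the file
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        File inputFile = new File("input.txt");   // just a name; nothing is opened yet
        if (inputFile.exists()) {
            for (String line : readAllLines(inputFile)) {
                System.out.println(line);
            }
        } else {
            System.out.println("input.txt not found in the current directory");
        }
    }
}
```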

Now for files that we're reading, closing is not essential: it's good
form, and it releases the file promptly, but you won't lose any data if
you forget.

If we were writing to a file, then close is essential, because some of
the data that you write to the file is actually kept in memory. And you
don't know which part of what you wrote is still in memory and which
part is already in the file.  And so when you close the file, all that
data that's still in memory gets written back to the file.

3:59
And so closing output files is very important.

4:04
Now, we talked about closing a file, so that must mean we are also opening
the file. And that's what happens when you create the new scanner: that
opens the file. And in fact, that could fail if the file doesn't exist.

4:29
Now interestingly, the file name doesn't have to end in .txt. If you're
using a scanner, Java assumes it's a text file.

4:47
Now, on slide four,

4:54
we may want to use a dialog box. This is something you've had lots of
experience using. It is relatively simple to create a dialog box. It's called
the JFileChooser.

5:06
It's part of javax.swing, Java's Swing group of user interface libraries.
And the file chooser is fairly simple.

5:20
You put up a dialog by calling showOpenDialog.

5:27
And if that is successful, and it might not be -- the user isn't required
to choose a file, when you pop up a file chooser. If that is successful,
then you can call getSelectedFile from that same file chooser. And that
gives you a variable of type file, which again, you can use to create the
Scanner and read the file.

If you're going to create the file, if you're going to let the user
choose a file name for output, you should use showSaveDialog which is
slightly different. And it allows you to create a new file of course.
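
A small sketch of the file chooser steps, for trying on your own
machine. The class name ChooseFile and the helper selectedOrNull are
made up for illustration, and the check for a missing display is an
extra precaution that the lecture does not mention.

```java
import java.awt.GraphicsEnvironment;
import java.io.File;
import javax.swing.JFileChooser;

public class ChooseFile {
    // Turns the dialog's result code into a File, or null if the user cancelled.
    static File selectedOrNull(int result, File selected) {
        return result == JFileChooser.APPROVE_OPTION ? selected : null;
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available; cannot show a dialog.");
            return;
        }
        JFileChooser chooser = new JFileChooser();
        int result = chooser.showOpenDialog(null);   // showSaveDialog(null) for output
        File file = selectedOrNull(result, chooser.getSelectedFile());
        if (file == null) {
            System.out.println("No file chosen.");
        } else {
            System.out.println("Chosen: " + file.getPath());
        }
    }
}
```

The File that comes back can be handed to a Scanner exactly like the
File built from a name.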

6:07
And I encourage you to take a few minutes and write a small program that
does this and see what this looks like in your computer, not just mine.

6:21
Slide five.

6:24
Sometimes when we're specifying a file name, especially on a
Windows machine, we have to use backslashes as part of the filename. And
backslashes are normal characters, but when they are used in Java strings,
they are escape characters. Remember, the compiler has to convert
the string you specified in your source code to

6:54
a string used internally in the Java program. And so when the Java
compiler sees a backslash, it says, oh, the next character is special. And
so \n means for example a newline, rather than the character n or the
character \.

Now the tricky part is if you do want to have a backslash in your string,
then you have to put backslash, backslash \\. And once the compiler
has processed that, that turns out to be a single backslash: just like
\n is a new line, \\ is a single backslash in your string.

An escape character is a general concept in programming languages. It
just says the next character has to be understood differently than it
would be if the escape character didn't precede it.

7:56
But because backslash in the string has a special meaning, when we want
to put it in a string, we have to escape the backslash itself. This is
something common in programming languages. It's not just Java.

8:13
And in particular, at the very end of the slide five, I have an example
where you might want to include double quotes inside the string. But if
you put the double quote directly in the string, that's the end of the
string, right, so you have to escape the quote character.

8:32
And you do that again with a backslash, because in Java strings, backslash
is the escape character. Some languages have different escape characters,
but backslash is fairly common.
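
To see the three escapes side by side, here is a tiny sketch. The class
name Escapes, the Windows path, and the quoted sentence are all invented
for illustration.

```java
public class Escapes {
    // In the source each \\ is two characters, but the string holds ONE backslash.
    static final String WINDOWS_PATH = "C:\\temp\\input.txt";
    // \" puts a double quote inside the string without ending it.
    static final String QUOTED = "She said \"hi\"";
    // \n is a single newline character.
    static final String TWO_LINES = "first\nsecond";

    public static void main(String[] args) {
        System.out.println(WINDOWS_PATH);            // prints C:\temp\input.txt
        System.out.println(QUOTED);                  // prints She said "hi"
        System.out.println(TWO_LINES);               // prints two separate lines
        System.out.println(WINDOWS_PATH.length());   // prints 17, not 19
    }
}
```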

8:47
Slide six,

8:51
instead of just reading a text file, occasionally we might want to write
a text file. And there are many ways of doing this, but one way to
do it is using a PrintWriter. And we have to be a little bit careful
because when we open a file by creating a new PrintWriter we actually
create the file if it didn't exist, but if it did exist, opening it
removes all the contents that it already had, and gives you an empty file.

9:30
And that's fine if you really want to write to that file. But it's
not fine if that file had something important.

9:39
So you do want to be a little bit careful when you're writing, make sure
you want to write to that file.

9:46
I mentioned we do have to close output files otherwise data loss is
likely or possible.

9:58
Between the open and the close, we can print text to the file,
just the way we print it to the screen. So these two lines of code print
two lines to the file. We could of course have loops, and we could print
hundreds or thousands of lines to the file. With an infinite loop,
we can easily make a file that will fill up all our disk space.
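
A minimal sketch of open, print, close with a PrintWriter. The class
name WriteLines, the helper writeGreeting, and the two lines of text are
assumptions for illustration; the file name comes from the command line
so that nothing important gets emptied by accident.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;

public class WriteLines {
    // Creates the named file (or EMPTIES it if it already exists) and writes two lines.
    static void writeGreeting(String fileName) throws FileNotFoundException {
        PrintWriter out = new PrintWriter(fileName);   // opening truncates the file
        out.println("Hello, World!");                  // same calls as System.out
        out.println("Goodbye!");
        out.close();   // pushes any text still held in memory out to the file
    }

    public static void main(String[] args) throws FileNotFoundException {
        if (args.length == 1) {
            writeGreeting(args[0]);
        } else {
            System.out.println("usage: java WriteLines outputFile");
        }
    }
}
```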

10:28
One thing to note is not just that we are required to close output files,
but also that we should have input files and output files be different files.

10:40
And if you think about it, let's say you open the same file for input
and output. Well, as soon as you open it for output, the entire file
is empty. What happens with the input stuff? Well, if you're lucky,
the input just fails.

10:55
If you're not lucky, you might get a partial input or something. But
most likely, you'll just get nothing. So avoid using the same file for
input and output. And this is [true] at this point in your development
as programmers.

11:14
Eventually, you might use a file as a database, for example, and then
you might want to read from it and write to it at the same time. And
then you would not use PrintWriter, you would use other primitives.

11:32
Slide seven,

11:34
I mentioned this lecture is mostly about text data. But I did want to
mention binary data, and it's also in the book. And as far as the file
is concerned, there is not much difference between text data and binary
data.  Both are sequences of bytes. The main difference is, in binary data,
each byte may have any value. If a file is a text file, the bytes should
have values that are printable characters. There are many byte values
that are not printable characters. As you can see, if you try to print
a binary file to the screen, you'll get something funny like in
this example. And so binary data can be any byte value, and a byte
being eight bits, it can take values between zero and 255.

256 is too large to fit in a byte: you would need nine bits.

12:40
So if you have a text file, and you're using binary data operations,
that's fine, no problem,

12:56
because bytes used in text files are a subset of the bytes used in
binary files. But if you try to use text file operations
to read and write a binary file, you will get into trouble,

13:11
in particular, when you're trying to read.

Binary data simply means we don't know what the data means. We treat it
just as stuff we need to copy exactly, for example,

13:30
or we can check to see whether two binary files have the same content,
or whether parts of the binary file have the same contents, or whether
one binary blob is contained in the bigger binary blob, that kind of
thing. We can do comparisons, we can copy.

But binary data does have meaning, right? Images are binary data, they
have a meaning. But if we're just looking at the file as binary data,
we cannot tell what the meaning is. Now another program that

14:12
understands the structure of the file will be able to for example,
display the picture or play the audio.

14:21
slide eight.

14:24
This is how we read and write binary data: we use an input stream.
There are several ways to get an input stream. We can build an input stream
that reads from a file, or an input stream that reads from a URL. The book has
a nice example of reading an image from their website and saving it to
a file. Input streams have a read method. It returns an integer
between zero and 255 inclusive

15:02
if the read operation is successful, otherwise it returns the integer -1.
Here -1 is used as a sentinel to say the read operation cannot be
completed, it does not succeed. OutputStream has quite a few choices, the
only one we'll

15:26
look at for now is java.io.FileOutputStream. And the write operation
is totally symmetrical to the read operation: it takes as its parameter
a byte represented as an integer. And again, as a final reminder, do close
output streams just like you close output files; otherwise, you may
lose data.
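
The read and write calls can be combined into a byte-for-byte copy. This
is a sketch, not the book's example: the class name CopyBytes and the
helper copy are made up, and the two file names are expected on the
command line (they should name different files).

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytes {
    // Copies every byte from in to out and returns how many bytes were copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        long count = 0;
        int next = in.read();        // 0..255 on success, -1 is the sentinel
        while (next != -1) {
            out.write(next);         // write takes the byte as an int
            count++;
            next = in.read();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java CopyBytes source destination");
            return;
        }
        InputStream in = new FileInputStream(args[0]);
        OutputStream out = new FileOutputStream(args[1]);
        System.out.println(copy(in, out) + " bytes copied");
        in.close();
        out.close();                 // essential for output
    }
}
```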

15:55
slide nine.

15:59
Now we're back to text files, we're done with binary data.

16:05
So we can create a scanner
using a file.

16:12
I give it a file name and it creates a scanner. Scanners
are very general, and they can parse strings instead of files. After all,
most of the internal machinery for a scanner is

16:25
going to be the same, because the text file is a sequence of characters.

16:30
And a string is a sequence of characters. So why not apply that same
machinery to give us more flexibility. Likewise, when you load a web
page from a web server, that will get you a sequence of characters. And
you can read that just as you would a file or a string.
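
For example, a scanner built on a String works just like one built on a
file. A quick sketch, with the class name ScanString and the helper
words invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScanString {
    // Returns the words in the text, using the scanner's default delimiters.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Scanner in = new Scanner(text);   // the String itself is the input
        while (in.hasNext()) {
            result.add(in.next());
        }
        in.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("reading from a string"));
        // prints [reading, from, a, string]
    }
}
```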

17:01
Slide ten

17:03
There are several methods, and I think we've seen all of these before, so
this is just a summary and a reminder. The next() method returns a string
that contains the next word. And the next word ends at the first blank
following the word -- first blank or new line, or the end of the input. 
nextLine goes all the way to the end of the line or the end of the input,
so it may return more than next would return.

17:36
And we've seen two methods to parse numbers, one parses them as floating
point numbers, and the other one as integers. Now, for each of these
four methods, we have a corresponding has method.  And that

17:57
has method returns a Boolean and it's true if we are able to call
for example, in.next to get another string. If we aren't able to,
then it returns false. So here's the four corresponding has methods:
in.hasNext(), in.hasNextLine(), in.hasNextDouble(), and in.hasNextInt().
And all of these return a Boolean even hasNextDouble and hasNextInt,
they just say, if you try to read, you will get something.
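
One common use of the has methods is to test before reading. The sketch
below adds up the integers in some text and skips everything else; the
class name SumInts and the sample sentence are assumptions for
illustration.

```java
import java.util.Scanner;

public class SumInts {
    // Adds up the integers in the text, ignoring any word that is not an integer.
    static int sum(String text) {
        Scanner in = new Scanner(text);
        int total = 0;
        while (in.hasNext()) {           // is there another word at all?
            if (in.hasNextInt()) {       // would nextInt succeed?
                total += in.nextInt();
            } else {
                in.next();               // not an integer: read it and move on
            }
        }
        in.close();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("3 apples and 4 pears 10"));   // prints 17
    }
}
```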

Let's think a bit about in.next. And I said in.next stops reading when
it reaches a blank or a new line or the end of the input.

18:50
The name for something like this is a delimiter. And a
delimiter delimits, [which] means it marks the start and the end of
a word.

19:04
Now, scanners in Java are designed to be flexible, right you already saw,
we can read equally well from a file or a string or a URL. And we can
change the delimiters. In particular, we can specify that the delimiter is
a full string. So then the string is used to break up the input. Here
I give you a funny string that uses the substring yes, with a space
before and a space after, uses that string three times.

19:45
And so when we call in.next [4 times], we'll get four strings, none of
which contains yes, because yes is the delimiter.

19:54
So it's not returned by in.next

20:00
There is a special way of doing this, we can use the empty string to
say don't use any delimiters -- just give me back one character at a
time. When I call in.next, of course, in.next returns a string. And so,
the string that is returned will have length one and will contain the
next character, whatever [character] that may be: it may be a letter, it
may be a digit, it may be a space, it may be a new line.
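
Both kinds of delimiter can be tried out on a String. This is a sketch:
the class name StringDelimiters, the helper tokens, and the sample
strings are made up, and the sample is not necessarily the string on the
slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class StringDelimiters {
    // Splits the text into tokens using the given delimiter.
    static List<String> tokens(String text, String delimiter) {
        List<String> result = new ArrayList<>();
        Scanner in = new Scanner(text);
        in.useDelimiter(delimiter);
        while (in.hasNext()) {
            result.add(in.next());
        }
        in.close();
        return result;
    }

    public static void main(String[] args) {
        // " yes " (spaces included) appears three times, so there are four tokens,
        // and none of them contains the delimiter.
        System.out.println(tokens("one yes two yes three yes four", " yes "));
        // The empty delimiter hands back one character at a time, blanks included.
        System.out.println(tokens("a b", ""));
    }
}
```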

20:32
Slide 12.

20:36
In the default case, we use whitespace (a space, a tab, or a new line) as
a delimiter. When we call useDelimiter, we can say don't just use one
character as a delimiter. Use any one of several characters, and the
characters that I want you to use are listed in square brackets. So I can
say use any digit as a delimiter by saying useDelimiter of open square
bracket, zero, hyphen, nine, close square bracket, that is, "[0-9]".

21:16
And zero to nine is a regular expression.

21:23
It says all the digits between zero and nine. Now you do have to be a
little careful, you could just list all the digits 0123456789, that will
be fine. But you cannot say nine hyphen zero. And that's based on the
way characters are defined, zero is less than nine. So when you use
the hyphen notation, you have to put the lower character first and the
upper character

21:51
later, we see that below. In the example A to Z, a hyphen z is correct,
z hyphen a would not work. And so in the lowercase, the uppercase, and the
digits, we always put the lower valued character first and the higher
value character second.

22:13
Now back to the bullet in the middle.

22:15
We can just list the characters we're interested in using as a
delimiter. Here I put period comma semicolon and then colon, which are
punctuation marks. And in.next will read the entire string until it reaches
one of these marks or of course until it reaches the end of the input. Now
in the example where I gave a to z, A to Z, and zero to nine,

22:46
I put an up-arrow or caret character first.

22:53
And that's used as a negation in regular expressions. So it means I want
the set of all characters that are not alphanumeric, I want the set of
all those characters to be my delimiters.

23:10
So for example, after that call to in.useDelimiter, space is a delimiter,
all the punctuation marks we've looked at are delimiters, question mark,
exclamation mark, ampersand, the pound sign, all of those are delimiters
because they're not alphanumeric.
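
Here is a sketch of the three kinds of bracket expression. The class
name CharSetDelimiters and the sample strings are invented. One
assumption to flag loudly: each pattern below ends in a +, which is not
in the lecture. It makes a run of several delimiter characters count as
one delimiter; without it, the scanner returns empty strings between
neighboring delimiters such as a comma followed by a space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CharSetDelimiters {
    // Splits the text into tokens; delimiterRegex is a regular expression.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        Scanner in = new Scanner(text);
        in.useDelimiter(delimiterRegex);
        while (in.hasNext()) {
            result.add(in.next());
        }
        in.close();
        return result;
    }

    public static void main(String[] args) {
        // Any digit is a delimiter.
        System.out.println(tokens("a1b22c", "[0-9]+"));
        // Any of the listed punctuation marks is a delimiter.
        System.out.println(tokens("x.y;z", "[.,;:]+"));
        // The caret negates: anything that is NOT a letter or digit is a delimiter.
        System.out.println(tokens("Hello, World! It's 42.", "[^A-Za-z0-9]+"));
    }
}
```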

23:33
Ah, regular expressions. This is just a very simple example. It's
useful. That's why we're talking about it. Regular expressions are very
general. They are used fairly widely, not just in Java, they are used in
shell scripts. They're used in languages such as Python, they're found
fairly commonly. They are theoretically sound. They've been analyzed
since the early days of computing because the people who are writing
compilers found that using a regular expression

24:14
was a good way to formally define part of the syntax of a language.

24:25
This brings us to the idea of character classes. And there's a class
called Character in Java, and it has several Boolean methods that tell
us whether a character belongs to a particular group of characters. For
example, whitespace includes newline, tab, and space. Digits you
know about.  Letters are upper or lowercase letters, or you can test
specifically whether a character is an uppercase or lowercase letter.
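
A short sketch of those Boolean methods in use. The class name
CharClasses and the helper describe are made up for illustration; the
Character methods themselves are the standard ones.

```java
public class CharClasses {
    // Names the group a character belongs to, checking the narrower groups first.
    static String describe(char ch) {
        if (Character.isWhitespace(ch)) {
            return "whitespace";
        } else if (Character.isDigit(ch)) {
            return "digit";
        } else if (Character.isUpperCase(ch)) {
            return "uppercase letter";
        } else if (Character.isLowerCase(ch)) {
            return "lowercase letter";
        } else if (Character.isLetter(ch)) {
            return "letter";
        } else {
            return "other";
        }
    }

    public static void main(String[] args) {
        String sample = "Hi 5!";
        for (int i = 0; i < sample.length(); i++) {
            char ch = sample.charAt(i);
            System.out.println("'" + ch + "' is " + describe(ch));
        }
    }
}
```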

25:05
There is, as usual with Java, there's full documentation online. So
feel free to look at that.

25:15
There's also another method that's good to know about. If you have a
string, and it may have initial or final blanks and you want to get rid
of those, just call trim() on that string.

25:31
And that returns a new string without those initial or final blanks. In
this example, the string with blanks begins with two blank characters and
ends with one blank character. And when we trim it, all of those blanks
at the beginning and end are removed, but the blank in the middle remains.
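
In code, that looks something like this sketch. The class name TrimDemo,
the helper bracketed, and the sample text are assumptions; the brackets
are only there to make the blanks visible.

```java
public class TrimDemo {
    // Returns the trimmed text wrapped in brackets so the blanks are easy to see.
    static String bracketed(String text) {
        return "[" + text.trim() + "]";
    }

    public static void main(String[] args) {
        String withBlanks = "  hello world ";       // two blanks before, one after
        System.out.println("[" + withBlanks + "]"); // prints [  hello world ]
        System.out.println(bracketed(withBlanks));  // prints [hello world]
    }
}
```

Note that trim returns a new string; the original string is unchanged.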

26:00
Slide 14.

26:03
This is a few more details about something we've looked at already. We
can have parseDouble, and that's from the class Double. We can have
parseInt from the class Integer.

26:17
You should know that the

26:23
number that we're parsing should fill the entire string. For parseDouble,
the string may have initial or final blanks, but parseInt does not accept
them, so trim the string first. And the string should not have extraneous
things: it's not enough to just have the number at the beginning of the
string,

26:37
the number should pretty much be the only thing in the string.

26:42
When we're using scanners, we can use hasNextInt, and hasNextDouble.
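
A small sketch of the two parse methods. The class name ParseNumbers and
the helpers toInt and toDouble are invented. Trimming before parseInt is
a safe habit, because Integer.parseInt in particular rejects surrounding
blanks.

```java
public class ParseNumbers {
    // Integer.parseInt rejects surrounding blanks, so trim first.
    static int toInt(String text) {
        return Integer.parseInt(text.trim());
    }

    // Double.parseDouble ignores leading and trailing whitespace on its own.
    static double toDouble(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        System.out.println(toInt(" 42 ") + 1);       // prints 43
        System.out.println(toDouble(" 3.5 ") * 2);   // prints 7.0
        // toInt("42 apples") would stop the program with a NumberFormatException,
        // because the number is not the only thing in the string.
    }
}
```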

26:51
Slide 15.

26:53
A little more about text output now, and specifically about printf. Again,
you've seen most of this, some of it is a review, but some of it is new. I
think I mentioned using %s in the format string
to print the string.

27:15
I think I've mentioned it, but I don't think I've emphasized it
enough. So now you know %s to print the string. And of course, for
each of these percent somethings we need the corresponding argument after
the format string. And the arguments after the format string have to be in
the same order as the percent commands inside the format string.

%d prints a decimal integer. If you need to remember, the d stands for
decimal. %f is a floating point number. And the argument is usually a
double, that's fine. %e, I don't think we've seen before, is like %f,
but printed here with the E notation, so with the exponent following
the mantissa, as it's called. %g gives the system the freedom to pick
between the %e notation and the %f notation. So it may print it either
way.

And something I don't think we've looked at before, %x is the
same as %d, but prints it in hexadecimal.

And by the way, if you want to print the percent sign, you just put %%,
then that one does not need a matching argument following the format
string.

28:45
So kind of another example of escaping.
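
The sketch below puts most of these format characters into one format
string. It uses String.format, which takes the same format strings as
printf but returns the text instead of printing it. The class name
FormatDemo, the helper describe, and the sample values are assumptions.
Whether the decimal separator is a point or a comma depends on the
computer's language settings.

```java
public class FormatDemo {
    // One argument per percent command, in the same order; %% takes no argument.
    static String describe(String name, int count, double price) {
        return String.format(
                "%s: %d items (%x in hex) at %f each, or %e in E notation, 100%% done",
                name, count, count, price, price);
    }

    public static void main(String[] args) {
        System.out.println(describe("Ada", 255, 2.5));
        // %g lets the system pick between the %f and %e styles.
        System.out.printf("%g and %g%n", 2.5, 0.00000025);
    }
}
```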

28:52
slide 16.

28:57
We don't simply use %d, we might say %3d. So between the percent and
the format character, we can specify a size. And that's a minimum size,
things can always be longer than that. If I print a number greater than
999 with %3d, all the digits are printed.

29:26
But if it's a number less than 100, then %3d will add one or
two blanks.

29:32
And if we want to

29:36
put the blanks to the right rather than the left of the number we use %-3d.
And that by the way is called left justified, the number is pushed as far to
the left as possible. As opposed to right justified, where the number is
pushed as far as possible to the right.

If we want to print an integer with leading zeros we can use %03d.

30:03
For accounting purposes, sometimes negative numbers are written with
parentheses around them, and we can make it %(3d or %(5d. And by the way,
we can use %(d, that's fine, it won't specify a minimum
field width. Floating point numbers have a slightly more complicated notation,
we can give a field width and optionally, we can give a point and then
the number of digits that should be printed after the decimal point.

30:46
So this is all stuff, again, this should be a reminder,

30:50
some of it is a reminder that %5.2f you've seen before, but some of the
others you haven't. So those are good to remember. They're useful from
time to time.

31:04
Slide 17.

31:07
We have seen command line arguments already. This is partly
a reminder, because most of us need to hear things more than once to
remember them. But it's also to give you the
idea that very often command line arguments represent files.

31:31
Now, that's not true if we start the command line argument with a minus.
Many commands take a minus v option, or minus v flag, or minus v switch.
All those words are used: switch, option, or flag. And minus v (-v) means
Hey, print more information, so I can tell what the program is doing.

31:55
Sometimes that's written with two minus signs, and then it's spelled
out, as --verbose.

32:04
Now, we have seen that if we don't care about the order of the arguments,
we can just use the enhanced for loop to go through all the arguments
and process them.
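
As a sketch, the enhanced for loop can separate flags from file names
like this. The class name Args and the helpers isVerbose and fileNames
are invented, and treating every argument that starts with a minus as a
flag is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class Args {
    // True if -v or --verbose appears anywhere among the arguments.
    static boolean isVerbose(String[] args) {
        for (String arg : args) {
            if (arg.equals("-v") || arg.equals("--verbose")) {
                return true;
            }
        }
        return false;
    }

    // Every argument that does not start with a minus is taken to be a file name.
    static List<String> fileNames(String[] args) {
        List<String> names = new ArrayList<>();
        for (String arg : args) {
            if (!arg.startsWith("-")) {
                names.add(arg);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("verbose: " + isVerbose(args));
        for (String name : fileNames(args)) {
            System.out.println("file: " + name);
        }
    }
}
```

Running it as java Args -v a.txt b.txt reports verbose: true and lists
the two file names.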

32:18
And on slide 18,

32:22
I've shown a very simple program to just print the contents of every
file named on the command line.

32:32
And we use an enhanced for because we just process the arguments one at
a time, we don't need the index, we don't need to relate one argument
to the others.

32:43
And all we do with each argument is we print the contents of the file
whose name is that argument.

32:55
The body of printFileContents is just five lines. It basically does what
we did in earlier slides today, it creates a File corresponding to the
file name, it creates a new scanner to look at the contents of that file.

33:19
And while there is a new line, print that line. That line, we get with
in.nextLine after testing that in.hasNextLine returns true. Now,
before you move to slide 19, I would like you to try and run this at
home. And it should only take you a minute. And I'm guessing it will not
work. But please try it. If it works, it will be great, but I'm guessing
it will not work. And please try it and see if you can understand what the
error message says. It's a slightly complicated error message. But take
a minute to try and understand that because it will be explained in the
next slide. Okay, stop here, pause and then come back once you've done
that, and I'll continue.

[blank text to give you a chance to try to compile the code on slide 18,
text continues below]


34:12
slide 19,

34:16
We find out the code won't compile because the operation of creating
a new scanner from a file name (or the type File in Java) may cause an
exception. In particular, if you create the scanner on a file

34:37
with the file name that doesn't correspond to an actual file, because
the file doesn't exist, you'll get the file not found exception. And
in particular, in Java and in many languages, you say that the program
throws an exception. Now, I think in the next lecture, we'll learn how
to catch exceptions. So this is just like throwing a baseball or softball,
you throw the ball for it to be caught.

35:10
In the same way we throw an exception.

35:13
Most of the time the exception is thrown, we will want to catch it. But
we won't learn that today, we will learn that next time.

On the other hand, we're still left with how do we compile our program to
print the files? And the answer is, it's easy, we just tell the compiler,
we notify the compiler that the method throws the exception.

35:38
And we do that in the method header.

35:40
Right after the argument list, we say that the method throws, in this
case, the file not found exception. We have to do it because we don't
catch the exception anywhere, and we have to do it for both methods.
Because if printFileContents throws the exception,
then since we don't catch that exception in main, main will be
throwing it also. So we need both methods to declare that they throw
java.io.FileNotFoundException.  And with this, and again, I encourage you
to go back to your own development system and test it, this should
compile and this should run. And if you give it any number of file arguments,
it will actually

36:30
print their content. And I strongly encourage you to do this with text
files because if you do it with binary files, your screen will look
rather messed up.
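
Putting slides 18 and 19 together, the program might look like the
sketch below. This is a reconstruction from the description, not a copy
of the slides; the class name PrintFiles is an assumption, while
printFileContents is the method name used in the lecture.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class PrintFiles {
    // main does not catch the exception, so it must declare it too.
    public static void main(String[] args) throws FileNotFoundException {
        for (String fileName : args) {     // enhanced for: no index needed
            printFileContents(fileName);
        }
    }

    // Prints every line of the named file to the screen.
    static void printFileContents(String fileName) throws FileNotFoundException {
        File file = new File(fileName);
        Scanner in = new Scanner(file);    // throws if the file does not exist
        while (in.hasNextLine()) {
            System.out.println(in.nextLine());
        }
        in.close();
    }
}
```

Removing either throws clause brings back the compile error from slide 18.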

Slide 20

this is just something fun. And it's also in the book. It talks
about the Caesar cipher. And Caesar lived more than 2000 years ago,
before mathematics was as advanced as it is now.  And he came up with
a clever way of writing his letters, so the recipients could read them
given the key. But everybody else would think they're gibberish. And
in particular, we simply methodically replace each letter with the
letter three positions later, where the three could vary, right? It
could be 10 positions later, or whatever. But for now, let's do it
with three positions later. So then hello world becomes something
very incomprehensible.

When we want to decrypt that, we use the same substitution, but we go
back three letters (or however many letters) instead of forward three
letters.

Now while this is fun, and I encourage you to experiment with this,
don't use it for anything you seriously want to keep secret, because
mathematics has made great strides in the last 2000 years. And these
days, it's relatively easy to decrypt a message encrypted with a Caesar
cipher. And in particular, people have studied things like letter
frequencies, and here the same letter always gets encrypted with the
same other letter. So if you reuse a letter

38:29
that will appear always as the same letter in the encrypted version. And
you can see it in our example, hello world has three L's. All of those
are encrypted as O's, and that gives someone who's trying to break your
cipher a lot of hints.  A special form of the Caesar Cipher is to

38:59
use an offset of 13.

39:03
Because English has 26 letters, that makes decryption the same as
encryption, so A becomes N, and N becomes A.  B becomes O and O
becomes B. And that's true whether we're decrypting or encrypting.
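
Here is a sketch of the cipher for experimenting. It is not the book's
code: the class name Caesar and the method shift are made up, and
leaving everything except the 26 English letters unchanged is a design
choice of this sketch.

```java
public class Caesar {
    // Shifts each English letter by offset positions, wrapping around after z.
    // A negative offset shifts backward, which is how we decrypt.
    static String shift(String text, int offset) {
        int k = ((offset % 26) + 26) % 26;   // bring the offset into 0..25
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'a' && ch <= 'z') {
                result.append((char) ('a' + (ch - 'a' + k) % 26));
            } else if (ch >= 'A' && ch <= 'Z') {
                result.append((char) ('A' + (ch - 'A' + k) % 26));
            } else {
                result.append(ch);           // blanks, digits, punctuation stay put
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(shift("hello world", 3));     // prints khoor zruog
        System.out.println(shift("khoor zruog", -3));    // prints hello world
        // With an offset of 13, applying the shift twice gets the original back.
        System.out.println(shift(shift("hello world", 13), 13));
    }
}
```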

39:23
Okay, we're at the end of today's lecture material. You have seen some
of these things before, only some of it is new. But I'm sure most of
you needed that reminder. My apologies to those of you who didn't. But
even those of you who didn't need the reminder, you've learned something
new. So that's useful.  The designers of these libraries in Java made a
conscious decision to make reading from files and writing to files very
much -- or as much as possible -- the same as reading text input from the user
or printing messages to the user.

40:09
So this is made very similar for several reasons.

40:12
One is they were able to write the same code just once: with only minor
tweaks, it can do rather different things. But the other reason is, people
writing programs, they only have to learn one way of doing things. They
don't have to learn: this is how we print the message to the user, and
we have a completely different way of doing things when we put things
in a file, when we put text in the file. One thing you learned today is
we have lots of different ways of creating scanners. And although they,
in some sense, in the general sense all work the same way, they can work
with very different forms of input. And this can be very useful.

40:56
And similarly for printf.

Now you have a lot of mechanisms at your disposal to read text input
and produce text output, whether from files or to files, from the
user or to the user.  You can do that pretty much any way you
may want to.

I encourage you to try reading a web page from a web server. And you're
welcome to read any web page from this course. And it's easy to do and
maybe you can do two things with the contents. One is output it to the
screen and the other is save it to a file at the same time. You can write
a program to do both. That's easy enough.
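
If you want a starting point, here is one possible sketch. The class
name SavePage and the helper copyLines are made up, and the web address
and output file name are taken from the command line rather than
assumed. The scanner reads from the stream that the URL opens, exactly
as it would read from a file.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URI;
import java.net.URL;
import java.util.Scanner;

public class SavePage {
    // Prints each line from the URL to the screen and also writes it to out.
    // Returns the number of lines copied.
    static int copyLines(URL source, PrintWriter out) throws IOException {
        Scanner in = new Scanner(source.openStream());
        int count = 0;
        while (in.hasNextLine()) {
            String line = in.nextLine();
            System.out.println(line);    // to the screen
            out.println(line);           // and to the file
            count++;
        }
        in.close();
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java SavePage url outputFile");
            return;
        }
        URL page = URI.create(args[0]).toURL();
        PrintWriter out = new PrintWriter(args[1]);
        copyLines(page, out);
        out.close();                     // essential: flushes the saved copy
    }
}
```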

41:46
That's it for today. I'll see you in class.