Java IOs

https://github.com/heig-vd-dai-course

L. Delafontaine and H. Louis, with the help of GitHub Copilot.

Based on the original course by O. Liechti and J. Ehrensberger.

This work is licensed under the CC BY-SA 4.0 license.

HEIG-VD - DAI Course 2024-2025 - CC BY-SA 4.0

Objectives

  • Know the different types of data (binary vs. text)
  • Understand the abstract notion of sources, streams and sinks
  • Use the different IO types for different use-cases
  • Use the Java IO API to read and write files

Prepare and set up your environment

More details for this section in the course material. You can find other resources and alternatives as well.

Check and run the code examples

  • Check the code examples
  • Run the code examples
  • Use them to help understand the concepts
  • Play with the code examples

Sources, streams and sinks of data

More details for this section in the course material. You can find other resources and alternatives as well.

  • Abstraction of data flow
  • Source: where data comes from (input)
  • Sink: where data goes to (output)
  • Stream: data flows between source and sink
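
The abstraction above can be sketched in a few lines. This is a minimal illustration, not part of the course examples: the class name SourceToSinkExample and the copy helper are hypothetical, and in-memory streams stand in for a real file or network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class SourceToSinkExample {

  // Hypothetical helper: moves every byte from the source to the sink
  // and returns how many bytes were copied
  static long copy(InputStream source, OutputStream sink) throws IOException {
    long count = 0;
    int b;
    while ((b = source.read()) != -1) {
      sink.write(b);
      count++;
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    // In-memory source and sink: the copy code does not care where the data lives
    InputStream source = new ByteArrayInputStream(new byte[] {1, 2, 3});
    ByteArrayOutputStream sink = new ByteArrayOutputStream();

    long copied = copy(source, sink);
    System.out.println(copied + " bytes copied"); // prints "3 bytes copied"
  }
}
```

Because copy only depends on InputStream and OutputStream, the same code would work unchanged with a FileInputStream as the source or a FileOutputStream as the sink.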

The Java IO API

More details for this section in the course material. You can find other resources and alternatives as well.

  • Part of java.base module
    • java.io package
    • java.nio package
  • Different classes for different IO types:
    • Binary data
    • Text data

Types of data

More details for this section in the course material. You can find other resources and alternatives as well.

  • Two types of data:
    • Binary
    • Text
  • Both are 0s and 1s - the difference is in interpretation:
    • Binary data - raw data
    • Text data - interpretation
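
To make the distinction concrete, the sketch below prints the same five bytes twice: once as raw values and once interpreted as UTF-8 text. The class and method names are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryVsTextExample {

  // Binary view: the bytes as they are
  static String asRawBytes(byte[] data) {
    return Arrays.toString(data);
  }

  // Text view: the same bytes interpreted through a character encoding
  static String asText(byte[] data) {
    return new String(data, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] data = {72, 101, 108, 108, 111};

    System.out.println(asRawBytes(data)); // prints "[72, 101, 108, 108, 111]"
    System.out.println(asText(data)); // prints "Hello"
  }
}
```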

Processing binary data with the Java IO API

More details for this section in the course material. You can find other resources and alternatives as well.

  • Most basic type of data processing:
    1. Open a file
    2. Read/write/modify the bytes as they are
    3. Close the file

Reading binary data

  • The simplest way is to read byte by byte (not efficient)
  • The InputStream and FileInputStream classes are used to read binary data
  • Let's have a look at the code example BinaryReadFileExample

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class BinaryReadFileExample {

  public static void main(String[] args) throws IOException {
    InputStream fis = new FileInputStream("binary-file.bin");

    // read() returns the next byte as an int (0-255), or -1 at the end of the file
    int b;
    while ((b = fis.read()) != -1) {
      System.out.print(b);
    }

    fis.close();
  }
}

Writing binary data

  • The simplest way is to write byte by byte (not efficient)
  • The OutputStream and FileOutputStream classes are used to write binary data
  • Let's have a look at the code example BinaryWriteFileExample

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class BinaryWriteFileExample {

  public static void main(String[] args) throws IOException {
    OutputStream fos = new FileOutputStream("binary-file.bin");

    // Write the 256 possible byte values (0 to 255), one byte per call
    for (int i = 0; i < 256; i++) {
      fos.write(i);
    }

    fos.close();
  }
}

Reading and writing binary data with buffers

  • Reading and writing byte by byte is not efficient: each read() or write() call can result in a system call
  • Buffers can be used to read or write multiple bytes at once

Use a buffer to read multiple bytes at once:

  1. First time, a system call is made to read a block of data
  2. Subsequent reads are done from the buffer
  3. When the buffer is empty, a new block is read
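
The effect of these steps can be observed with a small experiment. The sketch below is an illustration under assumptions: CountingInputStream is a hypothetical helper written for this example, and an in-memory stream stands in for a file, so the counter tracks calls to the underlying stream rather than real system calls.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedReadCountExample {

  // Hypothetical helper: counts how often the underlying source is asked for data
  static class CountingInputStream extends FilterInputStream {
    int calls = 0;

    CountingInputStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      calls++;
      return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      calls++;
      return super.read(b, off, len);
    }
  }

  // Reads everything byte by byte and returns the number of calls made to the source
  static int countSourceCalls(int size, boolean buffered) throws IOException {
    CountingInputStream source =
        new CountingInputStream(new ByteArrayInputStream(new byte[size]));
    InputStream in = buffered ? new BufferedInputStream(source) : source;

    while (in.read() != -1) {
      // Nothing to do: only the number of calls matters here
    }

    return source.calls;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("Unbuffered: " + countSourceCalls(10_000, false) + " calls");
    System.out.println("Buffered: " + countSourceCalls(10_000, true) + " calls");
  }
}
```

Without a buffer, every read() reaches the source. With a buffer, the source is only asked for a new block when the buffer is empty, so the number of calls drops from thousands to a handful.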

The same applies for writing:

  1. First time, a buffer is created
  2. Data is written to the buffer
  3. When the buffer is full, a system call is made to write the block
  4. The buffer is then emptied
  5. Bytes can remain in the buffer
    • A flush might be needed to empty the buffer
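
The last point is easy to observe. In the sketch below (class and method names are made up for this illustration), an in-memory sink makes it possible to check how many bytes have actually arrived before and after the flush.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class FlushExample {

  // Returns the number of bytes that reached the sink before and after the flush
  static int[] sizesAroundFlush() throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    BufferedOutputStream bos = new BufferedOutputStream(sink);

    bos.write(1);
    bos.write(2);
    bos.write(3);

    // The three bytes are still waiting in the buffer
    int before = sink.size();

    bos.flush();

    // The flush pushed the buffered bytes to the sink
    int after = sink.size();

    bos.close();
    return new int[] {before, after};
  }

  public static void main(String[] args) throws IOException {
    int[] sizes = sizesAroundFlush();
    System.out.println("Before flush: " + sizes[0]); // prints "Before flush: 0"
    System.out.println("After flush: " + sizes[1]); // prints "After flush: 3"
  }
}
```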

  • BufferedInputStream and BufferedOutputStream classes are used to read/write binary data with buffers
  • Let's have a look at the code examples BinaryBufferReadFileExample and BinaryBufferWriteFileExample

   public static void main(String[] args) throws IOException {
     InputStream fis = new FileInputStream("binary-file.bin");
+    InputStream bis = new BufferedInputStream(fis);

     // -1 indicates the end of the file
     int b;
-    while ((b = fis.read()) != -1) {
+    while ((b = bis.read()) != -1) {
       System.out.print(b);
     }

-    fis.close();
+    // Closing the BufferedInputStream automatically closes the FileInputStream
+    bis.close();
   }

   public static void main(String[] args) throws IOException {
     OutputStream fos = new FileOutputStream("binary-file.bin");
+    OutputStream bos = new BufferedOutputStream(fos);

     for (int i = 0; i < 256; i++) {
-      fos.write(i);
+      bos.write(i);
     }

-    fos.close();
+    // Flush the buffer to write the remaining bytes
+    bos.flush();
+    bos.close();
   }

A quick note on little endian vs. big endian

  • Little endian: least significant byte first
  • Big endian: most significant byte first
  • Java uses big endian by default
  • The class ByteBuffer can be used to convert between the two (not covered in this course)
  • Example: the 32-bit value 0x12345678
    • Little endian: 0x78 0x56 0x34 0x12
    • Big endian: 0x12 0x34 0x56 0x78
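
Although ByteBuffer is not covered in this course, a short sketch can confirm the example above. The class name and the toHex helper are hypothetical; ByteBuffer.order and ByteOrder are part of java.nio.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndiannessExample {

  // Hypothetical helper: encodes an int with the given byte order
  // and formats the resulting bytes as hexadecimal values
  static String toHex(int value, ByteOrder order) {
    byte[] bytes = ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();

    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("0x%02X", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints "0x12 0x34 0x56 0x78"
    System.out.println(toHex(0x12345678, ByteOrder.BIG_ENDIAN));
    // prints "0x78 0x56 0x34 0x12"
    System.out.println(toHex(0x12345678, ByteOrder.LITTLE_ENDIAN));
  }
}
```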

Processing text data with the Java IO API

More details for this section in the course material. You can find other resources and alternatives as well.

  • Text data: interpretation of binary data
  • Different character encodings
  • Different end of line characters
  • Different IO classes for text data

Ancestor of character representations: ASCII

  • ASCII: 128 values (7 bits)
  • Mapping binary to characters
  • Published in 1963 and meant for English

Extended ASCII: code pages

  • Extended ASCII (code pages)
  • Support for more characters using the remaining 128 values

Unicode

  • Unicode: solves ASCII limitations
  • Standard to support all languages
  • Different encodings:
    • UTF-8
    • UTF-16
    • UTF-32

UTF-8

  • UTF-8: variable-length encoding
  • Most common Unicode encoding
  • ASCII compatible
  • The de facto standard for the web and software development
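
"Variable-length" means that a character takes between one and four bytes depending on its code point. The sketch below (class and method names are made up for this illustration) measures a few characters; Unicode escapes are used so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthExample {

  // Number of bytes needed to store the given text in UTF-8
  static int utf8Length(String text) {
    return text.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    System.out.println(utf8Length("A")); // ASCII letter: prints 1
    System.out.println(utf8Length("\u00E9")); // e with acute accent: prints 2
    System.out.println(utf8Length("\u20AC")); // euro sign: prints 3
    System.out.println(utf8Length("\uD83D\uDE00")); // emoji (U+1F600): prints 4
  }
}
```

The first line also shows the ASCII compatibility: an ASCII character is stored as the same single byte in both encodings.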

What happens if you ignore the character encoding?

  • The character encoding is not stored in the file itself
  • Reading a file with the wrong encoding leads to garbled characters
  • Check, compile and run the TextCharacterEncodingsExample code example!
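
As a complement to the course example, here is a minimal sketch of such a misinterpretation. It is not the TextCharacterEncodingsExample itself: the class name and the decodeWith helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchExample {

  // Encodes the text with one charset and decodes the resulting bytes with another
  static String decodeWith(String text, Charset writtenWith, Charset readWith) {
    byte[] bytes = text.getBytes(writtenWith);
    return new String(bytes, readWith);
  }

  public static void main(String[] args) {
    String original = "\u00E9"; // e with acute accent

    String same = decodeWith(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
    String wrong = decodeWith(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    System.out.println(same.equals(original)); // prints "true"
    System.out.println(wrong.length()); // prints 2: each byte became its own character
  }
}
```

The two UTF-8 bytes 0xC3 0xA9 are each valid ISO-8859-1 characters, so no error is raised: the text is silently corrupted.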

Reading and writing text data

  • Reader and Writer classes are used to read/write text data
  • Always specify the encoding! If not set, it can be incompatible with other systems
  • Let's have a look at the code example TextReadAndWriteFileExample

import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class TextReadAndWriteFileExample {

  public static void main(String[] args) throws IOException {
    Reader reader = new FileReader("file.java", StandardCharsets.UTF_8);
    Writer writer = new FileWriter("file.txt", StandardCharsets.UTF_8);

    // read() returns the next character as an int, or -1 at the end of the file
    int c;
    while ((c = reader.read()) != -1) {
      writer.write(c);
    }

    writer.close();
    reader.close();
  }
}

  • Just like with binary data, reading and writing text data character by character is not efficient
  • BufferedReader and BufferedWriter classes are used to read/write text data with buffers
  • Let's have a look at the code example TextBufferReadAndWriteFileExample

   public static void main(String[] args) throws IOException {
     Reader reader = new FileReader("file.java", StandardCharsets.UTF_8);
+    BufferedReader br = new BufferedReader(reader);
+
     Writer writer = new FileWriter("file.txt", StandardCharsets.UTF_8);
+    BufferedWriter bw = new BufferedWriter(writer);

     // -1 indicates the end of the file
     int c;
-    while ((c = reader.read()) != -1) {
-      writer.write(c);
+    while ((c = br.read()) != -1) {
+      bw.write(c);
     }

-    writer.close();
-    reader.close();
+    // Flush the buffer to write the remaining bytes
+    bw.flush();
+    bw.close();
+    br.close();
   }

End of line characters

  • Different end of line characters on different systems
    • Unix/Linux/macOS: \n, called "Line feed" (LF)
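    • Windows: \r\n, called "Carriage Return + Line feed" (CR+LF)

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class TextEndOfLineCharactersExample {

  public static final String END_OF_LINE = "\n";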

  public static void main(String[] args) throws IOException {
    InputStream is = new FileInputStream("file.java");
    Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
    BufferedReader br = new BufferedReader(reader);

    OutputStream os = new FileOutputStream("file.txt");
    Writer writer = new OutputStreamWriter(os, StandardCharsets.UTF_8);
    BufferedWriter bw = new BufferedWriter(writer);

    String line;
    while ((line = br.readLine()) != null) {
      // Careful: line does not contain end of line characters
      bw.write(line + END_OF_LINE);
    }

    // Closing the outermost wrappers flushes the buffer and closes the underlying streams
    bw.close();
    br.close();
  }
}
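
The comment about readLine() deserves a quick check. The sketch below (class and method names are made up for this illustration) reads a text that mixes both conventions from an in-memory StringReader: readLine() accepts either line ending and returns the lines without it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadLineExample {

  // Splits the text into lines the same way BufferedReader.readLine() does
  static List<String> lines(String text) throws IOException {
    List<String> result = new ArrayList<>();

    try (BufferedReader br = new BufferedReader(new StringReader(text))) {
      String line;
      while ((line = br.readLine()) != null) {
        result.add(line);
      }
    }

    return result;
  }

  public static void main(String[] args) throws IOException {
    // One Windows line ending (\r\n) and one Unix line ending (\n)
    System.out.println(lines("first\r\nsecond\nthird")); // prints "[first, second, third]"
  }
}
```

Since the original line endings are lost while reading, the program has to decide explicitly which end of line characters to write back.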

Dealing with errors

  • All kinds of errors can occur when reading/writing files
  • Files must be properly opened and closed
  • Ensure no files are corrupted
  • Two main ways to handle exceptions:
    • try-catch-finally blocks
    • try-with-resources blocks

// Bad example: try-catch without finally (the reader and the writer are never closed)
public static void tryCatchWithoutFinallyExample() {
  try {
    Reader reader = new FileReader("missing.file");
    Writer writer = new FileWriter("missing.file");

    writer.write(reader.read());
  } catch (IOException e) {
    System.out.println("Exception: " + e);
  }
}

// Better example: try-catch with finally
public static void tryCatchFinallyExample() {
  Reader reader = null;
  Writer writer = null;

  try {
    reader = new FileReader("missing.file");
    writer = new FileWriter("missing.file");

    writer.write(reader.read());
  } catch (IOException e) {
    System.out.println("Exception: " + e);
  } finally {
    if (writer != null) {
      try {
        writer.close();
      } catch (IOException e) {
        System.out.println("Exception in close writer: " + e);
      }
    }

    if (reader != null) {
      try {
        reader.close();
      } catch (IOException e) {
        System.out.println("Exception in close reader: " + e);
      }
    }
  }
}

// Best example: try-with-resources
public static void tryWithResourcesExample() {
  try (Reader reader = new FileReader("missing.file");
      Writer writer = new FileWriter("missing.file")) {
    writer.write(reader.read());
  } catch (IOException e) {
    System.out.println("Exception: " + e);
  }
}
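
To see what try-with-resources does on your behalf, the sketch below records the order of events when an exception is thrown inside the block. The class name and the two lambda resources are hypothetical stand-ins for a reader and a writer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CloseOrderExample {

  // Returns the order in which the body, the close() calls and the catch block run
  static List<String> run() {
    List<String> log = new ArrayList<>();

    try (AutoCloseable first = () -> log.add("close first");
        AutoCloseable second = () -> log.add("close second")) {
      log.add("body");
      throw new IOException("Simulated failure");
    } catch (Exception e) {
      log.add("catch");
    }

    return log;
  }

  public static void main(String[] args) {
    // prints "[body, close second, close first, catch]"
    System.out.println(run());
  }
}
```

Both resources are closed even though the body failed, in the reverse order of their declaration, and before the catch block runs. This is the behavior the try-catch-finally version has to write by hand.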

When to use which IO?

More details for this section in the course material. You can find other resources and alternatives as well.

Common pitfalls

More details for this section in the course material. You can find other resources and alternatives as well.

  • Not using buffers
  • Not closing the streams
  • Not handling exceptions properly
  • Not specifying the character encoding
  • Not specifying the end of line characters
  • Using PrintWriter - it swallows exceptions
  • Using System.lineSeparator() - its value is platform dependent
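
The PrintWriter pitfall can be reproduced with a writer that always fails. This is a minimal sketch: the class name, the method name and the anonymous failing Writer are made up for this illustration, while checkError() is the method PrintWriter provides to query its internal error flag.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

class PrintWriterExample {

  // Writes to a sink that always fails and returns the PrintWriter error flag
  static boolean errorFlagAfterFailedWrite() {
    // Hypothetical sink that rejects every write
    Writer broken =
        new Writer() {
          @Override
          public void write(char[] cbuf, int off, int len) throws IOException {
            throw new IOException("Simulated write failure");
          }

          @Override
          public void flush() {}

          @Override
          public void close() {}
        };

    PrintWriter pw = new PrintWriter(broken);

    // No exception reaches the caller, although the write failed
    pw.write("hello");

    // The failure is only visible through an explicit check
    return pw.checkError();
  }

  public static void main(String[] args) {
    System.out.println(errorFlagAfterFailedWrite()); // prints "true"
  }
}
```

A plain Writer or BufferedWriter would have thrown the IOException instead, which makes the failure much harder to miss.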

Questions

Do you have any questions?

Practical content

What will you do?

Benchmark the different types of streams you have learned:

  • Assemble all the code examples to satisfy the use-cases
  • Run some benchmarks to determine the best IOs for the given use-cases

Find the practical content

You can find the practical content for this chapter on GitHub.

Finished? Was it easy? Was it hard?

Can you let us know what was easy and what was difficult for you during this chapter?

This will help us to improve the course and adapt the content to your needs. If we notice some difficulties, we will come back to you to help you.

➡️ GitHub Discussions

You can use reactions to express your opinion on a comment!

What will you do next?

In the next chapter, you will learn the following topics:

  • Docker and Docker Compose: how to containerize your applications
    • What is an image?
    • What is a container?
    • How to try out new software without installing it?