In Rust, most byte streams implement Read:

pub trait Read {
  fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

This works by reading some number of bytes from the source (a file or a network socket, say) and storing them in buf, which the program can then operate on.
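For example, a typical loop keeps calling read until it reports 0 bytes, which signals end of file (count_bytes here is a made-up helper, purely to sketch the pattern):

use std::io::{self, Read};

/// Counts the bytes available from any reader by repeatedly filling a
/// fixed-size buffer until `read` reports end of file.
fn count_bytes(mut source: impl Read) -> io::Result<usize> {
    let mut buf = [0u8; 4096];
    let mut total = 0;
    loop {
        // `read` tells us how many bytes it actually stored in `buf`.
        let n = source.read(&mut buf)?;
        if n == 0 {
            break; // 0 bytes read means end of file
        }
        // `buf[..n]` holds this chunk; here we only count it.
        total += n;
    }
    Ok(total)
}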

But this is awkward for transforming the data as it streams through; programmers usually think of bytes either as the full content in a single buffer or as a stream. Buffers containing the whole content are great when you can work with them, but it often isn't practical (or even possible) to hold the entire content in memory at once, as with huge files or unbounded network streams. So, what's a Rust programmer to do?

In this blog post we’ll cover a handy way to think about Rust readers that plays nicely with the way Rust programmers naturally think about streaming values.

Iterator Is a Stream

It turns out that Iterator is a natural way to think about streams: the API is designed to force us to treat the values like a stream already! The only minor bit of annoyance is iterating over fallible values, as we’re doing here, but this isn’t too bad to work with.

So, first up, it's easy to get an Iterator over the bytes in a Read instance using bytes():

let file = File::open(path)?;
let bytes = file.bytes(); // <- `bytes` is an `Iterator<Item = io::Result<u8>>`!

Quick note: reading byte-by-byte is expensive on most implementers of Read — this specific example results in a syscall on every iteration! Wrap it with BufReader first:

let file = File::open(path)?;
let bytes = BufReader::new(file).bytes();

OK, so now we have an iterator over each byte. What can we do with that? Anything that an iterator can do!

Just keep in mind it's a little more complicated since it's iterating over results, not purely bytes (depending on your needs, you can work around this).
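For example, if you're happy to stop at the first I/O error, one way to sidestep the Results entirely is to collect the stream into a single io::Result (this is just one option, not the only one):

let file = File::open(path)?;

// Bails out on the first I/O error; on the happy path you're left with
// plain bytes and no more `Result` wrapping.
let bytes: Vec<u8> = BufReader::new(file)
    .bytes()
    .collect::<io::Result<Vec<u8>>>()?;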

Make it peekable:

BufReader::new(file)
  .bytes()
  .peekable()

Map, filter, or any other combinator:

BufReader::new(file)
  .bytes()
  .map(|i| {
    i.map(|byte| to_hex(byte))
  })
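Here, to_hex is a stand-in for whatever per-byte transformation you need. A hypothetical version might format each byte as hex, which turns the stream into an Iterator<Item = io::Result<String>>:

// A hypothetical `to_hex` for the snippet above.
fn to_hex(byte: u8) -> String {
    format!("{byte:02x}")
}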

Example: Remove Carriage Returns

We can also write our own iterators to modify the content however we want. Here, we'll create an iterator that trims \r characters from the byte stream if they're at the end of the stream or followed by a \n (which is what inspired this post!).

First, we define the two byte values we care about and wrap a peekable iterator over io::Result<u8>:

// The byte values for carriage return and line feed.
const CR_CHAR: u8 = b'\r';
const LF_CHAR: u8 = b'\n';

/// Implements the ability to drop `\r\n` byte pairs from a stream, converting each instance to a single `\n`.
pub struct CRLFToLF<I>
where
    I: Iterator<Item = io::Result<u8>>,
{
    iter: Peekable<I>,
}

"Peekable" iterators expose an additional method, peek, which lets us preview the next item in the iterator without consuming it.

Then, implement the iterator itself:

impl<I> Iterator for CRLFToLF<I>
where
    I: Iterator<Item = io::Result<u8>>,
{
    type Item = io::Result<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        // If the read byte is `\r`, check the byte after that (the "upcoming" byte):
        // - If this is the end of the stream, just drop the `\r`.
        // - If the upcoming byte is `\n`, drop the currently read `\r` by re-running.
        // - If the upcoming byte is neither of those things, emit the `\r`.
        match self.iter.next()? {
            Ok(byte) => {
                if byte == CR_CHAR {
                    if let Ok(next) = self.iter.peek()? {
                        if next == &LF_CHAR {
                            return self.next();
                        }
                    }
                }
                Some(Ok(byte))
            }
            Err(e) => Some(Err(e)),
        }
    }
}

The ? operators in this case desugar such that if the expression they're attached to evaluates to None, they return None from the function, which ends the iteration.
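As a rough model, bytes.next()? inside a function that returns an Option behaves like this explicit match (first_item is a hypothetical helper, not part of the converter):

fn first_item<I>(bytes: &mut I) -> Option<io::Result<u8>>
where
    I: Iterator<Item = io::Result<u8>>,
{
    let item = match bytes.next() {
        Some(item) => item,
        None => return None, // this is what `?` does for us
    };
    Some(item)
}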

Finally, we implement the ability to chain this new iterator off an existing Iterator that is iterating over the correct type:

impl<I> CRLFToLF<I>
where
    I: Iterator<Item = io::Result<u8>>,
{
    fn new(iter: I) -> Self {
        Self {
            iter: iter.peekable(),
        }
    }
}

pub trait ConvertCRLFToLF {
    /// Implements the ability to drop `\r\n` byte pairs from a stream, converting each instance to a single `\n`.
    fn convert_crlf_lf(self) -> CRLFToLF<Self>
    where
        Self: Sized,
        Self: Iterator<Item = io::Result<u8>>;
}

impl<I> ConvertCRLFToLF for I
where
    I: Iterator<Item = io::Result<u8>>,
{
    fn convert_crlf_lf(self) -> CRLFToLF<Self> {
        CRLFToLF::new(self)
    }
}

And we use it!

#[test]
fn crlf_to_lf_works() {
    let content = b"hello\r\neveryone\nin\r\nthe\nworld\r";
    let expected = b"hello\neveryone\nin\nthe\nworld".to_vec();

    let processed = Cursor::new(content)
        .bytes()
        .convert_crlf_lf()
        .collect::<io::Result<Vec<u8>>>()
        .expect("should not error");

    assert_eq!(expected, processed);
}

Convert Back to a Read

That’s all well and good, but how can I take this transformed stream and convert it back into a reader? Luckily, the Rust community has us covered for most use cases: just use IterRead from the iter-read crate!

/// Copies a file, converting all `\r\n` sequences to `\n`.
fn copy_file_normalize_line_endings(from: &Path, to: &Path) -> Result<(), io::Error> {
    let from = File::open(from)?;
    let from = BufReader::new(from).bytes().convert_crlf_lf().fuse();

    let mut to = File::create(to)?;
    io::copy(&mut IterRead::new(from), &mut to)?;

    Ok(())
}

IterRead doesn’t work in all scenarios: specifically, it requires a fused iterator. This is because it assumes the stream is EOF (end of file) when the iterator returns None.

If that doesn’t work for your use case, you’ll need to build your own version of IterRead, but most of the time this is a perfectly valid assumption to make and a great starting point.
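For reference, a bare-bones version of such an adapter might look something like this. It's only a sketch, not a drop-in replacement for the real crate; IterReader and pending_err are names made up for this example:

use std::io::{self, Read};

/// A minimal iterator-backed reader over `io::Result<u8>` items.
pub struct IterReader<I> {
    iter: I,
    // An error hit partway through a buffer, saved so it can be reported on
    // the next call to `read` instead of being silently dropped.
    pending_err: Option<io::Error>,
}

impl<I> IterReader<I> {
    pub fn new(iter: I) -> Self {
        Self { iter, pending_err: None }
    }
}

impl<I> Read for IterReader<I>
where
    I: Iterator<Item = io::Result<u8>>,
{
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Surface any error left over from the previous call first.
        if let Some(e) = self.pending_err.take() {
            return Err(e);
        }

        let mut written = 0;
        while written < buf.len() {
            match self.iter.next() {
                Some(Ok(byte)) => {
                    buf[written] = byte;
                    written += 1;
                }
                Some(Err(e)) => {
                    if written == 0 {
                        return Err(e);
                    }
                    // Report the bytes we already copied and hold the error
                    // until the next call.
                    self.pending_err = Some(e);
                    break;
                }
                // Like `IterRead`, treat `None` as end of file, which is why
                // a fused iterator is the safe thing to hand it.
                None => break,
            }
        }
        Ok(written)
    }
}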

Wrapping it Up

Hopefully, this was a useful and interesting peek into one of Rust’s slightly out-of-the-way but not super-advanced concepts. I think working with streaming bytes often feels complicated, and I hope this post helps to reframe the idea of working with byte streams into something a little more familiar!