RUST: String manipulation

Rust provides a rich set of methods for string manipulation, primarily through the &str (string slice) primitive type and the String (owned, growable string) type from the standard library. Since String implements Deref<Target = str>, most methods available on &str can also be called directly on String.

Here’s a comprehensive list, categorized for clarity:

I. Creation & Conversion

  • String::new(): Creates a new, empty String.
  • String::with_capacity(n): Creates a new, empty String with pre-allocated capacity for at least n bytes.
  • String::from("…") / "…".to_string() / "…".to_owned(): Creates an owned String from a string literal (&str) or another string slice.
  • format!(…) (Macro): Creates a formatted String similar to sprintf or Python’s f-strings.
let name = "Alice";
let age = 30;
let s = format!("User: {}, Age: {}", name, age); // s is a String
  • .to_string(): Available on many types (via the ToString trait) to convert them to a String.
  • String::from_utf8(Vec<u8>): Creates a String from a byte vector if it contains valid UTF-8. Returns a Result<String, FromUtf8Error>.
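  • String::from_utf8_lossy(&[u8]): Converts a byte slice to a string, replacing invalid UTF-8 sequences with U+FFFD �. Returns a Cow<str>, which borrows the input when it is already valid UTF-8 and allocates a String only when replacements are needed.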
  • String::from_utf8_unchecked(Vec<u8>) (unsafe): Creates a String from bytes without checking UTF-8 validity. Use with extreme caution.
  • str::from_utf8(&[u8]): Creates a &str slice from a byte slice if valid UTF-8. Returns Result<&str, Utf8Error>.
  • str::from_utf8_unchecked(&[u8]) (unsafe): Creates &str without checking UTF-8. Use with extreme caution.
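
The difference between strict and lossy decoding is easiest to see side by side. The following is a minimal sketch; decode_strict and decode_lossy are hypothetical helper names introduced only for illustration, wrapping the standard String::from_utf8 and String::from_utf8_lossy.

```rust
use std::borrow::Cow;

/// Hypothetical helper: strict decoding, returning None on invalid UTF-8.
fn decode_strict(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Hypothetical helper: lossy decoding; invalid sequences become U+FFFD.
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    let valid = vec![0x68, 0x69]; // the bytes of "hi"
    let invalid = vec![0x68, 0xFF, 0x69]; // 0xFF never appears in valid UTF-8

    println!("{:?}", decode_strict(valid.clone())); // Some("hi")
    println!("{:?}", decode_strict(invalid.clone())); // None
    println!("{}", decode_lossy(&invalid)); // h�i

    // Valid input is borrowed rather than copied.
    println!("{}", matches!(decode_lossy(&valid), Cow::Borrowed(_))); // true
}
```

Strict decoding suits input that must be rejected when malformed; lossy decoding suits logging or display, where a replacement character is acceptable.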

II. Length & Capacity

  • .len() (on &str and String): Returns the length of the string in bytes (not characters).
assert_eq!("hello".len(), 5);
assert_eq!("你好".len(), 6); // Each Chinese char is 3 bytes in UTF-8
  • .chars().count() (Iterator method): Returns the number of Unicode scalar values (often perceived as ‘characters’). Can be less efficient than .len() as it requires iteration.
assert_eq!("hello".chars().count(), 5);
assert_eq!("你好".chars().count(), 2);
  • .is_empty() (on &str and String): Returns true if the string has a length of 0 bytes.
  • .capacity() (Specific to String): Returns the total number of bytes the String can hold without reallocating.
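
To tie length, character count, and capacity together, here is a small sketch. The helper repeat_word is hypothetical (the standard library already offers str::repeat); it exists only to show String::with_capacity reserving the needed bytes up front.

```rust
/// Hypothetical helper: builds a string of `n` copies of `word`,
/// reserving the required number of bytes before appending.
fn repeat_word(word: &str, n: usize) -> String {
    let mut out = String::with_capacity(word.len() * n);
    for _ in 0..n {
        out.push_str(word);
    }
    out
}

fn main() {
    let s = repeat_word("你好", 3);
    println!("bytes: {}", s.len()); // bytes: 18
    println!("chars: {}", s.chars().count()); // chars: 6
    // Capacity is at least the length; the exact value is up to the allocator.
    println!("capacity >= len: {}", s.capacity() >= s.len()); // true
}
```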

III. Appending & Inserting (Primarily String)

  • .push_str(&str) (String): Appends a string slice to the end of the String.
let mut s = String::from("hello ");
s.push_str("world"); // s is now "hello world"
  • .push(char) (String): Appends a single character to the end of the String.
let mut s = String::from("he");
s.push('l');
s.push('l');
s.push('o'); // s is now "hello"
  • .insert(idx, char) (String): Inserts a character at the given byte index. Panics if idx is not on a character boundary or out of bounds.
  • .insert_str(idx, &str) (String): Inserts a string slice at the given byte index. Panics if idx is not on a character boundary or out of bounds.
  • + operator (String + &str): Concatenates a String (taken by value) and a &str, returning a String. The left-hand String's buffer is reused, so the cost is comparable to push_str; for combining many pieces, format! is usually easier to read.
let s1 = String::from("hello");
let s2 = " world";
let s3 = s1 + s2; // s1 is moved here, s3 is "hello world"
  • .extend(iterator) (String): Extends the string with elements from an iterator (e.g., Iterator<Item=char> or Iterator<Item=&str>).
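
The insertion and extension methods above can be sketched as follows. The helper wrap is a hypothetical name used only for illustration; it relies on the fact that index 0 and s.len() are always valid character boundaries.

```rust
/// Hypothetical helper: wraps `s` in an opening char and closing text, in place.
fn wrap(s: &mut String, open: char, close: &str) {
    s.insert(0, open); // index 0 is always a char boundary
    let end = s.len(); // the end of the string is also a valid boundary
    s.insert_str(end, close);
}

fn main() {
    let mut s = String::from("world");
    wrap(&mut s, '[', "]!");
    println!("{}", s); // [world]!

    let mut t = String::from("abc");
    t.extend(['d', 'e']); // an iterator of char
    t.extend(["fg", "h"]); // an iterator of &str
    println!("{}", t); // abcdefgh
}
```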

IV. Removing & Clearing (Primarily String)

  • .pop() (String): Removes and returns the last character as an Option<char>. Returns None if the string is empty.
  • .remove(idx) (String): Removes and returns the character at the specified byte index. Panics if idx is not on a character boundary or out of bounds. Shifts subsequent characters left. O(n).
  • .truncate(len) (String): Shortens the string, keeping the first len bytes. Has no effect if len is greater than or equal to the current length. Panics if len is not on a character boundary. Capacity remains unchanged.
  • .clear() (String): Removes all contents of the string, making it empty. Capacity remains unchanged.
  • .drain(range) (String): Creates an iterator that removes the specified range of characters (given by byte indices) and yields them. Panics on invalid boundaries. Shifts subsequent characters. O(n).
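
Because the removal methods take byte indices, multi-byte text needs some care. The sketch below exercises pop, remove, and drain on a string with accented characters, and adds a hypothetical helper, truncate_to_boundary, that backs up to the nearest character boundary before truncating.

```rust
/// Hypothetical helper: truncates `s` to at most `max_bytes` bytes,
/// stepping back to a char boundary so that truncate cannot panic.
fn truncate_to_boundary(s: &mut String, max_bytes: usize) {
    if max_bytes >= s.len() {
        return;
    }
    let mut cut = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    s.truncate(cut);
}

fn main() {
    let mut s = String::from("héllo wörld");
    println!("{:?}", s.pop()); // Some('d')
    println!("{}", s.remove(0)); // h

    // 'é' is 2 bytes, so "éllo" occupies bytes 0..5.
    let head: String = s.drain(..5).collect();
    println!("{:?} / {:?}", head, s); // "éllo" / " wörl"

    let mut t = String::from("你好");
    truncate_to_boundary(&mut t, 4); // byte 4 is mid-character, so cut at 3
    println!("{}", t); // 你
}
```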

V. Slicing & Accessing Parts (Primarily &str)

  • Slicing [a..b]: Creates a string slice (&str) from a byte range [a..b). Panics if the start or end indices are not on character boundaries or are out of bounds.
let s = "hello world";
let hello = &s[0..5]; // "hello"
let world = &s[6..11]; // "world"
// let invalid = &"你好"[0..2]; // Would panic: byte 2 falls inside the 3-byte character '你'
  • .get(a..b): Like slicing, but returns an Option<&str> instead of panicking on invalid indices/boundaries.
  • .get_unchecked(a..b) (unsafe): Slices without bounds or character boundary checks. Use with extreme caution.
  • .chars(): Returns an iterator over the chars of the string.
  • .char_indices(): Returns an iterator over (byte_index, char) pairs. Useful for finding character boundaries.
  • .bytes(): Returns an iterator over the raw bytes (u8) of the string.
  • .as_bytes(): Returns a byte slice (&[u8]) of the string’s contents.
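
A short sketch of panic-free access: .get() reports bad ranges as None, and .char_indices() yields byte offsets that are always safe to slice at. The helper first_n_chars is a hypothetical name introduced for illustration.

```rust
/// Hypothetical helper: returns the first `n` chars of `s` as a slice,
/// using char_indices to find a byte offset that is safe to slice at.
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // the string has n chars or fewer: return all of it
    }
}

fn main() {
    let s = "你好, world";
    println!("{:?}", s.get(0..3)); // Some("你")
    println!("{:?}", s.get(0..2)); // None (mid-character)
    println!("{:?}", s.get(0..100)); // None (out of bounds)
    println!("{}", first_n_chars(s, 2)); // 你好
}
```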

VI. Searching & Checking Contents (Primarily &str)

  • .contains(pattern): Returns true if the string contains the given pattern (&str, char, slice of chars, or predicate function).
  • .starts_with(pattern): Returns true if the string starts with the given pattern.
  • .ends_with(pattern): Returns true if the string ends with the given pattern.
  • .find(pattern): Returns the starting byte index of the first occurrence of the pattern as an Option<usize>.
  • .rfind(pattern): Returns the starting byte index of the last occurrence of the pattern as an Option<usize>.
  • .match_indices(pattern): Returns an iterator over the (byte_index, matched_slice) for all non-overlapping matches of the pattern.
  • .rmatch_indices(pattern): Like match_indices, but searches from the end of the string.
  • .is_char_boundary(idx): Checks if a given byte index lies on a char boundary. Useful before slicing.
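
The search methods return byte indices that can be fed straight back into slicing. As a minimal sketch, the hypothetical helper split_at_first below combines find with slicing; the standard library's str::split_once covers the same case and is usually the better choice in real code.

```rust
/// Hypothetical helper: splits `s` at the first occurrence of `sep`,
/// returning the text before and after the separator.
fn split_at_first<'a>(s: &'a str, sep: &str) -> Option<(&'a str, &'a str)> {
    let start = s.find(sep)?;
    Some((&s[..start], &s[start + sep.len()..]))
}

fn main() {
    let s = "key=value=more";
    println!("{:?}", s.find('=')); // Some(3)
    println!("{:?}", s.rfind('=')); // Some(9)
    println!("{:?}", s.find(char::is_whitespace)); // None

    let hits: Vec<(usize, &str)> = "abcabc".match_indices("bc").collect();
    println!("{:?}", hits); // [(1, "bc"), (4, "bc")]

    println!("{:?}", split_at_first(s, "=")); // Some(("key", "value=more"))
}
```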

VII. Replacing (Returns new String)

  • .replace(from_pattern, to_str): Returns a new String with all non-overlapping occurrences of from_pattern replaced by to_str.
let s = "this is old";
let new_s = s.replace("old", "new"); // "this is new"
  • .replacen(from_pattern, to_str, count): Returns a new String replacing at most count occurrences.
  • .replace_range(range, replace_with_str) (String method): Replaces the specified byte range in the original String with the given string slice. Panics on invalid boundaries. Modifies in-place.
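
The contrast between the allocating methods (replace, replacen) and the in-place replace_range can be sketched as below. replace_first_in_place is a hypothetical helper, not a standard method; it pairs find with replace_range.

```rust
/// Hypothetical helper: replaces the first occurrence of `from` with `to`
/// inside the existing String. Returns true if a replacement was made.
fn replace_first_in_place(s: &mut String, from: &str, to: &str) -> bool {
    match s.find(from) {
        Some(start) => {
            s.replace_range(start..start + from.len(), to);
            true
        }
        None => false,
    }
}

fn main() {
    let s = "a-b-c-d";
    println!("{}", s.replace('-', "+")); // a+b+c+d
    println!("{}", s.replacen('-', "+", 2)); // a+b+c-d

    let mut owned = String::from("hello world");
    replace_first_in_place(&mut owned, "world", "Rust");
    println!("{}", owned); // hello Rust
}
```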

VIII. Splitting & Joining (Returns Iterators or String)

  • .split(pattern): Returns an iterator over substrings separated by the pattern. The pattern itself is not included. Empty strings may be produced.
let parts: Vec<&str> = "a,b,c".split(',').collect(); // ["a", "b", "c"]
let parts: Vec<&str> = ",a,".split(',').collect(); // ["", "a", ""]
  • .rsplit(pattern): Like split, but starts splitting from the end.
  • .split_terminator(pattern): Like split, but does not produce an empty string after a trailing separator.
let parts: Vec<&str> = "a,b,c,".split_terminator(',').collect(); // ["a", "b", "c"]
  • .rsplit_terminator(pattern): Like split_terminator, but starts splitting from the end.
  • .splitn(count, pattern): Returns an iterator that yields at most count items. The last item contains the unsplit remainder of the string.
  • .rsplitn(count, pattern): Like splitn, but starts splitting from the end, so the remainder is the leading part of the string.
  • .split_whitespace(): Returns an iterator over substrings separated by any amount of whitespace. Leading/trailing whitespace and multiple whitespace separators are ignored.
let words: Vec<&str> = " hello \t world ".split_whitespace().collect(); // ["hello", "world"]
  • .lines(): Returns an iterator over the lines of the string, separated by \n or \r\n. Line endings are not included.
  • slice.join(separator) (Method on &[&str], Vec<&str>, etc.): Joins a slice of string slices with a separator string, returning a new String.
let words = ["hello", "world"];
let sentence = words.join(" "); // "hello world"
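
The limited splits and .lines() are easier to follow with concrete output. The sketch below also shows a split, trim, and join round trip through a hypothetical helper named normalize_fields.

```rust
/// Hypothetical helper: splits on commas, trims each field,
/// and joins the fields back together with ", ".
fn normalize_fields(line: &str) -> String {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    fields.join(", ")
}

fn main() {
    // splitn(2, ..) yields at most 2 items; the last one is the remainder.
    let head_tail: Vec<&str> = "a,b,c,d".splitn(2, ',').collect();
    println!("{:?}", head_tail); // ["a", "b,c,d"]

    // rsplitn works from the end, so the remainder is the leading part.
    let tail_head: Vec<&str> = "a,b,c,d".rsplitn(2, ',').collect();
    println!("{:?}", tail_head); // ["d", "a,b,c"]

    // lines() accepts both \n and \r\n and drops the line endings.
    let lines: Vec<&str> = "one\r\ntwo\nthree\n".lines().collect();
    println!("{:?}", lines); // ["one", "two", "three"]

    println!("{}", normalize_fields(" a ,b,  c ")); // a, b, c
}
```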

IX. Case Conversion (Returns new String)

  • .to_lowercase(): Returns a new String with all characters converted to lowercase according to Unicode rules.
  • .to_uppercase(): Returns a new String with all characters converted to uppercase according to Unicode rules.
  • .to_ascii_lowercase(): Converts ASCII letters to lowercase and leaves non-ASCII characters unchanged. Faster than to_lowercase(), but not a substitute for it on non-ASCII text. Returns String.
  • .to_ascii_uppercase(): Converts to uppercase, but only for ASCII characters. Returns String.
  • .make_ascii_lowercase() (on mutable str and String): Converts ASCII characters to lowercase in-place.
  • .make_ascii_uppercase() (on mutable str and String): Converts ASCII characters to uppercase in-place.
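
The Unicode-aware and ASCII-only conversions give different results on non-ASCII text, as the sketch below shows. The helper capitalize is hypothetical; it uppercases only the first character, and it accounts for char::to_uppercase possibly yielding more than one character.

```rust
/// Hypothetical helper: uppercases the first character of `word`
/// and leaves the rest untouched.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        // to_uppercase() returns an iterator because one char may map to several.
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

fn main() {
    println!("{}", "Straße".to_uppercase()); // STRASSE
    println!("{}", "Straße".to_ascii_uppercase()); // STRAßE

    let mut s = String::from("Grüße");
    s.make_ascii_uppercase(); // in place; only ASCII letters change
    println!("{}", s); // GRüßE

    println!("{}", capitalize("élan")); // Élan
}
```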

X. Trimming (Returns &str)

  • .trim(): Returns a string slice with leading and trailing whitespace removed.
  • .trim_start(): Returns a string slice with leading whitespace removed. The older name .trim_left() is deprecated.
  • .trim_end(): Returns a string slice with trailing whitespace removed. The older name .trim_right() is deprecated.
  • .trim_matches(pattern): Returns a string slice with leading and trailing characters matching the pattern (char, slice of chars, or predicate) removed.
  • .trim_start_matches(pattern): Removes only leading matching characters.
  • .trim_end_matches(pattern): Removes only trailing matching characters.
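
A brief sketch of the pattern-based trimming methods. strip_quotes is a hypothetical helper that chains trim and trim_matches; note that trim_matches removes every repeated match at both ends, not just one.

```rust
/// Hypothetical helper: removes surrounding whitespace, then any
/// double quotes at either end. The result borrows from the input.
fn strip_quotes(s: &str) -> &str {
    s.trim().trim_matches('"')
}

fn main() {
    println!("{}", "xxhixx".trim_start_matches('x')); // hixx
    println!("{}", "xxhixx".trim_end_matches('x')); // xxhi
    println!("{}", "123abc456".trim_matches(char::is_numeric)); // abc
    println!("{}", "(a)".trim_matches(&['(', ')'][..])); // a
    println!("{}", strip_quotes("  \"quoted\"  ")); // quoted
}
```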

XI. Parsing

  • .parse::<T>(): Parses the string slice into another type T (if T implements FromStr). Returns a Result<T, T::Err>.
let num_str = "42";
let num: Result<i32, _> = num_str.parse(); // Ok(42)
assert_eq!(num.unwrap(), 42);

let bad_num: Result<i32, _> = "abc".parse(); // Err(...)
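
Parsing often follows splitting. As a sketch, the hypothetical helper parse_numbers below parses a comma-separated list and stops at the first invalid field; collecting an iterator of Result values into a Result<Vec<_>, _> is standard behaviour.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses comma-separated integers,
/// returning the first parse error encountered, if any.
fn parse_numbers(input: &str) -> Result<Vec<i32>, ParseIntError> {
    input
        .split(',')
        .map(|field| field.trim().parse::<i32>())
        .collect()
}

fn main() {
    println!("{:?}", parse_numbers("1, 2, 3")); // Ok([1, 2, 3])

    match parse_numbers("1, x, 3") {
        Ok(nums) => println!("parsed: {:?}", nums),
        Err(e) => println!("could not parse: {}", e),
    }
}
```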

XII. Comparison

  • Standard comparison operators (==, !=, <, <=, >, >=) work lexicographically on &str and String.
  • .eq_ignore_ascii_case(other_str): Compares two strings ignoring the case of ASCII characters only.
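
Lexicographic comparison works on bytes, so uppercase ASCII letters sort before lowercase ones. The sketch below shows this, along with a hypothetical helper, sort_ignoring_ascii_case, that sorts by an ASCII-lowercased key.

```rust
/// Hypothetical helper: sorts words while ignoring ASCII case.
/// Allocates a lowercased key per comparison, which is fine for small lists.
fn sort_ignoring_ascii_case(words: &mut [&str]) {
    words.sort_by_key(|w| w.to_ascii_lowercase());
}

fn main() {
    println!("{}", "apple" < "banana"); // true
    println!("{}", "Zebra" < "apple"); // true: 'Z' (0x5A) sorts before 'a' (0x61)
    println!("{}", String::from("hi") == "hi"); // true: String and &str compare directly
    println!("{}", "HELLO".eq_ignore_ascii_case("hello")); // true
    println!("{}", "STRASSE".eq_ignore_ascii_case("straße")); // false

    let mut words = vec!["banana", "apple", "Cherry"];
    words.sort();
    println!("{:?}", words); // ["Cherry", "apple", "banana"]
    sort_ignoring_ascii_case(&mut words);
    println!("{:?}", words); // ["apple", "banana", "Cherry"]
}
```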

This list covers the vast majority of standard string manipulation methods. Remember the key difference: &str methods typically return new &str slices (borrowing) or require conversion to String for owned results, while String methods often modify the string in-place or return new Strings. Always be mindful of UTF-8 character boundaries when working with byte indices!

EXAMPLES:

fn main() {
    // 1. Creating Strings

    // Creating a String from a string literal.
    let string_literal = "Hello, world!"; // This is a &str (string slice).
    let string_from_literal = String::from(string_literal); // Convert to String.
    println!("String from literal: {}", string_from_literal);

    // Creating a String using to_string().
    let string_to_string = string_literal.to_string(); // Another way to convert to String.
    println!("String using to_string(): {}", string_to_string);

    // Creating an empty String.
    let empty_string = String::new();
    println!("Empty string: '{}'", empty_string);

    // 2. Appending to Strings

    // Appending a string slice to a String.
    let mut greeting = String::from("Hello");
    greeting.push_str(", Rust!"); // Modifies the String in place.
    println!("Appended string: {}", greeting);

    // Appending a single character to a String.
    greeting.push('!'); // Modifies the String in place.
    println!("Appended character: {}", greeting);

    // 3. Concatenating Strings

    // Using the + operator (moves ownership of the first String).
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");
    let s4 = s1 + "-" + &s2 + "-" + &s3; // s1 is moved here and is no longer valid. s2 and s3 are borrowed.
    println!("Concatenated string: {}", s4);
    // println!("s1: {}", s1); // this would cause a compile error, as s1 is no longer valid.

    // Using the format! macro (does not take ownership).
    let s5 = String::from("tic");
    let s6 = String::from("tac");
    let s7 = String::from("toe");
    let s8 = format!("{}-{}-{}", s5, s6, s7); // s5, s6, and s7 are still valid.
    println!("Formatted string: {}", s8);
    println!("s5: {}", s5); // s5 is still valid.

    // 4. String Length and Capacity

    // Getting the length of a String (in bytes, not characters).
    let len = s8.len();
    println!("Length of '{}': {}", s8, len);

    // Getting the capacity of a String (allocated memory).
    let capacity = s8.capacity();
    println!("Capacity of '{}': {}", s8, capacity);

    // 5. String Slicing

    // Creating a string slice from a String.
    let hello = String::from("Hello, world!");
    let hello_slice = &hello[0..5]; // Slice from index 0 up to (but not including) 5.
    println!("String slice: {}", hello_slice);

    // 6. Iterating Over Characters

    // Iterating over the characters of a String.
    for c in hello.chars() {
        print!("{} ", c);
    }
    println!();

    // 7. Replacing Parts of a String

    // Replacing a substring.
    let replace_string = String::from("Hello, world!"); // No mut needed: replace() does not modify the original.
    let replaced_string = replace_string.replace("world", "Rust"); // Returns a new String.
    println!("Replaced string: {}", replaced_string);
    println!("Original string: {}", replace_string); // the original string is unchanged.

    // 8. Trimming Whitespace

    // Trimming leading and trailing whitespace.
    let whitespace_string = String::from("   leading and trailing whitespace   ");
    let trimmed_string = whitespace_string.trim(); // Returns a &str (string slice).
    println!("Trimmed string: '{}'", trimmed_string);

    // 9. Splitting Strings

    // Splitting a string into multiple parts.
    let split_string = String::from("apple,banana,orange");
    let parts: Vec<&str> = split_string.split(',').collect(); // Collects into a Vec of &str.
    println!("Split string parts: {:?}", parts);

    // 10. Converting to Uppercase/Lowercase

    // Converting to uppercase.
    let uppercase_string = String::from("hello").to_uppercase();
    println!("Uppercase string: {}", uppercase_string);

    // Converting to lowercase.
    let lowercase_string = String::from("WORLD").to_lowercase();
    println!("Lowercase string: {}", lowercase_string);

    // 11. Checking if a String Contains a Substring

    // Checking if a string contains a substring.
    let contains_string = String::from("The quick brown fox");
    let contains_fox = contains_string.contains("fox");
    println!("Contains 'fox': {}", contains_fox);
    let contains_dog = contains_string.contains("dog");
    println!("Contains 'dog': {}", contains_dog);

    // 12. Checking if a String Starts or Ends With a Substring

    // Checking if a string starts with a substring.
    let starts_with_string = String::from("Rust is great");
    let starts_with_rust = starts_with_string.starts_with("Rust");
    println!("Starts with 'Rust': {}", starts_with_rust);

    // Checking if a string ends with a substring.
    let ends_with_string = String::from("Learning Rust");
    let ends_with_rust = ends_with_string.ends_with("Rust");
    println!("Ends with 'Rust': {}", ends_with_rust);
}

Explanation of the Examples:

  1. Creating Strings:
    • Shows how to create String objects from string literals (&str) using String::from() and .to_string().
    • Demonstrates creating an empty String using String::new().
  2. Appending to Strings:
    • push_str(): Appends a string slice (&str) to a String.
    • push(): Appends a single character (char) to a String.
  3. Concatenating Strings:
    • + operator: Concatenates strings, but it moves ownership of the first String.
    • format! macro: A more flexible way to concatenate strings without taking ownership.
  4. String Length and Capacity:
    • .len(): Returns the length of the String in bytes, which equals the number of characters only for ASCII text.
    • .capacity(): Returns the amount of memory allocated for the String.
  5. String Slicing:
    • Shows how to create a string slice (&str) from a String using range syntax ([start..end]).
  6. Iterating Over Characters:
    • .chars(): Returns an iterator over the characters of the String.
  7. Replacing Parts of a String:
    • .replace(): Replaces all occurrences of a substring with another substring.
  8. Trimming Whitespace:
    • .trim(): Removes leading and trailing whitespace from a string.
  9. Splitting Strings:
    • .split(): Splits a string into multiple parts based on a delimiter.
  10. Converting to Uppercase/Lowercase:
    • .to_uppercase(): Converts the string to uppercase.
    • .to_lowercase(): Converts the string to lowercase.
  11. Checking if a String Contains a Substring:
    • .contains(): Checks if a string contains a substring.
  12. Checking if a String Starts or Ends With a Substring:
    • .starts_with(): Checks if a string starts with a substring.
    • .ends_with(): Checks if a string ends with a substring.

"Strong men also cry"