Why files matter in automation
Many “useful scripts” become truly practical when they can read existing information from a file and write results back out. Files let your program work with data that already exists (logs, exports, notes, configuration files) and produce outputs you can share (reports, cleaned data, renamed lists, summaries). In this chapter you will learn how to open files safely, read text in different ways, write new files, append to existing ones, and handle common issues like missing files and character encoding.
File paths: telling Python where the file is
Before reading or writing, you must identify the file using a path. A path can be relative (based on your current working folder) or absolute (the full location on your computer).
Relative paths
A relative path points to a file “near” your script. If your script is in a folder and the file is in the same folder, you can usually use just the filename.
filename = "notes.txt"

If the file is inside a subfolder, include the folder name:

filename = "data/notes.txt"

Absolute paths
An absolute path includes the full location. On Windows it might look like C:\Users\Name\Documents\file.txt. On macOS/Linux it might look like /Users/name/Documents/file.txt.
When writing Windows paths in Python strings, remember that backslashes can act like escape characters. Two common solutions are to use a raw string or forward slashes.
path1 = r"C:\Users\Name\Documents\notes.txt"  # raw string keeps backslashes literal
path2 = "C:/Users/Name/Documents/notes.txt"   # forward slashes also work in most cases

Using pathlib for cleaner paths
Python’s pathlib module helps you build paths in a way that works across operating systems. This is especially useful for automation scripts you might run on different machines.
from pathlib import Path
base = Path("data")
file_path = base / "notes.txt" # joins paths safely
print(file_path)  # data/notes.txt (or data\notes.txt on Windows)

Opening files with with: the safest pattern
The most important habit for file work is to open files using a context manager (with). It automatically closes the file even if an error happens, which prevents corrupted output and “file is in use” issues.
with open("notes.txt", "r", encoding="utf-8") as f:
    text = f.read()

The open() function takes:
- file: path or filename
- mode: how you want to use the file (read, write, append, etc.)
- encoding: how text is decoded/encoded (use utf-8 in most modern cases)
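The earlier claim that with always closes the file can be checked directly: a file object's closed attribute becomes True as soon as the block exits, even when the block raises an exception. A small sketch (demo.txt is a throwaway filename used only for illustration):

```python
# Create a small file; the with block closes it automatically.
with open("demo.txt", "w", encoding="utf-8") as f:
    f.write("hello\n")

print(f.closed)  # True: the file was closed when the block ended

# The file is closed even if the block raises an exception.
try:
    with open("demo.txt", "r", encoding="utf-8") as f:
        raise ValueError("something went wrong")
except ValueError:
    pass

print(f.closed)  # True again: closed despite the error
```

This is why the with pattern is safer than calling close() yourself: there is no code path that leaves the file open.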
Common modes
- "r": read (file must exist)
- "w": write (creates file; overwrites if it exists)
- "a": append (creates file; adds to end if it exists)
- "x": create (fails if file already exists)
- "rb", "wb": binary read/write (for images, PDFs, etc.)
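To make the difference between "w" and "a" concrete, here is a small sketch (modes_demo.txt is a throwaway filename):

```python
# "w" truncates: each open in "w" mode starts the file from scratch.
with open("modes_demo.txt", "w", encoding="utf-8") as f:
    f.write("first\n")
with open("modes_demo.txt", "w", encoding="utf-8") as f:
    f.write("second\n")  # replaces "first" entirely

# "a" appends: new content is added at the end.
with open("modes_demo.txt", "a", encoding="utf-8") as f:
    f.write("third\n")

with open("modes_demo.txt", "r", encoding="utf-8") as f:
    print(f.read())  # second\nthird\n ("first" is gone)
```

Mixing these up is one of the most common ways beginners lose data, so it is worth running a small experiment like this once.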
Reading text files: three practical approaches
Text files are common in automation: exported CSV-like text, logs, lists of names, configuration snippets, and notes. Python gives you multiple ways to read them depending on your goal.
1) Read the entire file at once
Use read() when the file is small enough to fit comfortably in memory and you want the whole content as one string.
with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()

print(content)

Automation example: quickly search for a keyword in a small report.
keyword = "ERROR"
with open("app.log", "r", encoding="utf-8") as f:
    content = f.read()

if keyword in content:
    print("Found errors in the log")

2) Read line by line (recommended for large files)
When files can be large (logs, exports), read them line by line. This is memory-friendly and fits many automation tasks such as filtering, counting, or extracting matching lines.
with open("app.log", "r", encoding="utf-8") as f:
    for line in f:
        if "ERROR" in line:
            print(line.strip())

strip() removes the trailing newline so printed lines look clean.
3) Read all lines into a list
readlines() returns a list of lines. This can be convenient if you need random access or want to process lines multiple times, but it loads everything into memory.
with open("names.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()

print("Total lines:", len(lines))

Writing text files: overwrite vs append
Writing is where automation becomes a time-saver: you can generate cleaned versions of files, create summaries, or produce new files for other tools.
Writing a new file (or overwriting an existing one)
Mode "w" creates the file if it does not exist, and overwrites it if it does. Use this when you want a fresh output each run.
report = "Daily summary\n- Items processed: 120\n- Errors: 0\n"
with open("summary.txt", "w", encoding="utf-8") as f:
    f.write(report)

write() returns the number of characters written, but you usually do not need it.
Appending to a file
Mode "a" adds new content to the end. This is useful for logs your script maintains, or for building a running history.
from datetime import datetime
line = f"{datetime.now().isoformat()} - Script ran successfully\n"
with open("automation.log", "a", encoding="utf-8") as f:
    f.write(line)

Writing multiple lines
If you have several lines to write, you can write them one by one or use writelines(). Remember that writelines() does not add newline characters automatically; include "\n" yourself.
lines = ["apple\n", "banana\n", "cherry\n"]
with open("fruits.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)

Step-by-step automation: clean a messy text list
A common beginner-friendly automation task is cleaning a text file that contains one item per line, but with extra spaces, blank lines, or duplicates. The goal: read an input file, clean it, and write a new file.
Example input file
Imagine raw_emails.txt contains:
alice@example.com
   BOB@example.com

bob@example.com
carol@example.com
Goal
- Remove blank lines
- Trim spaces
- Normalize case (lowercase)
- Remove duplicates
- Write cleaned results to
clean_emails.txt
Step 1: read and clean line by line
input_path = "raw_emails.txt"
output_path = "clean_emails.txt"
seen = set()
cleaned = []
with open(input_path, "r", encoding="utf-8") as f:
    for line in f:
        email = line.strip().lower()
        if not email:
            continue
        if email in seen:
            continue
        seen.add(email)
        cleaned.append(email)

Step 2: write the cleaned list
with open(output_path, "w", encoding="utf-8") as f:
    for email in cleaned:
        f.write(email + "\n")

Step 3: add a small status message
It is helpful for automation scripts to report what they did.
print(f"Wrote {len(cleaned)} unique emails to {output_path}")

This pattern (read → transform → write) is the backbone of many practical file-based automations.
Step-by-step automation: filter a log file into a smaller report
Another common task is extracting only the lines you care about from a large log file. For example, you may want to keep only warnings and errors and write them to a separate file for quick review.
Goal
- Read app.log
- Keep lines containing WARNING or ERROR
- Write them to issues.log
Implementation
input_path = "app.log"
output_path = "issues.log"
keywords = ("WARNING", "ERROR")
count = 0
with open(input_path, "r", encoding="utf-8") as fin, open(output_path, "w", encoding="utf-8") as fout:
    for line in fin:
        if any(k in line for k in keywords):
            fout.write(line)
            count += 1

print(f"Extracted {count} issue lines to {output_path}")

Notice that you can open two files in one with statement. This is a clean way to build “pipeline” scripts that transform one file into another.
Encoding and Unicode: avoiding weird characters
Text files are stored as bytes, and an encoding tells Python how to interpret those bytes as characters. Most modern text files are UTF-8, but you may encounter files saved in other encodings, especially on Windows or from older systems.
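You can see the bytes-versus-characters distinction directly in Python. This short sketch encodes a string to bytes and decodes it back, including once with the wrong encoding to show what garbled output looks like:

```python
text = "café"
data = text.encode("utf-8")   # str -> bytes
print(data)                   # b'caf\xc3\xa9': the single character é is two bytes in UTF-8
print(data.decode("utf-8"))   # café: decoding with the right encoding restores the text
print(data.decode("cp1252"))  # cafÃ©: the wrong encoding produces "mojibake"
```

If you have ever seen Ã© or similar sequences in a file, it was almost certainly UTF-8 bytes being interpreted with a different encoding.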
Best practice: specify encoding explicitly
When reading and writing text, include encoding="utf-8" unless you have a reason not to. This makes your script more predictable across machines.
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()

What if you get a UnicodeDecodeError?
If reading fails with a decoding error, the file may not be UTF-8. You can try a different encoding (common ones include "utf-8-sig" for UTF-8 with a BOM, or "cp1252" for some Windows text). Another approach is to use error handling to replace problematic characters, which can be acceptable for logs and rough text processing.
with open("legacy.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()

Using errors="replace" prevents crashes, but it may change characters. For critical data, you should identify the correct encoding instead of replacing.
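If you do not know the encoding in advance, one pragmatic pattern is to try a short list of likely encodings and use the first one that decodes without error. This is a sketch, not a reliable detector (some byte sequences decode "successfully" under more than one encoding), and the function name is made up for this example:

```python
def read_text_flexible(path, encodings=("utf-8", "utf-8-sig", "cp1252")):
    """Return (text, encoding) using the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {path}")
```

Returning the encoding alongside the text is useful for logging which fallback was actually used.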
Handling missing files and other common errors
Automation scripts should fail gracefully. Two common issues are missing input files and permission problems (trying to write somewhere you do not have access, or trying to overwrite a file that is open in another program).
Check existence with pathlib
from pathlib import Path
path = Path("data/input.txt")
if not path.exists():
    print("Input file not found:", path)
else:
    with path.open("r", encoding="utf-8") as f:
        print(f.read())

Use try/except around file operations
When your script is meant to run unattended, catching errors and printing a helpful message is better than a confusing traceback.
try:
    with open("input.txt", "r", encoding="utf-8") as f:
        data = f.read()
except FileNotFoundError:
    print("Could not find input.txt. Put it in the same folder as the script.")
except PermissionError:
    print("Permission error: close the file if it is open, or choose a different folder.")

Creating folders for outputs
When writing output files, you often want them in a dedicated folder like output/. If the folder does not exist, writing will fail. Create it first.
from pathlib import Path
out_dir = Path("output")
out_dir.mkdir(parents=True, exist_ok=True)
out_file = out_dir / "report.txt"
with out_file.open("w", encoding="utf-8") as f:
    f.write("Report generated\n")

exist_ok=True means “do not error if the folder already exists.”
Working with CSV-like text (without special libraries)
Many automation tasks involve simple “comma-separated” files. While Python has a dedicated csv module (and it is recommended for real CSV), you can still do basic tasks by splitting lines. This is acceptable only when your data is simple and does not contain commas inside quoted fields.
Example: sum values from a simple file
Suppose sales.txt contains:
2026-01-01,12.50
2026-01-02,9.99
2026-01-03,20.00
You can parse it like this:
total = 0.0
with open("sales.txt", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        date_str, amount_str = line.split(",")
        total += float(amount_str)

with open("sales_total.txt", "w", encoding="utf-8") as f:
    f.write(f"Total sales: {total:.2f}\n")

If you later encounter more complex CSV files, switch to the csv module to handle quoting and edge cases correctly.
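For comparison, here is the same sum using the standard library's csv module. It creates the sample file first so the sketch runs on its own; with real data you would skip that step:

```python
import csv

# Re-create the sample sales.txt from the text so this sketch is self-contained.
with open("sales.txt", "w", encoding="utf-8") as f:
    f.write("2026-01-01,12.50\n2026-01-02,9.99\n2026-01-03,20.00\n")

total = 0.0
with open("sales.txt", "r", encoding="utf-8", newline="") as f:
    for row in csv.reader(f):
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        date_str, amount_str = row
        total += float(amount_str)

print(f"Total sales: {total:.2f}")  # Total sales: 42.49
```

The code is barely longer than manual splitting, but it handles quoted fields (such as "Smith, John") correctly, which line.split(",") does not.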
Reading and writing binary files (when text is not enough)
Not all files are text. Images, PDFs, and many document formats are binary. If you open them in text mode, Python will try to decode bytes into characters and fail or corrupt the data. Use binary modes "rb" and "wb".
Example: copy a file
This is a simple but useful automation building block: copying a file without importing extra modules.
source = "photo.jpg"
dest = "backup_photo.jpg"
with open(source, "rb") as fin, open(dest, "wb") as fout:
    chunk_size = 1024 * 1024  # 1 MB
    while True:
        chunk = fin.read(chunk_size)
        if not chunk:
            break
        fout.write(chunk)

Reading in chunks is important for large files so you do not load the entire file into memory.
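For everyday copying, the standard library already packages this loop: shutil.copyfile copies file contents, and shutil.copy2 also preserves metadata such as timestamps. The chunked loop above is still worth knowing because the same pattern applies whenever you process (not just copy) large binary data. The filenames here are throwaway examples:

```python
import shutil

# Create a small binary file to stand in for photo.jpg.
with open("source.bin", "wb") as f:
    f.write(bytes(range(256)))

shutil.copyfile("source.bin", "backup.bin")  # copies contents only
shutil.copy2("source.bin", "backup2.bin")    # also copies metadata (timestamps, permissions)
```

In your own scripts, prefer shutil for plain copies and reserve the manual loop for cases where you transform the bytes as they stream through.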
Practical patterns for file-based automation
Pattern 1: transform one file into another
This is the most common pattern: read input, process each line, write output. Keep it streaming (line by line) when possible.
with open("input.txt", "r", encoding="utf-8") as fin, open("output.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(line.upper())

Pattern 2: build a timestamped output filename
Automation often produces outputs you want to keep. A timestamp avoids overwriting previous results.
from datetime import datetime
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
output_name = f"report_{ts}.txt"
with open(output_name, "w", encoding="utf-8") as f:
    f.write("Generated report\n")

Pattern 3: safe creation with x mode
If overwriting would be dangerous, use "x" to ensure you do not accidentally replace an existing file.
try:
    with open("important_output.txt", "x", encoding="utf-8") as f:
        f.write("First run output\n")
except FileExistsError:
    print("important_output.txt already exists; not overwriting.")

Pattern 4: process many files in a folder
Real automation often means “do the same thing to every file in a folder.” pathlib makes this straightforward.
from pathlib import Path
input_dir = Path("incoming")
output_dir = Path("processed")
output_dir.mkdir(exist_ok=True)
for path in input_dir.glob("*.txt"):
    out_path = output_dir / path.name
    with path.open("r", encoding="utf-8") as fin, out_path.open("w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line.strip() + "\n")

This example trims whitespace on every line for every .txt file in incoming/ and writes cleaned versions to processed/.
Checklist: avoiding common beginner mistakes
- Forgetting with: can leave files open and cause locked-file issues.
- Using "w" when you meant "a": "w" overwrites the file.
- Not adding newlines: write() does not automatically add "\n".
- Not specifying encoding: may work on your machine but fail on another.
- Reading huge files with read(): prefer line-by-line iteration for large inputs.
- Using naive splitting for complex CSV: fine for simple data, but switch to the csv module when fields can contain commas or quotes.
- Hardcoding fragile paths: prefer pathlib and keep inputs/outputs in predictable folders.
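Putting the checklist together, a minimal skeleton for a read → transform → write script might look like the sketch below. The paths and the transform function are placeholders you would replace with your own:

```python
from pathlib import Path

input_path = Path("input.txt")
output_dir = Path("output")

def transform(line: str) -> str:
    """Placeholder transform: trim whitespace and uppercase."""
    return line.strip().upper() + "\n"

def main() -> int:
    # Fail gracefully if the input is missing.
    if not input_path.exists():
        print(f"Input file not found: {input_path}")
        return 1
    # Make sure the output folder exists before writing.
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / "result.txt"
    try:
        # Stream line by line; with closes both files even on error.
        with input_path.open("r", encoding="utf-8") as fin, out_path.open("w", encoding="utf-8") as fout:
            count = 0
            for line in fin:
                fout.write(transform(line))
                count += 1
    except PermissionError:
        print("Permission error: check that the output folder is writable.")
        return 1
    print(f"Wrote {count} lines to {out_path}")
    return 0
```

Calling main() returns 0 on success and 1 on a handled failure, which pairs naturally with raise SystemExit(main()) at the bottom of a script.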