Bloomberg Interview Question: Traverse a Folder and Sum the Second Column of Every CSV File

Problem Description

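Given a folder path, where the folder and its subfolders contain multiple CSV files holding transaction information, implement the function process_path so that it finds every CSV file and returns the sum of all integer values in the second column.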

def process_path(path: str) -> int:
    """Process every CSV file under the given path and return the sum of the second column."""
    pass

Example

File structure:
.
|-- a
|   `-- b
|       `-- ex.csv
|-- example.csv
`-- ignoreme.log

# Contents of ex.csv:
a,5,-1,-1,0
a,10,-1,-1,0

# Contents of example.csv:
b,0,-1,-1,0
b,-3,-1,-1,0

Expected output: 12 (5 + 10 + 0 + (-3) = 12)

Note: only .csv files are processed; files of any other type are ignored.
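
To try a solution against this example, it can help to recreate the file tree on disk. The sketch below does that in a temporary directory. The helper name build_example_tree and the contents of ignoreme.log are assumptions made for illustration; the problem statement only gives the two CSV files.

```python
import os
import tempfile

# Hypothetical helper (not part of the original problem): recreates the
# example file tree so a solution can be checked against the expected 12.
def build_example_tree(root: str) -> None:
    files = {
        os.path.join("a", "b", "ex.csv"): "a,5,-1,-1,0\na,10,-1,-1,0\n",
        "example.csv": "b,0,-1,-1,0\nb,-3,-1,-1,0\n",
        # Placeholder content; the original does not show this file.
        "ignoreme.log": "not a csv file\n",
    }
    for relative_path, content in files.items():
        full_path = os.path.join(root, relative_path)
        # Create intermediate directories such as a/b before writing.
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w", newline="") as f:
            f.write(content)

# Build the tree in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    build_example_tree(tmp)
    created = sorted(
        os.path.relpath(os.path.join(dirpath, name), tmp)
        for dirpath, _dirs, names in os.walk(tmp)
        for name in names
    )
    print(created)
```

Calling a correct process_path on such a directory should return 12.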

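Approach

Traverse the directory tree recursively: use os.walk or a recursive function to visit every subdirectory
Filter for CSV files: only process files whose names end in .csv
Parse and accumulate: read each line, extract the second column, and add it to the running total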

Python Implementation

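import csv
import os

def process_path(path: str) -> int:
    """Return the sum of the second column across all CSV files under path."""
    total = 0

    # os.walk yields every directory under path, including path itself.
    for root, _dirs, files in os.walk(path):
        for filename in files:
            # Case-sensitive match: a file named DATA.CSV would be skipped.
            if filename.endswith('.csv'):
                filepath = os.path.join(root, filename)
                total += process_csv(filepath)

    return total

def process_csv(filepath: str) -> int:
    """Return the sum of the integer values in the second column of one CSV file."""
    total = 0
    # The csv module documentation recommends opening files with newline=''.
    with open(filepath, 'r', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            if len(row) >= 2:
                try:
                    total += int(row[1])
                except ValueError:
                    pass  # skip rows whose second column is not an integer
    return total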

Version Without the csv Module

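import os

def process_path(path: str) -> int:
    """Return the sum of the second column across all CSV files under path."""
    total = 0

    # Base case: a single file contributes its own sum, or 0 if it is not a CSV.
    if os.path.isfile(path):
        if path.endswith('.csv'):
            return process_csv_file(path)
        return 0

    for item in os.listdir(path):
        item_path = os.path.join(path, item)
        total += process_path(item_path)  # recurse into files and subdirectories

    return total

def process_csv_file(filepath: str) -> int:
    """Return the sum of the integer values in the second column of one CSV file."""
    total = 0
    with open(filepath, 'r') as f:
        for line in f:
            # Plain split: assumes no field contains a quoted comma.
            parts = line.strip().split(',')
            if len(parts) >= 2:
                try:
                    total += int(parts[1])
                except ValueError:
                    pass  # skip rows whose second column is not an integer
    return total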
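
One difference between the two versions is worth being able to explain in an interview: splitting on commas does not understand quoting, while csv.reader does. The short comparison below uses a made-up line (it is not taken from the example files) whose first field is quoted and contains a comma.

```python
import csv

# Illustrative line (an assumption, not from the original example files):
# the first field is quoted and contains a comma.
line = '"Smith, John",5,-1'

naive_fields = line.split(',')          # treats every comma as a separator
csv_fields = next(csv.reader([line]))   # honours the quotes

print(naive_fields)  # ['"Smith', ' John"', '5', '-1']
print(csv_fields)    # ['Smith, John', '5', '-1']
```

With the plain split, the "second column" is ' John"', which fails int() and is silently skipped, so the value 5 is lost. For the sample data in this problem both versions agree, because no field is quoted.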

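Complexity Analysis

Time complexity: O(F × L) for parsing, where F is the number of CSV files and L is the average number of lines per file. The traversal also touches every directory entry once, so a tree with many non-CSV files adds a term proportional to the total number of entries.
Space complexity: O(D) stack frames for the recursive version, where D is the maximum directory depth. Each frame also holds the listing of the directory it is currently iterating over.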
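
If the recursion depth is a concern, the traversal can be rewritten with an explicit stack. The following is a minimal sketch of that idea, not a replacement for the solutions above. The names process_path_iterative and sum_second_column are chosen here for illustration, and skipping symlinked directories is an added assumption to avoid cycles.

```python
import os

def sum_second_column(filepath: str) -> int:
    """Sum the integer values found in the second column of one CSV file."""
    total = 0
    with open(filepath, 'r') as f:
        for line in f:
            parts = line.strip().split(',')
            if len(parts) >= 2:
                try:
                    total += int(parts[1])
                except ValueError:
                    pass  # skip rows whose second column is not an integer
    return total

def process_path_iterative(path: str) -> int:
    """Sum the second column of every .csv file under path without recursion."""
    total = 0
    pending = [path]  # explicit stack of paths still to visit
    while pending:
        current = pending.pop()
        # Assumption: symlinked directories are not followed, to avoid cycles.
        if os.path.isdir(current) and not os.path.islink(current):
            with os.scandir(current) as entries:
                for entry in entries:
                    pending.append(entry.path)
        elif os.path.isfile(current) and current.endswith('.csv'):
            total += sum_second_column(current)
    return total
```

Here memory use is driven by the number of paths waiting on the stack rather than by Python's recursion limit, which can matter for unusually deep directory trees.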

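Further Discussion

The interviewer may follow up with:

How would you handle large files? Read them as a stream, line by line, instead of loading the whole file at once. Both versions above already do this.
How would you parallelize the work? Process different files on multiple threads or processes.
How would you handle encoding problems? Specify the file encoding explicitly and catch decoding errors.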

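# Parallel version example
# Reuses process_csv from the csv-module implementation above.
from concurrent.futures import ThreadPoolExecutor
import os

def process_path_parallel(path: str) -> int:
    # Collect the CSV paths first, then hand them to the thread pool.
    csv_files = []
    for root, _dirs, files in os.walk(path):
        for f in files:
            if f.endswith('.csv'):
                csv_files.append(os.path.join(root, f))

    with ThreadPoolExecutor() as executor:
        # map() yields one total per file, in input order.
        return sum(executor.map(process_csv, csv_files))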
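
For the encoding follow-up, one possible answer is to try a short list of encodings and fall back when decoding fails. The sketch below is only an illustration under assumptions: the problem does not say how the files are encoded, and both the function name process_csv_with_encoding and the candidate list are made up here.

```python
import csv

# Assumption: these candidates are illustrative only. "utf-8-sig" reads UTF-8
# with or without a byte order mark; "latin-1" can decode any byte sequence.
CANDIDATE_ENCODINGS = ("utf-8-sig", "latin-1")

def process_csv_with_encoding(filepath: str, encodings=CANDIDATE_ENCODINGS) -> int:
    """Sum the second column, retrying with the next encoding on decode errors."""
    for encoding in encodings:
        try:
            total = 0
            with open(filepath, "r", encoding=encoding, newline="") as f:
                for row in csv.reader(f):
                    if len(row) >= 2:
                        try:
                            total += int(row[1])
                        except ValueError:
                            pass  # second column is not an integer
            return total
        except UnicodeDecodeError:
            # Discard the partial total and start over with the next encoding.
            continue
    raise ValueError(f"could not decode {filepath!r} with any of {encodings}")
```

Restarting the file after a decode error keeps the result consistent, at the cost of reading part of the file twice. Whether a permissive fallback such as latin-1 is acceptable depends on the data, so it is worth asking the interviewer.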
