Value alignment, Singularity, etc.
Understanding the alignment problem
- Section 1 of Ji, J., Qiu, T., Chen, B., Zhang, B., Lou, H., Wang, K., Duan, Y., He, Z., Zhou, J., Zhang, Z., Zeng, F., Ng, K. Y., Dai, J., Pan, X., O’Gara, A., Lei, Y., Xu, H., Tse, B., Fu, J., … Gao, W. (2023, October 30). AI Alignment: A Comprehensive Survey. arXiv. https://arxiv.org/abs/2310.19852v2
Extreme risks and model evaluation
- Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J., Kokotajlo, D., Marchal, N., Anderljung, M., Kolt, N., Ho, L., Siddarth, D., Avin, S., Hawkins, W., Kim, B., Gabriel, I., Bolina, V., Clark, J., Bengio, Y., … Dafoe, A. (2023). Model evaluation for extreme risks (arXiv:2305.15324). arXiv. https://doi.org/10.48550/arXiv.2305.15324
Evaluating model alignment for assurances
- Section 4 of Ji, J., Qiu, T., Chen, B., Zhang, B., Lou, H., Wang, K., Duan, Y., He, Z., Zhou, J., Zhang, Z., Zeng, F., Ng, K. Y., Dai, J., Pan, X., O’Gara, A., Lei, Y., Xu, H., Tse, B., Fu, J., … Gao, W. (2023, October 30). AI Alignment: A Comprehensive Survey. arXiv. https://arxiv.org/abs/2310.19852v2
Singularity
- Chalmers, D. J. (2010). The Singularity: A Philosophical Analysis. Journal of Consciousness Studies, 17(9–10), 7–65.